1. An incredibly powerful tool.
2. No barrier to entry (cost aside, true in a corporate environment).
3. Very gradual learning curve.
4. The efficiency gain vs time invested is exponential.
Power Excel users, much like their Vim/Emacs counterparts, don't use a mouse. It is all keyboard shortcuts [1][2]. This makes them insanely productive.
Excel is something managers and executives can understand, so it became the default language for data analysis. Now technologists trapped using it have to create ex post facto justifications for why it's really "just misunderstood."
Excel is massively slow, makes it easy for beginners to make massive mistakes, coerces many data types in odd ways, gets floating-point operations wrong, and leads to spaghetti code that is a rat's nest of incomprehensible cross-references.
Worst of all, the lack of code path visibility usually leads to a bus factor of 1.
Sure, one can learn to operate Excel for data analysis with a decent level of efficiency, in the same manner one can cross the Pacific in a canoe, but both are still terrible tools for the job.
Everybody's right. Excel is a powerful, flexible tool that also has almost no guard rails and all but begs people to make profound mistakes and huge messes. There are too many people who sneer at spreadsheets when they should be using them, and there are too many people who use them when they shouldn't.
No contradictions.
It would be interesting to see if anyone could get some power Excel users together and construct a next-generation spreadsheet that encouraged better practices and worked to prevent huge messes. Spreadsheets are like SQL, where the initial release was so far ahead of its time that it managed to entrench itself into the very fabric of computing, even though it's long overdue for a reimagining.
I'm an Excel power user. I think Microsoft is moving in the right direction, with the addition of Tables, PowerBI, PowerPivot and R in SQL Server. What I'd like to see in Excel is:
Hotkey training built into Excel
Python as an optional language alongside VBA
Proper Data Tables with Types and Indices, or even SQL in Excel.
Regex Search over Columns
PowerPivot use case training
Web publishing of reports made stupid easy
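To make the "Regex Search over Columns" wish concrete, here's a toy sketch in plain Python of what it might look like (the column names and the dict-of-lists "sheet" are my own illustration, not any Excel feature):

```python
import re

# Toy "sheet": each column is a list of cell values (names are hypothetical).
sheet = {
    "invoice": ["INV-001", "REF-17", "INV-042"],
    "amount": ["120.0", "75.5", "300.0"],
}

def regex_search(sheet, column, pattern):
    """Return the row indices whose value in `column` matches `pattern`."""
    rx = re.compile(pattern)
    return [i for i, v in enumerate(sheet[column]) if rx.search(v)]

print(regex_search(sheet, "invoice", r"^INV-\d+$"))  # [0, 2]
```

In Excel today the closest you get is wildcard matching in Find or SEARCH, which is far weaker than real regular expressions.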
Could you say a bit more about what you'd like to see in a "reimagined" SQL? Are there any serious efforts to replace it?
There was a comment thread around here a week or two ago where someone pointed out it's kind of insane SQL has stuck around so long, and no one could point to any worthy potential replacements.
>There was a comment thread around here a week or two ago where someone pointed out it's kind of insane SQL has stuck around so long, and no one could point to any worthy potential replacements
SQL is based on relational algebra -- so it's the model with the best theoretical justification out there, even if the syntax could be improved.
It's the other ad-hoc solutions that are crazy to keep getting suggested. SQL/RDBMS were invented because we already had those (key-value stores, tree DBs, etc.) and they were crap.
Visual query tools like Tableau don't seem to be going away. I'd love to see an effective open-source alternative to Tableau that doesn't require scripting your own D3 website.
Butler Lampson makes the point (in a recent set of slides) that relations are a good base for DSLs: they have enough complexity to model graphs, functions, sets etc.
That thought has made me wonder if logic programming has something to offer in the design of new APIs.
SQL in fact deviates from true relational theory, in which the "cells" of a table could themselves have additional structure rather than just being "a string" or "a number". Cells could also be truly absent. SQL's NULL, while something you can make your peace with, could use some tweaking with 21st century experience. SQL's syntax has acquired a lot of cruft over the years to deal with new features... in fact in that sense it reminds me of the evolution of OpenGL and the way it acquired extension after extension until finally it needed to be broken apart into Vulkan and CUDA pieces (to brutally summarize the situation to the point of inaccuracy; please try to see what I mean rather than pick nits with that).
More controversially, I question the entire intent of making the core query language something that is putatively declarative, but then in practice often requires extensive engine-specific annotations to tell the engine how to actually do the query. (More on that https://news.ycombinator.com/item?id=3506345#3507281 ). I think RethinkDB's query language was much more imperative, because of the level of development resources they had, and I bet it actually worked out OK. However, even if I could not sell the development world on making SQL++/SQL-replacement non-declarative, we certainly could do a better job this time around of separating query strategy from query contents in some deliberate manner, rather than hacking crap up.
Imagine if, for instance, you could feed the query optimizer a query, get back a query plan that was actually manipulable and executable, tweak that to your tastes, and then send it back to the DB, rather than working via hints and circumlocutions and hopes and dreams.
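Today you can inspect a plan but not hand an edited one back, which is exactly the gap described. A quick sketch of the inspect-only status quo, using SQLite's EXPLAIN QUERY PLAN from Python (toy table and index names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE INDEX t_x ON t(x)")

# The plan comes back as read-only rows of descriptive text; there is no
# supported way to tweak these rows and send them back for execution.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE x = 1").fetchall()
for row in plan:
    print(row[-1])  # strategy text, e.g. a search using index t_x
```

Hints and engine-specific pragmas are the only steering wheel most databases offer, which is the "circumlocutions and hopes and dreams" part.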
It would also be nice if SQL were more composable. SQL's serialized form makes it impractical to combine two queries into a larger query via string manipulation. Many languages have libraries that permit this, but they're always second-class citizens. If I were redesigning SQL I'd want something that handled this more cleanly. I'd seriously consider something RethinkDB-esque in the sense that it didn't have an "English" serialization, but was purely symbolic, leaving it to language authors to figure out how to best represent it in the local language.
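To make "purely symbolic" concrete, here's a toy sketch in Python (entirely hypothetical, not RethinkDB's actual API): a query is a plain value, so composing queries is ordinary data manipulation, and any SQL text is just one derived serialization.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    table: str
    columns: tuple = ("*",)
    filters: tuple = ()   # (column, operator, value) triples
    order_by: tuple = ()

    def where(self, *conds):
        # Composition is value manipulation, not string splicing.
        return replace(self, filters=self.filters + conds)

    def order(self, *cols):
        return replace(self, order_by=self.order_by + cols)

    def to_sql(self):
        # One possible serialization; the symbolic form is primary.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(
                f"{c} {op} {v!r}" for c, op, v in self.filters)
        if self.order_by:
            sql += " ORDER BY " + ", ".join(self.order_by)
        return sql

base = Query("users", columns=("name",))     # a reusable fragment
adults = base.where(("age", ">=", 18))       # composed, not concatenated
print(adults.order("name").to_sql())
# SELECT name FROM users WHERE age >= 18 ORDER BY name
```

The point of the sketch is that `base` can be extended by any caller without parsing anything; libraries like this exist for SQL, but as the comment says, they sit on top of the English serialization rather than replacing it.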
Also, bear in mind that most if not all features I describe in this post exist in databases already. (Not sure about that last one.) What I'm saying is that SQL integrates poorly with all that, not that the features don't exist. Recursive queries and common table expressions also seem ripe for some serious rethinking. Plus I think for a long time SQL really kinda limited the sort of DBs that would be produced because if a feature integrated poorly with SQL, it was a lot less likely to come out. (In particular, structured cells took IMHO forever to come out. Possibly the massive market failure of "object databases" also scared DB developers off from that feature too, though. They aren't the same thing but may be closely enough related.)
Most modern relational databases now allow the cells of a table to have additional structure through the use of SQL/XML. We can query into the contents of a cell using XQuery.
I fully concur. This is an excellent summary and suggestion for future progress.
The barriers to moving beyond Excel can be overcome, but it will take some serious effort on many fronts. Both Excel and SQL embody genius concepts, but are such poor implementations that it is easy to conflate the cruft with the advantages.
SQL is not an implementation but a specification and thus cannot be compared to Excel, a very specific implementation of non-monotonic dataflow programming.
Regarding your "Stockholm syndrome" comment above: someone in his car hears a PSA about "some guy driving the wrong way" on the very road he is on and thinks "One? Hundreds!". Unless you can beef up your argumentation, you are that guy.
> Unless you can beef up your argumentation you are that guy.
That's fallacious too. I can be right, even if my argument is incorrect or unconvincing.
Warren Buffett and Nate Silver are both driving against traffic, and both of them are righter than everyone else combined.
> SQL [...] cannot be compared to Excel
What Excel and SQL have in common is that they're both a first attempt at a solution to (different) problems, and they've been too successful to properly iterate on. That's why everyone uses some proprietary extensions to SQL and everyone extends Excel with VB or C#.
Excel is terrifying. Each employee has taken the same concept and written their own bespoke tooling around it, which probably has at least one bug. These are "copy and pasted" around a bazillion network drives and then passed on to other people who will modify the undocumented process based on their best understanding of what they think it does (or what it was meant to do...?).
I can still take my ad hoc SQL query data and run decent analysis and produce graphical summaries in less time than it would take me to setup the boilerplate I'd need in C#.
Arguably something like Matlab or R would be similarly quick for a lot of things - but I'm not even slightly sold that they are safer based on my observations of their use. I've certainly seen plenty of formal code that was less readable than a decent spreadsheet.
I'm not really a fan of Excel tools and tooling. VBA has made me want to actually smash my computer in the past. But to claim that it isn't incredibly powerful at working with a few megabytes of raw data is flat-out wrong.
Do these videos have anything to do with beginners making mistakes, floating points and other type conversions, bus factor of 1, spaghetti code, etc?
All I see is the same old Martin Shkreli video that has been floated around before, and all you see him do is 'Vim' around as he explains his thoughts -- not on Excel, but on company financials.
Also, if you post a lopsided list of pros, it makes sense to the audience to see someone else post a lopsided list of cons. But then you reply with pettiness. Why?
Lots of the tasks carried out in offices are not technical enough to run into the issues you correctly identify, which only bite past a certain threshold. I work as an economist in a government department, and a lot of the analysis involves ad-hoc projects processing data from different sources and doing some basic plotting/elementary calculations. Excel is perfect for this, but if something is too technical/repetitive it becomes less suitable.
I can vouch for this. I thought spreadsheets were stupid and useless until I was forced to use them by my boss at the CS department here for grading. I didn't realize how useful it is to have your computation and data in the same place.
It makes things slow, but usually for these sorts of tasks you want your data to be available on the scale of days, not nanoseconds, so it works out. We have a complicated grading sheet here that manages all of the students' information on the same sheet in a Google Drive: grades, attendance, recitation attendance, and at the end of the class, homework.
It also verifies the test answers against the correct ones to make sure we score exams correctly.
Excel is great for fixed sets of data that need simple map/reductions & input verification because that's all we really use it for. After that move to something like Python&Numpy/R/Julia/Matlab.
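For comparison, the kind of simple map/reduce plus input verification described above fits in a few lines of plain Python too (toy grade data; the names and the 0-100 range are my assumptions):

```python
# Toy grade book: student name -> list of assignment scores.
scores = {"alice": [88, 92, 79], "bob": [70, 65, 90]}

# Input verification, the other thing the comment credits Excel with.
assert all(0 <= x <= 100 for s in scores.values() for x in s), "score out of range"

# "map": average each student's scores; "reduce": class-wide mean.
averages = {name: sum(s) / len(s) for name, s in scores.items()}
class_mean = sum(averages.values()) / len(averages)

print({k: round(v, 2) for k, v in averages.items()}, round(class_mean, 2))
```

The spreadsheet version of this has the advantage that the inputs, the intermediate averages, and the final number are all visible at once, which is exactly the "computation and data in the same place" point above.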
I used to work at an NLP company, and we made extensive use of Google Sheets for doing P/R/F calculations on the results of various tests. It was so useful.
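For readers unfamiliar with the abbreviation, P/R/F is precision/recall/F-score, and the arithmetic is simple enough to live comfortably in a sheet. A minimal sketch in Python (the counts here are made up, not real test results):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive/false-positive/false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = prf(tp=90, fp=10, fn=30)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.90 R=0.75 F1=0.82
```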
The nice thing about Excel is that it does resemble a functional programming tool.
Would love to work with a replacement, even if it is some sort of Pandas/Python/Matplotlib derivative - but it takes too long to set up things with these tools, and it seems not all operations are as trivial as I want them to be.
I've seen software specifically made to tackle such complex processes and they were buggy as hell. And I've seen 100+ connected Excel spreadsheets managing $500M+ yearly transactions of the buying department of a white goods manufacturer, it was a work of art.
Couldn't agree more. In Indonesia the Oil and Gas Upstream Special Task Force uses Excel for almost everything. We have tried to replace Excel with many good apps, but alas, whenever an app failed we just switched back to good old Excel. Someone even created a Monte Carlo simulation in Excel!
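For what it's worth, the Monte Carlo technique itself also fits in a few lines outside Excel. A classic toy example in Python (estimating pi by random sampling; this is illustrative, not the actual simulation mentioned above):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100_000

# Sample points in the unit square; the fraction landing inside the
# quarter circle of radius 1 approximates pi/4.
inside = sum(
    1 for _ in range(n)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)
pi_estimate = 4 * inside / n
print(pi_estimate)  # lands near 3.14
```

In Excel the same idea is typically built with a column of RAND() cells and a data table to rerun the model, which is exactly the sort of thing that works until the workbook gets big.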
Indeed -- the most compelling reasons to move trading desks off Excel are non-technical. Model auditability and traceability are no longer 'nice-to-haves' but are compliance requirements.
The efficiency gain vs time invested is not exponential. It's linear at best and plateaus after about 5k hours. It is this plateauing of the curve that is the biggest reason power Excel users move to R or Python. That was certainly my experience after eking everything I could out of Excel in 15 years of trading-floor fixed income. A visual 2D paradigm is excellent for quick productivity but is severely limiting as complexity and data size rises. Even with VB.
People really love it because they can do business analysis without hiring programmers or becoming them. And spreadsheets are a great model for a constrained set of problems.
However, it's not all sweetness and light. Excel even gets some basic calculations wrong - and those ignorant of its quirks happily propagate those errors. More problematically, it can easily be pushed to the point that your modeling is beyond the tool's or the spreadsheet's design strengths without your knowing it. And debugging is a pain in the ass. As a result, lots of erroneous outputs get presented as meaningful.
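One concrete class of "basic calculations wrong" is binary floating point, which Excel shares with essentially every language (IEEE 754 doubles). The underlying behavior, demonstrated in Python:

```python
# Classic binary floating-point surprise (not unique to Excel):
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004

# Excel papers over some of these residues with display-time rounding,
# which hides the error rather than fixing it, so it can propagate
# silently through a model.
```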
You can track changes within Excel and with Office 2016, there is a comparison and merge tool included (that's been long overdue). But since the files are binary encoded, there is no external way to track changes unless someone wants to write up a parser for the XLSX format that can keep up with all the new features that MS adds every release.
For what it's worth, XLSX files aren't binary. They're just XML in a ZIP. They're not particularly nasty to diff once they're extracted - actually, I think they might do pretty well in Git.
The diff is actually not really straightforward - a sheet is two-dimensional and you have to work out column/row inserts/deletes (which themselves are intertwined) before you can even start looking at cell changes. So it's quite non-standard stuff.
> He is saying as you become more competent using excel, the efficiency increases exponentially.
OK, does this mean anything? How have you quantified efficiency? How have you quantified "learning curve"? What data do you have supporting that the relationship is exponential?
Asking people to say clearly and concretely what they mean is not obtuseness.
Nobody here has been able to elaborate on the initial statement "The efficiency gain vs learning curve is exponential". People are just rewording the sentence slightly and passing that off as an explanation. That seems to indicate that nobody knows what the statement means because the statement is vacuous.
It's mostly hand waving, but I think the OP was trying to point out that at the low end, a modest investment in training/learning gives great results in efficiency.
This is true! Then you hit a pretty hard wall with the limitations of the tool.
In the traditional, original sense of the term, learning curves are (presumably) asymptotic to a horizontal line representing total competence.
Somehow, sloppily, "steep" has come to mean difficult to learn, rather than quick to learn.
In the original version:
A steep learning curve means quick learning at the beginning. A shallow curve means that it takes a long time to build up skill.
If all the data you receive comes to you in Excel-adjacent formats (csv, xls, xlsx), but with major differences in formatting, or wholly inconsistent formatting, you now have a multi-month project just to build a consistent import script. Replacing a 1-second task done 2-3 times a day with a 4-month project has an ROI on the scale of decades. Not worth it.
Then you add visualization. What is 3-4 keystrokes in Excel becomes a lot of back and forth: learning a new library, ensuring it works on your system, vetting the visualization, dealing with that weird bug on the triple-line double-axis chart.
Then you have to validate integer handling and mathematics to ensure your newly written Python, Julia, etc. behaves the same as your well-vetted Excel spreadsheet.
Replacing that one slow, bloated spreadsheet is now nearly a year-long project which requires a new employee who will have comparable pay to the person who ALREADY operates Excel.
> spreadsheet is now nearly a year-long project which requires a new employee who will have comparable pay to the person who ALREADY operates Excel.
And now you have a scalable system. You can go from something one employee takes all day to look at 2x/day, to something anyone in the company can see in real time on a dashboard of some sort.
This is the big key. A few short VBA macros can give you regexes and cell swapping. Now you can conditionally rewrite the formulas in other cells, driven by the contents of other cells.
At one point, VBA in Microsoft Word was my only available programming outlet. I was able, easily, to access DirectDraw and create a faux screen saver. DLL imports are available in VBA, so the entire win32 API is available (in addition to the normal Office automation stuff like sending email, modifying spreadsheets, etc)
Yeah, in a similar situation right now and I find the DLL thing is relatively unknown and incredibly powerful if used well. Recently this 'robotics' evangelist keeps trying to rope us into spending a few mil on his automation and I keep showing him up by automating the same stuff right out of Excel for little to nothing.
[1] https://youtu.be/jFSf5YhYQbw
[2] https://youtu.be/0nbkaYsR94c