1. An incredibly powerful tool.
2. No barrier to entry (cost aside, true in a corporate environment).
3. Very gradual learning curve.
4. The efficiency gain vs time invested is exponential.
Power Excel users, much like their Vim/Emacs counterparts, don't use a mouse. It is all keyboard shortcuts [1][2]. This makes them insanely productive.
Excel is something managers and executives can understand, so it became the default language for data analysis. Now technologists trapped using it have to create ex post facto justifications for why it's really "just misunderstood."
Excel is massively slow, makes it easy for beginners to make massive mistakes, coerces many data types in odd ways, gets floating-point operations wrong, and leads to spaghetti code that is a rat's nest of incomprehensible cross-references.
Worst of all, the lack of code path visibility usually leads to a bus factor of 1.
Sure, one can learn to operate Excel for data analysis with a decent level of efficiency, in the same manner one can cross the Pacific in a canoe, but both are still terrible tools for the job.
Everybody's right. Excel is a powerful, flexible tool that also has almost no guard rails and all but begs people to make profound mistakes and huge messes. There are too many people who sneer at spreadsheets when they should be using them, and there are too many people who use them when they shouldn't.
No contradictions.
It would be interesting to see if anyone could get some power Excel users together and construct a next-generation spreadsheet that encouraged better practices and worked to prevent huge messes. Spreadsheets are like SQL, where the initial release was so far ahead of its time that it managed to entrench itself into the very fabric of computing, even though it's long overdue for a reimagining.
I'm an Excel power user. I think Microsoft is moving in the right direction, with the addition of Tables, PowerBI, PowerPivot and R in SQL Server. What I'd like to see in Excel is:
Hotkey training built into Excel
Python as an optional language alongside VBA
Proper Data Tables with Types and Indices, or even SQL in Excel.
Regex Search over Columns
PowerPivot use case training
Web publishing of reports made stupid easy
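To make the "Regex Search over Columns" wish concrete, here's a toy sketch in plain Python of what it might look like (the column names and the dict-of-lists "sheet" are my own illustration, not any Excel feature):

```python
import re

# Toy "sheet": each column is a list of cell values (names are hypothetical).
sheet = {
    "invoice": ["INV-001", "REF-17", "INV-042"],
    "amount": ["120.0", "75.5", "300.0"],
}

def regex_search(sheet, column, pattern):
    """Return the row indices whose value in `column` matches `pattern`."""
    rx = re.compile(pattern)
    return [i for i, v in enumerate(sheet[column]) if rx.search(v)]

print(regex_search(sheet, "invoice", r"^INV-\d+$"))  # [0, 2]
```

In Excel today the closest you get is wildcard matching in Find or SEARCH, which is far weaker than real regular expressions.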
Could you say a bit more about what you'd like to see in a "reimagined" SQL? Are there any serious efforts to replace it?
There was a comment thread around here a week or two ago where someone pointed out it's kind of insane SQL has stuck around so long, and no one could point to any worthy potential replacements.
>There was a comment thread around here a week or two ago where someone pointed out it's kind of insane SQL has stuck around so long, and no one could point to any worthy potential replacements
SQL is based on relational algebra -- so it's the model with the best theoretical justification out there, even if the syntax could be improved.
It's the other ad-hoc solutions that are crazy to keep getting suggested. SQL/RDBMS were invented because we already had those (key-value stores, tree DBs, etc.) and they were crap.
Visual query tools like Tableau don't seem to be going away. I'd love to see an effective open-source alternative to Tableau that doesn't require scripting your own D3 website.
Butler Lampson makes the point (in a recent set of slides) that relations are a good base for DSLs: they have enough complexity to model graphs, functions, sets etc.
That thought has made me wonder if logic programming has something to offer in the design of new APIs.
SQL in fact deviates from true relational theory, in which the "cells" of a table could themselves have additional structure rather than just being "a string" or "a number". Cells could also be truly absent. SQL's NULL, while something you can make your peace with, could use some tweaking with 21st century experience. SQL's syntax has acquired a lot of cruft over the years to deal with new features... in fact in that sense it reminds me of the evolution of OpenGL and the way it acquired extension after extension until finally it needed to be broken apart into Vulkan and CUDA pieces (to brutally summarize the situation to the point of inaccuracy; please try to see what I mean rather than pick nits with that).
More controversially, I question the entire intent of making the core query language something that is putatively declarative, but then in practice often requires extensive engine-specific annotations to tell the engine how to actually do the query. (More on that https://news.ycombinator.com/item?id=3506345#3507281 ). I think RethinkDB's query language was much more imperative, because of the level of development resources they had, and I bet it actually worked out OK. However, even if I could not sell the development world on making SQL++/SQL-replacement non-declarative, we certainly could do a better job this time around of separating query strategy from query contents in some deliberate manner, rather than hacking crap up.
Imagine if, for instance, you could feed the query optimizer a query, get back a query plan that was actually manipulable and executable, tweak that to your tastes, and then send it back to the DB, rather than working via hints and circumlocutions and hopes and dreams.
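Today you can inspect a plan but not hand an edited one back, which is exactly the gap described. A quick sketch of the inspect-only status quo, using SQLite's EXPLAIN QUERY PLAN from Python (toy table and index names):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.execute("CREATE INDEX t_x ON t(x)")

# The plan comes back as read-only rows of descriptive text; there is no
# supported way to tweak these rows and send them back for execution.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE x = 1").fetchall()
for row in plan:
    print(row[-1])  # strategy text, e.g. a search using index t_x
```

Hints and engine-specific pragmas are the only steering wheel most databases offer, which is the "circumlocutions and hopes and dreams" part.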
It would also be nice if SQL were more composable. SQL's serialized form makes it impractical to combine two queries into a larger query via string manipulation. Many languages have libraries that permit this, but they're always second-class citizens. If I were redesigning SQL I'd want something that handled this more cleanly. I'd seriously consider something RethinkDB-esque in the sense that it didn't have an "English" serialization, but was purely symbolic, leaving it to language authors to figure out how to best represent it in the local language.
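To make "purely symbolic" concrete, here's a toy sketch in Python (entirely hypothetical, not RethinkDB's actual API): a query is a plain value, so composing queries is ordinary data manipulation, and any SQL text is just one derived serialization.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Query:
    table: str
    columns: tuple = ("*",)
    filters: tuple = ()   # (column, operator, value) triples
    order_by: tuple = ()

    def where(self, *conds):
        # Composition is value manipulation, not string splicing.
        return replace(self, filters=self.filters + conds)

    def order(self, *cols):
        return replace(self, order_by=self.order_by + cols)

    def to_sql(self):
        # One possible serialization; the symbolic form is primary.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(
                f"{c} {op} {v!r}" for c, op, v in self.filters)
        if self.order_by:
            sql += " ORDER BY " + ", ".join(self.order_by)
        return sql

base = Query("users", columns=("name",))     # a reusable fragment
adults = base.where(("age", ">=", 18))       # composed, not concatenated
print(adults.order("name").to_sql())
# SELECT name FROM users WHERE age >= 18 ORDER BY name
```

The point of the sketch is that `base` can be extended by any caller without parsing anything; libraries like this exist for SQL, but as the comment says, they sit on top of the English serialization rather than replacing it.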
Also, bear in mind that most if not all features I describe in this post exist in databases already. (Not sure about that last one.) What I'm saying is that SQL integrates poorly with all that, not that the features don't exist. Recursive queries and common table expressions also seem ripe for some serious rethinking. Plus I think for a long time SQL really kinda limited the sort of DBs that would be produced because if a feature integrated poorly with SQL, it was a lot less likely to come out. (In particular, structured cells took IMHO forever to come out. Possibly the massive market failure of "object databases" also scared DB developers off from that feature too, though. They aren't the same thing but may be closely enough related.)
Most modern relational databases now allow the cells of a table to have additional structure through the use of SQL/XML. We can query into the contents of a cell using XQuery.
I fully concur. This is an excellent summary and suggestion for future progress.
The barriers to moving beyond Excel can be overcome, but it will take some serious effort on many fronts. Both Excel and SQL embody genius concepts, but are such poor implementations that it is easy to conflate the cruft with the advantages.
SQL is not an implementation but a specification and thus cannot be compared to Excel, a very specific implementation of non-monotonic dataflow programming.
Regarding your "Stockholm syndrome" comment above: someone in his car hears a PSA about "some guy driving the wrong way" on the very road he is on and thinks "One? Hundreds!". Unless you can beef up your argumentation, you are that guy.
> Unless you can beef up your argumentation you are that guy.
That's fallacious too. I can be right, even if my argument is incorrect or unconvincing.
Warren Buffett and Nate Silver are both driving against traffic, and both of them are righter than everyone else combined.
> SQL [...] cannot be compared to Excel
What Excel and SQL have in common is that they're both a first attempt at a solution to (different) problems, and they've been too successful to properly iterate on. That's why everyone uses some proprietary extensions to SQL and everyone extends Excel with VB or C#.
Excel is terrifying. Each employee has taken the same concept and written their own bespoke tooling around it, which probably has at least one bug. These are "copy and pasted" around a bazillion network drives and then passed on to other people who will modify the undocumented process based on their best understanding of what they think it does (or what it was meant to do...?).
I can still take my ad hoc SQL query data and run decent analysis and produce graphical summaries in less time than it would take me to setup the boilerplate I'd need in C#.
Arguably something like Matlab or R would be similarly quick for a lot of things - but I'm not even slightly sold that they are safer based on my observations of their use. I've certainly seen plenty of formal code that was less readable than a decent spreadsheet.
I'm not really a fan of Excel tools and tooling. VBA has made me want to actually smash my computer in the past. But to claim that it isn't incredibly powerful at working with a few megabytes of raw data is flat-out wrong.
Do these videos have anything to do with beginners making mistakes, floating points and other type conversions, bus factor of 1, spaghetti code, etc?
All I see is the same old Martin Shkreli video that has been floated around before, and all you see him do is 'Vim' around as he explains his thoughts -- not on Excel, but on company financials.
Also, if you post a lopsided list of pros, it makes sense to the audience to see someone else post a lopsided list of cons. But then you reply with pettiness. Why?
Lots of the tasks carried out in offices are not technical enough to run into the issues you correctly identify, which only bite past a certain threshold. I work as an economist in a government department, and a lot of the analysis involves ad-hoc projects processing data from different sources and doing some basic plotting/elementary calculations. Excel is perfect for this, but if something is too technical/repetitive it becomes less suitable.
I can vouch for this. I thought spreadsheets were stupid and useless until I was forced to use them by my boss at the CS department here for grading. I didn't realize how useful it is to have your computation and data in the same place.
It makes things slow, but usually for these sorts of tasks you want your data to be available on the scale of days, not nanoseconds, so it works out. We have a complicated grading sheet here that manages all of the students' information on the same sheet in a Google Drive: grades, attendance, recitation attendance, and at the end of the class, homework.
It also verifies the test answers against the correct ones to make sure we score exams correctly.
Excel is great for fixed sets of data that need simple map/reductions & input verification because that's all we really use it for. After that move to something like Python&Numpy/R/Julia/Matlab.
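For comparison, the kind of simple map/reduce plus input verification described above fits in a few lines of plain Python too (toy grade data; the names and the 0-100 range are my assumptions):

```python
# Toy grade book: student name -> list of assignment scores.
scores = {"alice": [88, 92, 79], "bob": [70, 65, 90]}

# Input verification, the other thing the comment credits Excel with.
assert all(0 <= x <= 100 for s in scores.values() for x in s), "score out of range"

# "map": average each student's scores; "reduce": class-wide mean.
averages = {name: sum(s) / len(s) for name, s in scores.items()}
class_mean = sum(averages.values()) / len(averages)

print({k: round(v, 2) for k, v in averages.items()}, round(class_mean, 2))
```

The spreadsheet version of this has the advantage that the inputs, the intermediate averages, and the final number are all visible at once, which is exactly the "computation and data in the same place" point above.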
I used to work at an NLP company, and we made extensive use of Google Sheets for doing P/R/F calculations on the results of various tests. It was so useful.
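For readers unfamiliar with the abbreviation, P/R/F is precision/recall/F-score, and the arithmetic is simple enough to live comfortably in a sheet. A minimal sketch in Python (the counts here are made up, not real test results):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive/false-positive/false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = prf(tp=90, fp=10, fn=30)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.90 R=0.75 F1=0.82
```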
The nice thing about Excel is that it does resemble a functional programming tool.
Would love to work with a replacement, even if it is some sort of Pandas/Python/Matplotlib derivative - but it takes too long to set up things with these tools, and it seems not all operations are as trivial as I want them to be.
I've seen software specifically made to tackle such complex processes and they were buggy as hell. And I've seen 100+ connected Excel spreadsheets managing $500M+ yearly transactions of the buying department of a white goods manufacturer, it was a work of art.
Couldn't agree more. In Indonesia the Oil and Gas Upstream Special Task Force uses Excel for almost everything. We have tried to replace Excel with many good apps, but alas, whenever an app failed we just switched back to good old Excel. Someone even created a Monte Carlo simulation in Excel!
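For what it's worth, the Monte Carlo technique itself also fits in a few lines outside Excel. A classic toy example in Python (estimating pi by random sampling; this is illustrative, not the actual simulation mentioned above):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100_000

# Sample points in the unit square; the fraction landing inside the
# quarter circle of radius 1 approximates pi/4.
inside = sum(
    1 for _ in range(n)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)
pi_estimate = 4 * inside / n
print(pi_estimate)  # lands near 3.14
```

In Excel the same idea is typically built with a column of RAND() cells and a data table to rerun the model, which is exactly the sort of thing that works until the workbook gets big.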
Indeed -- the most compelling reasons to move trading desks off Excel are non-technical. Model auditability and traceability are no longer 'nice-to-haves' but are compliance requirements.
The efficiency gain vs time invested is not exponential. It's linear at best and plateaus after about 5k hours. It is this plateauing of the curve that is the biggest reason power Excel users move to R or Python. That was certainly my experience after eking everything I could out of Excel in 15 years of trading-floor fixed income. A visual 2D paradigm is excellent for quick productivity but is severely limiting as complexity and data size rises. Even with VB.
People really love it because they can do business analysis without hiring programmers or becoming them. And spreadsheets are a great model for a constrained set of problems.
However, it's not all sweetness and light. Excel even gets some basic calculations wrong - and those ignorant of its quirks happily propagate those errors. More problematically, it can easily be pushed to the point that your modeling is beyond the tool's or the spreadsheet's design strengths without your knowing it. And debugging is a pain in the ass. As a result, lots of erroneous outputs get presented as meaningful.
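One concrete class of "basic calculations wrong" is binary floating point, which Excel shares with essentially every language (IEEE 754 doubles). The underlying behavior, demonstrated in Python:

```python
# Classic binary floating-point surprise (not unique to Excel):
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004

# Excel papers over some of these residues with display-time rounding,
# which hides the error rather than fixing it, so it can propagate
# silently through a model.
```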
You can track changes within Excel and with Office 2016, there is a comparison and merge tool included (that's been long overdue). But since the files are binary encoded, there is no external way to track changes unless someone wants to write up a parser for the XLSX format that can keep up with all the new features that MS adds every release.
For what it's worth, XLSX files aren't binary. They're just XML in a ZIP. They're not particularly nasty to diff once they're extracted - actually, I think they might do pretty well in Git.
The diff is actually not really straightforward - a sheet is two-dimensional and you have to work out column/row inserts/deletes (which themselves are intertwined) before you can even start looking at cell changes. So it's quite non-standard stuff.
> He is saying as you become more competent using excel, the efficiency increases exponentially.
OK, does this mean anything? How have you quantified efficiency? How have you quantified "learning curve"? What data do you have supporting that the relationship is exponential?
Asking people to say clearly and concretely what they mean is not obtuseness.
Nobody here has been able to elaborate on the initial statement "The efficiency gain vs learning curve is exponential". People are just rewording the sentence slightly and passing that off as an explanation. That seems to indicate that nobody knows what the statement means because the statement is vacuous.
It's mostly hand waving, but I think the OP was trying to point out that at the low end, a modest investment in training/learning gives great results in efficiency.
This is true! Then you hit a pretty hard wall with the limitations of the tool.
In the traditional, original sense of the term, learning curves are (presumably) asymptotic to a horizontal line representing total competence.
Somehow, sloppily, "steep" has come to mean difficult to learn, rather than quick to learn.
In the original version:
A steep learning curve means quick learning at the beginning. A shallow curve means that it takes a long time to build up skill.
If all the data you receive comes to you in Excel-adjacent formats (csv, xls, xlsx), but with major differences in formatting, or wholly inconsistent formatting, you now have a multi-month project just to build a consistent import script. Replacing a 1-second task done 2-3 times a day with a 4-month project has an ROI on the scale of decades. Not worth it.
Then you add visualization. What is 3-4 keystrokes in Excel becomes a lot of back and forth: learning a new library, ensuring it works on your system, vetting the visualization, dealing with that weird bug on the triple-line double-axis chart.
Then you have to validate integer handling and mathematics to ensure your newly written Python, Julia, etc. behaves the same as your well-vetted Excel spreadsheet.
Replacing that one slow, bloated spreadsheet is now nearly a year-long project which requires a new employee who will have comparable pay to the person who ALREADY operates Excel.
> spreadsheet is now nearly a year-long project which requires a new employee who will have comparable pay to the person who ALREADY operates Excel.
And now you have a scalable system. You can go from something one employee takes all day to look at 2x/day, to something anyone in the company can see in real time on a dashboard of some sort.
This is the big key. A few short VBA macros can give you regexes and cell swapping. Now you can conditionally rewrite the formulas in other cells, driven by the contents of other cells.
At one point, VBA in Microsoft Word was my only available programming outlet. I was able, easily, to access DirectDraw and create a faux screen saver. DLL imports are available in VBA, so the entire win32 API is available (in addition to the normal Office automation stuff like sending email, modifying spreadsheets, etc)
Yeah, in a similar situation right now and I find the DLL thing is relatively unknown and incredibly powerful if used well. Recently this 'robotics' evangelist keeps trying to rope us into spending a few mil on his automation and I keep showing him up by automating the same stuff right out of Excel for little to nothing.
[1] https://youtu.be/jFSf5YhYQbw
[2] https://youtu.be/0nbkaYsR94c