Hacker News

Former Cython developer here.

To get some perspective on this, consider the alternatives for a scientific programmer: MATLAB, R, FORTRAN, Mathematica, or if you're hip, Julia -- all made specifically for scientific programming, with 0% general purpose/web/etc. development going on.

So I would say that the scientific Python community has been doing extremely well, even though it uses a language that wasn't designed from the ground up for scientific computing.

I could write a lot about why that is (and how some of the CS and IT crowd doesn't "get" scientific computing..) -- I'll refrain. I just wanted to say that where you see something and get frustrated, I see the same picture and think it's actually an incredible success: to bring so many scientists into at least the same ballpark as other programmers, even if they are still playing their own game.



As an aside, these sorts of comments about the needs of a "scientific programmer" always irk me.

I've been doing scientific software development now for 20 years. I do non-numerical scientific computing, originally structural biology, then bioinformatics, and now chemical informatics, the last dominated by graph theory.

I rarely use NumPy and effectively never the languages you mentioned. Last year on one project I did use a hypergeometric survival function from SciPy, then re-implemented it in Python so I wouldn't have the large dependency for what was a few tens of lines of code.
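For illustration, a pure-Python re-implementation along those lines might look like the sketch below. It uses only `math.comb` from the standard library; the function name and the (M, n, N) parameter convention are borrowed from `scipy.stats.hypergeom` for comparison, not taken from the actual code mentioned above:

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """Survival function P(X > k) for the hypergeometric distribution.

    M: population size, n: number of success states in the population,
    N: number of draws (same convention as scipy.stats.hypergeom).
    """
    total = comb(M, N)
    # Sum the pmf over the upper tail of the support; comb() returns 0
    # for out-of-range terms, so the bounds can stay simple.
    return sum(comb(n, j) * comb(M - n, N - j)
               for j in range(max(k + 1, 0), min(n, N) + 1)) / total
```

A few tens of lines (with edge-case handling and a log-space version for large inputs) really can replace the dependency for a single function like this, at the cost of maintaining it yourself.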

Biopython, as another example, has almost no dependencies on NumPy, and works under PyPy.


It would be awesome if you have a blog post or another resource describing some of your work experience that you could link to. I'd be very interested.


The best I can offer are my previous HN comments on the topic, at https://hn.algolia.com/?query=dalke%20numpy&sort=byPopularit... and http://www.dalkescientific.com/writings/diary/archive/2011/1... .

I don't see what's awesome or all that interesting about it.


> MATLAB, R, FORTRAN, Mathematica, or if you're hip, Julia

You are so correct. I have not used Julia, but what do the first four have in common? As languages, they suck. Each in its own way can be used to do amazing, incredible things. But from a development perspective, they are pure torture to code in.

The joy of writing scientific code in Python is that there is a whole set of users, the majority really, who have nothing to do with science. The language must stand on its own, so Python cannot suck.


> I could write a lot about why that is (and how some of the CS and IT crowd doesn't "get" scientific computing..)

Please do! I'm a PhD student in CS, and I don't think I "get" scientific computing (I'm in compilers myself).


I am a PhD in CS, specifically in Programming Languages and Parallel Programming and I believe I do get scientific computing.

Most people doing it did not have a formal CS education. They are biology, physics, mathematics or chemistry majors who have had one or two courses on programming, taught by other scientific programmers.

There are two main families. One comes from the Fortran background and still writes programs like they did in the 80s, with almost no new tooling. Programs are written over some period, then scheduled on clusters that spend months calculating whatever it is.

The other family of scientific programmers, which I believe is the majority, uses a tool like Matlab, or more recently R, to dynamically inspect and modify data (RStudio is a Matlab/Mathematica-like friendly environment for this task), and relies on libraries written by more proficient programmers to perform some kind of analysis (machine learning, DNA segmentation, plotting or just basic statistics).

Most of these programmers know one or two languages (maybe plus Python and bash for basic scripting). They write programs that are relatively small, and the chances of someone else using that code are low. Deadline pressure is high, and code maintainability is not a priority.

For a non-CS programmer, learning a new programming language is almost impossible: they are used to one way of doing things, and to one set of libraries. They take much more time to adjust to new languages because they do not see a language's logical structure the way anyone who has taken a basic compilers course does.

Given this context, web apps, REST APIs and all the other trending tech in IT are not commonly used in scientific programming, because scientists typically do not need them (when they do, they learn them). Datasets are retrieved and stored as CSV and processed in one of those environments (or even in Julia or Python/pandas).


You're painting an awfully dark picture of scientists' skills. Having been on both sides, I believe the deciding factor is simply the availability of libraries.

If you're doing web development you have an insane number of languages to choose from, because after String, Array, and File are implemented, HTTP is next. Having done a bit of web development, I'd also say a typical project uses a surprisingly small set of libraries.

Scientific computing is quite different: a paper in structural biology (my former stomping grounds) can easily require a few dozen algorithms that each once filled a 10-page paper. These could easily be packaged as libraries, but it's a niche, so it rarely happens. Newer languages quite often don't even have a robust numerics library. Leave the beaten track and your workload just increased by an order of magnitude.

That's also why science, unlike "general purpose" programming, often uses a workflow that connects five or more languages: a Java GUI, Python for network/string/file IO, maybe R for larger computations, all held together by a (typically too long) shell script.

But these workflows are getting better. There's a build tool that formalizes the pipeline somewhat (I forget the name), and APIs are surprisingly common. The reason CSV will never die is that data fetched from APIs is usually more static than in a typical web app (so a local cache is needed), and scientists often work with data that just isn't a good fit for a database. Postgres doesn't offer anything that enriches a 15MB gene sequence.


I worked in academia for a few years about a decade ago, and for the last couple of years I've been interacting with biology research in industry.

The way he painted scientists' skills matches my experience thus far.


Yes, scientists' programming skills (averaged over the population) suck. Factor 1: programming is not credited in itself or reviewed in the publishing process. Factor 2: often little education in, or focus on, programming relative to the wall-clock time spent doing it.

But I don't think that is fixed only by more education and making scientists behave more like programmers. To change things, one also needs far better alternatives than the options available today, so that people are really encouraged to switch. Somehow, these must be written by people who know their CS and can write compilers, yet who engage with why scientific computing is a mess on the tooling side too, rather than dismissing it as laziness.

I started out as a programmer, I have contributed to Cython, and the past two years have been pure web development at a startup. So I know very well why MATLAB sucks. Yet the best tool I have found for doing numerical computing is a cobbled-together mess of Fortran, pure C, C/assembly code generated by Python/Jinja templates, Python/NumPy/Theano...

The scientific Python and Julia communities have been making great progress, but oh how far there is left to go.


I agree; this is also one of the things that drives me away from C and toward saner programming languages.

The majority of programmers in areas where software isn't the core product being sold don't spend one second thinking about code quality.

As such, tooling that is forgiving and allows fast prototyping, but at the same time enforces some kind of guidelines, is probably the way to improve current workflows.


I've spent large parts of my career floating on the edges of academia and have had to interface with code written by academics many times, and: oh jesus, it's almost always a huge mess.


You want to process (typically with simple arithmetic operations) huge arrays of numbers. Imagine that you want to sum two arrays of 800 MB of floating point numbers each, as one step inside the loop of your algorithm.

You can do that natively in C, and the result is very fast. You can do that natively in Fortran. And in Matlab, etc.

You cannot do that at all in Python. Well, you can, but it will be orders of magnitude slower.


> You cannot do that at all in Python. Well, you can, but it will be orders of magnitude slower.

NumPy + Cython would beg to differ.


Of course, but not natively. Try doing that in plain Python using a list of 8000 lists of 8000 lists of 3 numbers (e.g., a mid-resolution microscopy image).
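A scaled-down sketch of that comparison (assuming NumPy is installed; the image is shrunk from 8000x8000x3 so it runs in a moment, but the shape of the code is the same):

```python
import time
import numpy as np

H, W = 400, 400  # scaled down from the 8000x8000 image in the comment
img_a = [[[float(c) for c in range(3)] for _ in range(W)] for _ in range(H)]
img_b = [[[1.0, 1.0, 1.0] for _ in range(W)] for _ in range(H)]

# Plain Python: walk the nested lists element by element in the interpreter.
t0 = time.perf_counter()
summed = [[[a + b for a, b in zip(pa, pb)]
           for pa, pb in zip(ra, rb)]
          for ra, rb in zip(img_a, img_b)]
t_py = time.perf_counter() - t0

# NumPy: one vectorized addition over a contiguous (H, W, 3) array,
# i.e. a single C loop over flat memory.
na, nb = np.array(img_a), np.array(img_b)
t0 = time.perf_counter()
nsummed = na + nb
t_np = time.perf_counter() - t0

print(f"plain Python: {t_py*1000:.1f} ms, NumPy: {t_np*1000:.1f} ms")
```

On a typical machine the vectorized version wins by well over an order of magnitude, and the gap grows with array size because the nested-list version also pays for boxed floats and pointer chasing.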


NumPy is about one order of magnitude slower in very many situations. Cython is in some ways a reimplementation of Fortran in places -- I mean, you can consider it a better alternative, but you have to use features that are not in Python.

Personally I gave up writing numerical code in Cython and instead wrote it in Fortran and merely wrapped it with Cython...

But yes, being 2x-10x slower is something one can live with for productivity, versus the sometimes 1000x slowdown of pure Python.


+1, I'm a PhD student in programming languages/PL design


Julia has tons of general purpose development going on, including an MVC web framework (Genie.jl) and a reactive web app framework (Escher.jl).


First, sorry for being a bit heavy on the hyperbole by saying "0%"; as a statement that is almost guaranteed to be wrong. Also, let me say I don't know Julia very well at all.

I want to clarify that when I say "0% web development going on", I don't mean that scientific programmers don't do web development (they do! a lot!). I meant that people don't generally pick those languages if they are only doing web development, without a numerical/scientific/statistical aspect to it.

What would be interesting is, do you know of any teams or companies using Julia in anger in a non-scientific setting, with programmers from a non-scientific background?

The scientific Python community certainly makes good use of all the Python web tools! Being a "scientific stack" in no way precludes the need for general purpose frameworks that can also be used by others. It's at least as much about people, community and habits as about the tooling...


I think it's just a matter of time before this happens (once it's mature enough or an important app is released).

Julia is designed for general purpose computing from day one, but its community has not been dealing with the same pain points that scientists have.


> do you know of any teams or companies using Julia in anger in a non-scientific setting, with programmers from a non-scientific background?

It's catching on among early adopters in finance. How would you classify that?


Is it? Where are they? And how many of them are there?


All over, but with decent-sized groups in NYC and London. Hard to quantify exactly, but see the sponsor lists of the JuliaCon events for some examples.


Even APL has a couple of web frameworks...

Having a few libraries of each kind (of varying quality and with very small adoption) != tons.


Slight aside -- I am a long-time user of Cython and would love to contribute, but poking around the source code feels daunting -- like there is a huge learning curve to be able to contribute, and that other contributors will only view newcomers as an annoyance who will drain their time with questions.

Given all this, what's the best way for someone to pick up and start contributing to Cython?


It definitely is daunting; in particular because Cython isn't so much a goal in itself as a tool: developers are motivated to work on it because they use it in some other project they care about (e.g., lxml, Sage, ...). So development has been pragmatic.

If you don't already, subscribe to and start following cython-devel. The first step is probably to repeat the question there for more up-to-date information than I can give (I don't even follow the list any longer).

Think about what you want to achieve or change in Cython. A new feature may be easier than a bugfix, though I'm not sure how many "low-hanging features" are left at this point... Either way, make sure you understand what the change would mean for the generated C code. Write a test case that uses Cython the way you would want it to work (eliciting the failure/bug/feature); look at the C code that Cython generates, and make sure you understand why that C code is wrong and how you'd want it to look instead. (Being able to read the generated C code at this level, almost fluently, may take a little getting used to, but it is an absolute requirement for working on Cython -- eventually you look past all the __pyx_ prefixes everywhere.)
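To make that workflow concrete, here is a tiny, entirely hypothetical test case (the file name and function are made up for illustration). Because it is valid both as plain Python and as Cython input, you can run it directly, or compile it with `cython -a example.pyx` and open the generated HTML annotation to study the C output line by line:

```python
# example.pyx -- a hypothetical minimal test case.
# Compile with `cython -a example.pyx` and open example.html to see which
# C code Cython generated for each line, and where it still calls into the
# Python C-API (the highlighted lines in the annotation).

def dot(a, b):
    """Naive dot product. In the generated C, look at the __pyx_ temporaries
    around this loop to see how Cython handles the untyped variables; adding
    cdef types is what would shrink that code down to a plain C loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total
```

The point of such a file isn't the algorithm; it's that it isolates one code path so you can diff the generated C before and after your change to the compiler.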

Then somehow try to beat the Cython codebase into submission so that it generates that C code... As you repeat this process, you'll gradually learn the codebase.

It seems like Robert Bradshaw and Stefan Behnel are still around; they are very capable and friendly people. I learned a lot of what I know about programming from them, and they were very welcoming to me as a new contributor.



