
I'm sorry, but pure single core performance ended circa 2007. This is the end of the road.

It is shameful that in 2016 we still don't have, say, parallel rendering in browsers. All hope is for Servo.



That is definitely false - single core performance still matters a lot in many ML applications, where utilizing many (>8) cores is still difficult.


If by ML, you mean machine learning (and not, e.g., Ocaml et al.), I thought those people were actually into GPUs.


GPUs are useful for deep learning and a few other easily parallelizable algos, but the majority of open source ML software is still stuck on the CPU.
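To make the "stuck on the CPU" point concrete: even when the core training algorithm is serial, a lot of ML work parallelizes at the task level (cross-validation folds, hyperparameter grids). A minimal stdlib sketch, where `score()` is a hypothetical stand-in for training and evaluating one configuration:

```python
from multiprocessing import Pool

def score(params):
    """Hypothetical placeholder for fitting/evaluating one model config.
    The inner loop is serial; parallelism comes from running configs at once."""
    c, gamma = params
    loss = sum((c * i - gamma) ** 2 for i in range(1000))
    return loss, params

def grid_search(grid, workers=4):
    # Each hyperparameter combination is an independent task, so a process
    # pool scales roughly with core count even though score() itself is serial.
    with Pool(workers) as pool:
        results = pool.map(score, grid)
    return min(results)  # (best loss, best params); tuples compare by loss first

if __name__ == "__main__":
    grid = [(c, g) for c in (0.1, 1.0) for g in (0.01, 0.1)]
    best_loss, best_params = grid_search(grid)
    print(best_params)
```

This is the easy case, of course; the hard case the parent comments are about is when the expensive part is a single fit that doesn't decompose like this.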


Redis is mostly single thread.


This is not really a great thing about Redis.


It would be a more powerful tool if it were multi-threaded. I'm not an expert, but my impression is that it was able to be more powerful in other ways (stability, features) because of the choice to make it mostly single-threaded (given labor and complexity constraints).

Is there an alternative RAM database that you like better that is multi-threaded?


As much as Redis is an incredibly potent tool and the quality of craftsmanship on it is very high, there are some incredibly peculiar design decisions that have been made.

Single-threading is one of those. There are times when having more than one thread to help process things would come in very handy, though I recognize that the cost of adding this can be very high.

It's something that will have to be addressed eventually for a single Redis process to take advantage of newer hardware with very low ceilings on CPU power, but huge numbers of cores.
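The usual workaround today is to run one single-threaded Redis instance per core and shard keys across them client-side (Redis Cluster does this natively with 16384 CRC16 hash slots). A hedged sketch of just the routing logic; the instance addresses and the CRC32 hash choice are illustrative, not Redis's own protocol:

```python
import zlib

# Hypothetical set of single-threaded Redis instances, one pinned per core.
INSTANCES = ["127.0.0.1:6379", "127.0.0.1:6380",
             "127.0.0.1:6381", "127.0.0.1:6382"]

def instance_for(key: str) -> str:
    """Route a key to one instance by hashing it. All operations on the same
    key always hit the same process, preserving Redis's per-key ordering."""
    return INSTANCES[zlib.crc32(key.encode()) % len(INSTANCES)]
```

The tradeoff is that multi-key operations (MULTI/EXEC, SINTERSTORE, etc.) only work within one shard, which is exactly the kind of complexity the single-threaded design avoids.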


ML on CPUs?


That seems unrealistic. Single core performance still matters for algorithms that can't be efficiently parallelised. Moreover, writing efficient, parallel versions of a lot of algorithms is hard, and often introduces significant overheads of its own that must be outweighed by the better scalability that the parallelisation brings.


> Single core performance still matters for algorithms that can't be efficiently parallelised.

Any real world examples? Especially considering that at this point, sacrificing cores to boost the remaining ones seems to be a really bad deal with current silicon. Core power requirements appear to decrease faster than their actual computational speed does if you go low-power. Even if you lose 40% of performance due to overhead, if the same-TDP CPU package is twice as fast with more cores, you still win. (And who's to say that your implementation can't be improved in the future?)


I honestly don't know how to answer that. Are you suggesting that you know how to parallelise an arbitrary expensive algorithm? Because if you've beaten Amdahl's law, a lot of people would like to make you very, very rich.
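For reference, Amdahl's law bounds the speedup at 1 / ((1 - p) + p/n) when a fraction p of the work parallelizes across n cores. A quick illustration with made-up numbers:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Upper bound on speedup for parallel fraction p over n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelizable, 8 cores give under 6x,
# and infinitely many cores can never exceed 1 / (1 - p) = 20x.
```

This is why the "same-TDP package with twice the cores" trade in the parent comment only pays off when p is high: for p = 0.5, no number of cores ever gets you past 2x.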


Sounds like an ambiguous question. Most algorithms are "arbitrarily expensive". It generally depends on some measure of the data you're putting in. But in case you mean "an arbitrary algorithm", then no, nobody knows how to do that. But it appears that the most useful things people actually want to do lie somewhere in the middle: not trivial to parallelize but also not exactly impossible.


Just to be clear, I meant what I wrote: an arbitrary expensive algorithm, i.e., solving the general case.

As for "most useful things people actually want to do", it seems to me that a lot of relatively computationally expensive software still isn't using lots of cores where they're available in practice today.

One significant example is computer games. Since the advent of GPUs with effectively hundreds or thousands of parallel computations available, rendering hasn't been the bottleneck it once was. Today the bottleneck might instead be the game control logic that runs on a CPU, and is often still either single-core or divided among at most a small, fixed number of cores doing different tasks.

Another common real world example is graphics and image processing software. You'd think there might be a lot of natural data parallelism to exploit, but software in this area has made relatively little use of algorithms that scale to arbitrary numbers of cores so far.

A third example would be real-time processing, say operations on high speed network traffic. In this case you can sometimes dispatch different packets to different cores to process them in parallel, but the amount of processing you can do on any given packet might well be limited by the speed of a single core, because the overheads for cache misses or inter-core communications are prohibitive. If your processing needs to consider more than one packet at once, so you can't just spray packets at different cores as they arrive, then this can become a very significant real world bottleneck.
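The "spray packets at different cores" dispatch described above is usually done by hashing the flow 5-tuple, so every packet of a given connection lands on the same core (the same idea as NIC receive-side scaling). A minimal sketch, with the core count and field names as illustrative assumptions:

```python
import zlib

NUM_CORES = 8  # assumed core count for the sketch

def core_for_flow(src_ip, src_port, dst_ip, dst_port, proto="tcp"):
    """Hash the flow 5-tuple to a core index. Keeping a flow on one core
    avoids the cross-core communication and cache-miss costs described
    above, but per-flow throughput is then capped by a single core."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % NUM_CORES
```

And this sketch makes the limitation in the comment visible: anything that must look across flows (reassembly, global rate limiting) can't be expressed as a per-flow hash and falls back to cross-core coordination.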

This isn't to say that none of these problems will ever be solved as we develop more understanding and better tools, but even in 2016 the state of the art is far from using as many cores as we have available efficiently for a lot of real world use cases.

Manual parallelisation often has architecture-level implications and few development teams have the experience and foresight to get it right consistently with today's programming tools. Automatic optimisation to exploit data parallelism is an interesting research field but still in its infancy, and many mainstream programming languages have far from ideal semantics for such optimisations because of aliasing issues and the like. Either or probably both of these areas will have to advance considerably before we can assume that scaling out into more cores is generally going to give better performance than scaling up with faster CPUs and related hardware architecture.


If someone doesn't value his life, then sure - single core performance ended even in 1988. Every millisecond matters.



