Hacker News

The absolute minimum that average users of the C++ memory ordering system (std::atomic loads and stores, etc. [1]) should know is that the default memory access scheme is 'sequentially consistent ordering', whereas to get the ultimate performance one should use the 'relaxed' scheme, or alternatively 'release/acquire', as described both in the article and in [1].

[1] http://en.cppreference.com/w/cpp/atomic/memory_order



To be honest, the average C++ programmer should not be bothered with any of the concurrency stuff. The chance that you get it wrong and introduce subtle concurrency bugs that will obviously only show up in production environments is simply too great.

What is needed is a set of concurrently usable STL containers (e.g. ones that check for iterator invalidation when items are inserted into a map from another thread), akin to the java.util.concurrent library. That way you separate the tricky concurrency work from the people who probably do not understand it.


I was with you until you said "concurrently usable containers." While they aren't standard, they do exist in C++ (Intel Threading Building Blocks and whatever Microsoft called its clone of Intel TBB), and they aren't as useful as people believe. C#'s concurrent containers have the same limitations, and to the extent I've used them, Java's concurrent containers have the same limitations.

The limitation I'm thinking of is that it's very common to need to do more than one operation atomically, and common concurrent containers are unable to do that. You can atomically add or remove a single item from a concurrent queue, stack or hash map, and sometimes that's enough. But if you want to iterate a whole container, say searching for all items that meet some criteria, you need a lock.


To iterate the whole container, I don't need a lock. I just use a container that offers cheap immutable snapshots, e.g. Scala's TrieMap, and I store only immutable items in it. Building similar structures in C++ is much harder, if not impossible, without an efficient GC.


I have to say, I have been pleasantly surprised time and time again by how easy data-structure-heavy algorithms are to implement with thread-safe atomic std::shared_ptr, the rest of <memory> and <atomic>, judicious use of move semantics, and OpenMP.

It's also a huge relief to have deterministic deallocation back in my life. Praise Jesus for that.

It's not everyone's cup of tea, but if you need the speed, it's there for you, with a solid, clean and expressive implementation. Your classic symbolic regression genetic algorithm with halls-of-fame and whatnot is an absolute cinch, for example, and runs gob-smackingly fast with -O3 -march=native plus a good ten minutes of profiling data for PGO.


Using 'relaxed' (or acquire/release) for performance without some vague understanding of the semantic implications (of relaxed especially, and the memory model more generally) is a recipe for disaster and some confusing bugs.


As a corollary, if you see someone using anything other than the default, check their code carefully.



