ARC (if ‘A’ means atomic) is simply not a good fit for modern hardware. With traditional tracing GCs you get blazing-fast allocation and amortized deallocation, while with reference counting you pay a price on every use (or, with a good compiler, on almost every use).
A is for Automatic, as in the compiler adds the calls needed to manage the reference counting.
If the language and compiler can guarantee a reference is strictly contained within a thread, it doesn't have to be atomic. Usually that is not the case, so atomic operations are used.
You need more than that. Reference counting amplifies reads into writes. To ameliorate that (somewhat) you need to keep the references separate from the data (a particular problem with Rust). Otherwise you end up with lots of writes scattered all over memory, causing false sharing and lots of unnecessary communication over buses between your cores. One way to do this is to keep the references together in separate cache lines or pages, away from the data. And this only partly alleviates the problem.
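As a rough sketch of that layout idea (hypothetical types, not from any particular library): keep payloads and their refcounts in parallel, separately aligned arrays, so the write-heavy counters never share a cache line with the read-mostly data.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slab: refcounts live in their own cache lines, away from
// the payloads, so count updates don't invalidate the lines that readers
// of the data are pulling through their caches.
struct Slab {
    static constexpr std::size_t kSlots = 64;
    alignas(64) int payload[kSlots] = {};                       // read-mostly
    alignas(64) std::atomic<std::uint32_t> count[kSlots] = {};  // write-heavy

    void acquire(std::size_t i) {
        count[i].fetch_add(1, std::memory_order_relaxed);
    }
    // Returns true when the last reference to slot i was dropped.
    bool release(std::size_t i) {
        return count[i].fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```

This only separates counts from data; counters for different slots still share lines with each other, so a heavily contended design might additionally pad each counter out to its own line.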
While this is true on a microscopic scale, the motivation for using RC is macroscopic: to release memory ASAP to get predictable, low memory usage. From the RC point of view the real enemy is swap and the memory management system's primary objective is to eliminate swap.
This was exactly Apple's reasoning for using refcounting in Swift. Although, as I understand it, they worked around problem (2) by building their own silicon.
If you're on the server then just buy more RAM and use a GC, I agree.
But if you're making consumer software then buying more RAM isn't an option. Additionally, RAM costs battery life even when it isn't in use.
Prompt is even better than predictable. In particular, reliably knowing the refcount on immutable data (in the persistent structure sense) has reached one means you can mutate it undetectably.
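That's the trick behind copy-on-write containers, for example. A hypothetical C++ sketch using shared_ptr's use_count (only reliable when no other thread can be taking a new reference at the same moment):

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Mutate the vector behind `v` in place when we are the sole owner;
// otherwise clone it first so sharers never observe the change.
void push_unique(std::shared_ptr<std::vector<int>>& v, int x) {
    if (v.use_count() != 1)                          // shared: clone first
        v = std::make_shared<std::vector<int>>(*v);
    v->push_back(x);                                 // sole owner: in place
}
```

While the count is 1, repeated calls keep reusing the same allocation; the copy only happens at the moment a second owner actually exists.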
I don't think there is really a distinction here. Swift uses atomic reference counting. Granted, the details differ between approaches (Swift tries to elide reference counts when it can), but there are not two different "ARCs" being discussed here; both refer to the same concept.
Atomic reference counting is a necessary technique when you want to do reference counting for an object shared by multiple threads. This applies both when doing manual reference counting (Rust's Arc, C++'s std::shared_ptr) and when doing automatic reference counting (objects in Swift, Objective-C, Python).
Choosing to use automatic reference counting in a language is a design choice, with positives and negatives. Atomic reference counting is an implementation detail, a strict necessity whenever you can't be sure an object will only be referenced by one thread (in essence, non-atomic reference counting is just an optimization).
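To illustrate that last point, here's a toy counted box (my own sketch, not any library's API) where atomic vs. plain counting is literally a pluggable detail; the ownership logic is identical either way:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct PlainCount {                      // Rc-style: single thread only
    std::uint32_t n = 1;
    void inc() { ++n; }
    bool dec() { return --n == 0; }      // true when the last ref drops
};

struct AtomicCount {                     // Arc/shared_ptr-style: thread-safe
    std::atomic<std::uint32_t> n{1};
    void inc() { n.fetch_add(1, std::memory_order_relaxed); }
    bool dec() { return n.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// A minimal ref-counted handle parameterized on the counter type.
template <class T, class Count>
struct Box {
    struct Block { Count count; T value; };
    Block* b;

    explicit Box(T v) : b(new Block{{}, v}) {}
    Box(const Box& o) : b(o.b) { b->count.inc(); }
    Box& operator=(const Box&) = delete;  // kept minimal for the sketch
    ~Box() { if (b->count.dec()) delete b; }
    T& operator*() { return b->value; }
};
```

Swapping `AtomicCount` for `PlainCount` changes nothing about the design visible to the user, which is why non-atomic counting is best thought of as an optimization the compiler or programmer applies when single-threaded use can be proven.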
What do you mean by 'usage'? Reference counts don't change when reading or writing the data being reference counted; they change when a reference is passed to a function or returned from one. In C++ it's rare that you would even use reference counting, since that essentially means you don't know the lifetime of your value. Most variables are going to be on the stack, and the vast majority of dynamic allocations are going to be referenced from a single scope at a time.
The reality is that it takes gross incompetence to have a speed impact from reference counting.
Yeah I meant usage as “getting it into/out of scope”.
> In C++ it's rare that you would even use reference counting, since that essentially means you don't know the lifetime of your value
Sure, because it is a manually memory-managed language with RC as an escape hatch only. But there are plenty of problems/programs where you simply can't know the lifetime of your objects, e.g. Chrome uses a proper GC for C++ as well.
I don't know what you mean by escape hatch, but it generally just isn't necessary and memory is managed automatically by scope. The bigger point here is that it just isn't a significant part of execution time.
But there are plenty of problems/programs where you simply can’t know the lifetime of your objects
Like what? I can only think of one, which is passing memory allocated in one thread to another thread.
Chrome uses a proper GC for C++ as well
This is an anecdote, it doesn't prove or disprove anything in the bigger picture.
That’s just an implementation detail. The point is that the object’s lifetime is only known at runtime and it will be reclaimed when a counter reaches zero; this is reference counting. Whether you have to manually inc/dec that counter, or the language does it for you through some abstraction, is beside the point. It is automatic memory management either way, as it, well, manages memory automatically.
> Like what? I can only think of one, which is passing memory allocated in one thread to another thread
Implementing any programming language: parsing into an AST, AST manipulations, interpretation (and that is a very wide category, not only things you would think of as proper languages). But even some games may want to use GC for some in-game objects, as the lifetime of those is fundamentally dependent on user action.
Would the litany of managed languages and their widespread usage be less anecdotal?
I'm not sure what point you are trying to make now, but originally you were talking about reference counting being slow, and I was saying that in a language like C++ it has no impact on performance because it is only necessary when handing memory allocated in one thread to another thread.
both parsing into an AST, AST manipulations, interpretation
even some games may want to use GC for some in-game objects, as the lifetime of those is fundamentally dependent on user action
Here you are conflating the lifetime of resources inside the various scopes of a program with dynamic resources in a game. These are not the same thing. Language-level reference counting will not save you or help you know when to unload a level or a texture. Just because there is control over resources doesn't mean reference counting. Likewise, even in something like Java you need to set links to heap-allocated objects to null so that they can be garbage collected. The language doesn't magically know when you need to unload a level.
Because reference counting is slow. Sure, if you use it for 2 objects that are always in scope you won’t see its effect, but I would question calling that “using RC” at all with such a small number of objects.
> only necessary when giving memory allocated in one thread to another thread
The RC count can be larger than one even when only a single thread is using the object. But I’m not familiar with this usage, and it is not really RC for memory management anymore, more like a lock-less data structure.
I wasn’t talking about texture/level loading/unloading because it is more complex, but things like using scripting languages for part of the game logic.
I think you're just repeating yourself, but I'm not sure what question you're answering.
The RC count can be larger than one even when only a single thread is using the object. But I’m not familiar with this usage, and it is not really RC for memory management anymore, more like a lock-less data structure.
I don't understand what you are saying here.
things like using scripting languages for part of the game logic.
Scripting languages are slow for a lot of reasons, like pointer chasing and excessive memory allocations. Reference counting is a very small piece of that puzzle.
You can't atomically assign a pointer and increment a counter; you need to use a lock. And if you make a long list using shared_ptr, it will overflow the stack when the head's destructor executes.
You can't atomically assign a pointer and increment a counter
You can do that in multiple ways.
First, you can use the extra bits of a pointer for a counter to fit it all into 64 bits.
Second, you can use a 128-bit compare-and-swap, which CPUs have supported for about 20 years now.
Third, you can avoid pointers entirely and use indices of whatever bit width you want, using the extra bits for a counter.
Finally, how does a garbage collector change any of this?
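A sketch of the first option (helper names are mine): with 16-byte-aligned allocations the low 4 pointer bits are free, so a small counter can ride along in the same 64-bit word and get compared-and-swapped together with the pointer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kTagMask = 0xF;  // low bits freed by 16-byte alignment

std::uint64_t pack(void* p, unsigned tag) {
    return reinterpret_cast<std::uint64_t>(p) | (tag & kTagMask);
}
void* ptr_of(std::uint64_t w)    { return reinterpret_cast<void*>(w & ~kTagMask); }
unsigned tag_of(std::uint64_t w) { return unsigned(w & kTagMask); }

// Swap the pointer and bump the tag in one atomic operation, so a stale
// observer's CAS fails even if the same address gets reused (ABA).
bool replace(std::atomic<std::uint64_t>& slot, std::uint64_t expected, void* p) {
    return slot.compare_exchange_strong(expected, pack(p, tag_of(expected) + 1));
}
```

Four tag bits wrap quickly, so real designs often combine this with unused high bits, or fall back to the index-plus-version scheme for a wider counter.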
If you make a long list using shared_ptr, it will overflow the stack when the head's destructor executes.
If we set aside for a second the insanity of making a linked list where every pointer's destruction calls the destructor of the next pointer in the list, how is this unique to shared_ptr?
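For reference, here's the recursion in question and the standard fix, which applies to any owning-pointer list, refcounted or not (a sketch):

```cpp
#include <cassert>
#include <memory>

struct Node {
    int value;
    std::shared_ptr<Node> next;  // ~Node drops next, which runs ~Node, ...
};

// Iterative teardown: detach each node before it dies so its destructor
// never sees a populated `next` to recurse into.
void clear(std::shared_ptr<Node>& head) {
    while (head) {
        std::shared_ptr<Node> tail = std::move(head->next);
        head = std::move(tail);  // old head's refcount hits zero here
    }
}
```

A unique_ptr-based list has exactly the same hazard, which is why this is a property of recursive ownership, not of reference counting.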
Did you know that the counter is stored in a different place than the pointer? All current atomic<shared_ptr> implementations use a lock. Stack overflow is not unique to shared_ptr, but GC pointers don't have this problem. Reference counting has advantages, but it cannot fully replace GC pointers.
I think you're confusing shared_ptr with reference counting as a technique in general. Can you answer the questions I have above without talking about shared_ptr?
Stack overflow is not unique to a shared_ptr, but GC pointers don't have this problem
No one should ever have this problem. It is a ridiculous way to make a linked list in the first place.
You're still trying to compare one basic smart pointer implementation to garbage collection in general.
You told me something was impossible to do without garbage collection and I explained three different ways that I've already done it, then you just keep trying to talk about something that was never up for discussion in the first place. You hallucinated shared_ptr into the conversation from nowhere.
Without talking about smart pointers, what am I missing from the list above? Why do some lock free algorithms need garbage collection?
I'm talking about C++ too, but I write lock-free algorithms and data structures all the time and they have nothing to do with smart pointers in any way.
Why don't you answer my questions above? They confront the ABA problem directly since I explained three ways to keep counts paired with pointers or indices. What can't be done without a garbage collector? Why do you keep giving vague recommendations to read about general topics? Give me a specific deeply technical answer if you can.
I think you're mixing up allocation with a data structure. There are lots of implementations of lock free lists and stacks in C++ out there. One simple implementation is an array where every index holds the next index. A current variable holds the next index to deal with and a version number.
When you want to allocate an index, you check the current index and version, and replace them with the index it points to (bumping the version) if the version hasn't changed. Freeing is the reverse, since you now have an index to give back to the list.
These indices are used to coordinate to a second array where you can store whatever data you want.
Still, I'm not sure what garbage collection changes about these techniques. Lock free lists have been studied for a long time, they have nothing to do with memory allocation.
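Here is a minimal sketch of that scheme (my own toy code, not a hardened implementation): slot indices in an array, with the head packing {index, version} into one 64-bit word so a single CAS swaps both and the version bump defeats ABA. Note the read of next[] in alloc can race a concurrent free; the failing CAS makes it harmless in practice, but a strictly race-free version needs more care.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

template <std::uint32_t N>
struct FreeList {
    static constexpr std::uint32_t kNil = N;  // out-of-range sentinel
    std::uint32_t next[N];                    // next[i] = following free slot
    std::atomic<std::uint64_t> head;          // (version << 32) | index

    static std::uint64_t pack(std::uint32_t idx, std::uint32_t ver) {
        return (std::uint64_t(ver) << 32) | idx;
    }

    FreeList() : head(pack(0, 0)) {
        for (std::uint32_t i = 0; i < N; ++i) next[i] = i + 1;  // last -> kNil
    }

    // Pop a free slot, or kNil if none. The version bump means a CAS from
    // a stale snapshot fails even if the same index comes around again.
    std::uint32_t alloc() {
        std::uint64_t h = head.load();
        while (std::uint32_t(h) != kNil) {
            std::uint64_t fresh = pack(next[std::uint32_t(h)],
                                       std::uint32_t(h >> 32) + 1);
            if (head.compare_exchange_weak(h, fresh))
                return std::uint32_t(h);
        }
        return kNil;
    }

    void free(std::uint32_t idx) {
        std::uint64_t h = head.load();
        do {
            next[idx] = std::uint32_t(h);  // link ourselves in front
        } while (!head.compare_exchange_weak(
            h, pack(idx, std::uint32_t(h >> 32) + 1)));
    }
};
```

The indices handed out by alloc then address a parallel payload array, as described above.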
The described algorithms ignore the ABA problem. The Boost implementation avoids the ABA problem, but it does not free memory at all and stores pointers in 48 bits, which is not enough on newer architectures.
Once again, you haven't mentioned at all how garbage collection changes anything, even though that was what you originally said and never backed up with anything.
The described algorithms ignore the ABA problem.
I literally wrote a method for doing that, an index with a version that can be checked to make sure nothing changed.
Also if you're going to say that heavily tested implementations ignore the ABA problem you need to explain why you think that or why you think they won't work and again, why garbage collection changes anything.
stores pointers in 48 bits, which is not enough on newer architectures
48 bits is the virtual address width supported by most current CPUs, and exceeding it would require over 281 terabytes of address space.
Again, the original question is what lock free algorithm can be done with garbage collection that can't be done without it?
Newer processors support 57-bit virtual addresses (e.g. x86-64 with 5-level paging). It doesn't matter whether you actually have that much memory, because the addresses are virtual. Also, newer versions of Android do not allow the use of unused address bits.
> Like what? I can only think of one, which is passing memory allocated in one thread to another thread.
How would you implement a graph that requires heap allocation for each node and supports adding/removing nodes and links with purely scope-based memory management (no shared_ptr or equivalent, no tracing GC), while still reclaiming memory as soon as possible (so no arena-based solutions)?
There are also single-threaded concurrent scenarios where object ownership can be ambiguous, but those are similar to the multi-threaded scenarios you discuss.