ARC (if ‘A’ means atomic) is simply not a good fit for modern hardware. With tra...

magicalhippo · on Jan 12, 2023

> ARC (if ‘A’ means atomic)

A is for Automatic, as in the compiler adds the calls needed to manage the reference counting.

If the language and compiler can guarantee a reference is strictly contained within a thread, it doesn't have to be atomic. Usually that is not the case, so atomic operations are used.

rwmj · on Jan 12, 2023

You need more than that. Reference counting amplifies reads into writes. To ameliorate that (somewhat) you need to keep the references separate from the data (a particular problem with Rust). Otherwise you end up with lots of writes scattered all over memory, causing false sharing and lots of unnecessary communication over buses between your cores. One way to do this is to keep the references together in separate cache lines or pages, away from the data. And this only partly alleviates the problem.

zarzavat · on Jan 12, 2023

While this is true on a microscopic scale, the motivation for using RC is macroscopic: to release memory ASAP to get predictable, low memory usage. From the RC point of view the real enemy is swap and the memory management system's primary objective is to eliminate swap.

rwmj · on Jan 12, 2023

To summarise:

- your application is using close to 100% available RAM

- you don't care about the impact of all those extra writes clogging up CPU caches and bouncing cache lines back and forth across internal buses

- you can't afford to pay for a small amount of extra RAM

In that case reference counting sounds ideal. Good luck!

zarzavat · on Jan 13, 2023

This was exactly Apple's reasoning for using refcounting in Swift. Although, as I understand it they worked around problem (2) by building their own silicon.

If you're on the server then just buy more RAM and use a GC, I agree.

But if you're making consumer software then buying more RAM isn't an option. Additionally, RAM costs battery life even when it isn't in use.

JonChesterfield · on Jan 13, 2023

Prompt is even better than predictable. In particular, reliably knowing the refcount on immutable data (in the persistent structure sense) has reached one means you can mutate it undetectably.

zarzavat · on Jan 12, 2023

FWIW there is a clash of terminology here.

- ARC (upper case): Automatic Reference Counting (in Swift/ObjC)

- Arc (title case): Atomic reference counting (in Rust)

I assume OP intended the former.

celeritascelery · on Jan 13, 2023

I don't think there is really a distinction here. Swift uses atomic reference counting. Granted the details are different between approaches (swift tries to elide references counts when it can), but there are not two different "arcs" being discussed here, they refer to the same concept.

tsimionescu · on Jan 13, 2023

Atomic references counting is a necessary technology when you want to do reference counting for an object shared by multiple threads. This applies both when doing manual reference counting (Rust's Arc, C++'s std::shared_ptr) or when doing automatic reference counting (objects in Swift, Objective C, Python).

Choosing to use automatic reference counting in a language is a design choice, with positives and negatives. Atomic reference counting is an implementation detail, a strict necessity whenever you can't be sure an object will only be referenced by one thread (in essence, non-atomic reference counting is just an optimization).

magicalhippo · on Jan 12, 2023

Ah good to know. Never used Rust so haven't come across Arc, only ARC in various other languages.

CyberDildonics · on Jan 12, 2023

you pay a price for ref counting on every usage

What do you mean by 'usage'? References don't change when reading or writing the data being reference counted, they change when being passed to a function or returned from one. In C++ it's rare that you would even use reference counting, since that essentially means you don't know the lifetime of your value. Most variables are going to be on the stack and the vast majority of dynamic allocation is going to be referenced from a single scope at one time.

The reality is that it takes gross incompetence to have a speed impact from reference counting.

kaba0 · on Jan 12, 2023

Yeah I meant usage as “getting it into/out of scope”.

> In C++ it's rare that you would even use reference counting, since that essentially means you don't know the lifetime of your value

Sure, because it is a manual memory managed language with RC being an escape hatch only. But there are plenty of problems/programs where you simply can’t know the lifetime of your objects, e.g. Chrome uses a proper GC for C++ as well.

CyberDildonics · on Jan 12, 2023

with RC being an escape hatch only

I don't know what you mean by escape hatch, but it generally just isn't necessary and memory is managed automatically by scope. The bigger point here is that it just isn't a significant part of execution time.

But there are plenty of problems/programs where you simply can’t know the lifetime of your objects

Like what? I can only think of one, which is passing memory allocated in one thread to another thread.

Chrome uses a proper GC for C++ as well

This is an anecdote, it doesn't prove or disprove anything in the bigger picture.

kaba0 · on Jan 12, 2023

> managed automatically by scope

That’s just an implementation detail. The point is that the object’s lifetime is only known at runtime and will be reclaimed when a counter reaches zero, this is reference counting. Whether you have to manually inc/dec that counter, or the language does it for you through some abstraction is besides the point, it is automatic memory management either way, as it.. manages memory automatically.

> Like what? I can only think of one, which is passing memory allocated in one thread to another thread

Any programming language, both parsing into an AST, AST manipulations, interpretation (and that is a very wide category, not only for things you would think of as proper languages). But even some games may want to use GC for some in-game objects, as the lifetime of those is fundamentally dependent on user action.

Would the litany of managed languages and their widespread usage be less anecdotal?

CyberDildonics · on Jan 12, 2023

I'm not sure what points you are trying to make now, but originally you were talking about reference counting being slow and I was saying in a language like C++ it has no impact on performance because it is only necessary when giving memory allocated in one thread to another thread.

both parsing into an AST, AST manipulations, interpretation even some games may want to use GC for some in-game objects, as the lifetime of those is fundamentally dependent on user action

Here you are conflating the lifetime of resources inside the various scopes of a program with dynamic resources in a game. These are not the same thing. Language level reference counting will not save you or help you to know when to unload a level or a texture. Just because there is control over resources doesn't mean reference counting. Likewise even in something like java you need to set links to heap allocated objects to null so that they can be garbage collected. The language doesn't magically know when you need to unload a level.

kaba0 · on Jan 13, 2023

Because reference counting is slow. Sure, if you use it for 2 objects that are always in scope you won’t see its effect, but I would argue about “using RC” at all with such a small number.

> only necessary when giving memory allocated in one thread to another thread

RC count can be larger than one even when only a single thread using it. But I’m not familiar with this usage and it is not really RC for memory management anymore, more like a lock-less data structure.

I wasn’t talking about texture/level loading/unloading because it is more complex, but things like using scripting languages for part of the game logic.

CyberDildonics · on Jan 13, 2023

Because reference counting is slow.

I think you're just repeating yourself, but I'm not sure what question you're answering.

RC count can be larger than one even when only a single thread using it. But I’m not familiar with this usage and it is not really RC for memory management anymore, more like a lock-less data structure.

I don't understand what you are saying here.

things like using scripting languages for part of the game logic.

Scripting languages are slow for a lot of reasons, like pointer chasing and excessive memory allocations. Reference counting is a very small piece of that puzzle.

pebal · on Jan 14, 2023

Reference counting has limitations. It does not release cyclic structures, not suitable for lock-free algorithms and may overflow the stack.

CyberDildonics · on Jan 14, 2023

It does not release cyclic structures

I don't think anyone is debating that.

not suitable for lock-free algorithms and may overflow the stack.

This you will have to explain. I see people make vague assertions like this but I never see a good explanation.

pebal · on Jan 14, 2023

Can't atomically assign a pointer and increment a counter, need to use a lock. If make a long list using shared_ptr will overflow the stack when the head destructor executes.

CyberDildonics · on Jan 14, 2023

Can't atomically assign a pointer and increment a counter

You can do that in multiple ways.

First you can use the extra bits of a pointer for a counter to fit it all into 64 bits.

Second, you can use a 128 bit compare and swap which has been supported by CPUs for about 20 years now.

Third, you can not use pointers and use indices of whatever bit resolution you want, using the extra bits for a counter.

Finally, how does a garbage collector change this ?

If make a long list using shared_ptr will overflow the stack when the head destructor executes.

If we set aside for a second the insanity in making a linked list where every pointer destruction calls the next pointer in the list's destructor, how is this unique to a shared_ptr ?

pebal · on Jan 14, 2023

Did you know that the counter is stored in a different place than the pointer? All current atomic<shared_ptr> implementations use a lock. Stack overflow is not unique to a shared_ptr, but GC pointers don't have this problem. The reference counting has advantages, but it cannot fully replace GC pointers.

CyberDildonics · on Jan 14, 2023

I think you're confusing shared_ptr with reference counting as a technique in general. Can you answer the questions I have above without talking about shared_ptr?

Stack overflow is not unique to a shared_ptr, but GC pointers don't have this problem

No one should ever have this problem. It is a ridiculous way to make a linked list in the first place.

pebal · on Jan 14, 2023

I see you just don't want to see the problem. Look at this document: https://www.open-std.org/jtc1/sc22/WG21/docs/papers/2014/n41...

CyberDildonics · on Jan 14, 2023

You're still trying to compare one basic smart pointer implementation to garbage collection in general.

You told me something was impossible to do without garbage collection and I explained three different ways that I've already done it, then you just keep trying to talk about something that was never up for discussion in the first place. You hallucinated shared_ptr into the conversation from nowhere.

Without talking about smart pointers, what am I missing from the list above? Why do some lock free algorithms need garbage collection?

pebal · on Jan 15, 2023

I am writing about shared_ptr because C++ is one of a top language. Read about the ABA problem.

CyberDildonics · on Jan 15, 2023

I'm talking about C++ too, but I write lock free algorithms and data structures all the time and they have nothing to do with with smart pointers in any way.

Why don't you answer my questions above? They confront the ABA problem directly since I explained three ways to keep counts paired with pointers or indices. What can't be done without a garbage collector? Why do you keep giving vague recommendations to read about general topics? Give me a specific deeply technical answer if you can.

pebal · on Jan 15, 2023

Show an implementation of a concurrent lock-free stack without delayed freeing.

CyberDildonics · on Jan 15, 2023

I think you're mixing up allocation with a data structure. There are lots of implementations of lock free lists and stacks in C++ out there. One simple implementation is an array where every index holds the next index. A current variable holds the next index to deal with and a version number.

When you want to allocate an index you check the current index and version, and replace it with the index points to if the version is the same. Freeing is the reverse since you have an index to give the list.

These indices are used to coordinate to a second array where you can store whatever data you want.

Here are some other techniques.

https://people.csail.mit.edu/shanir/publications/Lock_Free.p... https://www.boost.org/doc/libs/1_55_0/boost/lockfree/stack.h... https://lumian2015.github.io/lockFreeProgramming/lock-free-s...

Still, I'm not sure what garbage collection changes about these techniques. Lock free lists have been studied for a long time, they have nothing to do with memory allocation.

pebal · on Jan 15, 2023

Described algorithms ignore the ABA problem. The boost implementation avoids the ABA problem, but does not free up memory at all and stores pointers on 48 bits which is not enough on new architectures.

CyberDildonics · on Jan 16, 2023

Once again, you haven't mentioned at all how garbage collection changes anything, even though that was what you originally said and never backed up with anything.

Described algorithms ignore the ABA problem.

I literally wrote a method for doing that, an index with a version that can be checked to make sure nothing changed.

Also if you're going to say that heavily tested implementations ignore the ABA problem you need to explain why you think that or why you think they won't work and again, why garbage collection changes anything.

stores pointers on 48 bits which is not enough on new architectures.

48 bits is the size of the memory controller on modern CPUs and exceeding that would need over 281 terabytes of memory.

Again, the original question is what lock free algorithm can be done with garbage collection that can't be done without it?

pebal · on Jan 16, 2023

You need to implement some form of GC when you want to free memory in a lock-free container. Read the thread and the employee's statements from Intel: https://community.intel.com/t5/Intel-oneAPI-Threading-Buildi...

New processors use 56 bits of virtual address. It doesn't matter if you have that much memory, because addresses are virtual. Also, newer versions of Android do not allow the use of unused address bits.

tsimionescu · on Jan 13, 2023

> Like what? I can only think of one, which is passing memory allocated in one thread to another thread.

How would you implement a graph that requires heap allocation for each node and supports adding/removing nodes and links with purely scope-based memory management (no shared_ptr or equivalent, no tracing GC), while still reclaiming memory as soon as possible (so no arena-based solutions)?

There are also single-threaded concurrent scenarios where object ownership can be ambiguous, but those are similar to the multi-threaded scenarios you discuss.

CyberDildonics · on Jan 13, 2023

In the case of nodes in a graph, you have to keep track of where the connections go anyway, so you can count the list of connections.