"Among these, Pony is the only language that concurrently garbage collects actors."
I must not be understanding that claim correctly. Erlang does garbage collection on a per actor basis (at least that's my understanding). If the claim is that multiple actors can garbage collect simultaneously, then I guess that means Erlang only GCs one actor at a time. If so, why can't Erlang GC multiple actors at a time? It does full SMP for actor execution.
> If so, why can't Erlang GC multiple actors at a time? It does full SMP for actor execution.
It does. The article is incorrect. Erlang has a fully concurrent garbage collector among actors. One actor's GC running on one CPU scheduler will not interfere with execution of actors running in other CPU schedulers.
There's a difference between GC'ing the memory reachable from an actor and GC'ing the actors themselves. Erlang requires a "poison pill" message to kill actors.
Also, looking at the white paper I see really weird numbers.
This section in particular:
"Benchmarks and preliminary comparisons with Erlang, Scala, Akka, and libcppa" (Table 1 & Table 2)
It seems that you have the exact same numbers for Erlang and Scala in table2, this is very hard to believe. You either accidentally put down the same number twice or you measured the performance of your benchmarking tool; otherwise it is extremely unlikely that two entirely different systems give the exact same measurement. Similar story in table1. Maybe I am missing something terribly obvious, but this looks off to me.
>It seems that you have the exact same numbers for Erlang and Scala in table2, this is very hard to believe.
The numbers are taken from another source and it seems they didn't have the exact numbers, hence the ~9s figure which is then used to calculate 333,333. They seem to have gotten the numbers by looking up the graphs on this page:
I am not an expert in benchmarking, so maybe I'm missing something. But how is that not crazy?
If I were looking to compare two things, I would run all the benchmarks on a single machine under my control. I might look at previously published reports to make sure I was getting comparable numbers. But there is no way I would publish a comparison that I merely hoped was apples to apples. I've just had too many benchmarks depend on subtle issues, ones that I had presumed were irrelevant.
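To make the "~9s, hence 333,333" step concrete, here is a small Python sketch of how two systems end up with identical figures once both are read off a graph as "about 9 seconds". The workload size of 3,000,000 is my assumption, inferred only from 333,333 × 9 being roughly 3 million; the 8.7 and 9.3 second timings are made up for illustration and are not from the paper.

```python
# Hypothetical workload size, inferred only from 333,333 * 9 being about 3 million.
ASSUMED_UNITS_OF_WORK = 3_000_000


def rate(units: int, seconds: float) -> int:
    """Units of work per second, truncated to an integer."""
    return int(units / seconds)


# Both systems were eyeballed as "about 9 s", so both collapse to the same figure.
erlang_rate = rate(ASSUMED_UNITS_OF_WORK, 9)
scala_rate = rate(ASSUMED_UNITS_OF_WORK, 9)

# With exact timings (invented values here), the two rates would differ visibly.
exact_a = rate(ASSUMED_UNITS_OF_WORK, 8.7)
exact_b = rate(ASSUMED_UNITS_OF_WORK, 9.3)

print(erlang_rate, scala_rate, exact_a, exact_b)
```

So identical entries in the table say more about the rounding of the input than about the systems being compared.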
There is, but what is relevant about the design of Erlang's concurrent GC is that the latency of your actor operations is not impacted by it. This is why Erlang is so well suited to HTTP routers and request dispatch: you can maintain a tight SLA on p99.99 latency, as opposed to something like the JVM, where the GC locks up all execution, or at least this used to be the case.
The Pony object garbage collector is fully concurrent: the memory reachable from any actor is GC'd totally independently. At the same time, Pony allows (safely, with no data races) sharing pointers across actors for performance (i.e. without copying).
There's a paper on the type system that allows this:
What I am saying is that the Erlang GC is good enough from a practical point of view. I am not sure what value you are trying to add with the "fully concurrent" GC.
It's not the same thing. The post is talking about automatically figuring out that nobody knows the Pid of a process and hence that process can be reclaimed. This is a transitive notion: if a group of processes can't be "reached" from another group, and the latter group is the "important" one, then you can just kill off the first group of processes. This is why it is GC-like behaviour. Like in a GC, processes can "leak" if you forget to throw the Pid away.
Erlang's method is to form linked webs of processes and then the death of a process "poisons the web" and kills off all processes in the web. By trapping exits, you can put in stopgaps for this behaviour, which is what supervisors do, among other things.
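A toy Python model of that "poisoned web" behaviour, in case it helps. This is only a sketch of the idea, not how the BEAM implements it: `Proc`, `link`, and `crash` are names I made up, and it only models abnormal exits (in Erlang a `normal` exit does not take linked processes down).

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    trap_exit: bool = False                     # supervisors typically trap exits
    alive: bool = True
    links: set = field(default_factory=set)     # names of linked processes
    inbox: list = field(default_factory=list)   # exit notices seen while trapping


def link(procs, a, b):
    """Links are bidirectional."""
    procs[a].links.add(b)
    procs[b].links.add(a)


def crash(procs, name, reason="crash"):
    """Kill `name` abnormally and propagate the exit signal along links."""
    pending = [(name, None)]
    while pending:
        victim, sender = pending.pop()
        proc = procs[victim]
        if not proc.alive:
            continue
        if sender is not None and proc.trap_exit:
            # A trapping process turns the signal into a message and survives.
            proc.inbox.append(("EXIT", sender, reason))
            continue
        proc.alive = False
        for other in proc.links:
            pending.append((other, victim))


procs = {n: Proc(n) for n in ("a", "b", "c")}
procs["sup"] = Proc("sup", trap_exit=True)
link(procs, "sup", "a")
link(procs, "a", "b")
link(procs, "b", "c")

crash(procs, "a")
print({n: p.alive for n, p in procs.items()}, procs["sup"].inbox)
```

Here `a`, `b`, and `c` all die, while `sup` stays alive and just receives an exit notice, which is the stopgap role described above.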
Process handling in Erlang is more akin to "manual memory management" or ARC/RAII style memory management here.
> The post is talking about automatically figuring out that nobody knows the Pid of a process and hence that process can be reclaimed.
This seems pretty impossible in distributed Erlang. Perhaps the Pid was sent to another node (which may be alive but not currently dist connected), or was serialized and may be deserialized later.
It's like jlouis said: with Erlang you have to kill your processes off when you've finished with them, and if you don't, they leak. In Pony that's done for you automatically.
Thanks. I think you end up killing your Erlang processes most of the time because this is the model you follow when programming in Erlang. Using HTTP as an example: while in other systems it is a really bad idea to have a 1 req -> 1 process (or thread) mapping, in Erlang it is encouraged. When the request is answered and the response is sent back, the process dies. I think it is a fairly simple model. I guess I need to look into Pony more to understand the importance of this in it.
Can you explain some more or point to the exact part of the research paper? When you say "GC'ing actors themselves vs GC'ing memory reachable from actor", what exactly do you mean? Are you talking about the process dictionary and mailbox vs the state of the actor passed through its loop function?
For argument's sake, here is what a simple Erlang actor looks like:
I guess the claim was that such an Erlang actor would run forever when nobody sends it a message to stop it. And with Pony it would be detected that the actor is no longer reachable and it would be automatically stopped.
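For anyone who wants to see the shape of the problem without Erlang at hand, here is a rough Python analogue (not the Erlang snippet referred to above). It assumes a thread with a `queue.Queue` as the mailbox, and `actor_loop` is just a name I picked. The loop only ends when it is explicitly told to stop.

```python
import queue
import threading


def actor_loop(mailbox: queue.Queue, log: list) -> None:
    """Receive messages forever; only an explicit "stop" ends the loop."""
    while True:
        msg = mailbox.get()      # blocks until a message arrives
        if msg == "stop":
            return
        log.append(msg)


# An actor that is shut down properly with a "poison pill".
mailbox, log = queue.Queue(), []
worker = threading.Thread(target=actor_loop, args=(mailbox, log), daemon=True)
worker.start()
mailbox.put("hello")
mailbox.put("stop")
worker.join(timeout=2)

# An actor nobody ever stops: it sits blocked in get() for as long as the
# program runs (daemon=True only lets the interpreter exit anyway).
leaked = threading.Thread(target=actor_loop, args=(queue.Queue(), []), daemon=True)
leaked.start()

print(worker.is_alive(), leaked.is_alive())
```

The second thread is the "leak": nothing can ever send to its mailbox again, yet nothing cleans it up either.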
Erlang can "leak" processes if the process doesn't exit once it's no longer needed. Pony (I assume) will clean up a process automatically once there are no more references to it.
Exactly: it will clean an actor, once there are no messages to it, and it has no work to do, and there are no references to it (or any references come from such actors which have no work themselves -- i.e. are in a cycle).
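A simplified Python sketch of that condition, assuming we could take a global snapshot of every actor. The real runtime has to work without stopping the world, which this ignores entirely; `collectable` and the `blocked`/`refs` fields are hypothetical names. An actor counts as "blocked" here when its queue is empty and it has no work to do.

```python
def collectable(actors):
    """actors maps name -> {"blocked": bool, "refs": set of names it references}.

    Returns the actors that no busy (unblocked) actor can reach,
    directly or transitively.
    """
    live = set()
    stack = [name for name, a in actors.items() if not a["blocked"]]
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        # Anything a live actor references might still be sent a message.
        stack.extend(actors[name]["refs"])
    return set(actors) - live


actors = {
    "main":   {"blocked": False, "refs": {"logger"}},
    "logger": {"blocked": True,  "refs": set()},      # idle but still referenced
    "a":      {"blocked": True,  "refs": {"b"}},      # a and b only reference
    "b":      {"blocked": True,  "refs": {"a"}},      # each other: a dead cycle
}

garbage = collectable(actors)
print(sorted(garbage))
```

`logger` survives because a busy actor still holds a reference to it, while `a` and `b` form a cycle of idle actors that nothing else can reach.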
I'm wondering about something. In big systems you'll see actors on different platforms communicating with each other. For example, a JavaScript actor would communicate with a server actor. How would garbage collection work in that case? Would cycles be detected across machine boundaries?
Perhaps you're thinking of some different concept of actor?
I don't recall JavaScript supporting, or having any library that would enable, the actor model of concurrency.
The general idea of the actor model is that each actor is a process with a queue (mailbox), and actors talk to each other via messages. JavaScript does not have such a thing, and neither does a "server actor", does it?