Maintaining your own VM is no big deal. Compared to Facebook's codebase as a whole, HHVM is tiny. A team made of a relatively small number of people (high quality, but low quantity) can and does maintain VMs like HotSpot and V8. LuaJIT is maintained by a single person.
As someone who has worked on a VM, I couldn't disagree more. It can take years to hash out things as simple as optimally performing primitives for a target architecture, and then things change.
Add to that the complexity of optimizing compilers, the specification of bytecode formats, a consistent virtual machine memory model that can be relied upon across architectures, and the art and science of highly concurrent garbage collectors, and your "no big deal" is a load of hogwash.
HotSpot alone is nearly 20 years of big deal.
> A team made of a relatively small number of people (high quality, but low quantity) can and does maintain VMs ...
The number of people doesn't matter in this equation; your small team of (expensive, rare, high quality people) can't build a world-class VM in a day. Or a month. Or a year. Maybe in 5 or 10 years, just ask Microsoft.
> ... like HotSpot and V8. LuaJIT is maintained by a single person.
LuaJIT's "single person" has been working on it for what, 10 years? It's an extremely impressive implementation and I don't want to bag on it, but even so, it lags in certain areas; e.g., its GC implementation isn't up to par with the state of the art.
The author's skillset is extremely rare, and LuaJIT itself is an anomaly in the field. Such a one-off example doesn't really prove that it's ideal for a company to internalize maintaining a VM for its own custom language built on top of PHP.
I am not trying to belittle the effort necessary for a state-of-the-art VM or programming language implementation. I get paid to do this stuff, and I am on my third VM/PL project now. It is also true that these things take time and are not very parallelizable, so while the total man-months may not be that large, you can't make it go faster by throwing more people at it.
On the other hand, I maintain it still is no big deal compared to rewriting Facebook. I also maintain that while the skillset is rare, Facebook apparently has had no trouble so far, and will have no trouble in the future, finding (I remind you, a small number of) people to work on the VM. I also remind you that Facebook has been working on an alternative PHP implementation for 6 years now: 2 years in private (2008-2010) and 4 years in public (2010-2014). It has been profitable for them for 6 years, will be profitable in the future, and that profitability does not need "sharing maintenance load with the wider industry". They can maintain it fine, thank you very much. Because, in the end, a VM is no big deal.
If they're wasting money on bad management decisions, they're wasting stockholder money.
They're also continuing to propagate an outward-facing engineering culture that will make it even harder to hire people to help dig them out of the PHP hole -- perpetuating this further.
Your argument is simply another take on the survivorship bias fallacy.
> I get paid to do these stuffs, and I am on my third VM/PL project now ... Because, in the end, VM is no big deal.
You keep saying that, and yet, there keep being so few high quality VMs.
What do you consider to be a high-quality VM? How many do you expect to see, and how many do you find?
An adaptive JIT and a generational GC would be a good baseline. Limiting myself to open-source VMs, I think (at least) HotSpot, Mono, V8, JavaScriptCore, PyPy, SBCL, and Racket qualify. J9, CLR, Chakra, and Allegro CL also qualify, but are not open source. SpiderMonkey, LuaJIT, and HHVM lack generational GC. All these projects are actively developed, and there are doubtless more; e.g., I am not familiar with the Smalltalk VMs, some of which are commercial. Research VMs like JikesRVM and Maxine qualify. I believe Bartok qualifies too.
I am not sure what you are arguing for. If you are arguing for the Quercus route (PHP-on-JVM), I think it's unclear that the Quercus route is better than the HHVM route. If you are arguing for not running an existing PHP codebase at all, I think you are being unrealistic.
It's not survivorship bias. Facebook is an existence proof that there is no "PHP hole" that they are in; it's largely a myth propagated by programming language nerds who have never tried scaling a site in PHP. When was the last time you heard about a site closing up because of PHP-induced technical debt? You don't. People rewrite sites because of poor architecture, not because of poor programming languages, and PHP (in general) does not prevent you from building a site with good architecture, both from a software-structural standpoint and an operational standpoint.
PHP's APIs are ugly. Its language semantics are a bit hairy until you get the hang of them. But there are parts of PHP that are extremely elegant and easy to reason about. Its OOP support provides all that you need to produce reusable and easily understood code.
Facebook's work on PHP has focused largely on two dimensions: reducing CPU cycles and increasing static/runtime type checking. The former only really matters at massive scale: PHP is generally fast enough, since most of the time PHP processes are I/O bound, reading from a database or memcached. It's only for sites like Facebook, where squeezing an additional 10% TPS out of your boxes yields a large absolute cost reduction, that this level of optimization starts to make sense. On the type-checking side, this is something you might start to want in any dynamically typed language once you have millions of lines of code and want basic guarantees that it will run, and it's something you'd probably see Facebook doing if they were a Ruby or Python shop anyway. It has nothing to do with PHP and everything to do with the classic dynamic-vs-static typing tradeoff.
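For anyone unfamiliar with how gradual typing plays out in practice: the key property is that annotations are optional and can be added function by function, so a huge legacy codebase keeps running while the checked surface grows. Hack does this for PHP; here's a minimal sketch of the same idea using Python's optional annotations (checked by an external tool such as mypy), purely as an illustration, not Facebook's actual code:

```python
# Gradual typing sketch: annotated code can be verified statically by an
# external checker (e.g. mypy), while unannotated code stays dynamic.

def full_name(first: str, last: str) -> str:
    # Fully annotated: a type checker can verify every call site.
    return first + " " + last

def legacy_helper(value):
    # Unannotated legacy code: treated as dynamically typed and left alone,
    # which is what lets a multi-million-line codebase migrate incrementally.
    return value * 2

print(full_name("Grace", "Hopper"))  # passes the checker and runs as usual
print(legacy_helper(21))             # no annotations, still works at runtime
```

The point is that you get stronger guarantees on the paths you've converted without a flag-day rewrite of everything else.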
Should you be writing your chat server in PHP? No. But 90% of the code you write for a large website is HTTP response handling that renders HTML or JSON. PHP excels at this, and you can pretty much hire any developer off the street to start cranking out code if you give them a solid foundation to build on.
Facebook has already proven that they are able to make improvements that have substantially helped them to the point where this team is likely paying for itself many times over. It doesn't need to be perfect - it needs to offer return on investment, and it has.
It's possible that they could eventually get a better return from a total rewrite, but frankly I don't think you have any idea of the scale of trying to convert a multi-million-line production platform from one language to another.
In any case, one does not preclude the other. Arguably, many of the changes they have made, such as gradual typing, and their ability to now slowly introduce other changes without breaking their existing codebase, give them a platform for gradually firming up their codebase and migrating it toward a position where a full rewrite (should they decide to do one in the future) could be made substantially less painful.