Note that Chrome (or the application) chooses to reverse the self-view so that it acts like a mirror, whereas Firefox (or its application) chooses to display what the other person sees.
I don't think it's a hard choice for the browser to make: send the data as it is captured by the camera. How that is displayed is a separate matter, but as timdorr demonstrates, it's not difficult to toggle.
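For anyone wondering what such a toggle can look like at the page level, here is a minimal sketch. It is not necessarily how timdorr did it; it assumes the self-view is an element whose CSS `transform` you control, and the names `Transformable`, `mirrorTransform`, `setSelfViewMirrored` and `toggleSelfViewMirror` are made up for illustration. Only the local rendering is flipped; the captured stream that gets sent is untouched.

```typescript
// Minimal structural type (an assumption for this sketch) so the helpers
// accept an HTMLVideoElement or any object exposing style.transform.
interface Transformable {
  style: { transform: string };
}

// CSS transform value for a mirrored or unmirrored self-view.
function mirrorTransform(mirrored: boolean): string {
  return mirrored ? "scaleX(-1)" : "none";
}

// Hypothetical helper: changes only how the local preview is drawn.
function setSelfViewMirrored(view: Transformable, mirrored: boolean): void {
  view.style.transform = mirrorTransform(mirrored);
}

// Hypothetical helper: flips the current state and returns the new one
// (true means the preview is now mirrored).
function toggleSelfViewMirror(view: Transformable): boolean {
  const mirrored = view.style.transform !== mirrorTransform(true);
  setSelfViewMirrored(view, mirrored);
  return mirrored;
}
```

In a page you would pass the `<video>` element showing the local stream, for example from a click handler on a "mirror" button. This sketch overwrites any other transform already set on the element, so a real page might prefer toggling a CSS class instead.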
In what way? The developer should have no problem justifying the decision: "It's what the camera sensor sees."
I absolutely understand the application in webchat and why you would want it to be mirrored at a page level, but I'm at a loss to understand why the browser implementation would try to reflect that.
I agree that it's a hard choice; I had to make this exact choice last week when working on a video chat service. Possibly because of the nature of the service (tutoring/teaching) it felt incredibly unnatural to see a non-mirrored image of myself. I tried it out both ways and it really did feel quite peculiar, and I had a hard time figuring out my own movements (kinda like trying to shave using two mirrors).
Are there any usability studies on video self-views? What are the precedents from other video conferencing software like Apple's FaceTime and Google Hangouts?
I expect it heavily depends on the application, context and user role. So it's a great thing to offer developers the choice of whether to mirror or not.
Because then the thumbnail where you see your own face behaves more like a real-world mirror, which is easier to reason about if you want to adjust your position.
It's a hard choice to make.