Language choices are a mess. The signals a site can draw on easily (and often) conflict with each other:
- accept-language header
- URL that includes language/region codes as a subdomain or part of the path
- language preferences set in a cookie or account
- IP region detection
In the end, any website is trying to provide the right language most often for their users, and there are no easy answers. When I access webmail from an Internet cafe in China, I don't want the interface popping up in Chinese just because the browser's accept-language is configured for Chinese. Fortunately, it doesn't.
Most web users have never even heard of accept-language; it's just automatically set to whatever language their browser was installed in, which isn't always the language they want to browse in. (E.g. you bought your laptop overseas because it was cheaper, so it runs in English instead of your own language.) It's no surprise that IP address detection provides the best default experience most of the time, which can then be overridden by URL or user choice, and that accept-language is fairly irrelevant.
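To make the "default, then override" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name, the parameters, and in particular the exact ordering (URL first, then saved preference, then IP region, then accept-language last). It is one way to read the paragraph above, not how any particular site does it.

```python
from __future__ import annotations

from typing import Optional


def pick_language(url_lang: Optional[str],
                  saved_pref: Optional[str],
                  ip_region_lang: Optional[str],
                  accept_language: Optional[str],
                  supported: set[str],
                  default: str = "en") -> str:
    """Return the first supported candidate, strongest signal first.

    Hypothetical precedence: explicit URL, then cookie/account preference,
    then IP-based guess, then the browser's accept-language as a last resort.
    """
    candidates = [url_lang, saved_pref, ip_region_lang, accept_language]
    for lang in candidates:
        if lang and lang.lower() in supported:
            return lang.lower()
    return default
```

With this ordering, the Internet cafe case works out as long as there is a saved preference: `pick_language(None, "en", "zh", "zh", {"en", "zh"})` returns `"en"` even though both the IP and the browser say Chinese.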
* In all cases, a fairly visible language picker is displayed at the top of the page, with internationalized language names.
* If someone goes to a language-specific subdomain (fr.dolphin-emu.org, cy.dolphin-emu.org, ast.dolphin-emu.org, ...), they get this version.
* If someone goes to the generic/English dolphin-emu.org, the system checks whether the user has a "nocr" cookie. If so, they get the English website. Otherwise, they get redirected based on their Accept-Language.
* If a user uses the language picker, we assume they know what they want and set the "nocr" cookie to disable redirections in the future.
* When the user gets redirected from the standard/English version to an internationalized version, a message is shown in English saying that they have been redirected based on their browser preferences, with a link to go back to the English version (and set the "nocr" cookie).
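For anyone who wants to see the rules above as code, here is a rough Python sketch. It is not the actual dolphin-emu.org implementation: the function names, the `SUPPORTED` set (just the three subdomains mentioned above), and the choice to stop at the first English entry in the header are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Optional, Tuple

SUPPORTED = {"fr", "cy", "ast"}  # illustrative subset, not the real list
DEFAULT = "en"


def parse_accept_language(header: str) -> list[str]:
    """Return lowercased language tags from the header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a missing q value means full weight
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if tag and tag != "*" and quality > 0:
            weighted.append((-quality, position, tag))
    # Highest quality first; ties keep the order they had in the header.
    return [tag for _, _, tag in sorted(weighted)]


def decide(host_lang: Optional[str], cookies: dict,
           accept_language: str) -> Tuple[str, bool]:
    """Return (language to serve, whether the visitor was redirected)."""
    if host_lang is not None:
        return host_lang, False  # language-specific subdomain: serve as-is
    if "nocr" in cookies:
        return DEFAULT, False    # visitor opted out of redirections
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]
        if tag in SUPPORTED:
            return tag, True
        if primary in SUPPORTED:
            return primary, True
        if primary == DEFAULT:
            return DEFAULT, False  # assumption: English preferred, stay put
    return DEFAULT, False
```

The second element of the result is what would drive the "you have been redirected" banner: it is only true when the visitor landed on the generic site and was sent elsewhere.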
I thought about this for a pretty long time and think it is a good compromise between providing the best version for our users and not being annoying/guessing too much. In the end, more than 50% of our users are now shown internationalized versions of our website, which is a very good number in my opinion.
Flags do make sense for many users, and they are the closest you can find to a proper graphical representation of languages. When I add a language that I know to be official in several countries, I look at my analytics to see where most users come from and use the flag from their country. I can't remember a time when it did not also match the country with the most speakers.
It's a common enough practice that most people usually know what it means, but there's a reason you don't see flags on Wikipedia, Facebook, or YouTube. Languages are spoken in many countries, and countries are multilingual. There are quite a few articles around the web on this topic, but that's basically what they boil down to: languages are not countries. Some users may be confused or offended that their flag is not represented.
And as a Canadian, I find it generally a little weird that the Canadian flag often means Canadian French, and I have to click the US flag to get English (which is of course a slightly different English than Canadian English, which is probably unavailable).
I guess it's something like "language most unique to that country", no but that's not right either... I don't know.
Unless you have different pricing per country or something orthogonal to language, I'm sure that a speaker of Canadian French can figure out that clicking the French flag may help them understand this page better. It's a common enough idiom on the web.
I think in the case of more than one country per language, you're right, just picking a big and/or well-known country as "representative" is fine: French flag for French, US or UK flag for English, German flag for German.
The bigger problem is the other situation, of more than one language per country. India has ~13 languages with >10m native speakers, and using the Indian flag for all of them would be pretty confusing. You could pick state flags (e.g. the flag of Gujarat to represent Gujarati), but that can be a politically tricky issue. In some cases choosing a representative flag for a language has even stronger political overtones, like using the flag of the Kurdistan independence movement to represent the Kurdish language. Plus, it's not always that clear which flag to pick, and user recognition may not be as high as in the French-flag-for-French case.