Hacker News

What I'd like to know is, given the explosion of the character set for emoji, does the rationale for Han unification still make sense? The case for not allowing national variants seems less and less compelling with every emoji they add.

This is a bit of a hobby horse, but imagine if every time you read an article in English on your phone, some of the letters were replaced with "equivalent" Greek or Cyrillic ones, and you can get an idea of the annoyance. Yeah, you can still read it with a bit of thought, but who wants to read that way?



I agree that Han unification was an unfortunate design decision, but I'd argue that the consortium is following a consistent approach to the Han unification with emoji. For example, they treat "regional" vendor variations in emoji as a font issue. If you get a message with the gun emoji, unless you have out-of-band information regarding which vendor variant is intended, there's no way in software to know if it should be displayed as a water gun (Apple "regional" variant) or a weapon (other vendor variants). Which is not that different from a common problem stemming from Han unification.


I don't disagree, but my point is more that their concern was about having "too many characters" in Unicode, which no longer seems to be a real concern, so what would be the harm of adding national variants?


Is having skin tone variants (which is something Unicode chose to add, rather than added because of existing use) consistent with not having distinct variants for glyphs from different languages?


Han unification was an attempt to fit the CJK characters into the 16-bit BMP. In the end the BMP overflowed anyway, so the unification is meaningless, but reverting it would also produce huge compatibility issues.


Of course, the old characters must be left alone. But I'm not seeing what stops them from introducing new ones.


The new characters would have the same glyphs as the old characters. That's the nightmare. For example, I couldn't find an old one by searching for a new one, and it's hard for ordinary users to understand why. Should all software support searching by both characters? I don't expect every Western developer to take care of that. Equality comparison also fails without special support.


That is a bad excuse, since it would preclude adding any new characters for existing languages. Would you have made the same objection to U+1E9E "ẞ", which was added in 2008?

Also, equality comparison already requires special support, e.g. normalization before comparison.
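A quick illustration of that point in Python, using the standard composed-vs-decomposed accent case:

```python
import unicodedata

# "é" as one precomposed codepoint vs. "e" plus a combining acute accent
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT

print(composed == decomposed)  # False: raw codepoint comparison fails

# Normalizing both sides (here to NFC) makes them compare equal
nfc = unicodedata.normalize("NFC", decomposed)
print(composed == nfc)         # True
```

So any software doing correct Unicode equality already has to normalize first; new Han variant codepoints would just extend an existing requirement.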

Sure, there would be a period where software support is incomplete, but that is a bad reason to keep things broken forever.


It doesn't seem infeasible to build a search that supports both.
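A minimal sketch of what such a search could look like: fold any variant codepoint back to its unified form on both sides before matching. The variant codepoints below are private-use placeholders, since no real disunified characters exist; the mapping table is purely hypothetical.

```python
# Hypothetical fold table mapping newly added national-variant codepoints
# back to the existing unified codepoint. The U+F000x values are
# private-use placeholders standing in for hypothetical new characters.
VARIANT_FOLD = {
    "\U000F0001": "\u76f4",  # hypothetical variant A -> unified 直
    "\U000F0002": "\u76f4",  # hypothetical variant B -> unified 直
}

def fold(text: str) -> str:
    """Replace hypothetical variant codepoints with their unified form."""
    return "".join(VARIANT_FOLD.get(ch, ch) for ch in text)

def contains(haystack: str, needle: str) -> bool:
    """Substring search that matches regardless of which variant is used."""
    return fold(needle) in fold(haystack)

# Searching with the old unified character finds text written with a
# hypothetical new variant, and vice versa.
print(contains("\U000F0001\u7dda", "\u76f4"))  # True
print(contains("\u76f4\u7dda", "\U000F0002"))  # True
```

This is essentially the same shape as existing case-folding or normalization-based search, so it is not a novel burden on implementers.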


It's possible to implement, but it would cause more confusion than benefit until all existing software supports it.


> were replaced with "equivalent" Greek or Cyrillic one

The subset of equivalent letters, or different ones? If they looked the same, it wouldn't bother me if the letters in the center were a single codepoint between European languages:

https://upload.wikimedia.org/wikipedia/commons/8/84/Venn_dia...


I am disappointed that that diagram omits ꙮ [0]

[0] https://en.wikipedia.org/wiki/Multiocular_O


The problem is they don't look the same. So imagine, for instance, Я instead of "R" or И instead of "N" (I don't think the sounds are actually equivalent but let's run with it for the sake of example). Not insurmountable. One could still read a text with these substitutions. But it'd be distracting, and extra detrimental for people who don't speak English as their first language.


The ones in the center are in all three sets, they do look the same. The outer areas are out of bounds.
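Worth noting that those center letters, identical as they look, are already separate codepoints per script in Unicode today; a quick check:

```python
import unicodedata

# Latin, Greek, and Cyrillic capital A render identically in most fonts,
# yet each script gets its own codepoint.
for ch in ("A", "\u0391", "\u0410"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print("A" == "\u0391")  # False: no normalization form unifies them
```

So the Latin/Greek/Cyrillic case was resolved the opposite way from Han unification: identical-looking letters from different scripts were kept distinct.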


It doesn't make sense, but there's also no way to fix it now. Once the Han characters were unified, there's no non-disruptive way to un-unify them.


To an extent that's true, but introducing national variant characters in addition to the unified ones would at least allow careful writers to avoid the problem.


Exactly, this is not rocket science: introduce variants of the affected characters in Unicode (either variant selectors or distinct codepoints; it doesn't matter too much, but variant selectors would allow falling back to the old context-based detection). Then wait for software to be updated to use the variants based on the input language. This lets the writer verify the variant used, which will then be the same in all contexts.
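Unicode in fact already has the variant-selector mechanism for Han glyphs: ideographic variation sequences, where a base character is followed by a selector from the VS17-VS256 range (U+E0100..U+E01EF). A sketch in Python; whether any particular base+selector pair is actually registered depends on the Ideographic Variation Database, so the pair below is illustrative only:

```python
# An ideographic variation sequence: base CJK character followed by a
# variation selector from the VS17-VS256 range (U+E0100..U+E01EF).
base = "\u845b"      # 葛, a character with well-known glyph variants
vs17 = "\U000e0100"  # VARIATION SELECTOR-17

ivs = base + vs17
print(len(ivs))      # 2 codepoints, but ideally one displayed glyph

# A renderer without IVS support simply ignores the selector and shows
# its default glyph for U+845B - the graceful fallback described above.
print(ivs.encode("utf-8"))
```

The remaining gap is exactly what the comment describes: input methods and editors rarely emit these sequences automatically based on the document language.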



