Hacker News

What I'd like to know is, given the explosion of the character set for emoji, does the rationale for Han unification still make sense? The case for not allowing national variants seems less and less compelling with every emoji they add.

This is a bit of a hobby horse, but imagine if every time you read an article in English on your phone, some of the letters were replaced with "equivalent" Greek or Cyrillic ones, and you can get an idea of the annoyance. Yeah, you can still read it with a bit of thought, but who wants to read that way?



I agree that Han unification was an unfortunate design decision, but I'd argue that the consortium is following a consistent approach to the Han unification with emoji. For example, they treat "regional" vendor variations in emoji as a font issue. If you get a message with the gun emoji, unless you have out-of-band information regarding which vendor variant is intended, there's no way in software to know if it should be displayed as a water gun (Apple "regional" variant) or a weapon (other vendor variants). Which is not that different from a common problem stemming from Han unification.


I don't disagree, but my point is more that their concern was about having "too many characters" in Unicode, which no longer seems to be a real concern, so what would be the harm of adding national variants?


Is having skin tone variants (which is something Unicode chose to add, rather than added because of existing use) consistent with not having distinct variants for glyphs from different languages?


Han unification was an attempt to fit the CJK characters into the 16-bit BMP. In the end the BMP overflowed anyway, so the unification is meaningless, but reverting it would also produce huge compatibility issues.


Of course, the old characters must be left alone. But I'm not seeing what stops them from introducing new ones.


The new characters would have the same glyphs as the old characters. That's the nightmare. For example, I couldn't find an old one by searching for a new one, and it's hard for ordinary users to understand why. Should all software support searching by both characters? I don't expect every Western developer to take care of that. Equality comparison also fails without special support.


That is a bad excuse, since it would preclude adding any new characters for existing languages. Would you have made the same objection to U+1E9E "ẞ", which was added in 2008?

Also, equality comparison already requires special support, e.g. normalization before comparison.
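A quick illustration of that point in Python, using the standard composed-vs-decomposed accent case:

```python
import unicodedata

# "é" as one precomposed codepoint vs. "e" plus a combining acute accent
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT

print(composed == decomposed)  # False: raw codepoint comparison fails

# Normalizing both sides (here to NFC) makes them compare equal
nfc = unicodedata.normalize("NFC", decomposed)
print(composed == nfc)         # True
```

So any software doing correct Unicode equality already has to normalize first; new Han variant codepoints would just extend an existing requirement.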

Sure, there would be a period where software support is incomplete, but that is a bad reason to keep things broken forever.


It doesn't seem infeasible to build a search that supports both.
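A minimal sketch of what such a search could look like: fold any variant codepoint back to its unified form on both sides before matching. The variant codepoints below are private-use placeholders, since no real disunified characters exist; the mapping table is purely hypothetical.

```python
# Hypothetical fold table mapping newly added national-variant codepoints
# back to the existing unified codepoint. The U+F000x values are
# private-use placeholders standing in for hypothetical new characters.
VARIANT_FOLD = {
    "\U000F0001": "\u76f4",  # hypothetical variant A -> unified 直
    "\U000F0002": "\u76f4",  # hypothetical variant B -> unified 直
}

def fold(text: str) -> str:
    """Replace hypothetical variant codepoints with their unified form."""
    return "".join(VARIANT_FOLD.get(ch, ch) for ch in text)

def contains(haystack: str, needle: str) -> bool:
    """Substring search that matches regardless of which variant is used."""
    return fold(needle) in fold(haystack)

# Searching with the old unified character finds text written with a
# hypothetical new variant, and vice versa.
print(contains("\U000F0001\u7dda", "\u76f4"))  # True
print(contains("\u76f4\u7dda", "\U000F0002"))  # True
```

This is essentially the same shape as existing case-folding or normalization-based search, so it is not a novel burden on implementers.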


It's possible to implement, but it would cause more confusion than benefit until all existing software supports it.


> were replaced with "equivalent" Greek or Cyrillic one

The subset of equivalent letters, or different ones? If they looked the same, it wouldn't bother me if the letters in the center were a single codepoint between European languages:

https://upload.wikimedia.org/wikipedia/commons/8/84/Venn_dia...


I am disappointed that that diagram omits ꙮ [0]

[0] https://en.wikipedia.org/wiki/Multiocular_O


The problem is they don't look the same. So imagine, for instance, Я instead of "R" or И instead of "N" (I don't think the sounds are actually equivalent but let's run with it for the sake of example). Not insurmountable. One could still read a text with these substitutions. But it'd be distracting, and extra detrimental for people who don't speak English as their first language.


The ones in the center are in all three sets, they do look the same. The outer areas are out of bounds.
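Worth noting that those center letters, identical as they look, are already separate codepoints per script in Unicode today; a quick check:

```python
import unicodedata

# Latin, Greek, and Cyrillic capital A render identically in most fonts,
# yet each script gets its own codepoint.
for ch in ("A", "\u0391", "\u0410"):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print("A" == "\u0391")  # False: no normalization form unifies them
```

So the Latin/Greek/Cyrillic case was resolved the opposite way from Han unification: identical-looking letters from different scripts were kept distinct.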


It doesn't make sense, but there's also no way to fix it now. Once the Han characters were unified, there's no non-disruptive way to un-unify them.


To an extent that's true, but introducing national variant characters in addition to the unified ones would at least allow careful writers to avoid the problem.


Exactly, this is not rocket science: introduce variants of the affected characters in Unicode (either variant selectors or distinct codepoints; it doesn't matter too much, but variant selectors would allow falling back to the old context-based detection). Then wait for software to be updated to use the variants based on the input language. This lets the writer verify the variant used, which will then be the same in all contexts.
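Unicode in fact already has the variant-selector mechanism for Han glyphs: ideographic variation sequences, where a base character is followed by a selector from the VS17-VS256 range (U+E0100..U+E01EF). A sketch in Python; whether any particular base+selector pair is actually registered depends on the Ideographic Variation Database, so the pair below is illustrative only:

```python
# An ideographic variation sequence: base CJK character followed by a
# variation selector from the VS17-VS256 range (U+E0100..U+E01EF).
base = "\u845b"      # 葛, a character with well-known glyph variants
vs17 = "\U000e0100"  # VARIATION SELECTOR-17

ivs = base + vs17
print(len(ivs))      # 2 codepoints, but ideally one displayed glyph

# A renderer without IVS support simply ignores the selector and shows
# its default glyph for U+845B - the graceful fallback described above.
print(ivs.encode("utf-8"))
```

The remaining gap is exactly what the comment describes: input methods and editors rarely emit these sequences automatically based on the document language.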



