
It's funny that a zero-width space is considered weird and Twitter fails on it. It's quite common in my language (Persian).


From what I remember (can't test right now), a zero-width space is okay as long as there are other (printable) characters in there too. This seems reasonable, because allowing a tweet to be a single zero-width space would make it appear to be empty and would probably lead to some confusing display issues.
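A minimal sketch of the kind of check described above, rejecting text that contains nothing but invisible characters. The set of code points and the name `looks_empty` are illustrative assumptions, not Twitter's actual validation rules.

```python
# Code points that render with no visible width. This list is an
# illustrative assumption, not Twitter's actual rule set.
INVISIBLE = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

def looks_empty(text: str) -> bool:
    """Hypothetical check: True if text has no visible characters at all."""
    return all(ch in INVISIBLE or ch.isspace() for ch in text)

print(looks_empty("\u200b"))          # True: a lone ZWS would look like an empty tweet
print(looks_empty("#banana\u200bs"))  # False: ZWS mixed with printable text is fine
```

Because `all()` is vacuously true for an empty string, a genuinely empty tweet is also reported as empty, which matches the intent.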

I'm pretty sure I've used it to "end" a hashtag early, like in this made-up example:

    I've eaten two #banana<ZWS>s today!
In my language, the possessive form doesn't take an apostrophe ("Alices Adventures" instead of "Alice's"), so for hashtags and user names it can be desirable to use the ZWS as an invisible apostrophe.
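To illustrate why a ZWS ends a hashtag early, here is a sketch using a simplified hashtag pattern (an assumption; Twitter's real hashtag rules are more involved). Python's `\w` does not match U+200B, so the tag stops at the invisible character.

```python
import re

# Simplified hashtag pattern (assumption): '#' followed by word characters.
# U+200B ZERO WIDTH SPACE is a format character (category Cf), not a word
# character, so the match ends there while the reader sees no gap.
HASHTAG = re.compile(r"#(\w+)")

def hashtags(text: str) -> list[str]:
    """Return the hashtag bodies found in text (hypothetical helper)."""
    return HASHTAG.findall(text)

print(hashtags("I've eaten two #banana\u200bs today!"))  # ['banana']
print(hashtags("I've eaten two #bananas today!"))        # ['bananas']
print(hashtags("#Alice\u200bs Adventures"))              # ['Alice'], the "invisible apostrophe"
```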


    > Its quite common in my
    > language
I'd love to hear more details on why?


Ligatures. It's easy not to notice them at all in English, ﬂ (fl) vs. fl [ﬂy vs. fly], but some languages use them very extensively and the combinations are more significant.

https://en.wikipedia.org/wiki/Zero-width_non-joiner

* It's entirely possible that the browser you're using doesn't handle ligatures very well, which would explain any strange look of my examples.
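A small sketch of the two ideas in this comment, using Python's standard `unicodedata` module. The ﬂ ligature is its own code point that NFKC normalization folds back to plain "fl". The Persian word می‌خواهم ("I want") is assumed here as a typical ZWNJ example: the non-joiner sits between the prefix "mi" and the verb so the two stay visually separate without a real space.

```python
import unicodedata

# U+FB02 is a precomposed "fl" ligature; NFKC decomposes it to plain letters.
lig = "\ufb02y"
print(unicodedata.name(lig[0]))            # LATIN SMALL LIGATURE FL
print(unicodedata.normalize("NFKC", lig))  # fly

# Persian "mikhaham" (I want): MEEM, FARSI YEH, ZWNJ, then the verb stem.
# Without the ZWNJ, the final YEH of the prefix would join to the next letter.
word = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
print(unicodedata.name(word[2]))                   # ZERO WIDTH NON-JOINER
print(len(word), len(word.replace("\u200c", "")))  # 8 7
```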


ZWJ¹ and ZWNJ² are also common in Indic scripts. They are basically used to control the appearance of glyphs, for example half-forms and consonant clusters (क्‍ष vs क्ष, both are kṣa). As usual, Wikipedia has good examples, and the Unicode Standard also covers the details.

ZW[N]J as a standalone character or at the beginning of a word is very unusual on a day-to-day basis, so it's understandable that Twitter fails to recognize this pattern.

¹ https://en.wikipedia.org/wiki/ZWJ

² https://en.wikipedia.org/wiki/ZWNJ
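The kṣa example above can be inspected code point by code point. This sketch assumes the first form contains a ZWJ after the virama (the joiner is invisible, so this is inferred from the comment saying both spell kṣa); the only difference between the two strings is that joiner, and the font decides whether to draw a half-form or a full conjunct.

```python
import unicodedata

# KA + VIRAMA + ZWJ + SSA: the ZWJ asks for a half-form KA before SSA.
with_zwj = "\u0915\u094d\u200d\u0937"
# KA + VIRAMA + SSA: fonts typically render the full conjunct ligature.
conjunct = "\u0915\u094d\u0937"

for s in (with_zwj, conjunct):
    print([unicodedata.name(c) for c in s])

# Removing the joiner yields exactly the conjunct spelling.
print(with_zwj.replace("\u200d", "") == conjunct)  # True
```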


Ah ha!

    > When a ZWJ is placed between two
    > emoji characters, it can also result
    > in a new form being shown, such as
    > the family emoji, made up of two adult
    > emoji and one or two child emoji
That explains a lot. I hadn't given much thought to how that's implemented, but in retrospect it makes perfect sense.
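A minimal sketch of how such a family emoji is built: individual person emoji joined with U+200D. Man + woman + girl is used here as one example of a standard family sequence; a font that knows the sequence draws a single glyph, even though the string holds five code points.

```python
import unicodedata

ZWJ = "\u200d"

# Family: man + woman + girl, joined with zero width joiners.
family = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467"])

print(len(family))  # 5: three emoji plus two joiners
print([unicodedata.name(c) for c in family])
```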


I noticed that with the new emojis on my MacBook. Some of the new emojis, like ‍, are rendered as "guy behind a MacBook" on my PC, but on phones without that emoji they show up as "guy emoji" and "computer emoji".

Same for ‍️ (male version of raise hand). On phones without that emoji, it's just "male emoji" and "female raise hand emoji".

/e: oh, HN is stripping emojis.
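A rough sketch of the fallback described above: a device without the combined glyph simply shows the pieces. This assumes the stripped emojis were "man technologist" (man + ZWJ + personal computer) and "man raising hand" (person raising hand + ZWJ + male sign + VS16); the helper `fallback_parts` is hypothetical and only drops the invisible joiner and variation selector, whereas real renderers follow their own font data.

```python
import unicodedata

ZWJ, VS16 = "\u200d", "\ufe0f"

def fallback_parts(seq: str) -> list[str]:
    """Hypothetical helper: names of the emoji shown when the sequence has no glyph."""
    return [unicodedata.name(c) for c in seq if c not in (ZWJ, VS16)]

man_technologist = "\U0001F468\u200d\U0001F4BB"
man_raising_hand = "\U0001F64B\u200d\u2642\ufe0f"

print(fallback_parts(man_technologist))  # ['MAN', 'PERSONAL COMPUTER']
print(fallback_parts(man_raising_hand))  # ['HAPPY PERSON RAISING ONE HAND', 'MALE SIGN']
```

The "female raise hand" appearance in the comment comes from platforms that drew the generic raising-hand emoji as a woman; the code point itself is gender-neutral.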


This made me wonder if anyone had tried combining word2vec with emojis, and then I came across this:

https://github.com/uclmr/emoji2ve


which is a dead link



Apologies, and thanks!


Not OP, but in Norwegian the correct way to write "Tom's car" is "Tom sin bil", the car of Tom. But the creep of English and laziness allows for "Toms car", esp. in informal writing.



