
The Unicode solution chosen by Python 3 required a major backwards incompatibility, but it wasn't the only possible fix for Python 2's Unicode problems. The biggest problem in Python 2 was that the implicit conversion between bytes and characters used the ascii codec and failed whenever it found a non-ASCII character. Python 3 solved this by removing the implicit conversion and forcing the developer to perform it explicitly. If instead they had just changed the default codec from ascii to utf-8, many of the ubiquitous Unicode-related bugs in Python 2 code would have been fixed in a mostly backwards-compatible way. This wouldn't have helped people who commonly use non-UTF-8 encodings, but that seems to be rare and getting rarer (in my experience; my understanding is that the biggest remaining exception is the use of UTF-16 on Windows).
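A minimal Python 3 sketch of the difference, assuming we only care about the decode direction (bytes meeting text). The function `coerce_concat` is hypothetical: it emulates what Python 2 did when a byte string was combined with a unicode string, namely decoding the bytes with an interpreter-wide default codec. Passing `"ascii"` reproduces the Python 2 failure; passing `"utf-8"` shows what the proposed default would have done. It does not model the implicit encode direction.

```python
def coerce_concat(data: bytes, text: str, default_codec: str) -> str:
    """Hypothetical emulation of Python 2's implicit str + unicode coercion:
    the byte string is silently decoded with the default codec."""
    return data.decode(default_codec) + text


# Non-ASCII bytes, e.g. UTF-8 data read from a file or socket.
raw = "café ".encode("utf-8")

try:
    coerce_concat(raw, "au lait", "ascii")  # Python 2's actual default
except UnicodeDecodeError as exc:
    print("ascii default fails:", exc.reason)

print(coerce_concat(raw, "au lait", "utf-8"))  # the proposed default
```

With the ascii default the call raises `UnicodeDecodeError` ("ordinal not in range(128)"); with utf-8 it returns `"café au lait"`, and pure-ASCII input behaves identically under both codecs, which is why the change would have been mostly backwards-compatible.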


This! Exactly this. The stupid decision to use ascii as the default codec (and to disobey the system locale while doing so!), combined with silent Unicode coercion, doomed Python 2 to a lifetime of i18n pain. Just changing those two behaviors would have provided a much more evolutionary path forward without opening the Python 2 to 3 chasm.


This needs more upvotes. You're exactly right.




