Unicode/UTF-8 in your Eclipse Java projects

[Image: "I love Unicode", with the heart symbol ironically replaced by the broken-character (question mark) symbol]

ASCII is dead! Long live Unicode!

You may not yet be convinced that you should abandon all the legacy encodings used for text in the world's different languages and switch to Unicode instead. If so, please first read this excellent article from Joel on Software, which explains why. Don’t worry, I’ll be waiting here patiently for you to come back and read how to make the transition in your Eclipse Java projects.

Joel on Software: Unicode and Character Sets

Changing to Unicode? Yes, we can!

When first reading about Unicode, codepoints, character sets, character encodings and byte order marks, you might feel overwhelmed and start wondering whether this thing is even worth your while and how difficult it will be for you to convert your projects to use it. Are you really going to sell your software in Asia? Maybe you can do without it after all?

Don’t worry. In fact, you can immediately forget about almost everything you read, especially when you work in Java, which already uses Unicode internally. It is not that difficult at all. I will go one step further and proclaim that the hardest thing about Unicode is not Unicode itself, but all the legacy encodings used in other software and files, which creep into your project and break things. Stick to Unicode everywhere and you won’t have any problems with character encodings ever again. I promise!
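To make the "legacy encodings creeping in" point concrete, here is a minimal sketch of what goes wrong when bytes written as UTF-8 are read back with a different encoding. The class name `EncodingRoundTrip` and the helper `decodeUtf8BytesAs` are made up for this illustration; only `String.getBytes(Charset)`, the `String(byte[], Charset)` constructor and `StandardCharsets` come from the standard library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Hypothetical helper: encode the text as UTF-8, then decode those
    // bytes with whatever charset the "reader" assumes.
    static String decodeUtf8BytesAs(String text, Charset readerCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readerCharset);
    }

    public static void main(String[] args) {
        // \u2665 is the heart symbol; the escape keeps this source file
        // pure ASCII, so the file's own encoding cannot affect the result.
        String original = "I \u2665 Unicode";

        String sameEncoding = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);
        String legacyEncoding = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);

        // Writer and reader agree on UTF-8: the text survives intact.
        System.out.println("UTF-8 round trip intact:      " + original.equals(sameEncoding));

        // Reader assumes ISO-8859-1: the three UTF-8 bytes of the heart
        // are decoded as three separate characters.
        System.out.println("ISO-8859-1 round trip intact: " + original.equals(legacyEncoding));
        System.out.println("Lengths: " + original.length() + " vs " + legacyEncoding.length());
    }
}
```

The string inside the JVM is never the problem. The damage happens at the boundary where text becomes bytes or bytes become text, which is why naming the charset explicitly at every such boundary, rather than relying on the platform default, is the safer habit.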
