Use MySQL utf8mb4 if you want full Unicode support

MySQL’s utf8 is broken

MySQL really made a mess here. What it calls utf8 really isn’t UTF-8. Hidden away in the MySQL manual, we can read this:

“The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters.”

Loosely translated: MySQL utf8 is broken. Don’t use it.

As I explained in a previous post, UTF-8 uses up to four bytes per character. Because MySQL’s utf8 stores at most three, it can’t encode any character outside the Basic Multilingual Plane, so emoji and other supplementary-plane characters are out of reach.
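
To make the byte counts concrete, here is a small Python sketch. It is only an illustration: the helper names `utf8_length` and `fits_mysql_utf8` are made up for this post, and the check simply assumes that "fits in MySQL’s utf8" means "every code point is at or below U+FFFF", which is what the manual’s "only BMP characters" amounts to.

```python
# Compare UTF-8 byte lengths for characters inside and outside the BMP.
# Helper names are hypothetical; nothing here talks to a MySQL server.

def utf8_length(char: str) -> int:
    """Return the number of bytes needed to encode the text as UTF-8."""
    return len(char.encode("utf-8"))


def fits_mysql_utf8(text: str) -> bool:
    """True if every character is in the BMP (code point <= U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


samples = {
    "A": "U+0041, ASCII",
    "é": "U+00E9, Latin-1 Supplement",
    "€": "U+20AC, still inside the BMP",
    "😀": "U+1F600, outside the BMP",
}

for char, note in samples.items():
    print(f"{note}: {utf8_length(char)} byte(s), "
          f"fits in 3-byte utf8: {fits_mysql_utf8(char)}")
```

The first three samples need one, two and three bytes; only the emoji needs four, and it is the only one that the three-byte encoding cannot hold.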

The problem is that the name `utf8` for the broken encoding is so misleading that hardly any developer will think twice about it unless it’s pointed out to them. That is why I wrote this post: so that I won’t have to type this in again and again in all the forums and Stack Overflow posts where people are confused by this.

Fixing MySQL’s Unicode support

Fortunately for us it’s possible (and relatively easy) to fix this situation. Just remember this:

MySQL’s utf8 is really not utf8. It’s fake utf8. MySQL calls the real utf8 `utf8mb4`.

So just use `utf8mb4` whenever you need utf8 and you’ll be fine.
Read more about it on Mathias Bynens’ blog and Joni Salonen’s blog.
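
If you already have a database that uses the fake utf8, switching over mostly comes down to a few `ALTER` statements. The Python sketch below only builds those statements as strings so you can review them before running anything. The function names, the database name `mydb`, the table name `posts` and the choice of `utf8mb4_unicode_ci` as collation are all assumptions for illustration, not something MySQL requires.

```python
import re

# Hypothetical helper: builds the SQL needed to move a schema to utf8mb4.
# It does not connect to MySQL; it only returns the statements as text.

_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def quote_identifier(name: str) -> str:
    """Backtick-quote a simple identifier, rejecting anything unusual."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def utf8mb4_statements(database, tables, collation="utf8mb4_unicode_ci"):
    """Return ALTER statements converting a database and its tables."""
    if not _IDENTIFIER.fullmatch(collation):
        raise ValueError(f"unsupported collation: {collation!r}")
    statements = [
        f"ALTER DATABASE {quote_identifier(database)} "
        f"CHARACTER SET = utf8mb4 COLLATE = {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


# Example with made-up names: print the statements for review.
for statement in utf8mb4_statements("mydb", ["posts"]):
    print(statement)
```

Keep in mind that the client connection has to use `utf8mb4` as well (for example via `SET NAMES utf8mb4`), otherwise four-byte characters can still get mangled on the way in. Take a backup before converting real tables.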
