WordPress 2.2 Unicode problem: diagnosis and fixes

Posted On 24/09/2007

Filed under Techtips


I found the culprit that causes scrambled Unicode in WordPress 2.2 when displaying Asian characters.

In short, it is NOT the fault of the WordPress code. It has to do with a transition problem caused by the hosting provider.

Quick and dirty fix

  1. Open wp-config.php and look for the following lines:
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');
  2. Either delete the two lines (not recommended) or empty the values as shown:
    define('DB_CHARSET', '');
    define('DB_COLLATE', '');
  3. Save the file, then write and save a post containing special characters to check the result.
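
If you would rather check the current values without reading the whole file by eye, here is a minimal sketch in Python. The function name read_db_charset_settings and the sample text are my own illustration, not part of WordPress, and the pattern only handles simple define() lines with quoted string values. It does not skip commented-out lines.

```python
import re

# Matches simple lines such as: define('DB_CHARSET', 'utf8');
DEFINE_PATTERN = re.compile(
    r"""define\(\s*['"](DB_CHARSET|DB_COLLATE)['"]\s*,\s*['"]([^'"]*)['"]\s*\)"""
)


def read_db_charset_settings(config_text):
    """Return the DB_CHARSET / DB_COLLATE values found in wp-config.php text."""
    return {name: value for name, value in DEFINE_PATTERN.findall(config_text)}


# Hypothetical excerpt of a wp-config.php file, used only for illustration.
sample_config = """
define('DB_NAME', 'wordpress');
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
"""

settings = read_db_charset_settings(sample_config)
print(settings)
```

For a real file, you would pass in the text of your own wp-config.php instead of sample_config.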

————
What causes this piece of WordPress code not to work
If you look into the above “fix”, you will see that we are emptying out the utf8 parameter. I think many people are curious: isn’t WordPress supposed to be Unicode enabled?

Actually, the problem lies in the MySQL database provided by the hosting provider. Anyone using a web hosting service should know that the MySQL database given to you is only one small part of a main MySQL server shared with other customers. This means you cannot control the internal character encoding unless you go into MySQL and set the correct encoding yourself.

For example, in my case, my default MySQL database tables use the latin1 character set, with the collation set to latin1_swedish_ci. However, MySQL has been set up in such a way that it will still accept UTF8 Unicode characters.

This was not expected by the WordPress developers, who “assumed” that every hosting provider creates MySQL databases with UTF8 as the default character set for encoding and storage.

The relationship among displayable characters, character encoding/decoding, and character storage is not difficult to explain.

The best real-life example of how the computing world handles international languages is SMS language, for example, the word PLZ.

  1. Character encoding/decoding. If you know SMS language, you will read PLZ as “please”, instead of the puzzling, meaningless “Pee-L-Zee”.
  2. Character storage. For example, writing the word PLZ down on a paper notepad is a different storage medium from keying it into the phone's memory.

(Actually, the best example is Pidgin English.)

Just think: if you jumble up any of the above steps, you can never understand the word “PLZ”.
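
The same idea can be tried out in a few lines of Python. This is only a sketch: the sample text is my own choice, and Python's "latin-1" codec stands in for MySQL's latin1, which is close but not identical. The point is that the stored bytes never change; only the decoding step decides whether you read “please” or “Pee-L-Zee”.

```python
text = "日本語"  # sample Asian text, chosen only for illustration

# Storage: the characters are written down as UTF-8 bytes.
stored = text.encode("utf-8")

# Decoding with the same convention that was used for encoding works.
right = stored.decode("utf-8")

# Decoding the very same bytes as latin1 gives one garbage character per byte.
wrong = stored.decode("latin-1")

print(len(stored), "bytes stored")
print("decoded as utf-8 :", right)
print("decoded as latin1:", repr(wrong))

# Nothing is lost as long as the mistake is undone with the same pair of codecs.
recovered = wrong.encode("latin-1").decode("utf-8")
print("recovered        :", recovered)
```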

In the WordPress case, the developers have set the “encoding/decoding” to “UTF8”, which is a totally different setting from the MySQL structure created by the web hosting site.

If we set the database character set value to empty, WordPress falls back to the default MySQL character set for encoding/decoding:
define('DB_CHARSET', '');

Thus, it fixes the Unicode character scramble problem.
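
To see why the empty value helps, here is a rough simulation in Python. It rests on several assumptions of mine: the posts already sit in a latin1 column as raw UTF-8 bytes, MySQL converts column data to the character set of the connection when reading, and Python's "latin-1" codec is a good enough stand-in for MySQL's latin1. The helper read_through_connection is hypothetical and is not a MySQL or WordPress function.

```python
def read_through_connection(column_bytes, column_charset, connection_charset):
    """Simulate MySQL converting column data to the connection charset."""
    return column_bytes.decode(column_charset).encode(connection_charset)


post = "日本語"  # sample text, chosen only for illustration

# Assumption: the UTF-8 bytes went into the latin1 column untouched.
column_bytes = post.encode("utf-8")

# DB_CHARSET empty: the connection stays latin1, so the bytes pass through as-is.
via_latin1 = read_through_connection(column_bytes, "latin-1", "latin-1")

# DB_CHARSET 'utf8': every stored byte is treated as a latin1 character
# and re-encoded as UTF-8, which doubles up the encoding.
via_utf8 = read_through_connection(column_bytes, "latin-1", "utf-8")

# The browser decodes the page as UTF-8 in both cases.
page_ok = via_latin1.decode("utf-8")
page_scrambled = via_utf8.decode("utf-8")

print("latin1 connection:", page_ok)
print("utf8 connection  :", repr(page_scrambled))
```

With the latin1 connection the page shows the original text; with the utf8 connection each stored byte turns into its own character, which is the scramble described above.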

If you are an adventurous user like me, you can always dig out the correct character set of the database tables and set it yourself.

For example, my web hosting company has fixed latin1 as the default character set of my WordPress MySQL database, and latin1_swedish_ci as the default collation. So I set the following in my WordPress wp-config.php:
define('DB_CHARSET', 'latin1');
define('DB_COLLATE', 'latin1_swedish_ci');

It will display and store UTF8 characters without a problem. The result is exactly the same as
define('DB_CHARSET', '');
define('DB_COLLATE', '');
