International Data and Unicode

sukishan · Aug 18, 2009, 06:40 PM

International Data and Unicode
Storing data in multiple languages within one database is difficult to manage when using only character data and code pages. It is difficult to find one code page for the database that can store all the required language-specific characters. It is also difficult to ensure the proper translation of special characters when being read or updated by different clients running various code pages. Databases that support international clients should always use Unicode data types instead of non-Unicode data types.

For example, a database of customers in North America has to handle three major languages:

Spanish names and addresses for Mexico.

French names and addresses for Quebec.

English names and addresses for the rest of Canada and the United States.
When you use only character columns and code pages, care has to be taken to ensure the database is installed with a code page that will handle the characters of all three languages. More care must be taken to ensure the proper translation of characters from one of the languages when read by clients running a code page for another language.

With the growth of the Internet, it is becoming more important than ever before to support many client computers running different locales. It is difficult to pick a code page for character data types that will support all of the characters required by a worldwide audience.

The easiest way to manage character data in international databases is to always use the Unicode nchar, nvarchar, and ntext data types in place of their non-Unicode equivalents (char, varchar, and text). If all the applications that work with international databases also use Unicode variables instead of non-Unicode variables, character translations do not have to be performed anywhere in the system. All clients will see exactly the same characters in data as all other clients.

For systems that could use single-byte code pages, the fact that Unicode data needs twice as much storage space as non-Unicode character data is at least partially offset by eliminating the need to convert extended characters between code pages. Systems using double-byte code pages do not have this issue.

SQL Server 2000 stores all textual system catalog data in columns having Unicode data types. The names of database objects such as tables, views, and stored procedures are stored in Unicode columns. This allows applications to be developed using only Unicode, which avoids all issues with code page conversions

Discussion Community

News:

International Data and Unicode

sukishan