Collations

sukishan · Aug 18, 2009, 06:39 PM

physical storage of character strings in Microsoft® SQL Server™ 2000 is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared.

SQL Server 2000 supports objects that have different collations being stored in a single database. Separate SQL Server 2000 collations can be specified down to the level of columns. Each column in a table can be assigned different collations. Earlier versions of SQL Server support only one collation for each instance of SQL Server. All databases and database objects created in an instance of SQL Server 7.0 or earlier have the same collation.

How Character Data Is Stored
In a computer, characters are represented by different patterns of bits being either ON or OFF. There are 8 bits in a byte, and the 8 bits can be turned ON and OFF in 256 different patterns. A program that uses 1 byte to store each character can therefore represent up to 256 different characters by assigning a character to each of the bit patterns. There are 16 bits in 2 bytes, and 16 bits can be turned ON and OFF in 65,536 unique patterns. A program that uses 2 bytes to represent each character can represent up to 65,536 characters.

Single-byte code pages are definitions of the characters mapped to each of the 256 bit patterns possible in a byte. Code pages define bit patterns for uppercase and lowercase characters, digits, symbols, and special characters such as !, @, #, or %. Each European language, such as German or Spanish, has its own single-byte code page. Although the bit patterns used to represent the Latin alphabet characters A through Z are the same for all the code pages, the bit patterns used to represent accented characters such as 'é' and 'á' vary from one code page to the next. If data is exchanged between computers running different code pages, all character data must be converted from the code page of the sending computer to the code page of the receiving computer. If the source data has extended characters that are not defined in the code page of the receiving computer, data is lost. When a database serves clients from many different countries/regions, it is difficult to pick a code page for the database that contains all the extended characters required by all the client computers. Also, there is a lot of processing time spent doing the constant conversions from one code page to another.

Single-byte character sets are also inadequate to store all the characters used by many languages. For example, some Asian languages have thousands of characters, so must use two bytes per character. Double-byte character sets have been defined for these languages. Still, each of these languages have their own code page, and there are difficulties in transferring data from a computer running one double-byte code page to a computer running another.

Discussion Community

News:

Collations

sukishan