Collation Comparison and Ordering Rules

Started by sukishan, Aug 18, 2009, 06:44 PM

sukishan

Most of the comparison and ordering rules defined in a collation are governed by the dictionary definition of the correct sequence of characters for the alphabet or language. The attributes you can control are whether comparisons and sorts of character and Unicode data should be:

* Based on the dictionary conventions that define the correct sequence of characters in the language or alphabet associated with the collation, or based on the sequence of the binary bit patterns representing the different characters.

* Case-sensitive or case-insensitive, that is, whether 'a' is equal or not equal to 'A'. If you choose case-insensitive, comparisons always ignore case, so the uppercase version of a character evaluates as equal to its lowercase version. With a case-insensitive collation, the relative order in which uppercase and lowercase are sorted is undefined unless you also specify uppercase preference. Uppercase preference affects only sort operations: it specifies that the uppercase version of a character comes earlier in the sort sequence than the lowercase version of the same character. Uppercase preference has no effect on comparisons, so 'A' still evaluates as equal to 'a' when uppercase preference is on. Uppercase preference can be specified only in SQL collations, not in Windows collations.


* Sensitive or insensitive to accented characters, also known as extended characters. Accented characters are those that carry a diacritical mark, such as the German umlaut (as in ä) or the Spanish tilde (as in ñ). For example, accent sensitivity defines whether 'a' is equal or not equal to 'ä'.
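
To make the case and accent attributes above concrete, here is a minimal T-SQL sketch that runs the same comparisons under different collations using the COLLATE clause. The collation names are my own picks for illustration (they are standard Windows collations, but not mentioned in the post), and the result column names are made up.

```sql
-- Sketch only: collation names chosen for illustration.
-- CI/CS = case-insensitive/case-sensitive, AI/AS = accent-insensitive/accent-sensitive.
SELECT
    CASE WHEN N'a' = N'A' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS CaseInsensitive,   -- equal
    CASE WHEN N'a' = N'A' COLLATE Latin1_General_CS_AS
         THEN 'equal' ELSE 'not equal' END AS CaseSensitive,     -- not equal
    CASE WHEN N'a' = N'ä' COLLATE Latin1_General_CI_AI
         THEN 'equal' ELSE 'not equal' END AS AccentInsensitive, -- equal
    CASE WHEN N'a' = N'ä' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'not equal' END AS AccentSensitive;   -- not equal
```

The N prefix makes the literals Unicode, so the accented character does not depend on the code page of the current database.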

When you choose a collation, you can specify whether you want binary behavior, or dictionary sorting that is sensitive or insensitive to case and accents:

* In binary collations, comparisons and sorting are based strictly on the bit pattern of the characters. This is the fastest option. Because uppercase characters are stored with different bit patterns than their corresponding lowercase characters, and accented characters have different bit patterns than characters without accents, binary sort orders are always case-sensitive and accent-sensitive. Binary collations also ignore dictionary sequences that have been defined for specific languages. They simply order the characters based on the relative value of the bit patterns that represent each character. The bit patterns defined for unaccented Latin characters, such as 'A' or 'z', are such that binary sorting yields the expected alphabetical order within each case (all uppercase letters sort before all lowercase letters). The bit patterns for some extended characters in some code pages, however, may not match the ordering sequence defined in dictionaries for the language associated with a collation. This can lead to occasional ordering and comparison results that differ from what a speaker of the language might expect.

* If you do not specify a binary collation, SQL Server uses the dictionary ordering of the collation you have chosen. Dictionary order means characters are not sorted or compared based only on their bit patterns. The collation follows the conventions of the associated language regarding the proper sequence for characters. For example, case-insensitive sort orders must use dictionary rules to determine which lowercase and uppercase bit patterns are equal.
Although the bit patterns in a code page generally yield the correct comparison and ordering results for any language that uses the code page, the conventions of some languages require results that differ from the bit-pattern order for a small number of characters. For example, the Czech, Hungarian, and Polish collations use the same code page, 1250, which was designed for Central European languages. Each of these languages, however, uses slightly different conventions for the sequence in which accented characters should be sorted.
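
The difference between binary and dictionary ordering described above can be seen with a small sort. This is only a sketch: the table variable @Words, its sample values, and the two collation names are my own assumptions for illustration.

```sql
-- Sketch only: hypothetical sample data and illustrative collation names.
DECLARE @Words TABLE (Word nvarchar(20));
INSERT INTO @Words (Word) VALUES (N'apple'), (N'Banana'), (N'cherry');

-- Binary order: 'B' has a lower code point than 'a',
-- so the rows come back as Banana, apple, cherry.
SELECT Word FROM @Words ORDER BY Word COLLATE Latin1_General_BIN2;

-- Dictionary order, case-insensitive: apple, Banana, cherry.
SELECT Word FROM @Words ORDER BY Word COLLATE Latin1_General_CI_AS;
```

The multi-row VALUES list needs SQL Server 2008 or later; on older versions, use one INSERT per row.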

If you do not specify binary sorting, all SQL Server operations follow the dictionary conventions for sorting and comparing characters. When the dictionary order is used, you can specify whether you want the collation to be sensitive or insensitive to both case and accented characters.

Case-sensitivity applies to SQL identifiers and passwords as well as to data. If you specify a binary or case-sensitive default sort order for an instance of SQL Server or database, all references to objects must use the same case with which they were created. For example, consider this table:

CREATE TABLE MyTable (PrimaryKey int PRIMARY KEY, CharColumn nchar(10));

If the CREATE TABLE statement is executed on an instance of SQL Server or database that has a case-sensitive or binary sort order, all references to the table must use the same case that was specified in the CREATE TABLE statement:

-- Object not found error because case is not correct:
SELECT * FROM MYTABLE;

-- Invalid column name error because case is not correct
-- for the WHERE clause reference to the PrimaryKey column.
SELECT *
FROM MyTable
WHERE PRIMARYKEY = 123;

-- Correct statement:
SELECT CharColumn
FROM MyTable
WHERE PrimaryKey = 123;
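
Whether the identifier rules above apply depends on the collation in effect. As a rough sketch (assuming the MyTable example above exists in the current database), you can check it with the built-in SERVERPROPERTY and DATABASEPROPERTYEX functions and the sys.columns catalog view. The result column aliases are made up for readability.

```sql
-- Sketch only: inspect the collations that govern identifiers and data.
-- A name containing _CS_ or ending in _BIN / _BIN2 means identifiers
-- in that database are matched case-sensitively.
SELECT SERVERPROPERTY('Collation')                AS InstanceCollation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Collation of each character column in the example table
-- (NULL for non-character columns such as PrimaryKey).
SELECT name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID(N'MyTable');
```

Catalog view and column names such as sys.columns and collation_name are written in lowercase here on purpose, since that is how they must be referenced in a case-sensitive database.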