SQL and PL/SQL Programming with Unicode |
SQL is the fundamental language with which all programs and users access data in an Oracle database either directly or indirectly. PL/SQL is a procedural language that combines the data manipulating power of SQL with the data processing power of procedural languages. Both SQL and PL/SQL can be embedded in other programming languages.
Oracle Express provides products such as SQL and PL/SQL for inserting and retrieving Unicode data. Data is transparently converted between the database and client programs, which ensures that client programs are independent of the database character set and national character set. In addition, client programs are sometimes even independent of the character data type, such as NCHAR or CHAR, used in the database.
The PL/SQL and SQL engines process PL/SQL programs and SQL statements on behalf of client-side programs such as server-side PL/SQL stored procedures. They allow PL/SQL programs to declare CHAR, VARCHAR2, NCHAR, and NVARCHAR2 variables and to access SQL CHAR and NCHAR data types in the database.
This topic describes Unicode-related features in SQL and PL/SQL that you can deploy for multilingual applications. It includes the following topics:
|
See Also:
|
Unicode is a universal encoded character set that enables you to store information in any language, using a single character set. Unicode provides a unique code value for every character, regardless of the platform, program, or language.
Unicode has the following advantages:
It simplifies character set conversion and linguistic sort functions.
It improves performance compared with native multibyte character sets.
It supports the Unicode data type based on the Unicode standard.
You can store Unicode characters in an Oracle database in two ways:
You can create a Unicode database that enables you to store UTF-8 encoded characters as SQL CHAR data types.
You can support multilingual data in specific columns by using Unicode data types. You can store Unicode characters into columns of the SQL NCHAR data types regardless of how the database character set has been defined. The NCHAR data type is an exclusively Unicode data type.
There are three SQL NCHAR data types:
When you define a table column or a PL/SQL variable as the NCHAR data type, the length is always specified as the number of characters. For example, the following statement creates a column with a maximum length of 30 characters:
CREATE TABLE table1 (column1 NCHAR(30));
The maximum number of bytes for the column is determined as follows:
For example, if the national character set is UTF8, then the maximum byte length is 30 characters times 3 bytes for each character, or 90 bytes.
The national character set, which is used for all NCHAR data types, is defined when the database is created. The national character set can be either UTF8 or AL16UTF16. The default is AL16UTF16.
The maximum column size allowed is 2000 characters when the national character set is UTF8 and 1000 when it is AL16UTF16. The actual data is subject to the maximum byte limit of 2000. The two size constraints must be satisfied at the same time. In PL/SQL, the maximum length of NCHAR data is 32767 bytes. You can define an NCHAR variable of up to 32767 characters, but the actual data cannot exceed 32767 bytes. If you insert a value that is shorter than the column length, then Oracle pads the value with blanks to whichever length is smaller: maximum character length or maximum byte length.
|
Note: UTF8 may affect performance because it is a variable-width character set. Excessive blank padding ofNCHAR fields decreases performance. Consider using the NVARCHAR data type or changing to the AL16UTF16 character set for the NCHAR data type.
|
The NVARCHAR2 data type specifies a variable length character string that uses the national character set. When you create a table with an NVARCHAR2 column, you specify the maximum number of characters for the column. Lengths for NVARCHAR2 are always in units of characters, just as for NCHAR. Oracle subsequently stores each value in the column exactly as you specify it, if the value does not exceed the column's maximum length. Oracle does not pad the string value to the maximum length.
The maximum column size allowed is 4000 characters when the national character set is UTF8 and 2000 when it is AL16UTF16. The maximum length of an NVARCHAR2 column in bytes is 4000. Both the byte limit and the character limit must be met, so the maximum number of characters that is actually allowed in an NVARCHAR2 column is the number of characters that can be written in 4000 bytes.
In PL/SQL, the maximum length for an NVARCHAR2 variable is 32767 bytes. You can define NVARCHAR2 variables up to 32767 characters, but the actual data cannot exceed 32767 bytes.
The following statement creates a table with one NVARCHAR2 column whose maximum length in characters is 2000 and maximum length in bytes is 4000.
CREATE TABLE table2 (column2 NVARCHAR2(2000));
You can input Unicode string literals in SQL and PL/SQL as follows:
Put a prefix N before a string literal that is enclosed with single quote marks. This explicitly indicates that the following string literal is an NCHAR string literal. For example, N'résumé' is an NCHAR string literal. This has the same limitation as enclosing the string literal with single quote marks, where the data can be lost during the conversion to the server's database character set. To avoid the potential loss of data, you can set the environment variable ORA_NCHAR_LITERAL_REPLACE to true. This will transparently replace the N' internally and preserve the text literal for SQL processing. By default, this environment variable is set to false to maintain backward compatibility.
Use the NCHR(n) SQL function, which returns a unit of character code in the national character set, which is AL16UTF16 or UTF8. The result of concatenating several NCHR(n) functions is NVARCHAR2 data. In this way, you can bypass the client and server character set conversions and create an NVARCHAR2 string directly. For example, NCHR(32) represents a blank character.
Because NCHR(n) is associated with the national character set, portability of the resulting value is limited to applications that run with the same national character set. If this is a concern, then use the UNISTR function to remove portability limitations.
Use the UNISTR('string') SQL function. UNISTR('string') converts a string to the national character set. To ensure portability and to preserve data, include only ASCII characters and Unicode encoding in the following form: \xxxx, where xxxx is the hexadecimal value of a character code value in UTF-16 encoding format. For example, UNISTR('G\0061ry') represents 'Gary'. The ASCII characters are converted to the database character set and then to the national character set. The Unicode encoding is converted directly to the national character set.
The last two methods can be used to encode any Unicode string literals.