Oracle9i Application Server Wireless Edition Configuration Guide

7.1 Overview

This release of Wireless Edition supports single-byte, multi-byte, and fixed-width
encoding schemes, which are based on national, international, and vendor-specific
standards.

If the character set is single-byte, and that character set includes only composite
characters, the number of characters and the number of bytes are the same. If the
character set is multi-byte, there is generally no such correspondence between the
number of characters and the number of bytes. A character can consist of one or
more bytes, depending on the specific multi-byte encoding scheme.

A typical situation in which the two counts differ, even for a single-byte
character set, is when character elements are combined to form a single
character. For example, in the Thai language, up to three separate character
elements can be combined to form one character, so one Thai character requires
up to 3 bytes when TH8TISASCII or another single-byte Thai character set is
used, and up to 9 bytes when the UTF8 character set is used.
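
The Thai byte counts can be reproduced with a short Python sketch. It assumes
that Python's tis-620 codec is a reasonable stand-in for the TH8TISASCII
character set; the sample character (a consonant, a vowel, and a tone mark) is
also chosen here for illustration.

```python
# One displayed Thai character built from three character elements:
# consonant THO THAHAN + vowel SARA II + tone mark MAI EK.
cluster = "\u0e17\u0e35\u0e48"

# Assumption: Python's "tis-620" codec stands in for TH8TISASCII.
tis_bytes = cluster.encode("tis-620")
utf8_bytes = cluster.encode("utf-8")

# Elements, single-byte Thai bytes, UTF-8 bytes.
print(len(cluster), len(tis_bytes), len(utf8_bytes))  # → 3 3 9
```

Each element takes one byte in the single-byte Thai character set and three
bytes in UTF-8, which gives the 3-byte and 9-byte totals.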

7.2 Multi-byte Encoding Schemes

Multi-byte encoding schemes are needed to support the ideographic scripts used in
Asian languages such as Chinese and Japanese, because these languages use
thousands of characters. These schemes use either a fixed number of bytes to
represent a character or a variable number of bytes per character.

7.2.1 Fixed-width Encoding Schemes

In a fixed-width multi-byte encoding scheme, each character is represented by a
fixed number n of bytes, where n is greater than or equal to two.
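
A practical consequence of a fixed width is that the position of any character
can be computed without scanning the data. The sketch below illustrates this in
Python, using UTF-32 (n = 4) purely as an assumed example of a fixed-width
scheme; the helper name nth_char is hypothetical.

```python
# Fixed-width scheme: every character occupies exactly n bytes, so the
# i-th character starts at byte offset i * n. UTF-32 (n = 4) is used
# here only as an illustrative assumption.
WIDTH = 4

def nth_char(data, index, width=WIDTH):
    """Decode the character at position `index` without scanning."""
    start = index * width
    return data[start:start + width].decode("utf-32-be")

encoded = "A日本".encode("utf-32-be")

print(len(encoded))          # → 12 (3 characters * 4 bytes)
print(nth_char(encoded, 1))  # → 日
```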

7.2.2 Variable-width Encoding Schemes

A variable-width encoding scheme uses one or more bytes to represent a single
character. Some multi-byte encoding schemes use certain bits to indicate the
number of bytes that represent a character. For example, if two bytes is the
maximum number of bytes used to represent a character, the most significant bit
of a byte can indicate whether that byte is a single-byte character or the
first byte of a double-byte character. In other schemes, control codes
differentiate single-byte from double-byte characters. Another possibility is
that a shift-out code indicates that the subsequent bytes are double-byte
characters until a shift-in code is encountered.
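
Two of these approaches can be sketched in Python. Both byte layouts are
hypothetical and do not correspond to any specific character set named in this
guide. The function names are invented for illustration, and using the ASCII
SO (0x0E) and SI (0x0F) control codes as the shift codes is an assumption.

```python
# Hypothetical variable-width schemes, for illustration only.

SHIFT_OUT = 0x0E  # assumed shift-out code: double-byte characters follow
SHIFT_IN = 0x0F   # assumed shift-in code: back to single-byte characters

def split_by_high_bit(data):
    """Split bytes into characters using the most significant bit.

    A byte with the high bit set is the first byte of a double-byte
    character; any other byte is a complete single-byte character.
    """
    chars = []
    i = 0
    while i < len(data):
        width = 2 if data[i] & 0x80 else 1
        if i + width > len(data):
            raise ValueError("truncated double-byte character")
        chars.append(data[i:i + width])
        i += width
    return chars

def split_by_shift_codes(data):
    """Split bytes into characters using shift-out and shift-in codes."""
    chars = []
    double = False  # start in single-byte mode
    i = 0
    while i < len(data):
        if data[i] == SHIFT_OUT:
            double = True
            i += 1
        elif data[i] == SHIFT_IN:
            double = False
            i += 1
        else:
            width = 2 if double else 1
            if i + width > len(data):
                raise ValueError("truncated double-byte character")
            chars.append(data[i:i + width])
            i += width
    return chars

print(split_by_high_bit(b"A\xb0\xa1B"))
# → [b'A', b'\xb0\xa1', b'B']
print(split_by_shift_codes(b"A\x0e\x30\x21\x30\x22\x0fB"))
# → [b'A', b'0!', b'0"', b'B']
```

In the high-bit scheme each character can be classified from its first byte
alone, whereas in the shift-code scheme the meaning of a byte depends on the
most recent shift code, so the data must be read from the start.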