10 specifications, 1 metadata text encoding – Sonnox Fraunhofer Codec Toolbox User Manual

10 Specifications

10.1 Metadata Text Encoding

A text encoding defines how individual characters are represented internally as bytes.
The ID3 and iTunes metadata formats strictly define which text encodings are
supported, so that text tags (e.g. Title, Artist) can be read and written by different applications.
This section describes how the Manager handles text encoding for each metadata format it
supports.

There are 3 different types of text encoding defined in the ID3 and iTunes metadata standards:

Latin–1

This is an extension of ASCII. It is very space efficient, storing only 1 byte per character;
however it has very limited character support.

UTF–16

This is a Unicode encoding, introduced in the 1990s to address universal character support.
It is far more comprehensive than Latin–1, but is also less space efficient for Western
text, as it uses at least 2 bytes per character.

UTF–8

This is another encoding of the Unicode character set. It supports the same characters as
UTF–16, but is usually far more space efficient for Western text, because ASCII characters
take only 1 byte. UTF–8 is generally accepted as the de facto text encoding standard, and is
what the Manager uses to store metadata tags internally.
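
The space trade-off between the three encodings can be illustrated with a short Python sketch. The function name `encoded_sizes` is hypothetical and is not part of the Manager. The sketch uses Python's `utf-16-le` codec so that no byte order mark is counted, which is a simplifying assumption.

```python
def encoded_sizes(text):
    """Return the encoded byte length of text per encoding.

    A value of None means the text cannot be represented in that encoding.
    """
    sizes = {}
    # "utf-16-le" omits the byte order mark, so only character bytes are counted.
    for name in ("latin-1", "utf-16-le", "utf-8"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


print(encoded_sizes("Artist"))  # plain ASCII text
print(encoded_sizes("Björk"))   # accented character that Latin-1 can represent
print(encoded_sizes("東京"))     # characters outside Latin-1
```

For text outside the Western character range, UTF–8 is not always the smaller of the two Unicode encodings: the last example takes 6 bytes in UTF–8 but 4 bytes in UTF–16.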

The Manager always writes iTunes metadata text tags using UTF–8 encoding. The handling of ID3
text encoding, however, is more subtle, as it supports Latin–1, UTF–16 and UTF–8 text encoding
formats. It is important to note that UTF–8 support is only available in ID3v2.4.

To ensure maximum space efficiency, text tags in ID3v2.3 and earlier are written with
Latin–1 encoding where possible. If a tag contains any character outside the Latin–1
character set, that tag is written as UTF–16 instead. Text tags in ID3v2.4 files are always
written as UTF–8.
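
The selection rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated in this section, not the Manager's actual implementation. The function name `choose_id3_text_encoding` and the `id3_minor_version` parameter (3 for ID3v2.3, 4 for ID3v2.4) are assumptions made for the example.

```python
def choose_id3_text_encoding(text, id3_minor_version):
    """Pick the encoding for one ID3v2 text tag, following the rule above.

    id3_minor_version is the x in ID3v2.x (hypothetical parameter).
    """
    # ID3v2.4: the encoding is fixed to UTF-8.
    if id3_minor_version >= 4:
        return "utf-8"
    # ID3v2.3 and earlier: prefer Latin-1 for space efficiency.
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character is outside Latin-1, so fall back to UTF-16.
        return "utf-16"
    return "latin-1"


print(choose_id3_text_encoding("Artist", 3))
print(choose_id3_text_encoding("東京", 3))
print(choose_id3_text_encoding("Artist", 4))
```

The decision is made per tag, so a single ID3v2.3 file may contain both Latin–1 and UTF–16 tags.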
