UTF-8


UTF-8 (Unicode Transformation Format - 8-bit)

Encoding

Some of the problems UTF-8 was designed to solve:

  1. English text is mostly ASCII, and in UTF-32 every ASCII character carries three bytes of leading zeros. UTF-8 gets rid of that padding.
  2. Older systems interpret eight zero bits in a row (the NUL character) as the end of a string, so an encoding must not put zero bytes inside ordinary characters.
  3. Be backwards compatible with ASCII.
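
The three points above can be checked with a short Python sketch. The sample string "Hi" and the variable names are illustrative assumptions, not part of the original notes; the codec names are the standard ones built into Python.

```python
# Encode the same ASCII text three ways and compare the bytes.
text = "Hi"  # hypothetical sample, chosen because it is pure ASCII

utf32 = text.encode("utf-32-be")   # big-endian UTF-32, no byte-order mark
utf8 = text.encode("utf-8")
ascii_bytes = text.encode("ascii")

# Point 1: UTF-32 pads each ASCII character with three zero bytes.
print(list(utf32))        # [0, 0, 0, 72, 0, 0, 0, 105]
print(list(utf8))         # [72, 105]

# Point 2: the UTF-8 form contains no zero byte, so NUL-terminated
# string handling does not cut the text short.
print(0 in utf32, 0 in utf8)   # True False

# Point 3: for ASCII text, UTF-8 and ASCII produce identical bytes.
print(utf8 == ascii_bytes)     # True
```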

For others: Two bytes

Three bytes
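
As a rough illustration of the two-byte and three-byte forms, the sketch below prints the bit pattern of each encoded byte. The helper name utf8_bits and the two sample characters (U+00E9 and U+20AC) are assumptions chosen for the example. A two-byte sequence has the shape 110xxxxx 10xxxxxx, and a three-byte sequence has the shape 1110xxxx 10xxxxxx 10xxxxxx.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of ch as a list of 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# U+00E9 (e with acute accent) needs two bytes.
two = utf8_bits("\u00e9")
print(two)     # ['11000011', '10101001']

# U+20AC (euro sign) needs three bytes.
three = utf8_bits("\u20ac")
print(three)   # ['11100010', '10000010', '10101100']

# Every continuation byte starts with the bits 10, and no byte is zero.
```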