
Encoding in 250 words

Computers speak in ones and zeroes (1s and 0s), commonly referred to as binary.

To produce a character like A, 3, or ø, a computer needs to know how to map each of those characters to binary. The same applies when a computer wants to interpret data that is already in binary.

It may interpret the byte 01000001 as A, or perhaps as B, depending on who set up the rules in the first place.

An encoding is basically a set of rules for mapping characters to binary, and vice versa.
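
As a rough sketch of that idea in Python (two of its built-in code pages, Latin-1 and Windows-1251, are picked here purely for illustration), the same byte decodes to different characters depending on which rule set you apply:

    raw = bytes([0xE9])              # one byte: 11101001

    print(raw.decode('latin-1'))     # Western European rules -> 'é'
    print(raw.decode('cp1251'))      # Cyrillic rules         -> 'й'

    # The mapping works both ways: characters to bytes, bytes to characters.
    print('é'.encode('latin-1'))     # b'\xe9'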

Know that a sequence of binary digits can also be written as a decimal number (base 10) or a hexadecimal number (base 16). So keep in mind that the byte 01000001 is equivalent to 65 in decimal and to 0x41 in hex (the 0x prefix indicates a hexadecimal value).
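
A quick way to check those equivalences is with Python's built-in conversion functions:

    print(int('01000001', 2))     # 65        (binary  -> decimal)
    print(hex(65))                # 0x41      (decimal -> hexadecimal)
    print(format(65, '08b'))      # 01000001  (back to the 8-bit binary form)
    print(chr(65))                # A         (the character mapped to 65)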

ASCII

One of the earliest known encodings is ASCII (often loosely referred to as ANSI).

It maps decimal values from 0 to 127 to Western alphabet characters and control codes (tab, escape, backspace, etc.).

Mapping numerical values from 0 to 127 only takes up 7 bits of space. ASCII does not specify values from 128 to 255, so it basically wastes a bit in every byte, and it does not define characters for the rest of the world's writing systems.
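
A small Python sketch of those limits (the example string is just an assumption for illustration):

    print('A'.encode('ascii'))               # b'A' -- the single byte 0x41
    print(max('Hello!'.encode('ascii')))     # 111  -- the highest byte value here, well below 128

    try:
        'ø'.encode('ascii')                  # ø is not one of the 128 ASCII characters
    except UnicodeEncodeError as err:
        print(err)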

Unicode

Unicode isn't exactly an encoding, but it does introduce an interesting idea: code points.

Every character in the world is mapped to a code point of the form U+WXYZ, where W, X, Y, and Z are hexadecimal digits; four of them can hold numerical values from 0 to 65535 (and Unicode actually defines code points beyond that, up to U+10FFFF).
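
In Python, ord() and chr() move between characters and their code points, which makes the U+XXXX notation easy to see:

    print(hex(ord('A')))     # 0x41  -> written as U+0041
    print(hex(ord('ø')))     # 0xf8  -> written as U+00F8
    print('\u00f8')          # ø     -- refer to a character by its code point
    print(chr(0x1F600))      # 😀    -- code points also go beyond U+FFFF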

Now, there are several encodings for mapping these code points to actual binary (a quick comparison in Python follows the list):

  • UTF-7

  • UTF-8

  • UTF-16

  • UTF-32
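
Here is that comparison as a rough sketch; the little-endian variants of UTF-16 and UTF-32 are assumed here just to keep the byte values simple:

    ch = 'ø'                          # code point U+00F8

    print(ch.encode('utf-7'))         # b'+APg-'             -- 7-bit-safe form
    print(ch.encode('utf-8'))         # b'\xc3\xb8'          -- 2 bytes
    print(ch.encode('utf-16-le'))     # b'\xf8\x00'          -- 2 bytes, little-endian
    print(ch.encode('utf-32-le'))     # b'\xf8\x00\x00\x00'  -- 4 bytes, little-endian

Same character, same code point, four different byte sequences, which is exactly why a decoder has to know which encoding produced the bytes.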

