ASCII, Unicode, Charset. English, please?
It is easy to forget that a computer cannot store videos, photos, or even numbers as they are. The only thing it understands is bits. A bit is either 0 or 1, yes or no, true or false. Basically, all it understands is a blip of electricity or no electricity. But we humans don’t understand blips of electricity! Encoding schemes to the rescue: they act as translators between humans and computers.
01100010 01101001 01110100 01110011
b i t s
This translator, or encoding scheme, has decided that the blips 01100010 mean b. The encoding scheme above is named ASCII! ASCII covers 128 code points: 95 printable characters, including A–Z, a–z, the digits 0–9, punctuation, and the space, plus control characters such as tab and backspace. Now, that solves the problem! So we now know that ASCII is the translator which decides the conversion rules between human and computer, and a charset is the set of characters the translator can handle.
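You can watch this translation happen yourself. Here is a quick Python sketch that encodes the word from the example above with ASCII and prints each byte as its 8 bits:

```python
# Encode the word "bits" with ASCII, then show each resulting
# byte as an 8-bit binary string -- the "blips" the computer stores.
text = "bits"
encoded = text.encode("ascii")          # str -> bytes
bit_string = " ".join(f"{byte:08b}" for byte in encoded)

print(bit_string)                # 01100010 01101001 01110100 01110011
print(encoded.decode("ascii"))   # bits  (translating back the other way)
```

Decoding is just the translation run in reverse: the same table turns 01100010 back into b.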
That’s cool. Thanks, ASCII, for your help. But wait, what about other languages like German, French, Marathi, Korean, or Hindi? There is no translator for those, so humans can’t communicate with the computer in these languages. Not fair! So the world ended up with hundreds of incompatible encoding schemes and standards.
Finally, some frustrated engineers decided to clean up this mess and came up with a plan to unify these encoding standards. That plan is named Unicode! It is a large table of code points that map to characters. In layman’s terms, 65 stands for A, 66 for B, and 9000 for something else. Now, how these code points are encoded into bits is a different conversation altogether. To give an overview: one scheme spends a fixed 32 bits on every character and is called UTF-32. But someone asked, why use 32 bits when some characters need just 8? And so the variable-length encodings UTF-16 and UTF-8 were born.
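A small Python sketch makes the fixed-versus-variable-length difference concrete. It takes three characters from very different parts of the Unicode table (the letter A, the euro sign, and the Devanagari letter ह) and counts how many bytes each encoding spends on them:

```python
# Compare how many bytes UTF-8, UTF-16, and UTF-32 spend per character.
# The "-be" variants fix the byte order so no extra marker bytes are added.
for ch in "A€ह":
    code_point = ord(ch)
    print(
        f"U+{code_point:04X} {ch}: "
        f"utf-8={len(ch.encode('utf-8'))} bytes, "
        f"utf-16={len(ch.encode('utf-16-be'))} bytes, "
        f"utf-32={len(ch.encode('utf-32-be'))} bytes"
    )
```

UTF-32 always pays 4 bytes, while UTF-8 pays 1 byte for A and 3 bytes each for € and ह, which is exactly the saving the variable-length schemes were invented for.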
To summarize, a character can be encoded in different bit sequences and any particular bit sequence can represent many different characters, depending on the encoding scheme.
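Both halves of that summary can be demonstrated in a few lines of Python. The same character é becomes different bytes under different encodings, and the same single byte 0xE9 becomes different characters depending on which legacy translator you ask (Latin-1 is a Western European charset, cp1251 a Cyrillic one):

```python
# One character, two different bit sequences:
print("é".encode("latin-1"))   # b'\xe9'        (1 byte)
print("é".encode("utf-8"))     # b'\xc3\xa9'    (2 bytes)

# One bit sequence, two different characters:
data = bytes([0xE9])
print(data.decode("latin-1"))  # é
print(data.decode("cp1251"))   # й
```

This is why a file saved in one encoding and opened in another turns into gibberish: the bits are unchanged, but the translator is wrong.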