Encoding and encryption are both routines performed on data; however,
the end results are quite different. The purpose of encryption is to disguise
the data so that it can’t be read by anyone except the intended recipient.
Encoding, on the other hand, merely converts the data into a more suitable
format. Sometimes these methods are used in conjunction, as we’ll see later in
this article, but developers frequently make the mistake of using encoding
where encryption is needed, which can cause very serious security issues.
While there are many different encoding algorithms, one of the most
widely used in web development is base64. As the name suggests, base64 splits
binary data into 6-bit blocks and maps each block to one of 64 printable
characters. The phrase “hello world!” in base64 encoding appears as
“aGVsbG8gd29ybGQh”, a somewhat random-looking set of characters. However, if
we examine the string in more detail we see right away that only a very limited
set of characters is in use, and applying base64 decoding gives the original
“hello world!” message back.
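As a minimal sketch of that round trip, the snippet below uses Python's standard base64 module. The variable names are only illustrative, and UTF-8 is assumed as the text encoding of the phrase.

```python
import base64

message = "hello world!"

# base64 works on bytes, so the text is turned into bytes first (UTF-8 assumed).
encoded = base64.b64encode(message.encode("utf-8")).decode("ascii")

# Decoding needs nothing but the algorithm itself: no key, no secret.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)  # aGVsbG8gd29ybGQh
print(decoded)  # hello world!
```

Note that nothing in this exchange is secret; anyone who sees the encoded string can run the second step.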
A typical application for encoding is transmitting binary data across
the Internet. If it is not encoded, binary data can become corrupted in
transit, because some systems along the way may interpret certain byte values
differently, for example as control characters. To ensure this doesn’t happen
we can encode the data before sending it, and decode it upon arrival.
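To illustrate, here is a small sketch in Python. The sample bytes are arbitrary values chosen for this example, and the "wire" is only simulated by a variable; the point is that the raw bytes are not valid ASCII text, while their base64 form is.

```python
import base64

# Arbitrary binary sample (hypothetical payload, not from any real file).
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF, 0x0A])

# A text-only system that expects ASCII cannot represent these bytes as-is.
try:
    raw.decode("ascii")
    survives_as_text = True
except UnicodeDecodeError:
    survives_as_text = False

# Encode before sending: the result contains only printable ASCII characters.
wire = base64.b64encode(raw).decode("ascii")

# Decode upon arrival: the original bytes come back unchanged.
received = base64.b64decode(wire)

print(survives_as_text)  # False
print(received == raw)   # True
```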
It’s important to point out that encoding should never be used in place
of encryption. The reason lies in the very nature of encoding, which allows
data to be easily converted from one representation to another. Only the
algorithm is needed; no key is required. To an attacker, it’s like coming
across a front door with a dozen knobs and no lock. The only thing standing
between them and what’s inside is finding the right knob to turn.
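The "dozen knobs" idea can be sketched in a few lines of Python. The helper try_decoders below is hypothetical and written only for this illustration; it simply tries a handful of standard-library decoders on an unknown string and keeps whichever ones accept it.

```python
import base64

def try_decoders(text):
    """Return {name: decoded bytes} for every decoder that accepts the input."""
    decoders = {
        "base64": lambda s: base64.b64decode(s, validate=True),
        "base32": base64.b32decode,
        "hex": bytes.fromhex,
    }
    results = {}
    for name, decode in decoders.items():
        try:
            results[name] = decode(text)
        except ValueError:
            # binascii.Error is a subclass of ValueError, so this also
            # covers rejections from the base64 module.
            continue
    return results

found = try_decoders("aGVsbG8gd29ybGQh")
print(found)  # {'base64': b'hello world!'}
```

A real attacker would try many more formats, but the principle is the same: each attempt costs almost nothing and requires no secret.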
When storing or transmitting sensitive information, encryption should
always be used. As with encoding, there are many different encryption
algorithms (ciphers), perhaps even more than there are encoding algorithms.
It’s worth noting that ciphers can have short life spans, and while the popular
ciphers in use today have withstood rigorous attacks, it’s likely that will not
always be the case.
What makes a good cipher is high entropy in its output. The more random
the encrypted string appears, the more difficult it is to crack. Because of
this, many encrypted strings contain unprintable characters, which can easily
be lost or corrupted when transmitting or reading. To prevent that from
happening we can encode the encrypted string into a readable format before
storing or transmitting it.
Let’s take a look at an example. Our “hello world!” phrase, encrypted
using the AES cipher, looks like “R4OuDkO7P5Z6fLHzpuC8ZQ==” in base64
encoding, or “4783ae0e43bb3f967a7cb1f3a6e0bc65” in hexadecimal (I’ve left out
the raw, unencoded representation here because it contains almost no readable
characters). The only way to convert this data back into its original form is
to decrypt it using the same algorithm and key that were used to encrypt it.
Since the key is kept private, an attacker will have a very difficult time
recovering the plain text even if they know the algorithm used.
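The two representations above are simply different encodings of the same encrypted bytes, which can be checked with Python's standard library. This sketch does not perform any AES encryption itself (the key and mode used for the example are not given here); it only decodes the two strings quoted above and compares the results.

```python
import base64

hex_form = "4783ae0e43bb3f967a7cb1f3a6e0bc65"
b64_form = "R4OuDkO7P5Z6fLHzpuC8ZQ=="

# Undo each encoding to get back the raw encrypted bytes.
from_hex = bytes.fromhex(hex_form)
from_b64 = base64.b64decode(b64_form)

print(from_hex == from_b64)  # True
print(len(from_hex))         # 16 bytes, the size of one AES block
```

Decoding strips away the readable wrapper, but what remains is still ciphertext; without the key it reveals nothing about “hello world!”.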
It’s easy to mistake an encoded string for an encrypted string,
especially if we assume an attacker has no idea what encoding is being used.
For example, if we take our base64 encoded “hello world!” string and reverse it
we get “hQGby92dg8GbsVGa”. An attacker may recognize this as a base64 encoded
string, and it is a valid one. However, simply decoding it without first
reversing the string will result in what appears to be a random collection of
characters. While this may deter some attackers, it’s absolutely no substitute
for encryption.
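The reversal trick is easy to reproduce, and just as easy to undo. The following is a short Python sketch with illustrative variable names:

```python
import base64

encoded = "aGVsbG8gd29ybGQh"

# "Obfuscate" by reversing the base64 string.
obfuscated = encoded[::-1]
print(obfuscated)  # hQGby92dg8GbsVGa

# Decoding the reversed string directly succeeds, but yields meaningless bytes.
garbage = base64.b64decode(obfuscated)

# Reversing it back first recovers the message; no key was ever involved.
recovered = base64.b64decode(obfuscated[::-1])
print(recovered)  # b'hello world!'
```

Undoing the obfuscation takes a single extra step, which is why tricks like this offer no real protection.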