Data encryption has become a sad necessity for responsible data managers. However cryptography is jargon-heavy even by the discouraging standards of the IT world – symmetric and asymmetric cryptosystems, public versus private keys, digital signatures, hash algorithms, RSA, DES, Rijndael, PGP, MD5, SHA-1, https, secure sockets, Camellia, IDEA; what does it all mean? What are the differences? Relative advantages and disadvantages? Hopefully this article will clear some of the fog.

**Substitution ciphers**

Although we tend to use the words ‘code’ and ‘cipher’ interchangeably, technically they're two entirely different things. When you substitute each letter in a message for a different symbol that's a cipher. A code on the other hand means assigning a secret meaning to a word or phrase.

For example, if "The birds are flying south" means "Flee! The police are on to us!" that's a code. But the simple schoolboy “code”, 1 = 'A', 2 = 'B' etc. (invented, legend has it, by Julius Caesar), is a cipher, a substitution cipher in fact. So the ASCII “code” is actually a kind of cipher for example.

Substitution ciphers were good enough for nearly two thousand years but they eventually failed in the face of improving technology. The Enigma cipher used by the German military during World War II is a substitution cipher and it was broken long before the days of computers, both by the Poles and more famously the British. Don’t, however, dismiss substitution ciphers. If you just want to deter prying eyes a substitution cipher using multiple substitutions and several different substitutions schemes offers a reasonable level of encryption for virtually no computational effort. (This is the way Enigma works and after all, it did take Alan Turing to break it).

**Random substitution**

There is a form of substitution cipher that does offer excellent security. By making random substitutions, the patterns that make other substitution ciphers vulnerable are eliminated. The simplest form of random substitution is to break a message into blocks of 4 bytes (4 ASCII characters) and add a pseudo-random number to each block. To recover the plaintext (the input message) subtract the same series of pseudo-random numbers from the ciphertext (the encrypted message). The key is simply the seed value for the pseudo-random number generator.

This form of encryption is widely used (the file encryption options offered by word processors often use this method) as it’s simple, highly reliable and will defeat all but the most determined and skillful attackers.

**Symmetric cryptosystems: DES and AES**

If, however, you need higher levels of security there are plenty of alternatives. Block ciphers break the message into fixed-length blocks, then each block of plaintext is converted into a block of ciphertext using a sequence of arithmetic operations and/or substitutions. The best known of these is DES (the Data Encryption Standard), developed by the National Institute of Standards and Technology in the US.

DES uses 64-bit blocks with a 64-bit key (although only 56 bits are significant; the other 8 are parity bits). The bits within a block are shuffled and XOR’ed with the key in a sequence of 16 substitutions called “rounds”, to create the ciphertext. Applying the same process (with the same key) to the ciphertext restores the original plaintext, so the process is symmetric. It has been very widely used, by the US government and commercial organisations around the world, including many financial institutions. It is easy to code (and there are good public domain implementations) and as it only involves bit-shifting operations combined with a few small look-up tables it doesn’t impose too much of a computational load.

Unfortunately, while it was secure enough in 1976 when it was introduced, advances in computer hardware mean the relatively short key is now vulnerable to a brute-force attack. A decent supercomputer or Beowulf cluster could check all possible DES keys in just a few days and the time can’t be far off when even a desktop PC will be enough to crack any DES-encrypted message.

Consequently the National Institute of Standards and Technology (NIST) no longer recommends DES and has instead proposed a successor, AES (Advanced Encryption Standard, also known as Rijndael – pronounced "rein-dahl" – from the names of its two inventors). AES is similar to DES in principle but uses much longer keys (128, 192 or 256 bits) and is specifically designed to resist the most sophisticated cryptographic attacks, methods such as timing analysis (looking for correlations between a plaintext and the time taken to encrypt it) and power analysis (looking for variations in the processor power requirements for encrypting different plaintexts). It has very low memory requirements so is particularly suited for embedded applications such as smart cards.

DES and AES are by no means the end of the story as far as symmetric encryption systems go. Microsoft uses a proprietary symmetric encryption system for Windows XP key validation and other systems you may encounter include RC4, RC6 and IDEA, while the European Union’s cryptography committee, NESSIE (New European Schemes for Signatures, Integrity and Encryption), recommends the Japanese Camellia cipher as an alternative to AES.

**Asymmetric cryptosystems and RSA**

Symmetric block ciphers such as DES and AES can provide very high levels of security. However they have one obvious weakness, in that both sender and receiver must share the key, yet keep it secret from anyone else. This poses a particular problem for Internet commerce since a secret key would no longer be secret if it were sent over the Internet, and if it was embedded in a browser it could be discovered by reverse-engineering the program. On the other hand without some way to encrypt web traffic, sensitive details such as credit card numbers would be available to anyone with the slightest knowledge of TCP/IP.

The solution lies in an ingenious group of ciphers known as asymmetric or public key/private key systems. In asymmetric systems the key used to encrypt a message is not the same as that used to decrypt it. If a message has been encrypted using one key of a pair it cannot be decrypted even by someone else who has that key (crucially, knowing one key doesn’t provide knowledge of the other). Only the matching key of the pair can be used for decryption.

This seems rather extraordinary – almost magical on the face of it. If you know both the encryption algorithm and the key, how is it possible that the encryption process can’t simply be reversed to recover the original message? Nonetheless, that’s exactly how it is. Given a pair of keys, a message encrypted with one can only be decrypted with the other and vice-versa.

There are a number of asymmetric key systems but the best known and most widely used is RSA, named for its (three) co-inventors. Originally patented, the patent expired in September 2000 and the algorithm is now in the public domain. The Secure Sockets Layer used for secure communications on the Internet uses RSA (the https protocol is simply http over SSL).

Unfortunately, nothing in life is free, and so it is with asymmetric cryptosystems. Since d can be computed from e given p and q, and p and q are the factors of N, they must be chosen so large that N cannot be factorised in any reasonable time. As computer power has grown so too has the ability to compute the factors of very large numbers. Current hardware means key lengths should be 1024 bits for complete security.

Raising such very large numbers to very large exponents is computationally much more demanding than the bit shifting and XOR'ing of symmetric cryptosystems such as DES, so asymmetric encryption is really only practical for short messages. A common workaround when encrypting long messages is to use RSA to encrypt a short preamble containing a DES or AES key selected at random, then send the main body of the message encrypted with that key. A recipient with the corresponding private key can decrypt the preamble and use the key it contains to decipher the rest of the message. Modern web browsers use exactly this method to conduct secure communications.

**Digital signatures**

An increasingly important use for asymmetric encryption is digital signing. A digital signature is the reverse of public key encryption. Just like an ordinary signature it is used to prove the identity of the sender of a message. This can happen in several ways. The simplest is to send a random message as both plaintext and ciphertext. The recipient deciphers the ciphertext version using the published public key and if the two versions match it proves the sender was in possession of the private key.

One drawback of this form of signature is that it only verifies itself, not any message to which it is attached. An alternative form uses a redundancy or hash function to create a message digest from a message in order to verify the source and reliability of the message. Suppose A and B (the Alice and Bob so beloved of cryptographers) want to exchange messages. A encrypts a message using B’s public key and appends the hashed value of the message encrypted with her own private key. On receiving the message B deciphers it using his private key, and also deciphers the accompanying message digest using A’s public key. If it matches the hash value he computes from the message he received, he can conclude that:

(a) The message originated from A (the only person who could have encrypted the digest correctly) and

(b) The message has not been altered in transit.

The RSA algorithm has become the standard for digital signature applications and the company founded by the co-inventors of RSA, RSA Data Security Inc, has developed (and published) a number of improvements on this basic scheme. Digital signatures on the Web are based on RSA, as is the popular PGP – Pretty Good Privacy – package.

Find your next job with techworld jobs