Encryption and hashing for Laravel developers: Part 1 - Symmetric Encryption

December 2nd, 2020

Hey all. Before I dive in, I would like to start by saying I am by no means a cryptography expert and my knowledge of this topic is limited. I'm not presenting myself as highly skilled at cryptography, so please don't treat this article as security advice. It's also important to note that you should never try to build your own encryption or hashing algorithms, but rather use battle-tested ones, because people smarter than you and me have come up with these. Now let's get into today's topic!

You may know that this blog is mostly related to Laravel stuff (for now 😉). My second to last post, however, was 2 years ago. Since then, I've grown as an engineer and developer. I've broadened my knowledge of computer science and programming. That's why I want to dive into some cooler stuff, like queues, Laravel Horizon, encryption and template engines, and try to demystify all of that for the average Laravel developer or anyone who wants to learn more.

Today I'll start with encryption and hashing. I'll be covering what these mean, what the differences are, how secure they are, when to use each, the ideas behind them, maybe a bit of math and the actual implementation in Laravel. I know every single Laravel developer has at least once encountered hashing through the Hash facade or the bcrypt helper, when creating a user and hashing the password. But do you know what that does? How secure is that? Even though encryption and hashing sound very similar (they both hide a message behind some cryptic, weird, coded output), they are not interchangeable, meaning you cannot use hashing in places where you would use encryption. So what are the differences? I'll go over these topics in several articles in this mini-series to help you understand each of them.

Today, I'll start with encryption. We'll describe what encryption is, go over the types of encryption and start with symmetric encryption. In the next article, we'll take a look at asymmetric encryption.

Encryption

Essentially, the idea behind encryption is securely sending a message through an insecure communication channel by applying two cryptographic algorithms: encryption and decryption. The core idea is that we take our message m (called plaintext), use some encryption key e, apply the encryption algorithm c = E(m, e) and get a message c (called ciphertext). We send that ciphertext through the insecure channel to some user. That user can use the decryption key d and the decryption algorithm m = D(c, d) to recover the original plaintext from the ciphertext.

Types of Encryption

If we use the same key for encryption and decryption, e = d = K, we get symmetric encryption. Typical symmetric encryption algorithms are AES and the older DES.

But if we don't use the same key for both encryption and decryption, meaning e != d (we use two different keys, called the public and the private key), this is called asymmetric encryption. Typical asymmetric algorithms are RSA and Diffie-Hellman (the latter is strictly a key-exchange scheme rather than an encryption algorithm).

Both of these have their pros and cons and their use-cases. Let's start with symmetric encryption.

Symmetric Encryption

As I've already mentioned, in symmetric encryption, the exact same key is used to both encrypt and decrypt a message. We use the key to encrypt, send the resulting ciphertext (but never the key) to the receiver, and the receiver uses the same key to decrypt it. To anyone intercepting the message, the ciphertext will look like a bunch of random characters, and they will have no idea what it means.
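To make that round trip a bit more concrete, here's a minimal sketch in plain PHP, assuming the OpenSSL extension is installed. The key, the message and the variable names are made up for illustration, and the `$iv` value is something we'll get to in the CBC section further down.

```php
<?php
// Both sides share this key in advance; it is never sent with the message.
$key = random_bytes(32); // 256-bit key
$iv  = random_bytes(16); // random starting block, explained in the CBC section

$plaintext  = 'Meet me at noon.';
$ciphertext = openssl_encrypt($plaintext, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);

// The receiver uses the very same key to recover the message.
$decrypted = openssl_decrypt($ciphertext, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);

// Somebody with a different key does not get the plaintext back
// (the call usually returns false because the padding check fails).
$wrongKey = openssl_decrypt($ciphertext, 'aes-256-cbc', random_bytes(32), OPENSSL_RAW_DATA, $iv);

echo bin2hex($ciphertext), PHP_EOL;   // looks like random bytes
var_dump($decrypted === $plaintext);  // bool(true)
var_dump($wrongKey === $plaintext);   // bool(false)
```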

Properties

In cryptography, there is a rule known as Kerckhoffs's principle. It dictates that a cryptographic system should be secure even if everything about the system (the algorithm, the ciphertext), except for the key, is publicly known. Meaning, if I retrieve some encrypted data from your database, I cannot decrypt it if I don't have your encryption key. However, it's worth noting that practically no encryption or hashing algorithm is immune to brute force attacks: an attacker can always try every possible key. Well, in theory... If we encrypt using a 16-byte (128-bit) key, and the attacker has 1 billion computers, and each computer can test 1 billion keys per second, it would take the attacker about 10 thousand billion years to try them all. Yeah... now imagine even larger keys.
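If you want to check that number yourself, the arithmetic is short. This is just a back-of-the-envelope sketch using the figures from the paragraph above (the variable names are mine), and it assumes the attacker has to try every single key:

```php
<?php
$keys          = 2 ** 128;              // number of possible 16-byte keys (PHP stores this as a float)
$keysPerSecond = 1e9 * 1e9;             // 1 billion computers, 1 billion keys per second each
$secondsInYear = 365.25 * 24 * 60 * 60;

$years = $keys / $keysPerSecond / $secondsInYear;

printf("%.2e years to try every key\n", $years);
```

The result lands at roughly 10^13 years, which is the "10 thousand billion" from above. On average the attacker would hit the right key after trying half of them, but that doesn't really change the picture.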

There are 2 properties of symmetric cryptography systems:

Diffusion: if we change just one bit in the original message (plaintext), or just one bit in the key, the ciphertext should change a lot (statistically, about 50% of the ciphertext bits should be different)
Confusion: every bit of the ciphertext should depend on more than one bit of the key, but the connection between them should be hidden

By following these properties, we can see how a properly designed encryption algorithm can be secure. Changing just one bit in the key or the plaintext will affect a lot of bits in the ciphertext.
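You can actually watch diffusion happen. The sketch below is my own illustration, assuming the OpenSSL extension: it encrypts one 16-byte block with AES (the algorithm we cover in the next section), flips a single plaintext bit and counts how many ciphertext bits changed. The `bitDifference` helper is a made-up name, and ECB mode with padding disabled is used only to isolate a single block.

```php
<?php
// Count how many bits differ between two equal-length binary strings.
function bitDifference(string $a, string $b): int
{
    $diff = 0;
    foreach (str_split($a ^ $b) as $byte) {
        $diff += substr_count(decbin(ord($byte)), '1');
    }

    return $diff;
}

$key     = random_bytes(16);
$block   = 'sixteen byte msg';            // exactly one 16-byte AES block
$tweaked = $block;
$tweaked[0] = chr(ord($tweaked[0]) ^ 1);  // flip a single bit of the plaintext

// Raw single-block AES: ECB with padding disabled, only to look at one block in isolation.
$flags = OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING;
$c1 = openssl_encrypt($block, 'aes-128-ecb', $key, $flags);
$c2 = openssl_encrypt($tweaked, 'aes-128-ecb', $key, $flags);

echo bitDifference($block, $tweaked), " plaintext bit changed\n";
echo bitDifference($c1, $c2), " of 128 ciphertext bits changed\n";
```

The exact count depends on the random key, but it should hover around 64, which is half of the 128 bits in the block.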

AES

Laravel uses the AES-256-CBC cipher by default (which can be seen in config/app.php). What does that mean? Well, the encryption algorithm is AES, 256 is the size of the key in bits (32 bytes) and CBC is the mode (Cipher Block Chaining). We'll cover what each of these means. Since AES is very widely used, we'll tackle that first. It's worth noting that this notation, AES-256-CBC, is not something Laravel has come up with; it's the standard notation.

AES (Advanced Encryption Standard) is a specification for encryption. It's a so-called block cipher, meaning the plaintext message is split into fixed-size blocks of data and the cipher operates on one block at a time. AES groups the plaintext into blocks of 16 bytes (128 bits). So, if you have a string of 64 single-byte characters, when encrypting, that string will be split into 4 parts of 16 characters. Each block is converted to a 4x4 matrix of bytes that looks kinda like this:

[Image: Matrix]

AES then runs a bunch of operations on every block and repeats them multiple times. Those repetitions are called iterations, or rounds. There are a lot of rounds for every block (for example, 14 rounds for 32-byte keys). The general overview of AES is this:

[Image: AES]

In AES, you plug in some plaintext value, it runs a bunch of operations, and spits out ciphertext after a specific number of rounds. To decrypt the data, AES simply runs the inverse of these operations in reverse order. This is weird, right? At least for me it was... Even if you know the exact operations that will run, and the exact values that will be substituted, you still can't decrypt the ciphertext unless you have the key. As you can see in the image, the essential actions that AES uses (in sequence) are (images taken from Wikipedia):

SubBytes: every byte in the matrix is substituted with a different value. This value is determined using a predefined lookup table S (the S-box)

ShiftRows: every row in the matrix (except the first) is shifted to the left. The second row is shifted by one position, the third by two, etc

MixColumns: this operation takes a column, multiplies it by a predefined matrix, then replaces the entire column with the values that have been calculated

AddRoundKey: every byte in the matrix is combined with the matching byte of the round key (derived from the encryption key) using the bitwise XOR operation, producing a new 4x4 matrix

[Image: AddRoundKey]

These steps are repeated multiple times until a very confusing block is generated. There are some other important operations (key expansion, etc.), but we'll not cover these in this article. As you can see, we never use any random operations to manipulate the block, but rather a set of very well-defined tables and values. Are there any speed downsides to using encryption? Well, most modern CPUs (Intel and AMD via the AES-NI instructions, and many ARM chips too) have AES support built directly into the hardware. That means AES is very fast on most machines, since the CPU has first-party support for it.

It's very interesting to me that, using simple mathematical and logical operations like bit shifting, bitwise XOR and substitution, we can generate a very confusing ciphertext that's practically impossible to turn back into the original value without the key.
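To show just how simple these building blocks are, here's a rough sketch of two of the four steps, ShiftRows and AddRoundKey, in plain PHP. This is only an illustration of the idea and not a real AES implementation: the function names are mine, the state is stored as a plain array of rows, and the round key is a made-up value rather than one produced by AES key expansion.

```php
<?php
// ShiftRows: row 0 stays put, row r is rotated r positions to the left.
function shiftRows(array $state): array
{
    foreach ($state as $r => $row) {
        $state[$r] = array_merge(array_slice($row, $r), array_slice($row, 0, $r));
    }

    return $state;
}

// AddRoundKey: XOR every byte of the state with the matching byte of the round key.
function addRoundKey(array $state, array $roundKey): array
{
    foreach ($state as $r => $row) {
        foreach ($row as $c => $byte) {
            $state[$r][$c] = $byte ^ $roundKey[$r][$c];
        }
    }

    return $state;
}

$state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
];
$roundKey = array_fill(0, 4, array_fill(0, 4, 0xFF)); // made-up key, not a real AES round key

$shifted = shiftRows($state);
// Row 1 is now [0x11, 0x12, 0x13, 0x10], row 2 is [0x22, 0x23, 0x20, 0x21].

$mixed = addRoundKey($shifted, $roundKey);

// XOR is its own inverse: applying the same round key again restores the input.
$restored = addRoundKey($mixed, $roundKey);
var_dump($restored === $shifted); // bool(true)
```

That last line is the reason decryption is possible at all: every step can be undone, but only if you know the key that was mixed in.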

This is sort of a high-level overview of how AES works. I won't get too much into the math behind it as it's very complex, but it's important to know the basics. How do we then actually securely encrypt a message of any length? AES is used in so-called modes of operation, and one of them is CBC. There are others, such as ECB (Electronic Codebook), which encrypts each block independently and then simply concatenates them, but it's not really recommended to use ECB, since identical plaintext blocks produce identical ciphertext blocks.

CBC (Cipher Block Chaining) mode

In CBC mode, each plaintext block is XORed with the previous ciphertext block before it is encrypted. This means that every ciphertext block depends on every single plaintext block that has been encrypted before the current one. When encrypting the first block, since we have no previous ciphertext to XOR with, we XOR with a random sequence of bytes. Other than the key, this is the first random value we've encountered. This random sequence of bytes is called an IV, or initialization vector. For AES, the IV is 16 random bytes. The overview of how CBC mode works can be seen in the image below (decryption is the same process in reverse):
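The chaining is easier to see in code. Below is a sketch that does CBC by hand in PHP, assuming the OpenSSL extension: it encrypts single blocks with raw AES (ECB, padding disabled) and does the XOR chaining itself, then compares the result with OpenSSL's built-in `aes-256-cbc`. The `cbcEncrypt` function is a hypothetical helper of mine, written for learning only; in real code you'd always use the built-in mode.

```php
<?php
function cbcEncrypt(string $plaintext, string $key, string $iv): string
{
    // PKCS#7 padding (what OpenSSL applies by default): fill up to a multiple of 16 bytes.
    $pad = 16 - (strlen($plaintext) % 16);
    $plaintext .= str_repeat(chr($pad), $pad);

    $previous   = $iv; // the first block is XORed with the IV
    $ciphertext = '';

    foreach (str_split($plaintext, 16) as $block) {
        // XOR with the previous ciphertext block, then encrypt that single block.
        $previous = openssl_encrypt(
            $block ^ $previous,
            'aes-256-ecb',
            $key,
            OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING
        );
        $ciphertext .= $previous;
    }

    return $ciphertext;
}

$key = random_bytes(32);
$iv  = random_bytes(16);
$message = 'A message that is longer than a single sixteen byte block.';

$manual  = cbcEncrypt($message, $key, $iv);
$builtin = openssl_encrypt($message, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);

var_dump($manual === $builtin); // bool(true)
```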

Cool, now we know how to encrypt an entire message to get a ciphertext. Now what? Is it done? Well, not yet. Let's say somebody sends you an encrypted message and you actually own the same secret key. Can you decrypt the message? First of all, the CBC mode uses a random sequence of bytes, the IV. You don't have that IV block. This is our first issue: the sender needs to send us the IV as well as the ciphertext. Now imagine that you also know the exact IV. Can you decrypt the message now? Yes, but... can you really be sure that the message wasn't tampered with by a malicious attacker? That this is the actual message that the sender has sent to you? No, you can't be sure.

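That's why we use something called message authentication to verify the authenticity of the message. Essentially, there's a code, called a MAC (Message Authentication Code), sent along with the ciphertext and the IV. It works a bit like a signature, except that it's created and checked with the same shared secret key (real digital signatures use asymmetric keys). This code can be used to verify that the message is authentic.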

Verification

The MAC can be created using different methods, for example using hash functions. Essentially, a hash function is a one-way function (very easy to compute in one direction and very hard to reverse) that creates a message digest (hash) of a message. It cannot feasibly be computed in the reverse direction, meaning that given a message digest, you cannot find the original message. The output of a hash function looks very random, like complete gibberish, but it's produced by very well-tested, deterministic operations.

Because creating a hash is not random, you can actually verify that two hashes are equal, since every message you hash produces the same digest each time you run the hash function. To see this for yourself, check out the code snippet below (try it yourself, you'll get the same output). Why is this different for the Laravel password hasher? We'll cover that when we tackle hashing later in this series!

$message = 'This is my secret message.';

>>> hash('sha256', $message)
=> "20bee756d7b4d0a39a665b7b22b908779477b35ec63190914c8f70f08dd06387"

>>> hash('sha256', $message)
=> "20bee756d7b4d0a39a665b7b22b908779477b35ec63190914c8f70f08dd06387"

When leveraging hashing for creating a MAC, we need a keyed hash function. That's a function that receives the secret key as one of its inputs, so the digest depends on that key. This construction is called HMAC, and PHP has a native implementation of it.

For the HMAC input, we can concatenate the IV with the ciphertext, create a hash of that value, then pass that hash along with the ciphertext and the IV. When decrypting, we can apply the same hash function to the IV and the ciphertext (using the same secret key) and ensure the 2 hashes are equal. If the 2 hashes are the same, we can be pretty sure the message wasn't tampered with. How would the attacker be able to create the same hash for a modified message? They would need the secret key, and that's the one thing they don't have. Pretty hard to crack, right?
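Here's a small sketch of that check in plain PHP, using the native `hash_hmac` and `hash_equals` functions. The "ciphertext" is just random bytes standing in for a real one, and the variable names are made up for the example.

```php
<?php
$key        = random_bytes(32);
$iv         = base64_encode(random_bytes(16));
$ciphertext = base64_encode(random_bytes(32)); // stand-in for a real ciphertext

// Sender: compute the MAC over IV + ciphertext using the secret key.
$mac = hash_hmac('sha256', $iv.$ciphertext, $key);

// Receiver: recompute the MAC with the same key and compare.
$untouched = hash_equals(hash_hmac('sha256', $iv.$ciphertext, $key), $mac);

// Attacker: changes one character of the ciphertext, but cannot recompute the MAC without the key.
$tampered    = $ciphertext;
$tampered[0] = $tampered[0] === 'A' ? 'B' : 'A';
$forged = hash_equals(hash_hmac('sha256', $iv.$tampered, $key), $mac);

var_dump($untouched); // bool(true)
var_dump($forged);    // bool(false)
```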

Implementation in Laravel

Now that we've covered the basics (yes, those are just the basics) of encryption, let's dive into Laravel to see how encryption is implemented there. As we've noted, we need to somehow pass the IV and the MAC alongside the ciphertext. Laravel's symmetric encryption system is located in the Illuminate\Encryption\Encrypter class. You can also use the encrypt and decrypt helper functions. They just call the Encrypter class' corresponding methods. If we dive into the class, first of all we can see the encryption key and the cipher in the class constructor. Laravel, by default, passes these values from the config file (the Encrypter is bound in the EncryptionServiceProvider). These values are config('app.key') and config('app.cipher'). The application key is loaded from the APP_KEY environment variable. Yes, this is the point of the magic APP_KEY variable that every single one of your applications needs.

Key and key generation

I'm sure that your APP_KEY looks something like this: APP_KEY=base64:random-gibberish. We know the encryption key needs to be a random sequence of bytes, to make it very hard to guess, and since raw bytes are hard to represent as text, we can use base64 encoding to convert these bytes to a block of text. Laravel prefixes the encoded key with base64: just so it knows how to parse it. This can be seen in the parseKey method in the EncryptionServiceProvider. When parsing the key, Laravel takes everything after the prefix and decodes that back to a sequence of bytes.
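The parsing idea fits in a few lines. What follows is my own plain-PHP sketch of it, not Laravel's actual code (the real parseKey lives in the service provider and uses Laravel's string helpers); the `parseAppKey` name is hypothetical.

```php
<?php
// Hypothetical helper that mirrors the idea; not Laravel's actual implementation.
function parseAppKey(string $appKey): string
{
    $prefix = 'base64:';

    if (strncmp($appKey, $prefix, strlen($prefix)) === 0) {
        // Strip the prefix and turn the text back into raw bytes.
        return base64_decode(substr($appKey, strlen($prefix)));
    }

    return $appKey; // no prefix: use the value as-is
}

$rawKey = random_bytes(32);
$appKey = 'base64:'.base64_encode($rawKey); // the kind of value that ends up in .env

$parsed = parseAppKey($appKey);

var_dump(strlen($parsed));     // int(32)
var_dump($parsed === $rawKey); // bool(true)
```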

The way key generation works (php artisan key:generate) is that we generate an encryption key of the size that the cipher uses:

$key = base64_encode(
    Encrypter::generateKey(config('app.cipher'))
);

// write this to .env
"APP_KEY=base64:".$key

Do you remember how big the key should be for AES-256-CBC? We use a key with a size of 32 bytes (256 bits). For AES-128-CBC, we use 16 bytes. Now let's dig into the Encrypter's generateKey method.

public static function generateKey($cipher)
{
    return random_bytes($cipher === 'AES-128-CBC' ? 16 : 32);
}

Well, who would have thought? Laravel simply uses PHP's native random_bytes function to create a cryptographically secure random sequence of bytes, and the size is 32 for AES-256.

The Encryption Implementation

Now that we know what is the point of the APP_KEY, and we have generated that secret key that we'll use for encryption, we can take a look at how the encryption is implemented in Laravel. This is done via the encrypt($value, $serialize) method on the Encrypter class. Second parameter, $serialize, simply determines whether the value should be serialized before encryption. Doesn't matter right now.

You remember the CBC steps, right? Create a random IV, encrypt the plaintext (CBC mode) using that IV and the encryption key (APP_KEY), sign using MAC (combination of IV and ciphertext), then somehow represent all of these values in a single value.

If we dig into the codebase, the first thing we see is this:

$iv = random_bytes(openssl_cipher_iv_length($this->cipher));

This lets me know that Laravel is using the OpenSSL implementation, which is the gold standard for encryption implementations. Cool. We see the openssl_cipher_iv_length function is called. This function just spits out the length of the initialization vector (IV) for the cipher. For AES-256-CBC, how big is this vector? You're right, it's 16 bytes (the same size as the AES block). So, Laravel just generates 16 random bytes and stores them in the $iv variable.
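You can confirm the IV length yourself with a couple of lines, assuming the OpenSSL extension is installed. Note that the IV size follows the block size and not the key size, so both ciphers report the same number:

```php
<?php
foreach (['aes-128-cbc', 'aes-256-cbc'] as $cipher) {
    $length = openssl_cipher_iv_length($cipher); // 16 for both
    $iv     = random_bytes($length);

    echo $cipher, ': IV of ', strlen($iv), ' bytes', PHP_EOL;
}
```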

The next step is....? You guessed, encrypting the plaintext!

$value = openssl_encrypt($value, $this->cipher, $this->key, 0, $iv);

if ($value === false) {
    throw new EncryptException('Could not encrypt the data.');
}

Again, we use OpenSSL's function to encrypt the data. This function should receive the following (try to guess yourself which values it needs):

plaintext ($value)
cipher (AES-256-CBC) ($this->cipher)
encryption key ($this->key)
the IV ($iv)

This method performs encryption and spits out the ciphertext based on all of these values. So you can see, it's not that we're encrypting the data ourselves, we ALWAYS use battle-tested solutions. This method also returns false if encryption could not be completed, so Laravel checks for that.

Now that we have created the ciphertext, we need to sign it with MAC, right? We already know that it will combine the IV and the ciphertext and hash it.

$mac = $this->hash($iv = base64_encode($iv), $value);

protected function hash($iv, $value)
{
    return hash_hmac('sha256', $iv.$value, $this->key);
}

That's exactly what it does. We base64 encode the IV (as it's just 16 random bytes), concatenate the IV and the ciphertext, then hash the result. In this case, Laravel uses keyed SHA-256 hashing via PHP's hash_hmac function. This function differs from the plain hash function in that it takes a secret key and creates the hash using HMAC, which we already know about.

When the MAC is created, we're done. We just need to reduce all of this to a single string that can be sent through the insecure channel. The way Laravel does this is by creating a JSON from the IV, the ciphertext and the MAC.

$json = json_encode(compact('iv', 'value', 'mac'), JSON_UNESCAPED_SLASHES);

This JSON value is then base64 encoded and returned from the method. Yep! This means that the huge ugly value you get from the encrypter is just a base64 encoded JSON string.

>>> base64_decode(encrypt('hello world'))
=> "{"iv":"NrhWvCbY77YobjIdrorgkQ==","value":"soXgUh6OXWsHpkKh4kfNYT3Hl62T7n5DfQ4GMljMEaY=","mac":"06a409f87485f6f3a1b80e307942b9e2a23ab9ddbff19c6ccc4600f47ea5b5b3"}"
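If you'd like to play with these steps outside of Laravel, here's a standalone sketch that strings them together in plain PHP. To be clear, `encryptSketch` is a hypothetical function of mine that follows the steps described above; it is not the Encrypter class itself, and it skips the serialization step that Laravel's encrypt performs by default.

```php
<?php
// Hypothetical standalone function following the steps described above.
function encryptSketch(string $plaintext, string $key, string $cipher = 'aes-256-cbc'): string
{
    // 1. Random IV of the size the cipher expects.
    $iv = random_bytes(openssl_cipher_iv_length($cipher));

    // 2. Encrypt. With options = 0, openssl_encrypt returns base64 encoded ciphertext.
    $value = openssl_encrypt($plaintext, $cipher, $key, 0, $iv);
    if ($value === false) {
        throw new RuntimeException('Could not encrypt the data.');
    }

    // 3. MAC over the base64 encoded IV plus the ciphertext.
    $iv  = base64_encode($iv);
    $mac = hash_hmac('sha256', $iv.$value, $key);

    // 4. Pack everything into a single base64 encoded JSON string.
    return base64_encode(json_encode(compact('iv', 'value', 'mac'), JSON_UNESCAPED_SLASHES));
}

$payload = json_decode(base64_decode(encryptSketch('hello world', random_bytes(32))), true);

print_r(array_keys($payload)); // iv, value, mac
```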

We got everything we need, cool! Now, since symmetric encryption uses a single key, we can pass that same value to the decrypt($value) method in the Encrypter and take a look at how this works!

The Decryption Implementation

We learned that encrypting a value gives us a base64 encoded JSON string. Let's assume this is the same thing passed to the decrypt method. Can you try to guess how this method will look? I'll give you a minute... Well, we first decode and unpack the JSON. Let's try this ourselves.

$iv = "NrhWvCbY77YobjIdrorgkQ==";
$value = "soXgUh6OXWsHpkKh4kfNYT3Hl62T7n5DfQ4GMljMEaY=";
$mac = "06a409f87485f6f3a1b80e307942b9e2a23ab9ddbff19c6ccc4600f47ea5b5b3";
$payload = [
    'iv' => $iv,
    'mac' => $mac,
    'value' => $value,
];

$coded = base64_encode(json_encode($payload));
// eyJpdiI6Ik5yaFd2Q2JZNzdZb2JqSWRyb3Jna1E9PSIsIm1hYyI6IjA2YTQwOWY4NzQ4NWY2ZjNhMWI4MGUzMDc5NDJiOWUyYTIzYWI5ZGRiZmYxOWM2Y2NjNDYwMGY0N2VhNWI1YjMiLCJ2YWx1ZSI6InNvWGdVaDZPWFdzSHBrS2g0a2ZOWVQzSGw2MlQ3bjVEZlE0R01sak1FYVk9In0=

echo decrypt($coded); // prints "hello world" (only with the same APP_KEY that produced this payload)

Once unpacked, we can verify the message by computing the MAC ourselves and comparing it with the MAC sent in the payload. Luckily, PHP offers the hash_equals function to compare 2 hash values in a way that is safe against timing attacks.

If we dig into the decrypt method, we see that its first line is the following:

$payload = $this->getJsonPayload($payload);

The getJsonPayload method decodes the base64 message, decodes the JSON into an array and validates that everything we need for decryption is in the array: the IV, the value and the MAC. It also verifies that the IV is of the proper size (16 bytes). However, this method also has something fun:

if (! $this->validMac($payload)) {
    throw new DecryptException('The MAC is invalid.');
}

Before you dig into the validMac method, try to guess how this will look? We need to compare 2 hashes, the one from the payload and the one we computed ourselves. Remember, that same class has a hash method that concatenates the IV and the ciphertext.

protected function validMac(array $payload)
{
   return hash_equals(
       $this->hash($payload['iv'], $payload['value']),
       $payload['mac']
   );
}

Hooray! We know exactly what every method does and how it does it! Once the message is verified, we know it hasn't been tampered with. We can just decrypt using our encryption key and the IV, then simply return the original plaintext by using the same OpenSSL implementation.

// The IV travels base64 encoded in the payload, so decode it back to raw bytes first.
$iv = base64_decode($payload['iv']);

$decrypted = openssl_decrypt(
   $payload['value'], $this->cipher, $this->key, 0, $iv
);

return $decrypted;
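And here's the decryption side as a standalone sketch in plain PHP. Again, `decryptSketch` is a hypothetical function that follows the steps above rather than Laravel's real code: it assumes a well-formed payload, skips the payload validation, and doesn't unserialize anything. The payload is built by hand at the bottom so the example runs on its own.

```php
<?php
// Hypothetical standalone function; it assumes the payload is well-formed.
function decryptSketch(string $encoded, string $key, string $cipher = 'aes-256-cbc'): string
{
    $payload = json_decode(base64_decode($encoded), true);

    // Verify first, decrypt second.
    $expected = hash_hmac('sha256', $payload['iv'].$payload['value'], $key);
    if (! hash_equals($expected, $payload['mac'])) {
        throw new RuntimeException('The MAC is invalid.');
    }

    $decrypted = openssl_decrypt($payload['value'], $cipher, $key, 0, base64_decode($payload['iv']));
    if ($decrypted === false) {
        throw new RuntimeException('Could not decrypt the data.');
    }

    return $decrypted;
}

// Build a payload by hand so the example is self-contained.
$key   = random_bytes(32);
$rawIv = random_bytes(16);
$value = openssl_encrypt('hello world', 'aes-256-cbc', $key, 0, $rawIv);
$iv    = base64_encode($rawIv);
$mac   = hash_hmac('sha256', $iv.$value, $key);
$encoded = base64_encode(json_encode(compact('iv', 'value', 'mac')));

echo decryptSketch($encoded, $key), PHP_EOL; // hello world

// Replacing the MAC makes verification fail before any decryption happens.
$bad = base64_encode(json_encode(['iv' => $iv, 'value' => $value, 'mac' => str_repeat('0', 64)]));
try {
    decryptSketch($bad, $key);
    $rejected = false;
} catch (RuntimeException $e) {
    $rejected = true;
}

var_dump($rejected); // bool(true)
```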

Summary

This is pretty much what I wanted to cover on symmetric encryption. As you may remember, there are asymmetric algorithms that use different keys (a public and a private one) for encryption and decryption. We'll tackle these in the next article in this series.

Now that you know that encryption is very secure and very fast, there's little reason not to use it for sensitive data. The way I use it is that I encrypt the data before storing it in the database and decrypt it after retrieval from the database (or automatically by using the encrypted Eloquent cast). This is important for sensitive data like names, phone numbers, addresses, API keys, SSH keys stored in the database, sometimes even passwords for third-party services that you need to read back later (your users' own login passwords should be hashed, not encrypted).

For example, think of Laravel Forge. Forge needs to store your server's SSH keys in the database, your database password so it can inject it into your site's .env file, and your DigitalOcean tokens so it can create new servers for you. Hashing these values is not a solution for these problems, as the original values cannot be retrieved later. If Forge hashed your DigitalOcean token, it would not be able to connect to the DigitalOcean API later. That's why it should encrypt the data, and decrypt it later when it needs it. If someone hacked Forge and accessed its database, they would not be able to decrypt the sensitive information without also having its APP_KEY. This is also the downside of symmetric encryption: if somebody gains access to your encryption key, your data is compromised.

Thank you for reading! If you have any questions, feel free to DM me on Twitter, @jcrnkovic95! Again, this article covers a very simplified idea of encryption and AES and covers only the basics. Keep an eye out for the next article on asymmetric encryption and see how you actually use encryption, both symmetric and asymmetric, every single day. It won't cover too much Laravel, but it will cover SSL certificates, your Forge servers and more.