Since you are using a web browser to read this, you are probably at least aware of Transport Layer Security (TLS), which is used in wire protocols such as HTTPS. In ScyllaDB, TLS is used to secure network connections between database nodes and/or client endpoints. Encrypting network traffic, also known as data in transit encryption, is essential both to protect sensitive data and to ensure that traffic actually originates from where you think it does.
ScyllaDB already supports data in transit encryption, both node-to-node (intra-cluster) and client-to-node.
But securing data in transit is sometimes not enough. What if your servers themselves are compromised? This is where data at rest encryption comes into play. Data at rest encryption secures the information persisted in a computer, such as on an SSD or HDD volume.
When you deal with sensitive data or multi-tenant deployments where you require isolation between clients (sets of keyspaces/tables), having data in clear text on disk can be a security problem. In a physical deployment, you have the risk of the actual disks being stolen or misplaced, which is why most organizations have strict routines for disposing of discarded storage media.
To alleviate the issue you can use whole-disk encryption solutions like LUKS, BitLocker or AWS disk encryption, but these cannot encrypt different tables with different keys.
To solve this, ScyllaDB now supports per-table and per-node transparent data at rest encryption.
Encryption at rest for ScyllaDB Enterprise, available starting with release 2019.1.1, protects data in its persisted state on disk, such as SSTables and commit logs. You can configure any user table, as well as the parts of the system storage that include client data, to use any symmetric key algorithm, key width and block mode supported by OpenSSL in a file block encryption scheme.
In this post, we explain some of the key elements and workflows involved in using transparent data encryption in ScyllaDB.
Why not just rely on my cloud vendor for data-at-rest encryption?
While AWS provides on-disk encryption, there are also some limitations to it. First, you have no direct access to the keys used to encrypt the data. In case you need to do forensics, your data on disk is encrypted even from your own view. Also, by doing your own encryption, you can use the same keys across all nodes and backups. You can also use the ScyllaDB-native implementation to manage multi-tenant data access, where different groups may have access to different parts of your data.
In summary:
- you gain control over the encryption keys
- you can encrypt each table with a different key
- data is still transparently encrypted if it gets moved to a different volume, for instance in case of a backup
Plus, of course, relying on a cloud vendor to do disk encryption isn’t even an option for your own on-premises deployments.
Encryption keys
Encryption keys are used for encrypting system data, such as commit logs, hints, user tables and/or other user table keys. (To be clear, “encryption keys” are external to the database, and refer to cryptographic keys. They are not related to the term “key” as used within the database such as primary, partition, or clustering keys.)
An encryption key is stored either as the contents of a local file holding a single pre-generated key or, in future versions of ScyllaDB Enterprise, as a named key on a separate (third-party) Key Management Interoperability Protocol (KMIP) server.
Encryption keys are identified by name. For local file keys a file with the same name as the key is expected to exist, with the appropriate access rights, in the directory designated by the scylla.yaml configuration option:
system_key_directory:
Transparent user data encryption
Transparent user data encryption enables encrypted storage of persisted SSTable data. Tables are configured for encryption by setting CQL properties on the table while creating or updating the table.
CREATE TABLE <keyspace>.<table> (......) WITH
  scylla_encryption_options = {
    'cipher_algorithm' : '<alg/mode/padding>',
    'secret_key_strength' : <128/256…>,
    'key_provider' : '<provider>',
    [...]
  }
;
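To make the cipher_algorithm and secret_key_strength options more concrete, the sketch below shows what a combination of symmetric algorithm, key width and block mode means in practice. It uses the openssl command line tool rather than ScyllaDB itself and assumes AES with a 128-bit key in CBC mode. The key, the IV and the function names are made up for illustration, and this is not a description of how ScyllaDB lays out encrypted file blocks internally.

```shell
#!/bin/sh
# Illustration only: AES (algorithm), 128 bits (key width), CBC (block mode).
# KEY and IV are throwaway hex values; never hard-code real keys like this.
KEY=00112233445566778899aabbccddeeff   # 16 bytes = 128 bits
IV=0102030405060708090a0b0c0d0e0f10    # 16 bytes = one AES block

# Encrypt file $1 into file $2 with the symmetric key above.
encrypt_file() {
    openssl enc -aes-128-cbc -K "$KEY" -iv "$IV" -in "$1" -out "$2"
}

# Decrypt file $1 into file $2; the same key and IV must be supplied.
decrypt_file() {
    openssl enc -d -aes-128-cbc -K "$KEY" -iv "$IV" -in "$1" -out "$2"
}
```

Running encrypt_file plain.txt cipher.bin followed by decrypt_file cipher.bin roundtrip.txt should return the original bytes, while cipher.bin on its own is opaque without the key.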
The default key provider is the local file key storage provider. With no extra options, ScyllaDB will read or create a local key file in the key directory configured in scylla.yaml. ScyllaDB supports three key storage providers: local (available at initial release of this feature), plus replicated and KMIP key storage (in future versions of ScyllaDB Enterprise).
Note that since data insertion in ScyllaDB typically passes through the commit log and/or batch/hints log before being fully committed to a final SSTable on disk, you should also configure system-level encryption to be sure all sensitive data is protected. System-level encryption is covered later in this post and in the ScyllaDB documentation.
The keys used can be either pre-generated or created on demand on first write to disk. File based key management will rely on the same key being available on disk in the named file whereas, in a future version, providers such as KMIP or the replicated key provider use an id-based key lookup to name and later retrieve the keys used by any given disk file.
Local key storage
This initial implementation, available beginning in ScyllaDB Enterprise Release 2019.1.1, is a simple file-based key storage scheme where each key is kept in clear-text files on disk, in a key storage directory configured in scylla.yaml.
This is the default key storage manager in ScyllaDB, since in its simplest usage it requires very little extra configuration. However, take care that no outside party can easily read the key data from the file system. In particular, restrict the permissions on the key directory.
You could also consider keeping the key directory on a network drive (using TLS for the file sharing) to avoid having keys and data on the same storage media, should it become stolen or discarded.
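The permission advice above can be sketched as a small shell helper. This is a minimal example, not an official ScyllaDB tool: the function name secure_key_dir and the path in the usage line are hypothetical, and it assumes the directory should be accessible only to the user that owns the scylla process.

```shell
#!/bin/sh
# Hypothetical helper: lock down a key directory so that only its owner
# can enter it or read the key files inside it.
secure_key_dir() {
    dir="$1"
    mkdir -p "$dir" || return 1
    # Directory: owner may read, write and traverse; nobody else.
    chmod 700 "$dir" || return 1
    # Key files: owner may read and write; nobody else.
    find "$dir" -type f -exec chmod 600 {} +
}

# Usage (placeholder path; run as the user that owns the scylla process):
# secure_key_dir /path/to/key/directory
```

Tightening the directory as well as the files means a new key file created later is still shielded by the directory mode, even if it is written with looser permissions.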
To use this provider for user data encryption, set the following in the scylla_encryption_options attributes:
'key_provider': 'LocalFileSystemKeyProviderFactory',
'secret_key_file': '<path-to-key-file>' (with optionally pre-created key(s)).
If secret_key_file is not specified, it defaults to a file named data_encryption_keys in the system configuration directory. If this file does not exist or does not contain an appropriate key, the file and/or the directory must be writable by the scylla process, and a new key will be generated.
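The fallback described above can be turned into a rough pre-flight check. The sketch below only mirrors the prose and is not ScyllaDB's own logic: check_key_file, its arguments and its messages are hypothetical, and a non-empty file is used as a stand-in for "contains an appropriate key", which the script cannot really verify.

```shell
#!/bin/sh
# Hypothetical pre-flight check mirroring the described fallback.
# $1: configuration directory; $2: optional explicit secret_key_file path.
check_key_file() {
    conf_dir="$1"
    # Fall back to the default file name when no path is given.
    key_file="${2:-$conf_dir/data_encryption_keys}"
    if [ -s "$key_file" ]; then
        # Non-empty file: assume it already holds a key.
        echo "existing key file: $key_file"
    elif [ -w "$(dirname "$key_file")" ]; then
        # No usable key, but the directory is writable, so one can be created.
        echo "no key yet, can be generated: $key_file"
    else
        echo "no key and directory not writable: $key_file"
        return 1
    fi
}
```

Run it as the same user as the scylla process, since the writability test depends on who executes it.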
System level encryption
System level encryption is applied to semi-transient on-disk data, such as the commit log, batch log and hinted handoff data. Each of these temporarily stores user table data until it is fully persisted to a final SSTable on disk.
System encryption is configured in scylla.yaml:
system_info_encryption:
    enabled: <true|false>
    key_provider: <provider> (optional)
Depending on the key provider, additional arguments declaring provider properties are also required. Note that the replicated key provider is not allowed here.
ScyllaDB also allows you to encrypt sensitive parts of the scylla.yaml configuration file, such as KMIP server passwords. The configuration_encryptor tool accepts a system key file on disk and encrypts or decrypts your ScyllaDB configuration file automatically.
Example 1: Creating a new, encrypted table
This is the simplest use case: encrypting a single new table using the local key file provider.
- Create an encryption key
> <scylla-tools/bin>/local_file_key_generator
- Copy the key file to the same path on all nodes in the cluster.
- Create the table
> cqlsh
cqlsh> CREATE KEYSPACE ks WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> CREATE TABLE ks.test (pk text PRIMARY KEY, c0 int) WITH scylla_encryption_options = { 'key_provider' : 'LocalFileSystemKeyProviderFactory', 'secret_key_file' : '<path-to-key-file>' };
- Insert and select some data
cqlsh> INSERT INTO ks.test (pk, c0) VALUES ('apa', 1);
cqlsh> SELECT * FROM ks.test;

 pk  | c0
-----+----
 apa |  1
- Flush the data update to an sstable
> nodetool flush
The created CQL table behaves as a regular table, but all data in sstable files written for it will now be encrypted on disk. You can verify this by, for example, copying an sstable to a different machine and trying to read it with a tool such as sstabledump. This will fail.
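A cruder sanity check than sstabledump is to search the raw file for a value you know you inserted. The helper below is only a sketch: contains_plaintext is a hypothetical name, the path in the usage line is a placeholder, and a "not found" result is not proof of encryption, because compression can also hide plaintext. Use a long, distinctive test value, since a three-byte string like 'apa' can appear in ciphertext by chance.

```shell
#!/bin/sh
# Hypothetical check: report whether a known plaintext value occurs in a file.
# $1: the value to look for; $2: the file to scan (for example an sstable data file).
contains_plaintext() {
    needle="$1"
    file="$2"
    # -F: match the value literally; -q: only the exit status matters.
    # LC_ALL=C makes grep treat the file as raw bytes.
    if LC_ALL=C grep -q -F -- "$needle" "$file"; then
        echo "plaintext found"
    else
        echo "not found"
    fi
}

# Usage (placeholder path):
# contains_plaintext 'some-long-distinctive-value' /path/to/sstable-data-file
```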
Example 2: Encrypting an existing table
This uses the same simple setup, a single table and the local key file provider, but for a table that already holds data.
- Create an encryption key
> <scylla-tools/bin>/local_file_key_generator
- Copy the key file to the same path on all nodes in the cluster.
- Enable encryption of the table
> cqlsh
cqlsh> ALTER TABLE ks.test WITH scylla_encryption_options = { 'key_provider' : 'LocalFileSystemKeyProviderFactory', 'secret_key_file' : '<path-to-key-file>' };
- Upgrade existing sstables to encrypt them
> nodetool upgradesstables -a ks test
Note that until you rewrite the sstables in the last step, all disk files that already exist will remain unencrypted, even though newly created files will be encrypted.
Example 3: Disable encryption for an encrypted table
To un-encrypt an encrypted table:
- Disable table encryption
cqlsh> ALTER TABLE ks.test WITH scylla_encryption_options = { 'key_provider' : 'none' };
- Upgrade existing sstables to decrypt them
> nodetool upgradesstables -a ks test
Note: until all existing sstables are rewritten unencrypted, the encryption key used needs to remain available or data loss will occur.
Conclusion
Transparent data encryption provides a flexible and low-overhead option to increase data security. It typically adds minimal CPU overhead (depending on the encryption algorithm selected), generally much lower than, for example, sstable compression, and no additional disk footprint.
There are, however, no silver bullets. When deploying an encryption solution you need to decide which type of attack you are guarding against: loss of disk media or intrusion attacks. To keep data safe you need to protect disk-based keys properly, using proper file permissions, non-local storage etc.
Lastly, be sure to back up all your encryption keys (including those stored in ScyllaDB tables) to prevent data loss, as there will generally be no way to recover data without them.
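For file-based keys, a backup can be as simple as archiving the key directory. The sketch below is one possible approach under stated assumptions: backup_keys and both paths are hypothetical, it covers only local key files (not keys stored in ScyllaDB tables), and the resulting archive contains clear-text keys, so it needs the same protection as the originals.

```shell
#!/bin/sh
# Hypothetical helper: archive a key directory into a tar.gz file.
# $1: key directory; $2: archive path (keep it outside the key directory).
backup_keys() {
    key_dir="$1"
    archive="$2"
    # umask 077 in a subshell: the archive is created readable by its owner
    # only, without changing the umask of the calling shell.
    ( umask 077 && tar -czf "$archive" -C "$key_dir" . )
}

# Usage (placeholder paths):
# backup_keys /path/to/key/directory /secure/backup/location/keys.tar.gz
```

Store the archive on different media from the encrypted data, otherwise losing one disk loses both.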
Properly applied, transparent data encryption can help secure sensitive data, even when deployed in semi-public environments such as cloud solutions or shared server farms.