Summary
Bulk loading data between clusters is a common operation in Cassandra, for example, loading data from production into a development environment for testing. When TDE (transparent data encryption) is enabled, bulk loading requires additional steps. Without them, you will probably encounter streaming errors caused by the wrong encryption key. This article describes how to bulk load data between two TDE-enabled clusters.
Symptoms
When the wrong key is used during a bulk load, DSE reports the following error:
ERROR [STREAM-IN-/10.xxx.xxx.xxx] 2017-11-14 23:59:17,458 StreamSession.java:538 - [Stream #d2652c60-c997-11e7-b788-759f95545270] Streaming error occurred
java.io.IOException: Failed to decrypt data
Caused by: com.datastax.bdp.cassandra.crypto.KeyAccessException: Unable to find key for cipher=AES strength=128 id=d9b3dd70-c764-11e7-abc4-793ec23f8a8c
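The id value in the KeyAccessException message identifies the key that the target node could not find. As a minimal sketch, the helper below pulls those IDs out of a log file. The function name missing_key_ids is hypothetical, and the log path in the usage comment is an assumption that depends on how DSE was installed.

```shell
# Print the unique key IDs that DSE reported as missing in a log file.
# Usage (log path is an assumption; adjust to your installation):
#   missing_key_ids /var/log/cassandra/system.log
missing_key_ids() {
    # Keep only the KeyAccessException lines, then extract the 36-character
    # UUID that follows "id=".
    grep 'Unable to find key for cipher=' "$1" \
        | sed -n 's/.*[[:space:]]id=\([0-9a-fA-F-]\{36\}\).*/\1/p' \
        | sort -u
}
```

Each ID it prints should match a key_id in the dse_system.encrypted_keys table on the source cluster.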
Cause
Two types of keys are used when streaming encrypted data: a decryptor and an encryptor.
- The decryptor is used to decrypt the SSTable during streaming. The decryptor must be the same key that originally encrypted the data on the source cluster.
- The encryptor is used to encrypt the SSTable on the target cluster. The encryptor is the key configured in the encryption options of the CQL table schema on the target cluster.
The decryptor and encryptor can be the same key or different keys. Most users tend to maintain different keys for different clusters. The bulk loading error occurs when the target cluster cannot find the correct key for decryption.
Solution
To successfully complete the bulk loading operation:
- Copy the encryption key file used on the source cluster to the target cluster. The key file is located in the directory specified by the "system_key_directory" option in dse.yaml. The default directory is "/etc/dse/conf". You must keep the name of the key file when you copy it from the source to the target cluster. For example, if the key file is named "system_key" on the source cluster, it must also be named "system_key" on the target cluster.
- On the source cluster, get the key entries from the dse_system.encrypted_keys table:
cqlsh> SELECT * from dse_system.encrypted_keys ;
key_file   | cipher | strength | key_id                               | key
-----------+--------+----------+--------------------------------------+---------------------------------------------
system_key | AES    | 128      | d9b3dd70-c764-11e7-abc4-793ec23f8a8c | kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=
- On the target cluster, insert the same key entry:
cqlsh> INSERT INTO dse_system.encrypted_keys (key_file , cipher , strength , key_id , key ) VALUES ( 'system_key', 'AES', 128, d9b3dd70-c764-11e7-abc4-793ec23f8a8c, 'kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=');
- The dse_system.encrypted_keys table on the target cluster should now contain two entries: the key from the source cluster, which acts as the decryptor, and the key that already existed on the target cluster, which acts as the encryptor.
cqlsh> SELECT * from dse_system.encrypted_keys ;
key_file       | cipher | strength | key_id                               | key
---------------+--------+----------+--------------------------------------+-----------------------------------------------------------------
system_key     | AES    | 128      | d9b3dd70-c764-11e7-abc4-793ec23f8a8c | kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=
system_key_dev | AES    | 256      | 81847700-c99d-11e7-b9d9-23f36e5077c2 | 6YXE07AcEv61jvT6x7rdj6AHde0N6OHzxALNRnW1s7nVDFFQDArh64LousF8bXmy
- After you complete the steps above, sstableloader should run successfully.
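The copy and insert steps above can be scripted. The following is only a sketch under stated assumptions: the function names system_key_dir and build_key_insert are hypothetical, the dse.yaml path and the host name "target-node" in the usage comments are placeholders, the system_key_directory value is assumed to be unquoted, and both clusters are assumed to use the same key directory.

```shell
# Print the value of system_key_directory from a dse.yaml file.
# Lines that are commented out with '#' do not match and are ignored.
system_key_dir() {
    sed -n 's/^[[:space:]]*system_key_directory:[[:space:]]*//p' "$1" | head -n 1
}

# Build the INSERT statement for one row of dse_system.encrypted_keys.
# Arguments: key_file cipher strength key_id key
build_key_insert() {
    printf "INSERT INTO dse_system.encrypted_keys (key_file, cipher, strength, key_id, key) VALUES ('%s', '%s', %s, %s, '%s');\n" \
        "$1" "$2" "$3" "$4" "$5"
}

# Example usage (hypothetical paths and host name, shown as comments so that
# nothing runs until you have adapted it):
#   dir=$(system_key_dir /etc/dse/dse.yaml)
#   scp "$dir/system_key" target-node:"$dir/system_key"
#   cqlsh target-node -e "$(build_key_insert system_key AES 128 \
#       d9b3dd70-c764-11e7-abc4-793ec23f8a8c \
#       'kmbYE1KLkmW3Hzg7dIPt1rWk3j6hR+gM7bxd/pRd7gU=')"
```

Generating the statement from the values returned by the SELECT on the source cluster avoids retyping the key_id and key by hand, which is an easy place to introduce a mismatch.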
In practice, the default key file name "system_key" is likely to be used on more than one cluster. This causes a problem when you copy the key file from the source to the target cluster, because two different keys with the same file name cannot exist in the same directory. In this case, complete the following step on the target cluster before you copy keys and key files from the source cluster:
- Rekey the target cluster to use a different key name. You can rename the existing key or generate a new key. Follow the steps in "Rekeying existing data" to re-encrypt the existing data on the target cluster.
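Before copying anything, it can help to check whether the name clash described above applies. The sketch below assumes you have placed a copy of the source key file in a staging directory on a target node. The function name key_name_clash and the paths in the usage comment are hypothetical.

```shell
# Return 0 (true) when two key files share a file name but differ in content.
# That is the case where the target cluster must be rekeyed first.
key_name_clash() {
    [ -f "$1" ] && [ -f "$2" ] \
        && [ "$(basename "$1")" = "$(basename "$2")" ] \
        && ! cmp -s "$1" "$2"
}

# Example usage (hypothetical staging path):
#   if key_name_clash /tmp/source_keys/system_key /etc/dse/conf/system_key; then
#       echo "Same name, different key: rekey the target cluster first"
#   fi
```

The function returns false when the names differ or when the two files are identical, since neither case needs a rekey.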
Tip: To avoid having to rekey the data on the target cluster, DataStax recommends following best practices for encryption key management. Do not use the default name "system_key" for encryption keys if you plan to bulk load data between clusters that use different encryption keys. Instead, specify the key file name when generating the key:
dsetool createsystemkey 'AES/ECB/PKCS5Padding' 128 <new_system_key>