Using sstableloader to move data from a source cluster to a destination cluster can be confusing, because it is not always clear which components are required on the source and which on the destination.
Various errors can occur if sstableloader is not set up correctly. A configuration error usually results in an exception being thrown.
sstableloader is used to move data from a source to a destination. It opens the sstable files on the source, connects to the DSE or Cassandra instance on the destination, and streams the data from the sstables to that instance.
In other words, the SOURCE does not have to be a running DSE/Cassandra instance, but the DESTINATION does. The source can be sstables in a DSE/Cassandra data directory, sstables copied to another location, or sstables generated by an application. For our example we assume that the source is a running DSE/Cassandra instance, so the sstables are in a data directory.
If ls -l is run on the data directory for ATest/users, the output could look like:
-rw-r--r-- 3 cassandra cassandra 46 May 10 16:50 ATest-users-hf-1-CompressionInfo.db
-rw-r--r-- 3 cassandra cassandra 233 May 10 16:50 ATest-users-hf-1-Data.db
-rw-r--r-- 3 cassandra cassandra 24 May 10 16:50 ATest-users-hf-1-Filter.db
-rw-r--r-- 3 cassandra cassandra 83 May 10 16:50 ATest-users-hf-1-Index.db
-rw-r--r-- 3 cassandra cassandra 4340 May 10 16:50 ATest-users-hf-1-Statistics.db
-rw-r--r-- 3 cassandra cassandra 68 May 10 16:50 ATest-users-hf-1-TOC.txt
drwxr-xr-x 2 cassandra cassandra 4096 May 10 16:50 backups
drwxr-xr-x 3 cassandra cassandra 4096 May 10 16:50 snapshots
These reside in <data>/ATest/users on the SOURCE machine.
We want to move the data to our DESTINATION machine, so starting on the DESTINATION machine we do the following:
-- Create the keyspace ATest and the column family users.
-- Copy the cassandra.yaml onto the SOURCE machine.
-- Moving to the SOURCE machine, place cassandra.yaml in the CLASSPATH of the sstableloader utility.
-- Run sstableloader.
Here is a checklist for running sstableloader.
-- Cassandra/DSE does NOT have to be running on the SOURCE
-- Cassandra/DSE does have to be running on the DESTINATION
-- sstableloader will be run from SOURCE.
-- identify the sstables on the SOURCE machine.
-- copy the cassandra.yaml from the DESTINATION machine to the SOURCE machine.
-- place the cassandra.yaml in the CLASSPATH of the sstableloader utility on the SOURCE machine.
-- create the keyspace/columnfamily on the DESTINATION instance.
-- run sstableloader from the SOURCE machine.
Example of running sstableloader from the directory /usr/tools/ on the SOURCE machine:
sstableloader --debug -v -d <DESTINATION> /var/dse/cassandra/data/ATest/users
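The SOURCE-side items in the checklist above can be verified before the loader is started. The following is only a minimal sketch: the function name preflight_check and the CONF_DIR argument are hypothetical, and it assumes CONF_DIR is a directory you have already placed on the CLASSPATH of sstableloader. It checks that the sstable directory exists, that it holds at least one Data.db file, and that the cassandra.yaml copied from the DESTINATION is present.

```shell
#!/bin/sh
# Pre-flight check for the SOURCE machine (illustrative sketch, not part of sstableloader).
# Usage: preflight_check SSTABLE_DIR CONF_DIR
#   SSTABLE_DIR  directory holding the sstables, e.g. <data>/ATest/users
#   CONF_DIR     hypothetical directory, assumed to be on the loader's CLASSPATH,
#                that should hold the cassandra.yaml copied from DESTINATION
preflight_check() {
    sstable_dir=$1
    conf_dir=$2

    # The sstable directory must exist.
    if [ ! -d "$sstable_dir" ]; then
        echo "missing sstable directory: $sstable_dir" >&2
        return 1
    fi

    # At least one Data.db component must be present.
    found=0
    for f in "$sstable_dir"/*-Data.db; do
        if [ -f "$f" ]; then
            found=1
            break
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no *-Data.db files in $sstable_dir" >&2
        return 1
    fi

    # The cassandra.yaml copied from DESTINATION must be in place.
    if [ ! -f "$conf_dir/cassandra.yaml" ]; then
        echo "cassandra.yaml not found in $conf_dir" >&2
        return 1
    fi

    echo "ok"
}
```

The function only inspects the local file system. It does not confirm that Cassandra/DSE is running on the DESTINATION or that the keyspace and column family exist there, so those checklist items still need to be verified separately before running sstableloader.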
Always check the documentation for the latest version of sstableloader/bulk loader configuration and parameters.
There is a known error: if the data directory on the SOURCE machine contains other files or directories, sstableloader will stop. The loader expects file names in the format <keyspace>-<table>-<generation>-<sequence>-<type>.db. When it attempts to read a file or directory that is not in this format, it throws an exception.
Explained in https://issues.apache.org/jira/browse/CASSANDRA-5587
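One possible way to avoid this error is to load from a copy of the sstables rather than from the live data directory, since the source can be sstables copied to another location. The sketch below is an assumption-laden illustration, not a documented procedure: stage_sstables is a hypothetical helper, and it assumes the loader takes the keyspace and table from the last two components of the directory path, as in the /var/dse/cassandra/data/ATest/users example. It copies only regular files whose names start with <keyspace>-<table>- and skips everything else, including subdirectories.

```shell
#!/bin/sh
# Copy only the sstable component files into a clean staging directory
# (illustrative sketch; stage_sstables is a hypothetical helper).
# Usage: stage_sstables SRC_DIR STAGE_ROOT KEYSPACE TABLE
# Prints the staging directory, which ends in KEYSPACE/TABLE.
stage_sstables() {
    src_dir=$1
    stage_root=$2
    keyspace=$3
    table=$4
    dest_dir="$stage_root/$keyspace/$table"

    mkdir -p "$dest_dir" || return 1

    # Only regular files named <keyspace>-<table>-... are copied;
    # directories and unrelated files are left behind.
    for f in "$src_dir/$keyspace-$table-"*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$dest_dir/" || return 1
    done

    printf '%s\n' "$dest_dir"
}
```

The printed directory can then be passed to sstableloader in place of the live data directory. Copying doubles the disk space needed for the table, so for large tables it may be preferable to move the extra files and directories out of the way instead.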