Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory. You can take a snapshot of all keyspaces, a single keyspace, or a single table while the system is online.
Using a parallel ssh tool (such as pssh), you can snapshot an entire cluster. This provides an eventually consistent backup. Although no one node is guaranteed to be consistent with its replica nodes at the time a snapshot is taken, a restored snapshot resumes consistency using Cassandra's built-in consistency mechanisms.
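As a rough sketch of the cluster-wide snapshot described above, the helper below loops over nodes with plain ssh instead of pssh, so it needs nothing beyond a standard shell. The hosts file (one hostname per line), the snapshot tag, and the RUN dry-run variable are assumptions made for illustration; -t is the nodetool snapshot option for naming a snapshot.

```shell
# Snapshot every node listed in a hosts file (one hostname per line).
# Hypothetical helper: set RUN=echo to print the commands instead of running them.
snapshot_cluster() (
    hosts_file=$1
    tag=$2
    while IFS= read -r host; do
        # Skip blank lines and comment lines in the hosts file.
        case $host in ''|'#'*) continue ;; esac
        # ssh -n stops ssh from consuming the rest of the hosts file on stdin.
        ${RUN:-} ssh -n "$host" nodetool snapshot -t "$tag"
    done < "$hosts_file"
)
```

Calling the helper with RUN=echo first shows exactly which commands would be sent to which nodes. The nodes are visited one after another, so the snapshots are not simultaneous, which matches the eventually consistent nature of the backup.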
After a system-wide snapshot is performed, you can enable incremental backups on each node to back up data that has changed since the last snapshot: each time an SSTable is flushed, a hard link to it is created in a /backups subdirectory of the data directory (provided JNA is enabled).
Taking a snapshot
Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace.
$ nodetool -h hostname -p jmx_port snapshot mykeyspace
For example:
$ nodetool -h localhost -p 7199 snapshot test
The snapshot is created in the data_directory_location/keyspace_name/table_name/snapshots/snapshot_name directory. Each snapshot directory contains numerous .db files that hold the data as it was at the time of the snapshot.
For example:
Packaged installs:
/var/lib/cassandra/data/mykeyspace/mytable/snapshots/23939834298/mykeyspace.db
Tarball installs:
install_location/data/data/mykeyspace/mytable/snapshots/23939834298/mykeyspace.db
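To see which snapshots exist on a node, the directory layout above can be walked with a shell glob. This is a minimal sketch: list_snapshots is a hypothetical helper name, and the data directory is passed in by the caller rather than assumed.

```shell
# List every snapshot directory under a Cassandra data directory.
# Layout assumed: data_dir/keyspace_name/table_name/snapshots/snapshot_name
list_snapshots() (
    data_dir=$1
    for dir in "$data_dir"/*/*/snapshots/*/; do
        # An unmatched glob stays literal, so confirm the directory exists.
        if [ -d "$dir" ]; then
            printf '%s\n' "${dir%/}"
        fi
    done
)
```

For a packaged install this would be called as list_snapshots /var/lib/cassandra/data.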
Deleting snapshot files
To delete all snapshots for a node, run the nodetool clearsnapshot command. For example:
$ nodetool -h localhost -p 7199 clearsnapshot
Enabling incremental backups
Edit the cassandra.yaml configuration file on each node in the cluster and change the value of incremental_backups to true.
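The edit can also be scripted when many nodes are involved. The sketch below assumes that cassandra.yaml already contains an uncommented, top-level incremental_backups line; enable_incremental_backups is a hypothetical helper name, and the file path is supplied by the caller.

```shell
# Set incremental_backups to true in a cassandra.yaml file.
# Assumes an uncommented top-level "incremental_backups:" line is present.
enable_incremental_backups() (
    yaml=$1
    tmp=$(mktemp) || exit 1
    # Rewrite only the incremental_backups line; every other line is copied as-is.
    sed 's/^incremental_backups:.*/incremental_backups: true/' "$yaml" > "$tmp" &&
        cat "$tmp" > "$yaml"   # overwrite in place so ownership and mode are kept
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

If the setting is commented out or missing, the file is left unchanged, so it is worth checking the result with grep afterwards.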
Restoring from a snapshot
1. Shut down the node.
2. Clear all files in the commitlog directory:
Packaged installs: /var/lib/cassandra/commitlog
Tarball installs: install_location/data/commitlog
3. Delete all *.db files in the data_directory_location/keyspace_name/table_name directory, but DO NOT delete the /snapshots and /backups subdirectories. The data_directory_location is:
Packaged installs: /var/lib/cassandra/data
Tarball installs: install_location/data/data
4. Locate the most recent snapshot folder in this directory:
data_directory_location/keyspace_name/table_name/snapshots/snapshot_name
5. Copy its contents into this directory: data_directory_location/keyspace_name/table_name
6. If using incremental backups, copy all contents of this directory:
data_directory_location/keyspace_name/table_name/backups
7. Paste them into this directory: data_directory_location/keyspace_name/table_name
8. Restart the node.
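The file-level steps for a single table (deleting the live *.db files, then copying in the snapshot and any incremental backups) can be sketched as a small shell function. restore_table is a hypothetical helper, not part of Cassandra, and it takes the snapshot name explicitly instead of picking the most recent one. It assumes the node is already shut down and the commit log has been cleared.

```shell
# Restore one table's files from a snapshot.
# Run only while the node is shut down and after the commit log is cleared.
restore_table() (
    table_dir=$1        # data_directory_location/keyspace_name/table_name
    snapshot_name=$2    # name of the snapshot directory to restore
    snapshot_dir=$table_dir/snapshots/$snapshot_name

    if [ ! -d "$snapshot_dir" ]; then
        echo "no such snapshot: $snapshot_dir" >&2
        exit 1
    fi

    # Remove the live *.db files. The glob matches only files directly in
    # the table directory, so snapshots/ and backups/ are left alone.
    rm -f "$table_dir"/*.db || exit 1

    # Copy the snapshot contents into the table directory.
    cp -Rp "$snapshot_dir"/. "$table_dir"/ || exit 1

    # If incremental backups exist, copy them in as well.
    if [ -d "$table_dir/backups" ]; then
        cp -Rp "$table_dir/backups"/. "$table_dir"/ || exit 1
    fi
)
```

For a packaged install, a call might look like restore_table /var/lib/cassandra/data/mykeyspace/mytable 23939834298, repeated for each table being restored before the node is restarted.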