This article discusses an issue where the OpsCenter Backup Service fails on nodes with too many data files.
- OpsCenter 6.0.x
When you perform a backup using the OpsCenter Backup Service, the snapshot attempt fails and reports that there are too many files to back up. Here is an example error in
2017-09-20 01:23:45,678 [MyCluster] ERROR: Snapshot of keyspaces [playlist] on node 10.1.2.3 failed: \
Cannot take snapshot, number of files to be backed up 56789 is larger than the queue size 10000. \
Increase queue size by changing backup_file_queue_max in the agent configuration. (MainThread)
Prior to the snapshot phase, the OpsCenter Backup Service traverses each keyspace's data directory and records metadata about each file onto a work queue. This queue is used to determine which files should be included in the snapshot and sent to the respective backup destination.
The issue arises because the traversal of the data directory unnecessarily includes the snapshot subdirectories when compiling the list of files for the work queue. Files belonging to existing snapshots are counted as well, so the list can exceed the queue size.
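To see how much the snapshot subdirectories contribute to the total, the file count can be compared with and without them. The following is a minimal shell sketch, not the agent's actual traversal logic. The function names are hypothetical, and it assumes the usual Cassandra layout in which each table directory keeps its snapshots in a subdirectory named snapshots.

```shell
# Hypothetical helpers for illustration; not part of OpsCenter or the agent.

# Count every regular file under a keyspace data directory,
# snapshot subdirectories included.
count_all_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Count only the files that sit outside any "snapshots" subdirectory.
count_live_files() {
    find "$1" -type d -name snapshots -prune -o -type f -print | wc -l | tr -d ' '
}
```

Run both functions against the keyspace's data directory on the affected node, for example a path such as /var/lib/cassandra/data/playlist if the node uses the default package-install location (an assumption; check data_file_directories in cassandra.yaml). A large gap between the two numbers suggests that old snapshots account for most of the files.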
Follow these procedures to allow backups to complete successfully.
OPTION A - Remove old snapshots
Step A1 - Manually delete snapshots that you no longer require for the affected keyspace on the affected node. For example, to delete a specific snapshot for the
$ nodetool clearsnapshot -t <snapshot_name> -- playlist
Step A2 - Repeat the step above until the number of files in the subdirectories for the playlist keyspace is below backup_file_queue_max (default is 10000 files).
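The check in Step A2 can be scripted. This is a minimal sketch under stated assumptions: the function name is hypothetical, the directory passed in is the keyspace's data directory on the affected node, and the limit defaults to the 10000 files quoted above unless a second argument is given.

```shell
# Hypothetical helper for illustration; not part of OpsCenter or nodetool.
# Succeeds (exit status 0) when the number of files under DIR is below LIMIT.
below_queue_limit() {
    dir=$1
    limit=${2:-10000}   # default backup_file_queue_max from this article
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$count files under $dir (limit $limit)"
    [ "$count" -lt "$limit" ]
}
```

After each nodetool clearsnapshot, rerun the function on the keyspace directory. A non-zero exit status means the count is still at or above the limit, so more snapshots need to be cleared.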
OPTION B - Increase the queue size
Step B1 - For the affected node, increase the maximum backup queue size by adding the following line to
Step B2 - A larger queue causes the agent to consume more memory, so we recommend increasing the agent's heap size. By default, the agent is allocated 128MB of heap in
JVM_OPTS="$JVM_OPTS -Xmx128M -Djclouds.mpu.parts.magnitude=100000 -Djclouds.mpu.parts.size=16777216"
Increase the heap to 256MB or 512MB.
Step B3 - Restart the agent for the changes to take effect.
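As an illustration of the heap change in Step B2, the sketch below rewrites the -Xmx value in a JVM_OPTS line and leaves the other options untouched. The function name is hypothetical. It reads from standard input and writes to standard output, so it does not modify any file on its own. Review the result before copying it into the agent's environment file.

```shell
# Hypothetical helper for illustration; not shipped with the agent.
# Replaces the first -Xmx<size> setting on each input line with -Xmx<NEW_SIZE>.
set_heap_max() {
    sed "s/-Xmx[0-9][0-9]*[MmGg]/-Xmx$1/"
}
```

For example, piping the default line shown in Step B2 through set_heap_max 256M yields the same line with -Xmx256M in place of -Xmx128M.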
backup_file_queue_max was deprecated in OpsCenter 6.1.0 (internal ID OPSC-8045) since the functionality it provided was no longer required. Upgrade to OpsCenter 6.1.0+ to take advantage of the new Backup Service algorithm.