DataStax Help Center

OpsCenter is unable to restore from an S3 bucket when the number of backup files is very large

Summary

OpsCenter times out when retrieving the list of backup files from Amazon S3 when the bucket contains a large number of files. This has been observed when restoring a backup from an S3 bucket. A bucket can accumulate a large number of files when the backup retention policy is set to Retain All.

Symptoms

When trying to restore from an S3 bucket that contains a very large number of backup files, this error is displayed in the OpsCenter UI:

Location validation error: Unexpected token <

The corresponding exception in the opscenterd.log is:

    [opscenterd] ERROR: Problem while calling decorator (error): [Errno 104] Socket closed
      File "/usr/share/opscenter/lib/py/twisted/internet/defer.py", line 1122, in _inlineCallbacks
        result = result.throwExceptionIntoGenerator(g)
      File "/usr/share/opscenter/lib/py/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
        return g.throw(self.type, self.value, self.tb)

which shows that the socket was closed due to a timeout.

The socket timeout occurs after 5 minutes, and this is not configurable within OpsCenter. Therefore, even if default_api_timeout is set to a high value, e.g. 10 minutes, the socket timeout will occur before the OpsCenter timeout and will interrupt the restore.
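For reference, default_api_timeout is set in opscenterd.conf. A minimal fragment is shown below; the exact section placement may vary by OpsCenter version, and, as explained above, raising this value does not extend the fixed 5-minute socket timeout:

```ini
[clusters]
# OpsCenter's own API timeout, in seconds (here: 10 minutes).
# Raising this does NOT extend the non-configurable 5-minute
# socket timeout, which will still interrupt the restore first.
default_api_timeout = 600
```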

Workaround

One way to avoid this issue before it occurs is to keep the number of backup files in the bucket at a reasonable level. There is no definite number of files that triggers the issue; however, creating additional buckets to store the backup files for different months or quarters should help.
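As a sketch of how backups could be split across per-month buckets, the helper below groups S3 object keys by the month embedded in their path. The key layout shown (snapshots/YYYY-MM-DD-HHMM/file) is hypothetical; adjust the parsing to match your actual backup key format:

```python
from collections import defaultdict


def group_backup_keys_by_month(keys):
    """Group S3 object keys by backup month, so each month's files can
    live in (or be moved to) a separate bucket.

    Assumes a hypothetical key layout like
    "snapshots/2017-03-15-0400/ks-cf-Data.db", where the second path
    component starts with YYYY-MM-DD.
    """
    groups = defaultdict(list)
    for key in keys:
        parts = key.split("/")
        # Take "YYYY-MM" from the date component, e.g. "2017-03".
        month = parts[1][:7] if len(parts) > 1 else "unknown"
        groups[month].append(key)
    return dict(groups)
```

For example, keys dated March and April 2017 would be grouped under "2017-03" and "2017-04", suggesting two separate destination buckets.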

Set an expiration policy for the backups in OpsCenter in order to keep the number of active backups in the bucket as low as possible.

Once the issue occurs and OpsCenter is unable to retrieve the backup files, manually create a new bucket, move some of the backup files to it, and attempt the restore from the new bucket.
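The manual move can be scripted with the AWS CLI. The sketch below only builds the `aws s3 mv` command strings for review before running them; the bucket names and key prefixes are hypothetical placeholders:

```python
def plan_move_commands(keys, source_bucket, dest_bucket, max_files):
    """Build `aws s3 mv` commands that relocate up to max_files backup
    objects from source_bucket into a new dest_bucket, reducing the
    number of files left in the original bucket.

    Commands are returned as strings so they can be inspected (or
    dry-run) before being executed with the AWS CLI.
    """
    commands = []
    for key in keys[:max_files]:
        commands.append(
            "aws s3 mv s3://{}/{} s3://{}/{}".format(
                source_bucket, key, dest_bucket, key
            )
        )
    return commands
```

For example, planning a move of two of three files from a bucket named "old-backups" to "new-backups" yields two `aws s3 mv` commands preserving each object's key.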

Solution

A fix for this behaviour is being developed and tracked in JIRA OPSC-11574.
Developers are improving how OpsCenter lists the files in the bucket and how it handles the timeout errors.
