Summary
This article discusses a scenario causing OpsCenter backups to S3 locations to fail when using the AWS credentials provider.
Applies to
- OpsCenter 6.1.10
- OpsCenter 6.5.1 or newer
Symptoms
Existing OpsCenter backup jobs fail to run. The OpsCenter log reports that a backup pre-check on the S3 destination failed due to an expired authentication token. Here is an example entry (reformatted for readability) from opscenterd.log on a server running OpsCenter 6.5.3:
2018-09-27 20:34:34,199 [opscenterd] WARN: Marking request 1044b754-0736-40aa-ae8c-bdf3130e20f5 as failed: {'metadata': \
{'created': False, 'errors': [u'Destination pre-check failed. Verify you are able to read, write, and delete from the destination. \
Error: clojure.lang.ExceptionInfo: throw+: \
{:type :opscenter-common.s3.api/ExpiredToken, :msg "The provided token has expired.", :http-code "400", :failure-side "Client"} \
{:type :opscenter-common.s3.api/ExpiredToken, :msg "The provided token has expired.", :http-code "400", :failure-side "Client"}']}, \
'message': u'Destination pre-check failed. Verify you are able to read, write, and delete from the destination. \
Error: clojure.lang.ExceptionInfo: throw+: \
{:type :opscenter-common.s3.api/ExpiredToken, :msg "The provided token has expired.", :http-code "400", :failure-side "Client"} \
{:type :opscenter-common.s3.api/ExpiredToken, :msg "The provided token has expired.", :http-code "400", :failure-side "Client"}', 'sstables': ...
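To check how often this failure occurs, the log can be scanned for entries like the one above. The following is a minimal Python sketch, not an OpsCenter tool. It assumes that each entry sits on a single line in the real log (the sample above was wrapped for readability) and that entries follow the timestamp and "Marking request ... as failed" layout shown in the sample. The function name find_expired_token_failures is hypothetical.

```python
import re

# Assumed entry layout, taken from the sample above:
# "<timestamp> [opscenterd] WARN: Marking request <uuid> as failed: ..."
FAILED_REQUEST = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[opscenterd\] WARN: Marking request (?P<request_id>[0-9a-f-]{36}) as failed"
)

# Exception type reported when the cached session token has expired.
EXPIRED_TOKEN = ":opscenter-common.s3.api/ExpiredToken"


def find_expired_token_failures(lines):
    """Return (timestamp, request_id) for each failed request caused by an expired token."""
    failures = []
    for line in lines:
        match = FAILED_REQUEST.match(line)
        # Only report failed requests whose error payload names ExpiredToken.
        if match and EXPIRED_TOKEN in line:
            failures.append((match.group("timestamp"), match.group("request_id")))
    return failures
```

The function accepts any iterable of lines, so an open file handle for opscenterd.log can be passed directly.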
A similar error is also reported in the OpsCenter web UI.
Cause
OpsCenter uses the AWS SDK for Java to access AWS infrastructure services such as S3. For AWS environments that use EC2 instance profile credentials (IAM role credentials that exist in the EC2 instance metadata), OpsCenter allows the use of these credentials when backing up to an S3 bucket without requiring users to explicitly provide API keys (OPSC-5161).
When the S3 bucket is first added as a backup location with AWS Credential Provider chain as the credentials source, the AWS SDK retrieves the EC2 instance credentials and OpsCenter saves them in a credentials object that is cached in memory. However, instance profile credentials are temporary and the associated session token eventually expires, leading to the ExpiredToken exception reported in the logs (OPSC-15138).
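The failure mode can be illustrated with a simplified model. The Python sketch below is not OpsCenter or AWS SDK code, and it does not describe how OPSC-15138 was actually fixed. All class names are hypothetical. It only contrasts a destination that caches a temporary credentials object once with one that asks the provider again when the cached object has expired.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TemporaryCredentials:
    """Hypothetical stand-in for instance profile credentials with a session token."""
    access_key: str
    session_token: str
    expires_at: datetime

    def is_expired(self, now):
        return now >= self.expires_at


class InstanceProfileProvider:
    """Hypothetical stand-in for the EC2 instance metadata credential source."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.issued = 0

    def fetch(self, now):
        # Each call issues a new short-lived credential set.
        self.issued += 1
        return TemporaryCredentials(
            f"key-{self.issued}", f"token-{self.issued}", now + self.lifetime
        )


class CachedOnceDestination:
    """Failure mode: credentials are fetched once and reused indefinitely."""

    def __init__(self, provider, now):
        self._credentials = provider.fetch(now)

    def pre_check(self, now):
        if self._credentials.is_expired(now):
            raise PermissionError("ExpiredToken: The provided token has expired.")
        return "ok"


class RefreshingDestination:
    """Alternative: keep the provider and fetch again once the cached copy expires."""

    def __init__(self, provider, now):
        self._provider = provider
        self._credentials = provider.fetch(now)

    def pre_check(self, now):
        if self._credentials.is_expired(now):
            self._credentials = self._provider.fetch(now)
        return "ok"
```

With a 4-hour credential lifetime, a pre-check on CachedOnceDestination succeeds after 1 hour but raises after 5 hours, while RefreshingDestination continues to succeed. Restarting the service or rebuilding the destination corresponds to constructing a new destination object in this model.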
Workaround
Follow the steps outlined in one of the following options:
OPTION 1 - Temporarily switch to supplying the AWS credentials when adding an Amazon S3 backup location until the OpsCenter server and agents can be upgraded to a version which contains the fix for OPSC-15138.
OPTION 2 - Restart the OpsCenter service to force a new copy of the credentials object to be pushed to the agents. Note that this is a temporary measure and may not be viable if the expiration on the IAM role for the EC2 instance is set too low[1].
Additionally, be aware that restarting OpsCenter might affect other jobs or tasks (e.g. Nodesync, other backups, or alerting) for all managed clusters.
OPTION 3 - Edit the scheduled backup, temporarily change the value of the S3 Bucket field (e.g. to a non-existing bucket), and save the job. Then edit the job again and set the correct bucket. This triggers the agents to rebuild the S3 destination, effectively refreshing the cached credentials object. Note that this is a temporary measure and may not be viable if the expiration on the IAM role for the EC2 instance is set too low[1].
[1] If the expiration on the IAM role is set to a short time span (such as 4 hours), it might be impractical to implement the workaround because OpsCenter would need to be restarted every 4 hours. In this case, implementing Option 1 is a better choice.
Solution
OPSC-15138 has been fixed and delivered in OpsCenter 6.5.5 and OpsCenter 6.7.1.
Upgrade to the latest version of OpsCenter to take advantage of all the new improvements and latest fixes.
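When auditing several installations, the fixed versions listed above can be encoded in a small check. This Python sketch is illustrative only and the function names are hypothetical. It assumes three-part numeric version strings and that every release after 6.7.1 carries the fix forward, which this article does not state explicitly.

```python
def parse_version(version):
    """Convert a version string such as '6.5.3' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def contains_opsc_15138_fix(version):
    """Return True if the version is 6.5.5 or later in the 6.5 line, or 6.7.1 or later."""
    parsed = parse_version(version)
    if parsed[:2] == (6, 5):
        return parsed >= (6, 5, 5)
    # Assumption: releases newer than 6.7.1 also include the fix.
    return parsed >= (6, 7, 1)
```

For example, contains_opsc_15138_fix("6.5.3") returns False and contains_opsc_15138_fix("6.7.1") returns True.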
See also
KB article - OpsCenter unable to connect to S3 bucket using default AWS credential provider chain