KeyError: 'x-datastax-agent-version' in OpsCenter logs when Repair Service is enabled
Summary
This article describes the errors in OpsCenter that are related to KeyError: 'x-datastax-agent-version' and provides troubleshooting steps.
Applies to
DataStax Enterprise (DSE) 5.1
DataStax Enterprise 6.0
DataStax Enterprise 6.7
OpsCenter 6.5
OpsCenter 6.7
Symptoms
A failing query is referenced in opscenterd.log with KeyError: 'x-datastax-agent-version'.
The WARN message repeats ignoreta=xyz many times. For readability, the URL in the WARN line is truncated in this example.
2019-04-24 02:26:13,228 [ClusterName] ERROR: Error processing HTTP response: Traceback (most recent call last):
File "/usr/share/opscenter/Lib/site-packages/opscenterd/Agents.py", line 103, in onSuccess
KeyError: 'x-datastax-agent-version'
(http-client-1)
2019-04-24 02:26:13,232 [ClusterName] WARN: HTTP request <https://1.2.3.4:61621/v1/tokenranges?partitions=1048576&ignoreks=solr_admin&ignoreks=dse_insights_local&ignoreks=dse_system&ignoreks=system_virtual_schema&ignoreks=system_views&ignoreks=system&ignoreks=dse_system_local&ignoreks=system_schema&ignoreks=dse_perf&ignoreks=dse_insights&ignoreta=prefix.objectName&ignoreta=prefix.objectName&ignoreta=prefix.objectName&ignoreta=prefix.objectName&ignoreta=prefix.objectName&ignoreta=prefix.objectName&<shorten for readability>
File "/usr/share/opscenter/Lib/site-packages/opscenterd/Agents.py", line 154, in onSuccess
KeyError: 'x-datastax-agent-version'
The matching entries in agent.log on the agent, located using the timestamp and IP address from the opscenterd.log entries:
WARN [qtp78782356-2274] 2019-04-24 02:26:13,225 URI is too large >8192
WARN [qtp78782356-2274] 2019-04-24 02:26:13,226 badMessage: 414 for HttpChannelOverHttp@47f3a964{r=479,c=false,a=IDLE,uri=-}
WARN [qtp78782356-2274] 2019-04-24 02:26:13,228 badMessage: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@47f3a964{r=479,c=true,a=COMPLETED,uri=null
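To check whether an agent is rejecting requests for this reason, the agent log can be scanned for the oversized-URI warning. The following is a minimal Python sketch, assuming the log line format shown in the example above; the function name find_uri_too_large and the sample list are illustrative, and the exact wording of the warning may differ between agent versions.

```python
import re

# Pattern derived from the agent.log example in this article (assumption:
# other agent versions may format the message differently).
URI_TOO_LARGE = re.compile(r"URI is too large\s*>\s*(\d+)")


def find_uri_too_large(lines):
    """Return (line_number, limit) for each line reporting an oversized URI."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = URI_TOO_LARGE.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


# Sample lines copied from the agent.log excerpt above.
sample = [
    "WARN [qtp78782356-2274] 2019-04-24 02:26:13,225 URI is too large >8192",
    "WARN [qtp78782356-2274] 2019-04-24 02:26:13,226 badMessage: 414 for "
    "HttpChannelOverHttp@47f3a964{r=479,c=false,a=IDLE,uri=-}",
]
print(find_uri_too_large(sample))
```

To scan a real log, pass an open file object instead of the sample list, because the function only iterates over lines.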
Cause
This issue occurs when Repair Service is enabled in OpsCenter, in either of these scenarios:
The agent refuses the HTTP query for the call to /v1/tokenranges because the HTTP URI is larger than allowed (more than 8 KB), as shown in the example error message URI is too large >8192. In this case, the cluster contains many Materialized Views (MV), and some tables use NodeSync (DSE 6.0 and later). These are automatically added to the ignoreta list for repair, as indicated in opscenterd.log. The resulting HTTP query is larger than the HTTP endpoint on the agent allows.
Too many tables are added to the ignore_tables list in the [repair_service] section of the cluster configuration file. See the Basic Repair Service configuration reference documentation.
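The effect of a long ignore list on the request size can be estimated ahead of time. The following is a rough Python sketch, assuming the query string has the same shape as the WARN line in the Symptoms section (one ignoreks parameter per keyspace and one ignoreta parameter per table). The function name, the table names, and the choice to count only the path and query string are assumptions for illustration; the agent may measure the request slightly differently.

```python
from urllib.parse import urlencode

# Limit reported by the agent in "URI is too large >8192".
URI_LIMIT = 8192


def tokenranges_uri_length(ignored_keyspaces, ignored_tables, partitions=1048576):
    """Estimate the length of the /v1/tokenranges path plus query string."""
    params = [("partitions", partitions)]
    params += [("ignoreks", keyspace) for keyspace in ignored_keyspaces]
    params += [("ignoreta", table) for table in ignored_tables]
    return len("/v1/tokenranges?" + urlencode(params))


# Keyspaces taken from the example WARN line; table names are hypothetical.
keyspaces = ["solr_admin", "dse_insights_local", "dse_system", "system"]
tables = ["keyspace_name.table_name_%03d" % i for i in range(400)]

length = tokenranges_uri_length(keyspaces, tables)
print(length, "exceeds limit" if length > URI_LIMIT else "within limit")
```

Because every ignored table adds its full keyspace.table name to the URI, clusters with long table names reach the limit with fewer tables.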
Troubleshooting HTTP call /v1/tokenranges
The KeyError: 'x-datastax-agent-version' indicates a problem processing the query on the agent side and is not necessarily an issue of compatibility between datastax-agent and OpsCenter.
To troubleshoot, run a status query directly against the agent API using cURL. Caution: Do not run the URI from the warning or error message directly against the agent unless you understand the impact of doing so.
See the Support Knowledge Base article extracting diagnostics via the API for examples and instructions on running safe queries against the agent. See also the datastax-agent API example curl commands documentation.
A simple example command to request the status of the node without SSL: curl -v -L http://<agentIP>:61621/v1/connection-status?
A simple example command to request the status of the node with SSL: curl -v -k --cert /var/lib/opscenter/ssl/opscenter.pem --key /var/lib/opscenter/ssl/opscenter.key -L https://<agentIP>:61621/v1/connection-status?
Workaround
If an upgrade is not possible, DataStax recommends stopping the Repair Service in OpsCenter and regularly running manual repairs on all the nodes.
On the first run:
$ nodetool repair
Afterward, run a maintenance repair about once a week, depending on the gc_grace_seconds setting of the tables:
$ nodetool repair -pr
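The link between the repair schedule and gc_grace_seconds can be expressed as a simple check: each node should complete a repair within the gc_grace_seconds window. The following is a minimal Python sketch, assuming the Cassandra default of 864000 seconds (10 days); the function name is hypothetical, and the actual value should be read from the table definitions, since gc_grace_seconds is set per table.

```python
from datetime import timedelta

# Cassandra default gc_grace_seconds (10 days). Tables may override this,
# so treat the default as an assumption and check the schema.
DEFAULT_GC_GRACE_SECONDS = 864000


def repair_interval_is_safe(interval, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True if repairs run more often than the gc_grace_seconds window."""
    return interval.total_seconds() < gc_grace_seconds


print(repair_interval_is_safe(timedelta(days=7)))
print(repair_interval_is_safe(timedelta(days=14)))
```

A weekly schedule fits inside the default 10-day window, while a table with a lower gc_grace_seconds needs more frequent repairs.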
Solution
The Repair Service in OpsCenter is unable to proceed when this issue occurs. Development investigated this issue under reference OPSC-13245, and it is fixed in OpsCenter 6.7.5, as described in the Repair Service section of the Release Notes. Both the agents and OpsCenter must be upgraded to benefit from the change.
Troubleshooting on a different HTTP call
If the issue occurs on a different HTTP call, consider raising a ticket with Support, and provide both the agent and OpsCenter logs.