Symptoms
- Gaps are visible in the OpsCenter graphs for some metrics, such as "Total Disk Used" or "Load", and some data points appear to be missing.
Applies to
- DataStax Enterprise 6.7 and prior versions
- DataStax OpsCenter 6.7 and prior versions
Cause
The issue can occur when the metrics collector thread cannot keep up with the volume of data gathered by the agents.
The agent log reflects this as follows:
WARN [jmx-general-metrics] 2019-09-07 10:49:04,906 Too many outstanding requests 270 pausing collectors
Solution
Increase the rollup_rate value in the agent configuration file address.yaml.
Warning: setting a higher rollup_rate value can add overhead and extra load on the nodes, so performance testing should be considered. As a general recommendation, DataStax also recommends using a dedicated cluster for OpsCenter storage, which can be done at no additional cost; see the DataStax FAQ.
The appropriate rollup_rate value depends on the number of tables and other metrics collected. For further information, refer to the DataStax Agent configuration documentation.
For example, based on the formula in that documentation, the default rollup_rate in the agent configuration file address.yaml allows monitoring of just under 300 tables per node (200 metrics per second * 60 seconds = 12,000 metrics per minute, divided by 40 metrics collected per table = 300 tables).
So for a cluster containing 600 tables, the rollup_rate in address.yaml should be doubled to 400.
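The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the worked example: the figure of 40 metrics per table is taken from the example above, the function names are hypothetical, and the result ignores the non-table metrics that the agent also collects, so treat the output as a lower bound rather than an exact setting.

```python
# Illustrative only: reproduces the rollup_rate arithmetic from the example.
# METRICS_PER_TABLE comes from the worked example above; the function names
# are hypothetical and not part of OpsCenter or the DataStax agent.

METRICS_PER_TABLE = 40    # metrics collected per table, per minute
SECONDS_PER_MINUTE = 60


def max_tables(rollup_rate: int) -> int:
    """Approximate number of tables per node a given rollup_rate can cover."""
    metrics_per_minute = rollup_rate * SECONDS_PER_MINUTE
    return metrics_per_minute // METRICS_PER_TABLE


def required_rollup_rate(tables: int) -> int:
    """Smallest rollup_rate (metrics per second) that covers `tables` tables."""
    metrics_per_minute = tables * METRICS_PER_TABLE
    # Integer ceiling division, so a partial metric per second rounds up.
    return (metrics_per_minute + SECONDS_PER_MINUTE - 1) // SECONDS_PER_MINUTE


if __name__ == "__main__":
    print(max_tables(200))            # default rate from the example
    print(required_rollup_rate(600))  # cluster with 600 tables
```

With the default rate of 200 the first call gives 300 tables, and a cluster with 600 tables yields a rate of 400, matching the figures in the example.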
After applying the change and restarting the agents, confirm the warning mentioned in the cause section doesn’t appear anymore in the log and that new data doesn’t show gaps in the graphs.
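One way to check the log after the restart is a small script such as the sketch below. It assumes the warning keeps the exact format shown in the Cause section; the function name and the `since` parameter are hypothetical, and the location of the agent log is left to the reader because it depends on the installation.

```python
# Minimal sketch: find "Too many outstanding requests" warnings in agent log
# lines. The pattern is derived from the single sample line in the Cause
# section and is an assumption about the general log format.
import re
from datetime import datetime

WARNING_RE = re.compile(
    r"WARN\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+"
    r"Too many outstanding requests (?P<count>\d+) pausing collectors"
)


def find_pause_warnings(lines, since=None):
    """Return (timestamp, outstanding_requests) for each matching warning.

    If `since` is given (for example, the time the agents were restarted),
    only warnings logged at or after that time are returned.
    """
    hits = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        if since is None or ts >= since:
            hits.append((ts, int(match.group("count"))))
    return hits


if __name__ == "__main__":
    sample = [
        "WARN [jmx-general-metrics] 2019-09-07 10:49:04,906 "
        "Too many outstanding requests 270 pausing collectors",
    ]
    # Pretend the agents were restarted on 2019-09-08: the older warning
    # is filtered out, so an empty list means no new warnings.
    print(find_pause_warnings(sample, since=datetime(2019, 9, 8)))
```

In practice the `lines` argument would be the open agent log file, and an empty result for the period after the restart suggests the collectors are no longer being paused.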