After an upgrade from OpsCenter 5.0 to 5.0.1, metrics were no longer shown in the dashboard area of the web UI. The following errors were also seen in the agent.log files:
WARN [Thread-1] 2015-08-04 12:48:16,223 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-1] 2015-08-04 12:48:16,223 85413 operations dropped so far.
and (on some older releases)
WARN [rollup-snapshot] 2014-10-16 08:52:06,116 Thrift operation queue is full, discarding thrift operation
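To gauge how often the agent is discarding metrics, you can count the queue-full warnings in agent.log. A minimal sketch, using a sample log written to a temporary path (the real log location varies by install; /var/log/datastax-agent/agent.log is common):

```shell
# Write a sample log for illustration; point LOG at your real agent.log instead
LOG=/tmp/agent.log
cat > "$LOG" <<'EOF'
WARN [Thread-1] 2015-08-04 12:48:16,223 Cassandra operation queue is full, discarding cassandra operation
WARN [Thread-1] 2015-08-04 12:48:16,223 85413 operations dropped so far.
EOF

# Count how many times the agent reported a full operation queue
grep -c "operation queue is full" "$LOG"
```

A steadily climbing count indicates the agent is dropping metrics faster than they can be written back to the cluster.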
The agent collects metrics from the local node and writes them into the OpsCenter keyspace, either on the local cluster or on a separate storage cluster. As the agent collects more metrics, the queue size may need to be increased so the data can be buffered before it is written back into the cluster.
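The drop-when-full behavior described in the log messages can be sketched as a bounded queue that discards new operations once its capacity is reached. This is an illustrative model only, not the agent's actual code; the 5000 capacity mirrors the default async_queue_size:

```python
from queue import Queue, Full

# Bounded buffer, like the agent's operation queue (default capacity 5000)
q = Queue(maxsize=5000)
dropped = 0

def enqueue(op):
    """Queue an operation, or discard it if the buffer is full."""
    global dropped
    try:
        q.put_nowait(op)
    except Full:
        dropped += 1  # corresponds to "operations dropped so far" in agent.log

# Simulate a burst of 6000 metric operations arriving faster than they drain
for i in range(6000):
    enqueue(i)

print(q.qsize(), dropped)  # 5000 buffered, 1000 discarded
```

When writes to the cluster cannot keep pace with collection, anything beyond the queue capacity is simply lost, which is why the dashboard shows gaps in the metrics.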
The default settings are:
thrift_max_conns = 1 if the OpsCenter data is stored on the local cluster, 3 if it is stored on a separate cluster
async_pool_size = 1 if the OpsCenter data is stored on the local cluster, 3 if it is stored on a separate cluster
async_queue_size = 5000
Overview of OpsCenter architecture
The following settings can be changed in the agent's address.yaml file. The agent process must be restarted for these settings to take effect.
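For example, the three settings named above could be raised in address.yaml as follows (the values here are illustrative; tune them for your workload):

```yaml
# address.yaml overrides to enlarge the agent's write path
thrift_max_conns: 10
async_pool_size: 10
async_queue_size: 20000
```

Larger values let the agent buffer and flush more metric operations, at the cost of more memory and more connections to the cluster storing the OpsCenter data.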
This is a known issue, tracked in the following internal JIRA ticket:
OPSC-4618 - Metrics queue overflow when C* is overloaded
This issue is resolved in OpsCenter 5.2 and later releases.
The following documentation link also discusses how to control which data OpsCenter collects. This is worth considering if you have a large number of keyspaces.
Storing OpsCenter data on a different cluster