Symptoms
- When browsing the OpsCenter UI for a cluster, the dashboard shows incomplete information: the Services frame at the top right of the dashboard appears incomplete, with errors such as “Error retrieving repair data”, “Error retrieving Backup data”, “Error retrieving Performance data” and “Error retrieving Best Practices data”.
- In the HTTP debugger of Chrome or Firefox, the response to the HTTP “services” call is large (i.e. > 1 MB).
- Reviewing the http log of OpsCenter, some API calls show an HTTP 400 for some specific best practices checks:
user@ip:~$ grep '" 400 ' /var/log/opscenter/http.*log
http.2.log:2020-05-20 10:50:19,245 [opscenterd] INFO: "<IP> GET /dse6011/bestpractice/results?rules=check-queries-prepared-statements HTTP/1.1" 400 158 "http://<IP>:8888/opscenter/index.html" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36" (MainThread)
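When the log contains many such entries, it can help to group them by rule. The following is a minimal sketch, assuming the log lines follow the format shown above (a quoted request containing a rules= parameter, followed by the status code). The function name summarize_400_rules is hypothetical and not part of OpsCenter.

```shell
# Summarize HTTP 400 responses per Best Practices rule in OpsCenter HTTP logs.
# Assumption: log lines look like the example above, i.e. the request is
# quoted and immediately followed by the status code.
# Usage: summarize_400_rules /var/log/opscenter/http.*log
summarize_400_rules() {
    # 1. keep only the lines whose status code is 400
    # 2. extract the rules=<name> query parameter
    # 3. count how many times each rule name occurs, most frequent first
    grep -h '" 400 ' "$@" \
        | grep -o 'rules=[^ &"]*' \
        | sed 's/^rules=//' \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{print $1, $2}'
}
```

Each output line holds a count followed by a rule name, so the rules that fail most often appear first.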
Applies to
- DataStax Enterprise (DSE) - All versions: 5.1, 6.0, 6.7, 6.8
- OpsCenter - All versions: 6.5, 6.7, 6.8
Cause
This issue can be a by-product of anti-patterns on DSE: problematic queries are captured as part of the Best Practices checks (under OpsCenter Services > Best Practices Service > details link) and the “Performance Service - Slow Queries Advisor” checks.
If large queries are detected during the checks, they are recorded as part of the failed Best Practices results, which can make the UI, or specific frames of it, unresponsive.
Important information
Under the “Performance Service - Slow Queries Advisor”, check whether specific rules failed. In the scenario where this was encountered, problematic queries were logged under “Use prepared statements” and “Minimize keys in IN clause” because the application used non-prepared statements with a very large number of parameters in the IN clause.
Issues flagged by OpsCenter in this way should be addressed as soon as possible and not ignored. Take careful note of the problems OpsCenter reports and address them with the application team that uses DSE.
The scope of this KBA, however, is limited to a side effect of the problem above, which leads to unwanted behaviour in OpsCenter.
Workaround
1- As a first step, increase the value of default_api_timeout for the OpsCenter UI (see the OpsCenter configuration properties), doubling it from the default of 10 to 20. OpsCenter must be restarted afterward.
This may be sufficient to fix the problem. If so, skip steps 2 to 8.
2- In OpsCenter, go to “your_cluster_name”, then Services (the cog wheel icon), and click on “Details” for the Best Practices Service.
3- Go to “Performance Service - Slow Queries Advisor” and click on the “Failed” rule checks.
4- Click on the “Logs” button at the top of the window. It may or may not respond.
5- Go to the HTTP log of OpsCenter as described in the Symptoms section and run grep '" 400 ' /var/log/opscenter/http.*log to highlight the problematic checks. In the example output in the Symptoms section, the check-queries-prepared-statements check failed. In the scenario where this was noted, the check-huge-in-clause check was also impacted.
6- Disable the failing rule in OpsCenter (again, refer to the Important information section above: these problems should not be ignored).
WARNING: the following step erases the full history of the affected rule checks. Proceed carefully and make sure the troubleshooting above has been followed before continuing.
DataStax cannot be held responsible for data loss associated with misuse of the following step. In case of doubt, please contact our support team.
7- In CQLSH, delete the history of the failing rules from the Best Practices results, in this example check-huge-in-clause and check-queries-prepared-statements. Use the rule names exactly as they appear in the rules= parameter of the failing requests in the HTTP log. The OpsCenter keyspace name depends on the value of the keyspace option under [storage_cassandra]. For example:
DELETE FROM "<OpsCenter_keyspace>".bestpractice_results WHERE key = 'check-queries-prepared-statements';
DELETE FROM "<OpsCenter_keyspace>".bestpractice_results WHERE key = 'check-queries-huge-in-clause';
8- If the issue occurs on the slow-queries API call, consider disabling the slow query log with dsetool perf cqlslowlog disable. To check, call http://<opscenterIP>:8888/<clustername>/perf/slow-queries directly and look at the size of the response (using network monitoring via F12). See the dsetool perf documentation for further information.
9- Restart OpsCenter
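Two of the checks in the steps above can be scripted. The following is a minimal sketch, not an official tool: the function names are hypothetical, the configuration file is assumed to use "option = value" lines, and "large" is assumed to mean more than 1 MiB (1048576 bytes).

```shell
# Print the value of default_api_timeout from an opscenterd.conf-style file
# (step 1). Prints nothing if the option is not set explicitly.
# Assumption: the option is written as "default_api_timeout = <number>".
get_api_timeout() {
    sed -n 's/^[[:space:]]*default_api_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" \
        | tail -n 1
}

# Count the bytes read from standard input (step 8), so that an API
# response can be piped in from an HTTP client.
response_size_bytes() {
    wc -c | tr -d '[:space:]'
}

# Succeed (exit status 0) when the given size in bytes exceeds 1 MiB.
# Assumption: 1 MiB is used as the threshold for a "large" response.
is_large_response() {
    [ "$1" -gt 1048576 ]
}
```

For example, assuming curl is installed and OpsCenter authentication is disabled, the size of the slow-queries response could be measured with: curl -s "http://<opscenterIP>:8888/<clustername>/perf/slow-queries" | response_size_bytes. The location of the configuration file depends on the installation type; /etc/opscenter/opscenterd.conf is assumed for package installs.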
Solution
This issue has been recorded at DataStax under reference OPSC-16618. Please subscribe to this technote to receive updates, or contact support for more information.