DataStax Help Center

Setting Solr max_solr_concurrency_per_core via JMX causes deadlock with index flushing

Summary

This article describes a known issue in which Solr indexing stalls until the flush timeout expires after the IndexPool concurrency is changed via JMX.

Symptoms

After the IndexPool concurrency is changed via JMX, the following message is returned once the flush_max_time_per_core timeout is reached:

Exception caught: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool mysearch.content_items Index; current timeout is 300000 millis, consider increasing it, or reducing load on the node.
Failure to flush may cause excessive growth of Cassandra commit log.

A similar entry appears in system.log:

WARN  [RMI TCP Connection(178)-10.231.137.188] 2015-09-28 19:35:51,453  WorkPool.java:527 - Timeout while waiting for workers when flushing pool mysearch.content_items Index; current timeout is 300000 millis, consider increasing it, or reducing load on the node.
Failure to flush may cause excessive growth of Cassandra commit log.

and also:

INFO  [linkcurrent_search.content_items Index WorkPool work thread-3] 2015-09-28 19:35:51,454  AbstractMetrics.java:110 - Cannot record QUEUE latency of 15 minutes because higher than 10 minutes.

Other symptoms include:

  • Lucene threads stop progressing with a high queue time
  • CPU is idle until the timeout is reached
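
To check whether a node has hit this symptom, the log can be searched for the flush timeout warning. The following is a minimal sketch, not a supported tool: the regular expression is derived only from the message quoted above, and the function name find_flush_timeouts is hypothetical.

```python
import re

# Pattern derived from the WARN message quoted in this article.
FLUSH_TIMEOUT = re.compile(
    r"Timeout while waiting for workers when flushing pool (?P<pool>.+?) Index; "
    r"current timeout is (?P<millis>\d+) millis"
)


def find_flush_timeouts(lines):
    """Return a (pool, timeout_millis) tuple for every line that matches."""
    hits = []
    for line in lines:
        match = FLUSH_TIMEOUT.search(line)
        if match:
            hits.append((match.group("pool"), int(match.group("millis"))))
    return hits


# Sample input copied from the system.log excerpt in this article.
sample = [
    "WARN  [RMI TCP Connection(178)-10.231.137.188] 2015-09-28 19:35:51,453  "
    "WorkPool.java:527 - Timeout while waiting for workers when flushing pool "
    "mysearch.content_items Index; current timeout is 300000 millis, "
    "consider increasing it, or reducing load on the node.",
    "Failure to flush may cause excessive growth of Cassandra commit log.",
]

print(find_flush_timeouts(sample))
```

To scan a real log, pass an open file object instead of the sample list. The location of system.log depends on the installation.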

Cause

The issue is caused by a bug in DataStax Enterprise 4.8.0 (defect ID DSP-6772) in the way the WorkPool flushes indexes. When the Search concurrency property is changed via JMX while there are outstanding tasks in the queue, the flush deadlocks and the outstanding tasks show no changes or activity.

Workaround

If using DSE Search version 4.8.0, modify the max_solr_concurrency_per_core property in dse.yaml on each Search node and perform a rolling restart for the changes to take effect.

Because the change is made in dse.yaml and applied by restarting DSE, there are no outstanding Solr tasks in the queue that could deadlock, so the bug is not triggered.
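
Before starting the rolling restart, it can help to confirm that each node's dse.yaml contains the intended value. The following is a rough sketch under stated assumptions: it treats the setting as a single top-level `name: integer` line, the helper name configured_concurrency is hypothetical, and the value 4 in the sample is a placeholder, not a recommendation.

```python
import re

# Matches an uncommented line such as "max_solr_concurrency_per_core: 4",
# optionally followed by a trailing comment.
SETTING = re.compile(
    r"^\s*max_solr_concurrency_per_core\s*:\s*(\d+)\s*(?:#.*)?$"
)


def configured_concurrency(yaml_text):
    """Return the configured integer, or None if the setting is absent or commented out."""
    for line in yaml_text.splitlines():
        match = SETTING.match(line)
        if match:
            return int(match.group(1))
    return None


# Illustrative dse.yaml fragment; the value is a placeholder.
sample_dse_yaml = """\
# Search indexing concurrency (illustrative fragment only)
max_solr_concurrency_per_core: 4
"""

print(configured_concurrency(sample_dse_yaml))
```

A result of None means the line is missing or commented out, so that node would not pick up the intended value on restart.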

Solution

DSP-6772 was resolved in DataStax Enterprise 4.8.1. Upgrade to this version (or newer) to obtain the fix.

See also

DataStax doc - DataStax Enterprise 4.8 release notes
