DataStax Help Center

OpsCenter repair service fails

Summary

In the OpsCenter console, the repair service is shown as FAILED.
However, manual repairs complete successfully.

Symptoms

An error like the following appears in repair_service.log:

Repair service needs to run 2 range repairs in parallel; maximum is 1. The repair cannot complete without adversely affecting the cluster. Data left: 862620416738.87, time left: 763080.00, required throughput: 1130445.58, actual throughput: 1087764.45. More information on tuning the repair service can be found here:http://www.datastax.com/documentation/opscenter/help/repair_services_advanced.html
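
The figures in this message appear to be related by simple arithmetic: the required throughput matches data left divided by time left, and the number of parallel repairs matches required throughput divided by actual throughput, rounded up. The Python sketch below reproduces the numbers from the sample message. The formulas are inferred from the message itself, not taken from OpsCenter documentation, and the function name is hypothetical.

```python
import math


def estimate_parallel_repairs(data_left, time_left, actual_throughput):
    """Return (required_throughput, parallel_repairs_needed).

    Assumption: both formulas are inferred from the sample log message,
    not from OpsCenter source code or documentation.
    """
    # Throughput needed to repair the remaining data in the remaining time.
    required = data_left / time_left
    # Number of repairs at the observed throughput needed to reach that rate.
    parallel = math.ceil(required / actual_throughput)
    return required, parallel


# Values copied from the sample error message above.
required, parallel = estimate_parallel_repairs(
    data_left=862620416738.87,
    time_left=763080.00,
    actual_throughput=1087764.45,
)
print(f"required throughput: {required:.2f}")
print(f"parallel repairs needed: {parallel}")
```

If this reading is right, the error is raised because the estimate (2) exceeds the configured maximum (1).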

Cause

By default, single repairs have a predefined timeout of 3600 seconds.
Whenever a repair segment takes longer than 1 hour, the OpsCenter repair service marks it as failed.
However, the repair session recorded in the system log may still be in progress and running normally.

Workaround

Set the following parameters in the [repair_service] section of /etc/opscenter/opscenterd.conf.
This allows a 4-hour window for single repairs and a maximum of 4 concurrent repair sessions.

[repair_service]
max_parallel_repairs = 4 
single_repair_timeout = 14400
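
As a quick sanity check, the settings can be read back with a short Python sketch. It assumes the file can be parsed as plain INI text by the standard configparser module; the function name and the inline sample text are illustrative only, and in practice the text would come from /etc/opscenter/opscenterd.conf.

```python
import configparser

# Sample text mirroring the workaround above (stand-in for the real file).
SAMPLE_CONF = """
[repair_service]
max_parallel_repairs = 4
single_repair_timeout = 14400
"""


def read_repair_settings(conf_text):
    """Return the two workaround settings as ints, or None where unset."""
    # Interpolation is disabled so stray '%' characters do not break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = "repair_service"
    return {
        "max_parallel_repairs": parser.getint(
            section, "max_parallel_repairs", fallback=None
        ),
        "single_repair_timeout": parser.getint(
            section, "single_repair_timeout", fallback=None
        ),
    }


settings = read_repair_settings(SAMPLE_CONF)
print(settings)
# The timeout is expressed in seconds; convert to hours for readability.
print(settings["single_repair_timeout"] / 3600, "hours")
```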

Solution

Support has requested a formal revision of this mechanism. In the meantime, the workaround should be enough to get the repair service working. The default values may not work where disk I/O and other factors prevent a repair session from completing within an hour.

Comments

  • José Martínez Poblete

    If setting large values for max_parallel_repairs and single_repair_timeout, along with disabling the compaction and streaming throttles in cassandra.yaml, does not work (because of the number of tables or amount of data on the cluster, a large number of pending compactions, disk I/O, and so on), then the customer might be better off running nodetool repair around the cluster in a staggered fashion, perhaps using crontabs, so that it completes within the smallest gc_grace_seconds value configured in their schema.

  • José Martínez Poblete

    In some cases you might also have to set this parameter:

    min_throughput = <adjust as per repair_center.log>

    This works around current OpsCenter limitations in how throughput is calculated.
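
The staggered nodetool repair approach suggested in the first comment can be planned with a rough Python sketch like the one below. It assumes repairs run one node at a time with a fixed time budget per node; the function name, node names, and the 24-hour budget are hypothetical. The value 864000 seconds (10 days) is Cassandra's default gc_grace_seconds and should be replaced with the smallest value in the schema.

```python
def plan_staggered_repairs(nodes, hours_per_node, gc_grace_seconds):
    """Return [(node, start_offset_hours)] for one repair pass, one node at a time.

    Raises ValueError if the full pass would not fit within gc_grace_seconds.
    """
    total_hours = len(nodes) * hours_per_node
    if total_hours * 3600 > gc_grace_seconds:
        raise ValueError(
            f"full pass needs {total_hours}h, more than the "
            f"{gc_grace_seconds / 3600:.0f}h allowed by gc_grace_seconds"
        )
    # Each node starts when the previous node's time budget ends.
    return [(node, i * hours_per_node) for i, node in enumerate(nodes)]


# Hypothetical three-node cluster, 24 hours budgeted per node.
for node, offset in plan_staggered_repairs(["node1", "node2", "node3"], 24, 864000):
    print(f"{node}: start nodetool repair at +{offset}h")
```

The offsets can then be translated into crontab entries on each node.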
