Summary
This article discusses an issue with starting the AlwaysOn SQL service.
Applies to
- DataStax Enterprise 6.7.x
- DataStax Enterprise 6.0.x
Symptom
By default, AlwaysOn SQL is disabled. Attempts to start the AlwaysOn SQL service are unsuccessful: the alwayson-sql start command shows "Starting" followed by a growing row of dots but never completes. For example:
$ dse client-tool alwayson-sql start
Starting . . . . . . . . . . . . . . . . . . . . . . . . .
The Spark Master web UI shows the service in the WAITING state.
The DSE logs do not indicate any progress either. Here is an example extract of debug.log from a DSE 6.7.2 node:
INFO [ALWAYSON-SQL] 2019-03-28 15:41:13,087 ALWAYSON-SQL AlwaysOnSqlRunner.scala:196 - Starting AlwaysOn SQL
INFO [dispatcher-event-loop-4] 2019-03-28 15:41:20,304 Logging.scala:54 - Registering app AlwaysOnSQL__Analytics__10.101.32.62
INFO [dispatcher-event-loop-4] 2019-03-28 15:41:20,305 Logging.scala:54 - Registered app AlwaysOnSQL__Analytics__10.101.32.62 with ID app-20190328154120-0001
INFO [dispatcher-event-loop-4] 2019-03-28 15:41:20,305 CassandraPersistenceEngine.scala:40 - Adding new object app_app-20190328154120-0001 of type org.apache.spark.deploy.master.ApplicationInfo
INFO [dse-rm-failure-detector-notifications-1] 2019-03-28 15:41:20,306 Logging.scala:54 - Application DseAppEndpointId(app-20190328154120-0001,Some(/10.101.32.62:35385),0 seconds) has been recognized as alive. Acknowledging.
INFO [dispatcher-event-loop-4] 2019-03-28 15:41:20,312 Logging.scala:54 - Application has been re-registered: app-20190328154120-0001
Eventually, the service fails to start:
DEBUG [ALWAYSON-SQL] 2019-03-28 15:43:16,525 ALWAYSON-SQL ServiceRunner.java:219 - Service ALWAYSON-SQL FAILED changing state: NOT_STARTED -> SHUTTING_DOWN
DEBUG [ALWAYSON-SQL] 2019-03-28 15:43:16,525 ALWAYSON-SQL ServiceRunner.java:219 - Service ALWAYSON-SQL FAILED changing state: STARTING -> SHUTTING_DOWN
DEBUG [ALWAYSON-SQL] 2019-03-28 15:43:16,525 ALWAYSON-SQL ServiceRunner.java:215 - Service ALWAYSON-SQL changing state: RUNNING -> SHUTTING_DOWN
DEBUG [ALWAYSON-SQL] 2019-03-28 15:43:16,525 ALWAYSON-SQL ServiceRunner.java:205 - Service ALWAYSON-SQL set state: RUNNING -> SHUTTING_DOWN
DEBUG [ALWAYSON-SQL] 2019-03-28 15:43:16,526 ALWAYSON-SQL ServiceRunner.java:215 - Service ALWAYSON-SQL changing state: SHUTTING_DOWN -> NOT_STARTED
DEBUG [ALWAYSON-SQL] 2019-03-28 15:43:16,526 ALWAYSON-SQL ServiceRunner.java:205 - Service ALWAYSON-SQL set state: SHUTTING_DOWN -> NOT_STARTED
Cause
In this instance, the AlwaysOn SQL service is in a WAITING state because it is waiting for resources to become available. The cluster in this example has only 6 cores available.
However, the service has been allocated more cores (7) than are available (6) in conf/spark-alwayson-sql.conf:
spark.executor.memory 1G
spark.executor.cores 7
Since there are insufficient resources available, the service never gets allocated as requested and does not succeed in starting.
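The mismatch described above can be caught before starting the service. The following is a minimal Python sketch, not part of DSE: parse_spark_conf and executor_fits are hypothetical helper names, only the whitespace-separated "key value" form shown in this article is handled, and the available core count (6 here) is assumed to have been read manually from the Spark Master web UI.

```python
def parse_spark_conf(text):
    """Parse whitespace-separated 'key value' lines.

    Blank lines and lines starting with '#' are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def executor_fits(settings, available_cores):
    """Return True if the explicitly requested executor cores fit.

    Returns True when spark.executor.cores is not set, because there is
    then no explicit request to compare against.
    """
    requested = settings.get("spark.executor.cores")
    if requested is None:
        return True
    return int(requested) <= available_cores


# Values taken from the example in this article.
conf_text = """
spark.executor.memory 1G
spark.executor.cores 7
"""

settings = parse_spark_conf(conf_text)
print(executor_fits(settings, available_cores=6))  # False: 7 requested, 6 available
```

With spark.executor.cores lowered to 2, the same check returns True for a cluster with 6 available cores.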
Solution
Verify the available resources when configuring the AlwaysOn SQL service in the same way that resource allocations are verified for jobs before being submitted.
If the number of Spark worker cores is not explicitly set, the DSE Resource Manager sets the number of system cores available to Spark workers on the node. By default, it is set to 70% of available system cores in dse.yaml:
resource_manager_options:
    worker_options:
        cores_total: 0.7
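As a rough illustration of this default, the Python sketch below estimates how many cores a Spark worker receives on a node. worker_cores is a hypothetical helper, and rounding down with a minimum of one core is an assumption made here, because this article does not state how DSE rounds the result.

```python
import math


def worker_cores(system_cores, cores_total=0.7):
    """Estimate Spark worker cores from the cores_total fraction.

    Assumption: the product is rounded down and never drops below 1.
    """
    return max(1, math.floor(system_cores * cores_total))


# Illustrative node sizes, not taken from the article.
for system_cores in (4, 8, 16):
    print(system_cores, "system cores ->", worker_cores(system_cores), "worker cores")
```

Comparing this estimate with the Cores value reported in the Spark Master web UI is a quick way to confirm how much the workers actually expose.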
In the example above, setting spark.executor.cores to a lower value, such as 2, is more appropriate and would allow the service to start successfully. Similarly, if the AlwaysOn SQL service is not explicitly allocated cores in conf/spark-alwayson-sql.conf, the DSE Resource Manager allocates 25% by default to the service workpool:
workpools:
    - name: alwayson_sql
      cores: 0.25
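To see what the 25% default can mean in practice, the following Python sketch derives a workpool's core share from a worker's core count. workpool_cores is a hypothetical helper. Treating the fraction as a share of the worker's cores and rounding down to a minimum of one core are assumptions, not behavior confirmed by this article.

```python
import math


def workpool_cores(worker_core_count, fraction=0.25):
    """Estimate the cores reserved for a workpool such as alwayson_sql.

    Assumption: the fraction applies to the worker's cores, and the
    result is rounded down but never drops below 1.
    """
    return max(1, math.floor(worker_core_count * fraction))


worker_core_count = 8  # illustrative value, not taken from the article
pool = workpool_cores(worker_core_count)
print("alwayson_sql workpool cores:", pool)
```

Under these assumptions, a request for more executor cores than the workpool holds would leave the service waiting, just as in the example above.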