DataStax Help Center

How does Spark HA work in DSE 4.7

Summary and Terminology

This technote explains how the SparkMaster automatic High Availability (HA) functionality works within DataStax Enterprise 4.7 (DSE 4.7).

DSE's integration of Spark uses the same node as Hadoop's JobTracker for the SparkMaster. This means that when using Spark with DSE, with or without Hadoop, the JobTracker (JT) is synonymous with the SparkMaster and the Reserve JobTracker is synonymous with the Reserve SparkMaster.

The commands and notation (JT/RT/PT) used for defining and identifying the SparkMaster via dsetool are the same for DSE's implementation of both Spark and Hadoop. This is described further in the Spark Introduction section of the DSE Documentation.

Leader Election

DSE 4.7 uses leader election to determine the SparkMaster automatically. This means you do not have to explicitly set the JobTracker or Reserve JobTracker as SparkMaster, because the leader election mechanism will automatically designate a node in your Spark ring as master.

The node that has been elected can be determined by one of the following five methods:

cql: select address from dse_system.real_leaders;

mbean: com.datastax.bdp.plugin.SparkPlugin.MasterAddress

dsetool ring (the workload column explicitly states (JT))

dsetool listjt (the Analytics-ACTUAL entry)

dsetool sparkmaster

If no Reserve SparkMaster has been defined (dsetool setrjt) and the existing master goes offline, the leader election mechanism will automatically assign a new SparkMaster. The new master can be identified via the same methods as above.

Therefore there is no strict requirement to specify a primary and reserve JobTracker to act as SparkMaster.

There is a caveat to this approach, however:

Automatic election only works if a LOCAL QUORUM of nodes in the ring is UP. For example, if you have 5 nodes in your DC and 3 are down, automatic election will not take place. If one of the DOWN nodes was your master, no new master is elected until a LOCAL QUORUM of nodes is available again.
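The arithmetic behind the 5-node example can be sketched as follows. This is a minimal illustration, assuming that a LOCAL QUORUM here means a simple majority of the nodes in the datacenter (which matches the example above); the function names are hypothetical and are not part of DSE.

```python
def local_quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a majority (assumed quorum rule)."""
    return total_nodes // 2 + 1


def can_auto_elect(total_nodes: int, down_nodes: int) -> bool:
    """True if enough nodes are UP for automatic election to take place."""
    up_nodes = total_nodes - down_nodes
    return up_nodes >= local_quorum(total_nodes)


print(local_quorum(5))       # 3
print(can_auto_elect(5, 2))  # True: 3 nodes UP meets the quorum of 3
print(can_auto_elect(5, 3))  # False: only 2 nodes UP
```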

To overcome this (within limits), you can still explicitly define a SparkMaster (JT) and Reserve SparkMaster (RT) using the dsetool setjt and dsetool setrjt commands.

By doing this you effectively bypass the need for a LOCAL QUORUM of UP nodes for failover to occur, provided that at least one of the SparkMaster or Reserve SparkMaster nodes remains online.


If you define a primary and reserve SparkMaster node, you can identify them like so:

cql: select candidates from dse_system.registered_leaders;

candidates
------------------------------------------------------------
{'PRIMARY': '172.31.17.246', 'SECONDARY': '172.31.28.113'}

dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Workload Status State Load Owns Token
172.31.22.168 Analytics rack1 Analytics(TT) Up Normal 82.97 KB 20.00% -9223372036854775808
172.31.28.108 Analytics rack1 Analytics(TT) Up Normal 133.3 KB 20.00% -5534023222112865485
172.31.17.246 Analytics rack1 Analytics(JT) Up Normal 147.52 KB 20.00% -1844674407370955162
172.31.28.113 Analytics rack1 Analytics(RT) Up Normal 97.39 KB 20.00% 1844674407370955161
172.31.19.66  Analytics rack1 Analytics(TT) Up Normal 143.23 KB 20.00% 5534023222112865484

where:
(JT) is the active Job Tracker (SparkMaster)
(PT) is an inactive primary tracker, when the primary tracker is down
(RT) is an inactive reserve tracker, when the reserve tracker is up while there is a primary tracker

dsetool listjt
DC                  JobTracker
Analytics-ACTUAL    172.31.17.246
Analytics-SECONDARY 172.31.28.113

In the above output, Analytics-ACTUAL represents the current SparkMaster (which is our defined primary, as per the dse_system.registered_leaders output).
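If you want to pick the ACTUAL entry out of captured dsetool listjt output in a script, something like the following sketch may help. It assumes the two-column text layout shown in this article (a "DC JobTracker" header, then one "<datacenter>-<role> <address>" pair per line); parse_listjt is a hypothetical helper, not a DSE API.

```python
def parse_listjt(output: str) -> dict:
    """Map (datacenter, role) -> address from `dsetool listjt` text."""
    trackers = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or "-" not in parts[0]:
            continue  # skips blank lines and the "DC JobTracker" header
        # Split on the last hyphen so datacenter names may contain hyphens.
        datacenter, role = parts[0].rsplit("-", 1)
        trackers[(datacenter, role)] = parts[1]
    return trackers


# Sample text copied from the listjt output in this article.
sample = """DC                  JobTracker
Analytics-ACTUAL    172.31.17.246
Analytics-SECONDARY 172.31.28.113
"""

trackers = parse_listjt(sample)
print(trackers[("Analytics", "ACTUAL")])     # 172.31.17.246
print(trackers[("Analytics", "SECONDARY")])  # 172.31.28.113
```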

If the primary and reserve masters both go down, then as long as a LOCAL QUORUM of nodes is UP, the leader election process will automatically elect a new master.

For example:

PRIMARY goes down. We switch to the secondary. The secondary is now listed as Analytics-ACTUAL.

dsetool listjt
DC                JobTracker
Analytics-PRIMARY 172.31.17.246
Analytics-ACTUAL  172.31.28.113

dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Workload Status State Load Owns Token
172.31.22.168 Analytics rack1 Analytics(TT) Up   Normal 82.97  KB 20.00% -9223372036854775808 
172.31.28.108 Analytics rack1 Analytics(TT) Up   Normal 133.3  KB 20.00% -5534023222112865485 
172.31.17.246 Analytics rack1 Analytics(PT) Down Down   147.52 KB 20.00% -1844674407370955162 
172.31.28.113 Analytics rack1 Analytics(JT) Up   Normal 97.39  KB 20.00%  1844674407370955161 
172.31.19.66  Analytics rack1 Analytics(TT) Up   Normal 143.23 KB 20.00%  5534023222112865484

*NOTE* the (PT) designation for the down primary.

CQL confirms the current leader (or master):

cqlsh> select address from dse_system.real_leaders;

address
---------------
172.31.28.113

SECONDARY goes down, primary remains down. As long as a LOCAL QUORUM of nodes is UP, we automatically elect a new master.

dsetool listjt
DC                  JobTracker
Analytics-PRIMARY   172.31.17.246
Analytics-ACTUAL    172.31.19.66
Analytics-SECONDARY 172.31.28.113

*NOTE* we now have a new ACTUAL master because both the defined primary JT and reserve JT are down.

The dsetool ring output looks like so:

dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Workload Status State Load Owns Token
172.31.22.168 Analytics rack1 Analytics(TT) Up Normal 82.97 KB 20.00% -9223372036854775808 
172.31.28.108 Analytics rack1 Analytics(TT) Up Normal 133.3 KB 20.00% -5534023222112865485 
172.31.17.246 Analytics rack1 Analytics(PT) Down Down 147.52 KB 20.00% -1844674407370955162 
172.31.28.113 Analytics rack1 Analytics(RT) Down Down 97.39 KB 20.00% 1844674407370955161 
172.31.19.66  Analytics rack1 Analytics(JT) Up Normal 143.23 KB 20.00% 5534023222112865484 


*NOTE* the ring output now shows equivalent info, both (PT) and (RT) are down but we have a newly elected (JT)

CQL confirms the current leader (or master):

cqlsh> select address from dse_system.real_leaders;

address
--------------
172.31.19.66

If at this point the newly elected ACTUAL JT goes down as well, leaving us with 3 nodes out of 5 down, no new JT would be elected until one of the 3 down nodes is brought back online.

PRIMARY comes back up. It becomes the master again.

dsetool listjt
DC                  JobTracker
Analytics-ACTUAL    172.31.17.246
Analytics-SECONDARY 172.31.28.113

dsetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Workload Status State Load Owns Token
172.31.22.168 Analytics rack1 Analytics(TT) Up Normal 82.97 KB 20.00% -9223372036854775808 
172.31.28.108 Analytics rack1 Analytics(TT) Up Normal 133.3 KB 20.00% -5534023222112865485 
172.31.17.246 Analytics rack1 Analytics(JT) Up Normal 175.5 KB 20.00% -1844674407370955162 
172.31.28.113 Analytics rack1 Analytics(RT) Down Down 97.39 KB 20.00%  1844674407370955161 
172.31.19.66  Analytics rack1 Analytics(TT) Up Normal 148.2 KB 20.00%  5534023222112865484 

CQL confirms the current leader (or master):

cqlsh> select address from dse_system.real_leaders;

address
---------------
172.31.17.246

Order of Precedence in leader election:

The Primary and Reserve nodes take precedence in determining the SparkMaster, but you are not limited to them: a third node (or potentially a fourth, fifth, and so on) can be elected master if the primary and reserve nodes are unavailable, as long as a LOCAL QUORUM of nodes is UP in the ring.
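The precedence described in this article can be modelled roughly as below. This is only an illustrative sketch of the behaviour documented here, not DSE's actual election code: pick_master is a hypothetical function, the quorum is assumed to be a simple majority of nodes, and the choice of which remaining node wins the automatic election is an arbitrary placeholder.

```python
from typing import Iterable, Optional


def pick_master(all_nodes: Iterable[str], up_nodes: Iterable[str],
                primary: Optional[str] = None,
                reserve: Optional[str] = None) -> Optional[str]:
    """Model of the precedence: primary, then reserve, then any UP node."""
    total = len(list(all_nodes))
    up = set(up_nodes)
    # Explicitly defined nodes take precedence and do not need a quorum.
    for candidate in (primary, reserve):
        if candidate is not None and candidate in up:
            return candidate
    # Automatic election needs a majority of nodes UP (assumed quorum rule).
    if up and len(up) >= total // 2 + 1:
        return min(up)  # ASSUMPTION: placeholder; the real choice is up to DSE
    return None


nodes = ["172.31.22.168", "172.31.28.108", "172.31.17.246",
         "172.31.28.113", "172.31.19.66"]
primary, reserve = "172.31.17.246", "172.31.28.113"

# Primary down: the reserve takes over.
print(pick_master(nodes, [n for n in nodes if n != primary], primary, reserve))
# Primary, reserve and one more node down: no quorum, so no master (None).
print(pick_master(nodes, ["172.31.22.168", "172.31.28.108"], primary, reserve))
```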

What should I use to programmatically determine the current SparkMaster?

From a programmatic viewpoint, you are better served using CQL to identify the SparkMaster, because the mbean requires a LOCAL QUORUM of UP nodes to return a value. If a LOCAL QUORUM of nodes is down but your current JT remains online, the CQL select should continue to give you the correct value.
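As a rough sketch of the CQL approach, the helper below runs the same select used throughout this article against any session object that exposes an execute() method returning rows with an address attribute. The helper name and the stand-in session are hypothetical; the commented lines at the end assume the DataStax Python driver (cassandra-driver) is installed and that one of your nodes is reachable at the contact point shown.

```python
from collections import namedtuple


def current_spark_master(session):
    """Return the first address in dse_system.real_leaders, or None."""
    rows = session.execute("select address from dse_system.real_leaders")
    for row in rows:
        return str(row.address)
    return None


# Stand-in session so the sketch runs without a cluster (hypothetical).
Row = namedtuple("Row", ["address"])


class FakeSession:
    def execute(self, query):
        return [Row(address="172.31.28.113")]


print(current_spark_master(FakeSession()))  # 172.31.28.113

# Against a real cluster (assumes cassandra-driver is installed):
#   from cassandra.cluster import Cluster
#   session = Cluster(["172.31.22.168"]).connect()
#   print(current_spark_master(session))
```

Returning None when no row comes back lets the caller decide whether to retry or raise an alert.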

Reference Material:

Managing the Job Tracker using dsetool commands

About the highly available Spark Master

 
