Overview
This article discusses how high usage of lightweight transactions (LWT) might contribute to cluster load and how to determine which application might be using LWTs.
Applies to
- DataStax Enterprise 6.7
- DataStax Enterprise 6.0
- DataStax Enterprise 5.1
- DataStax Enterprise 5.0
How do I determine which application is using LWTs?
If higher-than-normal latencies are seen in the cluster, but there is no corresponding increase in read or write latencies on any given user table, how do I determine which application might be causing the problem when I suspect an increase in LWTs?
Answer
LWTs are compare-and-set (CAS) operations used by the database to achieve serial consistency. In some cases this type of transaction is necessary, but it comes with a performance trade-off. For more details, see the DataStax Academy blog post on LWTs. The Paxos protocol requires several rounds of communication between replicas, so LWT usage should be planned carefully to avoid placing additional load on the cluster.
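For reference, any conditional write issued by an application is an LWT. The following is a minimal sketch (Scala, using the DataStax Java driver) of what such a statement looks like from the application side; the mc.numbers table and its id and value columns are assumed here purely for illustration:

import com.datastax.driver.core.Cluster

// Minimal sketch: conditional writes (IF NOT EXISTS / IF <condition>) are LWTs
// and are coordinated through Paxos, unlike plain INSERT/UPDATE statements.
object LwtExample {
  def main(args: Array[String]): Unit = {
    val cluster = Cluster.builder().addContactPoint("10.101.32.48").build()
    val session = cluster.connect()

    // Hypothetical table and columns, used only to illustrate the syntax.
    val rs = session.execute(
      "INSERT INTO mc.numbers (id, value) VALUES (1, 42) IF NOT EXISTS")
    println(s"applied: ${rs.wasApplied()}")   // false if the row already existed

    cluster.close()
  }
}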
All LWTs are logged in the system.paxos table. This table has a cf_id column that corresponds directly to the id column of the system_schema.tables table. Reviewing this table is a task well suited to DSE Analytics (Spark), and it does not require running Spark in the cluster you are investigating: you can instead run Spark on a standalone test node, point it at your cluster, and run the analysis query from there.
Example
This example shows two ways of running the analysis in the Spark shell: step by step, and as a single chained command. If you find that you need to run it regularly, you can use this method as a template for building a Spark application that you run with the dse spark-submit command (see the sketch at the end of this article).
First, start the DSE Spark shell on your test node and point it at the cluster you want to investigate:
scala> spark.conf.set("spark.cassandra.connection.host","10.101.32.48")
Then, run these commands:
scala> val paxos = spark.read.cassandraFormat("paxos", "system").load()
paxos: org.apache.spark.sql.DataFrame = [row_key: binary, cf_id: string ... 7 more fields]

scala> val tables = spark.read.cassandraFormat("tables", "system_schema").load()
tables: org.apache.spark.sql.DataFrame = [keyspace_name: string, table_name: string ... 18 more fields]

scala> val results = paxos.join(tables).where($"cf_id" === $"id")
results: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [row_key: binary, cf_id: string ... 27 more fields]

scala> results.select("keyspace_name","table_name","id").groupBy("keyspace_name","table_name").count().show
+-------------+------------+-----+
|keyspace_name|  table_name|count|
+-------------+------------+-----+
|        dsefs|transactions|   42|
|        dsefs|       locks|   26|
|   dse_leases|      leases|    1|
|           mc|     numbers|    2|
+-------------+------------+-----+
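If many tables show up, you can sort the counts so that the heaviest LWT users appear first. This is a small variation on the query above, not part of the original example:

scala> results.groupBy("keyspace_name","table_name").count().orderBy($"count".desc).show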
Alternatively, perform both steps in one command line:
scala> spark.read.cassandraFormat("paxos", "system").load().
     |   join(spark.read.cassandraFormat("tables", "system_schema").load()).
     |   where($"cf_id" === $"id").
     |   select("keyspace_name","table_name","id").
     |   groupBy("keyspace_name","table_name").
     |   count().show
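If this becomes a routine check, the same logic can be packaged into a standalone application and run with dse spark-submit, as mentioned above. The following is a minimal sketch only; the object name, application name, and hard-coded contact point are placeholders rather than part of the original example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._   // provides cassandraFormat on DataFrameReader
import org.apache.spark.sql.functions.desc

// Minimal sketch of the same analysis as a submittable Spark application.
object PaxosUsageReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("paxos-usage-report")
      .config("spark.cassandra.connection.host", "10.101.32.48")   // placeholder contact point
      .getOrCreate()

    val paxos  = spark.read.cassandraFormat("paxos", "system").load()
    val tables = spark.read.cassandraFormat("tables", "system_schema").load()

    // Join system.paxos to system_schema.tables and count LWT entries per table.
    paxos.join(tables, paxos("cf_id") === tables("id"))
      .groupBy("keyspace_name", "table_name")
      .count()
      .orderBy(desc("count"))
      .show(false)

    spark.stop()
  }
}

After building the jar with your usual build tool, you could then run it with something like dse spark-submit --class PaxosUsageReport <your-jar>.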