DataStax Help Center

New messages about the use of batches in Apache Cassandra 2.1

Summary

This document explains the new batch-related warning messages logged by Apache Cassandra 2.1 and describes the correct use of batches.

Symptoms

In DSE 4.7 (Cassandra 2.1) a new message is reported in the log if an incorrect use of a batch is detected:

WARN [Thrift:16003] 2015-07-15 12:40:46,528 NoSpamLogger.java:94 - Unlogged batch covering 12 partitions detected against table [tablename]. You should use a logged batch for atomicity, or asynchronous writes for performance.

Cause

A batch that inserts data across many partition keys has been detected.

As the message suggests, batches should only be used when atomicity (consistency across tables) is needed. In such cases, very small logged batches are fine.
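For example, a logged batch can keep two denormalized tables consistent when both hold a copy of the same data. A minimal sketch (the table and column names below are hypothetical):

```
BEGIN BATCH
  INSERT INTO users (user_id, email, name)
    VALUES (42, 'ana@example.com', 'Ana');
  INSERT INTO users_by_email (email, user_id, name)
    VALUES ('ana@example.com', 42, 'Ana');
APPLY BATCH;
```

Cassandra guarantees that either both inserts eventually apply or neither does; it does not make the batch isolated or faster.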

In general, batches should be avoided when loading data. They do not provide any performance advantage and can easily overload the coordinator node. Even unlogged batches are not recommended, especially when the statements in the batch do not target the same partition key.
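The one case where an unlogged batch is reasonable is when every statement targets the same partition, because the coordinator can then apply the whole batch as a single mutation on one replica set. A hypothetical example (table and values are illustrative):

```
BEGIN UNLOGGED BATCH
  INSERT INTO sensor_readings (sensor_id, ts, value)
    VALUES ('s1', '2015-07-15 12:00:00', 20.1);
  INSERT INTO sensor_readings (sensor_id, ts, value)
    VALUES ('s1', '2015-07-15 12:01:00', 20.4);
APPLY BATCH;
```

Both rows share the partition key 's1', so no cross-partition coordination is needed. A batch spanning 12 partitions, as in the warning above, forces the coordinator to fan writes out to many replica sets.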

Solution

Asynchronous writes (for example, a prepared INSERT statement executed through one of the recent token-aware drivers) are recommended when inserting data into the cluster, and should always be preferred over batches when consistency across tables is not required.
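The asynchronous pattern looks roughly like the sketch below. FakeSession, the table name, and the window size are assumptions for illustration; with the real DataStax Python driver you would obtain a Session from a Cluster, session.prepare() returns a PreparedStatement, and session.execute_async() returns a ResponseFuture whose result() blocks until the write completes.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real driver session so this sketch runs without a cluster.
# It records each write instead of sending it to Cassandra.
class FakeSession:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=4)
        self.written = []

    def prepare(self, query):
        return query  # the real driver returns a PreparedStatement here

    def execute_async(self, prepared, params):
        # The real driver returns a ResponseFuture; a stdlib Future is close enough.
        return self.pool.submit(self.written.append, params)

def load_rows(session, rows, window=100):
    """Insert rows with asynchronous writes, keeping at most `window` in flight."""
    prepared = session.prepare(
        "INSERT INTO users (user_id, name) VALUES (?, ?)")
    futures = []
    for row in rows:
        futures.append(session.execute_async(prepared, row))
        if len(futures) >= window:   # simple backpressure: wait for the window
            for f in futures:
                f.result()           # blocks, and surfaces any write error
            futures = []
    for f in futures:                # drain the final partial window
        f.result()

session = FakeSession()
load_rows(session, [(i, "user-%d" % i) for i in range(250)])
```

Each insert travels to its own coordinator independently (and, with a token-aware driver, directly to a replica), so no single node has to buffer the whole load as it would with a large batch.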

You can read more about the correct use of batches in Cassandra on the following pages:

http://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html

https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e

 


Comments

  • Alvin Lam

    RE: batches should only be used when atomicity (consistency across tables) is needed

    Aren't there some small windows of vulnerability where a batch is not 100% atomic?

    For example, a batch of inserts is considered successful once a hint is stored on the coordinator. If that coordinator node fails before the hint is delivered, the mutation is never sent, leaving the data inconsistent.
