Summary
This article illustrates the need for precision when filtering on a timestamp column.
Symptoms
Consider the following table:
CREATE TABLE test.timestamp_table (
col1 int,
col2 timestamp,
PRIMARY KEY (col1, col2)
) WITH CLUSTERING ORDER BY (col2 ASC);
The table contains the following data:
col1 | col2
------+---------------------------------
5 | 2018-06-05 12:32:44.000000+0000
10 | 2018-06-10 12:32:44.000000+0000
13 | 2018-06-13 12:32:44.000000+0000
15 | 2018-06-14 13:22:07.636000+0000
1 | 2018-06-01 12:32:44.000000+0000
(5 rows)
Attempts to read the data do not return expected results:
cqlsh:test> SELECT * FROM timestamp_table WHERE col1=5 AND col2='2018-06-05 12:32:44';
col1 | col2
------+------
(0 rows)
cqlsh:test> SELECT * FROM timestamp_table WHERE col1=15 AND col2='2018-06-14 13:22:07+0000';
col1 | col2
------+------
(0 rows)
Cause
The timestamp
data type is encoded as the number of milliseconds since epoch, i.e. since January 1 1970 00:00:00 GMT. This is an important distinction when filtering data on a timestamp column as precision is required.
This is highlighted by the fact that when the clients send the request, it gets transformed server-side using the server’s locale. Without specifying the 4-digit timezone (RFC-822 format), the server’s local timezone gets used for the query.
In addition if the data has a higher precision than the filter (e.g. filtering to seconds when the data has millisecond precision), the query fails to get a match and does not return the expected data.
Solution
Specifying the timezone in the predicate is recommended for best practice. From the example above, the correct way to query the data is by adding the equivalent 4-digit timezone (e.g. +0000
for UTC):
cqlsh:test> SELECT * FROM timestamp_table WHERE col1=5 AND col2='2018-06-05 12:32:44+0000';
col1 | col2
------+---------------------------------
5 | 2018-06-05 12:32:44.000000+0000
(1 rows)
It is also recommended to query the data with the highest precision. For example, the predicate should resolve to the millisecond to match the data (e.g. instead of just 13:22:07
, specify 13:22:07.636
):
cqlsh:test> SELECT * FROM timestamp_table WHERE col1=15 AND col2='2018-06-14 13:22:07.636+0000';
col1 | col2
------+---------------------------------
15 | 2018-06-14 13:22:07.636000+0000
(1 rows)
See also
JIRA - CASSANDRA-10428 cqlsh: Include sub-second precision in timestamps by default