Summary
This article discusses a scenario when CQL Solr queries can return null values unexpectedly.
Applies to
- DataStax Enterprise 5.1
- DataStax Enterprise 6.0
- DataStax Enterprise 6.7
Symptoms
When running a Solr CQL query, a non-indexed field returns 'null' even though a value exists for the field. Consider the following table:
CREATE TABLE customer.orders (
item_id UUID,
item text,
lastname text,
firstname text,
solr_query text,
order_type text,
PRIMARY KEY (item_id, lastname));
For this example, we'll create a search index (Solr core) for the table using the following custom schema.xml file:
$ cat customer_orders_schema.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.TrieDateField" name="TrieDateField"/>
<fieldType class="org.apache.solr.schema.UUIDField" name="UUIDField"/>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="org.apache.solr.schema.BoolField" name="BoolField"/>
<fieldType class="org.apache.solr.schema.TextField" name="TextField">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field indexed="true" multiValued="false" docValues="true" name="item_id" stored="true" type="UUIDField"/>
<field indexed="true" multiValued="false" docValues="true" name="lastname" stored="true" type="StrField"/>
<field indexed="true" multiValued="false" docValues="true" name="firstname" stored="true" type="StrField"/>
</fields>
<uniqueKey>(item_id,lastname)</uniqueKey>
</schema>
Next, we'll use dsetool to create the Solr core:
$ dsetool create_core customer.orders reindex=true schema=./customer_orders_schema.xml solrconfig=./customer_orders_config.xml
Now we'll insert data into the table:
insert into customer.orders (item_id, item, lastname, firstname, order_type) values (a6f94417-e27b-444f-9d0c-dccb588e421f, 'grill', 'Smith', 'Joe', 'app');
Running the following CQL Solr query returns 'null' for the 'order_type' field instead of the inserted value of 'app':
cqlsh> SELECT item_id, lastname, firstname, order_type FROM customer.orders WHERE item_id = a6f94417-e27b-444f-9d0c-dccb588e421f and solr_query='{"q": "*:*","fq":"(firstname:Joe)"}';
item_id | lastname | firstname | order_type
--------------------------------------+----------+-----------+------------
a6f94417-e27b-444f-9d0c-dccb588e421f | Smith | Joe | null
(1 rows)
Cause
Solr single-pass CQL queries and HTTP queries use the custom Lucene stored fields codec which use the Solr schema to discover which fields to return.
There are a number of ways to generate a single-pass query instead of the standard two-pass query, without realizing it is happening:
- Specify the distrib.singlePass Boolean parameter in the query expression:
SELECT * FROM keyspace.table WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}'
- Use a token() or partition key restriction in the WHERE clause:
SELECT * FROM keyspace.table WHERE token(id) >= 3074457345618258604 AND solr_query = '{"q" : "*:*"}'
- Execute a COUNT query:
SELECT count(*) FROM keyspace.table WHERE solr_query = '{"q" : "*:*"}'
- Specify only the primary key elements in the SELECT clause, using the table from the example above, we could have:
SELECT item_id FROM customer.orders WHERE solr_query='{"q": "*:*","fq":"(item_id:a6f94417-e27b-444f-9d0c-dccb588e421f)"}';
Solution
If you end up with a Solr single-pass CQL query through any of the above methods, the custom Lucene stored fields codec will be used.
That is, they use our custom Lucene stored fields codec, rather than just reading entire CQL rows as non-single pass CQL Solr queries would because the codec only knows about fields, indexed or not, that exist in the schema.xml.
Using the example above, we could add the following line to the 'customer_orders_schema.xml' file:
<field indexed="false" multiValued="false" docValues="false" name="order_type" stored="true" type="StrField"/>
After changing the file, we would need to reload the core:
dsetool reload_core customer.orders reindex=false schema=./customer_orders_schema.xml solrconfig=./customer_orders_config.xml
You'll notice that the same query returning 'null' for the 'order_type' field above, will now return the stored value of 'app' correctly:
cqlsh> SELECT item_id, lastname, firstname, order_type FROM customer.orders WHERE item_id = a6f94417-e27b-444f-9d0c-dccb588e421f and solr_query='{"q": "*:*","fq":"(firstname:Joe)"}';
item_id | lastname | firstname | order_type
--------------------------------------+----------+-----------+------------
a6f94417-e27b-444f-9d0c-dccb588e421f | Smith | Joe | app
(1 rows)
We could also re-write the query to avoid generating a single-pass query to achieve the same results:
cqlsh> SELECT item_id, lastname, firstname, order_type FROM customer.orders WHERE solr_query='{"q": "*:*","fq":"(item_id:a6f94417-e27b-444f-9d0c-dccb588e421f)"}';
item_id | lastname | firstname | order_type
--------------------------------------+----------+-----------+------------
a6f94417-e27b-444f-9d0c-dccb588e421f | Smith | Joe | app
(1 rows)
*Note that if you use the generateResources=true option when creating the Solr core, it will include all table fields in the schema.xml file by default, for example:
dsetool create_core keyspace.table generateResources=true
In DSE 5.1 and earlier, the stored="true" property is required in the schema.xml file for the field to be returnable in a query.
In DSE 6.0 and later, the field must be present but with indexed=false in the schema.xml.