Summary
After upgrading DSE Search from DSE 5.0 or earlier to DSE 5.1 or later, indexing errors might occur if you have previously used analyzed text fields with docValues. These indexing errors can cause invalid indexes on nodes, thereby causing incorrect query results. In some cases, unwanted performance issues can occur, particularly if there is a large number of indexing errors.
Important: docValues on analyzed text fields will always produce unreliable results, even in releases earlier than DSE 5.1. DataStax does not support using the TextField Solr field type with solr.KeywordTokenizer and solr.LowerCaseFilterFactory to achieve single-token, case-insensitive indexing on a CQL text field if you are using functions like grouping, sorting, or faceting.
Applies to
- DSE 5.1.0 to 5.1.13
- DSE 6.0.0 to 6.0.7
- DSE 6.7.0 to 6.7.2
Symptoms
A faceting query that worked in DSE 5.0 returns facet counts as expected, as in this example:
$ curl "http://10.101.33.246:8983/solr/wiki.solr/select/?q=id:999888000&facet=true&facet.field=title&facet.limit=10&wt=json&indent=true"
{
"responseHeader":{
"status":0,
"QTime":21},
"response":{"numFound":1,"start":0,"maxScore":8.079184,"docs":[
{
"id":"999888000",
"date":"01-JUN-2010 16:48:42.000",
"title":"MC testing solr"}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"title":[
"MC testing solr",1,
"(You're Puttin') A Rush on Me",0,
"1,2-Dithietane",0,
"1,3-Dithietane",0,
"1,4,2-Dithiazole",0,
"1700s in Scotland",0,
"1906 (film)",0,
"1921 (film)",0,
"1944 Sun Bowl",0,
"1950 Orange Bowl",0]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{}}}
After upgrading to DSE 5.1, the same kind of query returns an empty result for faceting. For example:
$ curl "http://10.101.34.15:8983/solr/wiki.solr/select/?q=id:999888000&facet=true&facet.field=title&wt=json&indent=true"
{
"responseHeader":{
"status":0,
"QTime":379},
"response":{"numFound":1,"start":0,"maxScore":8.079604,"docs":[
{
"id":"999888000",
"date":"01-JUN-2010 16:48:42.000",
"title":"MC testing solr"}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"title":[]},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}
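The symptom above can be checked programmatically. The following is a minimal Python sketch, not a DataStax tool: the function name empty_facet_fields is hypothetical, and the sample string is a trimmed copy of the post-upgrade response shown above. Note that an empty facet list is only suspicious when the query matched documents that have a value in that field.

```python
import json

def empty_facet_fields(response_text):
    """Return the names of facet fields whose bucket list is empty.

    Hypothetical helper: expects the JSON body of a Solr select
    request made with facet=true and wt=json.
    """
    body = json.loads(response_text)
    facet_fields = body.get("facet_counts", {}).get("facet_fields", {})
    return sorted(name for name, buckets in facet_fields.items() if not buckets)

# Trimmed copy of the DSE 5.1 response from this article.
sample = (
    '{"response": {"numFound": 1}, '
    '"facet_counts": {"facet_fields": {"title": []}}}'
)

print(empty_facet_fields(sample))
```

For the healthy DSE 5.0 response, where the title facet contains buckets, the same function returns an empty list.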
You may also observe higher than normal CPU usage on nodes, along with errors like the following in /var/log/cassandra/system.log. These errors might also start to occur during a reindex:
ERROR [wiki.solr Index WorkPool work thread-0] 2019-07-29 15:18:51,241 Cql3SolrSecondaryIndex.java:772 - [wiki.solr]: Exception writing document id 23743719 to the index; possible analysis error: cannot change DocValues type from SORTED to SORTED_SET for field "title"
org.apache.solr.common.SolrException: Exception writing document id 23743719 to the index; possible analysis error: cannot change DocValues type from SORTED to SORTED_SET for field "title"
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.indexDoc(CassandraDirectUpdateHandler.java:709)
at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.addDoc(CassandraDirectUpdateHandler.java:150)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.doIndex(AbstractSolrSecondaryIndex.java:1285)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.doUpdate(AbstractSolrSecondaryIndex.java:1007)
at com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex$2.run(Cql3SolrSecondaryIndex.java:761)
at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex$2.run(AbstractSolrSecondaryIndex.java:943)
at com.datastax.bdp.concurrent.Worker.run(Worker.java:86)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: cannot change DocValues type from SORTED to SORTED_SET for field "title"
at org.apache.lucene.index.FieldInfos$FieldNumbers.verifyConsistent(FieldInfos.java:291)
at org.apache.lucene.index.FieldInfos$FieldNumbers.setDocValuesType(FieldInfos.java:335)
at org.apache.lucene.index.DefaultIndexingChain.indexDocValueRT(DefaultIndexingChain.java:661)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:604)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:525)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:337)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1722)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:314)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:246)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:201)
... 12 common frames omitted
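To gauge how many of these errors a node is logging and which fields are affected, the log can be scanned for the message shown above. This is a minimal Python sketch under the assumption that the log lines follow the format in this article; the regular expression and the function name count_docvalues_errors are illustrative, not part of DSE.

```python
import re
from collections import Counter

# Matches only the leading ERROR line of each report, so the repeated
# message on the SolrException and "Caused by" lines is not counted twice.
PATTERN = re.compile(
    r'^ERROR \[(?P<core>\S+) Index WorkPool work thread-\d+\].*'
    r'cannot change DocValues type from (?P<old>\w+) to (?P<new>\w+) '
    r'for field "(?P<field>[^"]+)"'
)

def count_docvalues_errors(lines):
    """Count errors per (core, field, old type, new type)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match["core"], match["field"], match["old"], match["new"])
            counts[key] += 1
    return counts

# The first two lines of the log excerpt from this article.
sample_lines = [
    'ERROR [wiki.solr Index WorkPool work thread-0] 2019-07-29 15:18:51,241 '
    'Cql3SolrSecondaryIndex.java:772 - [wiki.solr]: Exception writing document '
    'id 23743719 to the index; possible analysis error: cannot change DocValues '
    'type from SORTED to SORTED_SET for field "title"',
    'org.apache.solr.common.SolrException: Exception writing document id '
    '23743719 to the index; possible analysis error: cannot change DocValues '
    'type from SORTED to SORTED_SET for field "title"',
]

for key, total in count_docvalues_errors(sample_lines).items():
    print(key, total)
```

To scan a real log, pass an open file object instead of the sample list, for example the /var/log/cassandra/system.log file mentioned above.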
Cause
Using TextField types with docValues is not recommended and does not reliably return correct facets when case-sensitive matching is involved. The values stored in docValues are not lowercased, so they produce incorrect matches for sorting, aggregation, and similar operations.
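The mismatch can be illustrated with a toy model in Python. This is only a sketch of the idea, not how DSE or Lucene store data: the analyze function stands in for the KeywordTokenizer plus lowercase filter chain, and the second and third titles are made-up variants for illustration.

```python
from collections import Counter

def analyze(value):
    """Stand-in for the analysis chain: one token, lowercased."""
    return value.lower()

titles = ["MC testing solr", "mc testing SOLR", "1906 (film)"]

# Indexed terms go through analysis, so case variants collapse together.
indexed_terms = Counter(analyze(title) for title in titles)

# In this model docValues hold the raw, unanalyzed values,
# so the same variants land in separate buckets.
doc_values = Counter(titles)

print(indexed_terms["mc testing solr"])  # 2
print(doc_values["mc testing solr"])     # 0
```

A search on the lowercased term matches two documents, while a facet or sort built from the raw values has no bucket for that term at all, which is the kind of inconsistency described above.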
Tip: DataStax recommends using the new LowerCaseStrField custom type introduced in DSE 5.1. The custom field type provides the following features:
- Converts the data into lowercase and correctly stores the lowercase data in docValues. (Note: docValues will default to false for all field types.)
- Converts the query values to lowercase.
Workaround
There are two ways to eliminate the error:
- Change the schema to set docValues=false for the field and reindex (without deleteAll). This change breaks any queries that rely on faceting, sorting, and similar functions, but you can then switch docValues back and reindex.
Or
- Reindex in place (without changing the schema) but use deleteAll=true.
Solution
The correct solution is to change the schema to use the LowerCaseStrField type for any text fields that you previously used with docValues, as described in the DataStax documentation Using LowerCaseStrField with search indexes.
Changing a field type in the schema requires a full reindex.