DataStax Help Center

Running Solr queries via cqlsh and being aware of tokenizers

Summary

Solr queries can be run via cqlsh, as outlined in our documentation. However, you must be aware of the tokenizers used for the fields being queried, because they can cause unexpected results to be returned.

Symptoms

Take a simple example with a common tokenizer in the Solr schema.xml:

<tokenizer class="solr.StandardTokenizerFactory"/>

 

When searching in cqlsh, an unexpected match can occur if, for example, the "-" character is used in the query string:

cqlsh:dse> select guid, posted_by from my_table where solr_query='posted_by:no-one-ever-posted-this-3326' limit 5;

 guid                                 | posted_by
--------------------------------------+--------------------------------------
 42519b89-331f-3ae6-9fb7-6a851166cf1d | 81cac0b5-78cb-3326-93b9-d6a6ccc5ed47
 42519b89-331f-3ae6-9fb7-6a851166cf1d | 81cac0b5-78cb-3326-93b9-d6a6ccc5ed47
 42519b89-331f-3ae6-9fb7-6a851166cf1d | 81cac0b5-78cb-3326-93b9-d6a6ccc5ed47
 42519b89-331f-3ae6-9fb7-6a851166cf1d | 81cac0b5-78cb-3326-93b9-d6a6ccc5ed47
 42519b89-331f-3ae6-9fb7-6a851166cf1d | 81cac0b5-78cb-3326-93b9-d6a6ccc5ed47

Cause

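In the above example, the StandardTokenizer splits each value on the "-" character and stores the pieces in the Solr index as separate strings (tokens). The query string is split in the same way, so the query matches on the individual tokens rather than on the whole value. Here, the token 3326 from the query string matches the token 3326 in the stored value 81cac0b5-78cb-3326-93b9-d6a6ccc5ed47.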
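The overlap can be illustrated with a small Python sketch. This is only a rough approximation of the tokenizer's behavior for values like these, not Lucene's actual implementation: the hypothetical helper approx_standard_tokens simply splits on any non-alphanumeric character.

```python
import re


def approx_standard_tokens(text):
    """Rough stand-in for StandardTokenizer (assumption): split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", text) if token]


# Values taken from the cqlsh example in this article.
stored_value = "81cac0b5-78cb-3326-93b9-d6a6ccc5ed47"
query_value = "no-one-ever-posted-this-3326"

stored_tokens = approx_standard_tokens(stored_value)
query_tokens = approx_standard_tokens(query_value)

# Any token present on both sides is enough for the query to hit the row.
shared_tokens = sorted(set(stored_tokens) & set(query_tokens))

print(stored_tokens)
print(query_tokens)
print(shared_tokens)
```

The only shared token is 3326, which is consistent with every returned row having 3326 in its posted_by value.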

Tip: The analysis tool in the Solr Admin UI shows how a value is tokenized and stored in the index.

Solution

Ensure that the query takes the tokenizer into account, and/or implement a different tokenizer that suits the type of queries required. For example, solr.KeywordTokenizerFactory keeps the entire field value as a single token.
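One way to make the query take the tokenizer into account is to search for the value as a quoted phrase, so that the tokens must appear next to each other and in the same order instead of matching individually. The Python sketch below only builds the statement text. The helper names phrase_solr_query and cql_string_literal are hypothetical, and whether a phrase query is enough depends on how the field's analyzer is configured.

```python
def phrase_solr_query(field, value):
    """Build a Solr query that searches `field` for `value` as a quoted phrase."""
    # Escape backslashes and double quotes so the value stays inside the phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def cql_string_literal(text):
    """Wrap text in single quotes, doubling any embedded single quote (CQL escaping)."""
    return "'" + text.replace("'", "''") + "'"


solr_query = phrase_solr_query("posted_by", "no-one-ever-posted-this-3326")
statement = (
    "select guid, posted_by from my_table "
    f"where solr_query={cql_string_literal(solr_query)} limit 5;"
)
print(statement)
```

With the example values, the stored token 3326 sits between 78cb and 93b9, so the phrase "no one ever posted this 3326" should no longer match that row.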

Further reading

https://cwiki.apache.org/confluence/display/solr/Tokenizers

https://cwiki.apache.org/confluence/display/solr/Analysis+Screen
