Summary
On a 2-node cluster with separate workloads, one Solr node and one Spark node, a query run using the CassandraSQLContext (csc) fails with this error:
solr search queries only supported on the 'solr_query' field
Symptoms
The following is typical of the error seen in the spark shell:
scala> csc.sql("select * from keyspace1.table1 where col3 = 'ABCDEF'").show()
WARN 2016-05-25 10:41:55,490 org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 (TID 12, 172.31.29.142): java.io.IOException: Exception during execution of SELECT "col1", "col2", "col3", "col4", "col5" FROM "keyspace1"."table1" WHERE token("col3") <= ? AND "col3" = ? ALLOW FILTERING: Solr search queries only supported on the 'solr_query' field
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:215)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$13.apply(CassandraTableScanRDD.scala:229)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$13.apply(CassandraTableScanRDD.scala:229)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:12)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
Cause
The CassandraSQLContext (csc) does not correctly interpret a SELECT that filters on non-solr_query fields. If you need to run a SELECT on fields indexed by DSE Search (Solr) from the Spark shell using csc, the query must be run from a mixed-workload node where both Solr and Spark are enabled.
Solution
On clusters with separate workloads, use the Hive context instead. The CassandraSQLContext is being phased out starting with DSE 5, and the standard will be the Hive context, hc.
The same query above can be re-written as:
scala> hc.sql("select * from keyspace1.table1 where col3 = 'ABCDEF'").show()
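If an hc value is not already bound in your Spark shell, a minimal sketch of creating one is shown below. This assumes a Spark 1.x shell where sc (the SparkContext) is predefined; the keyspace1.table1 names are the placeholders used above, not real objects.

```scala
// Sketch only: requires a running Spark shell with a SparkContext (sc) available.
import org.apache.spark.sql.hive.HiveContext

// Create a HiveContext backed by the existing SparkContext.
val hc = new HiveContext(sc)

// Run the same query through the Hive context instead of csc.
hc.sql("select * from keyspace1.table1 where col3 = 'ABCDEF'").show()
```

The HiveContext routes the predicate through the standard Spark SQL planner rather than the connector path that rewrites it into a CQL filter, which is what triggers the solr_query error on separate-workload nodes.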