Summary
In some earlier versions of DSE Graph, after you drop a vertex or edge label from the schema, count queries can return different results depending on whether they run in OLTP or OLAP mode.
Applies to
DSE Graph and DSE Analytics workloads on:
- DataStax Enterprise 6.0.0-6.0.4
- DataStax Enterprise 6.7.0
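As a convenience, the affected ranges listed above can be checked in a script. The following is a minimal sketch, not an official tool: the names `AFFECTED_RANGES`, `parse_version`, and `is_affected` are hypothetical, and the ranges encode only what this article states (6.0.0-6.0.4 and 6.7.0). It makes no claim about versions outside those lines.

```python
import re

# Affected ranges as stated in this article: 6.0.0-6.0.4 and 6.7.0.
# Each entry is an inclusive (lowest, highest) pair of version tuples.
AFFECTED_RANGES = [
    ((6, 0, 0), (6, 0, 4)),
    ((6, 7, 0), (6, 7, 0)),
]


def parse_version(text):
    """Parse a leading 'major.minor.patch' into a tuple of ints.

    Anything after the third number (for example a build suffix) is ignored.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version_text):
    """Return True if the version falls in a range this article lists as affected."""
    version = parse_version(version_text)
    return any(low <= version <= high for low, high in AFFECTED_RANGES)
```

For example, `is_affected("6.0.3")` and `is_affected("6.7.0")` return `True`, while `is_affected("6.0.5")` and `is_affected("6.7.1")` return `False`.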
Symptoms
When a vertex or edge label is dropped from the graph schema, some of the underlying Cassandra records are not cleaned up. The OLTP TinkerPop graph engine ignores these orphaned records, but OLAP queries still see and count them.
Cause
The count differences are shown in this simple example of a graph with two vertices and two edges, created in the Gremlin console:
system.graph('test').create()
:remote config alias g test.g
schema.config().option("graph.allow_scan").set(true)
schema.propertyKey("name").Text().single().create()
schema.vertexLabel("person").partitionKey("name").create()
schema.edgeLabel("knows").multiple().create()
schema.edgeLabel("knows").connection("person", "person").add()
graph.addVertex(label, "person","name", "Matt")
graph.addVertex(label, "person","name", "Jim")
Matt = g.V().hasLabel('person').has('name','Matt').next()
Jim = g.V().hasLabel('person').has('name','Jim').next()
g.addE('knows').from(Matt).to(Jim)
g.addE('knows').from(Jim).to(Matt)
We have Matt and Jim, and they know each other. Next, drop the edge label 'knows':
schema.edgeLabel('knows').drop()
In regular OLTP mode, we'll see:
gremlin> g.E()
gremlin> g.E().count()
==>0
But if we switch to OLAP mode, we'll see a different count:
gremlin> :remote config alias g test.a
==>g=test.a
gremlin> g.E()
gremlin> g.E().count()
==>2
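The discrepancy above can be illustrated with a toy model. This is only a sketch for intuition and does not reflect DSE's actual storage layout or code: the `ToyGraph` class and all of its methods are hypothetical. It assumes that an OLTP read filters stored rows through the current schema, that an OLAP read scans every stored row, and that the buggy label drop removes the schema entry but leaves the rows behind.

```python
# Toy model only: rows are (element_kind, label) tuples, not real DSE storage.
class ToyGraph:
    def __init__(self):
        self.schema_labels = set()  # labels currently defined in the schema
        self.rows = []              # stored records

    def add_label(self, label):
        self.schema_labels.add(label)

    def add_edge(self, label):
        if label not in self.schema_labels:
            raise ValueError(f"unknown edge label: {label!r}")
        self.rows.append(("edge", label))

    def drop_label_buggy(self, label):
        """Model of the affected versions: the schema entry goes, the rows stay."""
        self.schema_labels.discard(label)

    def oltp_edge_count(self):
        """Schema-aware read: rows whose label is not in the schema are ignored."""
        return sum(
            1 for kind, label in self.rows
            if kind == "edge" and label in self.schema_labels
        )

    def olap_edge_count(self):
        """Raw scan: every stored edge row is counted."""
        return sum(1 for kind, _ in self.rows if kind == "edge")

    def clean_up(self):
        """Model of a clean up step: remove rows whose label has no schema entry."""
        self.rows = [row for row in self.rows if row[1] in self.schema_labels]


graph = ToyGraph()
graph.add_label("knows")
graph.add_edge("knows")  # Matt knows Jim
graph.add_edge("knows")  # Jim knows Matt
graph.drop_label_buggy("knows")
print(graph.oltp_edge_count(), graph.olap_edge_count())  # prints "0 2"
graph.clean_up()
print(graph.olap_edge_count())  # prints "0"
```

In this model the two counts agree again only after the orphaned rows are removed, which mirrors the purpose of the clean up step described under Solution.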
Workaround
To work around this issue in DSE 6.0.0-6.0.4 and DSE 6.7.0, use the OLAP engine instead of the OLTP TinkerPop engine to drop vertex or edge labels. For example, using OLAP:
a.V().hasLabel('bought').drop()
a.E().hasLabel('bought').drop()
Solution
This issue (DSP-15885) is fixed in DSE 6.0.5 and DSE 6.7.1.
1. Upgrade to DSE 6.0.5 or later, or to DSE 6.7.1 or later.
2. Use the new clean up tool to remove the orphaned database rows.
- Gremlin console:
graph.cleanUp()
- From Spark:
spark.dseGraph("graphName").cleanUp()
3. Run repair on the graph keyspaces. For example, for a graph keyspace named gods:
nodetool repair -pr gods