cassandra.range.batch.size Not Respected #143

Open
spmallette opened this issue Sep 19, 2013 · 0 comments

Not sure if this is a Faunus issue or a Titan issue (or something else, for that matter). I've been getting highly inconsistent edge counts with Faunus on a large graph, as well as many timeout exceptions when writing a Titan+Cassandra graph to a sequence file. Having read some mailing list posts and other odds and ends, I ended up adding this setting:

cassandra.range.batch.size=256

and all my timeout problems went away. However, when I do that, I get highly inconsistent edge counts. On this particular graph, the out-edge count exceeds the in-edge count by nearly half a billion.

When I remove that setting, I get timeout exceptions:

java.lang.RuntimeException: TimedOutException()
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
    at com.thinkaurelius.faunus.formats.titan.cassandra.TitanCassandraRecordReader.getProgress(TitanCassandraRecordReader.java:70)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:513)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:538)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
    ... 17 more

Interestingly, cassandra.range.batch.size seems to behave more like a cap than a batch size: when I have it set to 256, the degree distribution shows a maximum of roughly that size. I've confirmed with regular Gremlin that there definitely are vertices with edge counts exceeding 256.

If it behaved as a batch size, then this would solve my timeout problems. Anyway, I could be way off base on this... just reporting what I'm seeing, as I'm feeling really stuck right now.
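To make the cap-versus-batch distinction concrete, here is a minimal sketch in plain Java. It does not use the Cassandra or Faunus APIs; `RangeReadSketch`, `readCapped`, `readBatched`, and `fakeRow` are hypothetical names, and a row is simulated as an in-memory list of columns. It only illustrates the two behaviours being contrasted, not how either library actually reads a row.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeReadSketch {

    // Cap semantics (what I seem to be observing): return at most `limit`
    // columns and silently drop everything after that.
    static List<String> readCapped(List<String> row, int limit) {
        return new ArrayList<>(row.subList(0, Math.min(limit, row.size())));
    }

    // Batch semantics (what I expected): fetch `batchSize` columns per
    // simulated request and keep going until the row is exhausted.
    static List<String> readBatched(List<String> row, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < row.size()) {
            int end = Math.min(start + batchSize, row.size());
            result.addAll(row.subList(start, end)); // one simulated request
            start = end;
        }
        return result;
    }

    // Builds a fake vertex row with n edge columns (hypothetical helper).
    static List<String> fakeRow(int n) {
        List<String> row = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            row.add("edge-" + i);
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> row = fakeRow(1000);
        System.out.println("capped:  " + readCapped(row, 256).size());  // 256
        System.out.println("batched: " + readBatched(row, 256).size()); // 1000
    }
}
```

With a 1000-column row and a size of 256, the capped read sees only 256 columns, while the batched read makes four smaller requests and still sees all 1000.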

Interestingly, if I drop that setting the Faunus job fails, but at the point of failure the discrepancy between in/out vertex counts is far smaller.
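For reference, the reason a mismatch points at lost data: every edge has exactly one tail and one head, so over a complete read the total out-edge count must equal the total in-edge count. The sketch below is plain Java on a toy graph (`EdgeCountCheck`, `adjacency`, and `totalDegree` are hypothetical names, not Faunus or Titan code). It assumes, purely for illustration, that each vertex's edge list is truncated at a cap; that is enough to break the invariant, though it is not a confirmed diagnosis of what happens here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EdgeCountCheck {

    // Groups edges per vertex. Each edge is {tail, head}; with outgoing=true
    // the map is keyed by tail (out-edges), otherwise by head (in-edges).
    static Map<Integer, List<Integer>> adjacency(int[][] edges, boolean outgoing) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            int key = outgoing ? e[0] : e[1];
            int other = outgoing ? e[1] : e[0];
            List<Integer> list = adj.get(key);
            if (list == null) {
                list = new ArrayList<>();
                adj.put(key, list);
            }
            list.add(other);
        }
        return adj;
    }

    // Sums per-vertex degrees, counting at most `cap` entries per vertex.
    static long totalDegree(Map<Integer, List<Integer>> adj, int cap) {
        long total = 0;
        for (List<Integer> neighbours : adj.values()) {
            total += Math.min(neighbours.size(), cap);
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy graph: vertices 1..5 each point at vertex 0.
        int[][] edges = {{1, 0}, {2, 0}, {3, 0}, {4, 0}, {5, 0}};
        Map<Integer, List<Integer>> out = adjacency(edges, true);
        Map<Integer, List<Integer>> in = adjacency(edges, false);

        // Complete read: totals match (5 and 5).
        System.out.println(totalDegree(out, Integer.MAX_VALUE) + " / "
                + totalDegree(in, Integer.MAX_VALUE));
        // Per-vertex cap of 2: out stays 5, in drops to 2.
        System.out.println(totalDegree(out, 2) + " / " + totalDegree(in, 2));
    }
}
```

In this toy case the capped totals come out with more out-edges than in-edges because the one high-degree vertex is on the receiving end, which is the same direction of mismatch reported above.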

Referencing this, as it all seems to demonstrate that cassandra.range.batch.size is acting like a cap:

#99

Using Titan 0.3.1, btw.
