Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 67: Update Elasticsearch and improve geoshape indexing #79

Merged
merged 5 commits into from
Apr 14, 2017

Conversation

sjudeng
Copy link
Contributor

@sjudeng sjudeng commented Feb 3, 2017

Features

Compatibility

ES Client ES Server Version
Supported Tested
HTTP REST 2.x,5.x 5.3.0,5.2.2,5.2.1,5.1.2,5.0.2,2.4.4,2.3.5,2.2.2*,2.1.2*,2.0.2*
Transport 2.x 2.4.4,2.3.5,2.2.2*,2.1.2*,2.0.2*

* Pre-2.3.0 versions do not support indexing non-point geometries or contains operations and require additional dependency exclusions to satisfy dependency convergence checks

Breaking Changes

  • Elasticsearch 1.x is no longer supported. Justification is that currently Elasticsearch is on version 5.2 and 1.7.x was end of life as of 2017-01-16 (1.5.x EOL date was 2016-09-23). See https://www.elastic.co/support/eol for more information.
  • Node client is no longer supported. As a result support for starting embedded Elasticsearch instances is limited. Embedded instances are managed automatically in testing and when using JanusGraph server (janusgraph.sh). In all other cases Elasticsearch must already be running to use with JanusGraph. As a mitigation and convenience, an Elasticsearch distribution is packaged with JanusGraph. See the updated JanusGraph Elasticsearch documentation page for more information.
  • TTL is no longer supported. TTL was deprecated in 2.0 and removed in 5.0 with guidance to migrate to an index-per-timeframe approach (see https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-ttl-field.html)
  • The default client/port was changed from Transport/9300 to REST/9200
  • Geoshape Kryo serialization now requires a custom serializer. The serializer, org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator is defined in the janusgraph-hadoop-core module to avoid adding an (unshaded) Kryo dependency to janusgraph-core. This serializer must be registered with Spark when indexing Geoshape properties.
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator

Dependency Updates

Dependencies were updated as shown in the following table.

Dependency Previous Version New Version
Elasticsearch 1.5.1 2.4.4
Lucene 4.10.4 5.5.2
Spatial4j 0.4.1 0.5
Jackson2 2.4.4 2.6.6
Netty 3.6.6 3.10.5
Joda 1.6.2 2.8.2
commons-cli 1.2 1.3.1

Additional Elasticsearch version properties were added to allow changing the version separately for different parts of the system.

Property Scope Default
elasticsearch.version Core version used for constructing settings, query objects, etc. 2.4.4
elasticsearch.rest.version REST client version 5.3.0
elasticsearch.dist.version Default distribution version used for running tests (see below) and in release 5.3.0
elasticsearch.groovy.inline Flag for whether to enable Groovy dynamic scripting. Required only for ES 2.x. false

Testing

Updated project configuration to download and run a local Elasticsearch instance during testing and updated all janusgraph-es test suites to use same. A separate property, elasticsearch.dist.version, can be used to run the janusgraph-es test suites against both 2.x and 5.x ES distributions. Tests were updated to use REST client by default but separate Transport client subclasses were also added. This allows transport client testing when using ES 2.x but not 5.x (since transport client connections to ES 5.x are not currently supported).

The following entries in the Travis build matrix show the cases required to test all supported ES major distribution and client variants. Note that elasticsearch.groovy.inline must be set to true when testing against ES 2.x distributions.

# REST client tests against default ES 5.x distribution version
- MODULE='es' ARGS='-DthreadCount=1'
# REST client tests against specific ES 5.x distribution version
- MODULE='es' ARGS='-DthreadCount=1' -Delasticsearch.dist.version=5.2.2
# REST and Transport client tests against default ES 2.x distribution (v2.4.4)
- MODULE='es' ARGS='-DthreadCount=1 -Pelasticsearch2 -Delasticsearch.dist.version=2.3.5

All tests are passing.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4:10:55.873s
[INFO] Finished at: Sat Mar 11 06:13:06 UTC 2017
[INFO] Final Memory: 141M/1459M
[INFO] ------------------------------------------------------------------------

@janusgraph-bot janusgraph-bot added the cla: yes This PR is compliant with the CLA label Feb 3, 2017
@sjudeng sjudeng changed the title Update to Elasticsearch 2.4.2 and improve geoshape indexing Update to Elasticsearch 2.4.4 and improve geoshape indexing Feb 3, 2017
@amcp
Copy link

amcp commented Feb 4, 2017

How did you test these changes? In other words, do the tests perform better now or at least the same that they were before changing the version?

Copy link

@amcp amcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took a look. looks good. where can i see the test results?


@Override
public boolean hasNegation() {
return false;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on why this is not supported. I think it is because when shapes intersect at their edges, part of the shapes are on the outside.

Copy link
Contributor Author

@sjudeng sjudeng Feb 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A JanusGraphPredicate can only have a negation if it can be defined directly as another JanusGraphPredicate. This isn't possible for Geo.Contains (e.g. negation is not Geo.WITHIN for example).

Note this is a separate question from whether the negation of the Contains spatial predicate exists mathematically, though I'm not sure it does because of subtleties in the definition.

https://en.wikipedia.org/wiki/DE-9IM#Spatial_predicates
http://lin-ear-th-inking.blogspot.com/2007/06/subtleties-of-ogc-covers-spatial.html

@@ -23,7 +23,7 @@
*/
public enum ParameterType {

MAPPING("mapping"), INDEX_POSITION("index-pos"), MAPPED_NAME("mapped-name"), STATUS("status");
MAPPING("mapping"), INDEX_POSITION("index-pos"), MAPPED_NAME("mapped-name"), STATUS("status"), INDEX_GEO_MAX_LEVELS("index-geo-max-levels"), INDEX_GEO_DIST_ERROR_PCT("index-geo-dist-error-pct");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one line for each enum and comment on each?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good call. I documented the ones I added but wasn't sure about the others.

String inlineScriptsKey = "script.inline";
String inlineScriptsVal = settings.get(inlineScriptsKey);
if (null != inlineScriptsVal && !"true".equals(inlineScriptsVal)) {
log.warn("JanusGraph requires Elasticsearch inline scripting. Setting {} to true. " +
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requirement sounds like a user error. loglevel=error and fail fast?

@@ -387,12 +414,22 @@ private static String getDualMappingName(String key) {
return key + STRING_MAPPING_SUFFIX;
}

private static Map<Geo, ShapeRelation> spatialPredicates() {
return Collections.unmodifiableMap(Stream.of(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return ImmutableMap.of(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better thanks

queryBuilder = QueryBuilders.geoBoundingBoxQuery(key).bottomRight(southwest.getLatitude(), northeast.getLongitude()).topLeft(northeast.getLatitude(), southwest.getLongitude());
} else if (shape.getType() == Geoshape.Type.POLYGON) {
queryBuilder = QueryBuilders.geoPolygonQuery(key);
for (int i=0; i<shape.size(); i++) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stream.map.forEach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but note while Geoshape has a get(int i) method it is not a Collection so doesn't support stream directly if that's what you were after.

gf.createLineString(new Coordinate[] {new Coordinate(x-1,y-1), new Coordinate(x,y)})}));
};

private Function<Geoshape,Geoshape> makeMultiPolygon = place -> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static final?

return Geoshape.geoshape(gf.createMultiPoint(new Coordinate[] {new Coordinate(x,y), new Coordinate(x+1,y+1)}));
};

private Function<Geoshape,Geoshape> makeMultiLine = place -> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static final?

return Geoshape.line(Arrays.asList(new double[][] {{x,y},{x,y+1},{x+1,y+1},{x+1,y}}));
};

private Function<Geoshape,Geoshape> makeMultiPoint = place -> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static final?

return Geoshape.polygon(Arrays.asList(new double[][] {{x,y},{x,y+1},{x+1,y+1},{x+1,y},{x,y},{x,y}}));
};

private Function<Geoshape,Geoshape> makeLine = place -> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static final?

}

private Function<Geoshape,Geoshape> makePoly = place -> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static final? it seems like these functions dont depend on member state, so lets make it clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from da289e2 to c229ff1 Compare February 4, 2017 17:38
@sjudeng
Copy link
Contributor Author

sjudeng commented Feb 4, 2017

@amcp Thanks for the review. All updates are there with a couple exceptions as mentioned above.

All tests pass in this branch with the additional application of the fix in #76. Attached is relevant output from a test run that completed two days ago but note this was on a combined master branch which also included other open PRs (#76, #77, #78 and #81).

test.out.excerpt.txt

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:49 h
[INFO] Finished at: 2017-02-02T00:46:25-06:00
[INFO] Final Memory: 146M/1538M
[INFO] ------------------------------------------------------------------------

@amcp
Copy link

amcp commented Feb 19, 2017

Can we establish a 72 hour comment period on this PR? Would like to get these changes merged in.

@@ -64,6 +64,11 @@
<artifactId>spatial4j</artifactId>
</dependency>
<dependency>
<groupId>com.vividsolutions</groupId>
<artifactId>jts</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a LGPL library. Are there alternatives? It looks like the code is only used in test cases. Maybe we can make it an optional, off-by-default, inclusion via a Maven build flag or profile. TinkerPop had to do something similar for Neo4j.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pluradj Good catch.
@mbrukman We should have some license automation checks too.
@sjudeng Please let us know if there are any alternatives.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbrukman I created #128 to track automating dependency license check as part of CI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JTS is currently a non-optional dependency in janusgraph-solr and is distributed with JanusGraph (and previously Titan) releases.

janusgraph-solr/pom.xml

What could be done is to mark it as optional in the poms so it's available for tests and users with requirements for relevant features (Solr, Geoshapes, etc.) can manually add it to their lib directory.

Given it's already a dependency can this be a separate issue or is it a blocker here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the longer term solution might be a new Eclipse Foundation project aiming to replace JTS.

https://github.com/locationtech/jts
https://www.locationtech.org/projects/technology.jts
https://www.locationtech.org/proposals/jts-topology-suite

It's being developed by LocationTech, who also develop Spatial4j, through-which the JTS dependency comes in to Solr/JanusGraph. I'd guess the roadmap is for a stable new-JTS release and then a Spatial4j update/release migrating to same, at which point JanusGraph could update as well (likely involving a Solr version upgrade).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, you're right @sjudeng - had no idea that dependency was around for so long.
Good find on that new project. any indications that they are close to a release? It seems like they've been at it for a while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything on a release timeline. There's recent mailing list activity but there was a gap for a few months before that. Also I don't think a full migration in JanusGraph would be possible until Spatial4j and Solr were updated to use JTS 2. The good news is the core Spatial4j developer, who I think also is a core contributor on the Solr spatial connectors, has been active on the JTS 2 mailing list so he is at least aware of the effort.

@amcp
Copy link

amcp commented Feb 21, 2017

@pluradj I created #133 to track the issue of marking JTS optional in janusgraph-solr. I think it is a high priority item, and we should try to fix it before our initial release. Can @sjudeng move on with his PR?
@sjudeng please rebase.

import com.spatial4j.core.shape.jts.JtsGeometry;
import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryFactory;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This usage is directly in janusgraph-core not in a test flow. @amcp I think we should hold off on including any of the LGPL code until we figure out an optional and clean way to do so. @sjudeng is it possible to separate out the ES 2.4.4 update from the Geoshape indexing improvements?

Copy link

@amcp amcp Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the little I saw in janusgraph-es, it will require some not-minor refactoring, and probably we need to add new StoreFeatures like supportsGeo. Interesting.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pluradj Put another way, I think removing JTS dependency in this PR is way out of scope for this PR. Titan/JanusGraph depended on JTS up until today, so I think it is OK to upgrade these versions and then figure out a way off of current JTS. Sound OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few specialty functions in there that use JTS. I'll need to verify/test but if it's made an optional dependency it would still compile and be used in tests but wouldn't be included in distributions (at least not through janusgraph-core). Is this acceptable? In this way users of relevant (e.g. polygon) features could manually add JTS to their JanusGraph classpath but other users would be unaffected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amcp OK, I think we're in agreement. In the current code, you can use Solr without JTS as long as you don't define a Solr schema with it, although the example schema.xml includes it. Updating the docs and making that dependency optional will be handled with #133.

@sjudeng JTS wouldn't be included in distributions -- that's what we're shooting for. Users would need to add the jar themselves to take advantage of those features, and the docs alerts them of the licensing restrictions they're agreeing to, similar to what is in the Solr wiki.

@pluradj
Copy link
Member

pluradj commented Feb 21, 2017

@ptgoetz @hsaputra any other ideas on how to mitigate LGPL usage?

@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from e001b59 to 2818c12 Compare February 22, 2017 12:45
@sjudeng
Copy link
Contributor Author

sjudeng commented Feb 22, 2017

Updated to make JTS optional in janusgraph-core as discussed. When JTS is not on the path geoshape functions involving point, circle and rectangle shapes will still work as expected. User must add JTS to their classpath when use of line and polygon shapes is required, as indicated in documentation.

All tests are passing.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:26:18.819s
[INFO] Finished at: Wed Feb 22 09:29:09 UTC 2017
[INFO] Final Memory: 139M/1481M
[INFO] ------------------------------------------------------------------------

Note during testing JTS was also marked optional in janusgraph-solr to ensure proper behavior building/running distribution as well as unit tests, but this PR does not include that update which will be handled under #133.

@pluradj
Copy link
Member

pluradj commented Feb 22, 2017

@sjudeng after building the distribution zip, Elasticsearch isn't starting with bin/janusgraph.sh start. I think some of the files under janusgraph-dist/src/assembly/static might need to be updated.

@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from 2818c12 to f3be16d Compare February 23, 2017 01:45
@sjudeng
Copy link
Contributor Author

sjudeng commented Feb 23, 2017

@pluradj Just pushed updates to resolve this issue.

Are there automated integration tests that catch this or do you just test manually? When building a release this execution runs, which seems to be at least a pseudo integration test on the distribution. Maybe something similar could be added to run janusgraph.sh start/stop and/or other relevant scripts.

@pluradj
Copy link
Member

pluradj commented Feb 23, 2017

@sjudeng I tested it manually, but automation sounds like a good idea.

The fix worked fine on Ubuntu, and I was able to connect through the Gremlin Server remotely with the Gremlin Console.

I'm getting this error in the log/elasticsearch.log when run the bin/janusgraph.sh start on Mac:

[2017-02-23 11:09:19,991][ERROR][bootstrap                ] Exception
java.lang.IllegalStateException: jar hell!
class: groovy.beans.Bindable
jar1: /Users/pluradj/src/github/JanusGraph/janusgraph/pull/janusgraph-0.1.0-SNAPSHOT-hadoop2/bin/../lib/groovy-2.4.7-indy.jar
jar2: /Users/pluradj/src/github/JanusGraph/janusgraph/pull/janusgraph-0.1.0-SNAPSHOT-hadoop2/bin/../lib/groovy-2.4.7.jar
	at org.elasticsearch.bootstrap.JarHell.checkClass(JarHell.java:280)
	at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:186)
	at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:87)
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:180)
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:45)

It's odd that the stack trace is getting chopped. I tried manually cleaning up some of the jar conflicts, groovy-2.4.7-indy.jar then high-scale-lib-1.0.6.jar, but then ended up with a clash on elasticsearch-2.4.4.jar with janusgraph-es-0.1.0-SNAPSHOT.jar.

@sjudeng
Copy link
Contributor Author

sjudeng commented Feb 23, 2017

@pluradj The workaround for the JarHell issue was to override it in JanusGraph as described in the notes above. I think the original JarHell class is getting picked up first under OSX for whatever reason.

To test would you change bin/elasticsearch.in.sh and replace the ES_CLASSPATH definition on L13 with the following?

ES_CLASSPATH="`dirname $0`/../lib/janusgraph-es-0.1.0-SNAPSHOT.jar:`dirname $0`/../lib/*"

If this works I'll look at getting that updated (obviously without the hard coded version).

@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from 7a735b3 to f4bb6a3 Compare March 14, 2017 19:45
@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from f4bb6a3 to 6963319 Compare April 1, 2017 21:09
@sjudeng sjudeng changed the title Update Elasticsearch and improve geoshape indexing Issue 67: Update Elasticsearch and improve geoshape indexing Apr 1, 2017
@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from 6963319 to 2660ed5 Compare April 4, 2017 01:27
@sjudeng
Copy link
Contributor Author

sjudeng commented Apr 6, 2017

@amcp, @pluradj, @jerryjch - Because commit b34be73 is pretty large and was added (together with the smaller 7a2c590 and 2660ed5) after your initial approvals do you want to dismiss your reviews until you (or @hsaputra, @twilmes, @dylanht) have time to review the new commits?

Copy link

@amcp amcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more comments

<type>zip</type>
<exclusions>
<exclusion>
<groupId>*</groupId>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment as to the meaning of these *

.travis.yml Outdated

matrix:
- MODULE='berkeleyje'
- MODULE='cassandra'
- MODULE='es' ARGS='-DthreadCount=1'
- MODULE='es' ARGS='-DthreadCount=1 -Delasticsearch.dist.version=2.4.4'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a variable for es version here and below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this all out into an elasticsearch2 profile, which I think resolves this.

@@ -13,5 +13,7 @@ JanusGraph.
[options="header"]
|==========================
| JanusGraph | Cassandra | HBase | Elasticsearch | Solr | TinkerPop
| 0.1.0 | 1.2.z, 2.0.z, 2.1.z | 0.94.z, 0.96.z, 0.98.z, 1.0.z | 1.5.z | 5.2.z | 3.0.z
| 0.1.0 | 1.2.z, 2.0.z, 2.1.z | 0.98.z, 1.0.z, 1.2.z | 2.z, 5.z* | 5.2.z | 3.2.z
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this PR is about ES why are we changing HBase versions? if we forgot to do it then its ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this concern being addressed? Always nice to move away from 0.94 but should it be addressed by this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased to pick up #176 and now just updating the ES version here.

Copy link

@amcp amcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments

===============================
Beginning with Elasticsearch 5.0 JanusGraph uses sandboxed https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-painless.html[Painless scripts] for inline updates, which are enabled by default in Elasticsearch 5.x.

Using JanusGraph with Elasticsearch 2.x requires enabling Groovy inline scripting by setting `script.engine.groovy.inline.update` to `true` on the Elasticsearch cluster (see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-security.html#enable-dynamic-scripting[dynamic scripting documentation] for more information).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like we need 2.4+, update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean in terms of compatibility? All 2.x releases are supported. Test suite has been run against 2.4.4, 2.3.5, 2.2.2, 2.1.2, 2.0.2 (though see note at the top regarding limitations in versions before 2.3.0).

@@ -190,6 +189,9 @@ public ElasticSearchIndex(Configuration config) throws BackendException {
maxResultsSize = config.get(INDEX_MAX_RESULT_SET_SIZE);
log.debug("Configured ES query result set max size to {}", maxResultsSize);

scriptLang = client.getMajorVersion() < 5 ? "groovy" : "painless";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull 5 out into constant

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or even better into enum ESMajorVersion where it would contain TWO and FIVE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (good idea)

@@ -194,6 +194,11 @@ public void close() throws IOException {
client.close();
}

@Override
public int getMajorVersion() {
return 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use the MajorVersion enum described above

@@ -76,6 +82,29 @@ public void close() throws IOException {
}

@Override
public int getMajorVersion() {
if (majorVersion == null) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to un-nest code lets check majorVersion != null first and return, and then do the logic to compute it

@@ -223,4 +252,19 @@ private Response performRequest(String method, String path, byte[] requestData)
return response;
}

@JsonIgnoreProperties(ignoreUnknown=true)
private static final class ClusterInfo {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment as to why we have a map to represent versions? is this class consistent with a remote ES deployment model? thinking along the lines of AWS ES where JanusGraph may not be in control of the cluster topology with its configuration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version is obtained from a ES request with default path (e.g. http://localhost:9200/). The response is expected to be of the form below and this class is used for unmarshalling to avoid nested Map casts. But would AWS ES (or other deployment models) not return a response of this form to this request? I can look into this more.

{
  "name" : "X29X896",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "ROrFjDOqSIGLBDsZhxFOcA",
  "version" : {
    "number" : "5.2.1",
    "build_hash" : "db0d481",
    "build_date" : "2017-02-09T22:05:32.386Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amcp I confirmed that AWS ES does support requests with default path (e.g. https://search-estest-*.*.es.amazonaws.com/) and the response is formatted as expected and shown above.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking.

Copy link

@amcp amcp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more comments

Map<String,Object> doc = new HashMap<>();
for (IndexEntry e : mutation.getAdditions()) {
KeyInformation keyInformation = informations.get(storename).get(e.field);
switch (keyInformation.getCardinality()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you only care about one value then maybe say if keyInformation.getCardinality != SINGLE continue and get rid of the switch-case-break

@@ -563,6 +569,22 @@ private String getAdditionScript(KeyInformation.IndexRetriever informations, Str
return script.toString();
}

private Map<String,Object> getAdditionDoc(KeyInformation.IndexRetriever informations, String storename, IndexMutation mutation) throws PermanentBackendException {
Map<String,Object> doc = new HashMap<>();
for (IndexEntry e : mutation.getAdditions()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe switch to a .stream().map().filter().collect()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did take out the switch but I wasn't able to get a clean implementation using stream API in this case. Let me know if you want me to take another crack at it.

String lang = script.get("lang");
update.setScript(new Script(inline, ScriptService.ScriptType.INLINE, lang, null));
}
if (request.getSource().containsKey("doc")) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

externalize doc to constant

UpdateRequestBuilder update = client.prepareUpdate(indexName, type, id);
if (request.getSource().containsKey("script")) {
Map<String,String> script = ((Map<String, String>) request.getSource().get("script"));
String inline = script.get("inline");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

externalize inline to constant

Script script = new Script(inline, ScriptService.ScriptType.INLINE, null, null);
UpdateRequestBuilder update = client.prepareUpdate(indexName, type, id).setScript(script);
UpdateRequestBuilder update = client.prepareUpdate(indexName, type, id);
if (request.getSource().containsKey("script")) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

externalize script to constant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, externalized "script", "inline" and "doc.

if (request.getSource().containsKey("script")) {
Map<String,String> script = ((Map<String, String>) request.getSource().get("script"));
String inline = script.get("inline");
String lang = script.get("lang");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

externalize lang to constant

@amcp
Copy link

amcp commented Apr 6, 2017

@sjudeng I reviewed everything once, i will take a look at the enormous b34be73 again later today

Copy link
Member

@hsaputra hsaputra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a glance and LGTM. But for full correctness it definitely need to rely on the tests since code changes are just too large.

…ygon geometries, and support for geoIntersect, geoContains and geoDisjoint predicates. Support for indexing multi-point/line/polygon properties is implemented but untested.

Signed-off-by: sjudeng <[email protected]>
… is compatible with both ES 2.x and 5.x servers because the ES 5.x REST API is backwards-compatible with 2.x. Split ES version properties out by search builder version (elasticsearch.version), REST client version (elasticsearch.rest.version), and the distribution version to use in testing and releases (elasticsearch.dist.version). Create Transport test variants for all test suites. Update Travis build matrix to include janusgraph-es test suite variants for ES 5.x and 2.x as well as REST and transport clients.

Signed-off-by: sjudeng <[email protected]>
…x. Painless scripts are enabled by default and do not require custom dynamic scripting configuration to use. Added property, elasticsearch.groovy.inline, to control whether Groovy dynamic scripting is enabled on embedded ES instances.

Signed-off-by: sjudeng <[email protected]>
@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from 2660ed5 to 83a1da2 Compare April 7, 2017 14:44
@sjudeng sjudeng force-pushed the elasticsearch-geoshape-upgrade branch from 83a1da2 to 230382c Compare April 7, 2017 14:55
@sjudeng
Copy link
Contributor Author

sjudeng commented Apr 7, 2017

@amcp and @hsaputra - Thanks for the feedback. Everything should be addressed with a couple exceptions as mentioned above. If you haven't already please also review the breaking changes section in the PR notes above.

@sjudeng
Copy link
Contributor Author

sjudeng commented Apr 10, 2017

@amcp, @jerryjch, @pluradj, @hsaputra, @twilmes, @dylanht - I think this PR has been sufficiently reviewed at this point. There were two early commits that were reviewed/approved by three other committers and the more recent commits have been approved by two other committers. All issues have been mitigated and there has been a community discussion+vote on the dev list. In particular on the issue of ES 1.x support, the 0.1 release branch will maintain the original ES v1.5.1 support.

Unless there are any objections I would like to merge this PR later this week.

@hsaputra
Copy link
Member

+1 @sjudeng

@sjudeng sjudeng merged commit cb9c06a into JanusGraph:master Apr 14, 2017
@sjudeng sjudeng deleted the elasticsearch-geoshape-upgrade branch May 23, 2017 03:17
bwatson-rti-org pushed a commit to bwatson-rti-org/janusgraph that referenced this pull request Mar 9, 2019
…e-upgrade

Issue 67: Update Elasticsearch and improve geoshape indexing
micpod pushed a commit to micpod/janusgraph that referenced this pull request Nov 5, 2019
…e-upgrade

Issue 67: Update Elasticsearch and improve geoshape indexing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This PR is compliant with the CLA index/elasticsearch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants