-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
STORM-3100 : Introducing CustomIndexArray to speedup HashMap lookups #2711
Conversation
bc9cbb7
to
9d875a0
Compare
Looks like tester_bolt_metrics.py under storm-core is failing. I see that it can be run via mvn clojure:test under storm-core. But unable to figure out what really going wrong. Any help would be appreciated. thanks. |
@roshannaik |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good. I have a number of nits/questions Mostly about the tests.
Integer lowerIndex = Integer.MAX_VALUE; | ||
Integer upperIndex = Integer.MIN_VALUE; | ||
|
||
for (Map.Entry<Integer, T> entry : src.entrySet()) { // calculate smallest & largest indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would src.keySet()
be simpler because we don't use the values?
|
||
this.elements = makeNullInitializedArray(elemCount); | ||
|
||
for (Map.Entry<Integer, T> entry : src.entrySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would calling set
improve code reuse a little?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, calling set() directly instead of elements.set() would have been nice..But normally I dont call instance methods from constructors since the object is not considered as fully initialized until the constructor has finished.
/** | ||
* Always returns true as this cannot be empty. | ||
*/ | ||
public boolean isEmpty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have this at all. It does not implement collection so we don't need it.
/** | ||
* Returns the index range as a Set. | ||
*/ | ||
public Set<Integer> indices() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we are keeping the order of the indices here, but it is not documented in the javadocs, and it is not needed, as the only place this is used is for doing a set difference.
It would also be nice to indicate that this is not a cheap call so people don't abuse it calling it a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sue.. I just felt its intuitive and nice to maintain the integers in order even if not absolutely necessary. Updating the javadoc.
@@ -63,7 +64,7 @@ | |||
public static Tuple generateTestTuple(String source, String index, String type, String id) { | |||
TopologyBuilder builder = new TopologyBuilder(); | |||
GeneralTopologyContext topologyContext = new GeneralTopologyContext(builder.createTopology(), | |||
new Config(), new HashMap<>(), new HashMap<>(), new HashMap<>(), "") { | |||
new Config(), new CustomIndexArray<String>(0,1), new HashMap<>(), new HashMap<>(), "") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra space.
@@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source, | |||
GeneralTopologyContext topologyContext = new GeneralTopologyContext( | |||
builder.createTopology(), | |||
new Config(), | |||
new HashMap<>(), | |||
new CustomIndexArray<String>(0,1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: java should be able to figure out that it is a String
from the required type for GeneralTopologyContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is mostly for readability .. when reading this code its unclear what the deduced type will be without further checking the parameter types for the constructor. Happy to change if you prefer the other way.
@@ -56,7 +57,7 @@ public static Tuple generateTestTuple(final String source, | |||
GeneralTopologyContext topologyContext = new GeneralTopologyContext( | |||
builder.createTopology(), | |||
new Config(), | |||
new HashMap<>(), | |||
new CustomIndexArray<String>(0,1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also why does this need to be 0,1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just any non empty range. will change it to (0,0)
@@ -69,8 +70,8 @@ public void testTimeFormat() { | |||
} | |||
|
|||
private TopologyContext createTopologyContext(Map<String, Object> topoConf) { | |||
Map<Integer, String> taskToComponent = new HashMap<Integer, String>(); | |||
taskToComponent.put(7, "Xcom"); | |||
CustomIndexArray<String> taskToComponent = new CustomIndexArray<>(0,8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 0,8 if all we are setting is 7?
@@ -112,15 +112,15 @@ public Task(Executor executor, Integer taskId) throws IOException { | |||
public List<Integer> getOutgoingTasks(Integer outTaskId, String stream, List<Object> values) { | |||
if (debug) { | |||
LOG.info("Emitting direct: {}; {} {} {} ", outTaskId, componentId, stream, values); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think grouping check should happen even if debug is not enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was part of an older PR. From what I recall ... on profiling I had noticed that the grouping check was expensive in the critical path due to the fact that it needed lookups in three (now down to 2) hashmaps : streamComponentToGrouper & componentGrouping. Since neither were keyed on Integer, the CustomIndexArray style optimization was not possible.
@@ -660,8 +661,8 @@ public static Tuple testTuple(List<Object> values, MkTupleParam param) { | |||
} | |||
} | |||
|
|||
Map<Integer, String> taskToComp = new HashMap<>(); | |||
taskToComp.put(task, component); | |||
CustomIndexArray<String> taskToComp = new CustomIndexArray<>(task,task+1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If lower and upper are inclusive why do we need task, task+1?
Sorry for not being able to address the comments sooner. Have been traveling and should be able to get back to this soon after jul 12 th. |
https://travis-ci.org/apache/storm/jobs/403972342 Travis CI build is failing and failures look like related to the changeset. Could you check above build result and track? |
…tId & ExecutorTransfer.localReceiveQueues
Looks like the build still fails and relevant to this patch. |
Spent sometime trying to triage this failure but unable to figure out the cause. i know the issue has to do with this patch as it passes w/o this patch. I think it will take me some more time to get to the bottom of this. I did notice another (kind of significant) performance issue which is simple to address. Given that this is a relatively minor optimization, I am planning to provide a fix for that one first, and then revert back to this. |
Ran these perf topos to check perf impact:
1) ConstSpoutIdBoltNullBoltTopo With 1 Acker
Before: Throughput= 1,067,128 /sec . Latency = 1.038 ms
After: Throughput= 1,294,511 /sec . Latency = 1.037 ms
2) ConstSpoutIdBoltNullBoltTopo With 0 Acker
Before: Throughput= 4,389,133
After: Throughput= 4,748,744 /sec
3) TVL : --rate 350000 --spouts 1 --splitters 2 --counters 2 -c topology.max.spout.pending=500 -c topology.acker.executors=2 -c topology.disable.loadaware.messaging=true
No significant change observed in Throughput or Latency.