Profiling Example 2: MVP1 query: How could we be faster? #2373

edeutsch · 2024-09-11T15:08:34Z

This is the classic "drug treats disease" MVP1 query: What may treat Castleman Disease?
https://arax.ci.transltr.io/?r=293768

Here's my analysis:

0.15s Launch, set up, and send the query to xDTD
3.9s xDTD has returned its results. Only now does Expand start sending queries to KPs
------- 0.1s since KP request: Automat-DrugCentral responds already!
------- 0.3s since KP request: MolePro responds already!
------- 0.6s since KP request: RTX-KG2 responds, nice!
----- <1.0s several other KPs are queried and respond with no edges, but do so in less than a second
------- 2.4s since KP request: Service Provider lumbers across the finish line panting heavily
------- 30.0s since KP request: knowledge-collaboratory does not respond within 30 seconds and the request is abandoned
0.6s Add NGD edges to the graph
2.6s Remove general concepts from the knowledge graph
0.1s Resultify, Ranker, and post-processing
(S3 storage was not timed, but probably takes about 0.3 seconds)

37.8s: Total processing time from receipt of the query until the Response starts being stored
3.9s: Time spent obtaining xDTD results
30.0s: Time spent waiting for KPs to respond (MolePro and RTX-KG2 are sub-second; knowledge-collaboratory times out at 30s)
3.9s: Other processing of data
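To make the arithmetic explicit, here is a small back-of-the-envelope model using only the numbers above. The what-if function is a rough sketch under stated assumptions, not a measurement: it assumes knowledge-collaboratory would still hit whatever timeout we pick, that every other KP finishes inside it (true here, since the next-slowest KP answered in 2.4s), and, for the "parallel" case, that KP queries do not depend on xDTD's output. The names (`projected_total`, etc.) are hypothetical.

```python
# Wall-clock figures (seconds) copied from the timeline above.
XDTD = 3.9      # waiting for xDTD before Expand starts querying KPs
KP_WAIT = 30.0  # dominated by the knowledge-collaboratory timeout
OTHER = 3.9     # NGD, general-concept removal, Resultify, Ranker, etc.
TOTAL = 37.8


def share(part: float, total: float = TOTAL) -> float:
    """Fraction of the total processing time spent in one phase."""
    return part / total


def projected_total(kp_timeout: float, parallel_xdtd: bool) -> float:
    """Hypothetical what-if: cap the KP wait, optionally overlap xDTD with KPs.

    Assumes the slowest KP still hits the cap and all others finish within it.
    """
    kp = min(KP_WAIT, kp_timeout)
    # Serial: xDTD first, then KPs. Parallel: wait only for the slower of the two.
    fetch = max(XDTD, kp) if parallel_xdtd else XDTD + kp
    return fetch + OTHER


print(f"KP wait share: {share(KP_WAIT):.0%}")
print(f"10s cap, serial:   {projected_total(10, False):.1f}s")
print(f"10s cap, parallel: {projected_total(10, True):.1f}s")
```

Under these assumptions, roughly 79% of the time is spent waiting on a KP that never answers, and a 10s cap alone would bring the total from 37.8s to about 17.8s (about 13.9s if xDTD also ran alongside the KPs).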

Two local processing steps appear to stand out:

Computing NGD edges: 0.6 seconds seems pretty reasonable, but could it be 0.06 seconds?
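For reference, assuming NGD here is the usual Normalized Google Distance computed from PubMed occurrence and co-occurrence counts, the per-edge math itself is trivial. If there is a 10x to be had, it would most likely come from how the counts are fetched (e.g., one bulk lookup instead of one lookup per edge). The sketch below assumes the counts have already been loaded into dictionaries in a single pass; `batch_ngd`, `counts`, and `co_counts` are hypothetical names, not ARAX's actual code.

```python
import math
from typing import Dict, Iterable, Optional, Tuple


def ngd(fx: int, fy: int, fxy: int, n: float) -> Optional[float]:
    """Normalized Google Distance from occurrence counts.

    NGD = (max(log fx, log fy) - log fxy) / (log n - min(log fx, log fy))
    Returns None when the distance is undefined (no co-occurrence, etc.).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return None
    lx, ly = math.log(fx), math.log(fy)
    denom = math.log(n) - min(lx, ly)
    if denom <= 0:
        return None
    return (max(lx, ly) - math.log(fxy)) / denom


def batch_ngd(
    pairs: Iterable[Tuple[str, str]],
    counts: Dict[str, int],               # hypothetical: concept -> #articles
    co_counts: Dict[Tuple[str, str], int],  # hypothetical: sorted pair -> #articles
    n: float,                             # total number of articles
) -> Dict[Tuple[str, str], Optional[float]]:
    """Score many node pairs from counts that were fetched up front in bulk."""
    out: Dict[Tuple[str, str], Optional[float]] = {}
    for a, b in pairs:
        key = (a, b) if a <= b else (b, a)  # co-occurrence is symmetric
        out[(a, b)] = ngd(counts.get(a, 0), counts.get(b, 0), co_counts.get(key, 0), n)
    return out
```

For example, with 1,000 and 100 articles for the two concepts, 50 shared articles, and 1,000,000 articles total, NGD = log(20) / log(10,000), about 0.325.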

Removing general concepts: 2.6 seconds seems quite slow; it is our slowest general processing step by far. Could it be 0.26 seconds?
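I don't know what the current step does internally. As a rough yardstick, though, if "removing general concepts" boils down to matching nodes against a blocklist and dropping the edges that touch them, it can be done in two linear passes with set lookups. The sketch below assumes exactly that; the blocklist contents, the node/edge shapes, and `remove_general_concepts` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical blocklists; the real list of "general concepts" is not shown here.
GENERAL_NAMES: Set[str] = {"disease", "drug", "chemical entity", "protein"}
GENERAL_CURIES: Set[str] = {"MONDO:0000001"}

Edge = Tuple[str, str, str]  # (subject CURIE, predicate, object CURIE)


def remove_general_concepts(
    nodes: Dict[str, str],  # CURIE -> display name
    edges: List[Edge],
) -> Tuple[Dict[str, str], List[Edge]]:
    """Drop blocklisted nodes and every edge touching them (O(nodes + edges))."""
    # Pass 1: set membership checks are O(1) each, so this is linear in #nodes.
    removed = {
        curie
        for curie, name in nodes.items()
        if curie in GENERAL_CURIES or name.strip().lower() in GENERAL_NAMES
    }
    # Pass 2: keep only nodes and edges that do not involve a removed node.
    kept_nodes = {c: n for c, n in nodes.items() if c not in removed}
    kept_edges = [e for e in edges if e[0] not in removed and e[2] not in removed]
    return kept_nodes, kept_edges
```

For a knowledge graph with thousands of nodes and edges, an in-memory filter like this should take on the order of milliseconds. If the real step is closer to 2.6 seconds, profiling where that time goes (for example, per-node lookups to an external service) would be a natural next step.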

Conclusion: How could we be faster?

We could time out our KPs sooner. I think I overheard that Aragorn times out its KPs at 10s.
It appears that we are getting information from xDTD serially, not in parallel with the other KPs. I wonder if it is possible to treat xDTD a bit more like a regular KP and launch its "fetch code" in parallel with waiting for the other KPs? That may be tricky, but it would provide a speed boost: xDTD is not among the fastest sources (3.9s here), and fetching data from the traditional KPs only seems to begin after the xDTD code is complete.
We could remove general concepts faster. My sense is that this step could be a lot faster, although I know nothing about what's actually happening in it.
We could cache the whole initial query. If this exact query has been run very recently, why run it again?
We could cache KP queries/results. If we sent the exact same query to a KP very recently, why send it again?
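Three of these ideas (a shorter KP timeout, fetching from xDTD concurrently with the KPs, and caching recent KP answers) fit together naturally. Below is a minimal asyncio sketch of that combination, assuming each source (xDTD or a KP) can be wrapped as an async `fetch(query)` function and that a query can be serialized to a string for use as a cache key. The 10s timeout and 5-minute TTL are assumptions, and `query_source`/`query_all` are hypothetical names, not the current Expand code.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple

Fetch = Callable[[str], Awaitable[Any]]

KP_TIMEOUT_S = 10.0   # assumption: the 10s figure mentioned for Aragorn
CACHE_TTL_S = 300.0   # assumption: how long an answer counts as "very recent"

# (source name, serialized query) -> (time stored, result)
_cache: Dict[Tuple[str, str], Tuple[float, Any]] = {}


async def query_source(name: str, query: str, fetch: Fetch,
                       timeout: float = KP_TIMEOUT_S) -> Optional[Any]:
    """Query one source with a TTL cache and a hard timeout; None = no answer."""
    key = (name, query)
    hit = _cache.get(key)
    if hit is not None and time.monotonic() - hit[0] < CACHE_TTL_S:
        return hit[1]  # identical query answered recently: skip the network
    try:
        result = await asyncio.wait_for(fetch(query), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # abandon the slow source; timeouts are not cached
    _cache[key] = (time.monotonic(), result)
    return result


async def query_all(query: str, sources: Dict[str, Fetch],
                    timeout: float = KP_TIMEOUT_S) -> Dict[str, Any]:
    """Launch xDTD and all KPs at once; total wait is about the slowest source or the timeout."""
    names = list(sources)
    results = await asyncio.gather(
        *(query_source(n, query, sources[n], timeout) for n in names)
    )
    return dict(zip(names, results))
```

The same keyed-TTL idea could cover caching the whole initial query, using a canonical key such as `json.dumps(query_graph, sort_keys=True)`. A real deployment would also need a bound on cache size and some thought about when cached KP answers go stale.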

isbluis · 2024-09-12T02:11:11Z

Linking a comment from the Profiling Example 1 issue, since it is also relevant:

edeutsch · 2024-09-25T19:03:58Z

Closing this after spawning issue #2388

edeutsch changed the title from "Profiling Example 2: MVP1 query" to "Profiling Example 2: MVP1 query: How could we be faster?" Sep 11, 2024

edeutsch mentioned this issue Sep 25, 2024 in:
Speed enhancement: Make Removing general concepts from query graph faster? #2388 (Closed)

edeutsch closed this as completed Sep 25, 2024
