Annotation fetching and rendering performance, again #515

Open
jgonggrijp opened this issue Jan 27, 2022 · 0 comments
Labels: bug (Something isn't working), performance (An opportunity to speed things up)
Milestone: Ideally

In long sources, such as https://read-it.hum.uu.nl/explore/source/493/annotations, it is apparent that the annotations in the annotation list panel appear much sooner than the highlights in the text. On my machine, these events happen after roughly 15 and 50 seconds, respectively, for this particular source. Especially the latter delay is confusing for users; on several occasions, users have reported to me that their annotations were gone, while in reality they just took longer to appear than expected (the resulting perception of "not working" is why I'm attaching the "bug" label to this ticket). The 15-second delay for the list, while certainly also far from ideal, is a bit easier to cope with because a spinning wheel animation is shown in the meantime, conveying to the user that something is happening. Still, it would be great if we could substantially reduce that delay.

The time taken to fetch and render the annotations scales roughly linearly with the number of annotations, which in turn scales roughly linearly with the length of the text, because IRISA's NLP scanner adds a bunch of preannotations to every source. The above example text has 1372 (automated) annotations at the time of writing; assuming that the time between the initial request and complete highlight rendering is exactly 50 seconds, a single annotation takes 36 ms on average. In a source with 10k annotations (which is rare, but does occur despite me frequently warning users to split their sources), it will currently take about 6 minutes until all highlights have been rendered, although the user will likely perceive a shorter delay because the appearance of the highlights is spread out over time.

Over the past development history of this project, performance of annotation fetching and rendering has been a recurring issue. We have come a long way; before #292, a source like the above example would not render at all, instead causing the PC to become unresponsive and overheat. In #302, we shaved a couple of seconds off the request to the backend. In #436, we attempted to boost the perceived rendering speed further by "cheating", i.e., by fetching the annotations in pages and already rendering the first batch while the next is still being fetched. This, however, turned out to have the opposite effect of what we intended, and we quickly removed pagination again after deployment, in hotfix 0.11.1 (da75f42).

Pagination did not work, because Fuseki does not take into account that all parts of an annotation need to be kept together when we request the annotations to be ordered by position, and also because it implements pagination by bluntly recomputing the entire query on every next page. If Fuseki implemented cursor-based pagination, things might have been a bit better.

Off the top of my head (i.e., this is mostly based on what I remember from past measurements), the 50-second time of the example source breaks down roughly as follows:

  a. 7 seconds of query processing in Fuseki (see Feature/fast backend #302);
  b. 6 seconds of network transport time and intermediate processing by rdflib-django, especially parsing the XML response from Fuseki and then re-serializing the data to Turtle (see Forward BlazeGraph query result as-is #307). Together, a. and b. make up the roughly 13 seconds of fetching during which the spinner animation is shown;
  c. 2 seconds processing the annotation data in the frontend and rendering the annotation list;
  d. 35 seconds rendering the highlights in (or rather, behind) the text.

As discussed in #302 (comment), point a. could potentially be improved upon by taking the OPTIONAL clause out of the SPARQL query that is used to fetch the annotations. For identified annotations, this would result in additional requests to the backend, which could partly or entirely offset the time saved by avoiding OPTIONAL. This cost, in turn, could potentially be reduced by first issuing a query that only fetches all the identified items and then merging the results of both queries in a single graph. This is relatively easy to do, so it might be worth trying.
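To make the idea concrete, here is a minimal sketch of the two-query variant. The endpoint path, the `oa:hasBody` link to the identified items and the overall query shape are hypothetical stand-ins, not what the application actually uses, and the merge is done by the simplest means available (Turtle concatenation):

```typescript
// Sketch only: the endpoint path and the property linking annotations to
// identified items are hypothetical; the real query lives in the backend.
const ANNOTATIONS_QUERY = `
  PREFIX oa: <http://www.w3.org/ns/oa#>
  CONSTRUCT { ?annotation ?p ?o }
  WHERE {
    ?annotation a oa:Annotation ;
                ?p ?o .
  }
`;

// Separate, OPTIONAL-free query that fetches only the identified items.
const IDENTIFIED_ITEMS_QUERY = `
  PREFIX oa: <http://www.w3.org/ns/oa#>
  CONSTRUCT { ?item ?p ?o }
  WHERE {
    ?annotation a oa:Annotation ;
                oa:hasBody ?item .   # hypothetical link to the identified items
    ?item ?p ?o .
  }
`;

async function fetchTurtle(query: string): Promise<string> {
  const response = await fetch('/sparql/annotations/query', { // hypothetical endpoint
    method: 'POST',
    headers: {
      'Content-Type': 'application/sparql-query',
      Accept: 'text/turtle',
    },
    body: query,
  });
  return response.text();
}

// Issue both queries in parallel and merge the results. Concatenating two
// Turtle documents yields a valid Turtle document (prefixes may be
// redeclared); only blank node labels must not be assumed to co-refer.
export async function fetchAnnotationGraph(): Promise<string> {
  const [annotations, identified] = await Promise.all([
    fetchTurtle(ANNOTATIONS_QUERY),
    fetchTurtle(IDENTIFIED_ITEMS_QUERY),
  ]);
  return annotations + '\n' + identified;
}
```

Whether this actually wins anything depends on how much of the 7 seconds Fuseki spends on the OPTIONAL clause, so it would need to be measured against the current single query.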

In that same issue comment, #307 is first mentioned as a possible solution to point b.

Point c. is clearly the least in need of optimization. However, option 4 for tackling d., which I discuss below, would likely have the side effect of somewhat speeding up c. as well.

Point d. explains the delay between the annotation list and the highlight layer. At first sight, it is an obvious contender for optimization. Unfortunately, a large fraction of those 35 seconds is spent in calls to the native Selection.addRange(); a single call takes on the order of 20 ms, and we have to make one such call for every annotated text segment (a text segment is different from an annotation, but their numbers scale roughly proportionally; see #292 for an in-depth explanation). I currently see a few ways in which we could still hope to make a (perceptual) improvement (note the diminishing returns):

  1. The call to addRange is actually part of a workaround for a bug in Safari. A low-hanging fruit would be to use bug detection and disable the workaround for browsers that don't need it (a sketch of the detect-once pattern follows after this list). However, for Safari users (and possibly users of other browsers that share this bug), this will obviously not improve performance in any way.
  2. The highlights are implemented as color bands within line segments within longer text segments within a containing element that is placed behind the text. A lesser, but still substantial portion of the 35 seconds is spent inserting all those tiny elements into the document. A modest, but possibly still perceptible improvement might be obtained by temporarily taking the wrapper element out of the DOM while the first few pages of highlights are inserted, because this results in a single big repaint instead of many small ones (a sketch follows after this list). For optimal perceptible speed gain, this requires the text segments to be inserted in strict order of text position.
  3. Likewise, computation of the ranges could be prioritized for the first page or the first few pages, in order to be able to paint those as early as possible. This is essentially Feature/api pagination #436, but implemented entirely on the client side; the process as a whole takes just as long, but the user will see the result sooner in the place where she looks first (a sketch follows after this list).
  4. The overall algorithm that computes all the line text segments, creates the corresponding views and inserts them in the highlight layer is asymptotically optimal; it requires only a single linear pass over all the begin- and endpoints of all annotations to identify all the overlapping regions. However, it is currently written in a canonical Backbone fashion, which involves lots of intermediate abstractions, automagic and event handlers. On the plus side, this can be written concisely and fits in well with the conventions elsewhere in the application, but on the downside, it incurs a lot of function call overhead. Some, possibly a lot, of that function call overhead could be eliminated by concentrating the algorithm in a single function with a big nested for loop (the general shape of such a pass is sketched after this list). It could still have collections and views as inputs and outputs. This is, however, a relatively big effort which will likely pay off less than the previous three options.
  5. Theoretically, there might be a way to render line segments as just single elements and obtain the color bands through CSS instead. This would cut the number of DOM insertions at least by a third (more in densely annotated sources), at the cost of greatly complicating all the logic to determine which colors should be shown in which place. It is also unlikely that this approach will be able to accurately represent all possible scenarios, so this avenue should really only be taken if additional performance gains are still needed after the previous four options.
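For option 1, a minimal sketch of the detect-once pattern; `probeForBug` is a hypothetical placeholder for whatever small test reproduces the Safari misbehaviour, since the sketch does not spell out the actual bug:

```typescript
// Sketch only: probeForBug() is hypothetical; it should run a cheap one-time
// test that reproduces the Safari bug that the addRange() workaround exists for.
declare function probeForBug(): boolean;

let workaroundNeeded: boolean | undefined;

function needsAddRangeWorkaround(): boolean {
  if (workaroundNeeded === undefined) {
    workaroundNeeded = probeForBug(); // evaluated once, then cached
  }
  return workaroundNeeded;
}

function applySelectionWorkaround(selection: Selection, range: Range): void {
  if (!needsAddRangeWorkaround()) return; // skip the ~20 ms call where possible
  selection.addRange(range);
}
```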
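For option 2, a sketch of inserting highlight segments into a temporarily detached wrapper; the segment type and the `renderSegment` callback are hypothetical simplifications of the real Backbone views:

```typescript
// Sketch only: the element structure and the renderSegment callback stand in
// for the real highlight views.
interface HighlightSegment {
  top: number;      // vertical position within the text, in px
  height: number;
  cssClass: string; // encodes which color bands the segment gets
}

function insertHighlightsBatched(
  highlightLayer: HTMLElement,
  segments: HighlightSegment[],
  renderSegment: (segment: HighlightSegment) => HTMLElement,
): void {
  const parent = highlightLayer.parentNode;
  if (!parent) return;
  const placeholder = document.createComment('highlight layer placeholder');

  // Take the wrapper out of the document, so that the many small insertions
  // below do not each trigger style, layout and paint work.
  parent.replaceChild(placeholder, highlightLayer);

  for (const segment of segments) {
    highlightLayer.appendChild(renderSegment(segment));
  }

  // Put the wrapper back; the browser repaints the whole layer once.
  parent.replaceChild(highlightLayer, placeholder);
}
```

Building the new elements into a DocumentFragment and appending it once would be a close alternative with the same single-repaint effect.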
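For option 3, a sketch of purely client-side "pagination": render the first chunk (in text order) right away and yield to the browser between chunks, so the top of the source is painted as early as possible. The chunk size and the callback are placeholders:

```typescript
// Sketch only: chunk size and the callback are placeholders; the real work
// would be driven by the existing collection of highlight views.
function renderInChunks<T>(
  items: T[],                       // assumed to be sorted by text position
  renderOne: (item: T) => void,
  chunkSize = 250,                  // hypothetical "page" size
): void {
  let index = 0;

  function renderNextChunk(): void {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      renderOne(items[index]);
    }
    if (index < items.length) {
      // Yield to the browser so the chunk just rendered can be painted and
      // the page stays responsive, then continue with the next chunk.
      requestAnimationFrame(renderNextChunk);
    }
  }

  renderNextChunk();
}
```

The total rendering time stays the same, but the highlights near the top of the text, where the user looks first, appear much sooner.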
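For option 4, the general shape of the single linear pass written as one plain function; the data types below are hypothetical simplifications, whereas the real implementation works with Backbone collections and views:

```typescript
// Sketch only: simplified data types; offsets are character positions.
interface Annotation {
  id: string;
  start: number;
  end: number;
}

interface Segment {
  start: number;
  end: number;
  annotations: string[]; // ids of all annotations covering this segment
}

// Single sweep over all begin- and endpoints: sort the boundaries once, then
// walk through them while keeping track of which annotations are "open".
function computeSegments(annotations: Annotation[]): Segment[] {
  type Boundary = { position: number; id: string; opens: boolean };
  const boundaries: Boundary[] = [];
  for (const { id, start, end } of annotations) {
    boundaries.push({ position: start, id, opens: true });
    boundaries.push({ position: end, id, opens: false });
  }
  boundaries.sort((a, b) => a.position - b.position);

  const segments: Segment[] = [];
  const open = new Set<string>();
  let previous = 0;

  for (const { position, id, opens } of boundaries) {
    // Emit a segment for the stretch since the previous boundary, covered by
    // every annotation that is currently open.
    if (open.size > 0 && position > previous) {
      segments.push({ start: previous, end: position, annotations: Array.from(open) });
    }
    if (opens) {
      open.add(id);
    } else {
      open.delete(id);
    }
    previous = position;
  }
  return segments;
}
```

The per-boundary work then reduces to a comparison and a set operation, instead of a cascade of model events and view method calls.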
jgonggrijp added the bug and performance labels and the Ideally milestone on Jan 27, 2022