Opds dumper need to read all archives to have illustration size. #625

mgautierfr · 2021-10-25T10:55:05Z

https://github.com/kiwix/libkiwix/blob/master/src/opds_dumper.cpp#L57-L70 make the opds dumper read all zim file in the library to create the opds stream.

The normal behavior should be that we trust the library.xml (when loading from it) and do not read the file to update the metadata information we know about the books.
This kind of method should be on the library side (Manager::readBookFromPath or Book::update)

On my side, with 14 zim files on usb extern hard drive, the first opds request from the home page takes more than 1 minutes.

The text was updated successfully, but these errors were encountered:

kelson42 · 2021-10-28T15:20:49Z

Seems a blocker for 10.0.0 to me. We deal with thousands of ZIM files in library.kiwix.org, we have to trust the library.xml and it should be performant.

kelson42 · 2021-10-29T10:01:30Z

@veloman-yunkan This is really a performance blocker for the next release. Please have a look.

mgautierfr · 2021-10-30T10:04:25Z

To be exact, we don't read all archives in the library but all archives returned by the opds stream. If the stream is paged (limited to N books) we read only N archives, but if not, we read all archives.
But, another issue, we do this every time, for each requests, as there is no cache. The fs cache help a bit but that all.

veloman-yunkan · 2021-10-31T10:01:12Z

https://github.com/kiwix/libkiwix/blob/master/src/opds_dumper.cpp#L57-L70 make the opds dumper read all zim file in the library to create the opds stream.

The normal behavior should be that we trust the library.xml (when loading from it) and do not read the file to update the metadata information we know about the books. This kind of method should be on the library side (Manager::readBookFromPath or Book::update)

On my side, with 14 zim files on usb extern hard drive, the first opds request from the home page takes more than 1 minutes.

@kelson42 @mgautierfr

The said change in OPDSDumper was introduced by #577 while fixing #533. The current format of the library XML doesn't allow for multiple illustrations, that's why the needed information is read from the ZIM archives. To address this issue, we must enhance the library XML format:

allow multiple illustrations per book
include information about dimensions of each illustration

Tools that generate library XMLs must be updated as well.

In this context, I wonder why we need a separate XML format for the library. Can't we replace the <book> nodes in the library XML with (probably slightly customized) OPDS <entry> nodes, so that we can eliminate some functional duplication between kiwix::Book::updateFromXml() and Book::updateFromOpds()?

mgautierfr · 2021-11-08T14:55:50Z

In this context, I wonder why we need a separate XML format for the library. Can't we replace the nodes in the library XML with (probably slightly customized) OPDS nodes, so that we can eliminate some functional duplication between kiwix::Book::updateFromXml() and Book::updateFromOpds()?

This is a question we are asking ourselves (@kelson42 and I)
We've already discussed this in live during the hackathon but here is where we are now :

It would be interesting to somehow merge the two formats. Less code is better.
But we have to be careful. Although the two formats are pretty similar, they respond to different purposes: The OPDS stream is here to expose a remote catalog and library.xml is here to describe a local library.
There are small subtle differences (opds stream has no path. It contains only links to illustrations, not the illustration data itself). We don't know if this difference in usage is a blocker or not.
We need a transition plan (we don't want the user to "lose" its library when he updates his software).
Maybe we could move this xml parsing/generating part in a separated library to allow other projects (CMS, ..) to not reinvent the wheel.

Reuse the <entry> node seems a good idea (I always dislike this huge attributes list in the <book> node)
@veloman-yunkan If you have a idea of how we could do this, please do not hesitate to speak about it (hero or in another dedicated issue).

kelson42 · 2021-11-21T15:40:06Z

Related to kiwix/overview#59

mgautierfr added bug kiwix-serve labels Oct 25, 2021

kelson42 added this to the 10.0.0 milestone Oct 28, 2021

kelson42 assigned veloman-yunkan Oct 29, 2021

veloman-yunkan mentioned this issue Oct 31, 2021

Support for multiple illustrations in a Book #630

Merged

mgautierfr mentioned this issue Nov 3, 2021

Better, simpler handle_meta. #626

Closed

mgautierfr mentioned this issue Nov 8, 2021

Would it be a good idea to have multiple zim files share the same favicon URL when possible? #609

Open

mgautierfr closed this as completed in #630 Nov 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Opds dumper need to read all archives to have illustration size. #625

Opds dumper need to read all archives to have illustration size. #625

mgautierfr commented Oct 25, 2021

kelson42 commented Oct 28, 2021

kelson42 commented Oct 29, 2021

mgautierfr commented Oct 30, 2021

veloman-yunkan commented Oct 31, 2021

mgautierfr commented Nov 8, 2021

kelson42 commented Nov 21, 2021

Opds dumper need to read all archives to have illustration size. #625

Opds dumper need to read all archives to have illustration size. #625

Comments

mgautierfr commented Oct 25, 2021

kelson42 commented Oct 28, 2021

kelson42 commented Oct 29, 2021

mgautierfr commented Oct 30, 2021

veloman-yunkan commented Oct 31, 2021

mgautierfr commented Nov 8, 2021

kelson42 commented Nov 21, 2021