Fixes the numbering scheme in the XOAI resumption token cursor #81

Closed
wants to merge 113 commits into from

@landreev commented Jun 3, 2022

Copy-and-pasting from #30:

The OAI-PMH spec says the cursor position should start at 0; the XOAI implementation starts at 1.
For example, the resumption token returned with the first page of results currently looks like this:

```xml
<resumptionToken cursor="1">
   MToxMDB8Mjp8Mzp8NDp8NTpvYWlfZGM=
</resumptionToken>
```

It should instead say cursor="0" etc.
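Decoding the Base64 value above (a quick check, not something this PR does) yields the plain token `1:100|2:|3:|4:|5:oai_dc`. The minimal sketch below performs that decoding with the JDK's `Base64` class. The class name is hypothetical, and reading field 1 as the offset and field 5 as the metadata prefix is an assumption based only on the decoded values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for inspecting a token; not part of XOAI.
class ResumptionTokenDecoder {
    // Decodes a Base64 token into its numbered "key:value" fields, keeping their order.
    // Assumption: 1 = offset, 5 = metadataPrefix (inferred from the decoded string only).
    static Map<String, String> decode(String token) {
        // trim() because the token is often pasted with surrounding whitespace/newlines
        String plain = new String(Base64.getDecoder().decode(token.trim()), StandardCharsets.UTF_8);
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : plain.split("\\|")) {
            int colon = part.indexOf(':');
            fields.put(part.substring(0, colon), part.substring(colon + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(decode("MToxMDB8Mjp8Mzp8NDp8NTpvYWlfZGM=")); // {1=100, 2=, 3=, 4=, 5=oai_dc}
    }
}
```

If field 1 really is the offset, this token points at record 100 (the start of the second page) while the cursor reads 1, which looks like a 1-based page count rather than a 0-based position.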

- Move from org.dspace to io.gdcc group
- Make version a variable
- Make Maven use up-to-date plugins via plugin management
- Add release profile with necessary plugins for releases to Maven Central
This adds the code as-is with the exception of the packages being
renamed to the io.gdcc namespace.
- Add README note
- Pull extractors from the library into our codebase
- Replace custom-made Hamcrest XPathMatcher with the XMLUnit one
- Add missing package-info.java with hints about origin
…ing release only)

This allows installing the JARs locally without a published version of
the main/parent POM, as the latter might fail due to broken submodules.

Might be reverted later.
The necessary bits of the lyncode/xml-io library were moved to
our submodule xoai-xmlio to decouple us from the non-maintained
upstream XML library.
- Some explicitly written-out generics have been removed as unnecessary.
- Some class variables were assigned in the constructor but not declared final, even though they never change afterwards.
- A varargs method has been annotated with a compiler hint about its safe usage.
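These clean-ups are standard Java idioms. The hypothetical class below (not taken from the code base) shows all three together: a `final` field assigned once in the constructor, the diamond operator instead of spelled-out type arguments, and `@SafeVarargs`, the usual compiler hint for a generic varargs method that does not misuse its array (assuming that is the hint meant here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class illustrating the clean-ups; not code from the PR.
class CleanupExample {
    // Assigned once in the constructor and never reassigned, so it can be final.
    private final List<String> names;

    CleanupExample() {
        // Diamond operator: <String> is inferred from the field type.
        this.names = new ArrayList<>();
    }

    // @SafeVarargs suppresses the "possible heap pollution" warnings for a generic varargs
    // method that only reads its array. It is allowed only on methods that cannot be
    // overridden (static, final, constructors, and private methods since Java 9).
    @SafeVarargs
    static <T> List<T> listOf(T... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    void add(String name) {
        names.add(name);
    }

    List<String> names() {
        return names;
    }
}
```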
- Move stax2-api to newer version via parent POM
- Update StAX2 Parser Woodstox to latest version from FasterXML
- Make the parser's scope runtime and optional, to allow swapping in a
  different version (e.g. one provided by the app server) or even switching
  to another implementation (like Aalto)
- Make class variables final where possible
- Remove explicit, unnecessary generics
- Remove some minor, unnecessary explicit modifiers, such as public on interface methods
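Because the parser is only a runtime dependency, code can stay implementation-agnostic by going through the `javax.xml.stream` factories. A minimal sketch, assuming the standard StAX lookup: `XMLInputFactory.newInstance()` returns whichever implementation is registered (for example Woodstox, Aalto, or the JDK's built-in parser), so swapping the runtime JAR needs no code change. The class and method names are illustrative.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

// Illustrative only: depends on the StAX API, never on a concrete parser class.
class StaxLookupExample {
    // Counts start elements; works unchanged with any StAX implementation on the classpath.
    static int countElements(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Prints the implementation that the factory lookup picked at runtime.
        System.out.println(XMLInputFactory.newInstance().getClass().getName());
        System.out.println(countElements("<a><b/><c/></a>"));
    }
}
```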
Adapt xmlio.XmlWriter to implement AutoCloseable and make use
of it in xml.XmlWriter via try-with-resources to avoid
missing close() calls.
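A hypothetical sketch of the pattern: a writer class implementing `AutoCloseable` (standing in for, not reproducing, `xmlio.XmlWriter`) and a caller using try-with-resources, so `close()` runs on every exit path, including exceptions.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for an XML writer; not the xmlio.XmlWriter API.
class SketchXmlWriter implements AutoCloseable {
    private final Writer out;
    private boolean closed;

    SketchXmlWriter(Writer out) {
        this.out = out;
    }

    void element(String name, String text) {
        try {
            // Simplified: text content is not escaped here.
            out.write("<" + name + ">" + text + "</" + name + ">");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isClosed() {
        return closed;
    }

    // Narrows AutoCloseable.close() (which declares "throws Exception") so callers need no catch.
    @Override
    public void close() {
        try {
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            closed = true;
        }
    }
}

class TryWithResourcesDemo {
    static String writeTitle(String title) {
        StringWriter target = new StringWriter();
        // close() is called automatically when the block exits, even if element(...) throws.
        try (SketchXmlWriter writer = new SketchXmlWriter(target)) {
            writer.element("dc:title", title);
        }
        return target.toString();
    }
}
```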
Replace custom-made Hamcrest XPathMatcher with the XMLUnit one.
- Remove the usages of Commons Lang3 from xoai-data-provider
- Replace usages of random String generation with a custom random
  generator living in the xoai-common util package
  io.gdcc.xoai.util.Randoms
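A minimal sketch of what a JDK-only replacement for Commons Lang3's `RandomStringUtils.randomAlphanumeric` can look like. The names `RandomStrings` and `alphanumeric` are hypothetical; the actual `io.gdcc.xoai.util.Randoms` API is not shown in this PR.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; not the io.gdcc.xoai.util.Randoms API.
final class RandomStrings {
    private static final String ALPHANUMERIC =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    private RandomStrings() {
    }

    // Random [A-Za-z0-9] string of the given length; fine for test data, not for secrets.
    static String alphanumeric(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must not be negative");
        }
        ThreadLocalRandom random = ThreadLocalRandom.current();
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUMERIC.charAt(random.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }
}
```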
poikilotherm and others added 28 commits May 17, 2022 19:10
…nextEvent()

This is necessary to use these basic routines within the EchoElement,
as we will not only read from a String there, but also from an
InputStream.
Before, the XML string sent to the EchoElement was stuffed into an XmlReader
with a ByteArrayInputStream. Now we extend this to read from an arbitrary
InputStream when one is given on object creation.
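The two construction paths can be sketched with the JDK's StAX event API: one constructor keeps the old String path by wrapping it in a `ByteArrayInputStream`, the other accepts any `InputStream`, and both feed the same parse-and-re-emit loop. `EchoSketch` and its methods are hypothetical, this is not the library's EchoElement, and dropping the document start/end events is an assumption about how echoed content gets embedded.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical simplification of the echo idea: parse XML and re-emit its events.
class EchoSketch {
    private final InputStream input;

    // Old path: a String, wrapped in a ByteArrayInputStream.
    EchoSketch(String xml) {
        this(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // New path: any InputStream handed over at construction time.
    EchoSketch(InputStream input) {
        this.input = input;
    }

    void writeTo(XMLEventWriter writer) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // Skip document start/end so the content can sit inside a larger document.
                if (event.isStartDocument() || event.isEndDocument()) {
                    continue;
                }
                writer.add(event);
            }
        } finally {
            reader.close();
        }
    }

    static String echo(String xml) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        new EchoSketch(xml).writeTo(writer);
        writer.flush();
        return out.toString();
    }
}
```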

This commit also adds benchmarking tests using JMH to measure the slowdown
caused by parsing the XML from the input stream. It is compared
to the "native" copy of input to output as seen in Dataverse. Run it via
`mvn -Pbenchmark clean test`

Another small change happened, too: instead of the legacy Stack class
(whose Javadoc recommends Deque instead), we now use "Deque".
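For reference, `ArrayDeque` offers the same LIFO `push`/`pop`/`peek` operations as `Stack`. The hypothetical `TagBalance` helper below uses one to track open elements; it is illustrative only and not taken from EchoElement.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative only: a Deque used as a LIFO stack of open element names.
final class TagBalance {
    private TagBalance() {
    }

    // Tags are simple tokens like "<a>" and "</a>" (no attributes, no self-closing tags).
    static boolean isBalanced(List<String> tags) {
        Deque<String> open = new ArrayDeque<>();
        for (String tag : tags) {
            if (tag.startsWith("</")) {
                String name = tag.substring(2, tag.length() - 1);
                // pop() on an empty ArrayDeque throws NoSuchElementException, so check first.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else {
                open.push(tag.substring(1, tag.length() - 1)); // push() adds at the head
            }
        }
        return open.isEmpty();
    }
}
```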
This implementation of FilterInputStream has been copied from Apache POI
or, more precisely, from its origin at Inbot.

Currently, the Dataverse OAI-PMH data provider uses this filter
to remove the XML declaration from the pregenerated XML metadata files.

It is being added here, but flagged as deprecated, purely for
benchmarking reasons.
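The goal of that filter, dropping a leading XML declaration, can also be sketched with a `BufferedInputStream` and `mark`/`reset`. This is a different, simpler technique than the copied FilterInputStream, and the `XmlDeclarationSkipper` name is hypothetical. It assumes the declaration starts at the very first byte (no byte order mark or leading whitespace).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper using mark/reset; not the POI/Inbot FilterInputStream.
final class XmlDeclarationSkipper {
    private static final byte[] PREFIX = "<?xml".getBytes(StandardCharsets.US_ASCII);

    private XmlDeclarationSkipper() {
    }

    // Returns a stream positioned after a leading "<?xml ... ?>", or at the start if there is none.
    // Naive: a leading "<?xml-stylesheet ...?>" instruction would be skipped as well.
    static InputStream skipDeclaration(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(PREFIX.length);
        byte[] head = buffered.readNBytes(PREFIX.length);
        if (!Arrays.equals(head, PREFIX)) {
            buffered.reset(); // no declaration: rewind to the first byte
            return buffered;
        }
        // Consume everything up to and including the closing "?>".
        int previous = -1;
        int current;
        while ((current = buffered.read()) != -1) {
            if (previous == '?' && current == '>') {
                break;
            }
            previous = current;
        }
        return buffered;
    }
}
```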
This is a fast alternative to EchoElement, which does not do
any XML parsing before it copies XML data from an InputStream
into the XmlWriter.

It takes care of tricking the writer into accepting the data
without further ado, but REQUIRES that the writer already
contains a wrapping element (you cannot write at root level with this).
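One way to pull off this kind of trick with a plain JDK `XMLStreamWriter` is sketched below, assuming the XML writer and the raw copy target the same `Writer`: an empty `writeCharacters("")` makes the writer close the pending start tag, `flush()` empties its buffer, and then the bytes are copied straight through. `RawCopySketch` is hypothetical and not CopyElement's actual code.

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: copy raw bytes between a start and an end tag, without parsing.
final class RawCopySketch {
    private RawCopySketch() {
    }

    // Assumes "xml" writes into "target" and a wrapping start element is already open.
    static void copyInto(XMLStreamWriter xml, Writer target, InputStream raw) throws XMLStreamException, IOException {
        xml.writeCharacters(""); // forces "<metadata" to be completed as "<metadata>"
        xml.flush();             // push the writer's buffered output to target before bypassing it
        try (Reader reader = new InputStreamReader(raw, StandardCharsets.UTF_8)) {
            reader.transferTo(target); // no parsing: the data must already be well-formed XML
        }
        target.flush();
    }

    static String wrap(String payload) throws XMLStreamException, IOException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("metadata"); // the required wrapping element
        copyInto(xml, out, new ByteArrayInputStream(payload.getBytes(StandardCharsets.UTF_8)));
        xml.writeEndElement();
        xml.flush();
        return out.toString();
    }
}
```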
…XOAIMetadata

All of these share the common interface XmlWritable, so the element to write out as metadata
can be stored as this type (no switch or if necessary).

This also moves the data handling to the using classes: they create the data and the XML writables,
this class is only used to model a fluent API.

It is linked to the creation of items, which are generated by the application using this library
via the repository interfaces. The application may decide how to fill in metadata when creating items.
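A hypothetical sketch of the idea: every kind of metadata content implements one interface, so the holder keeps a single field and never branches on the concrete type. `XmlWritable` is the name used above, but its real method signature is not shown here; the `StringBuilder`-based `write` and the other class names are assumptions.

```java
// Hypothetical sketch; XmlWritable's real signature is not reproduced here.
interface XmlWritable {
    void write(StringBuilder out);
}

// Metadata built element by element (text content is escaped).
final class TextElement implements XmlWritable {
    private final String name;
    private final String text;

    TextElement(String name, String text) {
        this.name = name;
        this.text = text;
    }

    @Override
    public void write(StringBuilder out) {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        out.append('<').append(name).append('>').append(escaped).append("</").append(name).append('>');
    }
}

// Pregenerated XML copied as-is.
final class RawXml implements XmlWritable {
    private final String xml;

    RawXml(String xml) {
        this.xml = xml;
    }

    @Override
    public void write(StringBuilder out) {
        out.append(xml);
    }
}

// The holder stores any XmlWritable in one field: no switch or if on the concrete type.
final class MetadataSketch {
    private final XmlWritable content;

    private MetadataSketch(XmlWritable content) {
        this.content = content;
    }

    static MetadataSketch of(XmlWritable content) {
        return new MetadataSketch(content);
    }

    String toXml() {
        StringBuilder out = new StringBuilder();
        content.write(out);
        return out.toString();
    }
}
```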
When adding new transformers, ignore null ones and proceed.
Simplifies adding transformers from context and metadata format in data-provider.
…fault

Instead of forcing implementations to override the function unnecessarily, provide a default that returns an empty list.
Instead of an Item, methods that send only an identifier now return an object of type
ItemIdentifier, to make it clearer that it carries no metadata: this interface does not expose
a getMetadata(), which only the Item interface does.

The getItem() method is changed to send along a metadata format, so the application
can expose pregenerated or cached metadata within the specific format.
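A simplified sketch of the interface split, under stated assumptions: the interface names match the description above, but the signatures (`Optional`, `String` metadata, the `getSetSpecs` default method) are hypothetical and do not reproduce XOAI's real API. `SingleItemRepository` shows an implementation that does not need to override the defaulted method.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified interfaces; the real XOAI signatures differ.
interface ItemIdentifier {
    String getIdentifier();
}

// Only a full Item exposes metadata.
interface Item extends ItemIdentifier {
    String getMetadata();
}

interface ItemRepository {
    // The requested format travels along, so pregenerated or cached metadata can be returned.
    Optional<Item> getItem(String identifier, String metadataPrefix);

    // Identifier-only listings return ItemIdentifier, which has no getMetadata().
    List<ItemIdentifier> getItemIdentifiers();

    // Hypothetical defaulted method: implementors need not override it.
    default List<String> getSetSpecs() {
        return Collections.emptyList();
    }
}

final class SingleItemRepository implements ItemRepository {
    private final Item item = new Item() {
        @Override
        public String getIdentifier() {
            return "oai:example.org:1";
        }

        @Override
        public String getMetadata() {
            return "<oai_dc:dc/>";
        }
    };

    @Override
    public Optional<Item> getItem(String identifier, String metadataPrefix) {
        boolean known = item.getIdentifier().equals(identifier) && "oai_dc".equals(metadataPrefix);
        return known ? Optional.of(item) : Optional.empty();
    }

    @Override
    public List<ItemIdentifier> getItemIdentifiers() {
        return List.of(item);
    }
}
```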

See also classes CopyElement and Metadata.copyFromStream()
- Factor out the XSL pipeline handling into a new MetadataHelper to avoid
  duplicated code in GetRecordHandler and ListRecordsHandler
- Send metadata format via ItemRepository.getItem(identifier, format)
  to retrieve an item filled with metadata
- The refactored Metadata class makes it possible to tell whether the underlying
  data needs processing or not. This is reused here to skip the XSL
  pipeline when unnecessary. It's up to the application to provide
  valid metadata that results in a valid OAI-PMH response!
- Skipping the processing allows for pregeneration/caching of
  potentially large metadata like DDI codebooks
- Refactor InMemoryItem and InMemoryItemRepo with a more concise API
- Extend the GetRecordHandler and ListRecordsHandler tests:
    - Add explicit test example cases that include non-deleted, random metadata items
    - Add explicit test example cases that include non-deleted items
      associated with a CopyElement and InputStream
    - Verify the correct existence, but do not validate the OAI-PMH response
      (yet?)
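The pipeline handling, including ignoring `null` transformers and bypassing processing for pregenerated metadata, can be sketched with `javax.xml.transform`. `XslPipeline`, `apply`, `render`, and the `needsProcessing` flag are hypothetical names, not the MetadataHelper API.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;

// Hypothetical shared helper in the spirit of MetadataHelper; names are assumptions.
final class XslPipeline {
    private XslPipeline() {
    }

    // Runs the transformers in order, feeding each output into the next; null entries are skipped.
    static String apply(String xml, List<Transformer> transformers) throws TransformerException {
        String current = xml;
        for (Transformer transformer : transformers) {
            if (transformer == null) {
                continue;
            }
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
            current = out.toString();
        }
        return current;
    }

    // Pregenerated metadata (needsProcessing == false) is passed through untouched;
    // the application is then responsible for it being valid in the OAI-PMH response.
    static String render(String xml, boolean needsProcessing, List<Transformer> transformers)
            throws TransformerException {
        return needsProcessing ? apply(xml, transformers) : xml;
    }
}
```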
No one should use this FilterInputStream, so we exclude it from
being shipped with the xoai-common JAR. As it's only being used
for EchoElementBenchmark, it can happily live within src/test
- java.util.Date has many flaws and should not be used anymore
- Changing all necessary classes and tests to use Instants
- Simplified the implementation of UTCDateProvider
- Added lots of tests for the date provider class to ensure compatibility

See also: https://stackoverflow.com/a/59940399
…rovider interface #19

Instead of creating an instance every time, let's just use static
methods. Deleting the implementation while keeping the interface
still leaves it open to change.
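A minimal sketch of a date helper written as an interface with only static methods, using `java.time.Instant` and UTC. The `UtcDates` name and its methods are hypothetical; the two formats are the OAI-PMH datestamp granularities `YYYY-MM-DD` and `YYYY-MM-DDThh:mm:ssZ`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical interface with static methods only: no instance is ever created.
interface UtcDates {
    // Seconds granularity, always rendered in UTC (interface fields are implicitly static final).
    DateTimeFormatter SECONDS = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    static String format(Instant instant) {
        return SECONDS.format(instant); // fractional seconds are not printed
    }

    // Accepts day granularity ("YYYY-MM-DD") and seconds granularity ("YYYY-MM-DDThh:mm:ssZ").
    static Instant parse(String value) {
        if (value.length() == 10) {
            return LocalDate.parse(value).atStartOfDay(ZoneOffset.UTC).toInstant();
        }
        return Instant.parse(value);
    }
}
```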
This is a change originally done by @mike-podolskiy90 in commit
f0445e0

It's slightly extended to:
1) do this for ListSets and ListIdentifiers, too, and
2) only add the number if there are results, by checking in the
   ResumptionTokenHelper
19 replace time and 8 GBIF change with totalResults
… spec says the first position must be 0, not 1. (#30)
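A sketch of rendering the token with a 0-based cursor and an optional `completeListSize`. It assumes the cursor counts the records delivered in earlier responses, so the first page gets `cursor="0"`; that is one reading of the spec's definition, the PR's actual formula is not shown here, and `ResumptionTokenXml` is a hypothetical name.

```java
// Hypothetical helper; not XOAI's ResumptionTokenHelper.
final class ResumptionTokenXml {
    private ResumptionTokenXml() {
    }

    // recordsAlreadyReturned: records delivered in earlier responses (0 for the first page).
    // completeListSize: total number of results, or null if unknown.
    // token: value for the next request, or null on the last page (rendered as an empty element).
    static String render(String token, long recordsAlreadyReturned, Long completeListSize) {
        StringBuilder sb = new StringBuilder("<resumptionToken");
        // Only report a list size when it is known and there actually are results.
        if (completeListSize != null && completeListSize > 0) {
            sb.append(" completeListSize=\"").append(completeListSize).append('"');
        }
        sb.append(" cursor=\"").append(recordsAlreadyReturned).append("\">");
        // Base64 tokens contain no XML special characters; arbitrary strings would need escaping.
        sb.append(token == null ? "" : token);
        return sb.append("</resumptionToken>").toString();
    }
}
```

With 100 records per page, this assumption yields cursor values 0, 100, 200 for the first three pages.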
@landreev closed this Jun 3, 2022
@poikilotherm deleted the 30-resumptiontoken-cursor branch June 17, 2022 06:41