
Support for Ariadne portal and/or ADS #34

Open
dietervu opened this issue Jun 25, 2024 · 4 comments
Comments

@dietervu
Contributor

dietervu commented Jun 25, 2024

For the Ariadne portal, an example link:

https://portal.ariadne-infrastructure.eu/resource/d37baec6fe87dcdb28108a90f0f4ea010dd6d758eef7e232dab8517e818179b9

There is an API available that exposes the metadata as JSON or XML (see the landing page links).
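For illustration, pulling that JSON down could look roughly like the sketch below. The download URL is a placeholder on my part; the real link should be taken from the record's landing page.

```python
# Minimal sketch: fetch and inspect the JSON metadata exposed for an ARIADNE record.
# The URL below is hypothetical; copy the actual "JSON" link from the landing page.
import json

import requests

JSON_URL = "https://portal.ariadne-infrastructure.eu/..."  # placeholder, see landing page


def fetch_ariadne_metadata(json_url: str) -> dict:
    """Download the JSON metadata for a single portal record."""
    response = requests.get(json_url, timeout=30)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    record = fetch_ariadne_metadata(JSON_URL)
    # Print a readable excerpt so the available metadata fields can be inspected.
    print(json.dumps(record, indent=2)[:1000])
```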

In fact, the part that DOG should be able to access is the PDF, which seems to be available only via the original location:

https://archaeologydataservice.ac.uk/library/browse/issue.xhtml?recordId=1178177&recordType=GreyLitSeries
(DOI: https://doi.org/10.5284/1076468)

The missing part is the ADS API (or possibly the OASIS API), which would provide access to all metadata fields in a machine-readable manner.

Also to be investigated: which sections of the ADS site are relevant?

@dietervu
Contributor Author

Preliminary answer from Julian:

It would probably help us answer your question by knowing the context and what you want to do, but here are some initial pointers.

The ARIADNE portal is an aggregator and only holds what we regard as metadata (although in some cases we ingest the whole of a dataset at record level, so it tells the full story). But generally the fuller datasets (and certainly all downloadable files) are all held by repositories such as ADS. As well as the JSON and XML downloads, there is a public SPARQL endpoint for the whole ARIADNE knowledge base. I can send you the link if you are interested.

For ADS we have a few APIs, including an OASIS one, which Tim knows all about, and we are planning to update some others. But we are also in the process of migrating our three separate search interfaces to a single search using the ARIADNE portal framework and the ARIADNE triple store, so all our metadata would then be interrogable via the ARIADNE SPARQL endpoint. (As you might imagine, these three search interfaces are confusing to users who may not know if they want an archive, a brief site record, a journal article or a report!) This will happen within the next two years, starting with ArchSearch and Archives, so within the ATRIUM timescale.

Does that help at all? If you can explain what underlies the question, we may be able to give more information.

@dietervu
Contributor Author

Adding some relevant pointers for the ARIADNE SPARQL endpoint:
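Once the endpoint URL is known, querying it should follow the standard SPARQL 1.1 protocol. A minimal sketch (the endpoint URL below is a placeholder, not one of the actual pointers):

```python
# Minimal sketch of querying a SPARQL endpoint over HTTP (SPARQL 1.1 protocol).
# ENDPOINT is a placeholder; substitute the actual ARIADNE endpoint URL.
import requests

ENDPOINT = "https://example.org/sparql"  # placeholder, not the real ARIADNE endpoint

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def run_query(endpoint: str, query: str) -> dict:
    """Send a SELECT query and return the standard JSON results document."""
    response = requests.get(
        endpoint,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    results = run_query(ENDPOINT, QUERY)
    for binding in results["results"]["bindings"]:
        print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```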

@dietervu
Contributor Author

dietervu commented Sep 2, 2024

Follow-up to Tim and Julian:

Thanks a lot for the information. For us and for the short term, I think it makes the most sense to set up a concrete implementation for a simple case, where we try to process the text in a PDF report that has a DOI, such as the example I mentioned in my earlier mail: https://archaeologydataservice.ac.uk/library/browse/issue.xhtml?recordId=1178177&recordType=GreyLitSeries

Now our concrete question is: what would be the best way to extract the link to the PDF (https://archaeologydataservice.ac.uk/catalogue/adsdata/arch-882-1/dissemination/pdf/acarchae2-347153_1.pdf) when you start from the DOI (https://doi.org/10.5284/1076468)?

Worst case, we could try to parse the HTML, but maybe the OASIS API has some specific call for this?

Answer from Tim:

I'm afraid the OASIS API won't be of use here, primarily as it only returns the DOI and not the link to the specific file(s). The DOI is registered using a UID for the metadata landing page. It's the page itself that then queries another underlying database to pull out the relevant file(s). At the moment there's nothing practical we have to hand that allows an external user to extract this data. Could you try the parsing approach for now? If this proves too difficult then let me know and I'll have a think.

@MichalGawor is currently working on a plugin that is based on HTML parsing.
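As a rough sketch of that parsing approach (assuming PDF links on the landing page can simply be recognised by their ".pdf" suffix, which may not hold for every record and is not necessarily what the plugin does):

```python
# Rough sketch of the HTML-parsing fallback: resolve the DOI to the ADS landing
# page and collect links that appear to point at PDF files. The ".pdf" suffix
# heuristic is an assumption.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def pdf_links_from_doi(doi_url: str) -> list[str]:
    """Follow the DOI redirect to the landing page and return PDF links found there."""
    landing = requests.get(doi_url, timeout=30)
    landing.raise_for_status()

    soup = BeautifulSoup(landing.text, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href.lower().endswith(".pdf"):
            # Resolve relative hrefs against the final landing page URL.
            links.append(urljoin(landing.url, href))
    return links


if __name__ == "__main__":
    for link in pdf_links_from_doi("https://doi.org/10.5284/1076468"):
        print(link)
```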

@dietervu
Contributor Author

dietervu commented Oct 7, 2024

First implementation available at https://alpha-dog.clarin.eu/
