AgroPortal Hackathon #36

Open
twktheainur opened this issue Apr 16, 2018 · 19 comments
Labels
Resource Participant is providing (access to) resources, such as an ontology etc

Comments

@twktheainur

Dear OpenMinted Team,

Our implementation is complete, and Penny has already given us feedback on the metadata we produce for the Agroportal, SIFR Bioportal, NCBO Bioportal and Biblioportal ontologies. We have integrated the suggested changes. Please note that the submissions on the test platform were made before we integrated the changes resulting from that feedback.

I am creating this issue for a final check. Everything is working, including the future milestones (Biblioportal and NCBO Bioportal support), although for NCBO Bioportal the API call that retrieves all ontology metadata at once fails because of a server timeout outside of our control.

For the final deliverable, we have a GitHub project (https://github.com/agroportal/ncboproxy) with a detailed README explaining the general architecture, some details specific to the OMTD-Share adaptation, deployment instructions (although the API is based on the production web services and will not require any deployment on the part of the OpenMinted team) and Javadoc documentation.

Do you think the level of detail is sufficient for the content of the GitHub project to be used in full as the last deliverable? Otherwise, we can produce a standalone PDF document reprising this information. In the latter case, should we include a full printout of the code (as the instructions appear to suggest), or would a link to the GitHub project suffice?

I am also including some metadata outputs for Agroportal, SIFR Bioportal, NCBO Bioportal and Biblioportal for your reference:

AgroportalSample.zip
SIFRBioportalSample.zip
BiblioportalSample.zip
NCBOBioportalSamples.zip

Best Regards,
On behalf of the AgroPortal team,
Andon Tchechmedjiev

@gkirtzou gkirtzou self-assigned this Apr 16, 2018
@pennyl67
Collaborator

Hi Andon
I have downloaded the sample XML files and run them through a validator, and they are mostly OK apart from the following remarks:

  • should have a fixed value: "text" instead of the value it now has; from what I see, the value that is mostly used is "Ontological/Terminological resource, electronic distribution, RDF". I suggest you add them as , since I've seen that the is already correctly filled in.
  • one of the files contains an invalid value for an email (a URL instead of an email address)
  • in the agrovoc.xml, as regards the element: is it possible to generate the name of the url from the APIkey as a value? e.g.
    <ms:domain classificationSchemeName="other" schemeURI="http://data.agroportal.lirmm.fr/categories/NATRES?apikey=d245163b-98b4-41a4-a66d-09c1847b756f">Natural Resources, Earth and Environment?
    Apart from these, the metadata records seem ok.

@antleb Could you also please check the technical details and see if the XML files can be imported to the registry as required?
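The validation Penny describes can also be reproduced locally before uploading. The sketch below uses the JDK's standard `javax.xml.validation` API to validate one record against an XSD and collect every schema error instead of stopping at the first one. The class and method names are hypothetical, and the schema would be the OMTD-SHARE XSD, whose location is not given in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: validates one metadata record against an XSD and collects all errors. */
public final class XsdCheck {

    private XsdCheck() {}

    public static List<String> validate(Source xsd, Source xml) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();
        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings are ignored in this sketch.
            }

            @Override
            public void error(SAXParseException e) {
                // Schema violations (e.g. a missing contactInfo) are recorded and validation continues.
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Malformed XML cannot be validated any further, so stop here.
                throw e;
            }
        });
        validator.validate(xml);
        return errors;
    }
}
```

For example, `XsdCheck.validate(new StreamSource(new File("path/to/schema.xsd")), new StreamSource(new File("agrovoc.xml")))` (paths hypothetical) returns an empty list for a valid record. Passing the schema as a file-based source keeps any relative `xs:include`/`xs:import` locations resolvable.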

@greenwoodma
Member

@pennyl67 looks as if some XML elements have disappeared from at least the first bullet point?

@twktheainur
Author

twktheainur commented Apr 16, 2018

@pennyl67

  1. I have now fixed the value as "text". From your initial feedback, I had thought that the value should be a descriptive text rather than the literal value "text".
  2. The invalid email is caused by incorrectly entered metadata for that ontology on Biblioportal, which unfortunately is not something I have control over. If that could be a significant issue for the platform, I can add a filter that only allows valid emails in this field, although this means that some ontologies will not have contact emails at all (which may render the XML invalid according to the XSD).
  3. That can be done quite easily; I will make the change.

@pennyl67
Collaborator

@greenwoodma you are right! Thanks for noticing!
@twktheainur

  1. Sorry if I misinformed you; it's clearly a misunderstanding. Anyway, the values you have used could very well be used for "keyword", so there is no need to lose them.
  2. ok, this could be a problem. If we leave an invalid email in this field, the file won't be uploaded because it won't validate. If there's no contact information, the file is again considered invalid. But in the sample I saw, the email with the url site was used for the contactPerson, while you had correctly mapped the url to the landingPage element as well. So, you could simply not use the communicationInfo template at all for the person. Would that help? Or are there other cases we should also consider?
  3. Perfect!

@twktheainur
Copy link
Author

@pennyl67 For number 2, I think the best option would be to remove the contactPerson entry altogether when the value supplied for the email is invalid.

I will make the changes, deploy the updated adapter code and notify you here when it's done.
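A minimal sketch of the filter agreed on for point 2: keep the email only if it looks like an address, otherwise return nothing so the caller can omit the contactPerson entry. The class name, method name and the deliberately simple pattern are hypothetical and are not taken from the ncboproxy code; the pattern is an approximation, not a full RFC 5322 validator.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps a contact email only if it looks like an email address. */
public final class ContactEmailFilter {

    // Simplified pattern: local part, "@", then a domain with at least one dot.
    private static final Pattern EMAIL =
            Pattern.compile("^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$");

    private ContactEmailFilter() {}

    /** Returns the trimmed email if it matches; empty means "omit contactPerson". */
    public static Optional<String> validEmail(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String candidate = raw.trim();
        return EMAIL.matcher(candidate).matches() ? Optional.of(candidate) : Optional.empty();
    }

    public static void main(String[] args) {
        // A real address is kept; a website URL entered in the email field is rejected.
        System.out.println(validEmail("contact@example.org"));
        System.out.println(validEmail("http://www.example.org"));
    }
}
```

Returning `Optional` keeps the decision to drop the whole contactPerson element in the calling code, which matches the option chosen above rather than writing an empty email field.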

@jonquet

jonquet commented Apr 16, 2018

Hello all, sorry for being late, thanks @twktheainur for reporting on our project.

To come back to point 2: whatever we decide, the key aspect is to reach the ontology contact person and let them know they should correct the metadata. I think it is fine to tell someone uploading an ontology to one of the 4 repositories: if you fill in this and this more carefully, your ontology will also be available on the OMTD platform.
I will contact the owner of the ontology in BiblioPortal that is invalid.

As for producing the final deliverables (T3, D4 and T4), we will do this offline as official PDF documents, referring to the GitHub project (https://github.com/agroportal/ncboproxy) in the case of T3.

On our side, we still need to:
a. Fix the timeout when producing the Zip file for the NCBO BioPortal
b. Implement with the NCBO team the rerouting from the bioontology.org and ontoportal.org domains
These last two points will be discussed soon with @graybeal and @alexskr
c. Produce the final deliverables

@twktheainur
Author

@pennyl67 I have made the corrections. I am attaching a full metadata export for all ontologies on SIFR BioPortal (29 ontologies), AgroPortal (98 ontologies), BiblioPortal (26 ontologies) and NCBO BioPortal (779 ontologies)

Agroportal_ontologies_omtd-share_metadata-16_04_2018-20_20.zip
SifrBioportal_ontologies_omtd-share_metadata-16_04_2018-20_20.zip
Biblioportal_ontologies_omtd-share_metadata-16_04_2018-20_16.zip
NCBOBioportal_ontologies_omtd-share_metadata-16_04_2018-20_46.zip

@pennyl67
Collaborator

@twktheainur
Thanks once more!
So, I have validated the files and did some sample checking - there's no way I can check each and every file - and the only remaining things I found are:

  • some ontologies contain no contactInfo; I think if there is no other information, you could use the URI of the ontology as a landing page?
  • mainly for SifrBioportal ontologies: the element nonStandardLicenceTermsURL must be a URL; you can use the element nonStandardLicenceTermsName (for the name of a licence, if there is one) and nonStandardLicenceTermsText for textual fields. (By the way, @jonquet if you can recommend to ontology contact persons to add licences, especially standard ones such as CC latest version, it would be great! it's one of the elements that are important for processing.)
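One way to route a raw licence value as Penny suggests is sketched below: values that are genuine http(s) URLs go to nonStandardLicenceTermsURL, anything else to nonStandardLicenceTermsText. This swaps in `java.net.URI` parsing instead of the regex mentioned later in the thread, and the class, enum and method names are hypothetical. Telling a licence *name* apart from free text cannot be done reliably this way, so nonStandardLicenceTermsName is left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical router deciding which licence element a raw portal value should go to. */
public final class LicenceFieldRouter {

    /** Named after the OMTD-SHARE elements mentioned in the thread. */
    public enum Target { NON_STANDARD_LICENCE_TERMS_URL, NON_STANDARD_LICENCE_TERMS_TEXT }

    private LicenceFieldRouter() {}

    /** True only for an absolute http(s) URI that has a host component. */
    public static boolean isHttpUrl(String value) {
        if (value == null) {
            return false;
        }
        try {
            URI uri = new URI(value.trim());
            String scheme = uri.getScheme();
            return scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Free text such as "CC BY 4.0" contains spaces and fails to parse.
            return false;
        }
    }

    /** URLs go to nonStandardLicenceTermsURL; everything else goes to nonStandardLicenceTermsText. */
    public static Target route(String licenceValue) {
        return isHttpUrl(licenceValue)
                ? Target.NON_STANDARD_LICENCE_TERMS_URL
                : Target.NON_STANDARD_LICENCE_TERMS_TEXT;
    }

    public static void main(String[] args) {
        System.out.println(route("https://creativecommons.org/licenses/by/4.0/"));
        System.out.println(route("Creative Commons Attribution 4.0"));
    }
}
```

Relying on the JDK's URI parser avoids maintaining a hand-written URL regex, at the cost of accepting some syntactically valid but unusual URLs.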

@twktheainur
Author

Thank you for the feedback @pennyl67.
I will add the ontology URI in the portal as a fallback landingPage.
For the nonStandardLicenceTermsURL, the regular expression that checked for URLs contained an error. I have replaced it with a more thorough regex for valid URLs, so the problem should now be solved.
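The fallback landingPage mentioned above can be sketched as "first usable candidate URL, otherwise the ontology URI in the portal". The assumption here is that the adapter already has a list of candidate URLs for an ontology (for instance a homepage taken from the portal metadata); the class, method and example URLs are hypothetical.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical helper choosing a landingPage value with a guaranteed fallback. */
public final class LandingPageFallback {

    private LandingPageFallback() {}

    /**
     * Returns the first non-null, non-blank candidate (trimmed), or the ontology URI
     * in the portal when no candidate is usable.
     */
    public static String landingPage(List<String> candidates, String ontologyUri) {
        Objects.requireNonNull(ontologyUri, "ontologyUri");
        return candidates.stream()
                .filter(Objects::nonNull)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .findFirst()
                .orElse(ontologyUri);
    }

    public static void main(String[] args) {
        String ontologyUri = "https://example.org/ontologies/EXAMPLE"; // hypothetical URI
        System.out.println(landingPage(List.of(" ", "https://example.org/home"), ontologyUri));
        System.out.println(landingPage(List.of(), ontologyUri));
    }
}
```

Because the ontology URI is always present, every record gets at least one contact-related value, which is what the validator required for the records missing contactInfo.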

Yesterday I noticed that the OpenMinted group on GitHub has an omtd-model Maven project, available on Maven Central, that generates a JAXB binding of the XSD specification in exactly the same manner as our implementation. Consequently, I have added omtd-model as a dependency of our implementation and replaced our own JAXB bindings. This should improve maintainability when future versions of the specification are released.

@pennyl67
Collaborator

Thanks @twktheainur!

@greenwoodma greenwoodma added the Resource Participant is providing (access to) resources, such as an ontology etc label Apr 18, 2018
@pennyl67
Collaborator

@twktheainur When you have the updated metadata records for the ontologies, could you upload them again for the final check? Thanks!

@gkirtzou gkirtzou removed their assignment Apr 19, 2018
@twktheainur
Author

@pennyl67 Here are the updated metadata records for the 4 portals.

Agroportal_ontologies_omtd-share_metadata-19_04_2018-13_34.zip
SIFRBioPortal_ontologies_omtd-share_metadata-19_04_2018-13_23.zip
BiblioPortal_ontologies_omtd-share_metadata-19_04_2018-13_26.zip
NCBOBioPortal_ontologies_omtd-share_metadata-19_04_2018-13_45.zip

I also tried submitting a few ontologies on the test platform directly from the URL of our API call, rather than by copy-pasting the XML, since that is the intended role of the API; this seems to be working fine.

@pennyl67
Collaborator

I've run the validation test again and I get the following errors:

  • from BiblioPortal: one metadata record (test23.xml) has no contactInfo (in fact, I don't know if it's valid, as the description is a simple bla bla)
  • AgroPortal: an error that I don't recall having seen before: the languages are missing from some records (e.g. ADO.xml) although the metalanguages module is filled.
  • NCBOBioPortal: again the missing contactInfo error - which is what you said you had corrected; maybe I got a wrong file?
  • SifrBioPortal: languages error as in AgroPortal.
    Can you check and send me updated files?

@twktheainur
Author

twktheainur commented Apr 23, 2018

Apologies, it appears my initial reply did not go through and remained unposted, which I have only just realised.

@pennyl67 Some issues were introduced when I switched to using omtd-model as a dependency after the previous round of fixes. I have debugged the issues, the output should now be ok.

Concerning the TEST23 ontology, I believe it is a test ontology that someone submitted to the portal publicly. There is also a TEST ontology. Since anyone can submit content to BiblioPortal and most users are not technologically savvy, such errors are more likely on BiblioPortal, although the same could happen on the other portals too. All I can do in this case is notify the people in NCBO so that they can address the issue.

Agroportal_ontologies_omtd-share_metadata-19_04_2018-16_58.zip
SifrBioPortal_ontologies_omtd-share_metadata-19_04_2018-16_54.zip
Biblioportal_ontologies_omtd-share_metadata-19_04_2018-17_00.zip
NCBOBioPortal_ontologies_omtd-share_metadata-19_04_2018-17_29.zip

@jonquet

jonquet commented Apr 23, 2018

I have just reported that the TEST ontology should be deleted from BiblioPortal and that correct contact information (name + email) should be entered.

@pennyl67
Collaborator

Thanks @jonquet and @twktheainur
I'll get back to you with any news on the validation - I didn't have the time to check today

@pennyl67
Collaborator

Hi @twktheainur, I only found three invalid records (DOCC, NCC and NCCO in NCBO BioPortal). Again, it is the empty contactInfo problem; maybe some files were not processed with the suggested solution (i.e. using the resourceIdentifier)?
Can you check again and let me know?
Thanks

@twktheainur
Copy link
Author

twktheainur commented Apr 24, 2018

@pennyl67 Thank you for the feedback; a corner case wasn't handled properly. I have pushed a fix and am attaching the corrected versions of the three affected records.

DOCC_NCC_NCCO.zip

@pennyl67
Collaborator

@twktheainur Thanks! Then all the files are now valid.


5 participants