The poster paper "Converter: Enhancing Interoperability in Research Data Management", which describes this repository, has been accepted at ESWC 2024.

DCAT Converter

The project converts metadata in oai_dc format into DCAT-AP and saves it as RDF. The converter has been integrated into the pipeline that harvests data from the Berlin University Alliance (BUA) repositories. The project's portal has been available since last November: META4BUA
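To make the conversion step concrete, here is a minimal sketch of mapping Dublin Core (oai_dc) elements onto DCAT-AP dataset properties and emitting N-Triples. The mapping table, the function names, and the example IRI are hypothetical illustrations; the repository's own correspondences (produced by its schema matching work) may differ. For simplicity, every value is written as a plain literal, although DCAT-AP expects IRIs or agent nodes for some properties such as dct:publisher and dct:language.

```python
# Hypothetical, simplified oai_dc -> DCAT-AP mapping (not the project's own).
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

DC_TO_DCAT_AP = {
    "title": DCT + "title",
    "description": DCT + "description",
    "creator": DCT + "creator",
    "publisher": DCT + "publisher",
    "date": DCT + "issued",
    "identifier": DCT + "identifier",
    "subject": DCAT + "keyword",
    "language": DCT + "language",
    "type": DCT + "type",
}


def escape_literal(value: str) -> str:
    """Escape a string so it can be used as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def record_to_ntriples(dataset_iri: str, record: dict) -> str:
    """Turn one oai_dc record (element name -> list of values) into N-Triples."""
    lines = [f"<{dataset_iri}> <{RDF_TYPE}> <{DCAT}Dataset> ."]
    for element, values in record.items():
        predicate = DC_TO_DCAT_AP.get(element)
        if predicate is None:
            continue  # element has no mapping in this sketch, so it is dropped
        for value in values:
            lines.append(f'<{dataset_iri}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(record_to_ntriples("https://example.org/dataset/1",
                             {"title": ["Sample"], "subject": ["RDM"]}))
```

A production converter would more likely build the graph with an RDF library and serialize it to RDF/XML, which is the format of the .rdf files in the data folder; plain N-Triples strings just keep the sketch dependency-free.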

Meta Data Portal

Project Folder Structure

.
├── LICENSE
├── README.md
├── configs                                   ---> config .ini files, one per repository.
│   ├── config_fuberlin.ini
│   ├── config_huberlin.ini
│   └── config_tuberlin.ini
├── data                                      ---> generated RDF files are written to this folder.
│   ├── fu_berlin.rdf
│   ├── hu_berlin.rdf
│   ├── sample.rdf
│   └── tu_berlin.rdf
├── dockerfile
├── requirements.txt                           ---> Python dependencies
├── schema                                     ---> schemas: DCAT elements and terms, DCAT-AP, oai_dc
├── schema_matching_experiments                ---> prompting and similarity calculation to find correspondences between DCAT elements and terms, following [1]
└── src
    ├── converter_service.py                   ---> entry point: the converter is run from this script.
    ├── data_crawler.py
    ├── dcat_ap.py
    ├── matches.py
    ├── schema_matcher
    └── utils.py

How to run in a conda environment

Create a conda environment

$ conda create -n dcat_env python=3.9

Activate the environment

$ conda activate dcat_env

Install Requirements

$ pip install -r requirements.txt

Each config_<reponame>.ini file in the configs folder (e.g. config_tuberlin.ini) holds the data crawling parameters for that repository.
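As an illustration of how such a file can be read, the sketch below parses crawling parameters with Python's standard configparser. The section name [harvest] and the keys base_url, metadata_prefix, batch_size, and output_file are hypothetical; check the actual files in configs for the real names.

```python
import configparser

# Hypothetical contents of a config_<reponame>.ini file; the project's real
# section and key names may differ.
SAMPLE_CONFIG = """
[harvest]
base_url = https://api-depositonce.tu-berlin.de/server/oai/request
metadata_prefix = oai_dc
batch_size = 100
output_file = data/tu_berlin.rdf
"""


def load_crawl_params(ini_text: str) -> dict:
    """Parse crawling parameters from the text of an .ini file."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["harvest"]
    return {
        "base_url": section.get("base_url"),
        "metadata_prefix": section.get("metadata_prefix", fallback="oai_dc"),
        "batch_size": section.getint("batch_size", fallback=100),
        "output_file": section.get("output_file"),
    }


if __name__ == "__main__":
    print(load_crawl_params(SAMPLE_CONFIG))
```

To read a file on disk instead of a string, call parser.read("configs/config_tuberlin.ini") in place of read_string.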

1. Repository APIs

1) Refubium repository (FU Berlin)

https://refubium.fu-berlin.de/oai/dnb?verb=ListRecords&metadataPrefix=xMetaDissPlus

2) DepositOnce repository (TU Berlin)

https://api-depositonce.tu-berlin.de/server/oai/request?verb=ListRecords&metadataPrefix=oai_dc

3) edoc repository (HU Berlin)

https://edoc.hu-berlin.de/oai/request/?verb=ListRecords&metadataPrefix=oai_dc

Note: The data is harvested in batches of 100 records per request. The full dataset is not re-harvested and re-imported into the portal on every update.
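The endpoints above are OAI-PMH ListRecords requests, where the server decides the page size and returns a resumptionToken for the next page. The following is a minimal sketch of paging through such an endpoint, not the code in data_crawler.py. The optional from_date uses OAI-PMH's standard "from" argument for selective harvesting, which is one common way to fetch only records changed since the last update; whether the project updates this way is an assumption. The function name list_records and the injected fetch callable are hypothetical, and fetch is passed in so the paging logic can be tested offline.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records(fetch: Callable[[str], str], base_url: str,
                 metadata_prefix: str = "oai_dc",
                 from_date: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield <record> elements from an OAI-PMH ListRecords endpoint,
    following resumptionTokens until the server reports no further pages."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records added or changed since from_date.
        params["from"] = from_date
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        yield from root.iterfind(".//oai:record", OAI_NS)
        token = root.find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            break  # missing or empty token marks the last page
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

For a live run, fetch could be lambda url: urllib.request.urlopen(url).read().decode("utf-8"), with base_url set to one of the endpoint URLs above without its query string.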

2. Data Collection and Converter

Run the script below:

$ python src/converter_service.py
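Between harvesting and conversion, the Dublin Core fields of each record have to be extracted. The sketch below shows one way to do this with the standard library; it is an illustration, not the internals of converter_service.py, and the name parse_oai_dc is hypothetical. It returns each element name with the list of its non-empty values, since fields such as dc:creator may repeat.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI_DC = "{http://www.openarchives.org/OAI/2.0/oai_dc/}dc"
DC_NS = "http://purl.org/dc/elements/1.1/"


def parse_oai_dc(record: ET.Element) -> dict:
    """Collect the Dublin Core fields of one OAI-PMH record as
    element name -> list of values."""
    dc = next(record.iter(OAI_DC), None)
    if dc is None:
        return {}  # e.g. deleted records carry no metadata block
    fields = defaultdict(list)
    prefix = "{" + DC_NS + "}"
    for child in dc:
        # Keep only dc:* elements that actually contain text.
        if child.tag.startswith(prefix) and child.text and child.text.strip():
            fields[child.tag[len(prefix):]].append(child.text.strip())
    return dict(fields)
```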

3. Pipeline

BOP Docker Compose

https://github.com/sefeoglu/bop-docker-compose

BOP UI, which is based on the piveau UI:

https://github.com/sefeoglu/bua-bop-ui

BOP RDF Importer, which was modified from the piveau RDF importer:

https://github.com/sefeoglu/bop-consus-importing-rdf

Note: The Docker images of the UI and the RDF importer above are registered on FOKUS servers, so please register these images on your own server.

References

[1] Conversational Ontology Alignment with ChatGPT.

@misc{norouzi2023conversational,
      title={Conversational Ontology Alignment with ChatGPT}, 
      author={Sanaz Saki Norouzi and Mohammad Saeid Mahdavinejad and Pascal Hitzler},
      year={2023},
      eprint={2308.09217},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Citation

@misc{efeoglu2024converter,
      title={Converter: Enhancing Interoperability in Research Data Management}, 
      author={Sefika Efeoglu and Zongxiong Chen and Sonja Schimmler and Bianca Wentzel},
      year={2024},
      eprint={2404.13406},
      archivePrefix={arXiv},
      primaryClass={cs.DL}
}
