diff --git a/datasets/hgnc/config.yml b/datasets/hgnc/config.yml
index 12c5972..b0df4a6 100644
--- a/datasets/hgnc/config.yml
+++ b/datasets/hgnc/config.yml
@@ -1,5 +1,6 @@
 dataset_to_process: "hgnc"
+# THIS FILE IS NOT USED BY GITHUB ACTIONS
 ## For XML workflows
 # dir_to_process:
diff --git a/datasets/hgnc/download/download.sh b/datasets/hgnc/download/download.sh
index d2b63be..99847a0 100755
--- a/datasets/hgnc/download/download.sh
+++ b/datasets/hgnc/download/download.sh
@@ -3,12 +3,12 @@
 ## Download sample TSV files from GitHub (OMOP CDM mappings, about 15M)
 wget -N ftp://ftp.ebi.ac.uk/pub/databases/genenames/hgnc/tsv/hgnc_complete_set.txt
-# Convert TSV to CSV for RMLStreamer
+# Convert TSV to CSV for RML Mapper
 sed -e 's/"//g' -e 's/\t/","/g' -e 's/^/"/' -e 's/$/"/' -e 's/\r//' hgnc_complete_set.txt > hgnc.csv
+##Example to run quick python scripts:
 # pip install pandas
-
 # python3 < concepts.csv\"\"\"\n",
-    "# os.system(cmd_convert_csv)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Process and load concepts\n",
-    "\n",
-    "We will use CWL workflows to integrate data with SPARQL queries. The structured data is first converted to a generic RDF based on the data structure, then mapped to BioLink using SPARQL. The SPARQL queries are defined in `.rq` files and can be [accessed on GitHub](https://github.com/MaastrichtU-IDS/d2s-project-template/tree/master/datasets/hgnc/mapping).\n",
-    "\n",
-    "Start the required services (here on our server, defined by the `-d trek` arg):\n",
-    "\n",
-    "```bash\n",
-    "d2s start tmp-virtuoso drill -d trek\n",
-    "```\n",
-    "\n",
-    "Run one of the following d2s command in the d2s-project folder:\n",
-    "\n",
-    "```bash\n",
-    "d2s run csv-virtuoso.cwl hgnc\n",
-    "d2s run xml-virtuoso.cwl hgnc\n",
-    "```\n",
-    "\n",
-    "[HCLS metadata](https://www.w3.org/TR/hcls-dataset/) can be computed for the hgnc graph:\n",
-    "\n",
-    "```bash\n",
-    "d2s run compute-hcls-metadata.cwl hgnc\n",
-    "```\n",
-    "\n",
-    "## Load the BioLink model\n",
-    "\n",
-    "Load the [BioLink model ontology as Turtle](https://github.com/biolink/biolink-model/blob/master/biolink-model.ttl) in the graph `https://w3id.org/biolink/biolink-model` in the triplestore\n"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
\ No newline at end of file