GenomicDataInfrastructure/starter-kit-htsget

About htsget

The htsget protocol was selected for the Data Reception part of the GDI starter kit. htsget enables retrieval of files from the storage-and-interfaces archive. The specification of the protocol can be found here. The implementation can be found here.

This repository contains the htsget implementation for the GDI starter kit, packaged in the docker-compose.yml file. For a demo setup, you also need to set up the GDI starter-kit storage-and-interfaces. More details on that setup and on how to run the services can be found in the sections below.

HTSGET Configuration

The configuration for the htsget server is located in the config-htsget-rs folder. A detailed description of the configuration options can be found in the reference implementation repository. The most important settings are:

resolvers.object_types

  • send_encrypted_to_client defines whether crypt4gh is enabled, that is, whether htsget assumes that the files retrieved from the sda-download service are encrypted and calculates the byte ranges accordingly.
  • private_key, public_key allow predefining the keys that htsget uses for communication with sda-download. If the keys are not defined, htsget creates a key pair for every request (the preferred default).

resolvers.storage

  • response_url defines the url that will be used in the response to the client from htsget.
  • forward_headers defines whether headers from the client will be forwarded to sda-download (should be set to true in order to use the authentication mechanism).

resolvers.storage.endpoints

  • index defines the location of the index file in sda-download. This is assumed to be non-encrypted in all cases, as it does not contain sensitive information.
  • file defines the location of the header and file requested from sda-download by htsget in order to calculate the ranges according to the request.

The values in the config file config-htsget-rs/download-config.toml allow for requesting encrypted partial or full files.
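
To review how the settings listed above are currently set, a small helper along these lines may be convenient. It is only a sketch: the function name show_htsget_settings is hypothetical, and it assumes that each setting appears on its own `key = value` line in the TOML file.

```shell
# Hypothetical helper: print the settings discussed above, with line numbers.
# Assumes each setting is written as a plain `key = value` line in the TOML file.
show_htsget_settings() {
  grep -nE '^[[:space:]]*(send_encrypted_to_client|private_key|public_key|response_url|forward_headers|index|file)[[:space:]]*=' "$1"
}

# Usage, from the root of starter-kit-htsget:
# show_htsget_settings config-htsget-rs/download-config.toml
```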

Running the services

The htsget product of the starter kit depends on the storage-and-interfaces product. Specifically, the data served has to be ingested and stored in the archive included in the storage-and-interfaces repository. This is achieved by using the docker-compose-demo.yml, as below.

To start the services, start the individual docker compose environments from their respective root directories:

docker compose -f docker-compose-demo.yml up -d  # in the folder starter-kit-storage-and-interfaces
docker compose up -d # in the folder starter-kit-htsget

The logs for the two docker compose environments can be followed using the following commands, for storage-and-interfaces and htsget respectively:

docker compose -f docker-compose-demo.yml logs -f # in the folder starter-kit-storage-and-interfaces
docker compose -f docker-compose.yml logs -f # in the folder starter-kit-htsget

Accessing data with htsget

In order to test the htsget implementation, there needs to be some data ingested into the archive. The demo setup of storage-and-interfaces provides one dataset, DATASET0001, containing the file htsnexus_test_NA12878.bam.

Preparations

Get an authentication token from the auth service of storage-and-interfaces using:

token=$(curl -s -k https://localhost:8080/tokens | jq -r '.[0]')
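
If the auth service is not reachable, `$token` ends up empty, and if the response contains no tokens, `jq -r` prints the literal string `null`. A quick check before continuing can catch both cases. This is a minimal sketch; the function name require_token is hypothetical.

```shell
# Hypothetical helper: fail early when no usable token was obtained.
# An empty value means curl returned nothing; "null" is what `jq -r '.[0]'`
# prints when the response has no first element.
require_token() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "no token received; is the auth service running?" >&2
    return 1
  fi
}

# Usage:
# require_token "$token" || exit 1
```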

You also need a crypt4gh key pair. This will be used for (re-)encrypting the file before it's sent to you.

crypt4gh generate -n demokey
pubkey=$(base64 -w0 demokey.pub.pem)
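
The `-w0` option is specific to GNU base64 and may not be available on other systems such as macOS. A portable alternative, sketched below with a hypothetical helper name, strips the line breaks with `tr` instead.

```shell
# Hypothetical helper: base64-encode a file onto a single line without
# relying on the GNU-only -w0 option.
b64_oneline() {
  base64 < "$1" | tr -d '\n'
}

# Usage:
# pubkey=$(b64_oneline demokey.pub.pem)
```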

Retrieving data

Now you should be able to make requests to the htsget server. To request the byte ranges covering chromosome 11 of the file htsnexus_test_NA12878.bam, run:

curl -v -k -H "Client-Public-Key: $pubkey" -H "Authorization: Bearer $token" 'http://localhost:8088/reads/DATASET0001/htsnexus_test_NA12878?referenceName=11'

The request returns a ticket describing how to download the requested part of the file:

{
 "htsget": {
   "format": "BAM",
   "urls": [
     {
       "url": "data:;base64,Y3J5cHQ0Z2gBAAAAAgAAAA=="
     },
     {
       "url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
       "headers": {
         "Range": "bytes=16-123",
         ...
       }
     },
     {
       "url": "data:;base64,ZAAAAAAAAACxHxjMhagEVY+4bVEZYuqYGK5Ph3jrffrMhXpc3wYWenp2ofohEUwSBOuZF3kH6TEiQsjSPGaE1bvdMQ2uUuuHLWicplUneE77G079sTW8rJIJJ1VgZecPi9cTfQ=="
     },
     {
       "url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
       "headers": {
         "Range": "bytes=124-1049147",
         ...
       }
     },
     {
       "url": "http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam.c4gh",
       "headers": {
         "Range": "bytes=2557120-2598042",
         "accept": "*/*",
         ...
       }
     }
   ]
 }
}

This response contains byte ranges (e.g. "Range": "bytes=124-1049147") as headers for the URL requests. Use them to make requests to http://localhost:8443/s3-encrypted (which is sda-download from storage-and-interfaces) to retrieve the data for chromosome 11 from the file:

curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token"  -H "Client-Public-Key: $pubkey" -H "Range: bytes=16-123" -o p11-00.bam.c4gh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token"  -H "Client-Public-Key: $pubkey" -H "Range: bytes=124-1049147" -o p11-01.bam.c4gh
curl 'http://localhost:8443/s3-encrypted/DATASET0001/htsnexus_test_NA12878.bam' -H "Authorization: Bearer $token"  -H "Client-Public-Key: $pubkey" -H "Range: bytes=2557120-2598042" -o p11-02.bam.c4gh
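
A byte range is inclusive at both ends, so `bytes=16-123` covers 123 - 16 + 1 = 108 bytes. The sketch below (the helper name range_len is hypothetical) computes that length, which can be compared with the size of each downloaded part, assuming the server returns exactly the requested range.

```shell
# Hypothetical helper: number of bytes covered by an inclusive HTTP range
# given as "bytes=START-END".
range_len() {
  spec=${1#bytes=}     # drop the "bytes=" prefix
  start=${spec%-*}     # text before the dash
  end=${spec#*-}       # text after the dash
  echo $((end - start + 1))
}

# Usage: the two numbers should match if the range was honoured exactly.
# range_len "bytes=16-123"
# wc -c < p11-00.bam.c4gh
```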

The response from htsget also lists two data sections:

"url": "data:;base64,Y3J5cHQ0Z2gBAAAAAgAAAA=="

and

"url": "data:;base64,ZAAAAAAAAACxHxjMhagEVY+4bVEZYuqYGK5Ph3jrffrMhXpc3wYWenp2ofohEUwSBOuZF3kH6TEiQsjSPGaE1bvdMQ2uUuuHLWicplUneE77G079sTW8rJIJJ1VgZecPi9cTfQ=="

These segments are part of the requested data. Decode the base64 payloads (e.g. Y3J5cHQ0Z2gBAAAAAgAAAA==) and save them to files:

echo Y3J5cHQ0Z2gBAAAAAgAAAA== | base64 -d > start.b64
echo ... | base64 -d > mid.b64
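
The first segment is short enough to inspect by hand. A crypt4gh file starts with the 8-byte magic string `crypt4gh`, followed by a 4-byte little-endian version and a 4-byte little-endian header packet count. The sketch below (helper name hypothetical) prints these three fields. That the preamble is 16 bytes long also suggests why the first ranged request in the ticket starts at byte 16, though that reading is an interpretation rather than something stated by the ticket.

```shell
# Hypothetical helper: print the fixed 16-byte preamble of a crypt4gh file.
# $1 is the path to a file that starts with a crypt4gh header.
c4gh_preamble() {
  magic=$(head -c 8 "$1")
  # od prints bytes 8..15 as unsigned decimals; each group of four bytes
  # is a little-endian 32-bit integer.
  set -- $(od -An -v -tu1 -j8 -N8 "$1")
  version=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
  packets=$(( $5 + ($6 << 8) + ($7 << 16) + ($8 << 24) ))
  echo "magic=$magic version=$version packets=$packets"
}

# Usage:
# c4gh_preamble start.b64
```

For the first data segment of the example ticket above, this prints `magic=crypt4gh version=1 packets=2`.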

Then concatenate all segments:

cat start.b64 p11-00.bam.c4gh mid.b64 p11-01.bam.c4gh p11-02.bam.c4gh > htsnexus_11.bam.c4gh 
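
The manual steps above can also be scripted. The following is a minimal sketch, not a full htsget client: the function name fetch_ticket is hypothetical, it assumes jq, curl and base64 are installed and that `$token` and `$pubkey` are set as in the Preparations section, it requests each URL exactly as given in the ticket, and it sends only the Range header from the ticket in addition to the two authentication headers.

```shell
# Hypothetical helper: read an htsget ticket (JSON) on stdin and write the
# concatenated segments to the file named by $1.
fetch_ticket() {
  out=$1
  : > "$out"   # start with an empty output file
  # Emit one line per ticket entry: URL, a tab, then the Range value (may be empty).
  jq -r '.htsget.urls[] | [.url, (.headers.Range // "")] | @tsv' |
  while IFS="$(printf '\t')" read -r url range; do
    case $url in
      data:*base64,*)
        # Inline segment: decode the payload after "base64,".
        printf '%s' "${url#*base64,}" | base64 -d >> "$out" ;;
      *)
        # Remote segment: fetch it, with the byte range when one is given.
        if [ -n "$range" ]; then
          curl -fsS -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" -H "Range: $range" "$url" >> "$out"
        else
          curl -fsS -H "Authorization: Bearer $token" -H "Client-Public-Key: $pubkey" "$url" >> "$out"
        fi ;;
    esac
  done
}

# Usage (same request as in "Retrieving data"):
# curl -s -H "Client-Public-Key: $pubkey" -H "Authorization: Bearer $token" \
#   'http://localhost:8088/reads/DATASET0001/htsnexus_test_NA12878?referenceName=11' \
#   | fetch_ticket htsnexus_11.bam.c4gh
```

The segments are appended in ticket order, which matches the order used in the manual `cat` command. Failed downloads are not handled here beyond curl's own error message.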

Make sure that the file can be decrypted with your private key:

crypt4gh decrypt -s demokey.sec.pem -f htsnexus_11.bam.c4gh

Finally, check that samtools can open the new file:

samtools view htsnexus_11.bam

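or, if you don't have samtools installed, start a shell in a container that provides it (the current directory is mounted at /tmp) and run the command there:

docker run -it --rm -v "$(pwd)":/tmp staphb/samtools /bin/bash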
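
Once samtools is available, a quick sanity check is to count the alignments per reference name. The sketch below uses a hypothetical helper that reads SAM text, where the third column is the reference name. Most records should be on reference 11, although htsget servers may return some records outside the requested region.

```shell
# Hypothetical helper: count alignments per reference name.
# Reads SAM records on stdin; column 3 is RNAME.
count_refnames() {
  cut -f3 | sort | uniq -c
}

# Usage:
# samtools view htsnexus_11.bam | count_refnames
```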
