generate parquet file from SQLite/basex data #80

Open
realmarcin opened this issue Dec 15, 2021 · 0 comments

Comments

@realmarcin
Collaborator

The task is to recreate the original Parquet file that we used in notebooks for data science and model training. Parquet is a compressed, columnar file format that produces smaller files and loads faster.

This could be a Python script that queries BaseX and saves the resulting table in Parquet format.

The original Parquet file included all harmonized attributes from the Biosample XML plus a set of selected (non-attribute) properties:

primaryId
sraId
sampleName
ownerAbbr
ownerName
taxonomyId
taxonomyName
organismName
status
statusDate
model
package
packageName
title
accession
submissionDate
lastUpdate
publicationDate
dnaSource
entrezTarget
entrezLabel
entrezValue
paragraph
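
A rough sketch of what the SQLite side of such a script might look like. Everything here is an assumption rather than existing project code: the table name `biosample`, the file paths, and the helper names are hypothetical, and it presumes the properties above exist as columns in the SQLite table. It uses `pandas` with a Parquet engine such as `pyarrow` installed; querying BaseX instead would need a different loading step.

```python
import sqlite3
from contextlib import closing
from typing import Optional, Sequence

# Selected (non-attribute) properties listed in this issue.
SELECTED_PROPERTIES = [
    "primaryId", "sraId", "sampleName", "ownerAbbr", "ownerName",
    "taxonomyId", "taxonomyName", "organismName", "status", "statusDate",
    "model", "package", "packageName", "title", "accession",
    "submissionDate", "lastUpdate", "publicationDate", "dnaSource",
    "entrezTarget", "entrezLabel", "entrezValue", "paragraph",
]


def quote_identifier(name: str) -> str:
    """Quote a SQLite identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table: str, columns: Optional[Sequence[str]] = None) -> str:
    """Build a SELECT statement; with no columns given, select everything."""
    if columns:
        cols = ", ".join(quote_identifier(c) for c in columns)
    else:
        cols = "*"
    return f"SELECT {cols} FROM {quote_identifier(table)}"


def export_to_parquet(db_path: str, table: str, out_path: str,
                      columns: Optional[Sequence[str]] = None) -> int:
    """Read a SQLite table into a DataFrame and write it as Parquet.

    Returns the number of rows written.
    """
    # Imported here so the helpers above work without pandas installed.
    import pandas as pd

    with closing(sqlite3.connect(db_path)) as conn:
        df = pd.read_sql_query(build_select(table, columns), conn)
    # Snappy is the pandas default compression; stated explicitly for clarity.
    df.to_parquet(out_path, compression="snappy", index=False)
    return len(df)


if __name__ == "__main__":
    # Hypothetical paths and table name; adjust to the real database.
    # No column list is passed, so harmonized attribute columns are kept too.
    n = export_to_parquet("biosample.db", "biosample", "biosample.parquet")
    print(f"wrote {n} rows")
```

Calling `export_to_parquet` without a column list keeps every column, which would cover the harmonized attributes as well as the properties above, provided they all live in one table. Passing `SELECTED_PROPERTIES` restricts the output to just those properties.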

@turbomam @wdduncan @hrshdhgd
