Metazoa Assemblies

This repository houses the scripts for the manuscript, "Towards a genome sequence for every animal: where are we now?"

dedup_list_tab.py takes as input a CSV file of genome assembly metadata harvested from NCBI with the NCBI Datasets tool. The CSV must be sorted first by taxid and then by contig N50. For each taxon, the script selects the assembly with the longest contig N50. It also checks whether an annotation exists on NCBI for that species (for any of its assemblies).
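The selection step described above can be illustrated with a minimal sketch. This is not the code in dedup_list_tab.py; it only shows the idea of keeping one assembly per taxid by contig N50 and recording whether any assembly for that taxid is annotated. The column names (taxid, accession, contig_n50, annotated), the function name, and the example rows are all hypothetical placeholders, not the actual NCBI Datasets field names or real records. Unlike the script, this sketch does not rely on the input being pre-sorted.

```python
import csv
import io


def pick_best_assemblies(rows):
    """Keep, for each taxid, the row with the largest contig N50.

    Column names are assumptions made for this sketch. The returned rows
    gain an "any_annotation" flag that is True if any assembly for the
    taxid is marked as annotated.
    """
    best = {}
    annotated = {}
    for row in rows:
        taxid = row["taxid"]
        n50 = int(row["contig_n50"])
        # Track annotation status across all assemblies of this taxon.
        annotated[taxid] = annotated.get(taxid, False) or row["annotated"] == "yes"
        # Replace the stored row only if this one has a larger contig N50.
        if taxid not in best or n50 > int(best[taxid]["contig_n50"]):
            best[taxid] = row
    return {t: {**r, "any_annotation": annotated[t]} for t, r in best.items()}


# Made-up example data; taxids and accessions are placeholders.
EXAMPLE_CSV = """taxid,accession,contig_n50,annotated
1001,ASM_A,1000,yes
1001,ASM_B,5000,no
1002,ASM_C,200,no
"""

result = pick_best_assemblies(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
for taxid, row in result.items():
    print(taxid, row["accession"], row["contig_n50"], row["any_annotation"])
```

In this example, taxon 1001 keeps ASM_B because it has the larger contig N50, yet it is still flagged as annotated because ASM_A carries an annotation.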

scrape_assembly_info.py is a web scraper built on Beautiful Soup that collects metadata not included in the standard NCBI Datasets metadata. The only input it needs is an assembly accession number.


pbfrandsen/metazoa_assemblies
