diff --git a/README.md b/README.md
index 0def9a03..5c2fe82a 100644
--- a/README.md
+++ b/README.md
@@ -33,6 +33,9 @@ Please cite: [Kim J, Steinegger M. Metabuli: sensitive and specific metagenomic
 ---
+### Update in v1.0.8
+- Added `extract` module to extract reads classified into a certain taxon.
+
 ### Update in v1.0.7
 - **Metabuli became faster 🚀**
   - Windows: *8.3* times faster
@@ -40,11 +43,11 @@ Please cite: [Kim J, Steinegger M. Metabuli: sensitive and specific metagenomic
   - Linux: *1.3* times faster
   - Test details are in release note.
 - Fixed a bug in score calculation that could affect classification results.
+
 ### Update in v1.0.6
 - Windows OS is supported.
 > Metabuli v1.0.6 is too slow on Windows OS. Please use v1.0.7 or later.
-
 ### Update in v1.0.4
 - Fixed a minor reproducibility issue.
 - Fixed a performance-harming bug occurring with sequences containing lowercased bases.
@@ -124,8 +127,8 @@ Downloaded files are stored in `OUTDIR/DB_NAME` directory, which can be provided
 ## Classification
 ```
-metabuli classify [options]
-- INPUT : FASTA or FASTQ file of reads you want to classify.
+metabuli classify [options]
+- INPUT : FASTA/Q file of reads you want to classify. (gzip supported)
 - DBDIR : The directory of reference DB.
 - OUTDIR : The directory where the result files will be generated.
 - Job ID: It will be the prefix of result files.
@@ -200,6 +203,34 @@ It is for an interactive taxonomy report (Krona). You can use any modern web bro
 Metabuli can classify reads against a database of any size as long as the database is fits in the hard disk, regardless of the machine's RAM size.
 We tested it with a MacBook Air (2020, M1, 8 GiB), where we classified about 15 M paired-end 150 bp reads (~5 GiB in size) against a database built with ~23K prokaryotic genomes (~69 GiB in size).
+---
+## Extract
+After running the `classify` command, you can extract reads that are classified under a specific taxon.
+This requires the FASTA/Q files used in the `classify` step and the `JobID_classifications.tsv` file, which is generated as one of the output files.
+
+```
+metabuli extract --tax-id TAX_ID
+
+- FASTA/Q : The FASTA/Q file(s) used during the `classify` step.
+- read-by-read classification : The JobID_classifications.tsv file generated by the `classify` step.
+- DBDIR : The same DBDIR used in the `classify` step.
+- TAX_ID : The taxonomy ID of the taxon at any rank (e.g., species, genus) from which you want to extract the reads.
+
+
+# Paired-end
+metabuli extract read_1.fna read_2.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
+
+# Single-end
+metabuli extract --seq-mode 1 read.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
+
+# Long-read
+metabuli extract --seq-mode 3 read.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
+
+```
+#### Output
+- For paired-end samples: `read_1_TAX_ID.fna` and `read_2_TAX_ID.fna`
+- For single-end or long-read samples: `read_TAX_ID.fna`
+
 ---
 ## Custom database