Update README.md for extract module

steineggerlab · Sep 21, 2024 · 8ddb1c0
1 parent 5d950ad
commit 8ddb1c0
Showing 1 changed file with 34 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -33,18 +33,21 @@ Please cite: [Kim J, Steinegger M. Metabuli: sensitive and specific metagenomic
 
 
 ---
+### Update in v1.0.8
+- Added the `extract` module, which extracts reads classified under a given taxon.
+
 ### Update in v1.0.7
 - **Metabuli became faster 🚀**
   - Windows: *8.3* times faster
   - MacOS: *1.7* times faster
   - Linux: *1.3* times faster
   - Test details are in the release notes.
 - Fixed a bug in score calculation that could affect classification results.
+
 ### Update in v1.0.6
 - Windows OS is supported.
 > Metabuli v1.0.6 is too slow on Windows OS. Please use v1.0.7 or later.
 
-
 ### Update in v1.0.4
 - Fixed a minor reproducibility issue.
 - Fixed a performance-harming bug occurring with sequences containing lowercased bases.
@@ -124,8 +127,8 @@ Downloaded files are stored in `OUTDIR/DB_NAME` directory, which can be provided
 
 ## Classification
 ```
-metabuli classify <i:FASTA> <i:DBDIR> <o:OUTDIR> <Job ID> [options]
-- INPUT : FASTA or FASTQ file of reads you want to classify. 
+metabuli classify <i:FASTA/Q> <i:DBDIR> <o:OUTDIR> <Job ID> [options]
+- INPUT : FASTA/Q file of reads you want to classify. (gzip supported)
 - DBDIR : The directory of reference DB. 
 - OUTDIR : The directory where the result files will be generated.
 - Job ID: It will be the prefix of result files.  
@@ -200,6 +203,34 @@ It is for an interactive taxonomy report (Krona). You can use any modern web bro
 Metabuli can classify reads against a database of any size as long as the database fits on the hard disk, regardless of the machine's RAM size.
 We tested it with a MacBook Air (2020, M1, 8 GiB), where we classified about 15 M paired-end 150 bp reads (~5 GiB in size) against a database built with ~23K prokaryotic genomes (~69 GiB in size).
 
+---
+## Extract
+After running the `classify` command, you can extract the reads classified under a specific taxon.
+This requires the FASTA/Q files used in the `classify` step and the `JobID_classifications.tsv` file, which that step generates as one of its output files.
+
+```
+metabuli extract <i:FASTA/Q> <i:read-by-read classification> <i:DBDIR> --tax-id TAX_ID
+
+- FASTA/Q : The FASTA/Q file(s) used during the `classify` step.
+- read-by-read classification : The JobID_classifications.tsv file generated by the `classify` step.
+- DBDIR : The same DBDIR used in the `classify` step.
+- TAX_ID : The taxonomy ID of the taxon, at any rank (e.g., species, genus), whose reads you want to extract.
+
+
+# Paired-end
+metabuli extract read_1.fna read_2.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
+
+# Single-end
+metabuli extract --seq-mode 1 read.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
+
+# Long-read 
+metabuli extract --seq-mode 3 read.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
+
+```
+#### Output
+- For paired-end samples: `read_1_TAX_ID.fna` and `read_2_TAX_ID.fna`
+- For single-end or long-read samples: `read_TAX_ID.fna`
+
 ---
 
 ## Custom database
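
Note on the `extract` output names added in this diff: a script that calls `metabuli extract` may need to locate the resulting files afterwards. The sketch below predicts the output name from the input name, assuming the pattern shown in the new "Output" list (`read_1.fna` becomes `read_1_TAX_ID.fna`). The function name `expected_extract_output` and the tax ID `562` are hypothetical, and the handling of paths, gzipped inputs, or names with several extensions is an assumption the README does not confirm.

```shell
# Hypothetical helper (not part of Metabuli): predict the file name that
# `metabuli extract` is expected to write, assuming the README pattern
# <input name without extension>_<TAX_ID>.<extension>.
expected_extract_output() {
  input=$1
  tax_id=$2
  base=${input%.*}   # strip the last extension, e.g. read_1.fna -> read_1
  ext=${input##*.}   # keep the last extension, e.g. fna
  printf '%s_%s.%s\n' "$base" "$tax_id" "$ext"
}

# 562 is only a placeholder tax ID for illustration.
expected_extract_output read_1.fna 562
expected_extract_output read_2.fna 562
```

The helper only manipulates strings and never runs Metabuli, so it is worth checking its prediction against a real `extract` run before relying on it. It assumes the input name has exactly one extension.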