Update README.md for extract module
jaebeom-kim authored Sep 21, 2024
1 parent 5d950ad commit 8ddb1c0
Showing 1 changed file with 34 additions and 3 deletions.
37 changes: 34 additions & 3 deletions README.md
@@ -33,18 +33,21 @@ Please cite: [Kim J, Steinegger M. Metabuli: sensitive and specific metagenomic

---
### Update in v1.0.8
- Added `extract` module to extract reads classified into a certain taxon.

### Update in v1.0.7
- **Metabuli became faster 🚀**
- Windows: *8.3* times faster
- MacOS: *1.7* times faster
- Linux: *1.3* times faster
- Test details are in the release notes.
- Fixed a bug in score calculation that could affect classification results.

### Update in v1.0.6
- Windows OS is supported.
> Metabuli v1.0.6 is too slow on Windows OS. Please use v1.0.7 or later.

### Update in v1.0.4
- Fixed a minor reproducibility issue.
- Fixed a performance-harming bug occurring with sequences containing lowercased bases.
@@ -124,8 +127,8 @@ Downloaded files are stored in `OUTDIR/DB_NAME` directory, which can be provided

## Classification
```
metabuli classify <i:FASTA> <i:DBDIR> <o:OUTDIR> <Job ID> [options]
- INPUT : FASTA or FASTQ file of reads you want to classify.
metabuli classify <i:FASTA/Q> <i:DBDIR> <o:OUTDIR> <Job ID> [options]
- INPUT : FASTA/Q file of reads you want to classify. (gzip supported)
- DBDIR : The directory of reference DB.
- OUTDIR : The directory where the result files will be generated.
- Job ID: It will be the prefix of result files.
@@ -200,6 +203,34 @@ It is for an interactive taxonomy report (Krona). You can use any modern web bro
Metabuli can classify reads against a database of any size, regardless of the machine's RAM size, as long as the database fits on the hard disk.
We tested it with a MacBook Air (2020, M1, 8 GiB), where we classified about 15 M paired-end 150 bp reads (~5 GiB in size) against a database built with ~23K prokaryotic genomes (~69 GiB in size).

---
## Extract
After running the `classify` command, you can extract reads that are classified under a specific taxon.
This requires the FASTA/Q files used in the `classify` step and the `JobID_classifications.tsv` file, which is generated as one of the output files.

```
metabuli extract <i:FASTA/Q> <i:read-by-read classification> <i:DBDIR> --tax-id TAX_ID
- FASTA/Q : The FASTA/Q file(s) used during the `classify` step.
- read-by-read classification : The JobID_classifications.tsv file generated by the `classify` step.
- DBDIR : The same DBDIR used in the `classify` step.
- TAX_ID : The taxonomy ID of the taxon at any rank (e.g., species, genus) from which you want to extract the reads.
# Paired-end
metabuli extract read_1.fna read_2.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
# Single-end
metabuli extract --seq-mode 1 read.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
# Long-read
metabuli extract --seq-mode 3 read.fna JobID_classifications.tsv dbdir --tax-id TAX_ID
```
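
The three invocations above differ only in the `--seq-mode` flag and the number of read files. As a rough illustration, the dry-run sketch below prints the matching command instead of running it. `build_extract_cmd` is a hypothetical helper, not part of Metabuli, and `562` is only an example taxonomy ID.

```shell
# Dry-run sketch: print (do not execute) a `metabuli extract` command.
# `build_extract_cmd` is a hypothetical helper, not part of Metabuli.
# Assumption: file names contain no spaces (the command is only printed).
build_extract_cmd() {
    mode=$1; tax_id=$2; tsv=$3; dbdir=$4
    shift 4                              # remaining arguments are the read files
    case $mode in
        paired) flag="" ;;               # two read files, no --seq-mode flag
        single) flag="--seq-mode 1 " ;;  # single-end reads
        long)   flag="--seq-mode 3 " ;;  # long reads
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "metabuli extract ${flag}$* $tsv $dbdir --tax-id $tax_id"
}

# Example: paired-end reads, example taxonomy ID 562
build_extract_cmd paired 562 JobID_classifications.tsv dbdir read_1.fna read_2.fna
```

Printing the command first makes it easy to check the argument order (reads, classification file, DBDIR) before starting a real run.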
#### Output
- For paired-end samples: `read_1_TAX_ID.fna` and `read_2_TAX_ID.fna`
- For single-end or long-read samples: `read_TAX_ID.fna`
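
The naming pattern above can be sketched as a small shell function, which may help when scripting downstream steps. `expected_extract_name` is a hypothetical helper. It assumes the input has a single extension such as `.fna`, and it returns only the file name. The output directory, and the naming for gzip-compressed inputs, are not stated here and are not modeled.

```shell
# Hypothetical helper: predict the output file name described above
# (<input stem>_<TAX_ID>.<extension>) for one input read file.
# Assumption: the input has exactly one extension (e.g. read_1.fna).
expected_extract_name() {
    input=$1
    tax_id=$2
    base=${input##*/}    # drop any leading directory
    stem=${base%.*}      # file name without its last extension
    ext=${base##*.}      # the last extension
    printf '%s_%s.%s\n' "$stem" "$tax_id" "$ext"
}

# Example with taxonomy ID 562 as a placeholder
expected_extract_name read_1.fna 562
```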

---

## Custom database