Transcript quantification results are inconsistent #225
Dear @biochristmas, OUT.transcript_grouped_tpm.tsv contains all the reference transcripts, even those with 0 counts. There are no novel transcripts in this file. The two files are produced by different algorithms, so some inconsistency is expected, although I agree it is quite noticeable in your example. Best
@andrewprzh, I am using version 3.4.1 of IsoQuant. I have a new question. I ran this command to extend the transcript annotation on the reference genome: isoquant.py --reference ./reference.fa --genedb ./reference.gtf --fastq ./1_clustered.fasta ./2_clustered.fasta ./3_clustered.fasta --data_type pacbio_ccs --model_construction_strategy all --fl_data --threads 60 -o isoquant_result
Note: the FASTA sequences passed to the --fastq parameter are the result of clustering full-length transcript sequences with Isoseq3, and clustered.fasta contains far fewer sequences than flnc.fasta. With this command, the resulting file OUT.extended_annotation.gtf contains a total of 107,290 transcripts. However, when I pass flnc.fasta (the full-length transcript sequences before clustering) to the --fastq parameter instead, the resulting OUT.extended_annotation.gtf contains 185,626 transcripts.
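For anyone who wants to reproduce transcript counts like the ones quoted above, here is a minimal sketch that counts the records in a GTF file whose feature column (the third tab-separated field) is "transcript". The function names are hypothetical and not part of IsoQuant; the sketch only assumes the standard nine-column GTF layout.

```python
def count_transcripts(lines):
    """Count GTF records whose feature column (3rd field) is 'transcript'."""
    count = 0
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] == "transcript":
            count += 1
    return count


def count_transcripts_in_file(path):
    """Convenience wrapper (hypothetical helper) that reads a GTF from disk."""
    with open(path) as handle:
        return count_transcripts(handle)
```

Calling count_transcripts_in_file on the OUT.extended_annotation.gtf from each run would give the two numbers to compare, provided each transcript is represented by exactly one "transcript" record.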
Yes, the number of reported transcripts can be affected by the number of sequences (something that we also call "read support"). So such a result is expected. If you use Best
Hi, I ran this command to perform IsoQuant quantification, with the aim of obtaining transcript quantification results for all samples: isoquant.py --reference genome.fa --genedb ./merge.combined.gtf --fastq CK.derRNA.fastq T1.derRNA.fastq T2.derRNA.fastq --data_type nanopore --model_construction_strategy all --threads 60 -o isoquant_quant
The GTF annotation file contains 320,000 transcripts. IsoQuant generated two quantification result files: OUT.transcript_model_grouped_tpm.tsv and OUT.transcript_grouped_tpm.tsv. OUT.transcript_model_grouped_tpm.tsv has 260,000 transcripts (rows), while OUT.transcript_grouped_tpm.tsv has 320,000.
First question: can I interpret the OUT.transcript_grouped_tpm.tsv file as containing the total transcript quantification results?
Second question: I noticed that the TPM values for the same transcript are inconsistent between OUT.transcript_model_grouped_tpm.tsv and OUT.transcript_grouped_tpm.tsv.
Third question: if all TPM values for a transcript are zero in OUT.transcript_grouped_tpm.tsv, that transcript does not appear in OUT.transcript_model_grouped_tpm.tsv. I therefore believe that OUT.transcript_model_grouped_tpm.tsv contains quantification results only for expressed transcripts, while OUT.transcript_grouped_tpm.tsv contains results for all transcripts, including those whose TPM is zero in every sample. Is that correct?
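The observations in this comment can be checked with a short script. The sketch below is only an illustration under stated assumptions: both TPM files are taken to be tab-separated, with a header line, the transcript ID in the first column, and the same sample columns in the same order. The function names load_tpm and compare_tpm are hypothetical and not part of IsoQuant.

```python
import csv


def load_tpm(handle):
    """Read a TPM table: first column is the transcript ID, the rest are per-sample TPMs.

    Assumes a single header line and tab-separated values (an assumption
    about the file layout, not a documented guarantee).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    table = {}
    for row in reader:
        if not row:
            continue
        table[row[0]] = [float(value) for value in row[1:]]
    return header[1:], table


def compare_tpm(reference_table, model_table, tolerance=1e-6):
    """Summarise how two TPM tables differ, keyed by transcript ID."""
    reference_ids = set(reference_table)
    model_ids = set(model_table)
    only_in_reference = reference_ids - model_ids
    # Transcripts whose TPM is zero in every sample of the reference table.
    all_zero = {t for t, values in reference_table.items()
                if all(v == 0 for v in values)}
    # Shared transcripts whose TPM differs in at least one sample.
    differing = {
        t for t in reference_ids & model_ids
        if any(abs(a - b) > tolerance
               for a, b in zip(reference_table[t], model_table[t]))
    }
    return {
        "only_in_reference": only_in_reference,
        "all_zero_in_reference": all_zero,
        "missing_but_expressed": only_in_reference - all_zero,
        "differing": differing,
    }
```

If the third question's hypothesis holds, missing_but_expressed should be empty when the tables loaded from OUT.transcript_grouped_tpm.tsv and OUT.transcript_model_grouped_tpm.tsv are passed in that order; the differing set lists the transcripts behind the second question.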