\documentclass[11pt]{article}
\usepackage{longtable}
\usepackage{graphicx}
\usepackage{lineno}
%\usepackage{amssymb}
\usepackage{color}
\definecolor{darkgreen}{rgb}{0,0.5,0}
\usepackage{hyperref}
\hypersetup{colorlinks=true, urlcolor=blue, citecolor=darkgreen}
\usepackage{natbib}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{listings}
\usepackage{scrextend}
\renewcommand{\familydefault}{\sfdefault}
\def\changemargin#1#2{\list{}{\rightmargin#2\leftmargin#1}\item[]}
\let\endchangemargin=\endlist
\begin{document}
\begin{flushleft}
{\Large
\textbf{Improving transcriptome assembly through error correction of high-throughput sequence reads}
}
\\
\vspace{4mm}
\noindent
Matthew D. MacManes$^{1}$$^\ast$ and
Michael B. Eisen$^{1,2}$ \\
\vspace{5mm}
\bf{1} \textnormal{UC Berkeley. California Institute of Quantitative Biology, Berkeley, CA, USA} \\
\bf{2} \textnormal{Howard Hughes Medical Institute} \\
\vspace{2mm}
\bf{$\ast$} \textnormal{Corresponding author: \href{mailto:[email protected]}{[email protected]}, Twitter: \href{https://twitter.com/PeroMHC}{$@$PeroMHC}}
\end{flushleft}
\vspace{4mm}
\section*{Response to Reviewers' Comments}
\hspace{5mm} Please find the attached paper, which is a revision of manuscript number 2013:04:462:0:0. The manuscript has undergone substantial changes in response to comments from Titus, Mick, and Gerard (Editor). My reading of these comments suggested several major deficits, all of which have been addressed. We now use paired-end data, compare our results using simulated data to `real' data, and make our data publicly available. In addition to these major changes (which required that we redo correction, assembly, and analysis), we also address the other concerns, including justifying our choice of correction methods and data types.
The paper is much stronger as a result of this review process, and I hope you will find the changes satisfactory. \\
\noindent
Best, Matt MacManes
\begin{enumerate}
\item The use of real versus fake data.
\begin{addmargin}[2em]{2em}
This is an excellent suggestion. In response, I have done a parallel analysis using a random 30M PE read subset of the Trinity mouse dataset. I chose to use 30M reads rather than the full 50M read dataset to maximize comparability, though moderate differences in coverage still exist. \\
\noindent
With regard to analyses, the paper continues to focus mostly on the simulated data, though I now state that the results of the simulation study have been corroborated by analysis of real data. That the findings are robust across datasets should be interpreted as evidence of the general utility of these methods.
\end{addmargin}
\item Use paired end reads.
\begin{addmargin}[2em]{2em}
I have used 30 million PE reads in this version of the paper. The results are unchanged, though the magnitude of the improvement is reduced. I believe this is because the use of PE reads improves assembly quality in general, leaving less room for error correction to help.
\end{addmargin}
\item Deposit the data!
\begin{addmargin}[2em]{2em}
The data are now publicly available. The reads are available on my personal Dropbox (\url{https://www.dropbox.com/s/rkl0ihqom28smb2/empiric.reads.tar.gz} and \url{https://www.dropbox.com/s/mp8fu0tijox69ki/simulated.reads.tar.gz}). They will be moved to Dryad upon acceptance. The assemblies are available at \url{http://dx.doi.org/10.6084/m9.figshare.725715}. The code has been moved to a GitHub Gist (\url{https://gist.github.com/macmanes}).
\end{addmargin}
\item Justify the use of error correction techniques.
\begin{addmargin}[2em]{2em}
The specific error correction techniques were chosen in an attempt to cover the breadth of computational techniques, as well as to include tools that are commonly used and recently benchmarked. I make this explicit in the manuscript. Of note, this version removes the program ECHO from the analysis. It would have taken at least 4 weeks to run on the new PE data and the empirically derived dataset, which was not feasible given the time constraints on resubmission.
\end{addmargin}
\item Use more than just Illumina data.
\begin{addmargin}[2em]{2em}
While I agree that expanding the paper to include different types of sequence data (454 and IonTorrent) would be interesting, it is beyond the scope of this current paper, which focuses on the current (and likely future, given the number of deployed machines) most popular sequencing technology. In addition, because each technology has a unique error model, interpreting the results would not be easy, particularly given that simulated data are not accessible.
\end{addmargin}
\item What's going on with splice variants?
\begin{addmargin}[2em]{2em}
Good question. As Titus notes in his review, the analysis of splice variants is tricky. I am quite interested in developing these methods, perhaps as part of a larger project (in connection with my work with Trinity). I think that attempting to dig into splicing here would be too much. I do note splicing in the discussion (line 230 -- ) as a potential issue with error correction.
\end{addmargin}
\item Variable coverage and miscorrections?
\begin{addmargin}[2em]{2em}
This is a really interesting point, and one I would have expected the developers of the various error correctors to have addressed. They have not. I do see signs that reads from lowly expressed transcripts are corrected less efficiently (I suspect this is why Seecer did so poorly in my trials). Also, when looking at the number of corrections per read, reads that are miscorrected tended to have more corrections than did appropriately corrected reads (line 232 -- ). About 5\% of reads are miscorrected. I would expect that number to be reduced with higher coverage.
\end{addmargin}
\item Use other assemblers.
\begin{addmargin}[2em]{2em}
This is a useful suggestion, and one which I would have loved to pursue. However, the extremely large amount of work required to benchmark many different assemblers is not feasible given the constraints on time for revisions. I would expect the general patterns to hold, though the magnitude of the difference may vary, as some assemblers may be better or worse than Trinity at `bubble popping'.
\end{addmargin}
\item Do I recommend these methods for genomic data?
\begin{addmargin}[2em]{2em}
Yes, all (with the exception of Seecer) have been developed for genomic reads, and should be even more efficient with those data, given their more even coverage.
\end{addmargin}
\item Add a pipeline?
\begin{addmargin}[2em]{2em}
I have included a set of commands that should facilitate correction using Reptile: \url{https://gist.github.com/macmanes/5878728}.
\end{addmargin}
\item Why do errors remain?
\begin{addmargin}[2em]{2em}
Good question. It seems logical to assume that increasing coverage would solve this, but maybe not. We do see a suggestion that splice isoforms really mess up error correction (and we know their effects on assembly all too well). Also, it may have something to do with low-complexity sequence causing spurious matching.
\end{addmargin}
\item Other issues.
\begin{addmargin}[2em]{2em}
\begin{itemize}
\item I have formalized punctuation.
\item Changed text to \textit{de Bruijn}.
\item Added citations to highlight the effect of errors (line 22).
\item LaTeX tag has been removed.
\end{itemize}
\end{addmargin}
\end{enumerate}
\end{document}