2.2.3: July, 2012
- Stabilized output by removing pointer sorting.
- Fixed a colour space bug in which FW/BW changes a base call made by
SW, and the change is not propagated to the output.
- As before, a colour space "." is treated as a "0" with qv=0. Now, no
crossover is reported if "0" fits the called donor read.
- Added option to change the default crossover probability (which is
.03). This is used to derive certain internal parameters. During
mapping, input qvs (when available) override this value.
2.2.2: December 12, 2011
- Changed default 'Match Window Overlap Length' to 90%
- Increased the default buffer size of mergesam
- Fixed a read-in error that may occur with mergesam
2.2.1: October 31, 2011
- Added binary search to contig lookup; this speeds up SHRiMP when used
with many contigs in a single instance
- Fixed a bug with old SHRiMP output format score
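As an illustration of the contig lookup change in 2.2.1, here is a minimal
sketch in Python (not SHRiMP's actual C code). It assumes contigs are
concatenated into one coordinate space; the names find_contig and
contig_starts, and that layout, are hypothetical.

```python
from bisect import bisect_right

def find_contig(contig_starts, pos):
    """Return the index of the contig containing global position `pos`.

    `contig_starts` is a sorted list holding the first global offset of
    each contig (a hypothetical layout assumed for this sketch).
    """
    if not contig_starts or pos < contig_starts[0]:
        raise ValueError("position precedes the first contig")
    # bisect_right counts the starts that are <= pos; the last such start
    # belongs to the contig we want, hence the -1.
    return bisect_right(contig_starts, pos) - 1

starts = [0, 1000, 2500, 7000]
print(find_contig(starts, 999))    # 0
print(find_contig(starts, 1000))   # 1
print(find_contig(starts, 6999))   # 2
```

Each lookup costs O(log n) in the number of contigs, compared with O(n) for
a linear scan, which matters when a single instance holds many contigs.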
2.2.0: Aug, 2011
- Added mapping quality values and base quality values (see README.)
- Vastly improved mergesam program that replaces all previous merging
scripts (see SPLITTING_AND_MERGING for reference.)
- Various algorithmic improvements behind the scenes.
- Changed default scoring scheme. Scores optimized for mapping hg data.
- Changed default thresholds.
- Changed default to SAM format output. For the old shrimp format, see
--shrimp-format.
- Changed default to half-paired mode: paired reads will first be mapped
as pairs, then also as singletons. To select only the best mapping,
see --single-best-mapping. NOTE: Half-paired mode will slow down
gmapper compared to its previous versions, but we felt it is desirable
to have it. If you need only paired mappings, see --no-half-paired.
- Changed the default set of seeds to 3 seeds of weight 12. (Thank you,
Marta Gardea.)
- Changed default Smith-Waterman mode to global. This forces the entire
read to map to the reference. For the old (local) alignment mode, see
--local. Notably, mapping qualities are unavailable in this mode.
- Changed default allowable insert size interval to 0,1000. (It used to
be 50,2000.)
- Fixed some bugs involving pairing mode and insert sizes.
- Removed limit on the number of OMP threads. (It used to be 50.)
- NOTE: Insert sizes in gmapper follow the old SAM specification: 5' to
5'. The latest SAM specification defines it as leftmost to rightmost
mapped base in a template. These are equivalent for "opp-in" data, but
not for the other pairing modes.
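To illustrate the NOTE on insert sizes above, here is a minimal sketch in
Python comparing the two definitions. Each mate is a hypothetical tuple
(leftmost, rightmost, strand) in 1-based inclusive coordinates. Both measures
are counted as inclusive spans so they can be compared directly; the exact
off-by-one convention used by gmapper is not stated in this file and is an
assumption here.

```python
def five_prime(mate):
    """Position of the read's 5' end on the reference."""
    left, right, strand = mate
    # A forward-strand read starts at its leftmost base; a reverse-strand
    # read starts at its rightmost base.
    return left if strand == "+" else right

def insert_size_5p_to_5p(m1, m2):
    """Old-style insert size: distance between the two 5' ends."""
    return abs(five_prime(m1) - five_prime(m2)) + 1

def insert_size_outer(m1, m2):
    """Newer definition: leftmost to rightmost mapped base of the template."""
    return max(m1[1], m2[1]) - min(m1[0], m2[0]) + 1

# Inward-facing pair ("opp-in"): forward mate on the left.
opp_in = ((100, 149, "+"), (300, 349, "-"))
# Outward-facing pair: reverse mate on the left.
outward = ((100, 149, "-"), (300, 349, "+"))

print(insert_size_5p_to_5p(*opp_in), insert_size_outer(*opp_in))    # 250 250
print(insert_size_5p_to_5p(*outward), insert_size_outer(*outward))  # 152 250
```

For the inward-facing pair the 5' ends are also the outermost bases, so the
two definitions agree; for the outward-facing pair they do not.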
2.1.1c: Feb 21, 2011
- Fixed col-bw issue, where mate pair insert size was not computed correctly
2.1.1b: Feb 17, 2011
- Fixed a getopt issue with ICC-compiled binaries that prevented options from being passed to shrimp
or mergesam
2.1.1: Feb 8, 2011
- Read trimming implemented, on the fly
- Mergesam has been re-written from scratch to support faster operations
- New heap structure put into place for filtering final alignments - fixed several bugs
- Improved half-paired runtime when running in n=3 mode
- Fixed an insert size inconsistency: internal insert size calculations did not follow the SAM specification,
causing bounds to be off
- Moved percentage score comparisons from integer to double, improving accuracy
- Added expected-isize parameter to tie-break among high scoring pairs
2.1.0: Dec 28, 2010
- Added new mirna mode
- Output of gmapper now in same order as reads in multi-thread mode
- Added mergesam program to replace merging scripts
- Added MAPQ annotation option to mergesam program
2.0.4: Nov 19, 2010
- Fixed a bug in both cs and ls global alignments that in some cases caused
clipping to appear in output alignments
- Added support for spaces in FASTQ quality files; a space is translated to a '!'
- Fixed merging script to support half-paired mode and be slightly more stable
- TODO: A more stable, and faster merging script for next SHRiMP release
- Fixed an error that caused only 30 alignments to be output, no matter what the '-o' flag
was set to
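The FASTQ quality change in 2.0.4 can be pictured with a minimal sketch in
Python (not the actual SHRiMP code). It assumes Phred+33 encoding, in which
'!' (ASCII 33) is the lowest quality value, 0; the function names are
hypothetical.

```python
def normalize_quality(qual):
    """Replace spaces in a FASTQ quality string with '!'."""
    return qual.replace(" ", "!")

def phred33(qual):
    """Decode a quality string to integer scores, assuming Phred+33."""
    return [ord(c) - 33 for c in normalize_quality(qual)]

print(normalize_quality("II I"))  # II!I
print(phred33("I !"))             # [40, 0, 0]
```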
2.0.3: Oct 28, 2010
- Fixed a global alignment issue resulting in clipped alignments.
- Added flag --half-paired, to map each pair independently if
mapping a pair fails.
- Added flag --sam-r2 to report SAM r2 field for alignments.
- Added --sam-header flag to replace output SAM header with file
- Added --read-group flag to output read-group in SAM for alignments
- Fixed an issue with blank quality values for some reads
- Implemented a global letter space alignment
- Fixed a typo in split-db.py
- Fixed an error in length 18 default seeds, last seed was incorrect
2.0.2: Sep 20, 2010
- Long formats for many option and parameter names are now available.
- New options available, including ability to dump unmapped reads in SAM output.
- New sets of seeds of weight 16 and 18 are available.
- New default score thresholds that increase sensitivity for read lengths
less than 70bp.
- Added support for longer read mappings
- Added additional SAM output fields:
H0, H1, H2, NM, NH, IH, CQ, CS, CM, and XX. XX (a SAM user-defined field) gives
crossover locations
- Added --strata option to only report from top scoring alignments
- Added --global option to perform color space global read alignment
- Added --bfast option that performs global color space alignments
along with reporting color space base qualities
- Added options to dump aligned and/or unaligned reads to separate files
- Added option to report unaligned reads in SAM output file
- Added FASTQ support
- Changed reverse-tie-break to be enabled by default
- Added option for paired reads to be specified as two files
- Fixed SAM output flags field; bits were previously set incorrectly
- Fixed RNAME in SAM output not to include whitespace (RNAME cannot
include whitespace as part of SAM specifications)
- Fixed negation bug in shrimp filtering code (~ vs !)
2.0.1: May 18, 2010
- More friendly towards long reads (454). gmapper now supports them,
but is not optimized for them.
- Fixed insert size calculations in SAM output.
2.0.0: May 9, 2010
- New distribution, with drastic design changes.
- Native support for SAM format.
1.3.2: January 28, 2010
- Added option -ungapped, which replaces SW vector filter by a gapless
alignment filter. This filter still catches mismatches (e.g., SNPs)
and colourspace crossovers.
- Fixed bug which would cause spaced seeds to be applied in reverse
when not using -H.
- Fixed bug which would cause undesirable effects (including failed
assertions when omitting -DNDEBUG) when colour space crossover score
would be set very low (e.g., to prevent rmapper from calling any
crossovers).
- Added -M mirna mode, which loads some parameters suitable for
searching mirna databases. See README. (Thanks Alessandro Guffanti)
1.3.1: November 23, 2009
- Fixed bug which caused allocation of 8 bytes per int even on systems
where sizeof(int)=4. This should reduce the virtual memory usage, but
not the resident size (on a regular kernel).
- Fixed generation of kmers at the beginning of a contig. This should
increase sensitivity when using small database sequences (e.g.,
miRNA).
1.3: September 28, 2009
- Proper support for multiple seeds. We now have one readmap per seed,
and we keep track of which seed every kmer originates from.
- Seeds can now have different spans and weights.
- Parameter -s now accepts: a single seed, or a comma-separated list of
seeds, or "w<weight>" for loading default seeds of weight <weight>.
- New meaning for "window length" (-w) parameter. Now, this value
controls the size of the genome window against which a read is
matched. Default is 135% of the read length. To trigger an SW filter
call, num_matches (-n) kmers must match between the genome and a read
within such a window.
- Implemented colinearity check. To trigger an SW filter call, the kmers
that match the genome and a read must be colinear. Kmers that match a
given read several times are treated as wildcards with respect to this
check.
- Implemented an option to restrict full SW run through the use of
"anchors" or "necks" around the kmers which triggered the SW call. The
width of these anchors defaults to 8, and it can be controlled by the
new parameter -A. Anchors are disabled by specifying -1 as width.
- Improved prefetching hints.
- Added a hash-and-cache option during genome scan, see README for
details. This helps dealing with reads that match many times. It is
enabled by default, and can be disabled by giving the -Z flag. The
minimum and maximum sizes of the (per-read) cache can be set by the
parameter -Y <min>,<max>. The cache grows from <min> in factors of 2
until it becomes greater than or equal to <max>. The defaults are
min=2 and max=32.
- Added option -M to display brief memory usage statistics. This is
enabled by default.
- Changed default parameters to 4 seeds of weight 12 and n=4 matches per
window. This should be significantly faster than the previous
defaults, which used 1 seed of weight 8 and n=2 matches per window.
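The cache growth rule given for -Y in 1.3 can be sketched in Python as
follows. This follows the wording above literally: start at <min> and double
until the size is greater than or equal to <max>. Whether the final size is
capped at <max> is not stated in this file, so the sketch does not cap it.
The function name is hypothetical.

```python
def cache_sizes(min_size=2, max_size=32):
    """List the successive per-read cache sizes, doubling from min_size."""
    if min_size < 1 or max_size < min_size:
        raise ValueError("need 1 <= min_size <= max_size")
    sizes = [min_size]
    # Keep doubling until the current size reaches or passes max_size.
    while sizes[-1] < max_size:
        sizes.append(sizes[-1] * 2)
    return sizes

print(cache_sizes())        # [2, 4, 8, 16, 32]
print(cache_sizes(3, 20))   # [3, 6, 12, 24]
```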
1.2.2: ?, 2009
- Fix unknown base ('N') handling in shrimp_var (Stéphane Audic).
- Added a SHRiMP to SAM output converter (Nils Homer).
- Added a Probcalc to WIG output converter (Elizabeth Chun).
1.2.1: June 30, 2009
- Fix a few Sun Studio Compiler portability bugs.
- rmapper-ls: make it clear when S-W Threshold is absolute vs.
fractional.
- Make the vector sw algorithm bail upon set up if the match value
times the maximum read length exceeds 32767, since this is the
highest score we can attain with pre-SSE4 16-bit instructions.
Correspondingly, divide the rmapper S-W defaults by 10 to make
longer reads work.
- Add a -T flag to rmapper, which when set causes tie-breaks to
be broken in the opposite order during full alignment. This
should help alignment gaps line up when negative alignments are
reverse-complemented and lined up against positive ones.
- Add a -U flag to rmapper, which outputs a list of unmapped reads
after all of the alignments. See the README for more information.
- Add experimental support for multiple spaced seeds (enabled by
passing multiple '-s' arguments to rmapper).
- Add the 'shrimp_var' utility, which prints detailed variations
detected for specific hits.
- Handle uracil better for rna alignments, especially in colourspace.
- Fix a bug that was pruning kmers with wobble bases or
uracil unnecessarily.
- Misc. bug fixes.
- Misc. documentation updates.
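The 16-bit score limit described in 1.2.1 comes down to a simple bound: the
best possible score, match value times maximum read length, must not exceed
32767. A minimal sketch in Python follows; the function names are
hypothetical, and the match values 100 and 10 are illustrative, not
confirmed defaults.

```python
INT16_MAX = 32767  # largest value of a signed 16-bit integer

def fits_16bit_sw(match_value, max_read_len):
    """True if a perfect-match score cannot overflow a 16-bit lane."""
    return match_value * max_read_len <= INT16_MAX

def longest_safe_read(match_value):
    """Longest read length whose perfect-match score still fits."""
    return INT16_MAX // match_value

print(fits_16bit_sw(100, 328))   # False
print(longest_safe_read(100))    # 327
print(longest_safe_read(10))     # 3276
```

Dividing the scores by 10 raises the longest safe read length roughly
tenfold, which is consistent with the stated goal of making longer reads
work.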
1.2.0: February 27, 2009
- Reduce memory usage substantially by dynamically allocating the
hit heap, rather than preallocating it all up front in rmapper.
- Various probcalc fixes.
- Added 'probcalc_mp' utility for matepair analysis.
- Fixes to compile under Sun Studio 12 for Solaris x86 and amd64.
- Treat '-' in fasta file sequences (often found in individual,
aligned haplome sequences) as spaces, i.e. ignore them.
- Added experimental support for larger seeds, up to a span of 128 with
any weight up to 128. Should work fine, but performance
characteristics haven't been evaluated.
- Misc. documentation updates.
1.1.0: July 19, 2008
- Yet another new output format with edit strings to concisely
describe the read->reference changes.
- Fixed a nasty strict aliasing bug.
- rmapper scores are now sorted (again) and duplicates are removed.
- Added rates calculation input to probcalc so that rates may be
computed in parallel, joined, and then probabilities computed
in parallel as well.
- Fixed a number of problems in probcalc's equations and small number
handling; output is now in scientific notation.
- prettyprint-* will now only use memory in proportion to the input set,
rather than pulling all reads into memory.
- All programs and utilities now accept both regular and gzip'd input
files.
- Fixed missed cycle handling in SOLiD reads.
- Fixed a bug in printing fringe genome bases on the negative strand.
- Fixed a bug that improperly reverse-complemented the middle 8 bases
in a contig.
- Various speed improvements.
- Initial DAG aligner implementation for Helicos Single-Molecule
Sequencing.
- Converted miscellaneous tools to python and moved them to utils/.
- Misc. bug fixes.
1.0.5: March 25, 2008
- Increased spaced seed weight by 2 for letter space, as 454 and
Illumina/Solexa reads are generally longer than AB.
- Changed default gap open penalty from -300 to -400.
- Changed default crossover penalty from -120 to -140 (for ABI SOLiD).
- Disabled kmer pruning by default. This feature sometimes prunes too
many kmers and confuses users. It still offers an important speed-up,
but the user should be aware of (and responsible for) its use.
- Fixed handling of wobble codes on the negative strand.
- Handle `.' ambiguous bases as `N'.
- Fixed whitespace and MS-DOS newline fasta parsing problems affecting
Illumina/Solexa and 454 users.
- Fixed wobble code handling for negative strands.
- Added read length relative parameters to rmapper. Now -h, -v, and -w
can take either absolute or percentage arguments (differentiated by
a '.' or '%' in the string). This allows more flexible score
thresholding and windowing for input with differing read lengths
(454, especially).
- Fixed a few GCC 4 compiler warnings.
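The absolute-versus-percentage rule for -h, -v, and -w in 1.0.5 can be
sketched in Python as below. The rule that a '.' or '%' marks a percentage is
taken from the entry above; the function names, and the idea that a
percentage is resolved against some base value (for example, the read
length), are assumptions made for illustration.

```python
def parse_param(arg):
    """Return (value, is_percentage) for a -h/-v/-w style argument."""
    if "%" in arg or "." in arg:
        # A '.' or '%' anywhere in the string marks a percentage.
        return float(arg.rstrip("%")), True
    return int(arg), False

def resolve(arg, base):
    """Turn the argument into an absolute value, given a base quantity."""
    value, is_percentage = parse_param(arg)
    return value * base / 100 if is_percentage else value

print(parse_param("1000"))   # (1000, False)
print(parse_param("68.0"))   # (68.0, True)
print(resolve("135%", 40))   # 54.0
```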
1.0.4: January 24, 2008
- Changed rmapper default parameters to be more appropriate for the
human genome.
- Applied fixes to the full colourspace Smith-Waterman algorithm. In
some cases we were not tracing back into the proper matrix. Initial
crossovers are now charged.
- Added Phil Lacroute's mergehits utility.
- Taught rmapper and prettyprint to handle multiple contigs per genome
file.
- Fixed a minor bug that would miss hits within the first window of a
contig.
- Fixed a bug in probcalc that would recalculate rates on non-regular
files.
- Added surrounding bases to pretty-printed genome output.
- Taught probcalc and prettyprint to handle symlinks.
- Taught probcalc to accept multiple results arguments as both files and
directories of files.
- Taught SHRiMP about wobble codes (by considering them as mismatches).
- Added -S flag to probcalc and made double pass mode the default (now
probcalc uses much less memory at a higher runtime cost).
- Added -C and -F options to rmapper to specify the strand.
- Misc. portability and compilation fixes.
- Misc. documentation updates.
1.0.3: Fall 2007
- Internal development version. S-W fixes and probcalc double pass.
1.0.2: November 6, 2007
- Fix invalid assertion in common/output.c:output_pretty.
- probcalc progress bar disabled by default (enabled by -B flag).
- Misc. documentation updates.
1.0.1: November 2, 2007
- First public release. New output format.
- '-b', '-p' flags changed to '-B', '-P'. (Parameters lowercase,
options uppercase).
- Multiple contig support and internal reverse-complementing of contigs.
1.0.0: Summer 2007
- Internal development version. Old output format.