Nanocorr
Error correction for Oxford Nanopore reads
Requires:
BLAST must be on your PATH
SGE or a similar scheduler
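A quick way to confirm both requirements before installing is to check that the relevant executables resolve on your PATH. The sketch below assumes the BLAST+ binary name blastn and the SGE submit command qsub; neither name is stated in this README, and check_tools is a hypothetical helper, so substitute whatever your installation provides.

```shell
#!/bin/sh
# check_tools NAME...: report whether each named tool resolves on PATH.
# Returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool names (not confirmed by this README); edit as needed.
check_tools blastn qsub || echo "install or load the missing tools before continuing"
```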
Installation:
Clone the repository to a shared filesystem on a cluster
>git clone https://github.com/jgurtowski/nanocorr
>cd nanocorr
Create a virtual environment to install python dependencies
>virtualenv nanocorr_ve
>source nanocorr_ve/bin/activate
Install the following packages using pip:
>pip install git+https://github.com/cython/cython
>pip install numpy
>pip install h5py
>pip install git+https://github.com/jgurtowski/pbcore_python
>pip install git+https://github.com/jgurtowski/pbdagcon_python
>pip install git+https://github.com/jgurtowski/jbio
>pip install git+https://github.com/jgurtowski/jptools
Finally, install the nanocorr package itself
>python setup.py install
Running:
Make sure you are in the virtualenv
>source nanocorr/nanocorr_ve/bin/activate
Partition your reads for distributed processing
>python partition.py 100 500 nanopore_reads.fa
Partitioning creates a series of directories (0001, 0002, ...).
In each directory, run the nanocorr.py script under SGE or a
similar scheduler that sets the SGE_TASK_ID environment variable.
Set the upper bound of the -t range to the number of files in the
directory.
>qsub -cwd -v PATH,LD_LIBRARY_PATH -t 1:500 -j y -o nanocorr_out /path/to/nanocorr.py query.fa reference.fa
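To avoid hard-coding the upper bound of -t, the task count can be derived from the partition directory itself. The sketch below is not part of nanocorr: count_inputs and print_qsub are hypothetical helper names, and it assumes that before any job has run the directory contains only the partitioned read files. It prints the qsub command rather than submitting it, so you can inspect it first.

```shell
#!/bin/sh
# count_inputs DIR: number of regular files directly inside DIR.
# Assumption: before any job has run, the directory holds only the
# partitioned read files, so this equals the number of array tasks.
count_inputs() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# print_qsub DIR: print (do not run) the submission command for DIR.
# /path/to/nanocorr.py, query.fa and reference.fa are the placeholders
# used in the command above.
print_qsub() {
    n=$(count_inputs "$1")
    echo "qsub -cwd -v PATH,LD_LIBRARY_PATH -t 1:$n -j y -o nanocorr_out /path/to/nanocorr.py query.fa reference.fa"
}

# Example: show the command for partition directory 0001, if it exists.
if [ -d 0001 ]; then print_qsub 0001; fi
```

Run this before submitting anything, since output files written into the directory later would inflate the count.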
The query file will be "blasted" against each previously partitioned read.
The query file can be any sequence data useful for correction;
currently, Illumina data is used.
The corrected reads will be in the resulting "fa" files in the partition
directories.
If you supply a reference genome, the corrected reads will also be blasted
against it, and a ".refblast6.q" file containing the corrected reads
aligned to the reference will be created for each partition. Make sure
the BLAST database for the reference has been created beforehand.
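One way to guard against a missing database is to check for its index files before submitting jobs. This sketch assumes NCBI BLAST+ and a nucleotide database built with makeblastdb, which writes files such as reference.fa.nhr, reference.fa.nin and reference.fa.nsq next to the input. Whether nanocorr expects the database under the same name as the FASTA file is an assumption, and have_blast_db is a hypothetical helper name.

```shell
#!/bin/sh
# have_blast_db REF: succeed if nucleotide BLAST index files exist for REF.
# Assumption: BLAST+ layout, i.e. REF.nhr, REF.nin and REF.nsq.
have_blast_db() {
    [ -f "$1.nhr" ] && [ -f "$1.nin" ] && [ -f "$1.nsq" ]
}

ref=reference.fa   # placeholder name from the qsub command above
if have_blast_db "$ref"; then
    echo "BLAST database found for $ref"
else
    # Printed, not executed; -in and -dbtype are standard makeblastdb options.
    echo "no database found; try: makeblastdb -in $ref -dbtype nucl"
fi
```

Very large references may be split into numbered volumes, which this simple check does not cover.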
Non-SGE Environment:
If you don't have SGE installed, you can use GNU parallel to run nanocorr on
a single machine. This is not the recommended method, because alignment
can be very compute intensive, but it can be tractable for small genomes
(e.g. bacteria).
For each of the directories created by the partition script (0001..000N),
cd into the directory and run:
>for j in {1..500}; do
   echo "SGE_TASK_ID=$j TMPDIR=/tmp nanocorr.py query.fa reference.fa";
 done | parallel -j <number of compute cores>
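The per-directory loop can be extended to cover every partition directory in one pass. The following is a sketch under stated assumptions, not a tested recipe: emit_jobs is a hypothetical helper, the paths in NANOCORR, QUERY and REF are placeholders that should be absolute because each job changes into its partition directory first, and every directory is assumed to hold the same number of files.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with absolute paths on your machine.
NANOCORR=/path/to/nanocorr.py
QUERY=/path/to/query.fa
REF=/path/to/reference.fa

# emit_jobs NTASKS DIR...: print one command line per (directory, task).
# Each line enters the directory and sets SGE_TASK_ID, mimicking what
# the scheduler would do for an array job.
emit_jobs() {
    ntasks=$1
    shift
    for dir in "$@"; do
        j=1
        while [ "$j" -le "$ntasks" ]; do
            echo "cd $dir && SGE_TASK_ID=$j TMPDIR=/tmp $NANOCORR $QUERY $REF"
            j=$((j + 1))
        done
    done
}

# Example: preview the jobs for two directories with two tasks each.
emit_jobs 2 0001 0002
```

Once the preview looks right, pipe the full list into GNU parallel, for example emit_jobs 500 0001 0002 | parallel -j 4, where 4 stands in for the number of compute cores.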