drosofff · EliseDugas · Oct 12, 2017 · Oct 12, 2017
diff --git a/README.md b/README.md
@@ -24,23 +24,72 @@ Molla-Herman A, Vallés AM, Ganem-Elbaz C, Antoniewski C, Huynh J-R. tRNA proces
 3. for each file, select 8,000,000 (8 millions) of sequence reads and generate the following sample files:
 
  `ERR990557s.fastq`
+
+`zcat ERR990557.fastq.gz | head -n 32000000 > ERR990557s.fastq`
 
  `ERR990558s.fastq`
+
+`zcat ERR990558.fastq.gz | head -n 32000000 > ERR990558s.fastq`
 
  `ERR990559s.fastq`
+
+`zcat ERR990559.fastq.gz | head -n 32000000 > ERR990559s.fastq`
 
  `ERR990560s.fastq`
 
+`zcat ERR990560.fastq.gz | head -n 32000000 > ERR990560s.fastq`
+(Each command keeps 32,000,000 lines, i.e. 8,000,000 reads at 4 lines per FASTQ record.)
+
 4. Align these read datasets to the reference genome by any appropriate mean, and generate a sorted bam alignment file.
+
+
+- Upload the fastq files to Galaxy (https://mississippi.snv.jussieu.fr)
+- Run the FASTQ groomer
+- Align with TopHat (parameters: single-end; reference genome: D. melanogaster genome release 6)
+
+`samtools sort ERR990557.bam -o ERR990557.sorted.bam`
+`samtools sort ERR990558.bam -o ERR990558.sorted.bam`
+`samtools sort ERR990559.bam -o ERR990559.sorted.bam`
+`samtools sort ERR990560.bam -o ERR990560.sorted.bam`
+
+
 5. Count reads aligning to genome's genes by any appropriate mean
+
+Reads were counted with HTSeq-count, using the annotation file `Drosophila_melanogaster.BDGP6.90.gtf.gz`
+(ftp://ftp.ensembl.org/pub/release-90/gtf/drosophila_melanogaster/)
+
+`samtools view -h -o ERR990557.sam ERR990557.sorted.bam`
+`python -m HTSeq.scripts.count ERR990557.sam Drosophila_melanogaster.BDGP6.90.gtf.gz > ERR990557.count.txt`
+
+`samtools view -h -o ERR990558.sam ERR990558.sorted.bam`
+`python -m HTSeq.scripts.count ERR990558.sam Drosophila_melanogaster.BDGP6.90.gtf.gz > ERR990558.count.txt`
+
+`samtools view -h -o ERR990559.sam ERR990559.sorted.bam`
+`python -m HTSeq.scripts.count ERR990559.sam Drosophila_melanogaster.BDGP6.90.gtf.gz > ERR990559.count.txt`
+
+`samtools view -h -o ERR990560.sam ERR990560.sorted.bam`
+`python -m HTSeq.scripts.count ERR990560.sam Drosophila_melanogaster.BDGP6.90.gtf.gz > ERR990560.count.txt`
+
+
 6. Perform a statistical differential expression analysis and report using any appropriate figure(s)/graph(s)
+
+
+Upload the count files to Galaxy.
+Run DESeq2 with factor name "unknown"
+and 4 factor levels: 57, 58, 59, 60.
+
 7. select a list of genes likely to be differentially expressed with a p-adj value < 0.01
+
 8. Code a simple script that parse the table of differential expressions (from *6.*) and return the genes with a p-adj value < 0.01 for rejection of H0 (non differential expression)
 
+See the script `parse.py` in the `scripts` folder.
+
 ## Reporting
 
 Each analyst will report her/is analysis by any mean s/he feels appropriate (pdf, text, markedown, jpg, URL, etc.).
 
+Reports are in the `analysis.05` folder.
+
 The only constraint is that analysis outputs will be deposited in a personal [fork](https://help.github.com/articles/fork-a-repo/) of this repository in a *new* directory named analysis.01, analysis.02, etc. (see analysis.00 for an example). Keep track of the analysis.xx directories already existing and chose another name for your directory.
 
 Final submission of the results will be made through a [pull request](https://help.github.com/articles/creating-a-pull-request/) from the analyst to the [original repo](https://github.com/drosofff/RNAseq.git).
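
Step 8 of the README asks for a script that parses the differential expression table and returns the genes with a p-adj value < 0.01. The `parse.py` script mentioned in the README is not part of this diff, so the following is only a minimal sketch of one possible approach, not the analyst's script. It assumes a tab-separated table with the gene ID in the first column and the adjusted p-value in the seventh column; the function name `significant_genes`, the `padj_column` parameter, and the example rows are hypothetical.

```python
import csv
import io


def significant_genes(handle, padj_column=6, threshold=0.01):
    """Return (gene_id, padj) pairs whose adjusted p-value is below threshold.

    Assumption: tab-separated input, gene ID in column 0, adjusted p-value
    in column `padj_column` (0-based). Adjust to the real table layout.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) <= padj_column:
            continue  # skip empty or truncated lines
        try:
            padj = float(row[padj_column])
        except ValueError:
            continue  # skip header lines and "NA" entries
        if padj < threshold:
            hits.append((row[0], padj))
    return hits


# Made-up rows for illustration only; these are not results of the analysis.
example = io.StringIO(
    "geneA\t100\t2.0\t0.3\t6.6\t1e-5\t0.001\n"
    "geneB\t50\t0.1\t0.4\t0.25\t0.8\t0.9\n"
    "geneC\t10\t1.5\t0.9\t1.6\tNA\tNA\n"
)
for gene, padj in significant_genes(example):
    print(gene, padj)
```

With a real table, the `io.StringIO` example would be replaced by an opened file, for instance `open("deseq2_results.tsv")` (a hypothetical file name). Rows with an `NA` adjusted p-value are skipped rather than treated as significant.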

diff --git a/analysis.05/Reads_alignement/ERR990557.align_summary.txt b/analysis.05/Reads_alignement/ERR990557.align_summary.txt
@@ -0,0 +1,5 @@
+Reads:
+          Input     :   8000000
+           Mapped   :   6003731 (75.0% of input)
+            of these:    849706 (14.2%) have multiple alignments (44 have >20)
+75.0% overall read mapping rate
diff --git a/analysis.05/Reads_alignement/ERR990558.align_summary.txt b/analysis.05/Reads_alignement/ERR990558.align_summary.txt
@@ -0,0 +1,5 @@
+Reads:
+          Input     :   8000000
+           Mapped   :   6874259 (85.9% of input)
+            of these:    922346 (13.4%) have multiple alignments (69 have >20)
+85.9% overall read mapping rate.
diff --git a/analysis.05/Reads_alignement/ERR990559.align_summary.txt b/analysis.05/Reads_alignement/ERR990559.align_summary.txt
@@ -0,0 +1,6 @@
+Reads:
+          Input     :   8000000
+           Mapped   :   6412811 (80.2% of input)
+            of these:   1493758 (23.3%) have multiple alignments (86 have >20)
+80.2% overall read mapping rate.
+
diff --git a/analysis.05/Reads_alignement/ERR990560.align_summary.txt b/analysis.05/Reads_alignement/ERR990560.align_summary.txt
@@ -0,0 +1,7 @@
+Reads:
+          Input     :   8000000
+           Mapped   :   7030249 (87.9% of input)
+            of these:   1277125 (18.2%) have multiple alignments (74 have >20)
+87.9% overall read mapping rate.
+
+
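
The alignment summaries above report the input and mapped read counts for each sample. As a quick consistency check, the mapping rate can be recomputed from those two numbers. The following is a minimal sketch that assumes the summary layout shown in this diff; the function name `mapping_rate` is hypothetical, and the embedded text is copied from `ERR990557.align_summary.txt`.

```python
import re


def mapping_rate(summary_text):
    """Return (input_reads, mapped_reads, percent_mapped) from a summary text.

    Assumption: the text contains 'Input : N' and 'Mapped : M' lines,
    as in the align_summary.txt files shown in this diff.
    """
    input_reads = int(re.search(r"Input\s*:\s*(\d+)", summary_text).group(1))
    mapped_reads = int(re.search(r"Mapped\s*:\s*(\d+)", summary_text).group(1))
    return input_reads, mapped_reads, 100.0 * mapped_reads / input_reads


summary = """Reads:
          Input     :   8000000
           Mapped   :   6003731 (75.0% of input)
            of these:    849706 (14.2%) have multiple alignments (44 have >20)
75.0% overall read mapping rate
"""
n_input, n_mapped, rate = mapping_rate(summary)
print(f"{n_mapped}/{n_input} reads mapped ({rate:.1f}%)")
```

The input count of 8,000,000 also confirms that each sampled fastq file contains the number of reads requested in step 3.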