Skip to content
Pierre Lindenbaum edited this page Jul 24, 2015 · 17 revisions

##Motivation

Bioinformatics file javascript-based reformatter ( java rhino engine http://en.wikipedia.org/wiki/Rhino_%28JavaScript_engine%29 ). Something like awk for VCF, BAM, SAM, FASTQ, FASTA etc...

##Compilation

See also Compilation.

$  make bioalcidae

##Synopsis

$ java -jar dist/bioalcidae.jar [options] (stdin|file1 file2 ... fileN|file.list) 

##Options

Option Description
-f (file) javascript file
-e (expression) javascript expression
-o (file) output file. Default: stdout
-F (format) [VCF, SAM, BAM, FASTA, FASTQ] optional. Required when reading stdin
-h get help (this screen) and exit.
-v print version and exit.
-L (level) log level. One of java.util.logging.Level . Optional.

##Variables

the program injects the following variables:

###VCF

for VCF , the program injects the following variables:

###Fasta

  • iter a java.util.Iterator
	public class Fasta 
		{
		public String getSequence();
		public String getName();
		public void print();
		public int getSize();
		public char charAt(int i);
		}

###BAM or SAM

###FASTQ

##Source Code

Main code is: https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/bioalcidae/BioAlcidae.java

##Example

VCF

Reformating a VCF we want to reformat a VCF with header

CHROM POS REF ALT GENOTYPE_SAMPLE1 GENOTYPE_SAMPLE2 ... GENOTYPE_SAMPLEN

we use the following javascript file:

var samples = header.sampleNamesInOrder;
out.print("CHROM\tPOS\tREF\tALT");
for(var i=0;i< samples.size();++i)
	{
	out.print("\t"+samples.get(i));
	}
out.println();

while(iter.hasNext())
	{
	var ctx = iter.next();
	if(ctx.alternateAlleles.size()!=1) continue;
	out.print(ctx.chr +"\t"+ctx.start+"\t"+ctx.reference.displayString+"\t"+ctx.alternateAlleles.get(0).displayString);
	for(var i=0;i< samples.size();++i)
		{
		var g = ctx.getGenotype(samples.get(i));

		out.print("\t");

		if(g.isHomRef())
			{
			out.print("0");
			}
		else if(g.isHomVar())
			{
			out.print("2");
			}
		else if(g.isHet())
			{
			out.print("1");
			}
		else
			{
			out.print("-9");
			}
		}
	out.println();
	}
$ curl -s  "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" | \
gunzip -c | java -jar ./dist/bioalcidae.jar -f jeter.js -F vcf | head -n 5 | cut -f 1-10

CHROM	POS	REF	ALT	HG00096	HG00097	HG00099	HG00100	HG00101	HG00102
22	16050075	A	G	0	0	0	0	0	0
22	16050115	G	A	0	0	0	0	0	0
22	16050213	C	T	0	0	0	0	0	0
22	16050319	C	T	0	0	0	0	0	0

for 1000 genome data, print CHROM/POS/REF/ALT/AF(europe):

$ curl  "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.vcf.gz" |  gunzip -c |\
java -jar dist/bioalcidae.jar  -F VCF -e 'while(iter.hasNext()) {var ctx=iter.next(); if(!ctx.hasAttribute("EUR_AF") || ctx.alternateAlleles.size()!=1) continue; out.println(ctx.chr+"\t"+ctx.start+"\t"+ctx.reference.displayString+"\t"+ctx.alternateAlleles.get(0).displayString+"\t"+ctx.getAttribute("EUR_AF"));}' 

1	10177	A	AC	0.4056
1	10235	T	TA	0
1	10352	T	TA	0.4264
1	10505	A	T	0
1	10506	C	G	0
1	10511	G	A	0
1	10539	C	A	0.001
1	10542	C	T	0
1	10579	C	A	0
1	10616	CCGCCGTTGCAAAGGCGCGCCG	C	0.994
(...)

Contribute

##See also

##History

  • 2015 : Creation

License

The project is licensed under the MIT license.

Clone this wiki locally