
A feature-rich desktop platform for bioinformatics data analysis, designed especially for quick sequence annotation and mutation analysis of large-scale viral (or other) genome-sequencing data.


ZhijianZhou01/BioAider


Bioinformatics Aider (BioAider)

Note that versions earlier than 1.423 were not optimized for read speed on large datasets.

New: BioAider v1.627 (2024/07/27) is more powerful and more stable, and we highly recommend it.

1. Introduction

With the development of sequencing technology, a large amount of genomic sequence data has accumulated. Analyzing these data helps us understand genetic variation at the molecular level. However, processing large-scale sequence data is difficult for biological or clinical experts without bioinformatics or programming skills. In addition, needs vary with the research purpose. Software that combines a wide range of functions with simple operation is therefore very valuable.

BioAider is a user-friendly GUI program developed in Python3. As a desktop platform, its design goals are simple operation and well-summarized analysis results, which can save researchers a lot of time.

BioAider


Since its release, BioAider has been used by many researchers in their studies. We will continue to optimize BioAider and add new features.

download_count
BioAider V1.0~V1.527 (prior to January 24, 2024)

2. Download, install and run

BioAider and all updated versions are freely available to non-commercial users. After obtaining the program, users can directly run the executable file in the "main" directory. BioAider runs on Windows, Linux (Ubuntu 16.04 or later) and MacOS.

Github links

Other download links (China)

(1) On Windows or MacOS, run BioAider directly by clicking BioAider.exe (on Windows) or bioaider (on MacOS) in the directory main.

(2) On Linux (Ubuntu 16.04 or later), first switch to the directory main, then:

$ ./bioaider

If you do not have permission to run BioAider on Linux, you can:

$ chmod -R 777 BioAider_v1.423_linux_20220324

3. Preview of BioAider

BioAider GUI

4. Example of functions

Note: BioAider is under long-term development and its functions will continue to be improved. Only a small part of the features is shown here; please refer to the instruction Manual V1.423 and Update record for details.

4.1. Mutation Analysis

This function analyzes the mutation characteristics of large numbers of sequenced strains. The sequences must be aligned in advance, and they can be nucleotide sequences, protein (amino acid) sequences, or coding gene fragments. For nucleotide and protein sequences, BioAider summarizes all mutation sites together with their frequencies and the strains that carry them.
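To make the idea of a per-site summary concrete, here is a minimal Python sketch. It is not BioAider's implementation: the function name `summarize_mutations` is hypothetical, and it assumes that the first sequence in the alignment serves as the reference against which the other strains are compared.

```python
from collections import defaultdict


def summarize_mutations(alignment):
    """Summarize sites that differ from the reference.

    alignment: list of (name, aligned_sequence) tuples.
    Assumption: the first entry is the reference sequence.
    Returns {(position, ref_char, new_char): (count, [strain names])},
    with 1-based alignment positions.
    """
    ref_name, ref_seq = alignment[0]
    ref_seq = ref_seq.upper()
    sites = defaultdict(list)
    for name, seq in alignment[1:]:
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name} is not aligned to {ref_name}")
        for pos, (ref_char, new_char) in enumerate(zip(ref_seq, seq.upper()), start=1):
            if ref_char != new_char:
                sites[(pos, ref_char, new_char)].append(name)
    return {site: (len(names), names) for site, names in sorted(sites.items())}


if __name__ == "__main__":
    demo = [("ref", "ATGC"), ("s1", "ATGA"), ("s2", "TTGA")]
    for site, (count, strains) in summarize_mutations(demo).items():
        print(site, count, strains)
```

The same loop works for amino acid alignments, because it only compares characters column by column.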

If the data are coding sequences, BioAider offers several codon tables to choose from. It scans each codon site in the aligned dataset and identifies the type of mutation: synonymous, non-synonymous, insertion, deletion, or early termination. Finally, BioAider automatically summarizes and outputs the analysis results.
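The codon-level classification described above can be sketched in Python as follows. This is an illustration under stated assumptions, not BioAider's code: it uses only the standard genetic code (NCBI translation table 1), compares a query sequence against a single reference, and the names `classify_codon` and `scan_codons` are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon(ref_codon, query_codon):
    """Classify one aligned codon pair (assumed categories, see lead-in)."""
    ref_codon, query_codon = ref_codon.upper(), query_codon.upper()
    if ref_codon == query_codon:
        return "identical"
    if "-" in ref_codon:
        return "insertion"  # query has bases where the reference has a gap
    if "-" in query_codon:
        return "deletion"   # query lacks bases present in the reference
    if ref_codon not in CODON_TABLE or query_codon not in CODON_TABLE:
        return "unknown"    # ambiguous bases such as N
    ref_aa, query_aa = CODON_TABLE[ref_codon], CODON_TABLE[query_codon]
    if ref_aa == query_aa:
        return "synonymous"
    if query_aa == "*":
        return "early termination"
    return "non-synonymous"


def scan_codons(ref_seq, query_seq):
    """Return [(codon_number, mutation_type)] for every codon that differs."""
    if len(ref_seq) != len(query_seq) or len(ref_seq) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    changes = []
    for start in range(0, len(ref_seq), 3):
        kind = classify_codon(ref_seq[start:start + 3], query_seq[start:start + 3])
        if kind != "identical":
            changes.append((start // 3 + 1, kind))
    return changes


if __name__ == "__main__":
    print(scan_codons("ATGGCTTAC---AAA", "ATGGCCTAAGGG---"))
```

Supporting another codon table would only require swapping the `AMINO_ACIDS` string for the one that belongs to that table.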

Note: Coding sequences for mutation analysis have to be aligned by the translation-alignment method in advance. BioAider packages three multiple-sequence-alignment programs (mafft, muscle and clustal-omega) in its graphical interface and additionally provides translation alignment.

For nucleotides, amino acids and coding genes alike, BioAider can plot the frequency distribution of mutation sites according to user-defined groups of substitution frequency.

Example of mutation analysis for aligned SARS-CoV-2 ORF3a gene sequences. First, drag the sequence file to be analyzed into the input box and select the "Codon" radio button under "Datas type":

Mutation Analysis.png

After the run finishes, the analysis results can be found in the directory where the source file is located. Open the *_mutation site summary file to see the overall variation and the mutation hotspots.

SARS-CoV-2_ORF3a_aligned_summary_file.png

If you also want to plot the distribution of synonymous/non-synonymous substitutions, first prepare a grouping table:

Groups of mutation frequency.png

Each substitution-frequency group consists of a start value and an end value separated by a tab character. Note that the start value of each group is not included in the frequency range.

group_in_mutation.png

You can also see the number of mutation sites in each mutation-frequency group by viewing *_substitution frequency distribution.png:
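The grouping rule can be illustrated with a short Python sketch. It is only a sketch: the function names are hypothetical, and it assumes that the end value of each group is included in the range, which the text above does not state explicitly (only the exclusion of the start value is documented).

```python
def parse_groups(text):
    """Parse a grouping table: one 'start<TAB>end' pair per line."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        start, end = line.split("\t")
        groups.append((float(start), float(end)))
    return groups


def count_sites_per_group(frequencies, groups):
    """Count how many mutation sites fall into each group.

    A frequency f belongs to a group when start < f <= end
    (start excluded as documented; end included by assumption).
    """
    counts = {group: 0 for group in groups}
    for freq in frequencies:
        for start, end in groups:
            if start < freq <= end:
                counts[(start, end)] += 1
                break
    return counts


if __name__ == "__main__":
    table = "0\t1\n1\t10\n10\t100\n"
    print(count_sites_per_group([1, 1, 2, 10, 50], parse_groups(table)))
```

With the start value excluded, adjacent groups such as 0 to 1 and 1 to 10 do not overlap: a site with frequency 1 is counted only in the first group.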

SARS-CoV-2_ORF3a_aligned_substitution frequency distribution.png

Although the ORF3a gene contains many mutation sites, more than half of them appear in only a single strain.

You can also find the strains that carry each variant site in the detailed *_log.txt file:

SARS-CoV-2_ORF3a_aligned log.png

4.2. Lollipop chart of gene mutation

A lollipop map is an efficient way to display gene mutation sites and their frequencies. It looks like the following:

Lollipop map of mutation

In BioAider, you only need to prepare the corresponding matrix file and set a few parameters to complete the drawing quickly.

4.3. Fast Annotation

Different strains of the same virus usually share relatively high nucleotide identity. After multiple-sequence alignment, the sequences can therefore be annotated using the gene information of a reference sequence.

BioAider provides a fast sequence annotation function. Import the aligned complete genome sequence set (fasta format file) and move the reference sequence used for annotation to the top of the file. Paste the gene information of the reference sequence (name, start string and end string, separated by ",") into the text box, then extract the genes in batch. Note that the start string and end string of a gene are not limited in length, but each must be unique in the reference sequence. In addition, the higher the similarity among the sequences, the more accurate the annotation.

Fast_Annotation.png
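The reference-based extraction described above can be sketched in Python as follows. This is an assumption-laden illustration rather than BioAider's actual code: the function names are hypothetical, the start and end strings are searched in the reference after removing its gap characters, and the matching alignment columns are then cut from every sequence.

```python
def find_unique(reference, pattern):
    """Return the index of pattern in reference; it must occur exactly once."""
    pos = reference.find(pattern)
    if pos == -1 or reference.find(pattern, pos + 1) != -1:
        raise ValueError(f"{pattern!r} must occur exactly once in the reference")
    return pos


def extract_gene(alignment, start_string, end_string):
    """Cut one gene out of every aligned sequence.

    alignment: list of (name, aligned_sequence); the first entry is the reference.
    """
    aligned_ref = alignment[0][1].upper()
    # Alignment column of each non-gap reference character.
    columns = [col for col, char in enumerate(aligned_ref) if char != "-"]
    reference = aligned_ref.replace("-", "")
    start = find_unique(reference, start_string.upper())
    end = find_unique(reference, end_string.upper()) + len(end_string) - 1
    if end < start:
        raise ValueError("end string lies before start string")
    first_col, last_col = columns[start], columns[end]
    return [(name, seq[first_col:last_col + 1]) for name, seq in alignment]


if __name__ == "__main__":
    demo = [
        ("ref", "AAATG-CCTAAGG"),
        ("s1", "AAATGACCTAAGG"),
        ("s2", "AAATG-CGTAAGG"),
    ]
    for name, gene in extract_gene(demo, "ATG", "TAA"):
        print(name, gene)
```

Because the cut is made by alignment column, a strain that differs from the reference inside the start or end string is still extracted, which is why higher similarity among sequences gives more reliable gene boundaries.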

4.4. Sequence Identity Matrix

This function has two modes: an identity matrix for nucleotides or amino acids alone (Single nt or aa), and a combined identity matrix for nucleotides and amino acids (Combination nt and aa). If "Combination nt and aa" is selected, the input sequences must be coding genes that were aligned by a codon-based method in advance.

To better reflect the characteristics of the variation, BioAider provides the "Condense gap" option. If it is selected, the program treats every 3 consecutive inserted or deleted bases as one.

Sequence_Identity_Matrix.png
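One possible reading of the "Condense gap" option is sketched below in Python. The exact formula BioAider uses is not given here, so this is an assumption: a run of consecutive gap columns contributes one compared position per 3 gaps (rounded up) instead of one per column, and columns that are gaps in both sequences are ignored. The function name `pairwise_identity` is hypothetical.

```python
import math


def pairwise_identity(seq_a, seq_b, condense_gap=False):
    """Identity between two aligned sequences (illustrative definition)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    def gap_cost(run_length):
        # Assumed rule: every 3 consecutive gap columns count as one position.
        return math.ceil(run_length / 3) if condense_gap else run_length

    matches = compared = gap_run = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue              # column absent from both sequences
        if a == "-" or b == "-":
            gap_run += 1          # extend the current insertion/deletion run
            continue
        compared += gap_cost(gap_run) + 1
        gap_run = 0
        matches += a == b
    compared += gap_cost(gap_run)  # gap run at the end of the alignment
    return matches / compared if compared else 0.0


if __name__ == "__main__":
    print(pairwise_identity("ATGAAACCC", "ATG---CCC"))
    print(pairwise_identity("ATGAAACCC", "ATG---CCC", condense_gap=True))
```

Under this reading, a single codon deletion lowers the identity by one position rather than three, which matches the intent of treating one codon-sized indel as one event.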

4.5. Seqformat Convertor

BioAider provides mutual conversion among several common sequence formats, which are Fasta, Nexus, Paml, and Phylip. Of note, the "Data type" option is only available when the target format is "Nexus".

Seqformat_Convertor.png
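For readers who want to see what such a conversion involves, here is a minimal Python sketch that converts aligned FASTA text to relaxed sequential PHYLIP. It is an independent illustration, not BioAider's converter: the function names are hypothetical, only the first word of each FASTA header is kept as the sequence name, and the relaxed PHYLIP variant (names of any length, padded with spaces) is assumed.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_relaxed_phylip(records):
    """Write records as relaxed sequential PHYLIP text."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("PHYLIP requires aligned sequences of equal length")
    width = max(len(name) for name, _ in records) + 2
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [f"{name:<{width}}{seq}" for name, seq in records]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta = ">s1 example strain\nATG-\nCC\n>s2\nATGACC\n"
    print(to_relaxed_phylip(parse_fasta(fasta)), end="")
```

The header line holds the number of sequences followed by the alignment length, which is why the input must already be aligned.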

5. Plugins supported

BioAider provides an optional plugin function and supports Blast, Mafft, Muscle, Clustal-omega, FastTree, MrBayes, ModelFinder and IQ-Tree. The parameters configured in the GUI are turned into a command string that is sent to the selected program for execution. This feature is designed to make parameter configuration easy; it does not change the original programs in any way. Please read the documentation or publications of these programs for more details.

Tip: for versions earlier than V1.532, if you want to call the four programs Mafft, Muscle, Clustal-omega and FastTree, please manually download the file external_program.zip (https://github.com/ZhijianZhou01/plugins), unzip it, and put the directory external_program in the root directory (not main) of BioAider. For V1.532 and later versions, all plugins are imported manually via the Manage plugins menu.
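The general pattern of turning GUI parameters into a command string for an external program can be sketched in Python like this. It is not BioAider's code: the function names are hypothetical, MAFFT is used as the example with its `--auto` and `--thread` options, and the executable is assumed to be reachable as `mafft` on the PATH.

```python
import shlex
import subprocess


def build_mafft_command(input_fasta, threads=1, executable="mafft"):
    """Build the argument list for a MAFFT run from GUI-style parameters."""
    return [executable, "--auto", "--thread", str(threads), input_fasta]


def run_to_file(command, output_path):
    """Run an external program and save its standard output to a file.

    MAFFT writes the alignment to standard output, so it is redirected here.
    The external program itself is executed unchanged.
    """
    with open(output_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)


if __name__ == "__main__":
    command = build_mafft_command("my sequences.fasta", threads=4)
    # shlex.join shows the command as it could be typed in a shell.
    print(shlex.join(command))
```

Passing the command as a list avoids quoting problems with file names that contain spaces, while `shlex.join` still yields a readable command string for logging.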

6. Test data

Please see Example

7. Bug report

Open a GitHub issue or send an email to [email protected].

8. Citation

Zhi-Jian Zhou, Ye Qiu, Ying Pu, Xun Huang, Xing-Yi Ge*. BioAider: An efficient tool for viral genome analysis and its application in tracing SARS-CoV-2 transmission. Sustainable Cities and Society. 2020. DOI: 10.1016/j.scs.2020.102466.
