BactAG
Introduction
The ancient orthologous genomes of bacteria were inferred semi-manually with a two-step Backbone-Patching approach. In the Backbone step, the two most anciently diverged clades were identified from a phylogenomic tree whose strains cover the major branches of the genus, species or subspecies under study, and one representative strain was selected at random from each clade. Orthologous fragments between the two representative strains were analyzed with Mauve version 2.4.0 and an iterative Maximum Homologous Block (MHB) algorithm, and then combined to generate the backbone of the ancient orthologous genome. Two patching sub-steps followed. First, the genomes of other representative strains from the two clades were aligned against each other, and the retrieved orthologous fragments were compared with the backbone; sub-fragments not covered by the backbone were patched in manually, and the backbone ancient orthologous genome was updated iteratively. Second, the genomes of closely related outgroup strains, or the genome of the nearest ancestor, were aligned against the representative strain of each clade, and the resulting orthologous fragments were extracted to further patch the ancient orthologous genome.
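The first patching sub-step depends on finding the parts of an orthologous fragment that the backbone does not yet cover. The Go sketch below illustrates only that coverage check, under assumptions that are not part of BactAG: fragments and backbone blocks are treated as half-open intervals on one shared coordinate system, and Interval, mergeIntervals and uncovered are hypothetical names. It does not reproduce the Mauve alignment or the MHB algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// Interval is a hypothetical half-open range [Start, End) of positions
// on a single shared coordinate system.
type Interval struct {
	Start, End int
}

// mergeIntervals sorts a copy of the intervals by start position and merges
// overlapping or adjacent ones.
func mergeIntervals(in []Interval) []Interval {
	if len(in) == 0 {
		return nil
	}
	s := append([]Interval(nil), in...)
	sort.Slice(s, func(i, j int) bool { return s[i].Start < s[j].Start })
	out := []Interval{s[0]}
	for _, iv := range s[1:] {
		last := &out[len(out)-1]
		if iv.Start <= last.End {
			// Overlapping or touching: extend the current merged block.
			if iv.End > last.End {
				last.End = iv.End
			}
		} else {
			out = append(out, iv)
		}
	}
	return out
}

// uncovered returns the sub-intervals of frag that no backbone block covers,
// i.e. the candidate sub-fragments to patch into the backbone.
func uncovered(frag Interval, backbone []Interval) []Interval {
	var gaps []Interval
	pos := frag.Start
	for _, b := range mergeIntervals(backbone) {
		if b.End <= pos {
			continue // block lies entirely before the uncovered remainder
		}
		if b.Start >= frag.End {
			break // block lies entirely after the fragment
		}
		if b.Start > pos {
			gaps = append(gaps, Interval{pos, b.Start})
		}
		pos = b.End
	}
	if pos < frag.End {
		gaps = append(gaps, Interval{pos, frag.End})
	}
	return gaps
}

func main() {
	backbone := []Interval{{0, 100}, {150, 300}, {90, 120}}
	frag := Interval{50, 400}
	fmt.Println(uncovered(frag, backbone)) // prints [{120 150} {300 400}]
}
```

Here the backbone blocks [0,100), [90,120) and [150,300) merge into [0,120) and [150,300), so the fragment [50,400) leaves the gaps [120,150) and [300,400) as the parts that would need patching.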
BactAG is developed in the Go programming language. The current version can only be compiled and run on Mac or Linux systems.
User Manual and Examples
Installation
Operating system: Linux or Mac
Software requirements: a Golang compilation environment (required only when compiling from the source code; not required when using the precompiled programs directly), and the standalone version of Mauve, version 2.4.0 or later (download and installation guide: https://darlinglab.org/mauve/user-guide/installing.html).
Compilation: The precompiled executables do not need to be installed and can be run directly on Mac systems. To compile from source, download and decompress the source package (BactAG.zip), install and configure the Golang compilation environment, and then compile the source files one by one ([module] stands for each source file in codes), saving the executables to the bin subfolder of the BactAG directory. The final command copies AG_Inference.py and the bin folder into the test folder, from which the example below is run. Commands for compilation:
$ cd bin
$ go build ../codes/[module].go
$ cp ../codes/*.pl ./
$ cd ..
$ cp -r AG_Inference.py bin test
Manual and examples
Download the genome sequence of each bacterial strain to be analyzed and save each genome in a single file named xxx.fasta, where xxx normally represents the strain identifier. After removing the example files, put the genome sequence files in the test subfolder of the BactAG directory. Designate one strain as the reference (for example, RKS2986), and then run the program from the terminal.
As an example, the strains whose genome sequences are stored in the test/ subfolder are analyzed below with RKS2986 as the reference strain and nC_AG as the ancient neighbor node. The backbone step is run first, followed by the patching step; the -p option gives the path to the progressiveMauve executable and should be adjusted to the local Mauve installation:
$ cd BactAG/test
$ python AG_Inference.py backbone -n nC_AG.fasta -bo ./result/backboneOutput -b ./bin -p /root/softawre/miniconda3/bin/progressiveMauve -s RKS2986.fasta -f ./
$ python AG_Inference.py patching -n nC_AG.fasta -bo ./result/backboneOutput -o ./result/AGoutput -b ./bin -p /root/softawre/miniconda3/bin/progressiveMauve -s RKS2986.fasta -f ./
The progress of the program is displayed in the terminal. Once the run has finished, a new subdirectory named result (the ./result path given in the commands above) is generated, containing two subfolders, backboneOutput and AGoutput. The final ancient genome is saved in the file AG_ORTH.fasta in the AGoutput subfolder.
Codes and Executable Files
The source code and executable files are stored in the codes and bin subdirectories, respectively. The precompiled executables are built for Mac; on Linux, the source code must be recompiled. BactAG is accessible here.