BactCG

Introduction

BactCG is designed to analyze the core genome of a group of bacterial strains. The conventional approach repeatedly performs pairwise alignments between the genes (or proteins) of every pair of strains, identifies mutual best alignment pairs, and derives the core gene set from them. Because every pair of strains has to be compared, the computation grows on the order of n² for n strains. BactCG instead takes one representative strain as the reference and performs mutual alignments only between the genes (or proteins) of each other strain and those of the reference, which reduces the computation to the order of n.
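
The reference-based strategy can be sketched in Go as follows. This is a minimal illustration, not BactCG's source: it assumes the alignment hits (for example, parsed from BLAST bit scores) are already available as `Hit` values, and it treats a reference gene as core when it has a mutual best hit in every other strain. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is one scored alignment between a query gene and a subject gene.
// In practice these would come from BLAST output; here they are given inline.
type Hit struct {
	Query, Subject string
	Score          float64
}

// bestHits returns, for each query gene, the hit with the highest score.
func bestHits(hits []Hit) map[string]Hit {
	best := make(map[string]Hit)
	for _, h := range hits {
		if b, ok := best[h.Query]; !ok || h.Score > b.Score {
			best[h.Query] = h
		}
	}
	return best
}

// mutualBest returns reference-gene -> other-gene pairs that are each other's best hit.
func mutualBest(refVsOther, otherVsRef []Hit) map[string]string {
	fwd, rev := bestHits(refVsOther), bestHits(otherVsRef)
	pairs := make(map[string]string)
	for refGene, h := range fwd {
		if back, ok := rev[h.Subject]; ok && back.Subject == refGene {
			pairs[refGene] = h.Subject
		}
	}
	return pairs
}

// coreGenes keeps reference genes with a mutual best hit in every other strain.
// Only n-1 reference-vs-strain comparisons are needed for n strains.
func coreGenes(refGenes []string, perStrain []map[string]string) []string {
	var core []string
	for _, g := range refGenes {
		inAll := true
		for _, pairs := range perStrain {
			if _, ok := pairs[g]; !ok {
				inAll = false
				break
			}
		}
		if inAll {
			core = append(core, g)
		}
	}
	sort.Strings(core)
	return core
}

func main() {
	ref := []string{"r1", "r2", "r3"}
	// Strain A: r1<->a1 and r2<->a2 are mutual best hits; r3 has no hit.
	a := mutualBest(
		[]Hit{{"r1", "a1", 90}, {"r2", "a2", 80}},
		[]Hit{{"a1", "r1", 90}, {"a2", "r2", 80}},
	)
	// Strain B: r1<->b1 is mutual; b2's best hit is r1, so r2 is not mutual.
	b := mutualBest(
		[]Hit{{"r1", "b1", 95}, {"r2", "b2", 70}},
		[]Hit{{"b1", "r1", 95}, {"b2", "r1", 75}, {"b2", "r2", 70}},
	)
	fmt.Println(coreGenes(ref, []map[string]string{a, b})) // [r1]
}
```

The reference acts as a hub: each strain is compared with it once, so adding a strain adds one comparison rather than one per existing strain.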

BactCG is written in the Go programming language. The current version can only be compiled and run on macOS or Linux. The homology cutoff is user-configurable through two parameters, minimal length coverage and minimal sequence identity, whose defaults are 0.9 and 0.8, respectively.
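
A sketch of how the two cutoffs might be applied to a single alignment. The definitions are assumptions, since the text does not spell them out: coverage is taken as the aligned length divided by the longer of the two proteins, and identity as identical positions divided by the aligned length. The `Alignment` type and `passesCutoff` function are hypothetical.

```go
package main

import "fmt"

// Alignment holds the values needed to apply the homology cutoff.
// Field names are illustrative, not BactCG's internal types.
type Alignment struct {
	QueryLen, SubjectLen int // full lengths of the two proteins
	AlignLen             int // length of the aligned region
	Identical            int // identical positions within the alignment
}

// passesCutoff applies both thresholds. Coverage is assumed to be the aligned
// length divided by the longer protein; identity is identical/aligned positions.
func passesCutoff(a Alignment, minCoverage, minIdentity float64) bool {
	longer := a.QueryLen
	if a.SubjectLen > longer {
		longer = a.SubjectLen
	}
	if a.AlignLen == 0 || longer == 0 {
		return false
	}
	coverage := float64(a.AlignLen) / float64(longer)
	identity := float64(a.Identical) / float64(a.AlignLen)
	return coverage >= minCoverage && identity >= minIdentity
}

func main() {
	a := Alignment{QueryLen: 300, SubjectLen: 310, AlignLen: 290, Identical: 250}
	// coverage = 290/310 ≈ 0.935, identity = 250/290 ≈ 0.862
	fmt.Println(passesCutoff(a, 0.9, 0.8)) // true with the defaults (0.9 coverage, 0.8 identity)
	fmt.Println(passesCutoff(a, 0.9, 0.9)) // false: identity is below 0.9
}
```

Dividing by the longer protein makes the coverage test symmetric, so a short fragment matching part of a long protein is rejected regardless of which one is the query.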

User Manual and Examples

Installation
Operating system: Linux or Mac
Software requirements: a Go (Golang) compilation environment (required only when compiling from source, not when using the precompiled programs), and the standalone version of NCBI BLAST (>= version 2.3.30), available at https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/
Compilation: The precompiled executables do not need to be installed and can be run directly on macOS. To build from source, download and decompress the source package (BactCG.tar.gz), install and configure the Go compilation environment, then compile each source file in turn and save the resulting executables to the bin subfolder of the BactCG directory. Commands for compilation:
```
cd bin
go build ../codes/[module].go   # repeat for each module in ../codes
cp ./CG ../                     # the run command expects CG in the BactCG root
```
Installation and Usage Manual
Download the genome-derived proteome sequences of each bacterial strain to be analyzed and save each proteome as a single FASTA file named xxx.faa, where xxx is normally the strain identifier. Remove the example files from the test subfolder of the BactCG directory and put your proteome files there. Designate one strain as the reference, then run the program from the terminal with a single command.
For example, using the strains whose proteome sequences are provided in the test subfolder, with LT2 as the reference strain:
```
cd BactCG
# arguments: input folder, reference strain, then the two homology cutoffs
./CG test LT2 0.8 0.9
```
Progress is reported in the terminal while the program runs. When it finishes, a new subdirectory named result, containing seven subfolders, is created in the BactCG directory. The final core gene set is saved in the file CG.tab.txt in the cg_result subfolder.

Codes and Executable Files

The source code and executable files are stored in the codes and bin subdirectories, respectively. The executables are built for macOS; on Linux, the source code must be recompiled. BactCG is accessible here.

BactAG

Introduction

The ancient orthologous genomes of bacteria were inferred semi-manually with a two-step Backbone-Patching approach. In the Backbone step, the two most anciently diverged clades were identified from a phylogenomic tree built with strains covering the major branches of the genus, species or subspecies under study, and one representative strain was selected at random from each clade. Orthologous fragments of the two representatives were identified with Mauve version 2.4.0 and an iterative Maximum Homologous Block (MHB) algorithm, and combined to generate the backbone of the ancient orthologous genome.

Two patching sub-steps followed. First, the genomes of other representative strains were aligned across the two clades, and the retrieved orthologous fragments were compared with the backbone; sub-fragments not covered by the backbone were patched in manually, and the backbone was updated iteratively. Second, the genomes of closely related outgroup strains, or the genome of the nearest ancestor, were aligned against the representative strain of each clade, and the resulting orthologous fragments were extracted to further patch the ancient orthologous genome.
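
The patching sub-step has to decide which parts of an orthologous fragment are still missing from the backbone. A minimal Go sketch of that coverage check, assuming fragment and backbone coordinates have already been mapped onto one shared coordinate frame (the text does not say how this is done, and the actual patching was manual). `Interval` and `uncovered` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Interval is a half-open range [Start, End) on a shared coordinate frame.
type Interval struct{ Start, End int }

// uncovered returns the parts of fragment not overlapped by any backbone interval,
// i.e. the candidate sub-fragments to patch into the backbone.
func uncovered(fragment Interval, backbone []Interval) []Interval {
	bb := append([]Interval(nil), backbone...) // copy before sorting
	sort.Slice(bb, func(i, j int) bool { return bb[i].Start < bb[j].Start })
	var gaps []Interval
	pos := fragment.Start // first position not yet accounted for
	for _, b := range bb {
		if b.End <= pos || b.Start >= fragment.End {
			continue // no overlap with the remaining part of the fragment
		}
		if b.Start > pos {
			gaps = append(gaps, Interval{pos, b.Start})
		}
		if b.End > pos {
			pos = b.End
		}
		if pos >= fragment.End {
			break
		}
	}
	if pos < fragment.End {
		gaps = append(gaps, Interval{pos, fragment.End})
	}
	return gaps
}

func main() {
	fragment := Interval{100, 1000}
	backbone := []Interval{{400, 600}, {0, 200}, {900, 950}}
	fmt.Println(uncovered(fragment, backbone)) // [{200 400} {600 900} {950 1000}]
}
```

Repeating this check after each patch corresponds to the iterative update of the backbone described above.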

BactAG is written in the Go programming language. The current version can only be compiled and run on macOS or Linux.

User Manual and Examples

Installation
Operating system: Linux or Mac
Software requirements: a Go (Golang) compilation environment (required only when compiling from source, not when using the precompiled programs), and the standalone version of Mauve (>= version 2.4.0), available at https://darlinglab.org/mauve/user-guide/installing.html
Compilation: The precompiled executables do not need to be installed and can be run directly on macOS. To build from source, download and decompress the source package (BactAG.zip), install and configure the Go compilation environment, then compile each source file in turn and save the resulting executables to the bin subfolder of the BactAG directory. Commands for compilation:
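```
cd bin
go build ../codes/[module].go   # repeat for each module in ../codes
cp ../codes/*.pl ./
# return to the BactAG root, then copy the driver script and bin into test
cd ..
cp -r AG_Inference.py bin test
```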
Manual and examples
Download the genome sequences of each bacterial strain to be analyzed and save each genome as a single file named xxx.fasta, where xxx is normally the strain identifier. Remove the example files from the test subfolder of the BactAG directory and put your genome files there. Designate one strain as the reference (for example, RKS2986), then run the backbone and patching steps from the terminal.
For example, using the strains whose genome sequences are provided in the test/ subfolder, with RKS2986 as the reference strain and nC_AG as the ancient neighbor node:
```
cd BactAG/test
# -s: reference strain genome; -n: ancient neighbor node genome
# -p: path to progressiveMauve (adjust to your own installation)
python AG_Inference.py backbone -n nC_AG.fasta -bo ./result/backboneOutput -b ./bin -p /root/softawre/miniconda3/bin/progressiveMauve -s RKS2986.fasta -f ./
python AG_Inference.py patching -n nC_AG.fasta -bo ./result/backboneOutput -o ./result/AGoutput -b ./bin -p /root/softawre/miniconda3/bin/progressiveMauve -s RKS2986.fasta -f ./
```
Progress is reported in the terminal while the program runs. When it finishes, a new subdirectory named result, containing two subfolders, is created in the working directory (BactAG/test in this example). The final ancient genome is saved in the file AG_ORTH.fasta in the AGoutput subfolder.

Codes and Executable Files

BactPG

Introduction

BactPG is designed to analyze the pan-genome of a group of bacterial strains. In the conventional approach, pairwise alignments are repeatedly performed between the genes (or proteins) of every pair of strains, and mutual best alignment pairs are identified to build the pan-gene set. BactPG instead analyzes each combination of the strains: within each combination, it takes one representative strain as the reference and performs mutual alignments between its genes (or proteins) and those of the other strains. The gene sets from all combinations are then merged to form the pan-genome.

BactPG is written in the Go programming language. The current version can be compiled and run on macOS, Linux or Windows. The homology cutoff is user-configurable through two parameters, minimal length coverage and minimal sequence identity, whose defaults are 70% and 0.7, respectively (written as 70 and 0.7 on the command line).

Installation and Usage Manual

Installation
Operating system: Mac, Linux or Windows
Software requirements: a Go (Golang) compilation environment (required only when compiling from source, not when using the precompiled program), and the standalone version of NCBI BLAST (>= version 2.3.30), available at https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/
Compilation: The precompiled executable does not need to be installed and can be run directly. To build from source, download and decompress the source package (BactPG.zip), install and configure the Go compilation environment, then compile the source file (BactPG.go) to get the executable. Commands for compilation:
```
cd BactPG
go build ../codes/BactPG.go
```
Manual and examples
Download the genome-derived proteome sequences of each bacterial strain to be analyzed and save each proteome as a single FASTA file named xxx.fasta, where xxx is normally the strain identifier. Put the proteome files in the example_seq subfolder of the BactPG directory.
For example, using the strains whose proteome sequences are provided in the example_seq subfolder:
```
cd BactPG
# arguments: input folder, makeblastdb path, blastp path, then the two homology cutoffs
./BactPG ./example_seq [absolute path of makeblastdb] [absolute path of blastp] 70 0.7
```
Progress is reported in the terminal while the program runs. When it finishes, the final pan-gene set is saved in the file PG.txt.

Codes and Executable Files

The source code and executable files are stored in the codes and BactPG subdirectories, respectively. The executables are built for Windows; on Linux or macOS, the source code must be recompiled. BactPG is accessible here.

BactPGA

Introduction

BactPGA facilitates the automatic annotation of individual ancient or extant genomes based on pan-genome annotation results. Once the target genome has been sequenced and assembled, its protein-coding genes can be annotated with RASTtk or PGAG; BactPGA then classifies these genes into pan-genome families. BactPGA can also be used to annotate the results of 1DGR or other comparative genomic analyses.
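
The classification step can be pictured as a lookup: each target gene inherits the family of its mutual-best-hit partner among the pan-genome strains. A minimal sketch under that assumption; the two maps stand in for the BactCG mutual-best-hit output and the pan-gene table, whose real formats are not described here, and all gene and family IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// assignFamilies maps each target gene to a pan-genome family.
// mbh:   target gene -> its mutual-best-hit gene among the pan-genome strains.
// famOf: pan-genome gene -> family ID (as would be read from the pan-gene table).
// Genes without a hit, or whose hit is in no family, are reported as "novel".
func assignFamilies(targetGenes []string, mbh, famOf map[string]string) map[string]string {
	out := make(map[string]string, len(targetGenes))
	for _, g := range targetGenes {
		fam := "novel"
		if hit, ok := mbh[g]; ok {
			if f, ok := famOf[hit]; ok {
				fam = f
			}
		}
		out[g] = fam
	}
	return out
}

func main() {
	famOf := map[string]string{"LT2_0001": "PG0001", "LT2_0002": "PG0002"}
	mbh := map[string]string{"nA_01": "LT2_0001", "nA_02": "LT2_0002"}
	res := assignFamilies([]string{"nA_01", "nA_02", "nA_03"}, mbh, famOf)

	keys := make([]string, 0, len(res))
	for k := range res {
		keys = append(keys, k)
	}
	sort.Strings(keys) // print in a stable order
	for _, k := range keys {
		fmt.Println(k, res[k])
	}
	// nA_01 PG0001
	// nA_02 PG0002
	// nA_03 novel
}
```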

Installation and Usage Manual

Installation
Operating system: Linux/Mac
Software requirements: a Go (Golang) compilation environment (required only when compiling from source, not when using the precompiled programs); BactCG, which can be downloaded from http://61.160.194.165:3080/ESG/tools/BactCG/ ; and the standalone version of NCBI BLAST (>= version 2.3.30), available at https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/
Compilation: The precompiled executables do not need to be installed and can be run directly on macOS. To build from source, download and decompress the source package (BactPGA.tar.gz), install and configure the Go compilation environment, then compile each source file in turn and save the resulting executables to the bin subfolder of the BactPGA directory. Commands for compilation:
```
cd bin
go build ../codes/[module].go
```
To run BactPGA correctly, prepare the following files in advance and transfer them into the results subfolder: (1) the GenBank (.gbk) file of the target genome; (2) the tab-separated pan-gene set file; (3) a seq subfolder containing the proteome FASTA files (.faa) of all pan-genome strains and of the target strain.

Manual and examples

From the test subfolder, retrieve the genome annotation file of the target strain (nA_AG.gbk), the pan-genome annotation data (26_PG.txt), and the seq subfolder containing the proteomes of the pan-genome strains and of the target strain (nA_AG.faa), and transfer them into the results subfolder. Then run the following commands:


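```
cd results
# clear output from any previous run (this deletes everything in results/)
rm -rf ./*
rm -rf ../BactCG1.0/result/ ../BactCG1.0/seq/
# BactCG is expected in ../BactCG1.0; seq is a directory, so copy it recursively
cp -R ../test/seq ../BactCG1.0/
cp ../test/26_PG.txt ./
cp ../test/nA_AG.gbk ./
# core-genome comparison with the target strain nA_AG as the reference
../BactCG1.0/CG ../BactCG1.0/seq nA_AG 0.7 0.7
cp ./26_PG.txt ../BactCG1.0/result/out_mutbest_filt/
cp ../bin/PGA ../BactCG1.0/result/out_mutbest_filt/
# parse the GenBank file into a tab-separated table
../bin/gbParse ./nA_AG.gbk >nA_AG_PGAG.tab.txt
cp ./nA_AG_PGAG.tab.txt ../BactCG1.0/result/out_mutbest_filt/
# assign the target genes to pan-genome families
../BactCG1.0/result/out_mutbest_filt/PGA ../BactCG1.0/result/out_mutbest_filt/26_PG.txt ../BactCG1.0/result/out_mutbest_filt/nA_AG_PGAG.tab.txt PGAG nA_AG >../BactCG1.0/nA.AG_PGAG_PGA.tab.txt
cp ../BactCG1.0/nA.AG_PGAG_PGA.tab.txt ./
```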

The file nA.AG_PGAG_PGA.tab.txt in the results subfolder of BactPGA is the final result.

Codes and Executable Files

The source code, executable files and test data are stored in the codes, bin and test subdirectories, respectively. The executables are built for macOS; on Linux or Windows, the source code must be recompiled. BactPGA is accessible here.

Bact1DGR

Introduction

Bact1DGR represents an individual bacterial genome as a series of blocks annotated with their evolutionary origins, using both the phylogenetic position of the target strain and the ancient genomes of the nodes along its evolutionary trajectory. This representation helps in understanding the sequence evolution of bacterial genomes and allows intuitive comparison of multiple genomes. The procedure involves the following steps:
(1) locate the terminal phylogenetic branch on which the target strain falls, trace all nodes along the route to that branch, and delineate the evolutionary trajectory of the target strain;
(2) align the target genome against the genome of the oldest ancestor, identify the orthologous fragments, and label the homologous blocks of the target genome;
(3) align the target genome against the genome of the second oldest ancestor, identify the orthologous fragments, and label the homologous blocks that have not yet been labeled;
(4) repeat step (3) for each successively younger ancestor until the genome of the most recent ancestor has been compared and labeled; the blocks that remain unlabeled are the strain-specific sequence blocks.
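
Steps (2) to (4) amount to "the oldest ancestor that covers a position wins", processed from oldest to most recent. The Go sketch below illustrates that logic on plain intervals. It is not the Bact1DGR code: it assumes the orthologous fragments for each ancestor are already expressed as target-genome coordinates, and it uses a per-position array for clarity rather than efficiency. All names are hypothetical.

```go
package main

import "fmt"

// Interval is a half-open range [Start, End) in target-genome coordinates.
type Interval struct{ Start, End int }

// Block is a maximal run of target-genome positions sharing one origin label.
type Block struct {
	Start, End int // half-open [Start, End)
	Origin     string
}

// label1DGR gives each position the label of the oldest ancestor whose orthologous
// fragments cover it. ancestors and orth are both ordered from oldest to most
// recent; positions never covered get strainID (strain-specific blocks).
func label1DGR(genomeLen int, ancestors []string, orth [][]Interval, strainID string) []Block {
	labels := make([]string, genomeLen)
	for i, anc := range ancestors {
		for _, iv := range orth[i] {
			start, end := iv.Start, iv.End
			if start < 0 {
				start = 0
			}
			if end > genomeLen {
				end = genomeLen
			}
			for p := start; p < end; p++ {
				if labels[p] == "" { // keep the label from an older ancestor
					labels[p] = anc
				}
			}
		}
	}
	var blocks []Block
	for p, origin := range labels {
		if origin == "" {
			origin = strainID
		}
		if n := len(blocks); n > 0 && blocks[n-1].Origin == origin {
			blocks[n-1].End = p + 1 // extend the current run
		} else {
			blocks = append(blocks, Block{p, p + 1, origin})
		}
	}
	return blocks
}

func main() {
	ancestors := []string{"S.genus.AG", "S.enterica.ancient", "nA_AG"}
	orth := [][]Interval{
		{{0, 40}},  // oldest ancestor
		{{30, 70}}, // positions 30-40 are already labeled and are skipped
		{{60, 90}},
	}
	for _, b := range label1DGR(100, ancestors, orth, "Diarizonae_AG") {
		fmt.Printf("%d-%d %s\n", b.Start, b.End, b.Origin)
	}
	// 0-40 S.genus.AG
	// 40-70 S.enterica.ancient
	// 70-90 nA_AG
	// 90-100 Diarizonae_AG
}
```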

The scripts implementing Bact1DGR are written in Go and Perl. Bact1DGR results can also be used for comparative genomic analysis. To compare genomes, first determine the most recent common ancestor (MRCA, the differentiating node) of the strains to be compared; only the blocks whose origin is later than the MRCA, including the strain-specific blocks, carry the differential information.
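
The MRCA rule can be expressed as a filter over 1DGR blocks ranked by the evolutionary trajectory. A sketch, assuming each block carries its origin label and the trajectory is listed from the oldest node down to the strain itself; the function name is hypothetical and the choice of nA_AG as MRCA in the example is purely illustrative.

```go
package main

import "fmt"

// Block is a 1DGR block: a stretch of the target genome with its origin label.
type Block struct {
	Start, End int
	Origin     string
}

// differentialBlocks keeps blocks whose origin is younger than the MRCA.
// trajectory lists node labels from oldest to most recent and ends with the
// strain's own label, which marks strain-specific blocks.
func differentialBlocks(blocks []Block, trajectory []string, mrca string) ([]Block, error) {
	rank := make(map[string]int, len(trajectory))
	for i, node := range trajectory {
		rank[node] = i
	}
	m, ok := rank[mrca]
	if !ok {
		return nil, fmt.Errorf("MRCA %q is not on the trajectory", mrca)
	}
	var out []Block
	for _, b := range blocks {
		r, ok := rank[b.Origin]
		if !ok {
			return nil, fmt.Errorf("origin %q is not on the trajectory", b.Origin)
		}
		if r > m { // origin later than the MRCA
			out = append(out, b)
		}
	}
	return out, nil
}

func main() {
	traj := []string{"S.genus.AG", "S.enterica.ancient", "nA_AG", "nB2_AG", "Diarizonae_AG"}
	blocks := []Block{
		{0, 40, "S.genus.AG"}, {40, 70, "nA_AG"}, {70, 90, "nB2_AG"}, {90, 100, "Diarizonae_AG"},
	}
	diff, err := differentialBlocks(blocks, traj, "nA_AG")
	if err != nil {
		panic(err)
	}
	fmt.Println(diff) // [{70 90 nB2_AG} {90 100 Diarizonae_AG}]
}
```

Blocks inherited from the MRCA or older are shared by all strains descending from it, which is why they are dropped before comparison.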

User Manual and Examples

Installation
Operating system: Linux/Mac/Win
Software requirements: a Go (Golang) compilation environment (required only when compiling from source, not when using the precompiled programs); Perl; and the standalone version of Mauve (>= version 2.4.0), available at https://darlinglab.org/mauve/user-guide/installing.html
Compilation: The precompiled executables do not need to be installed and can be run directly on macOS. To build from source, download and decompress the source package (Bact1DGR.tar.gz), install and configure the Go compilation environment, then compile each source file in turn and save the resulting executables to the bin subfolder of the Bact1DGR directory. Commands for compilation:
```
cd bin
go build ../codes/[module].go
```

Manual and examples

Put the genome sequence of the target strain (e.g., Diarizonae_AG) in the test subfolder, together with the genomes of its ancestors, and determine the evolutionary route of the target strain from oldest to most recent (e.g., S.genus.AG, S.enterica.ancient, nA_AG, nB2_AG, Diarizonae_AG). Then use and adapt the following pipeline: adjust the installation paths of the software tools (such as the progressiveMauve path) to your own system, and change the file names to match your own task.


```
cd test

# Round 1: oldest ancestor (S.genus.AG)
/Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=DiarizonaeAG.vs.GenusAG.xmfa --output-guide-tree=DiarizonaeAG.vs.GenusAG.guide_tree --backbone-output=DiarizonaeAG.vs.GenusAG.backbone Diarizonae_AG.fasta S.genus.AG.fasta
./progBackbonePrep DiarizonaeAG.vs.GenusAG.backbone >DiarizonaeAG.vs.GenusAG.backbone.txt
./homBlkReorder DiarizonaeAG.vs.GenusAG.backbone.txt >DiarizonaeAG.vs.GenusAG.homblk.txt
./orthoParsing DiarizonaeAG.vs.GenusAG.homblk.txt >DiarizonaeAG.vs.GenusAG.orthBlk.txt
./1dgrExt1 DiarizonaeAG.vs.GenusAG.orthBlk.txt Diarizonae_AG Genus_AG >DiarizonaeAG.GenusAG.1dgr.txt

# Round 2: S.enterica.ancient, merged with the labels from round 1
/Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=DiarizonaeAG.vs.EntSpAG.xmfa --output-guide-tree=DiarizonaeAG.vs.EntSpAG.guide_tree --backbone-output=DiarizonaeAG.vs.EntSpAG.backbone Diarizonae_AG.fasta S.enterica.ancient.fasta
./progBackbonePrep DiarizonaeAG.vs.EntSpAG.backbone >DiarizonaeAG.vs.EntSpAG.backbone.txt
./homBlkReorder DiarizonaeAG.vs.EntSpAG.backbone.txt >DiarizonaeAG.vs.EntSpAG.homblk.txt
./orthoParsing DiarizonaeAG.vs.EntSpAG.homblk.txt >DiarizonaeAG.vs.EntSpAG.orthBlk.txt
./1dgrExt1 DiarizonaeAG.vs.EntSpAG.orthBlk.txt Diarizonae_AG Ent_sp_AG >DiarizonaeAG.EntSpAG.1dgr.txt
perl 1dgrMerge.pl DiarizonaeAG.GenusAG.1dgr.txt DiarizonaeAG.EntSpAG.1dgr.txt Diarizonae_AG >DiarizonaeAG.merged_1.1dgr.txt

# Round 3: nA_AG
/Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=DiarizonaeAG.vs.nAAG.xmfa --output-guide-tree=DiarizonaeAG.vs.nAAG.guide_tree --backbone-output=DiarizonaeAG.vs.nAAG.backbone Diarizonae_AG.fasta nA_AG.fasta
./progBackbonePrep DiarizonaeAG.vs.nAAG.backbone >DiarizonaeAG.vs.nAAG.backbone.txt
./homBlkReorder DiarizonaeAG.vs.nAAG.backbone.txt >DiarizonaeAG.vs.nAAG.homblk.txt
./orthoParsing DiarizonaeAG.vs.nAAG.homblk.txt >DiarizonaeAG.vs.nAAG.orthBlk.txt
./1dgrExt1 DiarizonaeAG.vs.nAAG.orthBlk.txt Diarizonae_AG nA_AG >DiarizonaeAG.nAAG.1dgr.txt
perl 1dgrMerge.pl DiarizonaeAG.merged_1.1dgr.txt DiarizonaeAG.nAAG.1dgr.txt Diarizonae_AG >DiarizonaeAG.merged_2.1dgr.txt

# Round 4: most recent ancestor (nB2_AG)
/Applications/Mauve.app/Contents/MacOS/progressiveMauve --output=DiarizonaeAG.vs.nB2AG.xmfa --output-guide-tree=DiarizonaeAG.vs.nB2AG.guide_tree --backbone-output=DiarizonaeAG.vs.nB2AG.backbone Diarizonae_AG.fasta nB2_AG.fasta
./progBackbonePrep DiarizonaeAG.vs.nB2AG.backbone >DiarizonaeAG.vs.nB2AG.backbone.txt
./homBlkReorder DiarizonaeAG.vs.nB2AG.backbone.txt >DiarizonaeAG.vs.nB2AG.homblk.txt
./orthoParsing DiarizonaeAG.vs.nB2AG.homblk.txt >DiarizonaeAG.vs.nB2AG.orthBlk.txt
./1dgrExt1 DiarizonaeAG.vs.nB2AG.orthBlk.txt Diarizonae_AG nB2_AG >DiarizonaeAG.nB2AG.1dgr.txt
perl 1dgrMerge.pl DiarizonaeAG.merged_2.1dgr.txt DiarizonaeAG.nB2AG.1dgr.txt Diarizonae_AG >DiarizonaeAG.merged.1dgr.txt

# Final step: build the 1DGR table (4606690 = full length of the target genome)
perl 1dgrExtF.pl DiarizonaeAG.merged.1dgr.txt 4606690 Diarizonae_AG DiarizonaeAG >Diarizonae_AG.1DGR.txt
```

The script 1dgrExtF.pl used in the final step takes the following arguments:

```
perl 1dgrExtF.pl <1DGR_MERGED_FILE> <Full_Length_of_Target_Genome> <Target_Strain_Name> <1DGR_BLOCK_PREFIX>
```

The final 1DGR result for the Diarizonae_AG genome is written to the output file named on the command line (Diarizonae_AG.1DGR.txt in the example) in the test subfolder.
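
The pipeline repeats the same five commands for each ancestor, followed by a merge, with only the file names changing. As a convenience, a small Go program could generate the command list for any trajectory. This is a hypothetical helper, not part of Bact1DGR: it only prints the commands and does not run them, and the tag and label values reproduce the naming used in the example pipeline.

```go
package main

import (
	"fmt"
	"strings"
)

// Step describes one ancestor on the trajectory: its FASTA file, the short tag
// used in output file names, and the label passed to 1dgrExt1.
type Step struct{ Fasta, Tag, Label string }

// pipeline returns the shell commands for one target genome, processing the
// ancestors from oldest to most recent. The target tag doubles as the 1DGR block
// prefix, as in the example.
func pipeline(mauve, target, targetTag, targetLabel string, steps []Step, genomeLen int) []string {
	var cmds []string
	merged := ""
	for i, s := range steps {
		pre := targetTag + ".vs." + s.Tag
		out := fmt.Sprintf("%s.%s.1dgr.txt", targetTag, s.Tag)
		cmds = append(cmds,
			fmt.Sprintf("%s --output=%s.xmfa --output-guide-tree=%s.guide_tree --backbone-output=%s.backbone %s %s",
				mauve, pre, pre, pre, target, s.Fasta),
			fmt.Sprintf("./progBackbonePrep %s.backbone >%s.backbone.txt", pre, pre),
			fmt.Sprintf("./homBlkReorder %s.backbone.txt >%s.homblk.txt", pre, pre),
			fmt.Sprintf("./orthoParsing %s.homblk.txt >%s.orthBlk.txt", pre, pre),
			fmt.Sprintf("./1dgrExt1 %s.orthBlk.txt %s %s >%s", pre, targetLabel, s.Label, out),
		)
		if i == 0 {
			merged = out // nothing to merge yet
			continue
		}
		next := fmt.Sprintf("%s.merged_%d.1dgr.txt", targetTag, i)
		if i == len(steps)-1 {
			next = targetTag + ".merged.1dgr.txt"
		}
		cmds = append(cmds, fmt.Sprintf("perl 1dgrMerge.pl %s %s %s >%s", merged, out, targetLabel, next))
		merged = next
	}
	cmds = append(cmds, fmt.Sprintf("perl 1dgrExtF.pl %s %d %s %s >%s.1DGR.txt",
		merged, genomeLen, targetLabel, targetTag, targetLabel))
	return cmds
}

func main() {
	steps := []Step{
		{"S.genus.AG.fasta", "GenusAG", "Genus_AG"},
		{"S.enterica.ancient.fasta", "EntSpAG", "Ent_sp_AG"},
		{"nA_AG.fasta", "nAAG", "nA_AG"},
		{"nB2_AG.fasta", "nB2AG", "nB2_AG"},
	}
	mauve := "/Applications/Mauve.app/Contents/MacOS/progressiveMauve" // adjust to your system
	cmds := pipeline(mauve, "Diarizonae_AG.fasta", "DiarizonaeAG", "Diarizonae_AG", steps, 4606690)
	fmt.Println(strings.Join(cmds, "\n"))
}
```

The printed lines can be redirected into a shell script and run from the test subfolder; to process another target strain, only the step list, file names and genome length need to change.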

Codes and Executable Files

Subsp.	Strain	Serotype	Assembly level	Size (bp)	Genes
II (Salamae)	RKS2986; SARC4	42:f,g,t:--	Complete	4,861,844	4,756
IIIb (Diarizonae)	RKS2978; SARC7	50:k:z	Complete	5,065,792	5,047
IV (Houtenae)	RKS3027; SARC10	16:z4,z32:--	Complete	4,567,406	4,530
VI (Indica)	RKS3057; SARC14	11:b:e,n,x	Complete	4,726,531	4,738
VII	RKS3013; SARC15	1,40:g,z51:-	Complete	4,467,812	4,448

The Evolution of Salmonella Genomes

(1) New sequenced Salmonella genomes

(2) Core and Pan genomes of Salmonella

(3) Ancient orthologous chromosomes of Salmonella

(4) Transcriptome: raw read counts of Salmonella strains

(5) 3C: raw contact matrix of Salmonella strains (bin size: 5kb)

(6) The AOCs of S. Typhi / S. Paratyphi A group