dbESG


A Database Curating the Evolution of Salmonella Genomes




The Evolution of Salmonella Genomes

Salmonella diverged from Escherichia around 100 million years ago and subsequently evolved into two species, S. enterica and S. bongori. S. enterica further diversified into seven or more subspecies and more than 2000 serovars. Most of the serovars belong to S. enterica subsp. enterica, which includes generalists with broad host ranges and specialists with host specificity. Generalists such as S. typhimurium and S. enteritidis cause self-limiting non-typhoidal Salmonella diseases, while specialists exemplified by S. typhi and S. paratyphi A cause invasive infections. Besides host adaptability, Salmonella serovars also vary extensively in other phenotypes such as invasion, persistence and antibiotic resistance.

To help understand how genome evolution influences the molecular and biological phenotypes of Salmonella strains, the ancient orthologous chromosomes of the Salmonella genus, species, subspecies and other major phylogenetic nodes were traced, and the evolutionary trajectories were delineated and annotated. The database dbESG was set up to illustrate Salmonella genome evolution dynamically.

Browser

Evolutionary annotation and comparison of Salmonella ancient and extant representative genomes.

Tools

Software tools, pipelines and codes for genome analysis.

Datasets

The genomic, comparative genomic and phylogenomic datasets of Salmonella.

Related links

Databases, webservers and other related resources.

References:

Hu Y, et al. Evolution of Salmonella Chromosomes and Its Influence on Chromosomal Topology, Interaction and Gene Expression. Submitted.

A series of programs and scripts have been developed to facilitate bacterial comparative genomic analysis.

Fig 1. A pipeline and related tools for bacterial comparative genomic analysis

BactCG


A program to analyze bacterial core genome

BactPG


A program to analyze bacterial pan genome

BactAG


A method to analyze bacterial ancient orthologous genome

Bact1DGR


One-dimensional representation of the evolution of bacterial genomes

BactPGA


Annotation of bacterial genomes with the pan-genome

BactCG

Introduction

BactCG is designed to analyze the core genome of a group of bacterial strains. Typically, pairwise alignment is performed repeatedly between the genes (or proteins) of every pair of strains, and the mutual best alignment pairs are identified to generate the core gene set. The number of strain-to-strain comparisons then grows on the order of n^2 for n strains. BactCG instead takes one representative strain as the reference and performs mutual alignment only between the genes (or proteins) of each other strain and those of the reference strain to reveal the core gene set. The number of comparisons therefore grows on the order of n.

BactCG is developed in the Go programming language. The current version can only be compiled and run on Mac or Linux systems. The homology cutoff is customizable and involves two parameters, minimal length coverage and minimal sequence identity, which are set to 0.9 and 0.8 by default, respectively.
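The reference-based idea can be illustrated with a minimal Python sketch. It is not the BactCG implementation: the hit tuples, the function names `best_hits` and `core_genes`, and the use of bit score to rank hits are assumptions made for illustration. With n strains, all-against-all comparison needs n(n-1)/2 strain pairs, whereas comparing every strain against one reference needs only n-1.

```python
def best_hits(hits, min_cov=0.9, min_ident=0.8):
    """Return {query: subject}, keeping the highest-scoring hit that passes both cutoffs.

    hits: iterable of (query, subject, identity, coverage, bitscore) tuples,
    with identity and coverage given as fractions between 0 and 1 (assumed layout).
    """
    best = {}
    for query, subject, ident, cov, score in hits:
        if ident < min_ident or cov < min_cov:
            continue  # fails the homology cutoff
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def core_genes(ref_vs_strain, strain_vs_ref, **cutoffs):
    """Reference genes that have a mutual best hit in every other strain.

    ref_vs_strain / strain_vs_ref: {strain_name: hit list}, one per direction.
    """
    core = None
    for strain in ref_vs_strain:
        fwd = best_hits(ref_vs_strain[strain], **cutoffs)
        rev = best_hits(strain_vs_ref[strain], **cutoffs)
        # A pair is mutual when each gene is the other's best hit.
        mutual = {r for r, s in fwd.items() if rev.get(s) == r}
        core = mutual if core is None else core & mutual
    return core or set()
```

Each strain contributes one forward and one reverse comparison against the reference, and the core set is the intersection of the per-strain mutual best hits.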

User Manual and Examples

  1. Installation

    Operating system: Linux or Mac

    Software requirements: Golang compilation environment (required when compiling from source code; not required when using the compiled program directly). Download and install the standalone version of NCBI BLAST (>= version 2.3.30) from the link: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ .

    Compilation: The compiled executable program does not need to be installed and can be run directly on a Mac system. To compile from source, download and decompress the source package (BactCG.tar.gz), install and configure the Golang compilation environment, and then compile the source files one by one to get the executable files, saving them to the bin subfolder in the BactCG directory. Commands for compilation:

    
                                
    
                                    cd bin
                                    go build ../codes/[module].go
                                    cp  ./CG  ../
                                
  2. Manual and examples

    Download the genome-derived proteome sequences of each bacterial strain to be analyzed and save them in one file per strain named xxx.faa, where xxx is normally the strain identifier. Remove the example files from the test subfolder under the BactCG directory and put the proteome sequence files there. Designate one strain as the reference, and then run the program in the terminal with a single command.

    The following example uses the strains whose proteome sequences are stored in the test subfolder, with LT2 set as the reference strain.

    
                                
    
                                    cd BactCG
                                    ./CG  test  LT2  0.8  0.9                                
                                

    The running progress of the program is shown in the terminal. Once the run has finished, a new subdirectory named result is generated in BactCG, containing 7 subfolders. The final core gene set is saved in the file CG.tab.txt in the cg_result subfolder.
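Before launching a run, it can help to check that every strain has a .faa file and that the reference is among them. The Python sketch below does only that and assembles the argument list in the same order as the example command; the helper name `build_cg_command` is hypothetical, and the two cutoffs are passed through positionally without assuming which one is coverage and which is identity.

```python
from pathlib import Path


def build_cg_command(seq_dir, reference, cutoff_a="0.8", cutoff_b="0.9"):
    """Check the inputs and return the argument list for ./CG.

    The cutoffs keep the positions they have in the example command;
    their meaning is not interpreted here.
    """
    seq_dir = Path(seq_dir)
    faa_files = sorted(seq_dir.glob("*.faa"))
    if not faa_files:
        raise FileNotFoundError(f"no .faa files found in {seq_dir}")
    strains = [f.stem for f in faa_files]  # xxx.faa -> xxx
    if reference not in strains:
        raise ValueError(f"reference {reference!r} has no {reference}.faa in {seq_dir}")
    return ["./CG", str(seq_dir), reference, str(cutoff_a), str(cutoff_b)]
```

Called from within the BactCG directory as `build_cg_command("test", "LT2")`, the returned list can be handed to `subprocess.run(..., check=True)`.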

Codes and Executable Files

The source code and executable files are stored in the codes and bin subdirectories, respectively. The executable files are built for Mac systems. For Linux systems, the source code needs to be re-compiled. BactCG is accessible here.

BactAG

Introduction

The ancient orthologous genomes of bacteria were inferred semi-manually with a two-step Backbone-Patching approach. In the Backbone step, the two most anciently diverged clades were identified from a phylogenomic tree whose strains cover the major branches of the genus, species or subspecies to be studied, and one representative strain was selected randomly from each clade. Orthologous fragments were analyzed with Mauve version 2.4.0 and an iterative Maximum Homologous Block (MHB) algorithm, and combined to generate the backbone of the ancient orthologous genome. Two patching sub-steps followed. Firstly, the genomes of other representative strains were aligned between the two clades and the orthologous fragments were retrieved and compared to the backbone. The sub-fragments not covered by the backbone were patched in manually, and the backbone ancient orthologous genome was updated iteratively. Secondly, the genomes of closely related outgroup strains, or the genome of the nearest ancestor, were aligned against the representative strain of each clade, and the orthologous fragments were extracted to further patch the ancient orthologous genome.

BactAG is developed in the Go programming language. The current version can only be compiled and run on Mac or Linux systems.
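The patching logic (find the parts of a newly retrieved orthologous fragment that the backbone does not yet cover, add them, and repeat) can be sketched in Python on plain coordinate intervals. This is a deliberately simplified illustration, not the BactAG code: it assumes all fragments have already been placed on one common coordinate axis as half-open (start, end) intervals, whereas the real procedure works on Mauve alignments and includes manual curation. The function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered(fragment, backbone):
    """Return the parts of `fragment` not covered by any backbone interval."""
    start, end = fragment
    gaps = []
    for b_start, b_end in merge_intervals(backbone):
        if b_end <= start:
            continue          # backbone interval lies entirely before the fragment
        if b_start >= end:
            break             # backbone interval lies entirely after the fragment
        if b_start > start:
            gaps.append((start, b_start))
        start = max(start, b_end)
    if start < end:
        gaps.append((start, end))
    return gaps


def patch(backbone, fragments):
    """Add the uncovered parts of each fragment to the backbone, one fragment at a time."""
    backbone = list(backbone)
    for fragment in fragments:
        backbone = merge_intervals(backbone + uncovered(fragment, backbone))
    return backbone
```

Because the backbone is updated after every fragment, later fragments are only compared against what is still missing, which mirrors the iterative update described above.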

User Manual and Examples

  1. Installation

    Operating system: Linux or Mac

    Software requirements: Golang compilation environment (required when compiling from source code; not required when using the compiled program directly). Download and install the standalone version of Mauve (>= version 2.4.0) from the link: https://darlinglab.org/mauve/user-guide/installing.html .

    Compilation: The compiled executable program does not need to be installed and can be run directly on a Mac system. To compile from source, download and decompress the source package (BactAG.zip), install and configure the Golang compilation environment, and then compile the source files one by one to get the executable files, saving them to the bin subfolder in the BactAG directory. Commands for compilation:

    
                                
    
                                    cd bin
                                    go build ../codes/[module].go
                                    cp ../codes/*.pl  ./
                                    cp -r AG_Inference.py bin test                                
                                
  2. Manual and examples

    Download the genome sequence of each bacterial strain to be analyzed and save it in one file per strain named xxx.fasta, where xxx is normally the strain identifier. Remove the example files from the test subfolder under the BactAG directory and put the genome sequence files there. Designate one strain as the reference (for example RKS2986), and then run the program in the terminal.

    The following example uses the strains whose genome sequences are stored in the test/ subfolder, with RKS2986 set as the reference strain and nC_AG set as the ancient neighbor node.

    
                                
    
                                    cd BactAG/test
                                    python AG_Inference.py backbone -n nC_AG.fasta -bo ./result/backboneOutput -b ./bin -p /root/softawre/miniconda3/bin/progressiveMauve -s RKS2986.fasta -f ./  
                                    python AG_Inference.py patching -n nC_AG.fasta -bo ./result/backboneOutput -o ./result/AGoutput -b ./bin -p /root/softawre/miniconda3/bin/progressiveMauve -s RKS2986.fasta -f ./                                
                                

    The running progress of the program is shown in the terminal. Once the run has finished, a new subdirectory named result is generated in BactAG, containing 2 subfolders. The final ancient genome is saved in the file AG_ORTH.fasta in the AGoutput subfolder.

Codes and Executable Files

The source code and executable files are stored in the codes and bin subdirectories, respectively. The executable files are built for Mac systems. For Linux systems, the source code needs to be re-compiled. BactAG is accessible here.

BactPG

Introduction

BactPG is designed to analyze the pan-genome of a group of bacterial strains. Typically, pairwise alignment is performed repeatedly between the genes (or proteins) of every pair of strains, and the mutual best alignment pairs are identified to generate the pan-gene set. BactPG analyzes each combination of the strains. In each combination, BactPG takes one representative strain as the reference and performs mutual alignment between its genes (or proteins) and those of the other strains. The gene sets of all combinations are then merged to form the pan-genome.

BactPG is developed in the Go programming language. The current version can be compiled and run on Mac, Linux or Windows systems. The homology cutoff is customizable and involves two parameters, minimal length coverage and minimal sequence identity, which are set to 70% and 0.7 by default, respectively.
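One common way to merge pairwise ortholog relationships into gene families is a union-find (disjoint-set) structure. The Python sketch below uses that technique as an illustration of the merging step; it is an assumption, not necessarily how BactPG merges its gene sets, and the gene identifiers and the function name `pan_families` are hypothetical.

```python
def pan_families(genes, ortholog_pairs):
    """Group genes into families.

    genes: all gene IDs from all strains.
    ortholog_pairs: (gene_a, gene_b) mutual best alignment pairs; both IDs
    must appear in `genes`.
    Orthologous genes end up in the same family; genes without any ortholog
    form singleton families.
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in ortholog_pairs:
        parent[find(a)] = find(b)          # union the two families

    families = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())
```

Families found in every strain correspond to the core genome, while the full list, including singletons, corresponds to the pan-genome.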

Installation and Usage Manual

  1. Installation

    Operating system: Mac, Linux or Windows

    Software requirements: Golang compilation environment (required when compiling from source code; not required when using the compiled program directly). Download and install the standalone version of NCBI BLAST (>= version 2.3.30) from the link: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ .

    Compilation: The compiled executable program does not need to be installed and can be run directly. To compile from source, download and decompress the source package (BactPG.zip), install and configure the Golang compilation environment, and then compile the source code (BactPG.go) to get the executable file. Commands for compilation:

    
                                
    
                                    cd BactPG
                                    go build ../codes/BactPG.go                                                             
                                
  2. Manual and examples

    Download the genome-derived proteome sequences of each bacterial strain to be analyzed and save them in one file per strain named xxx.fasta, where xxx is normally the strain identifier. Put the proteome sequence files in the example_seq subfolder under the BactPG directory.

    The following example uses the strains whose proteome sequences are stored in the example_seq subfolder:

    
                                
    
                                    cd BactPG
                                    ./BactPG ./example_seq [absolute path of makeblastdb] [absolute path of blastp] 70  0.7
                                

    The running progress of the program is shown in the terminal. Once the run has finished, the final pan-gene set is saved in the file PG.txt.

Codes and Executable Files

The source code and executable files are stored in the codes and BactPG subdirectories, respectively. The executable files are built for Windows systems. For Linux or Mac systems, the source code needs to be re-compiled. BactPG is accessible here.

BactPGA

Introduction

BactPGA is developed to facilitate automatic annotation of ancient or extant individual genomes according to pan-genome annotation results. Once sequenced and assembled, the target genome can be annotated for its encoded genes with RASTtk or PGAG. BactPGA then classifies the genes into pan-genome families. BactPGA can also be used to annotate the results of 1DGR or other comparative genomic analyses.
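The classification step can be pictured as a lookup: each target gene inherits the family of its ortholog among the pan-genome strains. The Python sketch below assumes a hypothetical table layout (one family per line, family ID first, then tab-separated member gene IDs) and a precomputed mapping from target genes to their orthologs; the actual format of the pan-gene set file and the internals of BactPGA may differ.

```python
def load_families(lines):
    """Map each member gene ID to its family ID.

    lines: iterable of tab-separated rows, assumed to be
    '<family_id>\t<gene_1>\t<gene_2>...'.
    """
    gene_to_family = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        family, members = fields[0], fields[1:]
        for gene in members:
            if gene:                      # skip empty cells
                gene_to_family[gene] = family
    return gene_to_family


def annotate(target_best_hits, gene_to_family):
    """Assign each target gene the family of its ortholog, or 'unassigned'.

    target_best_hits: {target_gene: orthologous pan-genome gene, or None}.
    """
    return {
        gene: gene_to_family.get(hit, "unassigned")
        for gene, hit in target_best_hits.items()
    }
```

Genes left as "unassigned" have no ortholog in any pan-genome family under the chosen cutoffs.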

Installation and Usage Manual

  1. Installation

    Operating system: Linux/Mac

    Software requirements: Golang compilation environment (required when compiling from source code; not required when using the compiled program directly); BactCG, which can be downloaded via the link: http://61.160.194.165:3080/ESG/tools/BactCG/ ; and the standalone version of NCBI BLAST (>= version 2.3.30), which can be downloaded and installed from the link: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ .

    Compilation: The compiled executable program does not need to be installed and can be run directly on a Mac system. To compile from source, download and decompress the source package (BactPGA.tar.gz), install and configure the Golang compilation environment, and then compile the source files one by one to get the executable files, saving them to the bin subfolder in the BactPGA directory. Commands for compilation:

    
                                
    
                                    cd bin
                                    go build ../codes/[module].go            
                                

    To run BactPGA correctly, the following files should be prepared in advance and transferred into the results subfolder: (1) the gbk file of the target genome; (2) the tab-separated pan-gene set file; (3) the seq subfolder with the proteome FASTA files (.faa) of all the pan-genome strains and the proteome FASTA file (.faa) of the target strain.

  2. Manual and examples

    Retrieve the genome annotation file of the target strain (nA_AG.gbk), the pan-genome annotation data (26_PG.txt), and the seq subfolder with the proteomes of the pan-genome strains and the target strain (nA_AG.faa) from the test subfolder, and transfer them into the results subfolder. The following commands are run:

    
                                
    
                                    cd results
                                    rm -R *
                                    rm *
                                    rm -R  ../BactCG1.0/result/
                                    rm -R  ../BactCG1.0/seq/
                                    cp -r  ../test/seq  ../BactCG1.0/
                                    cp  ../test/26_PG.txt  ./
                                    cp  ../test/nA_AG.gbk  ./
                                    ../BactCG1.0/CG  ../BactCG1.0/seq  nA_AG  0.7  0.7
                                    cp  ./26_PG.txt  ../BactCG1.0/result/out_mutbest_filt/
                                    cp  ../bin/PGA  ../BactCG1.0/result/out_mutbest_filt/
                                    ../bin/gbParse  ./nA_AG.gbk  >nA_AG_PGAG.tab.txt
                                    cp  ./nA_AG_PGAG.tab.txt  ../BactCG1.0/result/out_mutbest_filt/
                                    ../BactCG1.0/result/out_mutbest_filt/PGA  ../BactCG1.0/result/out_mutbest_filt/26_PG.txt  ../BactCG1.0/result/out_mutbest_filt/nA_AG_PGAG.tab.txt  PGAG  nA_AG   >../BactCG1.0/nA.AG_PGAG_PGA.tab.txt
                                    cp ../BactCG1.0/nA.AG_PGAG_PGA.tab.txt  ./
                                

    The file nA.AG_PGAG_PGA.tab.txt in the results subfolder of BactPGA is the final result.

Codes and Executable Files

The source code, executable files and test data are stored in the codes, bin and test subdirectories, respectively. The executable files are built for Mac systems. For Linux/Win systems, the source code needs to be re-compiled. BactPGA is accessible here.

Bact1DGR

Introduction

Bact1DGR is developed to represent individual bacterial genomes as blocks annotated with their evolutionary origins. It draws on both the phylogenetic position of the target strain and the ancient genomes of the nodes along its evolutionary trajectory. The representation scheme facilitates understanding of the sequence evolution of bacterial genomes and intuitive comparison of multiple bacterial genomes. The procedure involves four steps:

(1) locating the terminal phylogenetic branch on which the target strain falls, tracing all the nodes along the phylogenetic route of the branch, and delineating the evolutionary trajectory of the target strain;
(2) aligning the genome of the target strain against that of the oldest ancestor, identifying the orthologous fragments and labeling the homologous genome blocks of the target strain;
(3) aligning the genome of the target strain against that of the second oldest ancestor, identifying the orthologous fragments and labeling the homologous genome blocks of the target strain that have not yet been labeled;
(4) repeating step 3 until the genome of the most recent ancestor has been compared and labeled accordingly, and identifying the remaining strain-specific sequence blocks.

Scripts facilitating the implementation of Bact1DGR are developed in the Go and Perl programming languages. The Bact1DGR results can also be used for comparative genomic analysis. For genome comparison, the most recent common ancestor (MRCA, the differentiating node) of the strains to be compared should be determined first; only the blocks with an origin later than the MRCA, including the strain-specific blocks, then contain the differential information.
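The iterative labeling can be sketched in a few lines of Python. The sketch assumes the alignments against each ancestor have already been reduced to half-open, 0-based (start, end) intervals on the target genome, and it labels position by position for clarity rather than speed; the function name `label_blocks` and the label text are hypothetical and this is not the Bact1DGR code.

```python
def label_blocks(genome_length, ancestor_alignments, strain_label="strain-specific"):
    """Label each position with the oldest ancestor whose genome aligns to it.

    ancestor_alignments: list of (ancestor_name, [(start, end), ...]) ordered
    from the oldest to the most recent ancestor.
    Returns a list of (start, end, label) blocks covering the whole genome.
    """
    labels = [None] * genome_length
    for name, intervals in ancestor_alignments:
        for start, end in intervals:
            for pos in range(max(start, 0), min(end, genome_length)):
                if labels[pos] is None:   # keep the label of the older ancestor
                    labels[pos] = name

    # Collapse runs of identical labels into blocks; unlabeled positions
    # are the strain-specific remainder.
    blocks = []
    for pos, label in enumerate(labels):
        label = label or strain_label
        if blocks and blocks[-1][2] == label:
            blocks[-1] = (blocks[-1][0], pos + 1, label)
        else:
            blocks.append((pos, pos + 1, label))
    return blocks
```

For example, with a 10 bp toy genome, a genus-level alignment covering positions 0 to 4 and a species-level alignment covering 2 to 7, positions 2 and 3 keep the older genus label and only 4 to 7 are labeled with the species node.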
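The comparison rule (only blocks that originated after the MRCA carry differential information) amounts to a filter over labeled blocks. The Python sketch below assumes blocks are (start, end, origin) tuples and that the trajectory is given as node names ordered from oldest to most recent; these structures and the function name are illustrative assumptions.

```python
def differential_blocks(blocks, trajectory, mrca):
    """Keep blocks whose origin is later than the MRCA.

    blocks: (start, end, origin) tuples.
    trajectory: node names ordered from the oldest ancestor to the most recent.
    Any origin not listed in the trajectory (e.g. a strain-specific label)
    is treated as later than every node.
    """
    rank = {node: i for i, node in enumerate(trajectory)}
    cutoff = rank[mrca]
    return [b for b in blocks if rank.get(b[2], len(trajectory)) > cutoff]
```

Blocks inherited from the MRCA or from older nodes are shared by all strains being compared, so they are dropped.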

User Manual and Examples

  1. Installation

    Operating system: Linux/Mac/Win

    Software requirements: Golang compilation environment (required when compiling from source code; not required when using the compiled program directly); Perl; and the standalone version of Mauve (>= version 2.4.0), which can be downloaded and installed from the link: https://darlinglab.org/mauve/user-guide/installing.html .

    Compilation: The compiled executable program does not need to be installed and can be run directly on a Mac system. To compile from source, download and decompress the source package (Bact1DGR.tar.gz), install and configure the Golang compilation environment, and then compile the source files one by one to get the executable files, saving them to the bin subfolder in the Bact1DGR directory. Commands for compilation:

    
                                
    
                                    cd bin
                                    go build ../codes/[module].go                          
                                
  2. Manual and examples

    Put the genome sequence of the target strain (e.g., Diarizonae_AG) in the test subfolder, together with the genomes of its ancestors, and make sure the evolutionary route of the target strain is known (e.g., S.genus.AG, S.enterica.ancient, nA_AG, nB2_AG, Diarizonae_AG). Use and modify the following pipeline (the installation paths of the software tools, such as progressiveMauve, should be adjusted according to your own system; the file names should be modified according to your own tasks):

    
                                
    
                                    cd test
                                    /Applications/Mauve.app/Contents/MacOS/progressiveMauve  --output=DiarizonaeAG.vs.GenusAG.xmfa	 --output-guide-tree=DiarizonaeAG.vs.GenusAG.guide_tree --backbone-output=DiarizonaeAG.vs.GenusAG.backbone	 Diarizonae_AG.fasta  S.genus.AG.fasta
                                    ./progBackbonePrep	DiarizonaeAG.vs.GenusAG.backbone	>DiarizonaeAG.vs.GenusAG.backbone.txt
                                    ./homBlkReorder	DiarizonaeAG.vs.GenusAG.backbone.txt >DiarizonaeAG.vs.GenusAG.homblk.txt
                                    ./orthoParsing	DiarizonaeAG.vs.GenusAG.homblk.txt	>DiarizonaeAG.vs.GenusAG.orthBlk.txt
                                    ./1dgrExt1 DiarizonaeAG.vs.GenusAG.orthBlk.txt  Diarizonae_AG  Genus_AG >DiarizonaeAG.GenusAG.1dgr.txt
    
                                    /Applications/Mauve.app/Contents/MacOS/progressiveMauve  --output=DiarizonaeAG.vs.EntSpAG.xmfa	 --output-guide-tree=DiarizonaeAG.vs.EntSpAG.guide_tree --backbone-output=DiarizonaeAG.vs.EntSpAG.backbone	 Diarizonae_AG.fasta  S.enterica.ancient.fasta
                                    ./progBackbonePrep	DiarizonaeAG.vs.EntSpAG.backbone	>DiarizonaeAG.vs.EntSpAG.backbone.txt
                                    ./homBlkReorder	DiarizonaeAG.vs.EntSpAG.backbone.txt >DiarizonaeAG.vs.EntSpAG.homblk.txt
                                    ./orthoParsing	DiarizonaeAG.vs.EntSpAG.homblk.txt	>DiarizonaeAG.vs.EntSpAG.orthBlk.txt
                                    ./1dgrExt1 DiarizonaeAG.vs.EntSpAG.orthBlk.txt  Diarizonae_AG Ent_sp_AG >DiarizonaeAG.EntSpAG.1dgr.txt
                                    perl 1dgrMerge.pl DiarizonaeAG.GenusAG.1dgr.txt DiarizonaeAG.EntSpAG.1dgr.txt Diarizonae_AG >DiarizonaeAG.merged_1.1dgr.txt
    
                                    /Applications/Mauve.app/Contents/MacOS/progressiveMauve  --output=DiarizonaeAG.vs.nAAG.xmfa	 --output-guide-tree=DiarizonaeAG.vs.nAAG.guide_tree --backbone-output=DiarizonaeAG.vs.nAAG.backbone	 Diarizonae_AG.fasta  nA_AG.fasta
                                    ./progBackbonePrep	DiarizonaeAG.vs.nAAG.backbone	>DiarizonaeAG.vs.nAAG.backbone.txt
                                    ./homBlkReorder	DiarizonaeAG.vs.nAAG.backbone.txt >DiarizonaeAG.vs.nAAG.homblk.txt
                                    ./orthoParsing	DiarizonaeAG.vs.nAAG.homblk.txt	>DiarizonaeAG.vs.nAAG.orthBlk.txt
                                    ./1dgrExt1 DiarizonaeAG.vs.nAAG.orthBlk.txt  Diarizonae_AG nA_AG >DiarizonaeAG.nAAG.1dgr.txt
                                    perl 1dgrMerge.pl DiarizonaeAG.merged_1.1dgr.txt DiarizonaeAG.nAAG.1dgr.txt Diarizonae_AG >DiarizonaeAG.merged_2.1dgr.txt
    
                                    /Applications/Mauve.app/Contents/MacOS/progressiveMauve  --output=DiarizonaeAG.vs.nB2AG.xmfa	 --output-guide-tree=DiarizonaeAG.vs.nB2AG.guide_tree --backbone-output=DiarizonaeAG.vs.nB2AG.backbone	 Diarizonae_AG.fasta  nB2_AG.fasta
                                    ./progBackbonePrep	DiarizonaeAG.vs.nB2AG.backbone	>DiarizonaeAG.vs.nB2AG.backbone.txt
                                    ./homBlkReorder	DiarizonaeAG.vs.nB2AG.backbone.txt >DiarizonaeAG.vs.nB2AG.homblk.txt
                                    ./orthoParsing	DiarizonaeAG.vs.nB2AG.homblk.txt	>DiarizonaeAG.vs.nB2AG.orthBlk.txt
                                    ./1dgrExt1 DiarizonaeAG.vs.nB2AG.orthBlk.txt  Diarizonae_AG nB2_AG >DiarizonaeAG.nB2AG.1dgr.txt
                                    perl 1dgrMerge.pl DiarizonaeAG.merged_2.1dgr.txt DiarizonaeAG.nB2AG.1dgr.txt Diarizonae_AG >DiarizonaeAG.merged.1dgr.txt
    
                                    perl  1dgrExtF.pl  DiarizonaeAG.merged.1dgr.txt  4606690  Diarizonae_AG  DiarizonaeAG  >Diarizonae_AG.1DGR.txt
                                

    The script 1dgrExtF.pl used in the final step takes the following parameters:

    
                                
    
                                    perl  1dgrExtF.pl  <1DGR_MERGED_FILE>  <Full_Length_of_Target_Genome>  <Target_Strain_Name>  <1DGR_BLOCK_PREFIX> 
                                

    The final 1DGR result of the Diarizonae_AG genome is saved in the test subfolder under a file name of your choice (Diarizonae_AG.1DGR.txt in the example).
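    Because the same five commands are repeated for every ancestor, followed by a merge from the second ancestor onward, the command list can be generated rather than edited by hand. The Python sketch below only builds the command strings, using the tool names and options that appear in the pipeline above; the function name `one_dgr_commands` and the numbered names for every merged file (the example names its last merge merged.1dgr.txt) are assumptions.

```python
def one_dgr_commands(target, ancestors, mauve="progressiveMauve"):
    """Return the shell commands for one target genome.

    target: (fasta_stem, short_tag), e.g. ("Diarizonae_AG", "DiarizonaeAG").
    ancestors: list of (fasta_stem, short_tag, label), oldest first,
               e.g. ("S.genus.AG", "GenusAG", "Genus_AG").
    mauve: path to progressiveMauve on your own system.
    """
    stem, tag = target
    commands, previous = [], None
    for i, (a_stem, a_tag, a_label) in enumerate(ancestors):
        base = f"{tag}.vs.{a_tag}"
        current = f"{tag}.{a_tag}.1dgr.txt"
        commands += [
            f"{mauve} --output={base}.xmfa --output-guide-tree={base}.guide_tree "
            f"--backbone-output={base}.backbone {stem}.fasta {a_stem}.fasta",
            f"./progBackbonePrep {base}.backbone >{base}.backbone.txt",
            f"./homBlkReorder {base}.backbone.txt >{base}.homblk.txt",
            f"./orthoParsing {base}.homblk.txt >{base}.orthBlk.txt",
            f"./1dgrExt1 {base}.orthBlk.txt {stem} {a_label} >{current}",
        ]
        if previous is not None:
            # Merge the running result with the labels from this ancestor.
            merged = f"{tag}.merged_{i}.1dgr.txt"
            commands.append(f"perl 1dgrMerge.pl {previous} {current} {stem} >{merged}")
            current = merged
        previous = current
    return commands
```

    The last merged file produced by these commands is the input for the final 1dgrExtF.pl step, which additionally needs the full length of the target genome.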

Codes and Executable Files

The source code, executable files and test data are stored in the codes, bin and test subdirectories, respectively. The executable files are built for Mac systems. For Linux/Win systems, the source code needs to be re-compiled. Bact1DGR is accessible here.

(1) Newly sequenced Salmonella genomes

Subsp.             Strain            Serotype       Assembly level  Size (bp)  Genes
II (Salamae)       RKS2986; SARC4    42:f,g,t:--    Complete        4,861,844  4,756
IIIb (Diarizonae)  RKS2978; SARC7    50:k:z         Complete        5,065,792  5,047
IV (Houtenae)      RKS3027; SARC10   16:z4,z32:--   Complete        4,567,406  4,530
VI (Indica)        RKS3057; SARC14   11:b:e,n,x     Complete        4,726,531  4,738
VII                RKS3013; SARC15   1,40:g,z51:-   Complete        4,467,812  4,448

(2) Core and Pan genomes of Salmonella

Here is the list of Salmonella strains used for core and pan genome analysis.

The core gene families of Salmonella are shown in the document.

The pan-genome families of Salmonella are shown in the document.

(3) Ancient orthologous chromosomes of Salmonella

Click circles to download files.

(4) Transcriptome: raw read counts of Salmonella strains

The raw read counts for Salmonella gene families are shown in the matrix, where the pan-gene accession is used to designate each family.

(5) 3C: raw contact matrix of Salmonella strains (bin size: 5kb)

The reference genomes used for Salmonella 3C data analysis, with coordinates adjusted according to those of S. typhimurium 14028S, can be downloaded here.

The raw Salmonella chromosomal contact matrices with a bin size of 5 kb are available here.