Welcome to hSEGdb - a human SEG database
Housekeeping genes (HKGs) are genes expressed universally across different tissues, under different conditions and at various developmental stages. Typically, HKGs are evolutionarily conserved and abundantly expressed, and they participate in various fundamental biological processes. Some research groups have also recently proposed ‘stably expressed genes’ (SEGs), a set of genes similar to HKGs but defined with an emphasis on quantitatively invariant expression.
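The distinction between the two concepts can be illustrated with a small sketch. It assumes, purely for illustration, that "universally expressed" means every sample reaches a minimum expression value and that "quantitatively invariant" means a low coefficient of variation (CV) across samples. The cutoffs, the function name `classify_gene` and the toy values are hypothetical and are not the criteria used by hSEGdb.

```python
from statistics import mean, pstdev

# Hypothetical cutoffs, chosen only for illustration.
MIN_EXPRESSION = 1.0   # a gene counts as "expressed" in a sample at or above this value
MAX_CV = 0.25          # largest coefficient of variation still treated as "stable"


def classify_gene(values, min_expression=MIN_EXPRESSION, max_cv=MAX_CV):
    """Label one gene from its expression values across samples."""
    # Not expressed in every sample: neither HKG-like nor SEG-like.
    if not values or min(values) < min_expression:
        return "not universally expressed"
    # Coefficient of variation: population standard deviation over the mean.
    cv = pstdev(values) / mean(values)
    return "SEG-like" if cv <= max_cv else "HKG-like only"


# Toy expression matrix (made-up numbers): gene -> values across four samples.
toy_expression = {
    "gene_A": [50.0, 52.0, 49.0, 51.0],   # expressed everywhere, nearly constant
    "gene_B": [5.0, 80.0, 20.0, 300.0],   # expressed everywhere, highly variable
    "gene_C": [0.0, 12.0, 0.0, 40.0],     # absent from some samples
}

labels = {gene: classify_gene(vals) for gene, vals in toy_expression.items()}
for gene, label in labels.items():
    print(gene, label)
```

In this toy setting, a gene can be expressed in every sample and still fail the stability criterion, which is the sense in which SEGs are more narrowly defined than HKGs.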
It is of great significance to explore HKGs and SEGs, including their properties, evolution, functional relevance and potential applications. HKGs identified in different studies, although heterogeneous in gene composition and organism, show distinctive gene-structure and evolutionary characteristics, for instance, shorter introns and exons, lower conservation of the promoter sequence, less potential for nucleosome formation in the 5' region, and enrichment of protein products in some domain families.
Beyond their biological interest, HKGs and especially SEGs also have practical applications. Since the concept of HKGs was proposed, these genes have been used as internal quality controls for individual gene quantification experiments and for large-scale gene-expression profiling data. A number of HKGs are widely used in molecular biology experiments, e.g., glyceraldehyde-3-phosphate dehydrogenase (GAPDH), β-actin, β-tubulin, ubiquitin C, β2-microglobulin (B2M) and 18S rRNA. Because of systematic errors such as PCR amplification bias and differences in probe affinity, gene-expression profiling results obtained at different times, or with different platforms, methods, technologies or laboratories, often contain ‘batch effects’. These ‘batch effects’ are non-biological technical variations and must be removed before integrative analysis. Recently, SEGs have been well explored and applied in the integrative analysis of RNA-seq data. Cellular SEGs could also be used as markers of specific cell types, or applied in cell-type deconvolution of bulk gene-expression profiling data from tumors or other diseased tissues to facilitate translational medicine applications.