CpG islands are discrete clusters of nonmethylated CpG dinucleotides: genomic segments containing large numbers of cytosine nucleotides, each joined by a phosphodiester bond (the "p") to a following guanine nucleotide. CpG islands comprise about 1 to 2% of the mammalian genome (4, 7) and are located near or within approximately 40% of the promoters of mammalian genes.
The formal definition of a CpG island is 'a region with at least 200 bp and with a GC percentage that is greater than 50% and with an observed/expected CpG ratio that is greater than 0.6.'
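
The three criteria in this definition can be checked with a short script. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and it assumes the commonly used convention that the expected number of CpGs in a sequence of length N is (number of C × number of G) / N, so that observed/expected = (number of CpG × N) / (number of C × number of G). The text above does not spell out that formula.

```python
def cpg_island_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap with itself, so str.count gives the exact CpG count.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed convention: expected CpG count = (C count * G count) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def meets_cpg_island_criteria(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three thresholds from the definition quoted above."""
    n, gc_fraction, obs_exp = cpg_island_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    # Toy sequences for illustration only, not real genomic data.
    print(meets_cpg_island_criteria("CG" * 100))             # CpG-dense, 200 bp
    print(meets_cpg_island_criteria("C" * 100 + "G" * 100))  # GC-rich, one CpG
```

The second toy sequence is 100% GC but contains only a single CpG, which shows why the definition requires the observed/expected ratio in addition to GC content. In practice the test is usually applied to windows sliding along a longer sequence rather than to one fixed string.
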
In a CpG island, a cytosine nucleotide is followed on the same strand by a guanine nucleotide, joined to it by a phosphodiester bond. The "p" in the CpG notation distinguishes a cytosine adjacent to a phosphodiester-linked guanine from a cytosine base-paired to a guanine on the complementary strand.
Unlike CpG sites in the coding region of a gene, CpG sites within CpG islands are unmethylated if the genes they promote are expressed. Most CpG islands are associated with genes or with recognition sites for restriction enzymes. The associated genes include all ubiquitously expressed genes (housekeeping genes) plus many genes with a tissue-restricted pattern of expression. Many human and mouse major histocompatibility locus (MHC) genes contain CpG-rich regions (64); among the class II MHC genes, however, only the β-chain genes have them [r].
Promoters are normally located at the upstream edge of the CpG island, such that one or more of the 5′ exons of the gene generally fall within the island region. Although most CpG islands are nonmethylated in all tissues, a small proportion of islands become methylated during development [s]. Of all NotI restriction sites, 82% are found in CpG islands.
Toll-like receptors, a type of innate immune system pattern-recognition receptor, recognize and bind unmethylated CpGs as a subclass of pathogen-associated molecular pattern (PAMP).