2013, 10(1): 199-219. doi: 10.3934/mbe.2013.10.199

Genome characterization through dichotomic classes: An analysis of the whole chromosome 1 of A. thaliana

1. Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy

2. CNR-IMM, UOS di Bologna, Via Gobetti 101, 40129 Bologna, Italy

3. Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna, Italy

Received May 2012; Revised September 2012; Published December 2012

In this article we show how dichotomic classes, binary variables naturally derived from a new mathematical model of the genetic code, can be used to characterize different parts of the genome. In particular, we analyze and compare different parts of the whole chromosome 1 of Arabidopsis thaliana: genes, exons, introns, coding sequences (CDS), intergenes, untranslated regions (UTR) and regulatory sequences. To this end, we encode each sequence in the 3 possible reading frames according to the definitions of the dichotomic classes (parity, Rumer and hidden) and then perform a statistical analysis on the resulting binary sequences. The results show that coding and non-coding sequences have different patterns and proportions of dichotomic classes. This suggests that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Moreover, such patterns appear more pronounced in CDS than in exons. We also derive an independence test to assess whether the observed percentages could be considered an expression of independent random processes. The results confirm that only genes, exons and CDS seem to possess a dependence structure that distinguishes them from i.i.d. sequences. Such informational content is independent of the global proportion of nucleotides of a sequence. The present work confirms that the recent mathematical model of the genetic code is a new paradigm for understanding the management and the organization of genetic information and is an innovative tool for investigating informational aspects of error detection/correction mechanisms acting at the level of DNA replication.
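The frame-wise encoding step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the definitions of the three dichotomic classes, so the sketch assumes a Rumer-type class that is 1 when the first two bases of a codon select one of the eight fourfold-degenerate families of the standard genetic code, and 0 otherwise. The function names (`rumer_bit`, `encode_frames`, `proportion_of_ones`) are hypothetical, and the parity class, the hidden class and the independence test derived in the paper are not reproduced here.

```python
from typing import Callable, Dict, List

# Assumption: first-two-base "roots" of the fourfold-degenerate codon families
# (Thr, Pro, Leu, Arg, Ala, Gly, Val, Ser) in the standard genetic code.
FOURFOLD_ROOTS = {"AC", "CC", "CT", "CG", "GC", "GG", "GT", "TC"}


def rumer_bit(codon: str) -> int:
    """Return 1 if the codon's first two bases are a fourfold-degenerate root."""
    return 1 if codon[:2] in FOURFOLD_ROOTS else 0


def encode_frames(
    seq: str, classify: Callable[[str], int] = rumer_bit
) -> Dict[int, List[int]]:
    """Encode a DNA sequence as one binary sequence per reading frame (0, 1, 2)."""
    seq = seq.upper()
    frames: Dict[int, List[int]] = {}
    for frame in range(3):
        bits = []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            # Skip codons with ambiguous symbols such as N.
            if set(codon) <= set("ACGT"):
                bits.append(classify(codon))
        frames[frame] = bits
    return frames


def proportion_of_ones(bits: List[int]) -> float:
    """Fraction of codons assigned to class 1 (NaN for an empty sequence)."""
    return sum(bits) / len(bits) if bits else float("nan")


if __name__ == "__main__":
    encoded = encode_frames("ATGGCCTTT")
    for frame, bits in encoded.items():
        print(frame, bits, proportion_of_ones(bits))
```

Comparing the per-frame proportions for coding and non-coding regions is the kind of summary the abstract refers to; any other binary codon classifier can be passed through the `classify` argument.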
Citation: Enrico Properzi, Simone Giannerini, Diego Luis Gonzalez, Rodolfo Rosa. Genome characterization through dichotomic classes: An analysis of the whole chromosome 1 of A. thaliana. Mathematical Biosciences & Engineering, 2013, 10 (1) : 199-219. doi: 10.3934/mbe.2013.10.199


