The Problem of Information
Organism | Regulatory Protein | DNA Sequence Recognized
Bacteria | Lac repressor | AATTGTGAGCGGATAACAATT
Yeast | GAL4 | CGGAGGACTGTCCTCCG
Drosophila | Krüppel | AACGGGTTAA
Mammals | GATA-1 | TGATAG
Not every position in a binding site needs to carry a specific base 100% of the time for the binding location to be identified correctly. The ambiguity introduced by such inexactness can be compensated for by lengthening the sequence.
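To make the trade-off concrete, Schneider's Rsequence measure sums, over the positions of a site, the information carried at each position; fully conserved positions contribute 2 bits, ambiguous positions less, so a looser site must be longer to carry the same total. A minimal Python sketch, with frequencies invented purely for illustration:

```python
import math

def information_content(freq_table):
    """Sum over positions of (2 - entropy), in bits.  freq_table is a list of
    dicts mapping base -> frequency at that position (each summing to 1).
    A fully conserved position contributes 2 bits; a position where two
    bases are tolerated equally contributes only 1 bit."""
    total = 0.0
    for freqs in freq_table:
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        total += 2.0 - entropy
    return total

# Invented example: a 6-base site with four strict positions and two loose ones.
site = [
    {"G": 1.0}, {"A": 1.0}, {"A": 1.0}, {"T": 1.0},
    {"T": 0.5, "C": 0.5}, {"C": 0.5, "T": 0.5},
]
print(information_content(site))   # 10.0 bits: the two loose positions cost 2 bits,
                                   # which a longer site could make up elsewhere
```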
Dr. Schneider writes[1] that ‘The ev model quantitatively addresses the question of how life gains information, a valid issue recently raised by creationists (R. Truman, http://www.trueorigin.org/dawkinfo.htm; 08-Jun-1999) but only qualitatively addressed by biologists’.
Mutations of an artificial “protein” were simulated with a computer program[1]. ‘The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome.’ ‘The purpose of this paper is to demonstrate that Rsequence can indeed evolve to match Rfrequency.’
Caution. The reader must be warned that the simulation cannot be mapped to a real biological scenario. ‘A small population (n=64) of “organisms” was created, each of which consisted of G = 256 bases of nucleotide sequence chosen randomly, with equal probabilities, from an alphabet of four characters (a, c, g, t).’
What might these 64 living and reproducing organisms, with a total and unchangeable genome 1/4 the size of one typical gene, be? Careful examination of the characteristics assumed in the simulation and its references demonstrates that these cannot be single-celled or multicellular life forms, nor viruses, nor any known organism. This prevents any kind of model validation.
We read later on, ‘Given that gene duplication is common and that transcription and translation are part of the housekeeping functions of all cells, the program simulates the process of evolution of new binding sites from scratch.’
Let's give this a little thought.
Of course, no attempt was made to show where these minuscule organisms with full transcription and translation machinery came from, nor does the simulation address the production of new genes in any manner. Let us play with the thought experiment anyway.
If such an ancestor, with a genome even smaller than the current 256 bases, were to duplicate a “gene”, it would waste energy and available material producing unnecessary extra protein during its lifetime and while duplicating its genome. Replication would take longer than for its competitors and would carry a greater risk of failure. Even the presently unnecessary DNA ballast needed for evolutionary trial and error to produce merely a novel binding site represents a significant reproductive disadvantage. This worthless material would represent several percent of the 256 bases assumed for the genome, a very considerable handicap.
It is known[145][146] that small genomes in particular can shed chunks of unneeded DNA rapidly, since the streamlined members out-reproduce their competition.
Since sexual reproduction is not intended here, let us use the known mathematics of reproduction by budding or binary fission[144]:
Suppose that only 1 of the 64 organisms eliminated, or did not originally have, a significant portion of junk not needed at the time, and that the remaining 63 continued trying to evolve a new binding site. Suppose that, with an average generation time of 10 minutes, the streamlined member and its lineage now reproduce 10 seconds faster. The advantages of needing less energy and fewer nutrients, of less risk of interference with ongoing cellular processes, etc., we approximate with a selectivity factor s = 0.0167 (based on a generation time shortened by 10 out of 600 seconds). Since x0 = 1/64 = 0.0156, we obtain from the formula above:
After only 500 generations we do not have 64 members with superfluous DNA available to evolve a new binding site, but only 1 survivor!
After 680 generations her chances look dim: > 99.9% of the population no longer has the necessary DNA material for evolutionary experiments.
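These figures can be reproduced with the standard haploid selection recursion (a sketch under that assumption; the formula actually used in [144] may differ in detail):

```python
def junk_carrier_fraction(s, x0, generations):
    """Fraction of the population still carrying the superfluous DNA after
    repeated rounds of selection, assuming the usual haploid recursion
    x' = x(1+s) / (x(1+s) + (1-x)) for the streamlined (advantaged) type."""
    x = x0                          # initial fraction of streamlined organisms
    for _ in range(generations):
        x = x * (1 + s) / (x * (1 + s) + (1 - x))
    return 1.0 - x                  # remaining carriers of the extra DNA

s, x0, pop = 10 / 600, 1 / 64, 64
for t in (500, 680):
    frac = junk_carrier_fraction(s, x0, t)
    print(t, round(frac * pop, 1), f"{frac:.2%}")
# about 1 carrier left after 500 generations; under 0.1% of the population after 680
```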
Obviously, an unnecessary gene duplication to provide DNA to experiment on would introduce proportionally far more worthless DNA than needed for a binding site. Those not suffering such a fate would be at a yet greater reproductive advantage.
This is a critical oversight in the simulation(1) which invalidates the whole exercise.
Alternatively, a more complex, free-living organism might tolerate an unnecessary gene better, but such creatures could not possibly survive the mutation rate assumed[1], 1 base per 256 throughout the whole genome, every generation.
For many organisms it appears that about 30% of the predicted proteins are unrelated to others in their own proteomes or those of other organisms[4], and gene duplication is a rare phenomenon commonly identified with various destructive disorders(1).
Several flaws in the simulation disallow the conclusions claimed. We read, ‘Then we need to apply random mutations and selection for finding the sites and against finding non-sites. Given these conditions, the simulation will match the biology at every point.’ [emphasis added]. This claim will be shown to be incorrect. Objections #1 - #6 document that biologically unrealistic parameter values are assumed by the computer program, which render any claims that binding sites could develop by chance invalid. We next establish that the model does not simulate random evolutionary processes (objections #7 - #26 ) in any biologically reasonable manner.
Biologically unrealistic parameter values are assumed.
Objection #1: The mutation rate is unrealistically high. ‘At every generation, each organism is subjected to one random point mutation in which the original base is obtained one-quarter of the time. For comparison, HIV-1 reverse transcriptase makes about one error every 2000-5000 bases incorporated, only 10-fold lower than this simulation.’
This is a remarkable statement in light of what the authors referenced[5] actually wrote: ‘Our finding, that a limited number of mutations in the HIV genome after exposure to 5-OH-dC has a disproportionately large effect on viral lethality, substantiates the concept that the mutation frequency of HIV is close to the error threshold for the viability of the quasispecies.’[6] [emphasis added]. References supporting this view were supplied[7] [8]. Indeed, ‘most HIV virions in the blood appear to be nonviable.’[9] The virus can only exist due to the huge number of HIV-1 copies produced in an infected individual, about 10^10 virions per day[10] [11], and hardly 64 members as in the simulation!
Since ‘transcription and translation are part of the housekeeping function of all cells...’[1] and the 64 organisms supposedly survive autonomously, it becomes increasingly mysterious what these creatures with such a minuscule genome and unheard-of mutation rates could possibly be.
There are reasons why these self-destructive mutation rates, which would rapidly accumulate, don’t occur in the biologically relevant double-stranded DNA: ‘In particular, E. coli DNA methyltransferase, formamidopyrimidine-DNA glycosidase, and endonuclease III fail to repair efficiently altered substrates when present in the DNA strand of an RNA-DNA hybrid.’[5]
When DNA is replicated, copying errors occur at about one per 10^8 to 10^9 nucleotide sites(8). Since in the article[1] it is claimed a billion years would be sufficient for humans to evolve (presumably from some eukaryote-like, non-parasitic organism), we need to postulate that a proto-yeast-like organism is being alluded to. Let us see where this takes us. For the 13,478 kb yeast (S. cerevisiae) genome[12] comprising about 6217 ORFs (Open Reading Frames), ca. 40% of the ORFs (i.e., 2497[13] at the 1 × 10^-10 p-level) have presumed orthologues in the simplest multicellular organism known[13] [14], and such proteins appear to be critical for survival. Extrapolating backwards, the evolutionary common ancestor would then have had a genome of at least 5.4 megabases with perhaps 2400 genes critical for survival, i.e., having virtually no room for error.
The mutation rate of 1/256 used by the simulation indicates that, proportionally, 21,094 random mutations per proto-yeast member on average would have occurred each generation! Many genes would be hit by 10 or more errors every generation (and the errors would multiply during somatic cell replacement in multicellular life forms during the following billion years). Error catastrophe would be inevitable.
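As a quick arithmetic check of the figures just cited (the 5.4 Mb genome and 2400 critical genes are the extrapolations from the preceding paragraph, not measured values):

```python
genome_bp = 5.4e6       # extrapolated proto-yeast genome size, from the paragraph above
sim_rate  = 1 / 256     # the simulation's rate: one mutation per 256 bases per generation
genes     = 2400        # survival-critical genes assumed above

mutations = genome_bp * sim_rate
print(round(mutations))              # ~21,094 mutations per organism per generation
print(round(mutations / genes, 1))   # ~8.8 hits per critical gene on average
```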
In the simulation all of these mutations are dedicated to a single goal. This implies, proportionally for the proto-yeast, that 21,094 mutations are dedicated to fine-tuning one specific binding site every generation in every member, and all the individual point mutations are assumed to be flawlessly recognized by natural selection. Were this even remotely true one could easily dispense with simulations and offer empirical evidence.
A spore-forming bacterium from the Permian Salado Formation, considered within the evolutionary dating framework to be 250 million years old, was reported recently to have a complete 16S rDNA sequence of 99% similarity with the current Bacillus marismortui.[15] This would indicate a base-pair substitution rate < 10^-10 per site per year, incongruent with the rate chosen by the simulation.
Objection #2: The proportion of selectively useful single point mutations assumed is unrealistically high. The computer program used a twos complement “points” scheme, assigned to each of the 4 nucleotides for each possible position within the receptor sequence of length L = 6 (see Table 2). Statistically, a point mutation on a random genome by these arbitrary rules (at either the DNA binding location or the protein represented by the weight matrix) would have the same chance of being positive or negative, of increasing or decreasing the viability of a genome. The biological statement would be (depending on the tolerance used) that about 50% of all point mutations would generate an improved new binding relationship in the very first generation, starting from a totally random genome, followed by diminishing returns thereafter (the absurdity of this implied assumption should be apparent). The offspring then get flawlessly selected.
Recalling that ‘Generation of the weight matrix integers from the nucleotide sequence gene corresponds to translation and protein folding in natural systems’[1], it is unrealistic to assume half of all possible point mutations on a random genome would automatically allow a “better”, exactly L=6 bases long sequence to be identified: ‘Most single-base changes in promoters and ribosome binding sites decrease synthesis by 2- to 20-fold’ (Mulligan et al., 1984; Stormo, 1986).[16] Random sequences, very far removed from a functional one, would continue generating non-functional sequences via random mutations >>99.9999...% of the time, and evolution cannot look ahead to select a suitable candidate.
Cell regulator activities must occur at the correct location, and the simulation badly underestimates the effects mutations have. Dr. Schneider pointed out correctly elsewhere, ‘With this theorem in hand we can begin to understand why, under optimal conditions, the restriction enzyme EcoRI cuts only at the DNA sequence 5' GAATTC 3’ even though there are 4096 alternative sequences of the same length in random DNA. A general explanation of this and many other feats of precision has eluded molecular biologists.’[17] [emphasis added].
Recall that a nucleic acid binding site is supposed to be evolving via random mutations, as well as the recognizer protein. This must result in very precise three dimensional interactions which involve H-bonding, hydrophobic, and other stabilizing interactions. For gene regulatory purposes, at least one more domain must be present in the protein, capable of interacting with the transcription machinery[18]. Finally, the rest of the protein must ensure all parts fit together geometrically by folding properly. These requirements must be met concurrently to a very high level of precision before any kind of Darwinian selection can be invoked, inconsistent with the computer program which assumes instant “improvement” starting from a random genome.
In the discussion we shall see that a realistic estimate for the proportion of minimally functional to totally non-functional proteins is very small, on the order of 10^-44. The proportion of acceptable gene sequences coding proteins can only be even lower. The simulation would have to stumble on an acceptable sequence of such unlikelihood, beginning with a random state, over countless generations, before any kind of selection could enter into play. The approximately 50-50 chance assumed by the computer program is unjustifiable and gets evolutionary progress off to a roaring start precisely at the point where all evolutionary conceptual models have the greatest difficulty.
Of the vast number of possible folded geometries, a minuscule subset, on the order of 10^-44, would even have a properly folded topology[19] [20] [21] [22] [23] within which the recognizer site would have to be developed. Even should half of the 4 bases (A, C, G or T) be acceptable at every position of the mini-gene (to code for a stable folded protein), one expects for the 64-member population in the simulation a probability of roughly
64 × (0.5)^125 ≈ 1.5 × 10^-36 (1)
per generation of obtaining the first candidate mini-protein (which is coded for by only 125 base pairs according to the paper[1]) upon which natural selection would have a chance to start working. Even assuming a generation time of 1 second, in 10 billion years (< 3.2 × 10^17 seconds) we’d have essentially zero chance of even getting started.
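Equation (1) and the time available multiply out as follows (a sketch; it simply restates the numbers above):

```python
p_per_generation = 64 * 0.5 ** 125            # equation (1): ~1.5e-36
seconds_in_10_Gyr = 10e9 * 365.25 * 24 * 3600 # < 3.2e17 one-second generations
print(f"{p_per_generation:.1e}")
print(f"{p_per_generation * seconds_in_10_Gyr:.1e}")   # ~5e-19 cumulative chance, effectively zero
```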
This objection alone renders the whole exercise meaningless.
Objection #3: Countless point mutations are assumed to instantly provide reliable binding interactions. Unlike the fictitious positive and negative integers used in the simulation, in earlier papers the weight matrix was derived using real data on functional sites(10) [143]. Known binding sites were selected from GenBank, lined up, and the proportion of each of the 4 bases found at each position of a sequence was determined (see Appendix).
Binding of a protein to DNA or RNA is rarely the simple matter implied by the computer program, but generally requires cooperation with other carefully crafted proteins(11). For example, transcription in eukaryotes is regulated by a group of gene-specific activator and repressor proteins[24] at specific binding sites. Simulating the production of one recognizer member of such ensembles by random point mutations has not been justified nor validated as being biologically conceivable. Instead, an arbitrary proportion of positive and negative integers in the computer program defined how to converge towards a short-term goal flawlessly, irrespective of any biological selective significance or stochastic effects.
How is chance to know a random mutation would lead towards developing a binding interaction? ‘Rsequence does not tell us anything about the physical mechanism a recognizer uses to contact the nucleic acid.’[25]
Lacking any intelligence to choose, three-dimensional shapes on the regulatory protein must be generated to permit exact binding with a specific DNA sequence, like a well-meshed machine. That is why a methionine-carrying tRNA is able to identify a very short sequence on mRNA, AUG, and position a physically large mRNA properly at the ribosome complex: it is due to the specialized geometry prepared at the ribosome’s P site. There is nothing biologically remarkable about AUG alone. Crystallographic, molecular modelling and cryo-electron microscopy studies have shed insight into how such feats are possible. Translating an mRNA strand one codon at a time requires the whole ribosome complex to act in a synchronized fashion, aptly described as a ratchet-like mechanism[26]. The cell’s survival depends on ribosomes being able to locate the binding sites correctly[27](12).
Exactly how polypeptides are supposed to be able to identify that a location is or will become a useful binding site is deemed irrelevant: ‘As mentioned above, the exact form of the recognition mechanism is immaterial because of the generality of information theory.’[1] Quite the contrary: for a realistic evolutionary simulation such physical details are critically relevant, and this is a fatal oversight in the simulation. It is assumed that random point mutations provide half the 64-member population with a 100% effective survival advantage, based on fine-tuning of a single type of binding site under development. This is geometrically and thermodynamically unrealistic. Developing such precise binding interactions, one random mutation at a time, has nothing to do with the mathematics of information theory and needs to be quantitatively simulated based on physical realities. Any assumption of recognizable Darwinian selectivity for the intermediate stages needs to be quantitatively justified.
The requirements on recognizer and binding site are generally very stringent and must be met nearly perfectly to be of any use whatsoever(13).
Objection #4: The rate of selection is unrealistically high. A standard textbook on cell biology reports[28] the average times evolutionists assume are needed for one acceptable amino acid change per 100 residues in specific proteins. The fastest rate reported within the evolutionary model required 0.7 million years for fibrinopeptide, and the slowest was for histone H4, with 500 million years. In another place we read, ‘only about one nucleotide pair in a thousand is randomly changed every 200,000 years.’[29]
This is incompatible with Dr. Schneider’s claim that his simulation ‘is within the range of natural population change.’ The computer program required only 704 generations to create a new binding site type at exactly 16 positions on a genome, with its novel recognizer protein, from scratch. After adding up all the “points” at each possible binding site using the current weight matrix (and a cutoff score of -58), the 32 members scoring lowest (selectivity s = 1!) magnanimously discontinue their ancestors’ eons of hard evolutionary work. The half having the less desirable status, due to a single point mutation, gets pinpointed every generation, 704 times in a row, without exterminating the future of higher life forms.
Since real-world selection coefficients (based on major mutations and not mere single point mutations) are proposed to be on the order of 0.01[30] or less, one would expect some justification for the 100-fold greater rate chosen. Simpson felt s = 0.001 may be too low, but 0.01 could be taken as a “frequent value” (i.e., might occur now and then).[31] Artificial laboratory settings or antibiotic resistance in hospital settings (with necessarily much larger population sizes to avoid killing the population off) are not representative of a natural setting relevant to an evolutionary scenario.
A small and non-growing population of 64 members was chosen for the simulation. Fisher’s analysis showed that a selection coefficient even as great as s = 0.1 would have only a 2% chance of becoming fixed in a population of 10,000 or more.[32] Lacking is any justification for how the population would remain limited to 64 members for at least 704 generations. I demonstrated above that instead of having 64 members with superfluous DNA to tinker with, long before 704 generations we’d have none at all. Presumably these organisms are subjected to catastrophic environmental conditions to justify the maniacal selection coefficient implied, yet the simulation disallows the possibility of failure, that a generation might not pass on viable progeny(7).
Objection #5: Degeneracy of the genetic code, sexual dilution, and other factors are ignored. The degeneracy of the genetic code has been neglected. A protein is being represented by the weight matrix, and an approximation (Table 3) suggests that on average roughly 24% of all point mutations would generate the same polypeptide starting from a random DNA sequence (this assumes for estimation purposes that mutational transitions and transversions are all statistically equivalent). With about 1/4 of mutations producing the same amino acid, the credibility of the scoring assumptions is further strained.
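The ~24% figure can be cross-checked against the standard genetic code by counting, over all 64 codons, how many of the 9 possible single-base substitutions leave the encoded amino acid unchanged (a sketch; it weights all substitutions equally and counts stop-to-stop changes as synonymous):

```python
from itertools import product

bases = "TCAG"
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = dict(zip(("".join(c) for c in product(bases, repeat=3)), amino_acids))

synonymous = total = 0
for codon, aa in codon_table.items():
    for pos in range(3):
        for b in bases:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            total += 1
            if codon_table[mutant] == aa:
                synonymous += 1
print(f"{synonymous}/{total} = {synonymous/total:.1%}")   # 138/576 = 24.0%
```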
Given the close correlation between the number of synonymous codons and the proportion of the corresponding amino acid present, the genetic code may have been designed partially to help retain protein functionality by protecting against point mutations(9).
Recessive mutations and the dilution of point mutations by sexual reproduction are not considered, although it is claimed the rate of increase in Shannon information content can be quantitatively extrapolated to explain the origin of the human genome[1].
Objection #6: The final state is not stable. Having somehow achieved the miraculous, how long might these organisms manage to stay balanced on Mount Improbable? ‘When selective pressure is removed, the observed pattern atrophies (not shown, but Fig. 1 shows the organism with the fewest mistakes at generation 2000, after atrophy) and the information content drops back to zero (Fig. 2b). The information decays with a half-life of 61 generations.’[1]
This confession closes the case decisively. Removal of what amounts to an intelligently driven selection allows the genomes to randomize rapidly. No naturally stable increase in Shannon information has been demonstrated.
This outcome is to be expected, given the catastrophically rapid mutation rates assumed together with the guarantee that the population will not perish. The highly contrived mathematical characteristics describing this small population bear no resemblance to the multiple survival challenges real organisms face in nature.
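If the quoted 61-generation half-life is taken at face value, the implied decay is exponential, Rsequence(t) ≈ Rfinal × 2^(-t/61) (my reading; the paper reports only the half-life), so within roughly 200 generations of relaxed selection less than about 10% of the evolved information would remain.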
The model does not simulate random evolutionary processes.
Objection #7: Foreknowledge of the sequence length, L, is provided to the computer program. Evolution somehow knows that one, and only one, binding site type, of length exactly 6, is to be developed. In Table 1 we summarize some examples of binding sites with L ranging from 4 to 51 bases. Some recognizers, such as the H-NS protein, can interact with binding sites of various lengths, and are affected by the protein’s concentration.
In addition, binding sites need not be contiguous, there may be spacers between conserved portions of the same binding site(2). A legitimate simulation needs to consider competing sequences of at least L= 4 to ca. 51 concurrently with no foreknowledge as to an intended outcome: survival and increased reproduction rate can have multiple causes and cannot be simply attributed to the change we wish to favor. If a randomizing process can go in every direction at once, so be it.
Objection #8: Foreknowledge of the required number of binding sites, γ, is assumed. Although the author claims chance can generate binding sites “from scratch”, finding the necessary number of a new kind of site through biological trial and error, in the face of multiple survival challenges, has not been simulated: this number was conveniently provided to the computer program(3).
Current physiology implies something already functional, which can hardly be a random starting point. A multitude of unrelated types of binding interactions exist to regulate genetic control elements and evolution cannot know in advance the necessary number of even one of them. We read,
‘The bacterium Escherichia coli has approximately 2600 genes, each of which starts with a ribosome binding site. These have to be located from about 4.7 million bases of RNA which the cell can produce. So the problem is to locate 2600 things from a set of 4.7 × 10^6 possibilities, and not make any mistakes. How many choices must be made? The solution to this question, log2(4.7 × 10^6 / 2600) bits, is “obvious” to those of us versed in information theory...’ [= Rfrequency = 10.8 bits][27]
Beginning with random base and recognizer sequences, the binding interaction is to be optimized and converge on the needed number of bits of Shannon information to uniquely identify, but not unduly overspecify, an ensemble of addresses or locations on the genome. However, at any point in time the computer program already “knows” what Rfrequency value is biologically needed, and no attempt was made to simulate a trial and error process of finding this value over many generations. Evolution has been provided with foreknowledge.
Random sequences on DNA will not interact reliably with random polypeptides in any biologically sensible manner: the selection being provided is strictly a mathematical artifact; nothing real is being simulated. Random mutations cannot know in advance that an Rsequence of 4 and not 2.8 or 20.3 bits needs to be converged on to permit γ = 16 binding sites to be located reliably.
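For reference, the quoted Rfrequency formula can be evaluated directly, both for the E. coli example and for the simulation's own parameters (G = 256, γ = 16, as given in the paper); a minimal Python sketch:

```python
import math

def r_frequency(genome_size, n_sites):
    """Rfrequency = log2(genome_size / n_sites): bits needed to single out
    n_sites positions among genome_size possibilities (Schneider's definition)."""
    return math.log2(genome_size / n_sites)

print(round(r_frequency(4.7e6, 2600), 1))   # ~10.8 bits, the E. coli ribosome-site example
print(round(r_frequency(256, 16), 1))       # 4.0 bits for the simulated genome (G = 256, gamma = 16)
```

It is this 4-bit target that the program effectively has in hand from the outset.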
Objection #9: Binding sites must be correctly located with respect to the genetic element being regulated. Real binding addresses must be judiciously placed at suitable locations on DNA to permit specific cellular processes to be regulated. This is very different from merely having the correct number, γ, of binding sites. The correct sequence at the wrong place can cause havoc, and the trial and error process to get these placed correctly was not simulated: ‘Level 1 theory explains the amazingly precise actions taken by these molecules. For example, the restriction enzyme EcoRI scans across double helical DNA (the genetic material) and cuts almost exclusively at the pattern 5' GAATTC 3', while avoiding the 4^6 - 1 = 4095 other 6 base pair long sequences. How EcoRI is able to do this has been somewhat of a mystery because conventional chemical explanations have failed.’[27]
A trial and error process would have destroyed countless organisms and removed such destructive evolving machinery long before stumbling on a working scheme(4),(4b). Obtaining a suitable number of recognizers to find a matching binding site is a necessary but insufficient cellular requirement. We read, however,
‘... as a parameter for this simulation we chose γ = 16 and the program arbitrarily chose the site locations.’[1]
Even a perfectly functional binding site cannot simply be placed anywhere to regulate a specific gene! (We’ll ignore the matter of where these genes being regulated came from in the first place, and whether they would function without the regulatory elements still to be evolved.) However, the twos complement scoring scheme[1] permits the computer program to pick binding locations arbitrarily, and calculates “points” based on how well the evolving sequences and regulating element match up. It is simply assumed the non-binding portion of the regulatory protein will now be able to function, being neither too close nor far from the gene being regulated. No trials and errors are simulated to get a proper binding sequence located at an acceptable distance.
Objection #10: Selection is intelligently driven. Careful reading reveals not a simulation but a designed convergence algorithm. The two matrices of numbers, plus a tolerance score, define goals which can change slightly across generations. The immediate goals are instantly known and flawlessly acted upon by the computer program, with no consideration of survivability uncertainties. By retaining the 32 highest of the 64 scores every generation, the process is being intelligently guided. The two matrices converge far more quickly than random changes are allowed to separate them. The rules established are:
‘A section of the genome is set aside by the program to encode the gene for a sequence recognizing “protein”, represented by a weight matrix consisting of a two-dimensional array of 4 by L = 6 integers. These integers are stored in the genome in twos complement notation, which allows for both negative and positive values... By encoding A = 00, C = 01, G = 10 and T = 11 in a space of 5 bases, integers from -512 to +511 are stored in the genome... Each base of the sequence selects the corresponding weight from the matrix and these weights are summed. If the sum is larger than a tolerance, also encoded in the genome, the sequence is “recognized” and this corresponds to a protein binding to DNA.’[1]
I worked out the scoring matrix according to the rules[1] used by the simulation (Table 2). Should this not be correct I hope for clarification. A binding site of L = 6 could score between 6 × (-512) = -3072 and 6 × 511 = +3066 “points”. For γ = 16 sites, the range for a genome falls between -49152 and +49056. The 32 high scorers always kill off exactly the 32 low scorers with enviable military precision and no collateral damage. A one-point difference can flawlessly make the difference between life and death; no stochastic effects are allowed. Incredible as this level of selection appears to be, ‘To preserve diversity, no replacement takes place if they are equal.’[1] Evolution has been granted skills beyond even Maxwell's Demon: a correct choice is made between entities which are quantitatively indistinguishable! The effects on the simulation of this innocent-appearing decision have not been discussed, but the Pascal source code found on the web site[2] does shed some light: ‘SPECIAL RULE: if the bugs have the same number of mistakes, reproduction (by replacement) does not take place. This ensures that the quicksort algorithm does not affect who takes over the population. Without this, the population quickly is taken over and evolution is extremely slow!'[2] [emphasis added].
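To make the quoted rules concrete, here is a minimal Python sketch of how I read them; the actual ev Pascal code may order the weight matrix or the base-4 digits differently, so this is strictly an illustration rather than the program itself:

```python
import random

DIGIT = {"a": 0, "c": 1, "g": 2, "t": 3}

def decode_weight(five_bases):
    """Read 5 bases as a base-4 number (a=0, c=1, g=2, t=3) and interpret it in
    twos complement, giving an integer in -512..+511.  Treating the first base
    as most significant is my assumption; the paper does not specify."""
    raw = 0
    for b in five_bases:
        raw = raw * 4 + DIGIT[b]
    return raw - 1024 if raw >= 512 else raw

def build_matrix(gene, L=6):
    """Split a 4*L*5-base 'recognizer gene' into a 4 x L weight matrix
    (rows a, c, g, t); the paper's actual row/column layout may differ."""
    weights = [decode_weight(gene[i:i + 5]) for i in range(0, 4 * L * 5, 5)]
    return [weights[r * L:(r + 1) * L] for r in range(4)]

def score_site(matrix, site):
    """Each base of a candidate site selects the corresponding weight; the sum
    is compared against a genome-encoded tolerance to decide 'binding'."""
    return sum(matrix[DIGIT[b]][pos] for pos, b in enumerate(site))

random.seed(0)
gene = "".join(random.choice("acgt") for _ in range(4 * 6 * 5))
# 120 bases for the matrix; the paper's 125-base gene presumably adds 5 for the tolerance
print(score_site(build_matrix(gene), "gaattc"))   # some integer between -3072 and +3066
```

A single point mutation in such a gene changes one base-4 digit of one weight (or of the tolerance); with weights scattered symmetrically around zero, roughly half of those changes raise a given site's score and half lower it, which is the ~50/50 "improvement" assumption criticized in Objection #2.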
Identifying the most suitable sequences to serve as binding sites is physiologically not so straightforward, and the biologically optimal binding sites are not always the strongest physically(5). Indeed, nature presents us with many examples of biologically sensible solutions which are unexpected if derived under natural, unguided conditions. For example, Weindel has pointed out[33] that under presumed primordial-soup (Ursuppe) conditions (the formose reaction) many sugars are generated whose nucleotides form stronger base-pairing (A::T; G:::C) than occurs with D(+)-ribose (as determined by melting-point studies in his laboratory). Although thermodynamically preferred when compared to existing RNA-DNA and DNA-DNA interactions, these chemical options are biologically unsuitable since the strands would not be separable as required by cells. Such observations cast severe doubt on the claim that enough time and the right conditions suffice to explain the origin of life. An evolutionary process cannot plan for the future and choose to ignore known chemical kinetics and thermodynamics.
Objection #11: No provision is made for the proportionally greater destructive possibilities. In an earlier paper we see how sensitive binding sites can actually be towards a single point mutation: ‘For example, the E. coli genome should contain about 1000 EcoRI restriction enzyme sites (G-A-A-T-T-C), but that same genome should also contain about 18,000 sequences one nucleotide removed from an EcoRI site. Site recognition by and action of EcoRI within E. coli must include enough discrimination against the more abundant similar sites to avoid a fragmented genome.’[16] Not only is the proportion of useful point mutations unrealistically modelled[1], but the proportion of almost-correct (but deadly) to acceptable sequences is very large, and this has not been accounted for in any manner in the simulation.
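The quoted proportions follow directly from counting in random sequence (a rough expected-value check, assuming equal base frequencies):

```python
genome = 4.7e6                    # approximate E. coli genome size used in the quote
exact  = genome / 4**6            # expected occurrences of the exact 6-mer GAATTC
near   = 18 * exact               # 6 positions x 3 alternative bases = 18 one-mismatch variants
print(round(exact), round(near))  # ~1150 and ~20700, the same ballpark as the quoted ~1000 and ~18,000
```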
Other examples include the 6-base TATA box for which a single base mutation drastically damages transcription by RNA polymerase II[34].
Objection #12: The simulation assumes all organisms in that population face one and the same goal which is to be optimized. ‘The organisms are subjected to rounds of selection and mutation. First, the number of mistakes made by each organism in the population is determined. Then the half of the population making the least mistakes is allowed to replicate by having their genomes replace (“kill”) the ones making more mistakes.’
The nature of this highly focused selection, which could drive fine-tuning of a single kind of binding process, was not discussed. There are many possible reasons for an organism to die without producing offspring, given that each organism faces a variety of challenges. A mutational load of 1 in 256 bases of the genome could affect any of many cell processes in the real world and will hardly allow optimization for one and the same goal every generation, driving fine-tuning of one kind of binding site one base pair at a time.
Should all evolutionary selection be focused on one goal, deleterious mutations elsewhere would not be eliminated. The net effect would certainly not be an overall net decrease in gene sequence randomness.
Furthermore, overlapping binding regions serving unrelated functions by different proteins could not be selectively identified and fine-tuned using this biologically over-simplified single-goal scenario(6). A particular base mutation at one binding site may facilitate recognition by one recognizer, but be selected against since another one has become less effective.
It has been suggested that the existence of higher sequence conservation than needed to locate the same type of binding sites implies that other recognizers also interact at those locations. An Intelligent Design possibility could equally be entertained: extra robustness had been built in, and randomization has not yet proceeded long enough to remove the excess Shannon-type information.
Chauvin has pointed out[35] that fly resistance to a single substance can be developed, but cannot occur if the population faces 5 toxic products simultaneously. He believes such claims of co-evolution are merely laboratory artifacts. That countless random mutations could be guided by differential reproduction to produce structural novelty is highly improbable. Careful thought could begin with examples where no plausible selective advantage can be offered for either the intermediate steps or the final result, such as the fact that some edible varieties of butterfly mimic perfectly the appearance of another species which is itself perfectly edible[36].
Objection #13: Binding sites generally require many novel biomolecules to function. In a recent article[37] the atomic structure of the large ribosomal subunit of Haloarcula marismortui was reported. 3045 nucleotides plus 31 proteins are involved. Ribosomes can be inactivated by cleaving of a single covalent bond in the SRL (sarcin-ricin loop) of the 23S rRNA component. As the authors point out, ‘[The] ribosome assembly must be accompanied by a large loss of conformational entropy’ and ‘Of the 2923 nucleotides in 23S rRNA 1157 make at least van der Waals contact with protein.... to immobilize the structures of these molecules.’ Only now can 3 recognizers, which participate directly in the protein synthesis, perform properly at the intended binding sites: ‘Rather than being included in the ribosome to ensure that the RNA adopts the proper conformation, it seems more appropriate to view the RNA as being structured to ensure the correct placement of these proteins.’ Precisely as expected if Intelligently Designed.
One cannot first create one protein and then start developing the other components afterwards(14) while hoping the first remains intact over time. As an example, UBF activates transcription by relieving repression caused by an inhibitory factor which competes for binding of TIF-IB to the rDNA promoter. This is not left to chance: UBF can be interfered with by pRb, which seems to act as a signal linking the cell cycle with multiple components of the transcriptional machinery[38]. As a rule, several proteins interact, and these must be able to penetrate the nuclear membrane where gene expression can be regulated.
The simulation makes no attempt to see whether chance could attain a minimum level of functionality to enhance viability and upon which selection could then begin to work.
Objection #14: Structural features of DNA may serve as relevant or incorrect binding sites. Where the binding site is located is biologically critical, and there are various possibilities(15). A legitimate simulation needs to mimic the trials and errors needed to identify a binding address and all the attempts to create a useful cellular outcome under the control of such binding interactions.
The same or similar binding sequences on different portions of the genome can produce very different or even contradictory effects, no weight matrix exists a priori to guide evolution towards a parsimonious, multi-goal state.
Objection #15: The same binding location can be used by different proteins to regulate important processes. How often one point mutation at a single binding site would really lead to a selective advantage when this affects the address multiple proteins use has not been simulated. Sharing or competing at the same or overlapping sites is well known(16).
Objection #16: The same proteins can affect many unrelated genes concurrently. For example, mutations in E. coli hns alter the expression of many genes with unrelated functions[39](17). The same binding protein can interact in different regulatory complexes at the same binding site: mycN/max heterodimers probably activate and max/max homodimers repress transcription of, as yet, unidentified target genes upon binding to the DNA sequence CACGTG.[40]
Objection #17: The same protein may contain multiple recognizer sites which can be used for unrelated binding purposes. Should a protein already have been fine-tuned for a specific function, adding post facto another recognizer site without interfering with the geometry, folding order and so on of the previous function would require a multitude of random trials. Alternatively, building multiple recognizer sites concurrently creates formidable constraints. The existence of multiple sites is well-established, such as the A/B pocket and the C-terminal domain of pRb[38].
Objection #18: The same protein can be a transcriptional activator and repressor depending on the gene it acts on. Examples abound of proteins accelerating transcription of one gene and slowing down that of another(18). Notice how the program trivializes such realities. The reader is invited to give careful thought as to how many random attempts might be needed until chance mutations were to stumble, through only viable intermediate regulator protein structures, on solutions compatible with the contradictory cell requirements.
Objection #19: Regulatory proteins need to be transferred to the correct location in the cell. Regulatory activities can occur in different organelles, and biochemical activities in various portions of a cell can determine whether a protein will penetrate the nucleus and then locate the intended binding site(19). How this requirement, which many binding sites must fulfill before they can function, could be developed by trial and error point mutations is missing in the simulation.
Objection #20: Regulation pathways by binding proteins can involve multiple proteins. Representative examples include Xvent-1 (a homeobox gene which interacts with goosecoid in a cross-regulatory loop, each suppressing the other's expression)[41] and the E1A proteins (transformation and transactivation are mediated through binding to pRb, p107 and p130, and the TATA box binding protein TBP)[42].
Furthermore, regulatory proteins often form symmetric dimers (two identical proteins) or asymmetric ones[18], with each member binding to different regions on DNA (see Figure 1). Notice that symmetric schemes now require duplicate sequences on DNA, both at the correct location, which would have to develop by random mutations while providing biological functionality during the whole process.
Objection #21: Regulation at binding sites may require fine-tuned interaction with other chemical processes. One may consider SL1, which is inactivated by cdc2/cyclin B-directed phosphorylation, and reactivated by dephosphorylation. This allows SL1 to work as a switch to prevent pre-initiation complex formation and to shut down rDNA transcription at mitosis[43]. Requirements such as these illustrate the large number of neglected trials needed before a binding interaction can fulfill a minimum functionality.
Cations are also used as part of regulatory signals. ‘The restriction enzyme EcoRI is a protein which cuts duplex DNA between G and A in the sequence 5' GAATTC 3'. In the absence of magnesium, binding is still specific but cutting does not occur.’[17]
Since Dr. Schneider's simulation uses a binding length of L=6, we can consider a well-known process of this length which relies on selective methylation. ‘In vivo cellular DNA is protected from EcoRI by the actions of another enzyme called the modification methylase. This enzyme attaches a methyl group to the second A in the sequence GAATTC, so that EcoRI can no longer cut the sequence. In contrast, invading foreign DNAs are liable to be destroyed because they are unmethylated. The methylase is precise, attaching the methyl only to GAATTC and not to any of the sequences, such as CAATTC, that differ by only one base from GAATTC... How a single molecule of EcoRI can achieve this extraordinary precision has not been understood.’[17]
'For example, if the restriction enzyme EcoRI did not reliably and repeatably recognize one pattern, GAATTC, the bacterium might die by the destruction of its own genetic material. Likewise, if a DNA polymerase did not reliably insert adenosine opposite every thymidine, many mutations would occur.’[44]
This scheme could only work after the modification methylase were already present and fine-tuned to attach under the correct circumstances. EcoRI, a binding sequence, and additional components must all be in place within an acceptable tolerance before any kind of selective advantage would be measurable.
Objection #22: Regulation often needs to be achieved for a specific cell type (or across different cell types). Consider as an example IL-4: inappropriate multi-organ expression leads to autoimmune-type disease in mice[45].
In gene therapy the administered protein is a less than satisfactory substitute for a protein physiologically regulated by its origination in a specific tissue. That is why injected insulin cannot control blood glucose sufficiently well to prevent all diabetic crises, let alone the slow tissue damage and complications that lead to premature death.[46] However, the simulation[1] assumes chance only needs to generate two regulatory components: the binding site and part of a protein.
The proportion of mRNA generated needs to be carefully regulated, and can vary considerably according to specialized cell type. For example, the alpha-fetoprotein gene in a mouse results in 200 times more mRNA in the yolk sac than in the gut[47]. The regulation of expression is fine-tuned according to cell type. In addition, the pattern of expression of a specific gene can differ significantly depending on exactly where it is placed in a genome.
Objection #23: Regulation needs to be achieved according to the stage in the cell's life. As an illustration, H-NS functions as a global inhibitor of gene expression during the cell's exponential phase of growth[48]. The trial-and-error attempts needed to become minimally functional have been neglected in the computer program.
Objection #24: Different promoters can act on the same gene, producing isoforms. These can be tissue and cell dependent(20). The scenario of development by one random point mutation at a time leaves such observations unexplained. As an illustration, six hERalpha mRNA isoforms are produced from a single hERalpha gene by multiple promoter usage. All these transcripts encode a common protein but differ in their 5'-untranslated region as a consequence of alternative splicing. A differential pattern of expression of the hERalpha gene in human tissues and cell types was found.[49]
Objection #25: An acceptable proportion of regulatory binding protein or complex must be generated and regulated as needed before natural selection can act. Vastly different levels of protein are found in cells and these change as needed(21). Table 4 demonstrates the wide distribution of mRNA molecules in a typical mammalian cell, which ranges from about 5 copies to over 12,000[47]. An overabundance would prevent fine-tuning of binding sites[39](21b), whereas too small an amount could preclude enough cellular value to be selectively identifiable(22). The literature abounds with demonstrations that gene expression can be neither too low nor too high(23).
Objection #26: Irreducible complexity, a fact of cellular processes, is glossed over. The simulation allegedly 'is representative of the situation in which a functional species can survive without a particular genetic control system but which would do better to gain control ab initio. Indeed, any new function must have this property until the species comes to depend on it, at which point it can become essential if the earlier means of survival is lost by atrophy or no longer available. I call such a situation a “Roman arch” because once such a structure has been constructed on top of scaffolding, the scaffold may be removed, and will disappear from biological systems when it is no longer needed.'
Using lack of evidence as proof for an argument is rarely convincing (such as Punctuated Equilibrium being true due to the lack of transitional forms in the fossil record). Before a particular polypeptide could be available for a new function, evolution is now required to have produced it, plus additional components, for a preceding use, also by chance mutations. Each individual precursor now also requires an ensemble for a yet earlier functioning complex. At best this argument still requires a starting point, and at worst it merely increases the implausibility of obtaining each needed biochemical component.
Objection #27: Recognition of binding sites does not cover the miracles evolution is supposed to explain. We read, 'Second, the probability of finding 16 sites averaging 4 bits each in random sequence is 2^(-4×16) ≈ 5 × 10^-20 yet the sites evolved from random sequences in only ~10^3 generations, at an average rate of ~1 bit per 11 generations.' As pointed out, this was achieved by distorting cellular realities to the point of biological irrelevance. But now an extrapolation is made from what is a relatively trivial fine-tuning challenge for evolution to a grandiose claim:
'Likewise, at this rate, roughly an entire human genome of ~4 × 10^9 bits (assuming an average of 1 bit/base, which is clearly an over-estimate) could evolve in a billion years...'
This is indeed a remarkable extrapolation! A mutation rate of 1/256 bases on average throughout the whole genome would have to apply to multicellular organisms also. Visualize what your child would look like after fertilization and the following 50 or so cell divisions. Almost 0.5% of the bases get scrambled 50 times in a row (recall that a perfectly random distribution of bases on DNA implies a 1/4 chance that any given base will show up at each position). Should even one gene in a somatic cell remain functional, subsequent cell replacement during the lifetime is sure to wipe it out also. This process is then to be repeated to produce lovely, bouncing grandchildren. A free-living organism would not last very long with such flawed DNA duplication and error-correction mechanisms.
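A rough back-of-envelope check of this point, ignoring repair and back-mutation (an idealization on my part):

```python
per_division = (3 / 4) * (1 / 256)   # chance a given base actually changes per division
                                     # (1/4 of "mutations" restore the original base, per the quote from [1])
divisions = 50
frac_changed = 1 - (1 - per_division) ** divisions
print(f"{frac_changed:.1%}")                      # ~13.6% of all bases altered after 50 divisions
print(round(per_division * 1000 * divisions))     # ~146 expected hits in a typical 1000-bp gene
```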
Major issues for which no plausible solutions by chance mutations have been offered to date have not even been addressed. Examples include: how sexual reproduction could have arisen; the existence of multi-cellular organisms(30) with specialized cells and integrated functionality; and biological novelty demanding the interaction of large numbers of genes, such as in sonar and sight. Even had a plausible simulation been offered which demonstrated that one type of binding site could be generated ab initio via random mutations, an extrapolation to evolutionarily unexplained and unrelated problems is unwarranted.
Evolutionary theories need to account for the creation of novel biological functionality, which includes explaining how new genes might arise. Consulting the Munich Information Center for Protein Sequences, we determine that yeast, the simplest eukaryote cell known, has a sequenced length of 13.5 megabases[12] coding for about 5929 different kinds of proteins, about 30% with no known homologues[50]. The worm Caenorhabditis elegans is the simplest multicellular animal showing complex development and a differentiated nervous system[51] and has 959 cells. Its 97-megabase genome[52] codes for about 19,099 proteins[51] (three times more than yeast). Chervitz et al.[13] compared the sequences of yeast and the worm. Around 40% of yeast ORFs (Open Reading Frames) appear to have counterparts in the worm, and 20% of worm ORFs were found in yeast and seem to be indispensable. Most significant is that 34% of the predicted proteins are found only in other nematodes[51]. Conversely, many important proteins in yeast are not found in the worm[13]. In fact, a large number of domain structures are not shared at all (Table 5).
Somehow a vast number of correct base-pair sequences need to be incorporated into genomes without producing chaos. For ‘information’ as used here[1], an increase in genome size and restriction of genes to subsets of allowable sequences represent increases in information content as defined by Shannon. Fine-tuning of one kind of binding site is admittedly a very modest part of what needs to be explained.
One cannot brush off the objections introduced above by reasoning that “the principle is what matters” and that the alleged convergence merely requires a much greater number of generations if modelled more accurately. Mutations by nature randomize those acceptable DNA sequences responsible for biologically useful functions. An increase in information content, as defined, to optimize or create new function requires this trend to be reversed. Any claim that random mutations plus selective reproduction would work must be realistically and quantitatively modelled to justify the claim that the statistically unexpected trend could actually have occurred.
The problem can be broken down into two components. (a) Random trials would be simulated until all constraints outlined above are satisfied, up to the minimum point where reproductive selection could be sensed in a Darwinian sense; (b) thereafter, additional trials would be simulated where selection, unguided by a long-term goal, would increase favorable mutations throughout a population. The simulation neglects aspect (a) entirely, by assuming extraordinarily fast mutation rates and proportions of useful point mutations available to random sequences, which would instantly be selectively acted upon with no possibility of extinction. Let us identify some of the minimal constraints binding sites must satisfy before any kind of selection would be possible.
Since one or more proteins will be involved in binding to a portion of a nucleic acid polymer strand, we need a realistic probability of getting a protein with minimal functionality. We begin with random amino acid sequences, since evolution ultimately starts with a biologically non-functional state. Yockey has done extensive calculations using Shannon's information theory on the cytochrome c family:
'Cytochrome c is the best candidate for the first application for a number of reasons. The list of sequences reported in the literature includes the largest number of species for any protein and also covers a wide range in the taxonomic scale.'[53] (24) It is possible additional synonyms could be tolerated, resulting in lower Shannon information.
Cytochrome c seems reasonably representative of presumably very ancient genes: 'We find in Dayhoff’s list (1978) that proteins which are regarded as ancient or even precellular such as certain domains and structure of glyceraldehyde 3-PO4 dehydrogenase, lactate dehydrogenase, glutamate dehydrogenase, ferredoxin and the histones have a mutation rate which is nearly the same or smaller than that of cytochrome c. It is therefore reasonable to believe that they have the same or larger information content.’[53]
Other studies confirm the intersymbol independence of protein residues and restricted number of functional members within a protein family[54].
To ensure we are not being too demanding, let us assume that all the possible varieties of cytochrome c would have been usable by the first organism in which it supposedly first evolved. This is unlikely to be true. Yockey has pointed out that ‘Fitch & Markowitz (1970) have shown that as the taxonomic group is restricted the number of invariant position increases.’[53]
To the list of all currently available sequence data, Yockey generously added all amino acids which might be tolerated by cytochrome c at each position. This allowed him to calculate[147], via Shannon’s information theory, the number of minimally functional cytochrome c members. He also calculated the total number of polypeptide sequences 110 residues long (excluding sequences very unlikely to be generated, having many residues rarely used in nature).
His work shows that for every functional member, random mutations would have to generate and test
5 × 10^43 (2)
non-functional variants.
We can now evaluate objectively the claim[1] that 64 random genomes could produce a novel binding site, with regulator protein, in 704 generations by random point mutations, starting from totally random sequences. Furthermore, I have already pointed out that within just a few generations we would no longer have all 64 members with the necessary but presently superfluous DNA material to develop the new binding site.
This is the estimated proportion of minimally functional to worthless residue sequences for the best studied protein to date. Is this proportion unduly small for genes overall? For histone H4, alcohol dehydrogenase or glyceraldehyde-3-phosphate dehydrogenase it is orders of magnitude too generous[57] [58] and other considerations suggest comparably infinitesimal proportions must be overcome on average to produce new proteins before selection could begin to fine-tune(25).
It is noteworthy that some domains, highly invariant but key portions of proteins which interact with specific DNA sequences, are by themselves larger than cytochrome c even though they are only part of the protein[59]: POU (~160 amino acids), the CTF DNA-binding domain (132 amino acids) and the CTF proline-rich domain (143 amino acids). Proteins containing such domains must not only be properly folded but possess additional functioning domains to be of any biological use.
Vague references to co-evolution using existing parts for new purposes merely shift the problem elsewhere. If the odds of obtaining that protein, but for a different ancestral purpose, are similar, then nothing has been solved. One merely introduces additional difficulties, such as the need to explain how that protein and the members of the earlier function arose. Should n = 5 structurally unrelated genes be involved in the preceding function, then the odds of obtaining a functional ensemble become a number such as that in (2) raised to the 5th power. Thereafter one needs to demonstrate that there is a viable path, accessible by random mutations, which can connect the preceding ensemble of components with that protein’s new use, and that all evidence for the ancestral complex was then conveniently eliminated.
Experimental studies on acceptable sequences based on protein folding by Sauer, using arc repressor[19] and lambda repressor[21], suggest Yockey’s estimate (2) is far too generous, at least for average-size proteins. Sauer estimated that only about one out of 10^65 of the polypeptides he studied is able to fold properly (one of many requirements for useful proteins), a number Behe[22] has compared to successfully guessing a grain of sand in the Sahara desert three times in a row. Let us take all known evidence into account and accept that a very small proportion of polypeptides would be biologically useful. Let us tentatively accept 5 × 10^43 as representative also for DNA base sequences. The simulation[1] needs to account for the generations needed to produce the first minimally functional protein from a random sequence.
Assuming a generation time of only 10 minutes, a billion years would provide 10^9 × 365.25 × 24 × 6 ≈ 5.3 × 10^13 generations. A population of 64 members on average would have a chance on the order of
6.8 × 10^-29 (3)
of stumbling on one minimally acceptable gene sequence before selection could start optimizing a binding interaction (the need for it to also have binding sites and be minimally regulated has been neglected). This assumes all point mutations could occur and do not concentrate on a limited number of hot spots[60].
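The arithmetic behind (3) is simply a restatement of the numbers above:

```python
generations  = 1e9 * 365.25 * 24 * 6     # ten-minute generations over a billion years, ~5.3e13
population   = 64
p_functional = 1 / 5e43                  # Yockey's proportion, equation (2)
print(f"{generations * population * p_functional:.1e}")   # ~6.7e-29, i.e. equation (3) to rounding
```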
It is apparent that 704 generations in total could not possibly suffice for 64 descendants of 64 random genomes to produce a novel binding site optimally at 16 locations as claimed[1]. This illustrates the fact that the simulation[1] is not dealing with anything biologically relevant. The process has essentially zero probability of even getting started.
Once one minimally functional gene exists, a realistic computer model would next simulate the competition between degradation of this sequence and the development of a novel binding site. Since evolution cannot look ahead, multiple sites of varying lengths L, generated by random point mutations, must be tested by trial and error concurrently, each with a full complement of regulatory elements. Suitable point mutation selectivities and population genetics assumptions need to be identified.
What kinds of odds are faced in developing just one of the hundreds of already identified DNA recognition sites[3], each used by a different specific gene regulatory protein (or set of regulatory proteins), starting from scratch using random point mutations? In total a eukaryotic cell has thousands of different gene regulatory proteins. A realistic simulation must mimic the process of satisfying multiple requirements by the recognizer protein and the DNA/RNA binding site at a minimum level of functionality before any kind of Darwinian selection could be assumed. Some of the factors neglected by the simulation under consideration are identified below.
In Stage 1, a true simulation would run through random trials and errors, attempting to satisfy several constraints before selection could be invoked. Any and all forms of selection for any biological purpose which would increase the proportion of such members in a population are meant here, not only selection with respect to a specific future binding interaction. Once all constraints are met, the second stage, with selective advantages, would be simulated.
Proportion of recognizers before selection
| P1 | With an acceptable stable tertiary structure |
| P2 | With an acceptable recognizer site |
| P3 | Generated reliably within an acceptable concentration range |
| P4 | Not interfering with other genetic processes (repressor vs. activator) |
| P5 | Transferred to the correct cellular compartment |
| P6 | Located in the correct cell type of multicellular organisms |
| P7 | Acting during an appropriate portion of the cell life |
| P8 | With at least one additional functioning domain besides the binding site |
| P9 | With minimal operational regulation, such as by phosphorylation, cations, methylation, etc. |
Proportion of binding sites before selection
| P10 | With an acceptable binding length, vs. L = 4 to ca. 51 unacceptable alternatives |
| P11 | With suitable base sequences for each particular length, L |
| P12 | In an acceptable location with respect to the genetic elements to be regulated |
| P13 | In an acceptable concentration range in the genome |
| P14 | Biologically compatible with already existing recognizers |
Only now does selection become relevant for any organisms meeting all constraints.
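Under the simplifying assumption that these factors are independent, the overall pre-selection proportion is simply the product P1 × P2 × … × P14. The following sketch uses purely hypothetical placeholder values, only to show how quickly such a product collapses:

```python
# Minimal sketch: combined pre-selection proportion as the product of P1..P14,
# assuming independence. All numeric values are placeholders for illustration.
recognizer_factors = {f"P{i}": 1e-3 for i in range(1, 10)}   # P1..P9 (hypothetical)
site_factors       = {f"P{i}": 1e-2 for i in range(10, 15)}  # P10..P14 (hypothetical)

p_total = 1.0
for p in {**recognizer_factors, **site_factors}.values():
    p_total *= p

print(f"Combined pre-selection proportion: {p_total:.1e}")   # 1.0e-37 with these placeholders
```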
In Stage 2 the same considerations apply, but the proportion of better-tuned to lesser-tuned possibilities decreases steadily. The possibility of an overall loss of Shannon information content in the genome, through a decrease in gene specificity via random mutations, must be permitted in such a simulation. Thus, if a very high mutation rate is permitted, it must be treated as truly random across all genes.
In particular, realistic selection coefficients need to be used, especially if one is dealing with point mutations. At the borderline level for selection to be measurable they would be essentially zero. Ironically, the incremental improvement will decrease once acceptable functionality has been attained, even as the proportion of improved configurations available becomes vanishingly small (Figure 2).
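A standard population-genetics rule of thumb (not part of the simulation[1]) makes the point concrete: a mutation whose selection coefficient falls below roughly 1/(2Ne) behaves as effectively neutral, so drift rather than selection decides its fate.

```python
# Rule-of-thumb sketch: a mutation is effectively neutral when |s| is below
# roughly 1/(2 * Ne), so genetic drift dominates its fate.
def effectively_neutral(s, effective_population):
    return abs(s) < 1.0 / (2 * effective_population)

print(effectively_neutral(s=0.001, effective_population=64))  # True: below ~0.008, drift dominates
print(effectively_neutral(s=0.05,  effective_population=64))  # False: selection can act
```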
It was a special creationist, Edward Blyth[61], who introduced the notion of natural selection in 1835, long before Darwin, as a way of preventing major errors from being passed on to offspring. The selectivity coefficient, s, would be large when comparing a viable state to a major genetic disaster, but s for a base pair change would be near zero when attempting to distinguish between ‘working quite well’ and ‘slightly better’. Eventually this resembles placing a ball on an almost vertical slope and hoping enough earthquakes would roll it further uphill. The implausibility is strictly a statistical matter.
Realistic population genetics would need to be included in the post-selection simulation stage since even the rare good mutation has only a very small probability of being fixed in the population.
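For orientation, Kimura's diffusion approximation gives the fixation probability of a single new mutation; it reduces to Haldane's familiar ~2s for beneficial mutations in large populations. The following is my own minimal sketch, not part of the proposed simulation:

```python
import math

# Kimura's diffusion approximation for the fixation probability of a single new
# mutation with selection coefficient s in a diploid population of size N.
# For beneficial mutations in large populations this approaches Haldane's ~2s.
def fixation_probability(s, N):
    if s == 0:
        return 1.0 / (2 * N)          # neutral case: the initial frequency
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))

print(f"{fixation_probability(0.01, 64):.3f}")    # ~0.021 in a small population (N = 64)
print(f"{fixation_probability(0.01, 1e6):.4f}")   # ~0.0198, i.e. about 2s
```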
Admittedly, even this proposed simulation would not model the generation of multi-gene, novel biological functions via random mutations. Over-simplified models, such as Dawkins' example of mutating English letters, based on selfish gene notions, have no biological relevance(26), and their logical and mathematical flaws have been pointed out[62] [63].
Conceptually, Dawkins' example resembles fixing a magnet and allowing it to relentlessly attract a metal object, very fast at first and slower towards the end. The distance can never increase between “generations”. Schneider's refinement allows the metal piece or magnet to move a very small distance sideways between generations, still bringing the two relentlessly together. Progress at the beginning is also very rapid. On average each generation must increase its Shannon information content, due to the way the algorithm was programmed, until reaching the intended plateau. Occasionally a given generation may be farther from the goal than the preceding one, but the unrealistic parameter settings used guarantee success. Both programs have been intelligently designed to disallow failure to converge to the intended result given enough iterations.
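For readers unfamiliar with Dawkins' letter-matching scheme, the following is a minimal sketch of that style of search (an illustration of the analogy only, not Schneider's ev code). Because the best candidate of each generation is always retained, the distance to the target can never increase, which is why convergence is guaranteed:

```python
import random

# Minimal sketch of a Dawkins-style letter-matching search. The parent is kept
# among the offspring, so the best score never decreases between generations.
TARGET   = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def score(candidate):
    # Number of positions matching the fixed target.
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(parent, rate=0.05):
    # Each letter has a small chance of being replaced by a random letter.
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in parent)

parent = "".join(random.choice(ALPHABET) for _ in TARGET)
generation = 0
while score(parent) < len(TARGET):
    offspring = [mutate(parent) for _ in range(100)] + [parent]  # parent retained
    parent = max(offspring, key=score)
    generation += 1

print(f"Converged in {generation} generations")
```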
We recognize repeatedly two remarkable assumptions hidden in such simulations: chance mutations have a huge proportion of useful options at every step linking initially random base pair sequences and currently observed genetic sequences; and these intermediate steps are selectively recognized with uncanny skill. This is quantitatively not consistent with what we know about mutations. ReMine[64] has criticized such unrealistic evolutionary assumptions in considerable detail(27).
Spetner has also questioned how many single nucleotide changes may actually be available with a measurable selective value[65] (recall that about 24% of these would code for the same amino acid if mutated randomly[1]). He points out[66] the dilemma this assumption causes: if faced with such a rich variety of useful mutations at all times, each heading off in different evolutionary directions, then long-term convergence to similar functional structures won't occur, contra what evolutionists claim. If each reasonably sized genome had a million felicitous mutations available, then stumbling on similar organs (as observed for unrelated mammals and marsupials) via a multitude of random mutations is statistically absurd.(28),(28b)
Whereas step-wise development of binding sites by random mutations is not reasonable, one could entertain the notion that initially over-engineered binding sites had been Designed to provide robustness against random mutations. Point mutations could squeeze out excess Shannon information until the limit is reached where the locations can be unambiguously identified. Further destructive mutations would render the organisms non-viable and be selected against. Such a proposal would be inconsistent with an evolutionary viewpoint but consistent with Special Creation or Intelligent Design.
Few creationists or members of the Intelligent Design community view Shannon's work in telecommunications as an adequately comprehensive theory of information in biology, in spite of its mathematical virtues. It is certainly true that of all amino acid sequences which can occur, only a small subset fulfil a useful biological function, and the mathematics developed by Shannon, Tribus, Brillouin and others helps with various probabilistic calculations. The word ‘information’ carries powerful and often inconsistent associations, and I hope to provide a more useful and comprehensive theory of biological information later(29). Repetitive and reliable guidance of the complex processes necessary for organisms to survive has no parallel in the non-living chemical world. Examples include the production of highly specified proteins via the genetic code; coordination of multi-cellular processes (heat regulation, signal transmission, etc.); cell duplication; animal instincts; guidance of biochemicals to specific organelles across various membranes; and production and delivery of energy packets (ATP). We have concentrated here on what may be the first attempt by evolutionists to model with a computer program the creation of a specific, new biological function: a novel binding site, by random point mutations and Darwinian selection.
There is currently intense discussion as to how ‘information’ should be defined and what its properties are[67] [68] [69]. Gitt[68] has examined many aspects of coded information and concluded that information obeys many laws, one of which is that a coded information system can only arise by intelligent agency.
Dr. Schneider has identified a phenomenon which certainly needs explaining. After aligning 149 E.coli ribosome binding sites, ‘We get: Rsequence = 11.0 ± 0.4 bits per site, which is almost identical to the value of Rfrequency, 10.8, we found earlier! There is just enough pattern at ribosome binding sites (Rfrequency) for them to be found in the genetic material of the cell (Rsequence). These data imply that there is no excess pattern, and no shortage of pattern.’ (27)
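For readers who wish to reproduce the flavour of such a measurement, Rsequence is essentially the information content of the aligned sites: at each position, 2 bits minus the Shannon entropy of the observed base frequencies. The sketch below is simplified; Schneider's published formula also includes a small-sample correction, omitted here, and the alignment shown is a toy example, not real data.

```python
import math
from collections import Counter

# Simplified Rsequence for a set of aligned binding sites: per position,
# information = 2 bits minus the Shannon entropy of the base frequencies.
# (The small-sample correction in Schneider's formula is omitted here.)
def r_sequence(aligned_sites):
    total = 0.0
    for column in zip(*aligned_sites):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total

sites = ["TTGACA", "TTGACA", "TTGATA", "TTCACA"]   # toy alignment, not real data
print(f"Rsequence ~ {r_sequence(sites):.2f} bits per site")   # ~10.38 bits here
```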
The existence of patterns of minimal and highly conserved size, for which a protein has been precisely tailored, often aided by additional enzymes, displays a remarkable level of fine-tuning and raises the issue of who or what produced such a feat. Reliable identification of short patterns is only possible by a precise three-dimensional correspondence between recognizer and DNA site.
The simulation described was rigged to converge by using a large number of assumptions which are biologically unrealistic. Many cellular constraints were not included in the simulation, such as: the need for binding sites to be placed correctly with respect to pre-existing genetic elements which are to be regulated; the need for multiple new enzymes for recognizers to be able to work; the need to provide recognizers within an acceptable concentration range. Unrealistic parameter settings were used, including: the rate of mutation; the proportion of available useful mutations; the flawless effectiveness of natural selection.
Finally, the model is biologically fatally flawed in many ways: organisms with very small genomes which inherit superfluous DNA would be rapidly out-populated by those without it; the organisms are assumed to face only one survival goal; multiple and often inconsistent use of binding locations and recognizers was overlooked; recognizers are assumed to automatically be in the correct cellular compartment (organelle); and all details which could allow the simulation to fail, such as including randomizing mutations elsewhere in the genome, or error catastrophe, were excluded.
The limited goal of producing novel binding sites from scratch by random point mutations has not been demonstrated by this paper[1]. This can be easily shown by sensitivity analysis (i.e., by testing various parameter settings), even using this flawed model as a starting point. The reader is invited to test or consider the effect on the generations needed of changing merely 3 parameter values, still far short of what is biologically realistic (see the sketch after the table below):
| Parameter | Value Used in Simulation | Realistic Value | For Sensitivity Analysis |
| Selectivity coefficient(a) | ca. 1 | 0.001 to 0.01 | ca. 0.05 |
| Mutation(b) rate / nucleotide / generation | 1 / 256 | < 1 / 10^8 | ca. 1 / 10^5 |
| Proportion useful polypeptides(c), (147) | ca. 5 × 10^-1 | 10^-44 | ca. 10^-15 |
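As one rough way to perform such a sensitivity test (a back-of-the-envelope model of my own, not the ev algorithm), the expected waiting time to the first useful, fixed variant can be taken as the reciprocal of the expected number of useful, fixed mutations per generation. Every formula and value below is a simplifying assumption, used only to show how sharply the outcome depends on the three parameters in the table:

```python
# Back-of-the-envelope sensitivity sketch (simplifying model, not the ev code):
# expected generations to the first useful, fixed variant, approximated as
# 1 / (population * genome_size * mutation_rate * p_useful * fixation chance),
# with the fixation chance crudely taken as min(1, 2s).
def expected_generations(mutation_rate, s, p_useful, population=64, genome=256):
    new_mutations_per_gen = population * genome * mutation_rate
    p_fix = min(1.0, 2 * s)
    return 1.0 / (new_mutations_per_gen * p_useful * p_fix)

print(f"{expected_generations(1/256, 1.0, 0.5):.1e}")     # ~3.1e-02: simulation-like settings converge at once
print(f"{expected_generations(1/1e5, 0.05, 1e-15):.1e}")  # ~6.1e+16 generations with the sensitivity settings
```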
Such changes clarify how dramatically the number of generations would grow should realistic parameter settings be tested. During these single-goal iterations, function-destroying mutations would accumulate in other portions of the genome, since the population is not allowed to perish. With a population of only 64 members, natural selection cannot weed out all the flaws accumulating throughout the genomes. I predict that the net effect, if simulated realistically, will be a net destruction of functional specificity, meaning a net decrease of information as defined[1] over all genes as time increases.
It is apparent that the extrapolation to claim that a billion years is sufficient to produce human beings by chance, starting from a random DNA sequence, is not justified, given that even one novel binding site could not possibly be generated as proposed.