Chaos theory is not as interesting as it sounds. How could it be? After all, the name “chaos theory” makes it seem as if science has discovered some new and definitive knowledge about some utterly random and incomprehensible phenomena…. By calling certain physical systems “chaotic,” scientists lead us to think that they are totally unintelligible - just a muddle of things happening with no connections or structures. So when they find interesting mathematical patterns in these unpredictable systems, they can exclaim that they have discovered the secrets of “order within chaos,” even though only by christening these systems chaotic in the first place can they make such an impressive result possible.
- Stephen Kellert, In the Wake of Chaos

Can Protein Folding be p0wned?

by Tim Gwinn ~ May 8th, 2008

Are you a l33t gamer? Can you r0x0r the tertiary structure? If so, you might be interested in FoldIt, a new “online” game which essentially strives to create a chimera of computer program and human to solve tertiary structure problems in novel proteins. As described at MIT Technology Review today:

For years, biochemists have reengineered naturally occurring proteins by growing them in viruses and single-celled organisms in a process called directed evolution. But researchers need to start with a preexisting protein, which makes it difficult to develop proteins with totally new functions. In a major step forward, Baker recently demonstrated the first algorithm for building novel, functioning enzymes from scratch. But while proteins built from the ground up may have chemical properties unmatched by anything in nature, they aren’t particularly efficient.

The game, called Foldit, is part of Baker’s vision for the future of protein engineering. His algorithms are good at the nitty-gritty of generating completely novel protein sequences for a particular purpose. But humans, who are better at seeing the big picture than computers are, could improve computer-designed proteins by playing the game.

Proteins are made up of long strings of amino acids that are folded up into complex three-dimensional tangles with many subregions. The function of a protein is dependent on this three-dimensional structure. One pocket might be ideal for grabbing on to another protein, for example. Other parts of the protein may play a purely supportive, structural role, holding the molecule together. Baker’s new method for creating novel proteins begins with the active sites. Once they’re in place, structural concerns, especially how tightly packed the protein is, determine whether the design is feasible. Figuring out the best way to hold together the active sites is a complicated search problem that requires a lot of processing power. There are a myriad of possibilities, but most won’t work.

The “online” part is a little dubious, since the game is essentially played locally on your PC or Mac. It requires installing a 50+MB executable. The player begins with some tutorial levels, which ease the player into learning and using both the manual manipulations and the automatic tools available for achieving tertiary structures that are feasible and optimized. The interface is done well, the controls are fairly intuitive and unobtrusive, and the user is rewarded at each success. You cannot aspire to obtain the Sword of a Thousand Truths, perhaps, but you might achieve some serious RL impact, if the game fulfills the goals of its creators.

Protein folding problems, in general, are difficult, if not downright impossible, to solve with merely computational means. There is no elegant and efficacious algorithm to solve these problems, as the About page affirms:

Protein structure prediction: As described above, knowing the structure of a protein is key to understanding how it works and to targeting it with drugs. A small protein can consist of 100 amino acids, while some human proteins can be huge (1000 amino acids). The number of different ways even a small protein can fold is astronomical because there are so many degrees of freedom. Figuring out which of the many, many possible structures is the best one is regarded as one of the hardest problems in biology today and current methods take a lot of money and time, even for computers. Foldit attempts to predict the structure of a protein by taking advantage of humans’ puzzle-solving intuitions and having people play competitively to fold the best proteins.
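
To get a feel for “astronomical”, here is a back-of-the-envelope count in the spirit of Levinthal's classic argument. The figure of three conformations per amino acid, the sampling rate, and the function names are illustrative assumptions, not numbers taken from the Foldit page.

```python
# Rough count of conformations for a chain of n residues.
# ASSUMPTION: every residue independently adopts one of a few
# conformations; real proteins are far more constrained than this.

SECONDS_PER_YEAR = 365 * 24 * 3600


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of conformations if each residue has the same few states."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_residues: int,
                       states_per_residue: int = 3,
                       samples_per_second: int = 10**13) -> int:
    """Whole years needed to visit every conformation once (integer maths)."""
    seconds = conformation_count(n_residues, states_per_residue) // samples_per_second
    return seconds // SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (100, 1000):
        digits = len(str(conformation_count(n)))
        print(f"{n} residues: a number with {digits} digits")
    print(f"years to enumerate 100 residues: {years_to_enumerate(100)}")
```

Brute-force enumeration is hopeless even under these generous assumptions, which is why the problem is treated as a search problem.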

FoldIt also states that in the coming months, users will not only be able to modify proteins existing in the game, but also be able to design their own proteins.

To paraphrase an old gamer quote: All your active sites are belong to us!

NASA Astrobiology Assessment 2008

by Tim Gwinn ~ April 7th, 2008

From the National Academies Press comes the 2008 “Assessment of the NASA Astrobiology Institute” [1]. The Executive Summary section begins:

Astrobiology is a scientific discipline devoted to the study of life in the universe—its origins, evolution, distribution, and future. It brings together the physical and biological sciences to address some of the most fundamental questions of the natural world: How do living systems emerge? How do habitable worlds form and how do they evolve? Does life exist on worlds other than Earth? As an endeavor of tremendous breadth and depth, astrobiology requires interdisciplinary investigation in order to be fully appreciated and examined.

The fundamental question missing from this paragraph? “What is life?” Indeed, this question is not present anywhere within the entire document.

Of all the goals and objectives of astrobiology, it would seem paramount that one of the first, if not the first, would be to answer “what is life?” However, this task is notably absent from the Goals and Objectives list in the Assessment. In turn, this list is excerpted from a more complete list of goals and objectives in the NASA Astrobiology Roadmap 2008 [2].

The closest the Roadmap comes to acknowledging a need for a definition of life is a paragraph in which, ironically, a tacit definition of life is given, in the form of several conditions which must be met for a system to be alive:

A bounded system of replicating and catalytic molecules capable of undergoing Darwinian evolution is by definition a cell. At some point life on Earth became cellular, either from its inception or soon thereafter. Boundary membranes divide complex molecular mixtures into large numbers of individual structures that can undergo selective processes required for biological evolution. They also have the capacity to develop substantial ion gradients that represent a central energy source for virtually all life today. A primary objective of research is to understand self-organizing and evolutionary processes that lead to the emergence of cellular structures and test this understanding by creating laboratory models of primitive cells. These are systems of interacting molecules within bounded environments capable of working in concert to capture energy and nutrients from the surroundings, transduce environmental signals, form metabolic networks that allow for growth through polymerization, and reproduce some of their polymeric components. Approaching this challenging problem will lead to a more refined definition of the living state, and will clarify the hurdles faced by self-assembled systems of organic molecules as they evolved toward life.

From the perspective of the Roadmap, then, to answer “what is life?” is essentially a matter of refining those conditions which they have just tacitly specified. However, I argue that none of the conditions alone, nor any combination thereof, constitutes a set of sufficient conditions for life. That is, there is no set of those conditions such that if system X satisfies conditions {F,G,H} then X must be alive.

Indeed, the references to evolution are irrelevant, since being capable of evolution is neither necessary nor sufficient for a system to be alive. Individual organisms do not evolve; species do. Moreover, if being capable of evolution were seriously a condition of life, then it would not be possible to say with any certainty that any given natural system is alive until one had empirically observed it (or more correctly, observed its species) evolving, which is a preposterous situation.

The Roadmap also discusses how to recognize signatures of life, on Earth and elsewhere:

Astrobiological exploration is founded upon the premise that signatures of life (biosignatures) will be recognizable in the context of their environments. A biosignature is an object, substance and/or pattern whose origin specifically requires a biological agent. The usefulness of a biosignature is determined, not only by the probability of life creating it, but also by the improbability of nonbiological processes producing it. An example of such a biosignature might be complex organic molecules and/or structures whose formation is virtually unachievable in the absence of life. A potential biosignature is a feature that is consistent with biological processes and that, when it is encountered, challenges the researcher to attribute it either to inanimate or to biological processes. Such detection might compel investigators to gather more data before reaching a conclusion as to the presence or absence of life.
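
The Roadmap's criterion (a biosignature is useful when life is likely to produce it and nonbiological processes are not) can be read as a statement about a likelihood ratio. The sketch below makes that reading concrete with Bayes' rule. The Bayesian framing, the function name posterior_life, and all of the numbers are assumptions introduced for illustration; none of them come from the Roadmap.

```python
def posterior_life(prior: float,
                   p_sig_given_life: float,
                   p_sig_given_no_life: float) -> float:
    """P(life | signature observed), computed with Bayes' rule."""
    evidence = (p_sig_given_life * prior
                + p_sig_given_no_life * (1.0 - prior))
    if evidence == 0.0:
        raise ValueError("signature has zero probability under both hypotheses")
    return p_sig_given_life * prior / evidence


if __name__ == "__main__":
    prior = 0.01  # hypothetical prior belief that life is present
    # Same chance of life producing the signature, but different chances
    # of abiotic chemistry producing it.
    for p_abiotic in (0.5, 0.05, 0.0005):
        post = posterior_life(prior, 0.9, p_abiotic)
        print(f"P(signature | no life) = {p_abiotic}: posterior = {post:.3f}")
```

Even a signature that life almost always produces moves the posterior very little when abiotic processes can produce it comparably often.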

I contend that, while the search for these chemical artifacts of life may fit well with the practical limitations on the complexity and scope of the technology we are able to launch to other worlds, detecting these chemical artifacts is at best only suggestive of life. As such, the justification for the cost of these programs must be weighed against the inherently inconclusive nature of the results they provide.

Ultimately, whatever their chemical makeup, physical size or local environs, living organisms are organized systems, not aggregations of molecules. Organisms possess an internal organization of biological functions which is not found in non-living natural systems. All the discussions about polymerization, energy transduction, and chemical artifacts of life arise only as a consequence of this internal organization of biological functions. What distinguishes a chemical reaction or an energy transduction as being biological or not rests entirely upon whether those processes have a functional role in an organism or not. This is why an inability to determine whether a functional role is played by such processes leads to inherently inconclusive results from probes and experiments which detect only the material aspects - the molecular artifacts - of organisms.

As such, defining life, and the search for life elsewhere, is best addressed by directly understanding and modelling the aforementioned internal functional organization, abstracted away from the particular chemical and material composition of any given organism. These kinds of formal models of internal biological functional organization are called relational models [3]. By focusing specifically on this internal organization, from which all other recognizable characteristics of organisms arise, relational models provide us with the most direct way of addressing the question “what is life?”.

 

References

[1] Committee on the Review of the NASA Astrobiology Institute, National Research Council. 2008. Assessment of the NASA Astrobiology Institute. National Academies Press. ISBN-10: 0-309-11497-7 / ISBN-13: 978-0-309-11497-4. (Available as PDF)

[2] NASA Astrobiology Roadmap 2008. Draft of 28 December 2007.

[3] Rosen, R. 1991. Life Itself. Columbia Univ. Press

Crick’s Central Dogma and Closed Causal Loops

by Tim Gwinn ~ April 5th, 2008

A recent article in the Journal of Biology, entitled “Small changes, big results: evolution of morphological discontinuity in mammals” [1], discusses how significant phenotypic changes appear to often be the result of variations of gene expression due to regulatory controls, instead of as a direct result of the primary sequence per se of the DNA:

Rather than simple mutations within structural genes, many of the mechanisms underlying change represent more subtle and complex changes involving gene regulation. Complex anatomical differences such as those defining the higher categories of mammals, as well as differences between more closely related species, are likely to be the result of interacting pathways that regulate gene expression during development. Changes in gene regulation seem important for a host of phenotypic differences in mammals and other organisms. In addition, phenotypic change could result from changes such as expansion and contraction of gene families or alternative splicing of RNA transcripts.

If this is so, then it seems to be evidence that genome, as the forcer of phenome, cannot be directly equated with ‘amino acid sequence of a polypeptide’. This in turn led me to wonder: how do such results either agree or conflict with the Central Dogma, which makes some important assertions regarding these sequences?

 

Crick’s Central Dogma

The Central Dogma was first enunciated by Crick in 1958 [2], and then again in 1970 [3]. It was summarized by Crick (1970, Fig. 2) in a diagram like this one:

 

[Figure: Central Dogma diagram, after Crick (1970, Fig. 2)]

Briefly, the Central Dogma asserts that, of all the possible pathways for information transfer between DNA, RNA and proteins in an organism, the solid arrows above indicate the transfers which probably occur, and the dashed arrows indicate transfers which are possible. The notable claim in the Central Dogma is the lack of arrows from proteins back to either RNA or DNA. That is, as Crick explains, the Central Dogma is primarily a negative assertion: that information transfers do not occur from proteins to either RNA or DNA.

Now, a couple of points are in order, which Crick explicitly points out to the reader:

  1. His discussion is only about transfers of sequential information in the primary structure, and the discussion intentionally sets aside any issues of information transfer related to tertiary structures.

  2. His discussion does not say anything about the specific mechanisms of the transfers.

  3. The Central Dogma is intended to apply to current organisms, and is not intended to be applied to origin-of-life processes.

  4. His discussion “says nothing about control mechanisms - that is, about the rate at which the processes work.”

Thus, despite using the term “Dogma”, Crick’s claims were actually fairly cautious and well-delineated.
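
The arrow classification described above can be written down as a small lookup table. The assignment of individual arrows follows Crick's 1970 grouping into general, special, and undetected transfers; since the figure itself is not reproduced here, treat the table as an assumption to check against the paper. The names Status, TRANSFERS and violates_central_dogma are hypothetical.

```python
from enum import Enum


class Status(Enum):
    PROBABLE = "solid arrow"    # transfers that probably occur
    POSSIBLE = "dashed arrow"   # transfers that may occur
    EXCLUDED = "no arrow"       # transfers the Central Dogma rules out


# (source, destination) -> status of the sequence-information transfer.
TRANSFERS = {
    ("DNA", "DNA"): Status.PROBABLE,
    ("DNA", "RNA"): Status.PROBABLE,
    ("RNA", "protein"): Status.PROBABLE,
    ("RNA", "RNA"): Status.POSSIBLE,
    ("RNA", "DNA"): Status.POSSIBLE,
    ("DNA", "protein"): Status.POSSIBLE,
    ("protein", "protein"): Status.EXCLUDED,
    ("protein", "RNA"): Status.EXCLUDED,
    ("protein", "DNA"): Status.EXCLUDED,
}


def violates_central_dogma(source: str, destination: str) -> bool:
    """True if a transfer of sequence information is one the Dogma excludes."""
    return TRANSFERS[(source, destination)] is Status.EXCLUDED


if __name__ == "__main__":
    for (src, dst), status in TRANSFERS.items():
        print(f"{src:>7} -> {dst:<7} {status.value}")
```

Every excluded entry has protein as its source, which is the negative assertion in tabular form.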

 

Controls on Transfers

Now, going back to point 4 - that the Central Dogma says nothing about control mechanisms - leads us to the next step. The information transfers in the above diagram do not explicitly include such controls, but we can add them in, symbolically, by augmenting the original diagram as below:

 

[Figure: Central Dogma diagram augmented with red arrows for the control forcings]

 

The red arrows indicate the control forcings on the various transfers. Another way to write this is that each transfer is of the form:

f: X → Y

where X and Y are the source and destinations of the transfer, respectively, and f indicates the forcing.

However, the forcings themselves are not fixed. As the J. Biol. article notes, it is not only that genes are regulated; the picture must also include a capacity for “changes in gene regulation”. Essentially, the forcings themselves are parameterized. We can show this by further augmenting the diagram:

 

[Figure: Central Dogma diagram further augmented with blue arrows for the parameters of the forcings]

Here the blue arrows symbolically indicate the parameters of the forcings, which modulate the rates of the information transfer in the original diagram. Again, we can write this as follows:

fφ: X → Y

where φ indicates the parameters of the forcing f.
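
As a loose programming analogy, a parameterized forcing behaves like a function factory: φ selects which forcing f is in effect, and f then acts on X to give Y. The sketch below is only an illustration. The linear rate law, the numbers, and the names make_forcing and Transfer are assumptions, not part of Crick's or Rosen's formalism.

```python
from typing import Callable

# A transfer f: X -> Y, modelled as a function from the amount of source
# material X to the amount of product Y made per unit time.
Transfer = Callable[[float], float]


def make_forcing(phi: float) -> Transfer:
    """Build f_phi: X -> Y for a given parameter phi.

    ASSUMPTION: the forcing acts as a simple rate multiplier (y = phi * x).
    """
    if phi < 0:
        raise ValueError("phi is a rate and must be non-negative")

    def f_phi(x: float) -> float:
        return phi * x

    return f_phi


if __name__ == "__main__":
    dna = 1.0  # the same source X in both cases
    weakly_expressed = make_forcing(0.2)
    strongly_expressed = make_forcing(5.0)
    print(weakly_expressed(dna), strongly_expressed(dna))
```

The same X yields different Y under different φ, so a change in outcome does not require any change in the primary sequence.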

 

Entailment and Genome

Our diagram is now much more involved and elaborate. Yet, it illustrates, I think, that Crick’s Central Dogma is quite compatible with the view expressed in the J. Biol. article. Further, we can now say that the information transfers, and the resulting phenotypes, are entailed in the organism by virtue of the additional arrows in this augmented diagram.

We see also now that ‘genome’ is more closely identified with “fφ:X” than with, for example, “the primary sequence of X”. This is also evident in the intuitive sense that if genome is to be a forcer of phenome, then genome must be identified with an active agent in the organism, rather than be identified with a passive alphabetic sequence.

 

Closed Causal Loops

However, it should also be clear from our augmented diagram that the picture is still quite incomplete, even on this level of abstraction. In particular, since the blue and red arrows are posited as existing within and arising from within the organism itself, it must logically be the case that they can, in turn, be entailed within the organism. Clearly, if we simply keep tacking on arrows (e.g., “f is entailed by g which is entailed by h…”), then an infinite regress will result, which cannot happen since an organism possesses only a finite number of molecules. Therefore, there must exist loops of causal entailments in the organism. Systems which possess these kinds of closed loops of entailment are called “complex” in Rosen’s terminology, and the properties of this kind of complexity have many ramifications for the study of such systems, as described in Rosen’s books, such as [4, 5] and elsewhere on this website.
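
The finiteness argument is a pigeonhole argument, and it can be sketched directly: if every component in a finite system is entailed by some other component of that same system, then following the “is entailed by” links from any starting point must eventually revisit a component. The component names and the function find_entailment_loop below are hypothetical, and giving each component a single entailer is a simplifying assumption.

```python
from typing import Dict, List


def find_entailment_loop(entailed_by: Dict[str, str], start: str) -> List[str]:
    """Follow 'is entailed by' links from start and return the loop reached.

    Raises KeyError if some component has no entailer inside the system,
    i.e. if the chain of entailment leaves the organism.
    """
    position: Dict[str, int] = {}   # component -> index in path
    path: List[str] = []
    node = start
    while node not in position:
        if node not in entailed_by:
            raise KeyError(f"{node!r} has no entailer inside the system")
        position[node] = len(path)
        path.append(node)
        node = entailed_by[node]
    # node is the first component visited twice; the loop starts there.
    return path[position[node]:]


if __name__ == "__main__":
    # Hypothetical toy system: each component is entailed by the next.
    system = {"transfer": "f", "f": "phi", "phi": "protein", "protein": "f"}
    print(find_entailment_loop(system, "transfer"))
```

With a finite dictionary the while loop cannot run for more steps than there are components, so either a loop is found or the chain exits the system.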

 

References

[1] Honeycutt, Rodney L. 2008. “Small changes, big results: evolution of morphological discontinuity in mammals”. J. Biol. 7:9. doi:10.1186/jbiol71. [Article]

[2] Crick, Francis. 1958. “On Protein Synthesis”. Symp. Soc. Exp. Biol. XII, 139-163. [PDF of draft notes]

[3] Crick, Francis. 1970. “Central Dogma of Molecular Biology”. Nature 227:561-563. [PDF]

[4] Rosen, R. 1991. Life Itself. Columbia Univ. Press

[5] Rosen, R. 2000. Essays on Life Itself. Columbia Univ. Press