http://www.digitalbiology.info/
CONCEPTS FOR A DIGITAL BIOLOGY (by Chris Zeller?)
Every scientist should seek simplicity and distrust it (Alfred North Whitehead)
The purpose of this essay
The purpose of this essay is philosophical, in the sense of Wittgenstein, who held that philosophy is not the exposition of a system of thought but serves to clarify terms. Information is a term that badly needs clarification. If there is no simple, comprehensive definition, an explanation can be based on evidence from computer programming and molecular biology.
Information
A string of digits can represent digital information. The information can be regarded in two different ways. One way is purely quantitative: it measures, in an entirely statistical way and irrespective of meaning, how unpredictable the string is. Mathematical information theory deals with this quantitative information.
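As a minimal sketch of the quantitative view, the Shannon entropy of a string can be estimated from its symbol frequencies, H = −Σ p·log2(p) bits per symbol. The function name is illustrative, and treating the symbols as statistically independent is an assumption; note that the meaning of the string plays no part in the calculation.

```python
from collections import Counter
from math import log2


def shannon_entropy(message: str) -> float:
    """Average information per symbol, in bits, estimated from symbol frequencies.

    Assumes the symbols are statistically independent; meaning plays no role.
    """
    if not message:
        return 0.0
    n = len(message)
    counts = Counter(message)
    # H = sum over symbols of p * log2(1 / p), with p = count / n
    return sum((c / n) * log2(n / c) for c in counts.values())


print(shannon_entropy("01010101"))  # 1.0: two equally frequent symbols
print(shannon_entropy("00000000"))  # 0.0: a single symbol is fully predictable
print(shannon_entropy("ACGTACGT"))  # 2.0: four equally frequent bases
```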
The other aspect of information is semantic; it deals with the meaning of information. When analyzing a string of letters, one can decide on its meaning only if one understands the language in which it is written. A string of letters always contains statistical information, and sometimes it also has a meaning, which adds a semantic dimension to the information. In an information system, meaningful information answers a question or advises the system what to do.
The strategy
The basic biological information is digital. The nucleic acids, the “molecules of genetic information”, are digitally organized: they consist of chains of just 4 different units (nucleotides).
Computer programs and data are fed into a microprocessor in binary digital form (a string of “0”s and “1”s). In the computer, all the details necessary for a full description of its functions and structures must be defined explicitly and unambiguously.
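Because each nucleotide is one of 4 bases, it carries log2(4) = 2 bits, so any DNA string can be rewritten as a binary string of twice its length. The sketch below assumes a hypothetical assignment (A = 00, C = 01, G = 10, T = 11); any other one-to-one assignment would serve equally well.

```python
# Hypothetical 2-bit assignment; any one-to-one mapping would do.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_bits(sequence: str) -> str:
    """Encode a DNA string as a binary string, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def bits_to_dna(bits: str) -> str:
    """Decode a binary string produced by dna_to_bits back into bases."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


encoded = dna_to_bits("CATTCG")
print(encoded)               # 010011110110
print(bits_to_dna(encoded))  # CATTCG
```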
If a program is associated with biological information, we still do not fully understand how it operates. A comparative study of the data processing system of a simple microprocessor such as the Motorola M6800 and the digital structures of a living cell such as the bacterium Escherichia coli (E. coli) may help to get a clearer view of the structure of an information system.
AN ELECTRONIC INFORMATION SYSTEM
The task of a Microprocessor
A programmable (digital) microprocessor is designed, obviously, to run programs. A program instructs the microprocessor how to perform a task. The task must first be formulated explicitly and unambiguously by a task manager and then translated by the programmer into a sequence of instructions and data in a language the computer can work with (machine code). The result of the data processing must eventually fulfill the objectives of the task. If it does, the processing makes sense: it is meaningful.
Programs and codes
An active program, a sequence of instructions, resides in the sequentially arranged memory cells of the microprocessor. When working with program instructions, we cannot avoid explaining different codes. A binary (dual) code uses only 2 types of digits: 0 and 1. It is, as we shall see, directly compatible with the microprocessor hardware. The hexadecimal code used in assembly languages uses 16 types of digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A(10), B(11), C(12), D(13), E(14), F(15). It is easily converted to binary code, since each hexadecimal digit stands for exactly four binary digits. To make it more readable, an assembly language program replaces hexadecimal numbers with mnemonic expressions, which show the action of the instructions. An M6800 assembler instruction consists of several parts. The first part represents the operator (the action to be taken); the following parts refer to the operand (the object of the action). The operand consists of a number or of the address of a memory cell containing a number. The M6800 assembler instruction “ADD A $A1B2” means “add the number contained in the memory cell with the hexadecimal address A1B2 to the content of the accumulator A”. (The accumulator is a hardware device of the microprocessor.) The operator code implicitly contains the type of memory address, and thus how many bytes follow before the next instruction begins. “ADD A $A1B2” is equivalent to BB’A1’B2 in hexadecimal and 1011’1011’1010’0001’1011’0010 in the binary code of the machine language. A machine language instruction (or an assembler instruction) together with its memory address is the basic unit of a program. A program is a sequence of instructions.
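The conversion between hexadecimal and binary can be sketched in a few lines of Python, since each hexadecimal digit maps to exactly four bits. The opcode table is deliberately restricted to the single M6800 instruction discussed above (“ADD A” with an extended 16-bit address, opcode $BB); the function names are illustrative, not part of any real assembler.

```python
# Opcode table restricted to the one instruction discussed in the text:
# $BB is the M6800 opcode for ADD A with an extended (16-bit) address.
OPCODES = {"ADD A": 0xBB}


def assemble_extended(mnemonic: str, address: int) -> bytes:
    """Assemble opcode, high address byte and low address byte."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return bytes([OPCODES[mnemonic], address >> 8, address & 0xFF])


def to_binary(code: bytes) -> str:
    """Write each byte as two 4-bit groups, separated by apostrophes as in the text."""
    nibbles = []
    for byte in code:
        bits = f"{byte:08b}"
        nibbles += [bits[:4], bits[4:]]
    return "'".join(nibbles)


code = assemble_extended("ADD A", 0xA1B2)
print(code.hex().upper())  # BBA1B2
print(to_binary(code))     # 1011'1011'1010'0001'1011'0010
```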
A number of condition codes are attached to the accumulator. The “Z” code indicates that the content of the accumulator is zero. Two binary strings can be compared by subtracting one from the other: the zero condition code is set (1) if the two strings are equal. Checking a condition code is usually followed by a branch instruction. If the condition code is set, the program does not continue at the next instruction but at the address pointed to by the branch instruction.
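The compare-and-branch mechanism can be modelled as follows. This is a simplified sketch: the 8-bit subtraction and the Z flag follow the text, while the branch is given an absolute target address for readability (the real M6800 branch instructions use a relative offset) and is assumed to occupy 2 bytes. The function names are illustrative.

```python
def compare(a: int, b: int) -> bool:
    """Subtract b from a in 8-bit arithmetic and return the Z (zero) condition code."""
    result = (a - b) & 0xFF  # keep the result within one 8-bit accumulator
    return result == 0


def next_address(pc: int, z_flag: bool, target: int, length: int = 2) -> int:
    """Branch if equal: go to target when Z is set, else fall through.

    Simplification: target is absolute here; length is the assumed size of
    the branch instruction in bytes.
    """
    return target if z_flag else pc + length


print(compare(0x3C, 0x3C))                                      # True
print(hex(next_address(0x0100, compare(0x3C, 0x3C), 0x0200)))   # 0x200
print(hex(next_address(0x0100, compare(0x3C, 0x3D), 0x0200)))   # 0x102
```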
In a programmable computer the program is first stored in the storage memory of an external device such as a tape or a disk. The program then has to be transferred from the external storage device into the working memory of the microprocessor. The memory transfer is not affected by the semantics of the data. Looking at the mnemonic code, we can easily identify operator and operand, but how does the microprocessor recognize, in a long string of binary code, the operator that marks the beginning of an instruction? The answer is given by the protocol, which consists of a header preceding the data to be transferred from the external storage memory to the working memory. An external signal activates a program that reads the header information and starts the program at the memory cell pointed to by the protocol.
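A loader that relies on a header to find the start of the program might look like the sketch below. The header format (a 16-bit start address and a 16-bit length, both big-endian, followed by the program bytes) is a hypothetical one chosen for illustration; real tape and disk formats differ.

```python
import struct


def load_program(stream: bytes, memory: bytearray) -> int:
    """Copy the program body into working memory as the header directs; return the start address.

    Hypothetical format: a 4-byte header (16-bit start address, 16-bit length,
    both big-endian) followed by the program bytes.
    """
    start, length = struct.unpack(">HH", stream[:4])
    body = stream[4:4 + length]
    if len(body) != length or start + length > len(memory):
        raise ValueError("stream and header do not fit together or exceed memory")
    memory[start:start + length] = body  # the copy ignores what the bytes mean
    return start  # execution would begin at this address


memory = bytearray(2 ** 16)  # one cell per 16-bit address
stream = bytes([0x01, 0x00, 0x00, 0x03, 0xBB, 0xA1, 0xB2])
entry = load_program(stream, memory)
print(hex(entry), memory[entry:entry + 3].hex().upper())  # 0x100 BBA1B2
```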
High-level programming languages such as Pascal or C are structurally far removed from machine language but are much easier to use, since they are made to resemble natural English. Their code has to be translated into machine code by programs called compilers.
The microprocessor does not “understand” the data it has to process, but it behaves as if it did. It mechanically applies the code to identify an operator and performs the operations as instructed, whether this makes sense or not, that is, whether the objective of the program's task is fulfilled or not. What the programmer has encoded must be decoded by the microprocessor, and both must use the same code.
The microprocessor hardware
A microprocessor is an electronic device that is able to interpret and execute program instructions. It is made up of small building blocks that perform the basic logical operations. These are integrated circuits called gates, each containing a few transistors. Most gates accept two or more voltage inputs and generate one voltage output. The devices are managed electronically in such a way that only two voltages are possible: low (about 0 volts) and high (about 5 volts). The low voltage represents a logical 0 and the high voltage a logical 1. The logical OR operation is performed by an OR gate: if at least one input is high, the output is high. The logical AND operation is performed by an AND gate: only if both inputs are high is the output high too. The NOT gate has a single input and turns the output into the opposite of the input: 1 becomes 0 and vice versa. All logical and arithmetic functions can be performed by combinations of gates.
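To illustrate that arithmetic can be built from gates alone, the sketch below composes AND, OR and NOT into an exclusive OR, a full adder and finally an 8-bit ripple-carry adder. The Python functions only model the logical levels 0 and 1, not voltages or timing; their names are illustrative.

```python
# Logical levels: 0 stands for low voltage, 1 for high voltage.
def and_gate(a: int, b: int) -> int:
    return a & b


def or_gate(a: int, b: int) -> int:
    return a | b


def not_gate(a: int) -> int:
    return 1 - a


def xor_gate(a: int, b: int) -> int:
    # Built only from the three basic gates: (a OR b) AND NOT (a AND b)
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits; return (sum bit, carry out)."""
    partial = xor_gate(a, b)
    total = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return total, carry_out


def add8(x: int, y: int) -> tuple[int, int]:
    """Ripple-carry addition of two 8-bit numbers; return (8-bit result, final carry)."""
    result, carry = 0, 0
    for i in range(8):  # least significant bit first
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


print(add8(0x12, 0x34))  # (70, 0), i.e. 0x46 with no carry
print(add8(0xFF, 0x01))  # (0, 1): the result overflows into the carry
```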
Memory
For programmability, the gates are combined into the functional units a program requires. The most important functional units are the memory cells of a memory bank. The program needs to read data from some memory cells and write them to others, and to do so it must first be able to select them. The memory is an array of addressable (numbered) memory cells. In the M6800 microprocessor, two sets of parallel lines, the buses, service the memory cells. One bus carries the data; it consists of 8 lines. The other bus carries the addresses and consists of 16 lines. Each memory cell has access to both. A specific cell is selectively activated when its address appears on the address bus. Data on the data bus are either read from or written to the selected cell, depending on whether a special command line, which is also connected to each cell, is high or low: high for reading, low for writing. A clock determines the timing of these alternating states.
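The addressing scheme can be modelled as a bank of 2^16 cells reached through a 16-bit address and an 8-bit data value, with a read/write line that is high for reading and low for writing, as described above. The class and method names are hypothetical, and clock timing is left out.

```python
READ, WRITE = 1, 0  # level of the command line: high reads, low writes


class MemoryBank:
    """Addressable cells serviced by an 8-bit data bus and a 16-bit address bus."""

    def __init__(self) -> None:
        self.cells = bytearray(2 ** 16)  # one cell per possible 16-bit address

    def bus_cycle(self, address: int, rw: int, data: int = 0) -> int:
        """Select the cell whose address is on the address bus, then read or write it."""
        if not 0 <= address < len(self.cells):
            raise ValueError("address does not fit on a 16-line bus")
        if rw == READ:
            return self.cells[address]     # the selected cell drives the data bus
        self.cells[address] = data & 0xFF  # the data bus carries only 8 bits
        return data & 0xFF


mem = MemoryBank()
mem.bus_cycle(0xA1B2, WRITE, 0x2A)
print(mem.bus_cycle(0xA1B2, READ))  # 42
```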
Decoders
A logical circuit that brings about a selection is called a decoder. A pattern of eight binary digits, as it appears on the data bus, may represent one of 2^8 = 256 different possible patterns: 00000000, 00000001, 00000010, …, 11111111. As a consequence, decoding can select one of 256 different commands. A 16-bit address can select one of 2^16 = 65536 different memory cells. An instruction is interpreted by first selecting its first byte (the operator). It contains a code for the action to be executed (command code). The second and sometimes the third byte of an instruction contain the address code for the object of the planned action (the operand). The number of bytes of an address is implicitly contained in the operator. The codes are assigned by the designer of the microprocessor architecture, who is free to choose the address and command codes. There is no compelling reason to choose the code “BB” for the operation “ADD A” rather than “FB” or something else. The programmer, however, has to know the meaning of the code.
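A decoder can be sketched as a set of AND combinations of the input bits, each bit taken directly or inverted, so that exactly one of the 2^n output lines is high. The function below is an illustrative model, not a gate-level description of the M6800; applied to the pattern 10111011 ($BB), it selects line 187.

```python
from itertools import product


def decode(bits: list[int]) -> list[int]:
    """n-to-2**n decoder: exactly one output line goes high for each input pattern.

    Output line k is the AND of all input bits, each taken directly or
    inverted according to the binary digits of k (most significant first).
    """
    outputs = []
    for pattern in product((0, 1), repeat=len(bits)):  # patterns of lines 0, 1, 2, ...
        line = 1
        for bit, wanted in zip(bits, pattern):
            line &= bit if wanted else 1 - bit  # AND of direct or inverted inputs
        outputs.append(line)
    return outputs


lines = decode([1, 0, 1, 1, 1, 0, 1, 1])  # the pattern 10111011, i.e. $BB
print(len(lines), lines.index(1), sum(lines))  # 256 187 1
```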
Data manipulation
The M6800 microprocessor uses two special memory cells, the accumulator registers, to perform arithmetic and logical operations. If two numbers have to be added, one is put into accumulator A and the other into accumulator B. The “ABA” instruction then adds the content of accumulator B to the content of accumulator A and leaves the result in accumulator A. The M6800 has no multiplication instruction, but a small set of assembler instructions suffices to perform a programmed multiplication. A 16-bit multiplication requires 25 programming steps. Hardwired circuits of the microprocessor can be replaced by program instructions and vice versa. When hardware is relatively expensive and software relatively cheap, one would of course minimize the use of hardware. A microprocessor program is usually structured: the main program calls subroutines that deal with particular aspects of the entire program. In pre-programmed microprocessors, all instructions and addresses are integrated in a hardwired memory. A fully hardwired system is an automaton; it is no longer programmable. In dedicated microprocessors most instructions are hardwired, but a few signals still select certain functions; in a washing machine, for example, this is done by pressing buttons. More extreme types of dedicated microprocessors operate robots. They are equipped with sensors that read the values they need directly from their environment.
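A programmed multiplication of the kind mentioned above can be sketched with the shift-and-add method, using only additions, shifts and bit tests. This is a generic Python illustration for two 8-bit operands (giving a 16-bit product), not a reproduction of the 25-step M6800 routine the text refers to; the function name is illustrative.

```python
def multiply8(a: int, b: int) -> int:
    """Multiply two 8-bit numbers using only additions, shifts and bit tests."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    product = 0
    for _ in range(8):   # one pass per bit of the multiplier
        if b & 1:        # lowest multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1          # shift the multiplicand one place to the left
        b >>= 1          # move the next multiplier bit into the lowest position
    return product       # at most 255 * 255 = 65025, which fits in 16 bits


print(multiply8(200, 150))  # 30000
```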
Interaction with microprocessor environment
The microprocessor provides input lines through which the environment signals that data are ready to be transferred. These signals can force the microprocessor to activate a program associated with that particular line.
When a program and the data it uses are transmitted from the storage memory to the working memory of the microprocessor, an input protocol is required. It makes sure that the first bit of the first operator of the first instruction is placed at the start address of the program. Transmitting it to another place would garble the whole program. The protocol determines a reading frame. The external data contain sequences indicating the beginning and end of a message and the address of the first instruction. Control lines between the microprocessor and the external device, for example a tape, make sure that both are ready when transmission starts.
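Why the reading frame matters can be shown by grouping the same bit stream into bytes starting at different bit offsets. The stream below is the binary form of the instruction bytes $BB $A1 $B2; starting one bit late yields entirely different bytes. The function name is illustrative.

```python
def read_bytes(bits: str, offset: int) -> list[int]:
    """Group a bit string into 8-bit values starting at a given bit offset.

    Bits left over at the end that do not fill a whole byte are dropped.
    """
    return [int(bits[i:i + 8], 2) for i in range(offset, len(bits) - 7, 8)]


stream = "101110111010000110110010"  # $BB $A1 $B2 as a bit stream
print([hex(b) for b in read_bytes(stream, 0)])  # ['0xbb', '0xa1', '0xb2']
print([hex(b) for b in read_bytes(stream, 1)])  # ['0x77', '0x43']
```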
Data processing and information
Decoding turns the relevant data that enter the microprocessor into meaningful information. Software and hardware must, of course, agree about the meaning of the code; otherwise the program cannot tell the processor how to process the data.
BIOLOGICAL INFORMATION SYSTEMS
The central dogma of molecular biology states that genetic information flows from DNA through a process of transcription to RNA, and from there through a process of translation to the proteins.
The basic genetic information system
Nucleic acids
Nucleic acids consist of a sugar-phosphate backbone with purine or pyrimidine bases attached to the sugar molecules. The basic element of a nucleic acid, consisting of a phosphate, a sugar and a base, is called a nucleotide. The bases of DNA (deoxyribonucleic acid) are the purines guanine (G) and adenine (A) and the pyrimidines cytosine (C) and thymine (T). The sugar is deoxyribose. The picture of the two strands (the backbones) forming a double helix, held together by hydrogen bonds between a purine and a pyrimidine, is well known. We denote a strand of DNA nucleotides by the abbreviations of its bases. The double strand may look like this:
…CATTCGTCA…
…GTAAGCAGT…
The two strands are complementary: in a base pair, a purine base is always paired with a pyrimidine base. In the first base pair of our example, the pyrimidine C of one strand is paired with the purine G of the other strand. If one strand is known, the sequence of the other follows automatically. The length of the strands is measured in base pairs. DNA molecules are very stable. Information is stored in DNA as a sequence of base pairs. When the structure of DNA was discovered, the possibility of automatically forming copies of the strands suggested itself.
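Since each base has exactly one partner (A with T, G with C), the second strand can be computed from the first. This minimal sketch writes the partner strand aligned under the given one, as in the diagram, and ignores strand direction; the names are illustrative.

```python
# Watson-Crick partners: each purine (A, G) pairs with one pyrimidine (T, C).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, written aligned under the given one (direction ignored)."""
    return "".join(PAIRS[base] for base in strand.upper())


print(complementary_strand("CATTCG"))  # GTAAGC
```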
In the RNA molecule, in contrast to DNA, the sugar of the backbone is ribose rather than deoxyribose, and the pyrimidine base thymine (T) is replaced by uracil (U). The RNA molecule need not be a straight string; it can also fold back on itself in different ways. Depending on their base sequence, RNA strands can form loops by complementary base pairing.
(Figure: an RNA strand, …CCUGGGUGCAU… and …GGGCCCGUGUA…, folded back on itself into a loop by complementary base pairing; one U is not paired to a complementary base.)
Polypeptides and proteins
Chains of amino acids (polypeptides) are structured on different levels. The primary level is the particular sequence of the amino acids. On the secondary level the chains are arranged in the form of sheets or of a helix. On the tertiary level the chains fold up on themselves (similar to RNA), as a result of the affinity of the different amino acids to each other and to the aqueous surroundings. Each polypeptide has its own three-dimensional structure. In a protein, several polypeptides may be joined together (quaternary level). The versatility of their structure reflects the numerous functions of proteins. One of the most important is their function as enzymes. Polymerases are able to recognize complicated patterns of base sequences in nucleic acids; they can function as an address-decoding device during transcription. Enzymes may be folded in such a way that they form a niche accommodating specific smaller molecules. Once the niche is filled, the overall structure of the protein may change, with consequences for its function. Some proteins have the ability to organize themselves into certain shapes such as sheets or tubules.
Transcription
The transfer of information from the storage memory (DNA) to the working memory, the messenger RNA (mRNA), is a process of direct copying. Free-floating RNA nucleotides find their way to the complementary DNA strand and bond to it.
…CATTCGTCA…  copied DNA strand
…GTAAGCAGT…  complementary DNA strand   → downstream direction
…CAUUC       RNA nucleotides already attached
      GUCA…  incoming RNA nucleotides to be attached
This process requires the help of RNA polymerase, an enzyme that pushes the two strands of the DNA molecule apart to make room for the incoming RNA nucleotides and at the same time initiates the attachment of each RNA nucleotide to the complementary DNA template. The same happens to the next incoming RNA nucleotide, which is then bonded to the ones already attached. The DNA molecule is not transcribed as a whole but in smaller pieces of about a thousand to several thousand base pairs. The RNA polymerase has to find the exact place (address) where transcription begins. The start sites are marked on the complementary DNA strand: at a location 35 base pairs upstream from a start site we find a sequence like TTGACA, and 10 bases upstream a sequence like TATAAT. These marks define the promoter regions.
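A naive promoter search based on the two consensus boxes quoted above could look like this. It is a sketch under strong assumptions: only exact copies of TTGACA and TATAAT are accepted, and the spacer between them is assumed to be 15 to 19 bases long. Real promoters usually match the consensus only partially, which is why the text speaks of sequences “like” these. The function name and example sequence are hypothetical.

```python
import re

# Consensus boxes quoted in the text for E. coli promoters.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def find_promoters(dna: str, min_gap: int = 15, max_gap: int = 19) -> list[tuple[int, int]]:
    """Return (position of -35 box, position of -10 box) for every exact match.

    Assumptions: perfect copies of both boxes only, and a spacer of
    min_gap..max_gap bases between them (hypothetical defaults).
    """
    hits = []
    for m in re.finditer(f"(?={MINUS_35})", dna):  # lookahead also finds overlapping boxes
        start35 = m.start()
        for gap in range(min_gap, max_gap + 1):
            start10 = start35 + len(MINUS_35) + gap
            if dna.startswith(MINUS_10, start10):
                hits.append((start35, start10))
    return hits


example = "GC" + MINUS_35 + "A" * 17 + MINUS_10 + "GCGC"
print(find_promoters(example))  # [(2, 25)]
```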
There are many promoters in the DNA storage molecule, corresponding to the many functional units to be transcribed. The RNA polymerase consists of several proteins. One of them is called σ (sigma), and there are different types of σ. Transcription is initiated when the σ of the polymerase binds to a promoter sequence, thereby selecting the promoter to be used. Once the first nucleotides have been bonded, the σ component is dropped and the polymerization reaction continues. The polymerase moves along the DNA and releases a string of mRNA until it encounters a transcription terminator, a specific DNA nucleotide sequence. The result of the transcription is the messenger RNA (mRNA), …CAUUCGUCA… in our example. Transcription is also regulated by enhancer and repressor proteins, which determine whether the RNA polymerase can dock on the DNA template and so initiate transcription. In more advanced organisms, enhancers may connect to the polymerase molecule and at the same time to a DNA sequence at a different location. It appears that the molecular context of the promoter can play a role in transcription.
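The base pairing that produces the mRNA can be sketched as a simple substitution, with uracil taking the place of thymine. As in the essay's diagrams, the strands are written aligned and the 5'/3' direction is ignored; the function name is illustrative. Applied to the complementary strand GTAAGCAGT, it gives the mRNA of the example.

```python
# Partner of each DNA template base among the incoming RNA nucleotides.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the RNA strand that pairs with a DNA template strand.

    Simplified: strands are written aligned, without reversing for 5'/3' direction.
    """
    return "".join(RNA_PARTNER[base] for base in template.upper())


print(transcribe("GTAAGCAGT"))  # CAUUCGUCA
```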
Translation
The central dogma of molecular biology requires that the information presented in the sequence of base pairs is used to form the sequence of amino acids that make up a polypeptide. To do so, further RNA-containing structures are used, the ribosomes, which consist of rRNA molecules and proteins. Two ribosomal subunits bind to the mRNA during translation. In the chain of nucleotides that makes up an mRNA, the ribosome has to find a correct site, the translation initiation region (TIR), and initiate translation there. During the initiation phase the smaller of the two subunits recognizes and binds to a TIR of the mRNA. One mRNA string may contain several TIRs, and several ribosomes may start translation at the same time. A set of adapter molecules, the transfer RNAs (tRNAs), which bind to the ribosomes, do the decoding: they translate triplets of bases into amino acids. There are different tRNA molecules for the possible base triplets. At one end a tRNA presents three bases, the anticodon; at the other end, the acceptor, it carries the appropriate amino acid.
…CAU UCG UCA…  base triplets of mRNA, called codons
 GUA AGC AGU   anticodons of tRNA
  |   |   |
 His Ser Ser   amino acids
Histidine and serine are two of the 20 amino acids used in proteins. A stop (nonsense) codon, UAA, UAG or UGA, ends translation and marks the termination region.
Translation must start at the first base of a codon; otherwise an entirely different sequence of amino acids results. The starting point of translation determines the so-called reading frame.
…CAU UCG UCA…    reading frame of the example
…C AUU CGU CA…   translation starting one base downstream
   Ile Arg       the amino acids change to isoleucine, arginine, …
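The reading-frame effect can be reproduced with a lookup table. The table below is deliberately partial, holding only the codons that occur when …CAUUCGUCA… is read from its first or its second base; the function name is illustrative.

```python
# Partial codon table: only the codons of CAUUCGUCA in its first two reading frames.
CODONS = {"CAU": "His", "UCG": "Ser", "UCA": "Ser", "AUU": "Ile", "CGU": "Arg"}


def translate(mrna: str, frame: int = 0) -> list[str]:
    """Read complete triplets starting at base `frame` and look up their amino acids."""
    return [CODONS[mrna[i:i + 3]] for i in range(frame, len(mrna) - 2, 3)]


print(translate("CAUUCGUCA", 0))  # ['His', 'Ser', 'Ser']
print(translate("CAUUCGUCA", 1))  # ['Ile', 'Arg']
```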
Triplets of 4 different bases give 4^3 = 64 possible codons, but only 20 amino acids have to be coded. The genetic code is redundant, which means that one amino acid can be coded by several codons. After the initiation stage of translation, the amino acid chain is elongated. The larger ribosomal subunit joins the smaller one so that the mRNA is held between the two subunits. The ribosome moves along the mRNA, adds one amino acid after the other according to the sequence of base triplets of the mRNA and links them with peptide bonds. This process is catalyzed by the large ribosomal subunit itself. Several elongation factors are involved in forming an amino acid chain. Finally the termination region of the mRNA is reached, where release factors bind to the stop codon and cut off the amino acid chain. If several ribosomes take part in translating one mRNA, several amino acid chains (polypeptides) are produced.
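The redundancy of the genetic code can be counted directly. The sketch below builds the standard code from the usual compact notation: codons are ordered UUU, UUC, UUA, UUG, UCU, … GGG, amino acids are given as one-letter codes, and “*” marks a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in the conventional UCAG order (UUU, UUC, UUA, UUG, UCU, ..., GGG);
# one-letter amino acid codes, with "*" for the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

usage = Counter(CODE.values())  # number of codons per amino acid
stops = usage.pop("*")          # remove the stop signals from the count

print(len(CODE), stops, len(usage))                    # 64 3 20
print(usage["L"], usage["S"], usage["M"], usage["W"])  # 6 6 1 1
```

Leucine (L) and serine (S) each have six codons, while methionine (M, coded only by AUG) and tryptophan (W) have one each; the 61 sense codons cover the 20 amino acids.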
Comparison between an electronic and a basic genetic information system
There are some basic requirements for data processing, which are the same in both a bacterium and a microprocessor:
Addressable storage and working memories
Transmission and copying of data
Address and data decoding mechanisms
Data processing machinery
The close functional analogy between the different steps of data processing in the two systems may appear surprising. The storage memory of the microprocessor (tape or disk) corresponds to the DNA of E. coli. The working memory, the memory cells of the microprocessor, corresponds to the mRNA. The function of the decoding circuitry of the microprocessor is carried out by the adapter molecule tRNA. The accumulator and the arithmetic and logic unit correspond to the ribosomes.
The main differences between the two systems lie in the representation of information. In the microprocessor all information is represented by bit patterns, which can be transported from one place to another as electric signals on conducting lines. Since the task of a program is known before it is started, all the information required can usually be transmitted in one single string. If data have to be entered later during data processing, they can be imported from outside with the help of a protocol, which consists of signal lines and software guidance. Address decoding and data processing are done by simple logic devices. The programs, however, are often very complicated. There may be many alternative processing paths depending on intermediate results. For example, if an intermediate result turns out to be negative, the program may continue at one module, and if it is positive, at another. Nevertheless, the programs can still be expressed in the form of bit patterns. The definitions of the data and address codes are not the task of the program, though they are integral parts of an information system. The designers of the hardware and of the software are responsible for the definitions. To use a programming language and to design it are not the same. To use a microprocessor code and to design it are not the same either.
In the living cell the software is incorporated in the hardware: the molecules. Genetic information cannot be loaded into the hardware; it is already there. The definition of data and address codes is also "hardwired" in the molecules; the design of the codes has already happened. The information must be copied from DNA to RNA through close contact of a DNA strand with building blocks of RNA (nucleotides). The data processing, the translation of the information in the RNA nucleotide sequence into amino acid sequences, requires direct contact with tRNA molecules. Functional units of mRNA are transcribed from a DNA strand and translated into polypeptides. The selection of what is transcribed is influenced by the environment of the bacterium. For example, the transcription of the information leading to the cleavage of the sugar lactose into galactose and glucose is negatively controlled. If no lactose is present in the environment, a repressor molecule prevents the RNA polymerase from docking on the DNA and so blocks transcription. In the presence of lactose the repression is inactivated. This regulation makes sense: if there is no lactose available, there is no need for the biochemical machinery to split it. The RNA polymerase by itself is not very specific. Without regulatory proteins, whether positive or negative, unspecific sequences of nucleotides could be selected for transcription and translation into polypeptides. One would therefore expect to find polypeptides or proteins without useful properties. That this is actually the case has been shown by experiments designed to prove that the presence of a particular antibiotic in the environment of a bacterium does not induce resistance against this antibiotic: there is convincing evidence that resistance exists in some individuals before they come into contact with the antibiotic.
Apparently useless proteins are always around; they can be detected only when the environment changes and, as a consequence, they become useful. They constitute a supply of new material for evolution.
The main task of data processing in E. coli is the production of polypeptides at the time they are needed to keep the metabolism of the cell going. The ribosomes, the executive part of the information system, have essentially one job to carry out: forming polypeptide chains by connecting the selected amino acids. Selecting which data have to be transcribed and translated, and when a polypeptide needs to be activated, that is, regulating gene expression, is a much more demanding procedure. Gene expression is regulated at the stage of transcription of DNA to RNA, at the stage of translation of RNA into polypeptides, and at other, later stages. Each regulatory factor restricts the opportunities for gene expression and at the same time makes it more specific. The information system of E. coli consists of a large number of self-contained, dedicated informational units. There is no main program with subroutines in the information system of E. coli.
As a matter of fact, genes are not just sequences of DNA nucleotides but small dedicated informational units, including addresses (promoters) and regulatory support systems, which result in the formation of specific proteins at the time they are required.
The origin of a biological information system
The code is the kingpin of data processing in an information system. In a microprocessor the code is built into the hardware, and the programmer has to abide by it. But how was the genetic code of a biological information system generated? To answer this question, one has to go back billions of years, to the gray zone between biology and chemistry. In prebiotic times one would expect a macromolecular evolution. In chemistry not all combinations of atoms and molecules are possible. Thermodynamic constraints, stereochemical considerations and parameters of the environment determine the structure of the molecules. Some chemical reactions consume energy, others produce it. At some time, nobody knows when, chains of nucleotides must have appeared. Most probably RNA began to acquire importance before DNA and proteins existed. A lot of experimental evidence has been collected that supports the hypothesis of an RNA world. It appears that the roots of the elements of an information system can be found there; for example, the most important ribosomal function appears to be carried out without proteins.
Formal constraints for a genetic code
The origin of the genetic code can be studied by backward engineering; that means one knows the result and tries to find out how that result could have been achieved. One assumes that the code evolved step by step. The first thing that must be decided on is its basic structure. Should the code be based on one bit, as in a computer, where a bit may assume the two values “0” and “1”, or on two, three or perhaps four bits? The nucleotide message to be coded offers four possible values per bit (the four different bases). To our backward-engineering intellect it is obvious that the code needs three bits, since at least three bits with four possible values each are needed to code for 20 amino acids. A code with two bits and four values per bit could code for up to 4^2 = 16 amino acids, a code with three bits for up to 4^3 = 64, and a code with four bits for up to 4^4 = 256. The number of bits thus restricts the possible number of amino acids. Evolution certainly selected the three-bit code before "knowing" how many amino acids biology would require in the future. One has to keep in mind that an information system requires two types of codes: one to determine the address of a location in the memory and another to select the action to be taken by the data processing mechanism. Since a message has to be located before an action can be taken, the code for finding a memory location, that is, for the starting point of transcription, was probably implemented during evolution before the actual translating code was worked out.
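The counting argument can be checked with a short function that finds the smallest number of positions k such that 4^k meanings are available. The function name is illustrative; the case of 21 meanings (20 amino acids plus a stop signal) is added only to show that three positions still suffice.

```python
def min_codon_length(n_meanings: int, alphabet_size: int = 4) -> int:
    """Smallest number of positions k with alphabet_size**k >= n_meanings."""
    k = 1
    while alphabet_size ** k < n_meanings:
        k += 1
    return k


for k in range(1, 5):
    print(k, 4 ** k)  # capacity of a code with k positions of 4 values each
print(min_codon_length(20))  # 3
print(min_codon_length(21))  # 3: still enough with a stop signal added
print(min_codon_length(2, alphabet_size=2))  # 1: a single binary digit
```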
Evolution of a genetic code
Scientists studying the properties of the RNA world have found that certain folded chains of RNA can act as enzymes (ribozymes). The large subunit of the ribosome contains an RNA sequence that functions as peptidyl transferase, which joins amino acids into polypeptides. The smaller subunit of the ribosome is involved in both the initiation and the termination of translation. There is evidence for a replication process of RNA without a code. One hypothesis suggests that the two ends of the tRNA, the anticodon end and the acceptor end (to which amino acids can be attached), developed separately and were later joined, probably by trial and error, to form an adapter molecule. The first codes selected by evolution would most probably have been an initiation code and a termination code. If we study a table of the genetic code, we find that the triplet “AUG” is the unequivocal code for methionine and at the same time marks the initiation of reading an mRNA string of E. coli. The very similar triplets UAA, UGA and UAG are stop codes. A triplet anticodon site on the “anticodon end” of a tRNA precursor could be established by the selection of these codes. Initiation and stop codes produce a fixed reading frame for reading a string of nucleotides, which is obviously a prerequisite for the selection of a genetic code. It would appear that selection among redundant triplets could make it easier to find codes for further amino acids.
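How a start codon and the stop codons fix a reading frame can be sketched as a search for the first AUG followed by the next in-frame stop codon. This simplification ignores the rest of the translation initiation region, which a real ribosome also needs; the function name and the example string are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def open_reading_frame(mrna: str) -> Optional[str]:
    """Return the stretch from the first AUG up to (not including) the next in-frame stop codon.

    Returns None if there is no AUG, or no in-frame stop codon after it.
    """
    start = mrna.find("AUG")
    if start < 0:
        return None
    for i in range(start, len(mrna) - 2, 3):  # step through codons in the AUG frame
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i]
    return None


print(open_reading_frame("GGAUGCAUUCGUAAGG"))  # AUGCAUUCG
```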
After the RNA world had expanded and DNA had been established as a new carrier of information, promoter sequences had to be introduced as addresses for selecting the DNA sequences to be transcribed and translated. An address code had to be found so that polymerases could recognize addresses laid down in DNA. That means an analog pattern, a protein configuration, has to recognize a digital pattern of DNA base sequences. There are models that show how this can be achieved (DNA-binding domains of proteins).
The suggested consecutive steps in the evolution of the genetic information system point to some guidelines. Each step should contribute to the stability of the system if further steps are to follow. It has to abide by the rules of physics and chemistry and by the restrictions established by the previous step. Each step constrains the possibilities of the next one. According to this view, evolution results from a balance between constraints and new opportunities for selecting a further step. A biological information system is a collection of a large number of individual small systems, each of which must have passed through its own evolutionary steps. It took evolution about 1.5 billion years of trial and error to achieve cells of the organizational level of E. coli.
THE DEVELOPMENT OF MULTICELLULAR ORGANISMS
Eukaryotic cells
E. coli belongs to the prokaryotes, organisms that lack a nuclear membrane, cytoplasmic organelles and a cytoskeleton. The chromosomes (DNA molecules and associated proteins) are the only larger structured elements floating within the cell membrane. The study of the genetic information system of bacteria such as E. coli deals mainly with the chemical properties of nucleic acids and proteins and their metabolic functions. Structural aspects of the cell play a minor role. Bacteria are immensely successful in coping with the most diverse environmental conditions. However, evolution did not stop here.
In eukaryotes, the chromosomes are enclosed in a nuclear membrane. DNA is therefore separated from the ribosomes, which are situated in the cytoplasm, the fluid between the cell membrane and the nuclear membrane. The genetic information system thus works in two different compartments: the transcription of genetic information takes place in the nucleus and is separated from the translation into polypeptides, which is carried out in the cytoplasm. This calls for mechanisms that transport molecules through the membranes.
All cellular membranes are phospholipid bilayers: each consists of two molecular sheets of phospholipids, with the fatty acid tails of both sheets pointing towards the center of the double sheet. Proteins and other molecules can be inserted between the phospholipids. The nuclear envelope actually consists of two membranes held together by a system of nuclear pores, which supervise the molecular traffic between nucleus and cytoplasm. Proteins that have to enter the nucleus (like DNA polymerase) contain amino acid sequences that act as nuclear localization signals; proteins that have to leave the nucleus contain amino acid sequences that act as nuclear export signals. These signals selectively permit the transfer of proteins. The molecular traffic involves energy transfer, which is carried out by small signaling proteins; some of them happen to be very similar to the elongation factors that support the formation of amino acid chains in E. coli. RNA molecules such as mRNA, t-RNA and r-RNA have to be exported through the nuclear membrane as RNA-protein complexes, and the nuclear export signals of the proteins help with the transport of the RNA.
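The gatekeeping role of a nuclear localization signal can be sketched as a sequence check. The Python toy below assumes the commonly cited simplified consensus for a classical monopartite signal, K-(K/R)-X-(K/R), written as a regular expression; it ignores bipartite signals, export signals, the transport receptors and the energy cost. The protein names and the sequence `MAGLSTEEW` are hypothetical; the embedded PKKKRKV is the well-known signal of the SV40 large T antigen.

```python
import re

# Simplified consensus for a classical monopartite nuclear localization
# signal: K, then K or R, then any residue, then K or R.
NLS_PATTERN = re.compile(r"K[KR].[KR]")


def may_enter_nucleus(protein: str) -> bool:
    """The 'pore' admits a protein only if its sequence carries the signal."""
    return NLS_PATTERN.search(protein) is not None


def sort_proteins(proteins: dict[str, str]) -> dict[str, list[str]]:
    """Split named proteins into those admitted to the nucleus and those kept out."""
    result: dict[str, list[str]] = {"nucleus": [], "cytoplasm": []}
    for name, sequence in proteins.items():
        compartment = "nucleus" if may_enter_nucleus(sequence) else "cytoplasm"
        result[compartment].append(name)
    return result
```

`sort_proteins({"tagged": "MPKKKRKVEDP", "untagged": "MAGLSTEEW"})` places `tagged` in the nucleus and `untagged` in the cytoplasm. The signal, not the rest of the protein, decides the passage.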
The nuclear envelope, which restricts the free movement of molecules within the cell, led to the invention of a signaling system. A signaling system consists of signals and complementary receptors. During evolution, such matching pairs must have been found; in fact, whole families of matching pairs have been selected. We have seen that regulatory factors restrict transcription and translation. The pores further restrict the flow of information from DNA to RNA to proteins. Restriction means more specific action: it means a selection of the molecules that are allowed to pass a barrier. Besides the nuclear membrane there are other membranous organelles forming compartments in the cytoplasm, and here too signals play their role in transport through the membranes. One of these organelles is the endoplasmic reticulum, to which many of the ribosomes are attached. Another is the Golgi apparatus, a site where proteins and lipids are sorted. The cytoplasm of eukaryotes also contains a cytoskeleton, a network of protein filaments that shapes the cell and may enable it to move.
The first primitive forms of life probably emerged 3.5 billion years ago. It took evolution about 0.5 billion years to pass from prokaryotic to eukaryotic cells. About 1.5 billion years later the first multicellular organisms turned up. These time estimates of evolutionary periods are of course very tentative. Nevertheless, it is not without interest to realize that the time that passed from the arrival of the first forms of life to the appearance of the first multicellular organisms was probably longer than the period required for the evolution of the entire multicellular world, with all its intricate and apparently intelligently adapted biological structures. One has to bear in mind that during the first 2 billion years all the basic digital cellular strategies of life were established.
MULTICELLULAR INFORMATION SYSTEMS
The development of a multicellular organism
Development is an indispensable process in the creation of multicellular organisms. It is a step-by-step progression, each step building on the previous one, starting with one cell, the egg cell. There are of course forms of life where the power to regenerate makes development from an egg unnecessary. Banana plants, for example, are propagated by cuttings; some fruits still contain seeds or rudiments of seeds (some cultivars more, others fewer), while wild bananas always have seeds. Single-celled organisms like protozoa are highly organized cells with many efficient organ-like structures (organelles). Here cell division produces new single-celled individuals complete with all their organelles. Modern unicellular species have evolved from simpler forms, but there is no real development. Their size is very limited, and so is the variety of environments in which a species can thrive.
The development of multicellular organisms starts with the division of a single cell, whereby each daughter cell obtains the same genetic information. This process repeats itself many times. The cells adhere to each other with the help of cadherins and other proteins, such as members of the Ig superfamily, which are inserted in the cell membrane. As a rule, the cytoplasmic molecules of the first cell are not homogeneously spread throughout the cell. Consequently, different daughter cells obtain different cytoplasmic molecules. During development, cells differentiate: they produce different proteins, among them regulatory molecules. These activate the transcription of different regions of the DNA in different daughter cells, which therefore display different genetic activities. An early cluster of cells arranges itself in an orderly way; different cadherins hold different groups of cells together in separate aggregations. The cells organize themselves into three layers, the germ layers. The ectoderm and the endoderm are formed first; the mesoderm splits off between the two later on. Early in evolution there were organisms that consisted of ectoderm and endoderm only; in later stages a third layer in between, the mesoderm, evolved. From these early stages the very different forms of the animal kingdom evolved. The genetic outfit and the morphological constraints of early cellular aggregations had to be in place before multicellular organisms could use them in their development. The ectoderm and the specialized cells derived from it form a protective shield around the multicellular organism. In effect, it represents a third barrier between the genetic information and the body's environment, the first two being the cell membrane and the nuclear membrane. It would appear that there is a general strategy of evolution to use the interface between an organism and its environment as a tool to render it more and more independent of the environment.
During their evolution, eukaryotic cells have acquired a system of signaling that allows specific molecules to pass through the nuclear membrane. Similarly, during their evolution multicellular organisms have acquired strategies for passing specific signals across the cell membrane. The cell membrane contains receptors that accept specific signals from other cells, and the signal-receiving cell responds by regulating transcription in a specific way: the cell carries out an instruction. Multicellular organisms are confronted not only with an external environment but also with the environment of the cells within the body. As evolution proceeds, the signaling between the cells of a multicellular organism, and between the organism and its environment, becomes very complicated. There is ample evidence, however, that multicellular organisms use the same basic genetic information system as unicellular ones. We have seen that in E. coli each "gene" represents a complete dedicated information system and that there is no higher-ranking master program. The "genes" are activated when and where they are needed: "when" means at what developmental stage and "where" means in which cells. The factors that switch on the transcription of a string of DNA nucleotides in one cell, and with it its expression, may come from other cells of the same organism at other levels of differentiation.
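The absence of a master program can be illustrated with a small Python sketch in which each gene carries its own activation rule and "decides" independently, based only on the factors present in its cell. The gene and factor names are hypothetical, and the all-or-nothing rule (every required factor must be present) is a simplifying assumption; real promoters integrate activators and repressors quantitatively.

```python
# Hypothetical genes, each listing the regulatory factors that must all be
# present in a cell before the gene is transcribed. No central controller
# is consulted: every gene checks its own rule.
GENE_RULES: dict[str, set[str]] = {
    "geneA": {"factor1"},
    "geneB": {"factor1", "factor2"},
    "geneC": {"factor3"},
}


def expressed_genes(factors_in_cell: set[str]) -> set[str]:
    """Return the genes whose required factors are all present (subset test)."""
    return {
        gene
        for gene, required in GENE_RULES.items()
        if required <= factors_in_cell
    }
```

Every cell carries the same `GENE_RULES` (the same genome), yet `expressed_genes({"factor1"})` gives `{"geneA"}`, while a cell that also receives `factor2` from a neighbouring cell expresses `{"geneA", "geneB"}`. The "when" and "where" come from the signals, not from a central program.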
The recognition of a signal by a receptor depends on a match between protein structures. Two protein shapes may match perfectly, match well, or match only poorly. The strictly digital character of recognition, where a signal is either recognized or not recognized, becomes fuzzy here. This fuzziness may speed up evolutionary processes: an unspecific recognition system, in which the signaling structures fit the complementary structure of a receptor only loosely, may be a preliminary step that evolution can modify to produce a more accurate mechanism.
Genes produce not only enzymes and regulatory proteins but also structural proteins. The bigger and more complicated a species becomes, the more important structures become, whether they are within cells or around them. Here digital mechanisms lose their direct and exclusive grip over molecular and cellular interactions. Many molecules, and even cells, have the property of aggregating by themselves in specific patterns, depending on their own structures and the conditions of their immediate environment. Sheets and tubes are examples of self-organising structures, but there are many more. D'Arcy Thompson's marvelous "On Growth and Form" shows how physical forces determine the shape of the growing body.
Information systems in multicellular organisms.
The size and complexity of multicellular organisms call for more powerful and specialized signaling procedures, especially in animal life. Multicellular organisms have to defend themselves against microorganisms, viruses and intruding foreign substances. An outer cover, the skin and associated layers, protects the body. However, there is always a way to invade it through an entrance like the mouth and the gut, which is lined by a soft layer of tissue. The immune system deals with these emergencies. The protective outer layers form a barrier between environmental conditions and the genetic information system of multicellular organisms, yet the basic genetic information system has to react to the environment. Multicellular organisms collect signals from the environment through sense organs. Nerves convey the signals to the central nervous system, where they are processed and answered by appropriate reactions. In both the immune system and the nervous system, secondary information systems evolved on top of the innate information systems. In these adaptive information systems, each individual is able, in the course of its development, to extend the innate system by a process that very much resembles learning.
The following sections on the immune system and the nervous system cannot, of course, present all the facts needed to explain how they function. The emphasis is on demonstrating how the basic principles of an information system are implemented.
THE IMMUNE SYSTEM
The innate immune system
The defence against foreign agents, the so-called antigens, is carried out by the immune system. Some type of immunity is found in all multicellular organisms, including plants. The innate immune system comprises all immune responses that are under direct genetic control. A very basic defence is performed by phagocytes, cells that have the ability to engulf harmful intruders and digest or destroy them. In roundworms and vertebrates we can find natural killer cells. These cells express receptors on their surface that are able to recognize characteristic molecules on the surface of microbes; the recognition triggers immune responses. The complement system, another defence mechanism, consists of a number of proteins. Some of them are able to attach themselves to a microbial surface and, together with other complement proteins, attack the cell membrane and eventually kill the microbe. The lymphatic system is a system of vessels that collect tissue fluid (lymph) and return it to the blood. Lymph nodes are interspersed along these vessels.
The adaptive immune system
In mammals and birds, a second line of defense is established during the individual development of an organism. It is called adaptive immunity. B lymphocytes and T lymphocytes are the specialized cells of the adaptive immune system, charged with protection against protein antigens. Phagocytes like macrophages and dendritic cells act as antigen-presenting cells (APC). Within these cells, proteins derived from engulfed and digested viruses and bacteria, but also from the metabolism of the cell itself, are broken down into peptides. Immunodominant peptides are those that can be embedded in a cleft of an MHC protein molecule and then be brought to the cell surface for presentation, where they act as a coded address of the APC. The term MHC stands for “major histocompatibility complex”, because these molecules were first found in tissue transplantation studies. MHC proteins are found in all nucleated cells of mammals.
The main players on the immune scene are derived from self-renewing stem cells and share their genes with all other cells of the multicellular body. The lymphocyte stem cells live in the bone marrow. The differentiation of B cells takes place in the bone marrow, while the T cells move to the thymus gland. During development into prelymphocytes, a drastic change occurs in the genes responsible for the expression of antigen receptors: only a selection of small parts of these genes is transcribed and translated, in different genetic recombinations in each individual cell. The variety of antigen receptor proteins produced is so great that practically all possible antigen matches are realized. During maturation the cells proliferate, and those that cannot express the antigen receptors die. At the end of maturation one finds an enormous number of genetically different lymphocytes with structurally different antigen receptors. Since receptor molecules are produced independently of the presence of antigens, many cells will produce antigen receptors that are of no later use, and these cells die. Immature lymphocytes that bind strongly to self-antigens (antigens derived from the body's own cells) die by apoptosis (regulated cell death); lymphocytes that do not bind strongly are selected to survive. The mature lymphocytes multiply and move from their place of origin into the periphery of the lymphatic system, where they meet antigen-presenting cells like macrophages and dendritic cells, which have collected antigens during their voyage through the blood and lymph systems and now present them. There are most probably several slightly different antigen receptors that may recognize the same antigen presented by an MHC molecule.
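The logic of this paragraph (combinatorial generation of receptors without any antigen, deletion of strongly self-binding cells, then several different receptors responding to one antigen) can be sketched in Python. Everything here is a toy under stated assumptions: the segment names and counts are hypothetical, an antigen is represented by the same three "shape" slots as a receptor, and affinity is simply the number of slots that fit.

```python
from itertools import product

# Hypothetical gene-segment libraries; real receptor loci contain many more.
V_SEGMENTS = ["V1", "V2", "V3", "V4"]
D_SEGMENTS = ["D1", "D2", "D3"]
J_SEGMENTS = ["J1", "J2"]


def repertoire() -> list[tuple[str, str, str]]:
    """One receptor per V-D-J recombination, generated with no antigen present."""
    return list(product(V_SEGMENTS, D_SEGMENTS, J_SEGMENTS))


def binding_strength(receptor: tuple[str, ...], antigen: tuple[str, ...]) -> int:
    """Toy affinity: the number of slots in which receptor and antigen fit."""
    return sum(r == a for r, a in zip(receptor, antigen))


def negative_selection(receptors, self_antigens, strong: int = 3):
    """Delete (apoptosis) every receptor that binds any self-antigen strongly."""
    return [
        r for r in receptors
        if all(binding_strength(r, s) < strong for s in self_antigens)
    ]


def responders(receptors, antigen, activation: int = 2):
    """Receptors that fit a presented antigen well enough to be activated."""
    return [r for r in receptors if binding_strength(r, antigen) >= activation]
```

With the hypothetical self-antigens `("V1", "D1", "J1")` and `("V2", "D2", "J2")`, the 4 × 3 × 2 = 24 recombinations shrink to 22 mature receptors. A foreign antigen `("V3", "D1", "J2")` is then recognized by 7 slightly different receptors, mirroring the remark that several receptors may recognize the same antigen.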
The recognition of antigens on an APC by mature leucocytes induces the immune response. Various receptors on the surface of the T cell bind to corresponding molecules on the surface of the antigen-presenting cell; together they organize an immunological synapse. Molecular interactions in the synapse send a signal to the genes of the T cell, which then starts to secrete cytokines. The T cell then differentiates into either T helper cells or T killer cells and launches a clonal expansion, each clone with its own antigen-specific properties. The T killer cells kill the infected cell to which their TCR (T cell receptor) molecule has attached; helper cells activate macrophages and B cells. The immune response of the B cells is the production of antibodies, glycoprotein molecules that are secreted by the B cells and are able to neutralize antigens in the body fluids. B cells are able to deal with both protein and non-protein antigens; in order to handle protein antigens, however, they have to contact T helper cells. While active leucocytes are short-lived, each type of T and B cell also produces long-lived memory cells, which survive for many years in different tissues, where they respond readily to secondary attacks by the same antigen that caused their selection at the end of their maturation.
As new naive T and B cells are continuously formed, the genetic differentiation keeps on producing specific antigen receptor proteins. The genetically newly rearranged receptors represent the storage memory of the immune system. The final selection of antigen receptors out of the many possibilities offered by the storage memory serves as the working memory. The key element in an information system is the code. In the immune information system, the role of the code is played by the immunological synapse. It mediates between the cell on which the antigen is displayed and the cell carrying a corresponding receptor. The recognition represents the process of decoding; it triggers the required immune responses (data processing). The development of each leukocyte is in itself the development of a small information system. The memory cells form a reservoir of the working memory. The whole system would not work if the cells involved did not move at the right time to the right place. This is made possible by addressin molecules expressed on endothelial cells at different anatomical sites and by homing receptors (members of the Ig superfamily) on the lymphocytes.
THE NERVOUS SYSTEM
The innate neural information system
The neuron consists of a main body and one main extension, the axon. In some neurons, the bipolar neurons, there are two axon-like branches. The main body carries additional extensions, the dendrites, which receive signals from other neurons. The axon sends signals to other neurons or to effector organs. The signal consists of a train of electric pulses of varying frequency. The signals are either on or off, depending on whether the neuron itself has received appropriate signals through its dendrites. Neural information systems occur in all multicellular animals. Signaling is the most characteristic ability of neurons. Neural signals connect sender and receiver directly; in this respect, the neural information system behaves somewhat like the hardwired addressing system of a microprocessor. Extremely primitive animals like sea anemones are equipped with neural information systems at this level: without central interconnections, these discrete information systems are spread over the surface of their body. Very early in evolutionary history, the discrete systems joined to form a system of peripheral nerves, distributed throughout the whole body and connected to a central nervous system consisting of collections of cell bodies. Smaller collections of neural cell bodies are the ganglia. Afferent (incoming sensory) neurons send signals to the central nervous system from sense organs, which receive stimuli either from the external environment or from the internal environment (body temperature, state of tension of the muscles, pain, etc.). Efferent (outgoing motor) neurons send signals from the center to effector organs like muscles or glands. In vertebrates, the central nervous system is derived from an embryonic neural tube and consists of a prominent frontal part in the head, the brain, and the spinal cord, which runs through the whole length of the body. During early development, neurons acquire their characteristic shapes.
Neurons connect to one another through synapses, typically formed between the axon of one neuron and the dendrites or cell body of another. The presynaptic neuron is the one that sends a signal across the synaptic gap to the postsynaptic neuron. Some signals activate and others inhibit. The sum of all the signals a postsynaptic neuron receives decides whether or not it sends off its own signals.
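The rule that the sum of excitatory and inhibitory inputs decides whether a neuron fires is captured by a McCulloch-Pitts-style threshold unit. This is an idealization that the essay does not name: real neurons integrate graded potentials over time. The weights and the threshold in the example are arbitrary assumptions.

```python
def fires(inputs: list[int], weights: list[float], threshold: float) -> bool:
    """Idealized neuron: weighted sum of presynaptic activity versus a threshold.

    inputs  -- 1 if a presynaptic neuron is firing, 0 if it is silent
    weights -- positive for excitatory synapses, negative for inhibitory ones
    """
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold
```

With two excitatory synapses (weight 1.0), one inhibitory synapse (weight -2.0) and a threshold of 2, the neuron fires only when both excitatory inputs are active and the inhibitory one is silent. `fires([1, 1, 0], [1.0, 1.0, -2.0], 2)` is `True`, while `fires([1, 1, 1], [1.0, 1.0, -2.0], 2)` is `False`.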
The signals passing through the axons are electric currents and are very much the same throughout the whole nervous system. In the innate neural system the neural net is hardwired. Neurotransmitters add specificity to the process of transmission: specific receptors of postsynaptic cells recognize specific neurotransmitter molecules released by the presynaptic neuron. This recognition produces effects that differ widely in their biochemical mechanism, duration of action and physiological function. Three classes of synaptic action can be distinguished. Ionotropic receptors directly open an ion channel that is part of the receptor macromolecule. These transmitter-gated channels produce the fastest and briefest type of synaptic action, lasting on average only a few milliseconds. This fast synaptic transmission mediates most motor actions and perceptual processing within the nervous system. Other neurotransmitters mediate longer-lasting effects: they act through receptors that activate signals within the postsynaptic neuron, and these effects may last from seconds to minutes. The longest-lasting form of synaptic transmission involves changes in gene transcription, changes that can persist for days or weeks. These more permanent actions are thought to involve many of the same types of receptors as the shorter-term modulatory actions of transmitters; however, they require repeated stimulation.
A reflex passes a neural signal directly from a sensor to an effector. The stretching of a sensor in a muscle causes a sensory neuron to send a signal to a motor neuron in the spinal cord. The motor neuron in turn causes the muscle to which it is attached to contract. The monosynaptic reflex shows the basic function of a hardwired information system: an incoming signal translates directly into a signal of an efferent neuron. The sensory neuron can connect to a motor neuron and at the same time to an interneuron that may inhibit the contraction of a different muscle. In this case, the signal passes through more than one synapse. Such an arrangement is meaningful when the two muscles are antagonistic, like a flexor (bender) and an extensor (stretcher), which should not contract at the same time. Reflexes are completely hardwired information systems.
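The reflex arc with its inhibitory side branch can be written as a tiny hardwired circuit of threshold units. In this Python sketch the background drive to the antagonist's motor neuron and the inhibitory weight of -2 are hypothetical choices, made only so that the inhibition is strong enough to silence it.

```python
def threshold_unit(total: float, threshold: float = 1.0) -> int:
    """Return 1 (fires) if the summed input reaches the threshold, else 0."""
    return 1 if total >= threshold else 0


def stretch_reflex(stretch: int, antagonist_drive: int = 1) -> dict[str, int]:
    """Toy hardwired circuit: a stretch excites the agonist's motor neuron
    directly (one synapse) and, through an inhibitory interneuron (two
    synapses), silences the antagonist's motor neuron."""
    sensory = stretch                        # spindle afferent fires on stretch
    agonist_motor = threshold_unit(sensory)  # monosynaptic excitation
    interneuron = threshold_unit(sensory)    # excitatory synapse onto interneuron
    # The antagonist's motor neuron sums a hypothetical background drive (+)
    # and the interneuron's inhibition (weight -2, an arbitrary choice).
    antagonist_motor = threshold_unit(antagonist_drive - 2 * interneuron)
    return {"agonist": agonist_motor, "antagonist": antagonist_motor}
```

`stretch_reflex(1)` returns `{"agonist": 1, "antagonist": 0}`: the stretched muscle contracts while its antagonist is held off. Without a stretch, `stretch_reflex(0)` leaves the antagonist free to respond to its background drive. The wiring, not any stored instruction, determines the outcome.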
The development of the nervous system.
The first rudiments of the nervous system of many animals, including vertebrates, arthropods and worms, appear very early during embryogenesis. A region along the longitudinal axis of the embryo separates itself from the remaining ectoderm and forms an elementary nervous system, whose cells still resemble the epidermal ectoderm. In vertebrates, it forms a dorsal tube with a head end and a tail end. The cells start to differentiate by forming an axon that grows towards its target, for example a developing leg. At the tip of the axon of a motor neuron, a growth cone follows a path to the target. The growth cone is both a sensory and a motor structure. Some of its navigational aids are chemically similar to those found in the lymphatic system, where lymphocytes have to find an appropriate region for maturing. Others are cell adhesion molecules, which mark the path. There is a lot of detailed experimental evidence for a very specific guiding mechanism in which both the growing axon and the target cell play their role. The growth cone moves to the vicinity of the target cells. The development of the junction between a motor neuron and the precursor of a muscle fiber, a myotube, has been thoroughly researched. Once the growth cone has reached a myotube, it starts to develop a morphologically unspecialized contact, which finally leads to the formation of a synapse. Specific synapses must also develop between neurons and interneurons. The synapse is the defining feature of a neural interaction, and the establishment of synapses is the prerequisite for the addressability of neurons. The growing embryo produces a surplus of neurons competing for targets. The target cells secrete limited amounts of neurotrophic factors, such as nerve growth factor (NGF), and different targets provide different factors. Only those neurons survive that are able to attach to a target. The neurons have a built-in suicide program (apoptosis), which starts when a neuron does not obtain enough neurotrophic factor.
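The competition for a limited supply of neurotrophic factor can be sketched as a simple allocation. The Python toy below assumes, purely for illustration, that neurons are served in order of arrival and each needs a fixed number of "units" of factor; real competition also depends on neural activity and on graded amounts of factor. The neuron names and the quantities are hypothetical.

```python
def neurotrophic_competition(arrival_order: list[str], factor_units: int,
                             units_needed: int = 1) -> tuple[list[str], list[str]]:
    """Neurons reaching the target in the given order each take units_needed
    units of factor while the supply lasts; the rest undergo apoptosis."""
    survivors: list[str] = []
    apoptotic: list[str] = []
    remaining = factor_units
    for neuron in arrival_order:
        if remaining >= units_needed:
            remaining -= units_needed
            survivors.append(neuron)
        else:
            apoptotic.append(neuron)  # not enough factor: suicide program starts
    return survivors, apoptotic
```

If a target secretes 6 units and each neuron needs 2, then `neurotrophic_competition(["n1", "n2", "n3", "n4", "n5"], 6, 2)` keeps `n1` to `n3` and sends `n4` and `n5` into apoptosis. The target, not the neurons, sets how many connections survive.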
Surplus synapses are eliminated too. We have to bear in mind that the first contact between growth cone and myotube happens when the two have reached a specific stage in their development. The afferent neurons start at sensory cells in the periphery, grow towards the central nervous system, find their way by means of a guiding system and eventually reach their specific target.
The hierarchic organization of the neural information system.
The hierarchy of the visual sensory system is a well-studied example of the hierarchic structure of a vertebrate neural information system, and it shows at the same time how signals are processed. The retina of the human eye contains about a hundred million light-sensitive cells (rods and cones), while the optic nerve holds about one million nerve fibres. Since there are roughly a hundred sensor cells for every fibre, the light signals impinging on the retina must be pre-processed before they leave the retina. This pre-processing is organized by the retina's histological structure. Bipolar cells connect at one end to the sensor cells of the retina and at the other end to the ganglion cells, whose axons form the optic nerve. Horizontal interneurons connect a certain area of adjacent sensor cells with bipolar neurons.
They organize two types of receptive fields: in one, the centre reacts to relatively dark sensory input compared with a lighter surround; in the other, the centre is relatively light and the surround darker. This arrangement increases the contrast of the sensory input. A layer of interneurons at the other end of the bipolar cells, the so-called amacrine cells, connects an area of bipolar cells with the ganglion cells. The amacrine cells are thought to be involved in the generation of directional selectivity (vectors). There are other types of ganglion cells in the retina: some have big and others small receptive fields, and others respond to colour. The connections between sensory cells and the different types of ganglion cells are hardwired in appropriate ways. It appears that the retina by itself is able to detect a moving object, at least in lower vertebrates.
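How a centre-surround arrangement increases contrast can be shown on a one-dimensional row of sensor cells. This Python sketch is a simplified discrete model, not a description of real retinal circuitry: each response is the centre's input minus the mean of its two neighbours (the surround), and off-centre cells respond with the opposite sign.

```python
def centre_surround(intensities: list[float]) -> list[float]:
    """On-centre response at each interior position: centre minus the mean
    of its two neighbours (the surround). Border positions are skipped."""
    responses = []
    for i in range(1, len(intensities) - 1):
        surround = (intensities[i - 1] + intensities[i + 1]) / 2
        responses.append(intensities[i] - surround)
    return responses


def off_centre(intensities: list[float]) -> list[float]:
    """Off-centre cells respond with the opposite sign."""
    return [-r for r in centre_surround(intensities)]
```

For a step from dark to light, `centre_surround([1, 1, 1, 1, 5, 5, 5, 5])` gives `[0.0, 0.0, -2.0, 2.0, 0.0, 0.0]`. Uniform regions produce no response, while the edge produces a dip on the dark side and a peak on the light side, which is the contrast enhancement described above.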
Projection neurons (bipolar cells) connect the visual sensors directly to the ganglion cells of the retina. From there, projection neurons carry signals via several relay stations to the cerebral cortex. At each station, horizontal interneurons integrate signals and send off branches to effector organs, while projection neurons carry the signals further up to the brain. The primary visual cortex has a layered structure with a prominent white stripe, formed by a layer of axons running between layers of cell bodies of integrating neurons. It is the primary site of visual data processing.
The relay stations reflect evolutionary stages. Early in evolution they had more important visual functions; later these functions were increasingly delegated to higher stages, in particular to the primary visual cortex. The frog’s retina recognises moving objects: a frog would starve with lots of dead flies around it, but eagerly tries to catch a moving crude dummy. The frog’s retina seems to be able to perform the vital task of getting food; this ability is not found in vertebrates with more highly developed vision. While in birds the colliculus (midbrain) is anatomically prominent and plays an important role in the visual system, in mammals many of its functions are reduced in favour of the primary visual cortex. Each level of the visual system receives signals on one side and passes them on to the following level; during this process the image gradually acquires a more detailed structure. The higher one goes up the evolutionary ladder, the more distinctly a hierarchical neural organisation expresses itself.
Not only is the retina mapped onto the cerebral cortex; there is also a topographic map of the sensory input from the surface of the body projected onto the cortex. There is also a topographic map of the motor output, in which the neurons that innervate the muscles are arranged in a pattern reflecting the pattern of the muscles of the body.
The adaptive neural information system.
In the vertebrate neural information system there is no clear separation between innate hardwired connections and connections programmable during development. Developmental studies show that the connections between visual images on the retina and actual seeing are established during the development of the individual organism. If young kittens are prevented from opening their eyes for a closely defined period after birth, they do not develop proper vision. Timing and contact with the environment are vital for postnatal development: if the kittens' eyes are covered during any later period, their vision is not impaired. The process of imprinting also provides clear evidence that specific visual images serve as cues for environmental interactions. Just after hatching, ducklings become fixed on an object as a code for recognizing their mother. Some birds learn to recognize the voice of the young birds even before these have hatched. The programmed connections are set up together with the growing nerves and are therefore closely linked to the anatomical development of the nervous system, which in turn depends on the individual body as a whole. A programmed neural connection is the address of the postsynaptic neuron.
The development of a hierarchically organised neural system depends on the possibility of programmable addresses, since new addresses have to be established at each new level. The point in time at which a contact is made then becomes instrumental for the specificity of its selection. The establishment of a code in the adaptive phase of neural development has the character of learning.
Individual development replaces evolution as the creative element in the formation of anatomical structures and neural connections. Of course the genes still provide the necessary proteins, but they do not determine the details of cellular interaction. Rather than the genes determining which cell does what at a certain point in development, the momentary cellular situation instructs the genes, by means of signals, to deliver their product. This is demonstrated very clearly by the events at the synapses. We should therefore not speak of the genome as a blueprint, nor as a construction manual for a higher organism. The genes represent the starting point of the adventure of individual development and provide the tools for it.
Signals and codes.
Signals and codes both depend on pattern recognition. Signalling and coding are directional: there is a sender and a receiver, a source and a receptor. The sender has to recognise the receiver, so the receiver must be addressable. A signal is simple; there is a one-to-one correlation between signal and address. A code is composite: it is an array of similar elements that can be arranged in different ways, and each arrangement points to a different address. An address code is established by a source selecting a suitable target. Selection during evolution results in a hardwired connection; if selection occurs during the individual development of an organism, the address is programmed. Selection is a learning process. A code is just a label for an object such as a molecule, a molecular structure, a cell, or perhaps even an organ or an organism. The recognition of a coded address is its decoding; it usually triggers a signal that serves the purpose for which the address was selected. Signals and receptors, though they have to match structurally, originated independently. There is no general coding system (in contrast to microprocessor codes) but millions of dedicated codes selected during evolution and development. The result of this set-up is the flexibility of the system.
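The distinction between a signal (one-to-one) and a code (arrangements of similar elements, each pointing to its own address) can be sketched with two lookup tables. In this Python toy, all molecule and address names are hypothetical, and the four-letter alphabet of length-three "words" is borrowed from the genetic code only for familiarity. The point is that only selected arrangements decode to anything; there is no general rule that derives an address from an arrangement.

```python
from itertools import product
from typing import Optional

# A signal: one hypothetical molecule, one address (a one-to-one pairing).
SIGNALS = {"signal_X": "receptor_X"}

# A code: arrangements of a few similar elements (here four "letters").
ELEMENTS = "ACGU"

# Only arrangements that have been selected carry an address (hypothetical
# labels). Nothing derives an address from an arrangement by rule.
SELECTED_CODES = {
    ("A", "U", "G"): "address_1",
    ("G", "U", "A"): "address_2",  # same elements, different order, new address
}


def possible_arrangements(length: int = 3) -> list[tuple[str, ...]]:
    """All arrangements of the elements with the given length (4**length)."""
    return list(product(ELEMENTS, repeat=length))


def decode(code) -> Optional[str]:
    """Look up a coded address; None means no address was ever selected."""
    return SELECTED_CODES.get(tuple(code))
```

Three positions over four elements give 64 possible arrangements. `decode("AUG")` and `decode("GUA")` point to different addresses, while an unselected arrangement such as `decode("CCC")` means nothing (`None`). The dictionary is a set of dedicated codes rather than a general coding system.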
The neural code defines the address of a neuron; it consists of all the synaptic contacts that are necessary for sending impulses along the axon and dendrites to a postsynaptic neuron. The code serves its purpose only within its anatomical context, that is, the specific origin of the signals from the presynaptic neurons and, ultimately, the sensory cells that were involved in the primary signals. This gives the neural signal its semantic component and turns the signalled message into information.
In the cerebral cortex, signals do not necessarily trigger an action; they may contribute to the formation of an address at a higher level. If several axons converge on a single neuron and repeatedly fire simultaneously, they can form a specific synaptic pattern, a code for the address of an associated cell. The strictly ordered anatomy of the brain determines which neurons have a chance to associate with each other. The further upwards signals move in the hierarchy, the more hardwiring is replaced by the programming of interneural connections.
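The formation of a synaptic pattern by repeated simultaneous firing can be illustrated with a Hebbian learning rule, a standard technique swapped in here (the essay does not name it): a synapse is strengthened only when its input is active while the postsynaptic neuron fires. The Python sketch assumes that during training the postsynaptic neuron is made to fire by some other input. The learning rate and the recognition threshold are arbitrary choices.

```python
def hebbian_train(episodes: list[tuple[list[int], bool]], n_inputs: int,
                  rate: float = 0.25) -> list[float]:
    """Hebb-style rule: w_i grows by rate * x_i * y, so a synapse strengthens
    only when its input (x_i = 1) and the postsynaptic neuron (y = 1) are
    active together."""
    weights = [0.0] * n_inputs
    for pattern, post_fires in episodes:
        y = 1 if post_fires else 0
        for i, x in enumerate(pattern):
            weights[i] += rate * x * y
    return weights


def responds(weights: list[float], pattern: list[int],
             threshold: float = 1.5) -> bool:
    """After training, does this input pattern alone make the neuron fire?"""
    return sum(w * x for w, x in zip(weights, pattern)) >= threshold
```

Training with `[([1, 1, 0, 0], True)] * 4 + [([0, 0, 1, 1], False)] * 4` yields the weights `[1.0, 1.0, 0.0, 0.0]`. The first two inputs fired repeatedly together with the neuron, while the last two fired without it. Afterwards the neuron responds to `[1, 1, 0, 0]` but neither to one of those inputs alone nor to `[0, 0, 1, 1]`: the converging pattern has become its address.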
The memory
Recognition requires a memory that defines what is to be recognised. The memory is situated between coding and decoding.
The m-RNA acts as the coded working memory of the genetic information system. Its base triplets are recognised by the anticodons of t-RNA molecules. Recognition is a process of decoding: it results in the translation of a triplet into an amino acid, which is joined to the previously translated amino acid.
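The decoding of m-RNA triplets can be sketched as a table lookup. The table below is only a small subset of the standard genetic code (a complete version needs all 64 codons), and the sketch ignores the molecular machinery (ribosome, t-RNA charging) entirely. `CODON_TABLE` and `translate` are hypothetical names.

```python
# Subset of the standard genetic code: m-RNA codon -> one-letter amino acid.
# None marks a stop codon. A full table has 64 entries.
CODON_TABLE = {
    "AUG": "M",              # methionine, also the start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G",              # glycine
    "GCU": "A",              # alanine
    "AAA": "K",              # lysine
    "UGG": "W",              # tryptophan
    "UAA": None, "UAG": None, "UGA": None,  # stop
}

def translate(mrna: str) -> str:
    """Read triplets from the first AUG until a stop codon; return the peptide."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:   # stop codon: decoding ends
            break
        peptide.append(amino_acid)  # joined to the previously translated amino acid
    return "".join(peptide)

print(translate("GGAUGUUUGGCAAAUAAGC"))  # MFGK
```

Each triplet is an address into the table, and the amino acid it yields is appended to the growing chain, which mirrors the decode-then-join step described above.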
In the adaptive immune system, the antigen receptor on a T cell is the coded working memory. The recognition of the presented antigen is the decoding. As a result of decoding, activated T cells start to secrete cytokines and differentiate into helper, killer and memory T cells.
In an innate neural information system, the interneuron carries the signal from a sensory neuron to the effector. The interneuron is a memory cell, since it remembers (hardwired) how the signal is processed. A programmed memory code works the same way: it finds the addressed neuron and triggers an effector neuron, for example a motor neuron.
The adaptive neural information system is organized on top of the innate system and uses its functions. Innate reflexes can be modified by efferent signals from higher levels. The conditioned reflexes displayed by Pavlov's dogs are examples of such a modification. The salivation reflex makes the dogs salivate when food is presented. A sound that is always produced at feeding time can, after some time, trigger salivation even when no food is presented.
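The gradual acquisition of such a conditioned reflex can be illustrated with the Rescorla-Wagner model, a standard textbook model of Pavlovian conditioning that is not part of this essay's argument. In the sketch, each trial is one presentation of the sound, with or without food; the associative strength `v` of the sound moves toward 1 when food follows and toward 0 when it does not. The learning rate of 0.3 and the salivation threshold of 0.5 are assumptions.

```python
def rescorla_wagner(trials: list, rate: float = 0.3, lam: float = 1.0) -> list:
    """Associative strength of the sound after each trial.

    trials: True if food followed the sound on that trial, False otherwise.
    rate:   combined learning rate (alpha * beta in the usual notation).
    lam:    maximum associative strength the food can support.
    """
    v = 0.0
    history = []
    for food in trials:
        target = lam if food else 0.0
        v += rate * (target - v)  # change proportional to the prediction error
        history.append(v)
    return history

SALIVATION_THRESHOLD = 0.5  # hypothetical: the sound alone triggers salivation above this

acquisition = rescorla_wagner([True] * 5)
print([round(v, 3) for v in acquisition])       # [0.3, 0.51, 0.657, 0.76, 0.832]
print(acquisition[-1] > SALIVATION_THRESHOLD)   # True: the sound now triggers salivation
```

Running the same function with trials in which the sound is no longer followed by food makes `v` decay again, which corresponds to the extinction of the conditioned reflex.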
In a hierarchy of memories, a decoded memory can become part of the address of another memory one level higher, and so on step by step. Finally, signals reach the topographic maps in the cerebral cortex. The anatomy of the cerebral cortex is built in such a way that functionally related areas are also topographically adjacent, which makes meaningful connections likely. Adjacent to these maps are areas where associations between different muscles are organized, which could lead to coordinated motor programs. In a similar way, sensory maps facilitate the formation of representations of complex sensory situations. Together, populations of appropriately coded individual memory cells form an image of the entire internal and external world.
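The step-by-step hierarchy, in which the output of one decoding becomes part of the address at the next level, can be sketched as a chain of lookups. The sketch assumes that an address is a set of lower-level codes that must all be active, and that each level passes only its decoded memories upward; the level contents and function names are hypothetical.

```python
from __future__ import annotations
from typing import Iterable

# One level of the hierarchy: address (a set of lower-level codes) -> memory.
Level = dict[frozenset[str], str]

def decode_level(active: set[str], level: Level) -> set[str]:
    """Memories at this level whose complete address is present among the active codes."""
    return {memory for address, memory in level.items() if address <= active}

def decode_hierarchy(inputs: Iterable[str], levels: list[Level]) -> list[set[str]]:
    """Decode level by level; the output of one level is the input address of the next."""
    active = set(inputs)
    trace = [active]
    for level in levels:
        active = decode_level(active, level)
        trace.append(active)
    return trace

# Hypothetical example levels.
features_to_objects: Level = {
    frozenset({"red", "round"}): "apple",
    frozenset({"bell sound"}): "feeding-time cue",
}
objects_to_situations: Level = {
    frozenset({"apple", "feeding-time cue"}): "meal",
}

trace = decode_hierarchy({"red", "round", "bell sound"},
                         [features_to_objects, objects_to_situations])
print(trace[-1])  # {'meal'}
```

At the top of such a chain the decoded memory no longer names a muscle or a sensory feature but a composite situation, which is the direction the next paragraph describes.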
The higher one goes in the hierarchy, the less the memories reflect sensory situations and motor programs, and addressing them no longer triggers a physical action. Here one enters the region of what is colloquially called “The Memory”. From the point of view of the general structure of an information system, “The Memory” does not differ from lower-level information systems: it is basically a system of coded references.