Thursday, September 11, 2008

Whole-cell simulation: a grand challenge of the 21st century
Masaru Tomita

Study of the cell will never be complete unless its dynamic behavior is understood. The complex behavior of the cell cannot be determined or predicted unless a computer model of the cell is constructed and computer simulation is undertaken. Rapid accumulation of biological data from genome, proteome, transcriptome and metabolome projects can bring us to the point where it is no longer purely speculative to discuss how to construct virtual cells in silico. This article describes attempts to construct whole cell models. The E-CELL project has completed a couple of virtual cell models, and computer simulations have revealed some biological surprises.

Suppose that a certain organism’s genome has been completely sequenced. Then suppose that the structures and functions of all its gene products have been thoroughly identified. Suppose further that a giant map of all its metabolic pathways has been drawn flawlessly. Then what? Would we have conquered the cell? The answer is clearly ‘no’ because the overall ‘behavior’ of the cell would still not be understood. To say that we understand the overall behavior of the cell, we must be able to answer questions such as: ‘How would the cell behave if we change the environment, for example, by adding or reducing a certain substance?’ and ‘What is the result if a certain gene gets knocked out or over-expressed?’ Slightly more sophisticated questions include: ‘What gene needs to be inserted for the cell to behave in such a way?’ and ‘What is the ideal culture medium in which to maximize the cell’s ability to do such a thing?’ There is no doubt that computer simulation is required to understand the dynamic behavior of cellular metabolism. Virtual experiments in computers, often called ‘in silico’ experiments, are not just useful but indispensable for biology because of the large amount of data generated from the genome, proteome, transcriptome and metabolome projects.

‘Customized medicine’ based on single nucleotide polymorphisms (SNPs), in which an appropriate dose of medicine is selected and administered to a specific patient based on his/her SNP data, is no exception. When a large amount of SNP information from many different individuals is collected, the susceptibility of a patient to a certain drug can be statistically predicted by analysing a specific locus of a specific gene in his/her genome for SNPs. This statistical method is useful when a polymorphism of one specific gene causes a polymorphism of the phenotype (monogenic), in which case analysis of the genotype can lead to a precise conclusion, without in silico experiments, as to what phenotype or physical trait that person has. It is not so easy, however, to understand a phenotype in which multiple genes are involved (polygenic). For example, suppose it has been shown statistically that susceptibility to a certain drug is high (+) when a person possesses a type A gene; the susceptibility is also high (+) when the person possesses a type B gene at another locus. When a person has both A and B gene types, however, there is no guarantee that his/her susceptibility to that drug will be twice as high (+/+); it is possible that his/her susceptibility would be low (−). Furthermore, suppose having a type C gene at a different locus is statistically known to result in low susceptibility (−). Then, what will happen when a patient has A and C gene types? It is possible, for example, that the susceptibility is high (+) when the person has only A and C gene types but very low (−/−) if the person has a combination of B and C gene types, and high (+) again with a combination of A, B and C, and so on. This non-monotonic nature of the genetic system is the reason why statistical analyses alone are limited in understanding cellular behaviors.
Computer models and in silico experiments are necessary to understand and predict phenotypes of the cell, especially when they are polygenic phenotypes. After all, most biological and pathological phenomena in which the pharmaceutical industry has a great interest, such as cancer and allergy, are polygenic.
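
The A/B/C example above can be made concrete with a small sketch. The scores below are hypothetical numbers chosen only to mirror the +/− pattern described in the text (they are not data), and the function names are illustrative. The sketch compares each combined genotype with what a purely additive, one-locus-at-a-time model would predict.

```python
from itertools import combinations

# Hypothetical susceptibility scores per genotype, mirroring the +/- pattern
# in the text: +1 = high (+), -1 = low (-), -2 = very low (-/-).
OBSERVED = {
    frozenset("A"): +1,
    frozenset("B"): +1,
    frozenset("C"): -1,
    frozenset("AB"): -1,
    frozenset("AC"): +1,
    frozenset("BC"): -2,
    frozenset("ABC"): +1,
}

def additive_prediction(genotype):
    """Sum the single-locus scores, as a per-locus statistical model would."""
    return sum(OBSERVED[frozenset(gene)] for gene in genotype)

def mismatches():
    """List combined genotypes whose score differs from the additive prediction."""
    result = []
    for size in (2, 3):
        for combo in combinations("ABC", size):
            genotype = frozenset(combo)
            predicted = additive_prediction(genotype)
            observed = OBSERVED[genotype]
            if predicted != observed:
                result.append(("".join(combo), predicted, observed))
    return result

if __name__ == "__main__":
    for name, predicted, observed in mismatches():
        print(f"{name}: additive model predicts {predicted:+d}, pattern in text is {observed:+d}")
```

With these illustrative scores, every two-gene combination disagrees with the additive prediction, which is the sense in which single-locus statistics alone cannot anticipate a polygenic phenotype.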

It is still an open question as to whether or not it is feasible to construct a computer model of a whole living cell that is sufficiently sophisticated to predict answers to the types of questions mentioned above. It is thought that the task of whole-cell modeling is too difficult to be achieved in the foreseeable future. Although no attempts were made towards whole-cell modeling until the late 1990s, the importance of computer simulation of cellular metabolism has been suggested and emphasized since the 1980s. Cellular processes that are important subsystems of the cell have been modeled and simulated by many different groups: the regulation of gene expression (Refs 1–5), cell cycle (Refs 6,7), signal transduction (Ref. 8) and metabolic pathways (Refs 9–12). However, although these models made significant contributions to the development of in silico biology, the programs were only able to handle specific subsystems, and it was difficult to combine different subsystem models into one single-cell model.

The first cell model
To take on the task of whole-cell modeling directly, the E-CELL Project (Ref. 13) was initiated in 1996 at the Shonan-Fujisawa Campus of Keio University (Fujisawa, Japan), following the publication of the entire genome sequence of Mycoplasma genitalium (http://www.tigr.org/tdb/mdb/mdbcomplete.html). M. genitalium has the smallest genome (580 kb) and the smallest number of genes (~480) of all living organisms currently known, and its genomic sequences have been published (see http://www.tigr.org/). Its genome is one order of magnitude smaller than that of Escherichia coli, and thus M. genitalium is an ideal candidate for whole-cell modeling. Approximately 80% of its 480 genes share nucleotide sequence homology with other genes, so their functions could be predicted by BLAST searches. The functions of the remaining 20% of the genes, however, are still unknown, and it was soon realized that it would be difficult to reconstruct the whole organism without knowing the functions of these remaining genes. However, intensive gene-knock-out studies by The Institute for Genomic Research (TIGR) have demonstrated that many of the 480 genes are not always necessary for M. genitalium to survive. Therefore, in collaboration with TIGR, a minimal set of genes sufficient and necessary for its survival and for maintenance of its homeostasis was selected (Table 1), and the first hypothetical ‘virtual cell’ with these 127 genes was constructed (Fig. 1).

This virtual self-surviving cell (SSC) model takes up glucose into the cytoplasm, metabolizes the glucose through the glycolysis pathway and produces ATP as an energy source. The ATP is consumed mainly for protein synthesis; the 127 genes are transcribed by RNA polymerase into mRNAs, and then translated into proteins by ribosomes. Proteins are modeled to degrade spontaneously over time, so the cell has to produce protein constantly to sustain life. The membrane structure of the cell is also modeled to degrade over time; thus, the cell has a phospholipid biosynthesis pathway that takes up fatty acid and glycerol, consumes ATP and generates the phospholipid bilayer that forms the cell membrane. A constant supply of energy (ATP) is required to maintain protein and membrane synthesis, and thus glucose is essential for the survival of the virtual cell.

E-CELL simulation system
The SSC model has 105 protein-coding genes (Table 2) and 22 RNA-coding genes, and consists of 495 reaction rules. Each reaction rule defines what to do within one single time step (one millisecond, in this case). Reactions include: (1) enzymatic reactions that decrease the quantity of their substrate(s) and increase the quantity of their product(s); (2) complex formations, in which multiple substrates form a complex; (3) transport processes that change the location of certain substances; and (4) stochastic processes such as a transcription factor binding to a specific site of the chromosome. The E-CELL system also accepts user-defined reactions, making it capable of handling many other phenomena such as diffusion and variable cell volume. When simulation of the SSC starts, all the reaction rules are executed in parallel (in practice, pseudo-parallel), and the overall behavior of the cell can be observed through various graphic interfaces (Fig. 2). The dynamic changes in the amount (molecular number) of various substances inside the cell can be observed using an interface (‘Tracer Window’, presented as ‘Traced substances’). In Fig. 2, C00031E and C00186E show temporal patterns of change in extracellular glucose and lactic acid (waste end product), respectively. It can be seen that the cell is ‘living’ steadily because extracellular glucose is gradually decreasing and lactic acid is reciprocally increasing. If, at a certain time point, the extracellular glucose level is set to 0, the cell starts to starve and will eventually become incapable of glucose uptake. The activity of a specific biochemical reaction can be monitored using the ‘Reactor Window’ (Fig. 2, ‘Reactor’ panel), which shows the number of molecules being processed within a single time step. The amount of substrates can be altered by the user, even in the middle of a simulation, using the ‘Substance Window’ (Fig. 2, ‘Substance’ panel).
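
The idea of executing all reaction rules ‘in parallel’ within each time step can be sketched as follows. This is an illustration in Python, not the E-CELL implementation: the rule functions, rate constants and substance names are assumptions made for the example. Each rule reads the same snapshot of the state and returns a set of changes; the changes are summed and applied together, so the result does not depend on the order in which the rules are listed.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Rule = Callable[[State, float], Dict[str, float]]

def uptake(state: State, dt: float) -> Dict[str, float]:
    """Transport rule: move glucose from outside to inside (hypothetical rate)."""
    moved = 0.5 * state["glucose_out"] * dt
    return {"glucose_out": -moved, "glucose_in": +moved}

def convert(state: State, dt: float) -> Dict[str, float]:
    """Enzymatic rule: consume substrate, form product (hypothetical rate)."""
    used = 0.2 * state["glucose_in"] * dt
    return {"glucose_in": -used, "lactate_out": 2.0 * used}

def step(state: State, rules: List[Rule], dt: float) -> State:
    """Advance one time step; every rule sees the same pre-step snapshot."""
    snapshot = dict(state)
    total: Dict[str, float] = {}
    for rule in rules:
        for name, delta in rule(snapshot, dt).items():
            total[name] = total.get(name, 0.0) + delta
    # Apply all changes at once, clipping at zero so quantities stay non-negative.
    return {name: max(0.0, value + total.get(name, 0.0)) for name, value in state.items()}

def simulate(state: State, rules: List[Rule], dt: float = 0.001, steps: int = 1000) -> State:
    """Run `steps` time steps of length `dt` (0.001 is taken here to mean 1 ms)."""
    for _ in range(steps):
        state = step(state, rules, dt)
    return state

if __name__ == "__main__":
    cell = {"glucose_out": 100.0, "glucose_in": 0.0, "lactate_out": 0.0}
    cell = simulate(cell, [uptake, convert], steps=5000)
    cell["glucose_out"] = 0.0  # drain the medium mid-simulation, as in the text
    cell = simulate(cell, [uptake, convert], steps=5000)
    print(cell)
```

Because the state is an ordinary dictionary, a quantity can be changed between calls to `simulate`, which loosely corresponds to editing a substance in the middle of a run.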
Finally, the expression of all genes can be monitored simultaneously using the ‘Genemap Window’ (Fig. 2). Each icon represents one gene and shows the amount of mRNA of that gene (molecular number). A specific gene can be easily knocked out by clicking its icon; it is thus possible to conduct real-time knock-out experiments in silico. Clicking the icon again reactivates a knocked-out gene.

Final ‘desperate efforts’ before starvation
Even this simple cell model sometimes shows unpredictable behavior and has delivered biologically interesting surprises. When the extracellular glucose is drained and set to zero, intracellular ATP momentarily increases and then decreases (Fig. 2, ‘Traced substances’ panel, substance ID C00002). At first, this finding was confusing. Because ATP is synthesized only by the glycolysis pathway, it was assumed that ATP would decrease when the glucose, the only source of energy, becomes zero. After months of checking the simulation program and the cell model for errors, we concluded that this observation is correct and that a rapid deprivation of the glucose supply can lead to the same phenomenon in living cells. The reason is, in fact, clear. In glycolysis, one molecule of glucose produces two molecules of ATP. In more detail, however, two molecules of ATP are consumed in the first part of glycolysis and then four molecules of ATP are synthesized in the second part, producing two net molecules of ATP. Thus, when glucose is rapidly deprived, consumption of ATP stops first and then, a few moments later, ATP synthesis stops. The slight difference in time causes a transient increase in ATP. This momentary increase in the level of ATP is of great interest; it is as if the cell makes a final ‘desperate effort’ when it is rapidly starved.

Virtual erythrocytes
Obviously, the SSC model described above is only a hypothetical cell; no such cells exist in nature.
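
The transient ATP rise reported above for the SSC model on glucose withdrawal can be reproduced with a much smaller toy system. The sketch below is not the SSC model: it assumes a two-stage glycolysis with first-order kinetics, a single intermediate pool, and made-up rate constants (`k1`, `k2`, `k3`). The first stage consumes 2 ATP per glucose, the second stage yields 4 ATP, and ATP is drained by other processes at a rate proportional to its level.

```python
def simulate_atp(t_off=1.0, t_end=12.0, dt=0.001,
                 k1=1.0, k2=2.0, k3=0.5, glucose0=1.0):
    """Euler integration of a toy two-stage glycolysis; returns (times, atp) lists.

    All parameters are hypothetical. Glucose is held at `glucose0` until
    `t_off` and is then set to zero, mimicking a sudden drain of the medium.
    """
    v = k1 * glucose0        # steady-state flux through both stages
    inter = v / k2           # intermediate pool at steady state
    atp = 2.0 * v / k3       # ATP level where the net gain (2 per glucose) balances k3 * atp
    times, atp_trace = [], []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        glucose = glucose0 if t < t_off else 0.0
        v1 = k1 * glucose    # first stage: consumes 2 ATP per glucose
        v2 = k2 * inter      # second stage: yields 4 ATP per glucose equivalent
        times.append(t)
        atp_trace.append(atp)
        d_inter = v1 - v2
        d_atp = -2.0 * v1 + 4.0 * v2 - k3 * atp
        inter += d_inter * dt
        atp += d_atp * dt
    return times, atp_trace

if __name__ == "__main__":
    times, atp = simulate_atp()
    peak = max(range(len(atp)), key=atp.__getitem__)
    print(f"ATP before withdrawal: {atp[0]:.3f}")
    print(f"ATP peak: {atp[peak]:.3f} at t = {times[peak]:.3f}")
    print(f"ATP at end: {atp[-1]:.3f}")
```

In this sketch the ATP-consuming stage stops the instant glucose disappears, while the ATP-producing stage keeps running until the intermediate pool is exhausted, so ATP overshoots its steady-state level before falling towards zero.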
Thus, it was decided to model real living cells so that the simulation results could be evaluated against experiment. Human erythrocytes were chosen for the model because intracellular metabolism is limited in human erythrocytes and because they do not replicate, transcribe or translate genes; also, there are already several studies on the modeling of erythrocytes (Refs 14–16). It is possible to compare computer models with real red blood cells because a considerable amount of experimental data about red blood cells has accumulated (Refs 17,18). Recently, the construction of a prototype model of the human erythrocyte using the E-CELL System has been completed (Fig. 3). In human erythrocytes, the major metabolic pathways are glycolysis, the pentose phosphate pathway and nucleotide metabolism. In addition, there is abundant hemoglobin that carries oxygen from the lungs to peripheral tissues. Thus, the erythrocyte is a ‘bag’ packed with hemoglobin, which maintains several metabolic pathways and ion transport to preserve intracellular osmotic homeostasis. The E-CELL erythrocyte model was built by defining reaction rules for all of these pathways, based on previous erythrocyte models (Refs 14–17). All the kinetic equations and parameters used in the model were obtained from previously published experimental data. After tuning the kinetic parameters, the E-CELL erythrocyte model reached a steady state in which the quantities of intermediate metabolites inside the virtual cell are comparable with the experimental data from living erythrocytes. We are currently extending and improving the erythrocyte model for a more accurate simulation, by taking into account osmotic pressure, pH and variable cell volume.

Using the E-CELL for pathological analyses
It is possible to perform in silico experiments in which the function of an enzyme is inhibited, and to simulate the behavior of human erythrocytes from hereditary anemic patients using the E-CELL model (Fig. 4). As an example, the activity of aldolase was blocked in our virtual erythrocytes; aldolase (fructose bisphosphate aldolase) converts fructose-1,6-bisphosphate (X12) to glyceraldehyde-3-phosphate (X14) and dihydroxyacetone phosphate (X13). Aldolase-deficient humans are susceptible to hemolytic anemia. With aldolase deficiency, the reactant fructose-1,6-bisphosphate (X12) accumulates markedly, whereas the reaction products glyceraldehyde-3-phosphate (X14) and dihydroxyacetone phosphate (X13), and further metabolites downstream of this reaction, are significantly decreased. Although this is an expected result, such in silico experiments might give unexpected results, as we have seen in the SSC model.

One major advantage of these in silico experiments is that they can be repeated automatically, for all enzymes at various levels of inhibition, resulting in lists of enzyme deficiencies that could kill, perturb, or preserve the cell. The results might suggest some biologically interesting phenomena, which could then be verified by in vitro or in vivo experiments. In addition, the results would provide the kinetic sensitivities of different enzymes. If a small change in enzyme activity results in a dramatic change inside the cell, the enzymatic reaction is ‘sensitive’ and therefore would need accurate kinetic parameter values. However, other enzymatic reactions that are not kinetically sensitive would need only rough values for their kinetic parameters. In this way, it will be possible to reproduce the behavior of human erythrocytes not only in normal physiological conditions but also in abnormal pathological conditions. Thus, one could investigate how an abnormal anemic condition occurs in human erythrocytes, for example, in patients with hereditary oxygen deficiency or patients with drug-induced inhibition of specific enzymes. Because there are many known cases of anemia with abnormal erythrocytes, computer simulation will have an important role in the development of remedies for many human diseases.

SNPs and other types of genetic polymorphism could result in different enzymes with different kinetic parameters, producing different fluxes and therefore different metabolisms inside the cell. By assigning kinetic parameters based on SNP information, it might become possible to conduct ‘personalized simulation’ of an individual’s cellular metabolism. Such simulation could, in the future, play an important role in ‘customized medicine’, by predicting appropriate drugs and dosage for individual patients.
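
The automated inhibition sweep described above can be sketched on a toy pathway. The example below is not the erythrocyte model: it assumes a hypothetical linear pathway (source → S → P → sink) with first-order kinetics, a small leak from S, and invented rate constants. Enzyme `E1` plays the role that aldolase plays in the text (S standing in for its substrate, P for its products), and `activity` is the fraction of normal enzyme activity. The same `activity` mapping could, in principle, carry per-individual kinetic changes such as those implied by SNPs.

```python
import math

def steady_state(activity, influx=1.0, k1=2.0, k2=1.0, k_leak=0.1):
    """Closed-form steady state of: source -> S -(E1)-> P -(E2)-> sink, plus a leak from S.

    `activity` maps an enzyme name to its fraction of normal activity
    (1.0 = uninhibited, 0.0 = fully blocked). All constants are hypothetical.
    """
    a1 = activity.get("E1", 1.0)
    a2 = activity.get("E2", 1.0)
    s = influx / (a1 * k1 + k_leak)   # influx balances E1 conversion plus leak
    flux = a1 * k1 * s                # flux from S into P
    if a2 > 0:
        p = flux / (a2 * k2)          # flux into P balances E2 removal
    else:
        p = math.inf if flux > 0 else 0.0  # blocked E2: P accumulates without bound
    return {"S": s, "P": p}

def sweep(enzymes=("E1", "E2"), levels=(1.0, 0.5, 0.1, 0.0)):
    """Inhibit each enzyme at each level; report S and P relative to baseline."""
    baseline = steady_state({})
    rows = []
    for enzyme in enzymes:
        for level in levels:
            state = steady_state({enzyme: level})
            rows.append((enzyme, level,
                         state["S"] / baseline["S"],
                         state["P"] / baseline["P"]))
    return rows

if __name__ == "__main__":
    for enzyme, level, s_ratio, p_ratio in sweep():
        print(f"{enzyme} at {level:.0%} activity: S x{s_ratio:.2f}, P x{p_ratio:.2f}")
```

In this toy system, fully blocking `E1` makes its substrate accumulate and its product vanish, the same qualitative pattern as the aldolase example, while halving `E1` changes the product only slightly. Comparing the ratios across enzymes and levels is one simple way to rank reactions by sensitivity.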

Future prospects
In addition to the ‘virtual self-surviving cell’ and the ‘human erythrocyte model’ described above, other E-CELL models are currently under construction: a ‘mitochondria model’ and a ‘signal transduction model’ for the chemotaxis of E. coli. Examples of other successful systems for integrative simulation of the cell include DBSolve by Goryanin and colleagues (Ref. 19) and the V-Cell by Schaff and co-workers (Ref. 20).

One of the major problems in constructing large-scale cell models is the lack of quantitative data. Most of the biological knowledge available is qualitative (such as functions of genes, pathway maps, and which proteins interact with which), but for simulation, quantitative data (such as concentrations of metabolites and enzymes, flux rates, kinetic parameters and dissociation constants) are needed. A major challenge is to develop high-throughput technologies for the measurement of intracellular metabolites. A large amount of data for a variety of cell states can then be collected with these technologies to construct quantitative models, and the models can be refined iteratively until the simulation results match the data.

For this new type of simulation-orientated biology, we set up the Institute for Advanced Biosciences of Keio University (http://bioinfo.sfc.keio.ac.jp/IAB/). The institute consists of three centers, for metabolome research, bioinformatics, and genome engineering, respectively. The ultimate goal of this international research institute is to construct a whole-cell model in silico based on a large amount of data generated by high-throughput metabolome analyses, and then to design a novel genome based on the computer simulation and create real cells with the novel genome by means of genome engineering.

In the USA, the National Institute of General Medical Sciences (NIGMS) has announced the Alliance for Cellular Signaling (AFCS) for analyzing signal transduction in cardiomyocytes and B cells from the mouse. The ultimate goal of the project is ‘to create virtual cells’ of these particular cell types (http://www.nih.gov/news/pr/sep2000/nigms-05.htm). In addition, the Department of Energy (DOE) has set up a Microbial Cell Project (MCP), of which the ultimate goal is to construct a ‘virtual microbe’ in silico (http://www.microbialcellproject.org), and the Special Interest Group of Biological Simulation (SIGSIM) has been formed under the International Society for Computational Biology (ISCB) for the purpose of effective communication among scientists working on large-scale cell modeling (http://www.iscb.org).

Summary
The cell is never ‘conquered’ until its total behavior is understood and the total behavior of the cell is never understood until it is modeled and simulated. Whole-cell modeling, which was thought intractable until recently, has suddenly become realistic. There is no doubt that in silico construction of complex living cells is an exciting scientific challenge and we are just opening the door to this new area of biological research in the 21st century.
