Molecular biology research has generated unprecedented amounts of information about the cell. One of the largest molecular databases called Kyoto Encyclopedia of Genes and Genomes (KEGG) stores 9,736 reactions and 17,321 compounds.1 This information provides rich opportunities for application in synthetic biology, a field that engineers cells to produce large quantities of valuable compounds. Despite its strengths, synthetic biology can still be implemented more systematically and efficiently. To integrate all the cell’s reactions into a coherent picture, researchers have developed computational models of the cell’s biochemical pathways. These models ultimately aim to simulate the pathways of a real cell so that researchers can isolate the set of reactions that produce a compound of interest.
To express all the cell’s compounds and reactions, research on metabolic networks has utilized a concept called a graph. A graph consists of two primary structures: nodes, each representing a single compound or reaction, and edges, which connect related compound and reaction nodes. For example, as shown in Figure 1, the cell’s compounds can be treated as nodes, and the reactions that transform the compounds can be treated as the edges. By simplifying chemicals into symbolic representations, graphs can analyze networks relatively quickly and with minimal computational memory. This low computational cost enables analysis not only of specific pathways but also of the entire cell.
While graphs provide intuitive symbols for the reactions in metabolic networks, they were not arbitrarily invented. Current metabolic network graphs are derived from a well-studied mathematical field. Euler created graph theory in 1735, and the theorems discovered since then have enabled different methods for solving a number of mathematical problems.1 Metabolic networks research depends specifically on shortest-path algorithms, which search for the most efficient ways to reach a target node from a starting node. Shortest-path algorithms accomplish several tasks that simplify analysis of metabolic networks. They constrain the output to a finite number of pathways, and they enable output of biologically realistic pathways, which evolve to conserve energy and tend to minimize the number of intermediate compounds. Excluding convoluted pathways means avoiding unreasonably complicated and costly production methods.
However, current shortest-path algorithms must be extensively modified to generate meaningful results for metabolic networks. In a simplistic application of the shortestpath algorithm, the only parameter is the distance itself between two nodes, that is, the literal shortness of the path. Such simplicity generates multiple pathways that are biochemically impossible and in fact make little sense. This problem can be illustrated by glycolysis, a nine-step reaction pathway in the metabolism of glucose. During glycolysis, ATP, a small molecule that provides energy for many cellular reactions, is required to prime intermediates. ATP is generated in the following overall reaction: glucose + 2 NAD+ + 2 ADP + 2 Pi → 2 pyruvate + 2 ATP + 2 NADH. A simplistic graphical algorithm would suggest glucose is converted directly into ATP in a short process as indicated by the overall reaction equation. The reality is that glycolysis is a nine-step process with a vast number of enzymes, cofactors, allosteric regulators, covalent regulators, and environmental conditions, all of which must be encoded into the algorithm. The challenge in synthetic biology is considering all of these factors and more for the gamut of simultaneous reactions ongoing in a cell.
Research has produced modifications to the simplistic shortest-path algorithm to better model biological reality. Croes et al. reduced the influence of currency metabolites by constructing a weighted graph: metabolites that had many connections throughout the graph were weighted with a greater cost.2 The algorithm, searching for pathways with least total cost, would avoid pathways that incorporated costly component metabolites. This approach correctly replicated 80% of a test set of 160 metabolic pathways known to exist in cells, a noticeable improvement over the unweighted graph.
Several years later, although the weighting scheme of Croes et al. had considerable success, Blum and Kohlbacher created an algorithm that combined weighting with a systematic atom-tracking algorithm.3 The researchers mapped the correspondence of atoms between every substrate and product, recording which atoms are conserved in a chemical transformation. The algorithm deleted pathways containing reactions that did not conserve a minimum number of carbon atoms during the transformation. More so than a simple weighting scheme, atomtracking directly targeted pathways such as the glucose to ATP to ADP to pyruvate pathway, which structurally cannot occur. When tested, this new algorithm replicated actual biological pathways with more sensitivity and specificity than those using atom-tracking or weighting techniques alone.
Metabolic pathway algorithms received yet another improvement through modifications that enabled them identify branching pathways. Rather than proceeding linearly from the first to the final compound, pathways often split and converge again during intermediate steps. Pitkanen et al. introduced their branched-path algorithm Retrace, which also incorporated atom-tracking data.4 Heath, Kavraki, and Bennett later utilized atom-tracking data to create algorithms with improved search time. These branched algorithms reproduced the pathways leading toward several antibiotic compounds such as penicillin, cephalosporin, and streptomycin.5
Improvements in computational models will not merely replicate a cell’s biochemistry. By generating feasible alternative pathways, algorithms should predict undiscovered reactions that the cell could perform. For this reason, graphical algorithms, after constructing a skeleton of a cell’s metabolites, should integrate methods that account for biochemical properties. Constraint-based modelling is an alternative approach to metabolic networks research that ensures that the necessary reactants for a pathway are present in the correct proportions. Such models enable researchers to test how removing an enzyme or regulating a gene can impact the quantity of the desired compound. However, unlike graphical methods, the computational complexity of constraint-based modelling gives it a limited scale. Future research would focus on incorporating more biochemical properties into graphical methods such as atom-tracking, simplifying the constraint-based methods, or integrating the benefits of the two approaches into a comprehensive model.
Although still incomplete, the development of a fully effective computational model to guide the cellular engineering process will have critical implications. For example, the process of drug target identification to molecule optimization to approval currently takes 10 to 15 years to reach the market.6 Computational models that fully emulate a real cell will make synthetic biology rapid and systematic, accelerating the discovery and testing of the important compounds with important medical applications. Evidently, the integration of biology with mathematics will be critical to the future advancement of synthetic biology.
- Graph Theory. KEGG: Kyoto Encyclopedia of Genes and Genomes. http://www.britannica. com/EBchecked/topic/242012/graph-theory (accessed Oct. 31, 2014).
- Croes, D. et al. J. Mol. Bio. 2006, 356, 222–236.
- Blum, T.; Kohlbacher, O. J. Comp. Bio. 2008, 15, 565-576.
- Pitkänen, E. et al. BMC Syst. Biol. 2009, 3, doi:10.1186/1752-0509-3-103.
- Heath, A. P. Computational discovery and analysis of metabolic pathways. Doctoral Thesis, Rice University. 2010
- Drug Discovery and Development. http:// www.phrma.org/sites/default/files/pdf/rd_ brochure_022307.pdf (accessed Feb. 14, 2015).