The Reading Process: How Essential are Letters?

Reading is such a basic, yet vital, component of our lives. Without the ability to read, we would be unable to comprehend a street sign telling us to stop, a crucial headline in the daily news, or an email telling us that the class we hate the most has been cancelled. Unfortunately, there are people whose ability to read is either impaired or entirely nonexistent. Much research has been done on the reading process and how it is affected by brain impairment; at Rice University, Dr. Simon Fischer-Baum and his team are currently studying the reading deficiencies of stroke patients. Before examining a special case of someone with a reading deficit, an understanding of the fundamentals of reading is necessary.

As English speakers, we might assume the reading process starts with the letters themselves. After all, children are commonly taught to identify each individual letter in a word and its sound. Next, the child strings the individual sounds together to pronounce the word. Finally, once the word has been identified and pronounced, the reader consults his or her mental database of words to find the meaning of the word being read.

While letters are the smallest tangible unit of the words being read, they actually depend on an even more basic concept: Abstract Letter Identities (ALIs). ALIs are representations of letters that allow a person to distinguish between different cases of the same letter, identify letters regardless of font, and know what sound the letter makes. It would appear that the ability to read is entirely contingent on one’s knowledge of these letter identities. However, certain scenarios indicate that this is not entirely true, raising questions about how much influence ALIs have on reading ability.
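The role ALIs play can be made concrete with a small sketch. The mapping below is a toy illustration, not a cognitive model: upper- and lowercase glyphs stand in for the different "visual forms" of a letter, and a real account of ALIs would also abstract over fonts and handwriting.

```python
# Toy illustration of Abstract Letter Identities (ALIs): different visual
# forms of a letter collapse onto a single identity. Upper/lowercase glyphs
# stand in for "visual forms" here; a real cognitive model would also
# abstract over fonts and handwriting styles.

ALI_TABLE = {glyph: glyph.upper()
             for glyph in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"}

def abstract_identities(word):
    """Map each glyph of a written word to its abstract letter identity."""
    return [ALI_TABLE[g] for g in word]

# 'Stop', 'STOP', and 'stop' reduce to the same identity sequence, which is
# why a reader with intact ALIs recognizes them as the same word.
print(abstract_identities("Stop"))  # ['S', 'T', 'O', 'P']
```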

Dr. Fischer-Baum’s lab is currently exploring one such scenario involving a patient named C. H. This patient suffered from a stroke a few years ago and, as a result, has a severely impaired capacity for reading. Dr. Fischer-Baum and David Kajander, a member of the research staff, have given C. H. tasks in which he reads words directly from a list, identifies words being spelled to him, and spells words that are spoken to him. However, his case is especially interesting because he processes individual letters with difficulty (for example, matching lowercase letters with their uppercase counterparts), yet he can still read to a limited extent. This presents strong evidence against the importance of ALIs in reading because it contradicts the notion that we must have some knowledge of ALIs to have any reading ability at all. It has become apparent that C. H. is using a method of reading that is not based on ALIs.

There are several methods of reading that C. H. might be using. He could be memorizing the shapes of words he encounters and mapping those shapes onto the stimuli presented to him, a process called reading by contour. If this were the case, then he should have a limited ability to read words in all capital letters, since capital letters are all the same height and width. Alternatively, C. H. could be utilizing partial ALI information and making an educated guess about the rest of the word. If that were true, then he should be better at reading uncommon words, since fewer words share the same partial letter sequence.

In order to pursue this hypothesis, Dr. Fischer-Baum’s lab gave C. H. a task derived from a paper by Dr. David Howard. Published in 1987, the paper describes a patient, T. M., who shows reading deficiencies strikingly similar to those of C. H.1 A new series of reading tasks and lexical decision tasks based on this paper required C. H. to read stimuli aloud and to determine whether or not a stimulus is a real word. The reading tasks used a total of 100 stimuli (80 words and 20 non-words), varying in length, frequency, and the ease of conjuring a mental image of the stimulus. The lexical decision tasks used 240 stimuli (120 words and 120 non-words), varying in frequency, ease of forming a mental picture, and neighborhood density (the number of words that can be created by changing one letter in the original word). Additionally, each word list was presented to C. H. in each of the following formats: vertical, lowercase, alternating case, all caps, and with plus signs between the letters. The lists built from these criteria were then presented to C. H. to determine which factors were influencing his reading.
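The neighborhood-density measure described above is straightforward to compute. The sketch below uses a tiny made-up lexicon for illustration; an actual study would draw on a full word-frequency corpus.

```python
# Sketch of the neighborhood-density measure: the number of real words
# obtainable by changing exactly one letter of the original word.
# The toy lexicon below is an illustrative assumption, not study data.

def neighborhood_density(word, lexicon):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    neighbors = set()
    for i in range(len(word)):
        for letter in alphabet:
            candidate = word[:i] + letter + word[i + 1:]
            if candidate != word and candidate in lexicon:
                neighbors.add(candidate)
    return len(neighbors)

lexicon = {"cat", "cot", "cut", "car", "can", "bat", "hat", "dog"}
print(neighborhood_density("cat", lexicon))  # 6 -> high-density word
print(neighborhood_density("dog", lexicon))  # 0 -> low-density word
```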

After the tasks were completed and the data was collected, C. H.’s results were organized by presentation style and stimuli characteristics. For reading tasks, he scored best overall on stimuli in the lowercase presentation style (30% correct) and worst overall on stimuli in the plus sign presentation style (9% correct). Second worst was his performance on the vertical presentation style (21% correct). For the lexical decision tasks, we saw that C. H. did best on stimuli in the all capital letter presentation style (79.58% correct) and worst on stimuli in the vertical presentation style (64.17% correct), although his second worst performance came in the plus sign presentation style (65% correct). Across both the reading and lexical decision tasks, he scored higher on stimuli that were more frequent, shorter in length, and easier to visualize. In the lexical decision tasks, he scored higher on low-neighborhood density items than high-neighborhood density items.

These results lead us to several crucial conclusions. First, C. H. clearly has a problem reading words that contain interrupters, as evidenced by his poor performance on the plus sign words. Second, C. H. is not using contour information to read; if he were, his worst performance should have come on the all caps reading tasks, since capital letters do not produce any distinctive word contour. Third, the evidence suggests he is indeed using a partial guessing strategy, because he performed better on low-neighborhood-density words than on high-neighborhood-density words. These conclusions are significant because they suggest further tests for C. H. More importantly, they could be especially helpful for people suffering similar reading deficits. For example, presenting information using short, common, and easily visualized words could increase the number of words such patients can successfully read, improving the chance that they interpret the information correctly. Dr. Fischer-Baum’s lab plans to perform further tasks with C. H. in order to assess his capacity for reading in context.


  1. Howard, D. Reading Without Letters. In The Cognitive Neuropsychology of Language; Lawrence Erlbaum, 1987; pp 27–58.


An Overview of the CRISPR Cas9 Genome Editing System

The clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) system is a prokaryotic acquired immune defense against viral and plasmid invasion. The CRISPR Cas9 system is highly conserved throughout bacteria and archaea. Recently, CRISPR/Cas has been utilized to edit endogenous genomes in eukaryotic species, where it has proven invaluable for in vitro and in vivo modeling. Currently, CRISPR genome editing boasts unparalleled efficiency and specificity at lower cost than other genome editing tools, including transcription activator-like effector nucleases (TALENs) and zinc finger nucleases (ZFNs). This review discusses the background theory of CRISPR and reports novel approaches to genome editing with the CRISPR system.


CRISPR as a prokaryotic adaptive immune system

CRISPR was originally discovered in bacteria1 and is now known to be present in many other prokaryotic species.2,3 CRISPR systems in bacteria have been categorized into three types, with Type II the most widely found. The essential components of a Type II CRISPR system located within a bacterial genome are the CRISPR array and a Cas9 nuclease; a third component of the Type II system is the protospacer adjacent motif on the target (foreign) DNA. The CRISPR array is composed of clusters of short DNA repeats interspaced with DNA spacer sequences.4 These spacer sequences are remnants of foreign genetic material from previous invaders and are used to identify future invaders. Upon foreign invasion, the spacer sequences are transcribed into pre-crisprRNAs (pre-crRNAs), which are further processed into mature crRNAs. These crRNAs, usually 20 nucleotides in length, play a crucial role in the specificity of CRISPR/Cas. Upstream of the CRISPR array in the bacterial genome is the gene coding for transactivating crisprRNA (tracrRNA). tracrRNA provides two essential functions: binding to mature crRNA and providing structural stability as a scaffold within the Cas9 enzyme.5

Post-transcriptional processing allows the tracrRNA and crRNA to fuse together and become embedded within the Cas9 enzyme. Cas9 is a nuclease with two active sites, each of which cleaves one strand of DNA at its phosphodiester backbone. The embedded crRNA allows Cas9 to recognize and bind to specific protospacer target sequences in foreign DNA from viral infections or horizontal gene transfers; the crRNA and the complement of the protospacer are brought together through Watson-Crick base pairing. Before the Cas9 nuclease cleaves the foreign double-stranded DNA (dsDNA), it must recognize a protospacer adjacent motif (PAM), a trinucleotide sequence. The PAM sequence usually takes the form 5’-NGG-3’ (where N is any nucleotide) and is located immediately downstream (3’) of the protospacer but not within it. Once the PAM trinucleotide is recognized, Cas9 creates a double-stranded break three nucleotides upstream of the PAM in the foreign DNA. The cleaved foreign DNA cannot be transcribed properly and is eventually degraded.5 By evolving to target and degrade a range of foreign DNA and RNA with CRISPR/Cas, bacteria have provided themselves with a remarkably broad immune defense.6

CRISPR Cas9 as an RNA-guided genome editing tool

The prokaryotic CRISPR/Cas9 system has been reconstituted in eukaryotic systems to create new possibilities for the editing of endogenous genomes. To achieve this seminal transition, the virally derived spacer sequences in the bacterial CRISPR array are replaced with 20 base pair sequences identical to target sequences in the eukaryotic genome. These spacer sequences are transcribed into guide RNA (gRNA), which functions analogously to crRNA by targeting specific eukaryotic DNA sequences of interest. The DNA coding for the tracrRNA is still found upstream of the CRISPR array. The gRNA and tracrRNA are fused together to form a single guide RNA (sgRNA) by adding a hairpin loop at their duplexing site, and the complex is then inserted into the Cas9 nuclease. Within Cas9, the tracrRNA (the 3’ end of the sgRNA) serves as a scaffold, while the gRNA (the 5’ end of the sgRNA) targets the eukaryotic DNA sequence by Watson-Crick base pairing with the complement of the protospacer (Fig. 1). As in bacterial CRISPR/Cas systems, a PAM sequence located immediately downstream of the protospacer must be recognized by the CRISPR/Cas9 complex before double-stranded cleavage occurs.5,7 Once the sequence is recognized, the Cas9 nuclease creates a double-stranded break three nucleotides upstream of the PAM in the eukaryotic DNA of interest (Fig. 1). The PAM is the main restriction on the targeting space of Cas9: since any 20 base pair sequence adjacent to a PAM is a potential target, it is theoretically possible to retarget Cas9 to other DNA sequences simply by replacing the 20 base pair gRNA.5,7
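Candidate target selection can be sketched in a few lines. The snippet below is an illustrative simplification that scans a single strand for 5'-NGG-3' PAMs, following the standard convention that the PAM lies immediately 3' of the 20-nucleotide protospacer; real guide-design tools also scan the reverse strand and score every candidate for genome-wide specificity.

```python
# Simplified single-strand scan for Cas9 candidate sites: report the
# 20-nt protospacer immediately 5' of each N-G-G PAM. Illustrative only;
# real tools also check the reverse complement and score off-targets.

def find_cas9_sites(seq, protospacer_len=20):
    sites = []
    for i in range(protospacer_len, len(seq) - 2):
        if seq[i + 1: i + 3] == "GG":               # PAM = N, G, G at i..i+2
            sites.append((seq[i - protospacer_len: i],  # protospacer
                          seq[i: i + 3]))               # PAM trinucleotide
    return sites

# A made-up sequence: one 20-nt protospacer followed by a TGG PAM.
demo = "A" * 20 + "TGG" + "CCCCC"
print(find_cas9_sites(demo))
```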

Once the DNA is cut, the cell's repair mechanisms are leveraged either to knock out a gene or to insert a new oligonucleotide into the newly formed gap. The two main pathways of double-stranded DNA lesion repair associated with CRISPR genome editing are non-homologous end joining (NHEJ) and homology directed repair (HDR). NHEJ is mainly used for gene silencing: it introduces a large number of insertion/deletion mutations, which often produce frameshifts and premature stop codons that effectively silence the gene of interest. HDR is mainly used for gene editing. By providing a DNA template in the form of a plasmid or a single-stranded oligonucleotide (ssODN), HDR can introduce desired mutations into the cleaved DNA.5

The beauty of the CRISPR system is its simplicity: it consists of a single effector nuclease and a duplex of RNA. Endogenous eukaryotic DNA can be targeted as long as it lies in proximity to a PAM. Because the goal of the system is to induce a mutation, the CRISPR Cas9 complex will cut at the site repeatedly until a mutation occurs; once it does, the site is no longer recognized by the complex and cleavage ceases.

Optimization and specificity of CRISPR/Cas systems

If CRISPR systems are to be widely adopted in research or clinical applications, concerns regarding off-target effects must be addressed. On average, this system has a target every eight bases in the human genome. Thus, virtually every gRNA has the potential for unwanted off-target activity. Current research emphasizes techniques to improve specificity, including crRNA modification, transfection optimization, and a Cas9 nickase mutation.
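The "target every eight bases" figure follows from simple probability, assuming uniform and independent base composition (an idealization; real genomes deviate from it):

```python
# Back-of-the-envelope check of the "target roughly every 8 bases" figure,
# assuming each base is A/C/G/T with equal, independent probability.
# An NGG PAM fixes two bases, so on one strand it begins at a given
# position with probability (1/4)**2 = 1/16; counting both strands
# roughly doubles the density of sites.

p_one_strand = (1 / 4) ** 2          # probability a position is followed by GG
p_both_strands = 2 * p_one_strand    # an NGG PAM on either strand
expected_spacing = 1 / p_both_strands
print(expected_spacing)  # 8.0 bases between candidate targets, on average
```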

The gRNA can be modified to minimize its off-target effects while preserving its ability to target sequences of interest. Nonspecific gRNAs can be optimized by inserting single-base substitutions that enhance their ability to bind to target sequences in a position- and base-dependent manner. Libraries of mutated genes containing all possible base substitutions along the gRNA have been generated to examine the specificity of gRNA and the enzymatic activity of Cas9. It is important to note that if mutations occur near the PAM, Cas9 nucleases do not initiate cleavage. Targeting specificity and enzymatic activity are not affected as strongly by base substitutions on the 5’ end of gRNA. This leads to the conclusion that the main contribution to specificity lies within the first ten bases after the PAM, on the 3’ end of the gRNA.5

The apparent differential specificity of Cas9 gRNA guide sequences can be quantified by an open-source online tool. This tool identifies all possible gRNA segments that target a particular DNA sequence and, using a data-driven algorithm, scores each viable segment according to its predicted specificity within the genome of interest.

Depending on the redundancy of the DNA target sequence, scoring and mutating gRNA might not provide sufficient reduction of off-target activity. Increasing the concentration of CRISPR plasmids upon transfection can provide a modest five- to sevenfold increase in on-target activity, but a much more specific system is desirable for most research and clinical applications. Transforming Cas9 from a nuclease into a nickase enzyme yields the desired specificity.5 Cas9 has two catalytic domains, each of which nicks one DNA strand; inactivating one of those domains via a D10A mutation converts Cas9 from a nuclease into a nickase.

Two Cas9 nickases (and their respective gRNAs) are required to nick complementary DNA strands simultaneously. This technique, called multiplexing, mimics a double-stranded break by inducing single-stranded breaks in close proximity to one another. Since single-stranded breaks are repaired with higher fidelity than double-stranded breaks, off-target effects caused by improper cleavage can be mitigated, leaving the majority of breaks at the sequence of interest. The two nickases should be offset 10 to 30 base pairs from each other.5 Multiplex nicking offers on-target modification rates comparable to those of wild-type Cas9 while dramatically reducing off-target modifications (1,000- to 1,500-fold).5


CRISPR/Cas9 systems have emerged as the newest genome engineering tool and have quickly been applied in in vitro and in vivo research. However, before these systems can be used in clinical applications, off-target effects must be controlled. In spite of its current shortcomings, CRISPR has proven invaluable to researchers conducting high-throughput studies of the biological function and relevance of specific genes. CRISPR Cas9 genome editing provides a rapid procedure for the functional study of mutations of interest in vitro and in vivo. Tumor suppressor genes can be knocked out, and oncogenes with specific mutations can be created, via NHEJ and HDR, respectively. The novel cell lines and mouse models created with CRISPR technologies have thus far galvanized translational research by enabling new ways of studying the genetic foundations of disease.


  1. Ishino, Y. et al. J Bacteriol. 1987, 169, 5429–5433.
  2. Mojica, F.J. et al. Mol Microbiol. 1995, 17, 85–93.
  3. Masepohl, B. et al. Biochim. Biophys. Acta. 1996, 1307, 26–30.
  4. Mojica, F.J. et al. Mol Microbiol. 2000, 36, 244–246.
  5. Cong, L. et al. Science. 2013, 6121, 819–823.
  6. Horvath, P. et al. Science. 2010, 327, 167.
  7. Ran, F.A. et al. Nat. Protoc. 2013, 8, 2281–2308.


Quantifying Cellular Processes

Molecular biology research has generated unprecedented amounts of information about the cell. One of the largest molecular databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), stores 9,736 reactions and 17,321 compounds.1 This information provides rich opportunities for application in synthetic biology, a field that engineers cells to produce large quantities of valuable compounds. Despite its strengths, synthetic biology could still be implemented more systematically and efficiently. To integrate all of the cell’s reactions into a coherent picture, researchers have developed computational models of the cell’s biochemical pathways. These models ultimately aim to simulate the pathways of a real cell so that researchers can isolate the set of reactions that produces a compound of interest.

To express all the cell’s compounds and reactions, research on metabolic networks has utilized a concept called a graph. A graph consists of two primary structures: nodes, each representing a single compound or reaction, and edges, which connect related compound and reaction nodes. For example, as shown in Figure 1, the cell’s compounds can be treated as nodes, and the reactions that transform the compounds can be treated as the edges. By simplifying chemicals into symbolic representations, graphs can analyze networks relatively quickly and with minimal computational memory. This low computational cost enables analysis not only of specific pathways but also of the entire cell.
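Such a graph has a direct computational representation. The sketch below is a toy example with illustrative node names (not real KEGG data), treating compounds and reactions as alternating node types connected by edges:

```python
# Toy bipartite compound/reaction graph: compounds and reactions are both
# nodes, and each edge links a reaction to a substrate or product.
# The metabolite and enzyme names here are illustrative placeholders.

metabolic_graph = {
    # compound -> reactions that consume it
    "glucose": ["hexokinase"],
    "G6P":     ["phosphoglucose_isomerase"],
    "F6P":     [],
    # reaction -> compounds it produces
    "hexokinase":               ["G6P"],
    "phosphoglucose_isomerase": ["F6P"],
}

def successors(node):
    """Neighbors reachable in one step, whether node is a compound or a reaction."""
    return metabolic_graph.get(node, [])

print(successors("glucose"))  # ['hexokinase']
```

Because each node is just a symbol and each edge a dictionary entry, traversing even a genome-scale network of this form is cheap in both time and memory, which is the low computational cost noted above.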

While graphs provide intuitive symbols for the reactions in metabolic networks, they were not arbitrarily invented. Current metabolic network graphs are derived from a well-studied mathematical field. Euler created graph theory in 1735, and the theorems discovered since then have enabled different methods for solving a number of mathematical problems.1 Metabolic networks research depends specifically on shortest-path algorithms, which search for the most efficient ways to reach a target node from a starting node. Shortest-path algorithms accomplish several tasks that simplify analysis of metabolic networks. They constrain the output to a finite number of pathways, and they enable output of biologically realistic pathways, which evolve to conserve energy and tend to minimize the number of intermediate compounds. Excluding convoluted pathways means avoiding unreasonably complicated and costly production methods.

However, current shortest-path algorithms must be extensively modified to generate meaningful results for metabolic networks. In a simplistic application of the shortest-path algorithm, the only parameter is the distance between two nodes, that is, the literal length of the path. Such simplicity generates multiple pathways that are biochemically impossible and in fact make little sense. This problem can be illustrated by glycolysis, a nine-step reaction pathway in the metabolism of glucose. During glycolysis, ATP, a small molecule that provides energy for many cellular reactions, is required to prime intermediates. ATP is generated in the following overall reaction: glucose + 2 NAD+ + 2 ADP + 2 Pi → 2 pyruvate + 2 ATP + 2 NADH. A simplistic graphical algorithm would suggest that glucose is converted directly into ATP in a short process, as indicated by the overall reaction equation. In reality, glycolysis is a nine-step process involving a vast number of enzymes, cofactors, allosteric regulators, covalent regulators, and environmental conditions, all of which must be encoded into the algorithm. The challenge in synthetic biology is accounting for all of these factors, and more, across the gamut of simultaneous reactions ongoing in a cell.
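The shortcut problem can be reproduced in a few lines. In the toy graph below (an invented miniature, not real pathway data), ATP appears as a shared cofactor, and an unweighted breadth-first search duly returns the biochemically meaningless two-step route through it:

```python
from collections import deque

# Toy unweighted graph where every edge counts as one reaction step.
# ATP is a hub shared by many pathways, so a naive breadth-first search
# routes through it even though no carbon actually flows that way.

graph = {
    "glucose":  ["G6P", "ATP"],   # ATP appears as a cofactor, not a product of interest
    "G6P":      ["F6P"],
    "F6P":      ["pyruvate"],
    "ATP":      ["pyruvate"],     # spurious link via the shared cofactor
    "pyruvate": [],
}

def bfs_shortest_path(graph, start, goal):
    """Return the fewest-edge path from start to goal, ignoring chemistry."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Returns the impossible shortcut glucose -> ATP -> pyruvate instead of
# the real multi-step glucose -> G6P -> F6P -> pyruvate chain.
print(bfs_shortest_path(graph, "glucose", "pyruvate"))
```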

Research has produced modifications to the simplistic shortest-path algorithm to better model biological reality. Croes et al. reduced the influence of currency metabolites by constructing a weighted graph: metabolites that had many connections throughout the graph were weighted with a greater cost.2 The algorithm, searching for pathways with least total cost, would avoid pathways that incorporated costly component metabolites. This approach correctly replicated 80% of a test set of 160 metabolic pathways known to exist in cells, a noticeable improvement over the unweighted graph.
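A connectivity-based weighting in the spirit of Croes et al. can be sketched as follows; the graph, degree values, and cost rule are illustrative assumptions rather than the published algorithm:

```python
import heapq

# Sketch of connectivity-based weighting: entering a compound costs as
# much as its degree, so highly connected currency metabolites (ATP here,
# with an exaggerated toy degree) become expensive to route through.

graph = {
    "glucose":  ["G6P", "ATP"],
    "G6P":      ["F6P"],
    "F6P":      ["pyruvate"],
    "ATP":      ["pyruvate"],
    "pyruvate": [],
}
degree = {"glucose": 2, "G6P": 2, "F6P": 2, "pyruvate": 3, "ATP": 400}

def cheapest_path(graph, degree, start, goal):
    """Dijkstra-style search for the least-cost path under degree weighting."""
    heap = [(0, [start])]
    best = {start: 0}
    while heap:
        cost, path = heapq.heappop(heap)
        if path[-1] == goal:
            return cost, path
        for nxt in graph[path[-1]]:
            new_cost = cost + degree[nxt]   # hubs are costly to enter
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, path + [nxt]))
    return None

# The weighted search avoids the ATP hub and recovers the realistic chain.
print(cheapest_path(graph, degree, "glucose", "pyruvate"))
```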

Although the weighting scheme of Croes et al. had considerable success, several years later Blum and Kohlbacher created an algorithm that combined weighting with a systematic atom-tracking algorithm.3 The researchers mapped the correspondence of atoms between every substrate and product, recording which atoms are conserved in each chemical transformation. The algorithm deleted pathways containing reactions that did not conserve a minimum number of carbon atoms during the transformation. More so than a simple weighting scheme, atom-tracking directly targeted pathways such as the glucose to ATP to ADP to pyruvate pathway, which structurally cannot occur. When tested, this new algorithm replicated actual biological pathways with greater sensitivity and specificity than those using atom-tracking or weighting techniques alone.

Metabolic pathway algorithms received yet another improvement through modifications that enabled them to identify branching pathways. Rather than proceeding linearly from the first to the final compound, pathways often split and converge again during intermediate steps. Pitkänen et al. introduced their branched-path algorithm ReTrace, which also incorporated atom-tracking data.4 Heath, Kavraki, and Bennett later utilized atom-tracking data to create algorithms with improved search time. These branched algorithms reproduced the pathways leading to several antibiotic compounds, such as penicillin, cephalosporin, and streptomycin.5

Improvements in computational models will not merely replicate a cell’s biochemistry. By generating feasible alternative pathways, algorithms should predict undiscovered reactions that the cell could perform. For this reason, graphical algorithms, after constructing a skeleton of a cell’s metabolites, should integrate methods that account for biochemical properties. Constraint-based modelling is an alternative approach to metabolic networks research that ensures that the necessary reactants for a pathway are present in the correct proportions. Such models enable researchers to test how removing an enzyme or regulating a gene impacts the quantity of the desired compound. However, unlike graphical methods, constraint-based modelling is computationally complex, which limits its scale. Future research should focus on incorporating more biochemical properties, such as atom-tracking, into graphical methods, on simplifying the constraint-based methods, or on integrating the benefits of the two approaches into a comprehensive model.

Although still incomplete, the development of a fully effective computational model to guide the cellular engineering process will have critical implications. For example, a new drug currently takes 10 to 15 years to progress from target identification through molecule optimization and approval to the market.6 Computational models that fully emulate a real cell would make synthetic biology rapid and systematic, accelerating the discovery and testing of compounds with important medical applications. Evidently, the integration of biology with mathematics will be critical to the future advancement of synthetic biology.


  1. Graph Theory. Encyclopædia Britannica. http://www.britannica.com/EBchecked/topic/242012/graph-theory (accessed Oct. 31, 2014).
  2. Croes, D. et al. J. Mol. Bio. 2006, 356, 222–236.
  3. Blum, T.; Kohlbacher, O. J. Comp. Bio. 2008, 15, 565–576.
  4. Pitkänen, E. et al. BMC Syst. Biol. 2009, 3, doi:10.1186/1752-0509-3-103.
  5. Heath, A. P. Computational Discovery and Analysis of Metabolic Pathways. Doctoral Thesis, Rice University, 2010.
  6. Drug Discovery and Development. http:// brochure_022307.pdf (accessed Feb. 14, 2015).