Chapter 17 Introduction to computational cognitive modeling

Until this point in the book, we have been discussing models that specify a generative process for the observed data. This generative process could be as simple as \(Y \sim Normal(\mu,\sigma)\), or it could be an elaborate hierarchical model that incorporates multiple variance components. Usually, in these kinds of models, what is of interest is a parameter that represents a so-called “effect” of interest. Examples that we encountered in the present book are: the effect of word frequency on reading time; the effect of relative clause type on reading time; and the effect of attentional load on pupil size.
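To remind the reader what such a generative process looks like, here is a minimal simulation in R of a word frequency effect on reading times; all parameter values (intercept, slope, residual standard deviation) are invented purely for illustration.

```r
## A made-up generative process for reading times as a function of
## centered log word frequency; log-normal likelihood on the log-ms scale.
set.seed(42)
n <- 500
freq <- rnorm(n)   # centered log word frequency
alpha <- 6         # intercept (log-ms scale); invented value
beta <- -0.05      # the "effect" of interest; invented value
sigma <- 0.5       # residual standard deviation; invented value
rt <- rlnorm(n, meanlog = alpha + beta * freq, sdlog = sigma)
## The effect can be recovered with a simple regression on log RTs:
coef(lm(log(rt) ~ freq))
```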

17.1 What characterizes a computational cognitive model?

One characteristic common to the models seen so far is that no underlying latent cognitive process is specified that elaborates on the generative process producing the observed dependent variable. For example, in a logistic regression, the correct or incorrect response (say, to a yes/no comprehension question) could be the result of a cascade of alternative steps taken unconsciously (or consciously) by the subject as they generate a response. To make this concrete, a subject could give a yes/no response to a comprehension question after seeing a target sentence by probabilistically processing the sentence deeply or superficially; once the deep or superficial path is taken, the subject might end up giving either a correct or an incorrect answer (the latter by misinterpreting the meaning of the sentence). What is observed in the data is a correct or incorrect response, but the reason that a particular response was given could be the underlying deep or superficial processing.
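The following simulation sketches this hypothetical latent process in R; the probability of deep processing and the accuracy on each path are invented numbers, used only to make the generative story concrete.

```r
## A hypothetical latent-process model of yes/no comprehension responses.
## On each trial, the subject processes the sentence deeply with some
## probability, and superficially otherwise; accuracy depends on the path.
set.seed(123)
n_trials <- 1000
p_deep <- 0.7                               # invented probability of deep processing
deep <- rbinom(n_trials, size = 1, prob = p_deep)
p_correct <- ifelse(deep == 1, 0.95, 0.55)  # invented accuracies for each path
correct <- rbinom(n_trials, size = 1, prob = p_correct)
## Only `correct` is observed; `deep` is latent. The marginal accuracy
## mixes the two paths: 0.7 * 0.95 + 0.3 * 0.55 = 0.83.
mean(correct)
```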

In this book, we use the phrase “computational cognitive modeling” to refer to generative models that are implemented computationally (as a computer program), and that specify latent (unobserved and, usually, unobservable) processes that result in a behavioral or other kind of response. Cognitive modeling as presented in this section goes beyond estimates of “effects” in the sense discussed above; the principal goal is to explain and understand how a particular cognitive process unfolds, often at an individual trial level and incorporating assumptions about individual-level variation in the way the assumed latent processes are deployed.

Unpacking the latent cognitive process that produces a response has a long history in cognitive science. For example, in sentence processing research, early models like the classic garden-path model (Frazier 1979) sought to spell out the steps that occur when the human sentence processing system (the parser) attempts to build syntactic structure incrementally when faced with a temporarily ambiguous sentence. To make this concrete, consider the sentence: “While Mary bathed the baby happily played in the living room.” Compared to the unambiguous baseline sentence “While Mary bathed, the baby happily played in the living room,” the garden-path model assumes that in the ambiguous sentence the parser initially attaches the noun phrase “the baby” as the grammatical object of the verb “bathed.” It is only when the verb “played” is encountered that the parser reassigns “the baby” as the grammatical subject of “played,” leading to the correct parse, in which Mary is bathing (herself) rather than bathing the baby. This process of reassigning the grammatical role of “the baby” is computationally costly and is called reanalysis in sentence processing. The slowdown observed (e.g., in reading studies) at the verb “played” is often called the garden-path effect.

This kind of paper-pencil model implicitly posits an overly simplistic and deterministic parsing process: although this is never spelled out in the garden-path model, there is no provision for the parser to misparse the sentence only probabilistically when it encounters “the baby”; misparses are implicitly assumed to happen every single time such a temporarily ambiguous sentence is encountered. Such a model also does not explicitly allow alternative parsing constraints to come into play; by contrast, a computational model allows the empirical consequences of multiple parsing constraints to be considered quantitatively (e.g., Jurafsky 1996; Paape and Vasishth 2022).

17.2 Some advantages of taking the latent-variable modeling approach

Although this kind of simple paper-pencil model is an excellent start towards modeling the latent process of sentence comprehension, stopping with such a description has several disadvantages. First, no quantitative predictions can be derived; a corollary is that a slowdown at the verb “played” (due to reanalysis) of 10 ms or of 500 ms would be equally consistent with the predictions of the model—the model cannot say anything about how much time the reanalysis process should take. This is a problem for model evaluation, not least because overly large effect sizes observed in data could just be Type M errors and therefore very misleading (Gelman and Carlin 2014; Vasishth et al. 2018).

Second, such paper-pencil models encourage an excessive (in fact, exclusive) focus on the average effect size (here, the garden-path effect); the variability among individuals (which would affect the standard error of the effect estimated from data) plays no role in determining whether the model’s prediction is consistent with the data.

A third problem with such verbally stated models is that it is often not clear what the exact assumptions are. This makes it difficult to establish whether the model’s predictions are consistent with observed patterns in the data. As a concrete example of these hidden degrees of freedom, consider an influential theory in sentence processing, the Dependency Locality Theory or DLT (Gibson 2000). This theory explicitly assumes that new discourse referents intervening between, say, a subject and a verb will make it more difficult to complete the subject-verb dependency; the discourse status of the intervening referent is crucial for dependency completion cost to increase. A further descriptive elaboration of the model is presented in Warren and Gibson (2002); there, the authors argued that the cost of an intervening discourse referent should be graded according to its discourse accessibility. However, there was no quantitative implementation of what that implies empirically: for example, do old (as opposed to new) discourse referents eliminate dependency completion cost entirely, or is the cost attenuated by some amount (and if so, by how much)? Interestingly, in later work, the discourse status of the intervener was ignored: in Gibson and Wu (2013), the intervening discourse referents are all old (previously introduced in an immediately preceding context), but they are assumed to have the same impact on dependency completion as new discourse referents, as described in Gibson (2000). Had the model been computationally implemented, relaxing this assumption would have had a quantifiable impact on the predictions. This example also illustrates how easy it is to be misled by paper-pencil statements of theories.

Thus, the absence of quantitative predictions, the inability to quantitatively investigate individual-level variation, and hidden degrees of freedom in the model are three major drawbacks of paper-pencil models. As Roberts and Pashler (2000) have discussed at length, a good fit of a model to data is not merely about the sign of the predicted effect being correct; the model must be able to commit a priori to the uncertainty of the predicted effect, and the estimated effect from the data, along with its uncertainty, must be compared to the predictions of the model. A good fit ideally requires a tightly constrained quantitative prediction derived from the model that is then validated by comparing the prediction with unseen data; this point has also been eloquently made by the psychologist Meehl (1997). Meehl suggests that a model should make risky numerical predictions (by which he means tightly constrained quantitative predictions), which should then be compared with the observed effect size and its confidence interval—this is essentially the Roberts and Pashler (2000) criterion for a good fit.

For example, if a computational implementation of the garden-path model existed, one could derive (through prior specifications on the parameters of the model) prior predictive distributions of the garden-path effect, and compare these predictions to the estimates of the effect from a statistical model fit to new, unseen data (for an example from psycholinguistics, see Vasishth and Engelmann 2022). Using parametric variation, one could even investigate the implications of the model for individual-level differences (e.g., Yadav et al. 2022); such implications are impossible to derive unless the model is implemented computationally. In the absence of an implemented model, one is reduced to classifying subjects into groups (e.g., by working memory capacity measures) and investigating average group-level effects (e.g., Caplan and Waters 1999). This makes the question a binary one: are there individual differences or are there none? The right question to ask about individual differences is a quantitative one (Haaf and Rouder 2019).
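To make the idea of a prior predictive distribution concrete, the following R sketch derives the predictions that a hypothetical implemented garden-path model would commit to, assuming (purely for illustration) that the reanalysis cost in milliseconds has a Normal(50, 20) prior.

```r
## Prior predictive distribution of the garden-path effect under an
## invented prior on the reanalysis-cost parameter (in ms).
set.seed(2025)
n_sims <- 100000
effect <- rnorm(n_sims, mean = 50, sd = 20)
## The model now commits a priori to a range of plausible effect sizes:
quantile(effect, probs = c(0.025, 0.975))
## An estimate (with its uncertainty interval) from new, unseen data can
## then be compared against this predicted interval.
```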

17.3 Types of computational cognitive model

There are many different classes of computational cognitive model. For example, Newell (1990) pioneered the cognitive architectures approach, in which a model of a particular cognitive process (like sentence processing) is developed within a broader computational framework that defines very general constraints on human information processing. Examples of cognitive architectures are SOAR (Laird 2019), the CAPS family of models (Just, Carpenter, and Varma 1999), and ACT-R (Anderson et al. 2004). Other approaches include connectionist models (e.g., McClelland and Rumelhart 1989) and dynamical systems-based models (e.g., Port and Van Gelder 1995; Tabor and Tanenhaus 1999; Beer 2000; Engbert et al. 2005; Rabe et al. 2021, 2024).

In this book, we focus on Bayesian cognitive models (Lee and Wagenmakers 2014); these are distinct from models that assume that human cognitive processes themselves involve Bayesian inference (e.g., Feldman 2017). The type of model we discuss here has the characteristic that the underlying generative process spells out the latent, probabilistically occurring sub-processes. These latent processes are specified via a Bayesian model that allows different events to happen probabilistically in each trial. One example is the class of multinomial processing tree models, which specify a sequence of possible latent sub-processes. Another example is a hierarchical finite mixture process, which specifies that, in some proportion of trials, the observed response comes from one distribution, and in another proportion from a different distribution. A third example is the assumption that the observed response (e.g., a reading time) is the result of an unobserved (latent) race process in the cognitive system. Probabilistic programming languages like Stan allow us to implement such latent-process models, allowing for hierarchical structure (individual-level variability).
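As an illustration of the second type of latent process, the following R code simulates reading times from a two-component finite mixture; the mixing proportion and the lognormal parameters are invented for this sketch.

```r
## A two-component finite mixture of reading times: in a proportion p of
## trials the response comes from one lognormal distribution, and in the
## remaining trials from a slower one. All parameter values are invented.
set.seed(7)
n_trials <- 1000
p <- 0.8
component <- rbinom(n_trials, size = 1, prob = p)
rt <- ifelse(component == 1,
             rlnorm(n_trials, meanlog = 6.0, sdlog = 0.4),  # e.g., routine processing
             rlnorm(n_trials, meanlog = 6.6, sdlog = 0.6))  # e.g., a slower latent state
## Only rt is observed; the generating component of each trial is latent.
summary(rt)
```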

This part of the book introduces these three types of cognitive models using Stan. In many cases, a great deal of cognitive detail is sacrificed for tractability, but this is a characteristic shared by all computational models—by definition, a model is a simplification of the underlying process being modeled (McClelland 2009).

17.4 Summary

The main take-away from this section is that it is possible to specify an underlying generative process for the data that reflects theoretical assumptions in a particular research area. The gain is that: (i) the assumptions of the underlying theory, and their consequences, become transparent and explicit (Epstein 2008); (ii) one can derive quantitative predictions (along with the uncertainty of those predictions), which, as Roberts and Pashler (2000) point out, are vital for model evaluation; (iii) it becomes possible (at least in principle) to eliminate competing theoretical proposals through quantitative model comparison using benchmark data (Nicenboim and Vasishth 2018; Lissón et al. 2021; Lissón et al. 2023; Yadav et al. 2023); and (iv) the implications of models, for average behavior (e.g., Yadav et al. 2023) as well as for individual-level differences (e.g., Yadav et al. 2022), can be investigated.

17.5 Further reading

General textbooks on computational modeling for cognitive science are Busemeyer and Diederich (2010) and Farrell and Lewandowsky (2018). The textbook by Lee and Wagenmakers (2014) focuses on relatively simple computational cognitive models implemented in a Bayesian framework (using the BUGS language). A good free textbook on computational modeling for cognitive science is Blokpoel and van Rooij (2021). The entire special issue on hierarchical Bayesian modeling in the Journal of Mathematical Psychology (Lee 2011a) is highly recommended (in particular, see the article by Lee 2011b). Wilson and Collins (2019) discuss good practices in the computational modeling of behavioral data using examples from reinforcement learning. Haines et al. (2020) discuss how generative models produce higher test-retest reliability and more theoretically informative parameter estimates than do traditional methods. For an overview of the different modeling approaches in cognitive science and the relationships between them, see McClelland (2009). Luce (1991) is a classic book that focuses on modeling response times.

References

Anderson, John R., Dan Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. 2004. “An Integrated Theory of the Mind.” Psychological Review 111 (4): 1036–60.

Beer, Randall D. 2000. “Dynamical Approaches to Cognitive Science.” Trends in Cognitive Sciences 4 (3): 91–99.

Blokpoel, Mark, and Iris van Rooij. 2021. Theoretical Modeling for Cognitive Science and Psychology. https://computationalcognitivescience.github.io/lovelace/.

Busemeyer, Jerome R, and Adele Diederich. 2010. Cognitive Modeling. Sage.

Caplan, David, and G. S. Waters. 1999. “Verbal Working Memory and Sentence Comprehension.” Behavioral and Brain Sciences 22: 77–94. https://doi.org/10.1017/S0140525X99001788.

Engbert, Ralf, Antje Nuthmann, Eike M. Richter, and Reinhold Kliegl. 2005. “SWIFT: A Dynamical Model of Saccade Generation During Reading.” Psychological Review 112: 777–813. https://doi.org/10.1037/0033-295X.112.4.777.

Epstein, Joshua M. 2008. “Why Model?” Journal of Artificial Societies and Social Simulation 11 (4): 12. https://www.jasss.org/11/4/12.html.

Farrell, Simon, and Stephan Lewandowsky. 2018. Computational Modeling of Cognition and Behavior. Cambridge University Press.

Feldman, Jacob. 2017. “What Are the ‘True’ Statistics of the Environment?” Cognitive Science 41 (7): 1871–1903. https://doi.org/10.1111/cogs.12444.

Frazier, Lyn. 1979. “On Comprehending Sentences: Syntactic Parsing Strategies.” PhD thesis, Amherst: University of Massachusetts.

Gelman, Andrew, and John B. Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6): 641–51. https://doi.org/10.1177/1745691614551642.

Gibson, Edward. 2000. “Dependency Locality Theory: A Distance-Based Theory of Linguistic Complexity.” In Image, Language, Brain: Papers from the First Mind Articulation Project Symposium, edited by Alec Marantz, Yasushi Miyashita, and Wayne O’Neil. Cambridge, MA: MIT Press.

Gibson, Edward, and H.-H. Iris Wu. 2013. “Processing Chinese Relative Clauses in Context.” Language and Cognitive Processes 28 (1-2): 125–55. https://doi.org/10.1080/01690965.2010.536656.

Haaf, Julia M., and Jeffrey N. Rouder. 2019. “Some Do and Some Don’t? Accounting for Variability of Individual Difference Structures.” Psychonomic Bulletin & Review 26 (3): 772–89.

Haines, Nathaniel, Peter D. Kvam, Louis H. Irving, Colin Smith, Theodore P. Beauchaine, Mark A. Pitt, Woo-Young Ahn, and Brandon M. Turner. 2020. “Learning from the Reliability Paradox: How Theoretically Informed Generative Models Can Advance the Social, Behavioral, and Brain Sciences.” https://ccs-lab.github.io/pdfs/papers/haines2020_reliability.pdf.

Jurafsky, Daniel. 1996. “A Probabilistic Model of Lexical and Syntactic Access and Disambiguation.” Cognitive Science 20: 137–94. https://doi.org/10.1207/s15516709cog2002_1.

Just, Marcel Adam, Patricia A. Carpenter, and S. Varma. 1999. “Computational Modeling of High-Level Cognition and Brain Function.” Human Brain Mapping 8: 128–36. https://doi.org/10.1002/(SICI)1097-0193(1999)8:2/3<128::AID-HBM10>3.0.CO;2-G.

Laird, John E. 2019. The Soar Cognitive Architecture. MIT Press.

Lee, Michael D., ed. 2011a. “Special Issue on Hierarchical Bayesian Models.” Journal of Mathematical Psychology 55 (1). https://www.sciencedirect.com/journal/journal-of-mathematical-psychology/vol/55/issue/1.

Lee, Michael D. 2011b. “How Cognitive Modeling Can Benefit from Hierarchical Bayesian Models.” Journal of Mathematical Psychology 55 (1): 1–7. https://doi.org/10.1016/j.jmp.2010.08.013.

Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

Lissón, Paula, Dario Paape, Dorothea Pregla, Frank Burchert, Nicole Stadie, and Shravan Vasishth. 2023. “Similarity-Based Interference in Sentence Comprehension in Aphasia: A Computational Evaluation of Two Models of Cue-Based Retrieval.” Computational Brain and Behavior. https://doi.org/10.1007/s42113-023-00168-3.

Lissón, Paula, Dorothea Pregla, Bruno Nicenboim, Dario Paape, Mick van het Nederend, Frank Burchert, Nicole Stadie, David Caplan, and Shravan Vasishth. 2021. “A Computational Evaluation of Two Models of Retrieval Processes in Sentence Processing in Aphasia.” Cognitive Science 45 (4): e12956. https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.12956.

Luce, R. Duncan. 1991. Response Times: Their Role in Inferring Elementary Mental Organization. Oxford University Press.

McClelland, James L. 2009. “The Place of Modeling in Cognitive Science.” Topics in Cognitive Science 1 (1): 11–38. https://doi.org/10.1111/j.1756-8765.2008.01003.x.

McClelland, James L., and David E. Rumelhart. 1989. Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises. MIT Press.

Meehl, Paul E. 1997. “The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions.” In What If There Were No Significance Tests?, edited by L. L. Harlow, S. A. Mulaik, and J. H. Steiger. Mahwah, New Jersey: Erlbaum.

Newell, Allen. 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University Press.

Nicenboim, Bruno, and Shravan Vasishth. 2018. “Models of Retrieval in Sentence Comprehension: A Computational Evaluation Using Bayesian Hierarchical Modeling.” Journal of Memory and Language 99: 1–34. https://doi.org/10.1016/j.jml.2017.08.004.

Paape, Dario, and Shravan Vasishth. 2022. “Estimating the True Cost of Garden-Pathing: A Computational Model of Latent Cognitive Processes.” Cognitive Science 46 (8): e13186. https://doi.org/10.1111/cogs.13186.

Port, Robert F., and Timothy Van Gelder. 1995. Mind as Motion: Explorations in the Dynamics of Cognition. MIT Press.

Rabe, Maximilian M., Johan Chandra, André Krügel, Stefan A. Seelig, Shravan Vasishth, and Ralf Engbert. 2021. “A Bayesian Approach to Dynamical Modeling of Eye-Movement Control in Reading of Normal, Mirrored, and Scrambled Texts.” Psychological Review. https://doi.org/10.1037/rev0000268.

Rabe, Maximilian M., Dario Paape, Daniela Metzen, Shravan Vasishth, and Ralf Engbert. 2024. “SEAM: An Integrated Activation-Coupled Model of Sentence Processing and Eye Movements in Reading.” Journal of Memory and Language. https://doi.org/10.1016/j.jml.2023.104496.

Roberts, Seth, and Harold Pashler. 2000. “How Persuasive Is a Good Fit? A Comment on Theory Testing.” Psychological Review 107 (2): 358–67. https://doi.org/10.1037/0033-295X.107.2.358.

Tabor, Whitney, and Michael K. Tanenhaus. 1999. “Dynamical Models of Sentence Processing.” Cognitive Science 23 (4): 491–515.

Vasishth, Shravan, and Felix Engelmann. 2022. Sentence Comprehension as a Cognitive Process: A Computational Approach. Cambridge, UK: Cambridge University Press. https://books.google.de/books?id=6KZKzgEACAAJ.

Vasishth, Shravan, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman. 2018. “The Statistical Significance Filter Leads to Overoptimistic Expectations of Replicability.” Journal of Memory and Language 103: 151–75. https://doi.org/10.1016/j.jml.2018.07.004.

Warren, Tessa, and Edward Gibson. 2002. “The Influence of Referential Processing on Sentence Complexity.” Cognition 85: 79–112.

Wilson, Robert C., and Anne G. E. Collins. 2019. “Ten Simple Rules for the Computational Modeling of Behavioral Data.” Edited by Timothy E Behrens. eLife 8 (November): e49547. https://doi.org/10.7554/eLife.49547.

Yadav, Himanshu, Dario Paape, Garrett Smith, Brian W. Dillon, and Shravan Vasishth. 2022. “Individual Differences in Cue Weighting in Sentence Comprehension: An Evaluation Using Approximate Bayesian Computation.” Open Mind. https://doi.org/10.1162/opmi_a_00052.

Yadav, Himanshu, Garrett Smith, Sebastian Reich, and Shravan Vasishth. 2023. “Number Feature Distortion Modulates Cue-Based Retrieval in Reading.” Journal of Memory and Language 129. https://doi.org/10.1016/j.jml.2022.104400.