Chapter 17 Introduction to computational cognitive modeling

Until this point in the book, we have been discussing models that specify a generative process for the observed data. This generative process could be as simple as \(Y \sim Normal(\mu,\sigma)\), or it could be an elaborate hierarchical model that incorporates multiple variance components. Usually, in these kinds of models, what is of interest is a parameter that represents a so-called “effect.” Examples that we encountered in the present book are the effect of word frequency on reading time, the effect of relative clause type on reading time, and the effect of attentional load on pupil size.

One characteristic common to the models seen so far is that they specify no underlying latent cognitive process that elaborates on the generative process producing the observed dependent variable. For example, in a logistic regression, the correct or incorrect response (say, to a yes/no comprehension question) could be the result of a cascade of alternative steps taken, consciously or unconsciously, by the subject as they generate a response. To make this concrete, after seeing a target sentence, a subject could arrive at a yes/no response to a comprehension question by probabilistically processing the sentence either deeply or superficially; once the deep or shallow path is taken, the subject might end up giving either a correct or an incorrect answer (the latter by misinterpreting the meaning of the sentence). What is observed in the data is a correct or incorrect response, but the reason that particular response was given could ultimately lie in deep versus superficial processing.
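
To make the idea of a latent generative process concrete, the following sketch in R simulates the deep/shallow scenario just described. All the probabilities are hypothetical and chosen purely for illustration; this is not a claim about their actual values.

```r
## Generative sketch of a latent deep/shallow process (hypothetical values).
set.seed(123)
n_trials <- 1000
p_deep <- 0.7              # assumed probability of processing the sentence deeply
p_correct_deep <- 0.95     # assumed P(correct response | deep processing)
p_correct_shallow <- 0.55  # assumed P(correct response | shallow processing)

## First the latent processing path is chosen, then the response is generated:
deep <- rbinom(n_trials, size = 1, prob = p_deep)
correct <- rbinom(n_trials, size = 1,
                  prob = ifelse(deep == 1, p_correct_deep, p_correct_shallow))

## Only `correct` is observed; `deep` remains latent.
mean(correct)
```

In the simulated data, only the accuracy of each response is recorded; the latent variable `deep` has to be inferred, which is the kind of inference that the models discussed in this part of the book make possible.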

In this book, we use the phrase “computational cognitive modeling” to refer to generative models that specify latent (unobserved and, usually, unobservable) processes that result in a behavioral or other kind of response. Cognitive modeling as presented in this section goes beyond estimates of “effects” in the sense discussed above; the principal goal is to explain and understand how a particular cognitive process unfolds.

Unpacking the latent cognitive process that produces a response has a long history in cognitive science. For example, in sentence processing research, early models like the classic garden-path model (Frazier 1979) seek to spell out the steps that occur when the human sentence processing system (the parser) attempts to build syntactic structure incrementally when faced with a temporarily ambiguous sentence. To make this concrete, consider the sentence: “While Mary bathed the baby happily played in the living room.” Compared to the unambiguous baseline sentence “While Mary bathed, the baby happily played in the living room,” the garden-path model assumes that in the ambiguous sentence the parser initially attaches the noun phrase “the baby” as the grammatical object of the verb “bathed.” It is only when the verb “played” is encountered that the parser reassigns “the baby” as the grammatical subject of “played,” leading to the correct parse, in which Mary is the one doing the bathing rather than bathing the baby. This process of reassigning the grammatical role of “the baby” is computationally costly and is called reanalysis in sentence processing. The slowdown observed (e.g., in reading studies) at the verb “played” is often called the garden-path effect.

This kind of paper-pencil model implicitly posits an overly simplistic, deterministic parsing process: although this is never spelled out in the garden-path model, there is no provision for the parser to misparse the sentence only probabilistically when it encounters “the baby”; misparses are implicitly assumed to happen every single time such a temporarily ambiguous sentence is encountered. Such a model also does not explicitly allow alternative parsing constraints to come into play; by contrast, a computational model allows the empirical consequences of multiple parsing constraints to be considered quantitatively (e.g., Jurafsky 1996; Paape and Vasishth 2022).

Although this kind of simple paper-pencil model is an excellent start towards modeling the latent process of sentence comprehension, stopping at such a description has several disadvantages. First, no quantitative predictions can be derived; a corollary is that a slowdown at the verb “played” (due to reanalysis) of either 10 ms or 500 ms would be equally consistent with the predictions of the model—the model cannot say anything about how much time the reanalysis process would take. This is a problem for model evaluation, not least because overly large effect sizes observed in data could just be due to Type M error and therefore very misleading (Gelman and Carlin 2014; Vasishth, Mertzen, Jäger, et al. 2018a). Second, such paper-pencil models encourage an excessive (in fact, exclusive) focus on the average effect size (here, the garden-path effect); the variability among individuals (which would affect the standard error of the estimated effect from data) plays no role in determining whether the model’s prediction is consistent with the data. Third, with such verbally stated models it is often not clear what the exact assumptions are, which makes it difficult to establish whether the model’s predictions are consistent with observed patterns in the data.

The absence of quantitative predictions and the inability to quantitatively investigate individual-level variation are two major drawbacks of paper-pencil models. As Roberts and Pashler (2000) have discussed at length, a good fit of a model to data is not merely about the sign of the predicted effect being correct; the model must commit a priori to the uncertainty of the predicted effect, and the estimated effect from the data, together with its uncertainty, must be compared to the predictions of the model. A good fit requires a tightly constrained quantitative prediction derived from the model that is then validated by comparing the prediction with the data; this point has also been eloquently made by the psychologist Meehl (1997). Meehl suggests that the model make risky numerical predictions (by which he means tightly constrained quantitative predictions), which should then be compared with the observed effect size and its confidence interval—this is essentially the Roberts and Pashler (2000) criterion for a good fit.
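
As a toy numerical illustration of this criterion (all numbers below are invented), suppose a model commits a priori to a garden-path effect between 20 and 40 ms, and the data yield an estimate of 60 ms with a standard error of 15 ms; one crude way to operationalize the comparison is to check whether the interval estimate from the data is compatible with the narrow range the model has committed to.

```r
## Toy illustration of the Roberts and Pashler (2000) / Meehl (1997) criterion;
## all numbers are invented for illustration.
model_prediction <- c(lower = 20, upper = 40)  # model's a priori committed range (ms)
observed_ci <- 60 + c(-1.96, 1.96) * 15        # 95% interval for the estimated effect (ms)

## Is the observed interval compatible with the model's constrained prediction?
observed_ci[1] <= model_prediction["upper"] && observed_ci[2] >= model_prediction["lower"]
```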

For example, if a computational implementation of the garden-path model were to exist, one could derive (through prior specifications on the parameters of the model) prior predictive distributions of the garden-path effect, and compare these predictions to the estimates of the effect from a statistical model fit to the data (for an example from psycholinguistics, see Vasishth and Engelmann 2022). Using parametric variation, one could even investigate the implications of the model for individual-level differences (e.g., Yadav, Paape, et al. 2022); such implications of models are impossible to derive unless the model is implemented computationally. In the absence of an implemented model, one is reduced to classifying subjects into groups (e.g., by working memory capacity measures) and investigating average group-level effects (e.g., Caplan and Waters 1999). This makes the question a binary one: are there individual differences or are there none? The right question about individual differences is a quantitative one (Haaf and Rouder 2019).
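
To sketch what such a derivation could look like, the following R code simulates a prior predictive distribution for a garden-path effect under a deliberately simplified, hypothetical model in which reanalysis occurs probabilistically and incurs a time cost; the model structure and the priors are invented here purely for illustration and should not be read as the garden-path model’s actual commitments.

```r
## Prior predictive sketch for a hypothetical reanalysis-based model.
set.seed(42)
n_sim <- 10000
p_reanalysis <- rbeta(n_sim, shape1 = 8, shape2 = 2)              # assumed prior on P(reanalysis)
reanalysis_cost <- rlnorm(n_sim, meanlog = log(80), sdlog = 0.5)  # assumed prior on reanalysis cost (ms)

## Implied prior predictive distribution of the average slowdown at "played":
predicted_effect <- p_reanalysis * reanalysis_cost
quantile(predicted_effect, probs = c(0.025, 0.5, 0.975))
```

The resulting distribution of predicted slowdowns could then be compared with the estimate of the garden-path effect, and its uncertainty, obtained from a statistical model fit to reading data.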

There are many different classes of computational cognitive models. For example, Newell (1990) pioneered the cognitive architectures approach, in which a model of a particular cognitive process (like sentence processing) is developed within a broader computational framework that defines very general constraints on human information processing. Examples of cognitive architectures are SOAR (Laird 2019), the CAPS family of models (Just, Carpenter, and Varma 1999), and ACT-R (Anderson et al. 2004). Other approaches include connectionist models (e.g., McClelland and Rumelhart 1989) and dynamical systems-based models (e.g., Port and Van Gelder 1995; Tabor and Tanenhaus 1999; Beer 2000; Rabe et al. 2021).

In this book, we focus on Bayesian cognitive models (Lee and Wagenmakers 2014); these are distinct from models that assume that human cognitive processes themselves involve Bayesian inference (e.g., Feldman 2017). The type of model we discuss here has the characteristic that the underlying generative process spells out the latent, probabilistically occurring sub-processes. These latent processes are spelled out by specifying a Bayesian model that allows different events to happen probabilistically in each trial. One example is the class of multinomial processing tree models, which specify a sequence of possible latent sub-processes. Another example is a hierarchical finite mixture process, which specifies that in some proportion of trials the observed response comes from one distribution, and in another proportion from a different distribution. A third example is the assumption that the observed response (e.g., a reading time) is the result of an unobserved (latent) race process in the cognitive system. Probabilistic programming languages like Stan allow us to implement such latent-process models, while allowing for hierarchical structure (individual-level variability).
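
As a concrete illustration of the second example, the following R sketch generates reading times from a two-component finite mixture; the component distributions and the mixing probability are hypothetical. Only the reading times are observed; the indicator of which component generated each trial is latent (and is marginalized out when such a model is fit in Stan).

```r
## Generative sketch of a two-component mixture of reading times (hypothetical values).
set.seed(1)
n_trials <- 1000
theta <- 0.8  # assumed probability that a trial is "task-engaged"
engaged <- rbinom(n_trials, size = 1, prob = theta)

## Reading times come from one of two lognormal distributions,
## depending on the latent indicator `engaged`:
rt <- ifelse(engaged == 1,
             rlnorm(n_trials, meanlog = 6, sdlog = 0.4),  # e.g., attentive reading
             rlnorm(n_trials, meanlog = 5, sdlog = 0.3))  # e.g., inattentive skimming
summary(rt)
```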

This part of the book introduces these three types of cognitive models using Stan. In many cases, a great deal of cognitive detail is sacrificed for tractability, but this is a characteristic shared by all computational models—by definition, a model is a simplification of the underlying process being modeled (McClelland 2009b).

The broader lesson to learn from this section is that it is possible to specify an underlying generative process for the data that reflects theoretical assumptions in a particular research area. The gain is that: (i) the assumptions of the underlying theory, and their consequences, become transparent (Epstein 2008); (ii) one can derive quantitative predictions that, as Roberts and Pashler (2000) point out, are vital for model evaluation; (iii) it becomes possible (at least in principle) to eliminate competing theoretical proposals through quantitative model comparison using benchmark data (Nicenboim and Vasishth 2018; Lissón et al. 2021; Lissón et al. 2022; Yadav, Smith, et al. 2022); and (iv) the implications of models for individual-level differences can be investigated (Yadav, Paape, et al. 2022).

17.1 Further reading

General textbooks on computational modeling for cognitive science are Busemeyer and Diederich (2010) and Farrell and Lewandowsky (2018). The textbook by Lee and Wagenmakers (2014) focuses on relatively simple computational cognitive models implemented in a Bayesian framework (using the BUGS language). A good free textbook on computational modeling for cognitive science is Blokpoel and van Rooij (2021).
The entire special issue (Lee 2011a) on hierarchical Bayesian modeling in the Journal of Mathematical Psychology is highly recommended (in particular, see the article by Lee 2011b). Wilson and Collins (2019) discuss good practices in the computational modeling of behavioral data using examples from reinforcement learning. Haines et al. (2020) discuss how generative models produce higher test-retest reliability and more theoretically informative parameter estimates than do traditional methods. For an overview of the different modeling approaches in cognitive science and the relationships between them, see McClelland (2009b). Luce (1991) is a classic book that focuses on modeling response times.

References

Anderson, John R., Dan Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. 2004. “An Integrated Theory of the Mind.” Psychological Review 111 (4): 1036–60.

Beer, Randall D. 2000. “Dynamical Approaches to Cognitive Science.” Trends in Cognitive Sciences 4 (3). Elsevier: 91–99.

Blokpoel, Mark, and Iris van Rooij. 2021. Theoretical Modeling for Cognitive Science and Psychology.

Busemeyer, Jerome R, and Adele Diederich. 2010. Cognitive Modeling. Sage.

Caplan, D., and G. S. Waters. 1999. “Verbal Working Memory and Sentence Comprehension.” Behavioral and Brain Sciences 22: 77–94.

Epstein, Joshua M. 2008. “Why Model?” Journal of Artificial Societies and Social Simulation 11 (4): 12.

Farrell, Simon, and Stephan Lewandowsky. 2018. Computational Modeling of Cognition and Behavior. Cambridge University Press.

Feldman, Jacob. 2017. “What Are the ‘True’ Statistics of the Environment?” Cognitive Science 41 (7). Wiley Online Library: 1871–1903.

Frazier, Lyn. 1979. “On Comprehending Sentences: Syntactic Parsing Strategies.” PhD thesis, Amherst: University of Massachusetts.

Gelman, Andrew, and John B. Carlin. 2014. “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors.” Perspectives on Psychological Science 9 (6). SAGE Publications: 641–51.

Haaf, Julia M., and Jeffrey N. Rouder. 2019. “Some Do and Some Don’t? Accounting for Variability of Individual Difference Structures.” Psychonomic Bulletin & Review 26 (3). Springer: 772–89.

Haines, Nathaniel, Peter D Kvam, Louis H Irving, Colin Smith, Theodore P Beauchaine, Mark A Pitt, Woo-Young Ahn, and Brandon Turner. 2020. “Learning from the Reliability Paradox: How Theoretically Informed Generative Models Can Advance the Social, Behavioral, and Brain Sciences.” Unpublished. PsyArXiv.

Jurafsky, Daniel. 1996. “A Probabilistic Model of Lexical and Syntactic Access and Disambiguation.” Cognitive Science 20 (2): 137–94.

Just, M.A., P.A. Carpenter, and S. Varma. 1999. “Computational Modeling of High-Level Cognition and Brain Function.” Human Brain Mapping 8: 128–36.

Laird, John E. 2019. The Soar Cognitive Architecture. MIT Press.

Lee, Michael D., ed. 2011a. “Special Issue on Hierarchical Bayesian Models.” Journal of Mathematical Psychology 55 (1). https://www.sciencedirect.com/journal/journal-of-mathematical-psychology/vol/55/issue/1.

Lee, Michael D. 2011b. “How Cognitive Modeling Can Benefit from Hierarchical Bayesian Models.” Journal of Mathematical Psychology 55 (1). Elsevier BV: 1–7. https://doi.org/10.1016/j.jmp.2010.08.013.

Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

Lissón, Paula, Dario Paape, Dorothea Pregla, Frank Burchert, Nicole Stadie, and Shravan Vasishth. 2022. “Similarity-Based Interference in Sentence Comprehension in Aphasia: A Computational Evaluation of Two Models of Cue-Based Retrieval.”

Lissón, Paula, Dorothea Pregla, Bruno Nicenboim, Dario Paape, Mick van het Nederend, Frank Burchert, Nicole Stadie, David Caplan, and Shravan Vasishth. 2021. “A Computational Evaluation of Two Models of Retrieval Processes in Sentence Processing in Aphasia.” Cognitive Science 45 (4): e12956. https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.12956.

Luce, R Duncan. 1991. Response Times: Their Role in Inferring Elementary Mental Organization. Oxford University Press.

McClelland, James L. 2009b. “The Place of Modeling in Cognitive Science.” Topics in Cognitive Science 1 (1). Wiley Online Library: 11–38.

McClelland, James L, and David E Rumelhart. 1989. Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises. MIT Press.

Meehl, Paul E. 1997. “The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions.” In What If There Were No Significance Tests?, edited by L.L. Harlow, S.A. Mulaik, and J. H. Steiger. Mahwah, New Jersey: Erlbaum.

Newell, Allen. 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University Press.

Nicenboim, Bruno, and Shravan Vasishth. 2018. “Models of Retrieval in Sentence Comprehension: A Computational Evaluation Using Bayesian Hierarchical Modeling.” Journal of Memory and Language 99: 1–34. https://doi.org/10.1016/j.jml.2017.08.004.

Paape, Dario, and Shravan Vasishth. 2022. “Estimating the True Cost of Garden-Pathing: A Computational Model of Latent Cognitive Processes.” Cognitive Science 46 (8): e13186.

Port, Robert F, and Timothy Van Gelder. 1995. Mind as Motion: Explorations in the Dynamics of Cognition. MIT Press.

Rabe, Maximilian M., Johan Chandra, André Krügel, Stefan A. Seelig, Shravan Vasishth, and Ralf Engbert. 2021. “A Bayesian Approach to Dynamical Modeling of Eye-Movement Control in Reading of Normal, Mirrored, and Scrambled Texts.” Psychological Review. https://doi.org/10.1037/rev0000268.

Roberts, Seth, and Harold Pashler. 2000. “How Persuasive Is a Good Fit? A Comment on Theory Testing.” Psychological Review 107 (2): 358–67.

Tabor, Whitney, and Michael K Tanenhaus. 1999. “Dynamical Models of Sentence Processing.” Cognitive Science 23 (4). Wiley Online Library: 491–515.

Vasishth, Shravan, and Felix Engelmann. 2022. Sentence Comprehension as a Cognitive Process: A Computational Approach. Cambridge, UK: Cambridge University Press. https://books.google.de/books?id=6KZKzgEACAAJ.

Vasishth, Shravan, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman. 2018a. “The Statistical Significance Filter Leads to Overoptimistic Expectations of Replicability.” Journal of Memory and Language 103: 151–75. https://doi.org/10.1016/j.jml.2018.07.004.

Wilson, Robert C, and Anne GE Collins. 2019. “Ten Simple Rules for the Computational Modeling of Behavioral Data.” Edited by Timothy E Behrens. eLife 8 (November). eLife Sciences Publications, Ltd: e49547. https://doi.org/10.7554/eLife.49547.

Yadav, Himanshu, Dario Paape, Garrett Smith, Brian W. Dillon, and Shravan Vasishth. 2022. “Individual Differences in Cue Weighting in Sentence Comprehension: An Evaluation Using Approximate Bayesian Computation.” Open Mind. https://doi.org/10.1162/opmi_a_00052.

Yadav, Himanshu, Garrett Smith, Sebastian Reich, and Shravan Vasishth. 2022. “Number Feature Distortion Modulates Cue-Based Retrieval in Reading.” Journal of Memory and Language.