Sentence processing workshop (27 May 2025), University of Potsdam, Germany
This is an in-person workshop. There will be no online streaming.
Location: Besprechungsraum (room 302/303), Haus 14, Golm campus.
Program schedule
- 11:00-11:45AM Brian Dillon.
Title: Some observations about dependency
formation, and a proposal
Abstract: How do we leverage working memory to form syntactic
dependencies in comprehension? Despite an enormous amount of work on this topic,
a firm grasp on the cognitive mechanisms that allow this feat remains elusive.
In this talk, I will argue that recent empirical work in this area supports
several observations about these cognitive mechanisms: first, the context in
which linguistic items are encoded into memory matters; second, interference is
not always cue-driven; and third, interference often results in illusory
item-feature conjunctions. I suggest that these observations support theories of
the syntax-memory interface that focus on challenges that arise at encoding
rather than retrieval (e.g. Logacev & Vasishth, 2012; Hammerly et al., 2019;
Keshev et al., 2024). I will show that these observations follow naturally from
a system whereby lexical information is transiently bound to syntactic positions
in a distributed working memory system (Keshev et al., 2024).
- 11:45AM-12:00PM Coffee break
- 12:00-12:45PM Tal Linzen.
Title: Towards language models with cognitively plausible
memory
Abstract: What are the mechanisms that underlie human
language comprehension? Language models based on the transformer
architecture appear to provide a potential pathway to make progress on this
question: like the human brain, language models' behavior emerges from a
complex interaction of a massive number of simple units, and they can learn
from experience to process language quite well. Yet they also differ from
the brain in crucial ways; in particular, their working memory capacity is
vastly greater than that of humans. Because they operate under very
different constraints from humans, it is unlikely that their representations
will match the ones used by humans, and as such they are poor candidates for
models of the brain. In this talk, I will discuss research illustrating the
memory discrepancy between models and humans, and propose ideas for
mitigating this discrepancy to produce more plausible cognitive models that
can still leverage the statistical learning strength of deep learning
technology.
- 12:45-2:00PM Lunch break
- 2:00-2:45PM Titus von der Malsburg.
Title: Evaluating autoregressive transformer-based language
models on agreement attraction
Abstract: Transformers have been argued to hold some promise as
models of human language processing, but their ability to capture subtle
phenomena such as agreement attraction remains understudied. Human readers
experience an “agreement attraction effect” when an unrelated noun matches the
verb in number, making some ungrammatical sentences seem more acceptable. Prior
research found that LSTM-based language models predict human-like agreement
attraction patterns. Here, we tested a diverse range of eleven autoregressive
transformer models from four families (GPT family, Bloom, XGLM, and Gemma),
including mono- and multilingual models, on all linguistically relevant
conditions from a standard agreement attraction paradigm (Wagers et al., 2009,
expts. 2 and 3).
Across eight conditions and 384 experimental sentences, transformers diverged
substantially from human behavior and from each other. In contrast to humans,
transformers showed attraction effects in both ungrammatical and grammatical
contexts (humans show none in grammatical contexts), and they displayed these
effects equally for singular and plural subjects, whereas humans show them
reliably only with singular subjects. There was little consistency in patterns across model
families but some within families. These findings suggest caution when
generalizing transformer results to broader claims about LLMs and raise doubts
about their reliability as cognitive models in capturing finer-grained
linguistic phenomena.
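As a rough illustration of how such effects are typically quantified in autoregressive models (a sketch under our own assumptions, not necessarily the pipeline used in this talk), one can compare a model's surprisal at the verb across conditions: lower surprisal at an ungrammatical verb following a number-matching attractor noun constitutes an attraction effect. The example below uses GPT-2 via the Hugging Face transformers library, with classic illustrative sentences rather than the actual Wagers et al. (2009) materials.

# Illustrative sketch: agreement attraction in an autoregressive LM,
# measured as the surprisal of the verb phrase given the preceding context.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_surprisal(context: str, continuation: str) -> float:
    """Summed surprisal (in bits) of `continuation` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        # log-probabilities over the vocabulary at every position
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    for pos in range(ctx_ids.shape[1], ids.shape[1]):
        # the distribution at pos-1 predicts the token at pos
        total -= log_probs[0, pos - 1, ids[0, pos]].item()
    return total / math.log(2)  # nats -> bits

# Attraction effect: the ungrammatical plural verb should be less surprising
# when the attractor noun ("cabinets") matches it in number.
mismatch = continuation_surprisal("The key to the cabinet", " were rusty")
match = continuation_surprisal("The key to the cabinets", " were rusty")
print(f"attraction effect: {mismatch - match:.2f} bits")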
- 3:00-3:45PM Will Timkey.
Title: What a large-scale eye tracking benchmark reveals
about syntactic reanalysis in reading
Abstract: In this talk, I will present some results from the
new eye tracking component of the Syntactic Ambiguity Processing (SAP)
Benchmark, a dataset containing the eye movements of 368 participants
reading syntactically ambiguous sentences. The previously released
self-paced reading version of this dataset was used in Huang et al. (2024)
to show that the magnitude of garden path effects in humans cannot be
explained by predictability alone, contrary to the predictions of surprisal
theory. Unlike the self-paced reading paradigm used in the previous release,
the eye tracking while reading paradigm gives multiple measures of
difficulty associated with various stages of processing, allowing us to
specifically address whether the difficulty of any stage of sentence
processing can be reduced to predictability. I will also discuss our
large-scale analysis of the eye movement patterns we found to be associated
with syntactic disambiguation difficulty, and consider how future computational
models of syntactic ambiguity resolution might account for these patterns.
- 4:00-4:45PM Özge Bakay.
Title: Hierarchical relations guide memory retrieval:
Evidence from a local anaphor in Turkish
Abstract: Real-time sentence comprehension relies on memory
resources to establish syntactic dependencies that hold within a span of
linguistic input. While many dependencies are governed by hierarchical
constraints, it remains an open question how much hierarchical information
is stored and how exactly it is represented and accessed in memory. Here we
ask whether memory processes exploit hierarchical information that is refined
enough to capture relations between noun phrases such as “x c-commands y”
(Reinhart, 1983), by examining the real-time resolution of the
Turkish reciprocal birbirleri. Unlike existing studies on local anaphors
that confounded multiple structural cues, the current study disentangled
c-command from other structural information (clausemateness, case,
subjecthood) and a domain-general memory factor (recency) in three visual
world studies. Experiment 1 compared the availability of c-commanding
subjects and non-c-commanding, clause-mate distractors bearing the same case
marking. Results showed that c-commanding subjects were rapidly
distinguished from distractors within the reciprocal window. Experiment 2
showed that the immediate availability of c-commanding subjects extends to
c-commanding indirect objects. Experiment 3 was a pre-registered, high-power
replication that yielded similar results. We found only limited evidence for
feature-based interference, and this effect did not replicate across
experiments. Overall, we show
evidence that hierarchical, item-to-item relations between noun phrases
rapidly determine the availability of antecedents in retrieval above and
beyond clause, case and subject features.
- 5:00-5:30PM Pia Schoknecht.
Title: The time course of local coherence: Evidence from
self-paced reading times and event-related potentials
Abstract: In sentences like “The coach smiled at the player
tossed a frisbee,” the string “the player tossed a frisbee” cannot be an
active subject-verb-object (SVO) clause given the preceding context; yet,
comprehenders seem to entertain this incorrect parse, at least momentarily.
Behaviorally, this momentary mis-parse is expressed as greater processing
difficulty after the SVO phrase is read. This phenomenon, called local
coherence, has important implications for sentence processing theories that
treat grammar as a strict filter during incremental sentence processing: Under
such a strict filter, local coherence should never occur. Several studies report
the existence of local coherence in languages like English, German, and Hindi,
but one question remains unanswered: at what moment is local coherence
triggered, and how quickly does grammar override the mis-parse? We investigate
the time course of local coherence through two relatively large-scale
experiments in German (self-paced reading and EEG). Our data suggest that local
coherence, indexed by longer reading time and more positive P600, is triggered
very early, as soon as the locally coherent phrase is read. However, the locally
coherent parse does not linger, that is, it does not continue to cause
processing difficulty. A broader implication of our findings is that although
grammar is not a strict a priori filter, it rapidly steps in to correct
incremental sentence structure building.
- 5:30-6:00PM Michael Vrazitulis.
Title: A benchmark dataset for evaluating sentence
processing models in German
Abstract: We present a new benchmark dataset for
evaluating models of sentence processing. The data include a broad range of
well-established effects in German, including garden-path ambiguities,
interference effects, attachment ambiguities, and relative clause
asymmetries. Using the same experimental materials, we collected both
eye-tracking and self-paced reading (SPR) data. This dataset builds on
recent work by Huang et al. (2024), who introduced a benchmark for English
garden-path sentences and showed that surprisal-based models fail to explain
much of the observed data. In contrast, our preliminary analyses suggest
that surprisal does account for many of the observed patterns in German,
though some effects remain unexplained. The full dataset will be made
publicly available to support systematic model comparison and theory
development in sentence processing.
- 6:00-6:30PM Johan Hennert.
Title: Memory and expectation under one roof: An empirical investigation of
lossy-context surprisal theory with Russian, Hindi, and Persian data
Abstract: Memory-based theories of sentence processing primarily account for so-called
locality and/or interference effects: a signature example is increased
processing difficulty when dependency distance is increased (e.g., Grodner &
Gibson, 2005; Lewis & Vasishth, 2005). Problematic for memory-based theories is
the existence of anti-locality effects, where increased dependency distance
leads to a reduction in processing difficulty (e.g., Husain et al., 2014;
Konieczny, 2000; Vasishth & Lewis, 2006). Anti-locality is a key prediction of
the expectation-based account (Levy, 2008): increasing dependency distance can
make the upcoming word more predictable, thereby making processing easier.
Clearly, a
theory is needed that can explain both locality and anti-locality effects within
a unified framework. One such integrative theory is lossy-context surprisal
(Futrell et al., 2021), which extends the expectation-based account with a
memory component. One important open question is how well lossy-context
surprisal predicts observed data in which both memory and expectation are in
play. We present an evaluation of lossy-context surprisal theory against three
empirical studies involving Russian, Hindi, and Persian, which found evidence
for both memory- and expectation-based effects (Husain et al., 2014; Levy et
al., 2013; Safavi et al., 2016). We show that the lossy-context surprisal model, as
laid out in Futrell et al. (2021), has only limited success in explaining the
observed patterns in the data considered here. We discuss some possible ways in
which a framework like lossy-context surprisal would need to be extended to
account for the observed data.
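As background for this talk, the two quantities at issue can be sketched as follows (our notation, simplified from the cited papers rather than taken from the talk itself). Standard surprisal (Levy, 2008) conditions the prediction on the full preceding context:

S(w_t) = -\log P(w_t \mid w_1, \ldots, w_{t-1})

Lossy-context surprisal (Futrell et al., 2021) instead conditions on a lossy memory representation m_t of that context, marginalizing over the contexts c consistent with that representation:

S_{\text{lossy}}(w_t) = -\log \sum_{c} P(w_t \mid c) \, P(c \mid m_t)

On this view, locality and anti-locality effects can in principle both fall out of the same quantity, depending on how the memory representation degrades as dependency distance grows.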