The Seventh Summer School on Statistical Methods for Linguistics and Psychology

Welcome to the Seventh Summer School on Statistical Methods for Linguistics and Psychology, 11-15 September 2023.

Important notice

Please bring cash to pay for food during lunch, and also if you want to go to bakeries. Credit/debit cards and EC cards won't be accepted.

Application, dates, location

Dates: 11-15 September 2023.
Times: 9AM-5PM daily.
Location: The summer school will be held at the Griebnitzsee campus in Potsdam, at Haus 6. For train connections, consult bvg.de; the train station near the campus is called Griebnitzsee Bhf.
Application period: 30 Sept 2022 to 1 April 2023.
Schedule: download from here.

Brief history of the summer school, and motivation

The summer school was started by Shravan Vasishth in 2017, as part of a methods project funded within the SFB 1287. The summer school aims to fill a gap in statistics education, specifically within the fields of linguistics and psychology. One goal of the summer school is to provide comprehensive training in the theory and application of statistics, with a special focus on the linear mixed model. Another major goal is to make Bayesian data analysis a standard part of the toolkit for the linguistics and psychology researcher. Over time, the summer school has evolved to have at least four parallel streams: beginning and advanced courses in frequentist and Bayesian statistics. These may be expanded to more parallel sessions in future editions. We typically admit a total of 120 participants (in 2019, we had some 450 applications). In addition to the all-day courses, we regularly invite speakers to give lectures on important current issues relating to statistics. Previous editions of the summer school: 2022, 2021, 2020, 2019, 2018, 2017.

Code of conduct

All participants will be expected to follow the code of conduct, taken from StanCon 2018. Participants who have any concerns should contact any of the following instructors: Audrey Bürki, Anna Laurinavichyute, Shravan Vasishth, Bruno Nicenboim, or Reinhold Kliegl.

Invited lecturers

Phillip Alday; Douglas Bates (Prof. Bates is attending in person).

Invited keynote speakers

Grusha Prasad. Tuesday 12 Sept 2023, 5PM-6PM, in Hörsaal 05.
Title of talk: Generating and testing quantitative predictions of human language processing
Abstract: Decades of psycholinguistic research have focused on evaluating theories of human sentence processing by testing qualitative behavioral predictions. The advent of broad-coverage computational models of human sentence processing has made it possible to move beyond these qualitative predictions and derive quantitative predictions from theories. In the first part of this talk, I discuss the importance of large-scale datasets for testing such quantitative predictions, and present data from one such large-scale dataset: SAP Benchmark (Huang et al., 2023). These data suggest that word predictability alone, as estimated from neural network language models trained on uncurated datasets from the internet, cannot explain syntactic disambiguation difficulty. New modeling work suggests that this conclusion holds for models trained on datasets curated to be more developmentally plausible. In the second part of this talk, I discuss the factors that can impact our empirical estimates of processing difficulty. Focusing on the web-based platform used for data collection, I present some self-paced reading data which demonstrate that while the overall reading times and comprehension accuracy are higher on Prolific than on MTurk, there is no evidence for any interaction between the platform and effects of interest (such as garden path effects). This suggests that conclusions drawn based on participants from one platform are likely to generalize to participants on other platforms.
Michael Franke. Thursday 14 Sept 2023, 5PM-6PM, in Hörsaal 05.

Title of talk: Theory-driven statistical modeling in semantics and pragmatics (in the age of Large Language Models)
Abstract: Theoretical linguistics postulates abstract structures that successfully explain key aspects of language, such as syntax or semantics. However, the precise relation between abstract theoretical ideas and empirical data from language use is not always apparent. The first part of this talk investigates, based on case studies, how theory-driven probabilistic models can help address theoretically interesting questions at the semantics-pragmatics interface. In the second, exploratory part, the talk will address the question of how Large Language Models might be integrated into explanatory (probabilistic) models of language use.

Courses

Special short course: Introduction to Bayesian meta-analysis. Taught by Gian Luca Di Tanna. This short course has been cancelled.

Timing: Tuesday and Thursday: 3:00-4:30PM. Anyone can attend this short course.
Introduction to Bayesian data analysis (maximum 30 participants). Taught by Himanshu Yadav, assisted by Anna Laurinavichyute.

You can decide whether this course is appropriate for you by looking at the online version of this course (videos are available): see here

Prerequisites

here

must

Course Materials

here

Advanced Bayesian data analysis (maximum 30 participants). Taught by Bruno Nicenboim.

Introduction to Bayesian Data Analysis for Cognitive Science

Course Materials

here

Foundational methods in frequentist statistics (maximum 30 participants). Taught by Daniel Schad and João Veríssimo.

Winter (2019, Statistics for Linguists)

This course is not appropriate for researchers new to R or to frequentist statistics.

Course Materials

here

Advanced methods in frequentist statistics with Julia (maximum 30 participants). Taught by Reinhold Kliegl, Phillip Alday, and Doug Bates.

Applicants must have experience with linear mixed models and be interested in learning how to carry out such analyses with the Julia-based MixedModels.jl package (i.e., the analogue of the R-based lme4 package). MixedModels.jl has some significant advantages, among them: (a) a new and more efficient computational implementation, (b) speed, which is needed for, e.g., complex designs and power simulations, (c) more flexibility in selecting parsimonious mixed models, and (d) more flexibility in taking into account autocorrelations or other dependencies typical of EEG- and fMRI-based time series (under development). We do not expect profound knowledge of Julia from participants; the necessary subset of knowledge will be taught on the first day of the course. We do expect a readiness to install Julia and the confidence that, with some basic instruction, participants will be able to adapt prepared Julia scripts for their own data or to translate some of their own lme4 commands into the equivalent MixedModels.jl commands. The course will be taught in a hybrid IDE. There is already the option to execute R chunks from within Julia, meaning one needs Julia primarily to execute MixedModels.jl commands as a replacement for lme4. There is also an option to call MixedModels.jl from within R and process the resulting object like an lme4 object. Thus, much of the pre- and postprocessing (e.g., data simulation for complex experimental designs; visualization of partial-effect interactions or shrinkage effects) can be carried out in R.
Course Materials (GitHub repo from 2022): here.

Fees and accommodation

There will be a 40 Euro fee; this covers costs for coffee and snacks. Participants who are accepted are expected to arrange their own accommodation. We strongly advise participants to find a place to stay near Griebnitzsee campus, and not in Berlin. The reason is that German train personnel tend to go on strike every year around the time of the summer school. You will be better off if you can get easily to the Griebnitzsee campus.

Contact details

For any questions regarding this summer school that have not been addressed on this home page already, please contact Shravan Vasishth.

Funding

This summer school is funded by the DFG and is part of the SFB 1287, “Variability in Language and Its Limits”.