Challenge Details

Challenge Task

This challenge focuses on automatic question generation for testing reading comprehension among elementary and middle school students. The task is Natural Language Generation (NLG): given an answer and its source text, models generate a reading comprehension question. The generated questions will be compared against the ground-truth reference questions in the dataset using BLEURT (see the data tab for more about the evaluation metric).
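To make the input and output of the task concrete, here is a minimal Python sketch of a single example. It assumes hypothetical field names (`source_text`, `answer`, `question`) and an assumed `answer: ... context: ...` prompt layout of the kind often used with sequence-to-sequence models. The challenge data may use different column names, the story sentence is invented for illustration, and BLEURT scoring against the references is not shown.

```python
from dataclasses import dataclass


@dataclass
class QGExample:
    """One answer-aware question generation example (hypothetical field names)."""
    source_text: str  # story passage the question is about
    answer: str       # answer the generated question should target
    question: str     # ground-truth reference question, used for training and scoring


def build_model_input(example: QGExample) -> str:
    # Assumed prompt layout: the answer first, then the passage it comes from.
    return f"answer: {example.answer} context: {example.source_text}"


def to_training_pair(example: QGExample) -> tuple[str, str]:
    # A model learns to map the (answer, text) prompt to the reference question.
    return build_model_input(example), example.question


if __name__ == "__main__":
    ex = QGExample(
        source_text="The fox hid the golden key under the old oak tree.",
        answer="under the old oak tree",
        question="Where did the fox hide the golden key?",
    )
    src, tgt = to_training_pair(ex)
    print(src)
    print(tgt)
```

At inference time only `build_model_input` is needed; the model's generated question would then be compared with the reference `question` using the evaluation metric.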

Challenge Goal

Questions are an essential aspect of assessing and training narrative comprehension skills in young learners. However, generating reading comprehension questions is a time-consuming process, which limits the number of texts that readers are able to engage with in this way. Your work in developing question generation models will support the use of question items for improving reading comprehension, especially for texts that do not already have existing resources attached to them.

The FairytaleQA dataset is one of only a few datasets that focus on reading comprehension questions about narrative texts, specifically children’s storybooks. The dataset was annotated by experts with substantial experience in teaching and reading assessment and with backgrounds in education, psychology, and cognitive science, to ensure that the question-answer pairs were of high quality and variety. It was developed using an evidence-based theoretical framework that focuses on narrative comprehension for students from kindergarten to eighth grade.

Data Overview

The FairytaleQA dataset was created to address gaps in similar datasets, which rarely distinguish fine-grained reading skills such as understanding different narrative elements.

Of the 10,580 questions in the FairytaleQA dataset, approximately 8,900 question-answer pairs were selected for use in this challenge.

More About the Creators of the Dataset

Mark Warschauer is a Professor of Education at UC Irvine with an affiliated appointment in Informatics, and director of the Digital Learning Lab. He holds a PhD in Second Language Acquisition from the University of Hawai’i at Manoa. He is one of the most widely cited scholars in the world on digital learning topics such as computer-assisted language learning, digital literacy, the digital divide, one-to-one laptop classrooms, and artificial intelligence in education. He is a member of the National Academy of Education.

Ying Xu is an Assistant Professor of Learning Sciences & Technology at the University of Michigan School of Education. She holds a PhD in Language, Literacy, and Technology from the University of California, Irvine. Her research lies at the intersection of education, psychology, and human-computer interaction, focusing on the design and evaluation of technologies that promote language and literacy development, STEM learning, and wellbeing for children and families.


The Learning Agency Lab would like to thank the following individuals for their support in making this data science challenge a reality: Dr. Mark Warschauer, Dr. Ying Xu, Dr. Scott Crossley, Dr. Katie McCarthy, Dr. Ben Shapiro, Dr. Mihai Dascalu, Dr. Stefan Ruseti, Dr. Ryan Baker, Dr. Caitlin Mills, Dr. Thanaporn Patikorn, Joon Suh Choi, Priyank Sharma, and Cece Dye.