Data

Dataset Description

The dataset presented here comprises reading comprehension question-answer pairs for 244 classic narrative fairy tales. The training set consists of about 7,000 question-answer pairs from 201 stories. The final model evaluation at the close of the competition will be done on about 1,900 unseen question-answer pairs from 43 new stories. Each question-answer pair has been annotated as one of seven categories of narrative elements:

  • Character - questions ask test takers to identify a character of the story or to describe characteristics of characters
  • Setting - questions ask about a place or time where/when story events take place and typically start with “Where” or “When”
  • Action - questions ask about characters’ behaviors or for additional information about those behaviors
  • Feeling - questions ask about the character’s emotional status or reaction to certain events and are typically worded as “How did/does/do . . . feel”
  • Causal relationship - questions focus on two causally related events, where the prior event causally leads to the latter event in the question. These types of questions usually begin with “Why” or “What made/makes”
  • Outcome resolution - questions ask test takers to identify outcome events that are causally led to by the prior event in the question. These types of questions are usually worded as “What happened/happens/has happened...after...”
  • Prediction - questions ask for the unknown outcome of a focal event, which is predictable based on the existing information in the text.

Additionally, questions are labeled as either explicit, meaning the answer can be found directly in the source text, or implicit, meaning the answer cannot be found directly in the source text. Answering implicit questions requires either reformulating language or making inferences.



File and Field Information

source_texts.csv - The source texts organized by section.

  • source_title - the title of the story
  • cor_section - the section number of the story
  • text - the text for the section of the story

train.csv - The training set of question-answer pairs for each story, with each pair identified by a unique pair_id.

  • pair_id - the ID of the question-answer pair
  • source_title - the title of the source text
  • cor_section - the section(s) of the source text the question comes from
  • answer - the answer to the question
  • question - the question the answer responds to
  • local_or_sum - whether the question relates to one section or multiple sections
  • attribute1 - which of the seven categories of narrative elements the question falls under. Some question-answer pairs may fall under two categories
  • attribute2 - the second attribute (if any)
  • ex_or_im - whether the question is explicit or implicit (i.e. whether or not the answer is directly in the text)
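
To make the file layout concrete, here is a minimal sketch of loading the training pairs and attaching the corresponding source-text sections with pandas. The file names and column names are taken from the field lists above; the "local" value used to filter single-section questions is an assumption about how local_or_sum is encoded, and multi-section ("sum") questions would need extra handling.

```python
import pandas as pd

# File names as listed above.
sources = pd.read_csv("source_texts.csv")
train = pd.read_csv("train.csv")

# Assumption: local_or_sum marks single-section questions as "local".
# For those, cor_section holds one section number, so the section text
# can be attached with a straightforward merge on title and section.
local = train[train["local_or_sum"] == "local"]
merged = local.merge(
    sources,
    on=["source_title", "cor_section"],
    how="left",
)

print(merged[["pair_id", "question", "answer", "text"]].head())
```
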

test.csv - The test set, used to generate the predictions you place in your submission file and submit to the leaderboard during the competition.

  • pair_id - the ID of the question-answer pair
  • source_title - the title of the source text
  • cor_section - the section(s) of the source text the question comes from
  • answer - the answer to the question

sample_submission.csv - An example submission file in the correct format. See the Submission File section for details.

  • pair_id - the ID of the question-answer pair (the submission file may contain up to 10 rows per pair_id)
  • generated_question - the candidate question generated by the model

evaluation_metric_code.py - Example code for running the evaluation metric BLEURT used for this competition.



Evaluation Metric

Submissions are scored using BLEURT, which is a trained metric designed to indicate to what extent a candidate sentence is fluent and conveys the meaning of the reference. You can read more about BLEURT here. You may submit up to 10 candidate questions per pair_id. The evaluation will be based solely on the best scoring candidate for each pair_id.
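
For local experimentation, the sketch below shows best-of-n scoring with BLEURT, keeping only the highest-scoring candidate per pair_id as described above. It assumes the google-research BLEURT Python package and a locally downloaded checkpoint (BLEURT-20 is used here as an example); the competition's own scoring code is provided in evaluation_metric_code.py.

```python
from bleurt import score  # pip install git+https://github.com/google-research/bleurt.git

# Assumption: a BLEURT-20 checkpoint has been downloaded to this path.
scorer = score.BleurtScorer("BLEURT-20")

def best_candidate_score(reference: str, candidates: list[str]) -> float:
    """Score up to 10 candidate questions against one reference question
    and keep only the best-scoring candidate, mirroring the per-pair_id rule."""
    scores = scorer.score(
        references=[reference] * len(candidates),
        candidates=candidates,
    )
    return max(scores)

# Example: two candidates for the same pair_id; only the higher score counts.
print(best_candidate_score(
    "Who were playing tennis in the court?",
    ["Who were playing tennis in the court?", "Who played tennis in the court?"],
))
```
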



Submission File

For each pair_id in the test set, you must predict a question that matches the answer and text of the section(s) of source text. The file should contain a header and have the following format, with up to 10 candidate questions per pair_id:

pair_id,generated_question
a525221cac7e,Who was a most unpleasant customer to deal with?
b11baa6629fc,What did the girl save from her dinner?
297fec311baa,Who were playing tennis in the court?
297fec311baa,Who played tennis in the court?
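
A minimal sketch of writing such a file with pandas follows. The generate_questions helper is a hypothetical stand-in for your model; everything else follows the field list above.

```python
import pandas as pd

test = pd.read_csv("test.csv")

def generate_questions(row) -> list[str]:
    # Hypothetical stand-in for your model: return up to 10 candidate
    # questions for this answer and its source section(s).
    return ["What did the girl save from her dinner?"]

rows = []
for _, row in test.iterrows():
    for candidate in generate_questions(row)[:10]:  # at most 10 per pair_id
        rows.append({"pair_id": row["pair_id"], "generated_question": candidate})

submission = pd.DataFrame(rows, columns=["pair_id", "generated_question"])
submission.to_csv("submission.csv", index=False)
```
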

Submissions to the leaderboard will be evaluated once per day and will use your most recent submission before 5 p.m. EST. At the close of the competition, all teams will submit one model for final evaluation. For more information on final model submission requirements, please see Section B of the Rules page.