
Movie Script Summarization as Graph-based Scene Extraction

Philip John Gorinski and Mirella Lapata


Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB
P.J.Gorinski@sms.ed.ac.uk, mlap@inf.ed.ac.uk

Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, pages 1066–1076, Denver, Colorado, May 31 – June 5, 2015. © 2015 Association for Computational Linguistics

Abstract

In this paper we study the task of movie script summarization, which we argue could enhance script browsing, give readers a rough idea of the script’s plotline, and speed up reading time. We formalize the process of generating a shorter version of a screenplay as the task of finding an optimal chain of scenes. We develop a graph-based model that selects a chain by jointly optimizing its logical progression, diversity, and importance. Human evaluation based on a question-answering task shows that our model produces summaries which are more informative compared to competitive baselines.

1 Introduction

Each year, about 50,000 screenplays are registered with the WGA¹, the Writers Guild of America. Only a fraction of these make it through to be considered for production and an even smaller fraction to the big screen. How do producers and directors navigate through this vast number of scripts available? Typically, production companies, agencies, and studios hire script readers, whose job is to analyze screenplays that come in, sorting the hopeful from the hopeless. Having read the script, a reader will generate a coverage report consisting of a logline (one or two sentences describing the story in a nutshell), a synopsis (a two- to three-page long summary of the script), comments explaining its appeal or problematic aspects, and a final verdict as to whether the script merits further consideration. A script excerpt from “Silence of the Lambs”, an American thriller released in 1991, is shown in Figure 1.

¹ The WGA is a collective term representing US TV and film writers.

    We can’t get a good glimpse of his face, but
    his body is plump, above average height; he
    is in his mid 30’s. Together they easily
    lift the chair into the truck.
              MAN (O.S.)
        Let’s slide it up, you mind?
                                        CUT TO:
    INT. THE PANEL TRUCK - NIGHT
    He climbs inside the truck, ducking under a
    small hand winch, and grabs the chair. She
    hesitates again, but climbs in after him.
              MAN
        Are you about a size 14?
              CATHERINE
              (surprised)
        What?
    Suddenly, in the shadowy dark, he clubs her
    over the back of her head with his cast.

Figure 1: Excerpt from “The Silence of the Lambs”. The scene heading INT. THE PANEL TRUCK - NIGHT denotes that the action takes place inside the panel truck at night. Character cues (e.g., MAN, CATHERINE) preface the lines the actors speak. Action lines describe what the camera sees (e.g., We can’t get a good glimpse of his face, but his body...).

Although there are several screenwriting tools for authors (e.g., Final Draft is a popular application which automatically formats scripts to industry standards, keeps track of revisions, allows insertion of notes, and supports writing collaboratively online), there is a lack of any kind of script reading aids. Features of such a tool could be to automatically grade the quality of the script (e.g., thumbs up or down), generate
synopses and loglines, identify main characters and their stories, or facilitate browsing (e.g., “show me every scene where there is a shooting”). In this paper we explore whether current NLP technology can be used to address some of these tasks. Specifically, we focus on script summarization, which we conceptualize as the process of generating a shorter version of a screenplay, ideally encapsulating its most informative scenes. The resulting summaries can be used to enhance script browsing, give readers a rough idea of the script’s content and plotline, and speed up reading time.

So, what makes a good script summary? According to modern film theory, “all films are about nothing — nothing but character” (Monaco, 1982). Beyond characters, a summary should also highlight major scenes representative of the story and its progression. With this in mind, we define a script summary as a chain of scenes which conveys a narrative and smooth transitions from one scene to the next. At the same time, a good chain should incorporate some diversity (i.e., avoid redundancy), and focus on important scenes and characters. We formalize the problem of selecting a good summary chain using a graph-theoretic approach. We represent scripts as (directed) bipartite graphs with vertices corresponding to scenes and characters, and edge weights to their strength of correlation. Intuitively, if two scenes are connected, a random walk starting from one would reach the other frequently. We find a chain of highly connected scenes by jointly optimizing logical progression, diversity, and importance.

Our contributions in this work are three-fold: we introduce a novel summarization task, on a new text genre, and formalize scene selection as the problem of finding a chain that represents a film’s story; we propose several novel methods for analyzing script content (e.g., identifying important characters and their interactions); and perform a large-scale human evaluation study using a question-answering task. Experimental results show that our method produces summaries which are more informative compared to several competitive baselines.

2 Related Work

Computer-assisted analysis of literary text has a long history, with the first studies dating back to the 1960s (Mosteller and Wallace, 1964). More recently, the availability of large collections of digitized books and works of fiction has enabled researchers to observe cultural trends, address questions about language use and its evolution, study how individuals rise to and fall from fame, perform gender studies, and so on (Michel et al., 2010). Most existing work focuses on low-level analysis of word patterns, with a few notable exceptions. Elson et al. (2010) analyze 19th century British novels by constructing a conversational network with vertices corresponding to characters and weighted edges corresponding to the amount of conversational interaction. Elsner (2012) analyzes characters and their emotional trajectories, whereas Nalisnick and Baird (2013) identify a character’s enemies and allies in plays based on the sentiment of their utterances. Other work (Bamman et al., 2013, 2014) automatically infers latent character types (e.g., villains or heroes) in novels and movie plot summaries.

Although we are not aware of any previous approaches to summarize screenplays, the field of computer vision is rife with attempts to summarize video (see Reed 2004 for an overview). Most techniques are based on visual information and rely on low-level cues such as motion, color, or audio (e.g., Rasheed et al. 2005). Movie summarization is a special type of video summarization which poses many challenges due to the large variety of film styles and genres. A few recent studies (Weng et al., 2009; Lin et al., 2013) have used concepts from social network analysis to identify lead roles and role communities in order to segment movies into scenes (containing one or more shots) and create more informative summaries. A surprising fact about this line of work is that it does not exploit the movie script in any way. Characters are typically identified using face recognition techniques, and scene boundaries are presumed unknown and are automatically detected. A notable exception is Sang and Xu (2010), who generate video summaries for movies while taking into account character interaction features which they estimate from the corresponding screenplay.

Our own approach is inspired by work in egocentric video analysis. An egocentric video offers a first-person view of the world and is captured from a wearable camera focusing on the user’s activities,
social interactions, and interests. Lu and Grauman (2013) present a summarization model which extracts subshot sequences while finding a balance of important subshots that are both diverse and provide a natural progression through the video, in terms of prominent visual objects (e.g., bottle, mug, television). We adapt their technique to our task, and show how to estimate character-scene correlations based on linguistic analysis. We also interpret movies as social networks and extract a rich set of features from character interactions and their sentiment, which we use to guide the summarization process.

3 ScriptBase: A Movie Script Corpus

We compiled ScriptBase, a collection of 1,276 movie scripts, by automatically crawling websites which host or link entire movie scripts (e.g., imsdb.com). The retrieved scripts were then cross-matched against Wikipedia² and IMDB³ and paired with corresponding user-written summaries, plot sections, loglines and taglines (taglines are short snippets used by marketing departments to promote a movie). We also collected meta-information regarding the movie’s genre, its actors, the production year, etc. ScriptBase contains movies comprising 23 genres; each movie is on average accompanied by 3 user summaries, 3 loglines, and 3 taglines. The corpus spans the years 1909–2013. Some corpus statistics are shown in Figure 2.

² http://en.wikipedia.org
³ http://www.imdb.com/

              # Movies   AvgLines   AvgScenes   AvgChars
    Drama        665     4484.53      79.77       60.94
    Thriller     451     4333.10      91.84       52.59
    Comedy       378     4303.02      66.13       57.51
    Action       288     4255.56     101.82       59.99

Figure 2: ScriptBase corpus statistics. Movies can have multiple genres, thus numbers do not add up to 1,276.

The scripts were further post-processed with the Stanford CoreNLP pipeline (Manning et al., 2014) to perform tagging, parsing, named entity recognition and coreference resolution. They were also annotated with semantic roles (e.g., ARG0, ARG1), using the MATE tools (Björkelund et al., 2009). Our summarization experiments focused on comedies and thrillers. We randomly selected 30 movies for training/development and 65 movies for testing.

4 The Scene Extraction Model

As mentioned earlier, we define script summarization as the task of selecting a chain of scenes representing the movie’s most important content. We interpret the term scene in the screenplay sense. A scene is a unit of action that takes place in one location at one time (see Figure 1). We therefore need not be concerned with scene segmentation; scene boundaries are clearly marked, and constitute the basic units over which our model operates.

Let M = (S, C) represent a screenplay consisting of a set S = \{s_1, s_2, \ldots, s_n\} of scenes, and a set C = \{c_1, \ldots, c_m\} of characters. We are interested in finding a list S' = \{s_i, \ldots, s_k\} of ordered, consecutive scenes subject to a compression rate m (see the example in Figure 3). A natural interpretation of m in our case is the percentage of scenes from the original script retained in the summary.

Figure 3: Example of a consecutive chain (top); squares represent scenes s_1 through s_7 in a screenplay. The bottom chain would not be allowed, since the connection between s_3 and s_5 makes it non-consecutive.

The extracted chain should contain (a) important scenes (i.e., critical for comprehending the story and its development); (b) diverse scenes that cover different aspects of the story; and (c) scenes which highlight the story’s progression from beginning to end. We therefore find the chain S' maximizing the objective function Q(S'), which is the weighted sum of three terms: the story progression P, scene diversity D, and scene importance I:

    S^* = \arg\max_{S' \subset S} Q(S')                              (1)

    Q(S') = \lambda_P P(S') + \lambda_D D(S') + \lambda_I I(S')      (2)

In the following, we define each of the three terms.
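Procedurally, the objective itself is just a weighted sum over a candidate chain. The following minimal sketch (our illustration, not the authors’ implementation) shows its shape in Python; `progression`, `diversity`, and `importance` stand in for the term definitions developed next, and the default weights are the values tuned in Section 6.

```python
# Minimal sketch of the objective in Equations (1)-(2); the three
# term functions are placeholders for the definitions given below.
def Q(chain, progression, diversity, importance,
      lam_p=1.0, lam_d=0.3, lam_i=0.1):
    """Weighted sum of progression, diversity, and importance.
    The default lambdas are the values tuned in Section 6."""
    return (lam_p * progression(chain)
            + lam_d * diversity(chain)
            + lam_i * importance(chain))
```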
Scene-to-scene Progression

The first term in the objective is responsible for selecting chains representing a logically coherent story. Intuitively, this means that if our chain includes a scene where a character commits an action, then scenes involving affected parties or follow-up actions should also be included. We operationalize this idea of progression in a story in terms of how strongly the characters in a selected scene s_i influence the transition to the next scene s_{i+1}:

    P(S') = \sum_{i=0}^{|S'|-1} \sum_{c \in C_i} INF(s_i, s_{i+1} \mid c)    (3)

We represent screenplays as weighted, bipartite graphs connecting scenes and characters:

    B = (V, E) : V = C \cup S
    E = \{(s, c, w_{s,c}) \mid s \in S, c \in C, w_{s,c} \in [0,1]\} \cup \{(c, s, w_{c,s}) \mid c \in C, s \in S, w_{c,s} \in [0,1]\}

The set of vertices V corresponds to the union of characters C and scenes S. We therefore add to the bipartite graph one node per scene and one node per character, and two directed edges for each scene-character and character-scene pair. An example of a bipartite graph is shown in Figure 4.

Figure 4: Example of a bipartite graph, connecting a movie’s scenes with participating characters.

We further assume that two scenes s_i and s_{i+1} are tightly connected in such a graph if a random walk with restart (RWR; Tong et al. 2006; Kim et al. 2014) which starts in s_i has a high probability of ending in s_{i+1}.

In order to calculate the random walk stationary distributions, we must estimate the weights between a character and a scene. We are interested in how important a character is generally in the movie, and specifically in a particular scene. For w_{c,s}, we consider the probability of a character being important, i.e., of them belonging to the set of main characters:

    w_{c,s} = P(c \in main(M)), \quad \forall (c, s, w_{c,s}) \in E    (4)

where P(c \in main(M)) is some probability score associated with c being a main character in script M. For w_{s,c}, we take the number of interactions a character is involved in relative to the total number of interactions in a specific scene as indicative of the character’s importance in that scene. Interactions refer to conversational interactions as well as relations between characters (e.g., who does what to whom):

    w_{s,c} = \frac{\sum_{c' \in C_s} inter(c, c')}{\sum_{c_1, c_2 \in C_s} inter(c_1, c_2)}, \quad \forall (s, c, w_{s,c}) \in E    (5)

We defer discussion of how we model the probability P(c \in main(M)) and obtain interaction counts to Section 5. The weights w_{s,c} and w_{c,s} are normalized:

    w_{s,c} = \frac{w_{s,c}}{\sum_{(s, c', w'_{s,c})} w'_{s,c}}, \quad \forall (s, c, w_{s,c}) \in E    (6)

    w_{c,s} = \frac{w_{c,s}}{\sum_{(c, s', w'_{c,s})} w'_{c,s}}, \quad \forall (c, s, w_{c,s}) \in E    (7)
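To make Equations (4)–(7) concrete, the edge weights could be assembled as follows. This is an illustrative sketch rather than the authors’ code; `p_main` (the main-character probabilities of Section 5) and the pairwise interaction counts `inter` are assumed to be given.

```python
from collections import defaultdict

def edge_weights(scenes, p_main, inter):
    """Illustrative sketch of Equations (4)-(7).

    scenes: dict scene -> set of characters appearing in it
    p_main: dict character -> P(c in main(M))        (Section 5)
    inter:  dict (c1, c2) -> interaction count, assumed to contain
            both key orders (symmetric counts)        (Section 5)
    """
    w_sc, w_cs = defaultdict(dict), defaultdict(dict)
    for s, chars in scenes.items():
        # Denominator of Eq. (5): all interactions within scene s.
        total = sum(inter.get((c1, c2), 0) for c1 in chars for c2 in chars)
        for c in chars:
            # Eq. (5): c's share of the scene's interactions
            # (uniform fallback if the scene has no interactions).
            w_sc[s][c] = (sum(inter.get((c, c2), 0) for c2 in chars) / total
                          if total else 1.0 / len(chars))
            # Eq. (4): probability of c being a main character.
            w_cs[c][s] = p_main[c]
    # Eqs. (6)-(7): normalize so the outgoing weights of every
    # scene node and every character node sum to one.
    for table in (w_sc, w_cs):
        for node, out in table.items():
            z = sum(out.values())
            for tgt in out:
                out[tgt] = out[tgt] / z if z else 0.0
    return w_sc, w_cs
```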
We calculate the stationary distributions of a random walk on a transition matrix T, enumerating over all vertices v (i.e., characters and scenes) in the bipartite graph B:

    T(i, j) = \begin{cases} w_{i,j} & \text{if } (v_i, v_j, w_{i,j}) \in E_B \\ 0 & \text{otherwise} \end{cases}    (8)

We measure the influence individual characters have on scene-to-scene transitions as follows. The stationary distribution r_k for a RWR walker starting at node k is a vector that satisfies:

    r_k = (1 - \varepsilon) T r_k + \varepsilon e_k    (9)

where T is the transition matrix of the graph, e_k is a seed vector with all elements 0 except for element k, which is set to 1, and \varepsilon is a restart probability parameter. In practice, our vectors r_k and e_k are indexed by the scenes and characters in a movie, i.e., they have length |S| + |C|, and their nth element corresponds either to a known scene or character. In cases where graphs are relatively small, we can compute r directly⁴ by solving:

    r_k = \varepsilon (I - (1 - \varepsilon) T)^{-1} e_k    (10)

⁴ We could also solve for r recursively, which would be preferable for large graphs, since the matrix inversion performed here is computationally expensive.

The lth element of r then equals the probability of the random walker being in state l in the stationary distribution. Let r_k^c be the same as r_k, but with the character node c of the bipartite graph turned into a sink, i.e., all entries for c in the transition matrix T are 0. We can then define how a single character influences the transition between scenes s_i and s_{i+1} as:

    INF(s_i, s_{i+1} \mid c) = r_{s_i}[s_{i+1}] - r_{s_i}^c[s_{i+1}]    (11)

where r_{s_i}[s_{i+1}] is shorthand for the element in the vector r_{s_i} that corresponds to scene s_{i+1}. We use the INF score directly in Equation (3) to determine the progress score of a candidate chain.
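In code, the closed-form solution and the INF score might look as follows; a sketch with numpy, under the assumption that nodes are indexed 0..|S|+|C|−1 and that turning a character into a sink amounts to zeroing its entries in T, as described above.

```python
import numpy as np

def rwr(T, k, eps=0.5):
    """Stationary RWR distribution of Eq. (10):
    r_k = eps * (I - (1 - eps) * T)^-1 * e_k.
    eps is the restart probability (set to 0.5 in Section 6)."""
    n = T.shape[0]
    e_k = np.zeros(n)
    e_k[k] = 1.0
    return eps * np.linalg.solve(np.eye(n) - (1.0 - eps) * T, e_k)

def inf_score(T, s_i, s_next, c, eps=0.5):
    """INF(s_i, s_{i+1} | c) of Eq. (11): the drop in probability of
    reaching s_next from s_i once character node c is made a sink."""
    T_sink = T.copy()
    T_sink[c, :] = 0.0   # zero all entries for c,
    T_sink[:, c] = 0.0   # turning the character node into a sink
    return rwr(T, s_i, eps)[s_next] - rwr(T_sink, s_i, eps)[s_next]
```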
Diversity

The diversity term D(S') in our objective should encourage chains which consist of more dissimilar scenes, thereby avoiding redundancy. The diversity of chain S' is the sum of the diversities of its successive scenes:

    D(S') = \sum_{i=1}^{|S'|-1} d(s_i, s_{i+1})    (12)

The diversity d(s_i, s_{i+1}) of two scenes s_i and s_{i+1} is estimated taking into account two factors: (a) do they have any characters in common, and (b) does the sentiment change from one scene to the next:

    d(s_i, s_{i+1}) = \frac{d_{char}(s_i, s_{i+1}) + d_{sen}(s_i, s_{i+1})}{2}    (13)

where d_{char}(s_i, s_{i+1}) and d_{sen}(s_i, s_{i+1}) respectively denote character and sentiment similarity between scenes. Specifically, d_{char}(s_i, s_{i+1}) is the relative character overlap between scenes s_i and s_{i+1}:

    d_{char}(s_i, s_{i+1}) = 1 - \frac{|C_{s_i} \cap C_{s_{i+1}}|}{|C_{s_i} \cup C_{s_{i+1}}|}    (14)

d_{char} will be 0 if two scenes share the same characters and 1 if no characters are shared. Analogously, we define d_{sen}, the sentiment overlap between two scenes, as:

    d_{sen}(s_i, s_{i+1}) = 1 - \frac{k \cdot dif(s_i, s_{i+1})}{k - k \cdot dif(s_i, s_{i+1}) + 1}    (15)

    dif(s_i, s_{i+1}) = \frac{1}{1 + |sen(s_i) - sen(s_{i+1})|}    (16)

where the sentiment sen(s) of scene s is the aggregate sentiment score of all interactions in s:

    sen(s) = \sum_{c, c' \in C_s} sen(inter(c, c'))    (17)

We explain how interactions and their sentiment are computed in Section 5. Again, d_{sen} is larger if two scenes have a less similar sentiment. dif(s_i, s_{i+1}) becomes 1 if the sentiments are identical, and increasingly smaller for more dissimilar sentiments. The sigmoid-like function in Equation (15) scales d_{sen} within range [0, 1] to take smaller values for larger sentiment differences (factor k adjusts the curve’s smoothness).
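The diversity term can be transcribed directly from Equations (12)–(16); a sketch, assuming per-scene character sets and aggregate sentiment scores (Equation (17)) have been precomputed.

```python
def d_char(chars_i, chars_j):
    """Eq. (14): one minus the relative character overlap."""
    return 1.0 - len(chars_i & chars_j) / len(chars_i | chars_j)

def d_sen(sen_i, sen_j, k=-1.2):
    """Eqs. (15)-(16); k = -1.2 is the scaling factor of Section 6."""
    dif = 1.0 / (1.0 + abs(sen_i - sen_j))
    return 1.0 - (k * dif) / (k - k * dif + 1.0)

def diversity(chain, chars, sen):
    """Eqs. (12)-(13): summed diversity of successive scene pairs.
    chars: dict scene -> character set; sen: dict scene -> score."""
    return sum((d_char(chars[a], chars[b]) + d_sen(sen[a], sen[b])) / 2.0
               for a, b in zip(chain, chain[1:]))
```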
Importance

The score I(S') captures whether a chain contains important scenes. We define I(S') as the sum of all scene-specific importance scores imp(s_i) of the scenes contained in the chain:

    I(S') = \sum_{i=1}^{|S'|} imp(s_i)    (18)

The importance imp(s_i) of a scene s_i is the ratio of lead to support characters within that scene:

    imp(s_i) = \frac{\sum_{c : c \in C_{s_i} \wedge c \in main(M)} 1}{\sum_{c : c \in C_{s_i}} 1}    (19)

where C_{s_i} is the set of characters present in scene s_i, and main(M) is the set of main characters in the movie.⁵ imp(s_i) is 0 if a scene does not contain any main characters, and 1 if it contains only main characters (see Section 5 for how main(M) is inferred).

⁵ Whether scenes are important if they contain many main characters is an empirical question in its own right. For our purposes, we assume that this relation holds.

Optimal Chain Selection

We use Linear Programming to efficiently find a good chain. The objective is to maximize Equation (2), i.e., the sum of the terms for progress, diversity and importance, subject to their weights λ. We add a constraint corresponding to the compression rate, i.e., the number of scenes to be selected, and enforce their linear order by disallowing non-consecutive combinations. We use GLPK⁶ to solve the linear problem.

⁶ https://www.gnu.org/software/glpk/
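We do not reproduce the linear program here. As a toy stand-in, the same search can be illustrated by brute-force enumeration over ordered scene subsets, which is only practical for very short scripts:

```python
from itertools import combinations

def best_chain(n_scenes, score, rate=0.2):
    """Toy stand-in for the ILP: enumerate ordered chains of scene
    indices at the given compression rate and keep the one maximizing
    the objective Q of Eq. (2) (passed in as `score`). Exponential in
    n_scenes; the actual model instead solves a linear program with
    GLPK, with consecutiveness encoded as constraints."""
    length = max(1, round(rate * n_scenes))
    return max(combinations(range(n_scenes), length), key=score)
```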
5 Implementation

In this section we discuss several aspects of the implementation of the model presented in the previous section. We explain how interactions are extracted and how sentiment is calculated. We also present our method for identifying main characters and estimating the weights w_{s,c} and w_{c,s} in the bipartite graph.

Interactions

The notion of interaction underlies many aspects of the model defined in the previous section. For instance, interaction counts are required to estimate the weights w_{s,c} in the bipartite graph of the progression term (see Equation (5)), and in defining diversity (see Equations (15)–(17)). As we shall see below, interactions are also important for identifying main characters in a screenplay.

We use the term interaction to refer to conversations between two characters, as well as their relations (e.g., if a character kills another). For conversational interactions, we simply need to identify the speaker generating an utterance and the listener. Speaker attribution comes for free in our case, as speakers are clearly marked in the text (see Figure 1). Listener identification is more involved, especially when there are multiple characters in a scene. We rely on a few simple heuristics. We assume that the previous speaker in the same scene, who is different from the current speaker, is the listener. If there is no previous speaker, we assume that the listener is the closest character mentioned in the speaker’s utterance (e.g., via a coreferring proper name or a pronoun). In cases where we cannot find a suitable listener, we assume the current speaker is the listener.
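The heuristic cascade fits in a few lines. This sketch uses a hypothetical helper `mentions(utterance)` that is assumed to return character mentions (coreferring proper names or resolved pronouns) in textual order:

```python
def identify_listener(speaker, prev_speaker, utterance, mentions):
    """Sketch of the listener heuristics described above.
    `mentions` is a hypothetical helper returning character mentions
    in the utterance, closest first."""
    # 1. The previous, different speaker in the same scene.
    if prev_speaker is not None and prev_speaker != speaker:
        return prev_speaker
    # 2. Otherwise, the closest character mentioned in the utterance.
    mentioned = mentions(utterance)
    if mentioned:
        return mentioned[0]
    # 3. Otherwise, fall back to the speaker themselves.
    return speaker
```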
We obtain character relations from the output of a semantic role labeler. Relations are denoted by verbs whose ARG0 and ARG1 roles are character names. We extract relations from the dialogue but also from scene descriptions. For example, in Figure 1 the description Suddenly, [...] he clubs her over the head contains the relation clubs(MAN, CATHERINE). Pronouns are resolved to their antecedent using the Stanford coreference resolution system (Lee et al., 2011).

Sentiment

We labeled lexical items in screenplays with sentiment values using the AFINN-96 lexicon (Nielsen, 2011), which is essentially a list of words scored with sentiment strength within the range [−5, +5]. The list also contains obscene words (which are often used in movies) and some Internet slang. By summing over the sentiment scores of individual words, we can work out the sentiment of an interaction between two characters, the sentiment of a scene (see Equation (17)), and even the sentiment between characters (e.g., who likes or dislikes whom in the movie in general).
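Operationally, this reduces to dictionary lookups and sums; a sketch, assuming `afinn` is a word-to-score mapping loaded from the AFINN-96 list:

```python
def interaction_sentiment(words, afinn):
    """Sum of AFINN scores (range [-5, +5]) over an interaction's
    words; words absent from the lexicon contribute 0."""
    return sum(afinn.get(w.lower(), 0) for w in words)

def scene_sentiment(interactions, afinn):
    """sen(s) of Eq. (17): aggregate sentiment over all interactions
    in a scene, each interaction given as a list of words."""
    return sum(interaction_sentiment(words, afinn)
               for words in interactions)
```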
Main Characters

The progress term in our summarization objective crucially relies on characters and their importance (see the weight w_{c,s} in Equation (4)). Previous work (Weng et al., 2009; Lin et al., 2013) extracts social networks where nodes correspond to roles in the movie, and edges to their co-occurrence. Leading roles (and their communities) are then identified by measuring their centrality in the network (i.e., the number of edges terminating in a given node).

It is relatively straightforward to obtain a social network from a screenplay. Formally, for each movie we define a weighted and undirected graph:

    G = \{C, E\} : C = \{c_1, \ldots, c_n\},
    E = \{(c_i, c_j, w) \mid c_i, c_j \in C, w \in \mathbb{N}_{>0}\}

where vertices correspond to movie characters⁷ and edges denote character-to-character interactions. Figure 5 shows an example of a social network for “The Silence of the Lambs”. Due to lack of space, only main characters are displayed; however, the actual graph contains all characters (42 in this case). Importantly, edge weights are not normalized, but directly reflect the strength of association between different characters.

⁷ We assume one node per speaking role in the script.

We do not solely rely on the social network to identify main characters. We estimate P(c ∈ main(M)), the probability of c being a leading character in movie M, using a Multi Layer Perceptron (MLP) and several features pertaining to the structure of the social network and the script text itself.
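Constructing this graph, and reading simple centrality measures off it (several of which serve as classifier features below), is straightforward with a graph library. A sketch with networkx, under the assumption that pairwise interaction counts are already available:

```python
import networkx as nx

def social_network(inter):
    """Build the weighted, undirected character graph G.
    inter: dict (c1, c2) -> interaction count, one entry per
    unordered character pair. Edge weights are the raw counts,
    deliberately left un-normalized (cf. Figure 5)."""
    G = nx.Graph()
    for (c1, c2), count in inter.items():
        if count > 0:
            G.add_edge(c1, c2, weight=count)
    return G

def network_features(G, c):
    """A few of the graph-based features described below."""
    total = sum(d["weight"] for _, _, d in G.edges(data=True))
    mine = sum(d["weight"] for _, _, d in G.edges(c, data=True))
    return {
        # Barycenter: sum of distances from c to all other nodes.
        "barycenter": sum(nx.shortest_path_length(G, c).values()),
        "pagerank": nx.pagerank(G, weight="weight")[c],
        "eigenvector": nx.eigenvector_centrality(G, weight="weight")[c],
        "abs_interaction_weight": mine,
        "rel_interaction_weight": mine / total if total else 0.0,
    }
```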
Figure 5: Social network for “The Silence of the Lambs”, connecting Clarice, Dr. Lecter, Crawford, Catherine, Mr. Gumb, Chilton, and Sen. Martin; edge weights correspond to the absolute number of interactions between nodes.

A potential stumbling block in treating character identification as a classification task is obtaining training data, i.e., a list of main characters for each movie. We generate a gold standard by assuming that the characters listed under Wikipedia’s Cast section (or an equivalent section, e.g., Characters) are the main characters in the movie.

Examples of the features we used for the classification task include the barycenter of a character (i.e., the sum of its distances to all other characters), PageRank (Page et al., 1999), an eigenvector-based centrality measure, absolute/relative interaction weight (the sum of all interactions a character is involved in, divided by the sum of all interactions in the network), the absolute/relative number of sentences uttered by a character, the number of times a character is described by other characters (e.g., He is a monster or She is nice), the number of times a character talks about other characters, and the type-token ratio of sentences uttered by the character (i.e., the rate of unique words in a character’s speech). Using these features, the MLP achieves an F1 of 79.0% on the test set. It outperforms other classification methods such as Naive Bayes or logistic regression. Using the full feature set, the MLP also obtains performance superior to any individual measure of graph connectivity.

Aside from Equation (4), lead characters also appear in Equation (19), which determines scene importance. We assume a character c ∈ main(M) if it is predicted by the MLP with a probability ≥ 0.5.

6 Experimental Setup

Gold Standard Chains

The development and tuning of the chain extraction model presented in Section 4 necessitates access to a gold standard of key scene chains representing the movie’s most important content. Our experiments concentrated on a sample of 95 movies (comedies and thrillers) from the ScriptBase corpus (Section 3). Performing the scene selection task manually for such a big corpus would be both time consuming and costly. Instead, we used distant supervision based on Wikipedia to automatically generate a gold standard.

Specifically, we assume that Wikipedia plots are representative of the most important content in a movie. Using the alignment algorithm presented in Nelken and Shieber (2006), we align script sentences to Wikipedia plot sentences and assume that scenes with at least one alignment are part of the gold chain of scenes. We obtain many-to-many alignments using features such as lemma overlap and word stem similarity. When evaluated on four movies⁸ (from the training set) whose content was manually aligned to Wikipedia plots, the aligner achieved a precision of .53 at a recall rate of .82 at deciding whether a scene should be aligned. Scenes are ranked according to the number of alignments they contain. When creating gold chains at different compression rates, we start with the best-ranked scenes and then successively add lower-ranked ones until we reach the desired compression rate.

⁸ “Cars 2”, “Shrek”, “Swordfish”, and “The Silence of the Lambs”.

System Comparison

In our experiments we compared our scene extraction model (SceneSum) against three baselines. The first baseline was based on the minimum overlap (MinOv) of characters in consecutive scenes and corresponds closely to the diversity term in our objective. The second baseline was based on the maximum overlap (MaxOv) of characters and approximates the importance term in our objective. The third baseline selects scenes at random (averaged over 1,000 runs). Parameters for our models were tuned on the training set; the weights for the terms in the objective were optimized to the following values: λ_P = 1.0, λ_D = 0.3, and λ_I = 0.1. We set the restart probability of our random walker to ε = 0.5, and the sigmoid scaling factor in our diversity term to k = −1.2.
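The gold-chain construction described under Gold Standard Chains amounts to ranking scenes by alignment count. A minimal sketch (our illustration), assuming the Nelken-and-Shieber-style aligner has already produced per-scene alignment counts:

```python
def gold_chain(align_counts, rate):
    """Sketch of gold-chain creation from Wikipedia alignments.
    align_counts: dict scene_index -> number of aligned sentences.
    Keeps the best-ranked scenes until the compression rate is met,
    returning them in script order."""
    n_keep = max(1, round(rate * len(align_counts)))
    ranked = sorted(align_counts, key=align_counts.get, reverse=True)
    return sorted(ranked[:n_keep])
```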
Evaluation

We assessed the output of our model (and comparison systems) automatically against the gold chains described above. We performed experiments with compression rates in the range of 10% to 50% and measured performance in terms of F1. In addition, we also evaluated the quality of the extracted scenes as perceived by humans, which is necessary given the approximate nature of our gold standard. We adopted a question-answering (Q&A) evaluation paradigm which has been used previously to evaluate summaries and document compression (Morris et al., 1992; Mani et al., 2002; Clarke and Lapata, 2010). Under the assumption that the summary is to function as a replacement for the full script, we can measure the extent to which it can be used to find answers to questions which have been derived from the entire script and are representative of its core content. The more questions a hypothetical system can answer, the better it is at summarizing the script as a whole.

Two annotators were independently instructed to read scripts (from our test set) and create Q&A pairs. The annotators generated questions relating to the plot of the movie and the development of its characters, requiring an unambiguous answer. They compared and revised their Q&A pairs until a common, agreed-upon set of five questions per movie was reached (see Table 1 for an example). In addition, for every movie we asked subjects to name the main characters and summarize its plot (in no more than four sentences).

1. Why does Trevor leave New York and where does he move to?
2. What is KOS, who is their leader, and why is he attending high school?
3. What happened to Cesar’s finger, and how did he eventually die?
4. Who killed Benny and how does Ellen find out?
5. Who is Rita and what becomes of her?

Table 1: Questions for the movie “One Eight Seven”.

Using Amazon Mechanical Turk (AMT)⁹, we elicited answers for eight scripts (four comedies and four thrillers) in four summarization conditions: using our model, the two baselines based on minimum and maximum character overlap, and the random system. All models were assessed at the same compression rate of 20%, which seems realistic in an actual application environment, e.g., computer-aided summarization. The scripts were preselected in an earlier AMT study where participants were asked to declare whether they had seen the movies in our test set (65 in total). We chose the screenplays which had received the fewest viewings so as to avoid eliciting answers based on familiarity with the movie. A total of 29 participants, all self-reported native English speakers, completed the Q&A task. The answers provided by the subjects were scored against an answer key. A correct answer was marked with a score of one, and zero otherwise. In cases where more answers were required per question, partial scores were awarded to each correct answer (e.g., 0.5). The score for a summary is the average of its question scores.

⁹ https://www.mturk.com/
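The scoring scheme is simply a per-question average with partial credit; as a minimal sketch:

```python
def summary_score(per_question_scores):
    """Mean of per-question credits: 1 for a correct answer, 0 for an
    incorrect one, and fractions (e.g., 0.5) when a question required
    several answers and only some were given correctly."""
    return sum(per_question_scores) / len(per_question_scores)
```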
7 Results

              10%    20%    30%    40%    50%
    MaxOv     0.40   0.50   0.58   0.64   0.71
    MinOv     0.13   0.27   0.40   0.53   0.66
    SceneSum  0.23   0.37   0.50   0.60   0.68
    Random    0.10   0.20   0.30   0.40   0.50

Table 2: Model performance (F1) on the automatically generated gold standard (test set) at different compression rates.

Table 2 shows the performance of SceneSum, our scene extraction model, and the three comparison systems (MaxOv, MinOv, Random) on the automatic gold standard at five compression rates. As can be seen, MaxOv performs best in terms of F1, followed by SceneSum. We believe this is an artifact due to the way the gold standard was created. Scenes with large numbers of main characters are more likely to figure in Wikipedia plot summaries and will thus be more frequently aligned. A chain based on maximum character overlap will focus on such scenes and will agree with the gold standard better compared to chains which take additional script properties into account.

We further analyzed the scenes selected by SceneSum and the comparison systems with respect to their position in the script.
Table 3 shows the average percentage of scenes selected from the beginning, middle, and end of the movie (based on an equal division of the number of scenes in the screenplay). As can be seen, the number of selected scenes tends to be evenly distributed across the entire movie. SceneSum has a slight bias towards the beginning of the movie, which is probably natural, since leading characters appear early on, as do important scenes introducing essential story elements (e.g., setting, points of view).

              Beginning   Middle     End
    MaxOv       33.95      34.89    31.16
    MinOv       34.30      33.91    31.80
    SceneSum    35.30      33.54    31.16
    Random      34.30      33.91    31.80

Table 3: Average percentage of scenes taken from the beginning, middle, and end of movies, on the automatic gold standard test set.

The results of our human evaluation study are summarized in Table 4. We observe that SceneSum summaries are overall more informative compared to those created by the baselines. In other words, AMT participants are able to answer more questions regarding the story of the movie when reading SceneSum summaries. In two instances (“A Nightmare on Elm Street 3” and “Mumford”) the overlap models score better; however, in these cases the movies largely consist of scenes with the same characters and relatively little variation (“A Nightmare on Elm Street 3”), or the camera follows the main lead in his interactions with other characters (“Mumford”). Since our model is not so character-centric, it might be thrown off by non-character-based terms in its objective, leading to the selection of unfavorable scenes. Table 4 also presents a breakdown of the different types of questions answered by our participants. Again, we see that in most cases a larger percentage is answered correctly when reading SceneSum summaries.

    Movies               MaxOv   MinOv   SceneSum   Random
    Nightmare 3          69.18   74.49     60.24     56.33
    Little Athens        34.92   31.75     36.90     33.33
    Living in Oblivion   40.95   35.00     60.00     30.24
    Mumford              72.86   60.00     30.00     54.29
    One Eight Seven      47.30   38.89     67.86     30.16
    Anniversary Party    45.39   56.35     62.46     37.62
    We Own the Night     28.57   32.14     52.86     28.57
    While She Was Out    72.86   75.71     85.00     45.71
    All Questions        51.51   50.54     56.91     39.53
    Five Questions       51.00   53.13     57.38     36.88
    Plot Question        60.00   56.88     73.75     55.00
    Characters Question  45.54   37.34     37.75     31.29

Table 4: Percentage of questions answered correctly.

Overall, we observe that SceneSum extracts chains which encapsulate important movie content across the board. We should point out that although our movies are broadly classified as comedies and thrillers, they have very different structure and content. For example, “Little Athens” has a very loose plotline, “Living in Oblivion” has multiple dream sequences, whereas “While She Was Out” contains only a few characters and a series of important scenes towards the end. Despite this variety, SceneSum performs consistently better in our task-based evaluation.

8 Conclusions

In this paper we have developed a graph-based model for script summarization. We formalized the process of generating a shorter version of a screenplay as the task of finding an optimal chain of scenes which are diverse, important, and exhibit logical progression. A large-scale evaluation based on a question-answering task revealed that our method produces more informative summaries compared to several baselines. In the future, we plan to explore model performance in a wider range of movie genres as well as its applicability to other NLP tasks (e.g., book summarization or event extraction). We would also like to automatically determine the compression rate, which should presumably vary according to the movie’s length and content. Finally, our long-term goal is to be able to generate loglines as well as movie plot summaries.

Acknowledgments

We would like to thank Rik Sarkar, Jon Oberlander and Annie Louis for their valuable feedback. Special thanks to Bharat Ambati, Lea Frermann, and Daniel Renshaw for their help with system evaluation.

References
Bamman, David, Brendan O’Connor, and Noah A. Smith. 2013. Learning Latent Personas of Film Characters. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria, pages 352–361.

Bamman, David, Ted Underwood, and Noah A. Smith. 2014. A Bayesian Mixed Effects Model of Literary Character. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, MD, USA, pages 370–379.

Björkelund, Anders, Love Hafdell, and Pierre Nugues. 2009. Multilingual Semantic Role Labeling. In Proceedings of the 13th Conference on Computational Natural Language Learning: Shared Task. Boulder, Colorado, pages 43–48.

Clarke, James and Mirella Lapata. 2010. Discourse Constraints for Document Compression. Computational Linguistics 36(3):411–441.

Elsner, Micha. 2012. Character-based Kernels for Novelistic Plot Structure. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Avignon, France, pages 634–644.

Elson, David K., Nicholas Dames, and Kathleen R. McKeown. 2010. Extracting Social Networks from Literary Fiction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden, pages 138–147.

Kim, Jun-Seong, Jae-Young Sim, and Chang-Su Kim. 2014. Multiscale Saliency Detection Using Random Walk With Restart. IEEE Transactions on Circuits and Systems for Video Technology 24(2):198–210.

Lee, Heeyoung, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2011. Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task. In Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task. Portland, OR, USA, pages 28–34.

Lin, C., C. Tsai, L. Kang, and Weisi Lin. 2013. Scene-Based Movie Summarization via Role-Community Networks. IEEE Transactions on Circuits and Systems for Video Technology 23(11):1927–1940.

Lu, Zheng and Kristen Grauman. 2013. Story-Driven Summarization for Egocentric Video. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA, pages 2714–2721.

Mani, Inderjeet, Gary Klein, David House, Lynette Hirschman, Therese Firmin, and Beth Sundheim. 2002. SUMMAC: A Text Summarization Evaluation. Natural Language Engineering 8(1):43–68.

Manning, Christopher, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. pages 55–60.

Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2010. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331(6014):176–182.

Monaco, James. 1982. How to Read a Film: The Art, Technology, Language, History and Theory of Film and Media. OUP, New York, NY, USA.

Morris, A., G. Kasper, and D. Adams. 1992. The Effects and Limitations of Automated Text Condensing on Reading Comprehension Performance. Information Systems Research 3(1):17–35.

Mosteller, Frederick and David Wallace. 1964. Inference and Disputed Authorship: The Federalists. Addison-Wesley, Boston, MA, USA.

Nalisnick, Eric T. and Henry S. Baird. 2013. Character-to-Character Sentiment Analysis in Shakespeare’s Plays. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria, pages 479–483.

Nelken, Rani and Stuart Shieber. 2006. Towards Robust Context-Sensitive Sentence Alignment for Monolingual Corpora. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics. Trento, Italy, pages 161–168.
Nielsen, Finn Årup. 2011. A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: Big Things Come in Small Packages. Heraklion, Crete, pages 93–98.

Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66, Stanford InfoLab. Previous number SIDL-WP-1999-0120.

Rasheed, Z., Y. Sheikh, and M. Shah. 2005. On the Use of Computable Features for Film Classification. IEEE Transactions on Circuits and Systems for Video Technology 15(1):52–64.

Reed, Todd, editor. 2004. Digital Image Sequence Processing. Taylor & Francis.

Sang, Jitao and Changsheng Xu. 2010. Character-based Movie Summarization. In Proceedings of the International Conference on Multimedia. Firenze, Italy, pages 855–858.

Tong, Hanghang, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast Random Walk with Restart and Its Applications. In Proceedings of the Sixth International Conference on Data Mining. Hong Kong, pages 613–622.

Weng, Chung-Yi, Wei-Ta Chu, and Ja-Ling Wu. 2009. RoleNet: Movie Analysis from the Perspective of Social Networks. IEEE Transactions on Multimedia 11(2):256–271.
