Movie Genre Classification

The document discusses classifying movie genres from plot summaries using bidirectional LSTM (Bi-LSTM). It collected movie plot summaries and genres from databases and analyzed the data. It then used word embeddings and Bi-LSTM networks to classify the genres of individual sentences in the plot summaries.

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/322929271

Movie Genre Classification from Plot Summaries Using Bidirectional LSTM

Conference Paper · February 2018


DOI: 10.1109/ICSC.2018.00043


All content following this page was uploaded by Ali Mert Ertugrul on 30 March 2018.



Movie Genre Classification from Plot Summaries using Bidirectional LSTM

Ali Mert Ertugrul
Graduate School of Informatics
Middle East Technical University
Ankara, Turkey
alimert@metu.edu.tr

Pinar Karagoz
Computer Engineering Department
Middle East Technical University
Ankara, Turkey
karagoz@ceng.metu.edu.tr

Abstract—Movie plot summaries are expected to reflect the genre of movies since many spectators read the plot summaries before deciding to watch a movie. In this study, we perform movie genre classification from plot summaries of movies using bidirectional LSTM (Bi-LSTM). We first divide each plot summary of a movie into sentences and assign the genre of the corresponding movie to each sentence. Next, using the word representations of sentences, we train Bi-LSTM networks. We estimate the genres for each sentence separately. Since plot summaries generally contain multiple sentences, we use majority voting for the final decision by considering the posterior probabilities of genres assigned to sentences. Our results show that, with a limited amount of data, training a Bi-LSTM network after dividing the plot summaries into their sentences and fusing the predictions for individual sentences outperforms training the network with the whole plot summaries. Moreover, employing Bi-LSTM performs better compared to basic Recurrent Neural Networks (RNNs) and Logistic Regression (LR) as a baseline.

Index Terms—Movie genre classification; LSTM; Recurrent Neural Networks (RNNs)

I. INTRODUCTION AND BACKGROUND

Movie plot summaries reflect the genre of the movies, such as action, drama, horror, etc., such that people can easily capture the genre information of the movies from their plot summaries. In particular, several sentences in a plot summary are highly representative of the genre of the movie. People usually read the plot summaries of movies before watching them to get an idea about the movie. Therefore, plot summaries are written in such a way that they convey the genre information to the reader. For example, if the plot mentions humorous obstacles that must be overcome before lovers eventually come together, the movie is likely to be a romantic comedy [1]. In this regard, there is a hidden representation of genre information in the movie plot summaries. In this study, we aim to learn this hidden representation. In other words, our purpose is to classify the genres of the movies from their plot summaries using Bi-LSTM by considering the genre information represented by each individual sentence. With this method, representations of plot summaries can be used for movie recommendation. In addition, it can be inferred whether a plot summary actually reflects the genre of the movie it belongs to. Therefore, this method can be beneficial during the preparation of movie plots.

In the literature, there exist a number of studies that perform movie genre classification using a variety of sources, including visual, audio and textual features from trailers, posters and texts. Among the studies that employ visual and/or audio features, Rasheed et al. [2], [3] utilized visual features, including average shot length, color variance, motion content and lighting key, from movie previews to predict movie genres. Yuan et al. [4] also employed visual features from videos, including temporal and spatial ones, to classify genres using hierarchical SVM. Zhou et al. [5] represented movie trailers using a bag-of-visual-words model with shot classes as vocabularies and utilized them for genre classification. Moreover, Huang et al. [6] extracted both visual and audio features from movie trailers using a meta-heuristic optimization algorithm and performed genre classification. Ekenel et al. [7] combined low-level audio and visual features, including signal energy and fundamental frequency for audio and color and texture-based features for visual representation, to conduct multi-modal genre classification. Ivašić-Kos et al. [8] employed low-level visual features based on colors and edges obtained from movie posters, then used them to classify posters into genres. Furthermore, Simoes et al. [9] and Wehrmann et al. [10] used architectures based on convolutional neural networks (CNNs) to perform movie genre classification from movie trailers instead of using hand-crafted features.

In addition to efforts employing visual and audio features, several studies used textual sources, including plots and synopses, for movie genre classification. Fu et al. [11] utilized a vector space model to represent synopses and used this representation as input for SVM. Hong et al. [12] extracted textual features from social tags via social websites. Then, they applied probabilistic latent semantic analysis (PLSA) to incorporate textual, visual and audio features for genre classification. Furthermore, Arevalo et al. [13] proposed a gated unit for the multi-modal classification task and performed movie genre classification using poster and plot information, with a basic recurrent neural network (RNN) to represent the plot information. Similarly, Pham et al. [14] proposed the column network for collective classification and evaluated this network on the movie genre classification task using plot summaries. They represented plot summaries as a Bag-of-Words (BoW) vector of the 1,000 most frequent words. The aforementioned studies using plots or synopses either did not benefit from the power of deep learning for sequence modeling of textual data or obtained textual representations using basic RNNs. Moreover, these studies used the text at the document level for genre classification.

TABLE I
DISTRIBUTION OF THE SAMPLES FOR EACH GENRE

                  Thriller   Horror   Comedy   Drama
# of Full Plots   1590       1590     1590     1590
# of Sentences    5421       5437     5480     5940

II. METHOD

A. Data Collection and Pre-processing

In the data collection step, we first obtained movie names from the MovieLens¹ datasets. We then collected the necessary information about the movies, including full plot summaries (input) and genres (ground truth), through the OMDb API² using the corresponding movie names as inputs.

Within the scope of this study, we selected four genres, namely Thriller, Horror, Comedy and Drama, for movie genre classification. Since the number of movies in the database varies for each genre, we randomly sampled the same number of movies for each of them. However, the total number of sentences in the plot summaries differs across genres, as the plots may include different numbers of sentences. Accordingly, in the document-level classification task (using whole plots as inputs), the data is uniformly distributed over the genres. On the other hand, the training data for the sentence-level classification task (using sentences as inputs) is unbalanced. As a result, we obtained a total of 6,360 movies and 22,278 sentences for the genre classification task. Table I shows the distribution of the number of movies and the total number of sentences for each genre in the dataset.

Before training a model for classification, we conducted a pre-processing step. We first converted all text in the plots to lowercase. Next, we eliminated all punctuation marks except the ones that separate sentences. Additionally, we eliminated the stop-words. We also divided plot summaries into sentences for the sentence-level classification task. We performed all pre-processing tasks using NLTK³.

B. Text Representation

The purpose of this step is to represent the semantic and syntactic relationships among the words, which improves the performance when the training data is limited. After the pre-processing step, each input (full plot for document-level and sentence for sentence-level) is represented using a continuous vector representation. To do that, the pre-trained word vectors proposed by [15] are used. The word vectors were obtained by training on Wikipedia. These vectors have 300 dimensions and were obtained using the skip-gram model. Therefore, the relationships between words and their context are modeled beforehand. As a result, the raw input is converted to a continuous vector representation and then fed into the network. Note that, for any word in the plots that does not have a corresponding word vector in the dictionary, a random 300-dimensional word vector was generated.

C. Model

The LSTM model [16] is an RNN architecture that is capable of learning complex dependencies across time. LSTM RNNs address the vanishing gradient problem of basic RNNs by employing gating functions together with the state dynamics. In this study, we use a Bi-LSTM network. It is composed of two LSTM neural networks: a forward LSTM to model the preceding context and a backward LSTM to model the following context. The architecture used in the study is given in Fig. 1.

Fig. 1. Bi-LSTM Network Architecture for Movie Genre Classification

Note that each plot summary of a movie is divided into sentences and the genre of the corresponding movie is assigned to each sentence. During training, each input (sentence) is represented by the words it includes, and continuous word representations are obtained using [15]. This is useful when limited data is used, since the semantic and syntactic relationships among the words are captured. We refer to this representation as the embedding layer in the architecture in Fig. 1. Then, the word representations are fed into the Bi-LSTM network. In practice, a linear projection layer is placed between the Bi-LSTM and softmax layers. Finally, a softmax layer, which is stacked on top of the Bi-LSTM, takes the learned representation of the last output of the Bi-LSTM as its input and returns the classification probabilities for each movie genre.
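To make the data flow of the architecture concrete, the following is a minimal, untrained forward pass written in plain Python. It is a sketch under stated assumptions, not the implementation used in the study: all function names (`build_model`, `bilstm_posteriors`, and so on) are hypothetical, the weights are random stand-ins for trained parameters, the hidden size of 8 is just one value from the grid reported later, and "the last output of the Bi-LSTM" is read here as the concatenation of the final forward and final backward hidden states.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_layer(rows, cols, rng):
    """Random weights and zero bias (stand-ins for trained values)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return W, [0.0] * rows


def init_lstm(input_dim, hidden_dim, rng):
    """One (W, b) pair per gate, each acting on the concatenation [x; h]."""
    return {g: init_layer(hidden_dim, input_dim + hidden_dim, rng) for g in "ifog"}


def lstm_step(params, x, h, c):
    """One LSTM time step: input, forget and output gates plus candidate."""
    z = x + h  # list concatenation [x; h]

    def gate(name, act):
        W, b = params[name]
        return [act(a + bias) for a, bias in zip(matvec(W, z), b)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def last_hidden(params, vectors, hidden_dim):
    """Run an LSTM over the word vectors and return its final hidden state."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in vectors:
        h, c = lstm_step(params, x, h, c)
    return h


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def build_model(embed_dim=300, hidden_dim=8, num_genres=4, seed=0):
    """Assumed layout: forward LSTM, backward LSTM, linear projection."""
    rng = random.Random(seed)
    return {
        "hidden_dim": hidden_dim,
        "fwd": init_lstm(embed_dim, hidden_dim, rng),
        "bwd": init_lstm(embed_dim, hidden_dim, rng),
        "proj": init_layer(num_genres, 2 * hidden_dim, rng),
    }


def bilstm_posteriors(model, vectors):
    """Embedded sentence -> Bi-LSTM -> linear projection -> softmax."""
    h_fwd = last_hidden(model["fwd"], vectors, model["hidden_dim"])
    h_bwd = last_hidden(model["bwd"], vectors[::-1], model["hidden_dim"])
    W, b = model["proj"]
    logits = [a + bias for a, bias in zip(matvec(W, h_fwd + h_bwd), b)]
    return softmax(logits)


# Toy usage: a five-word "sentence" of random 300-dimensional vectors.
rng = random.Random(1)
sentence = [[rng.uniform(-1.0, 1.0) for _ in range(300)] for _ in range(5)]
posteriors = bilstm_posteriors(build_model(), sentence)
print(posteriors)  # four probabilities that sum to 1
```

With random weights the four probabilities carry no meaning; the point is only the order of operations. In practice a deep learning framework would be used so that the parameters can be trained.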
¹ https://grouplens.org/datasets/movielens/
² http://www.omdbapi.com/
³ http://www.nltk.org/

In order to train the model, we minimize the negative log-likelihood of the estimation error, where the loss function is given in Eq. 1 below.

$$L(\theta) = -\frac{1}{C} \sum_{i=1}^{C} y_i \log(\hat{y}_i), \tag{1}$$

where $C$ is the number of target classes, $y$ is the one-hot representation of the ground truth, and $\hat{y}$ is the estimated probability distribution assigned to the genres by the model.

D. Classification

Since we divide the plot summaries into sentences, we estimate the class labels for each of them separately at test time. However, we need to assign a single class label to any given plot summary. Therefore, we fuse the decisions of the model for each sentence to obtain a final class label for the corresponding plot summary. For that purpose, we use majority voting to obtain the final decision by considering the posterior probabilities of genres assigned to the sentences. If the majority voting outputs a single label, then the plot summary is assigned that label. On the other hand, if more than one genre has the maximum number of votes, we assign to the plot summary the genre, among those tied, whose average class posterior probability is the highest. The steps of the genre label assignment process are given in Algorithm 1.

Algorithm 1 Assignment of genre label to a plot summary during prediction.
Require: $G_p \in \mathbb{R}^{n \times c}$, genre posterior probability vectors of all sentences of plot summary $p$, where $n$ is the number of sentences of $p$ and $c$ is the number of target classes.
Require: $G_{p,s} \in \mathbb{R}^{c}$, genre posterior probability vector of sentence $s$ of plot summary $p$.
Ensure: $\hat{y}_p$, estimated genre label for plot summary $p$.

    G_p^avg ← getAverageProbabilities(G_p)
    label_count ← zeros(c)
    for s ∈ p do
        estimated ← index(max(G_{p,s}))
        label_count(estimated) ← label_count(estimated) + 1
    end for
    candidate_labels ← indices(max(label_count))
    if candidate_labels.length() > 1 then
        ŷ_p ← findLabelOfHighestProb(G_p^avg, candidate_labels)
    else
        ŷ_p ← candidate_labels.first()
    end if

E. Evaluation Measures

Since the movie genre classification task is a multi-class classification problem, we used two versions of the averaged f-score ($f_1$): micro and macro. The former computes the f-score using all estimations at once. On the other hand, the latter computes the f-score for each genre separately and then averages the results. The calculations of micro precision and micro recall are given in Eq. 2a and 2b, whereas the equations for macro precision and macro recall are shown in Eq. 3a and 3b.

$$p^{micro} = \frac{\sum_{i=1}^{C} tp_i}{\sum_{i=1}^{C} tp_i + \sum_{i=1}^{C} fp_i} \tag{2a}$$

$$r^{micro} = \frac{\sum_{i=1}^{C} tp_i}{\sum_{i=1}^{C} tp_i + \sum_{i=1}^{C} fn_i} \tag{2b}$$

$$p^{macro} = \frac{1}{C} \sum_{i=1}^{C} \frac{tp_i}{tp_i + fp_i} \tag{3a}$$

$$r^{macro} = \frac{1}{C} \sum_{i=1}^{C} \frac{tp_i}{tp_i + fn_i} \tag{3b}$$

where $C$ is the number of target labels, $p$ is the precision, $r$ is the recall, and $tp_i$, $fp_i$ and $fn_i$ stand for the number of true positives, false positives and false negatives for the $i$th target label, respectively.

For both micro and macro measures, we compute the f-score as $f_1 = \frac{2 \times p \times r}{p + r}$. Note that, since we perform our experiments on a single dataset, micro precision, micro recall and micro f-score values are all equal and they represent the accuracy of the classifier. Accordingly, we only present the micro f-score for the micro results.

III. EXPERIMENTS AND RESULTS

In this study, we perform multi-class movie genre classification from plot summaries using Bi-LSTM, where the class labels are Thriller, Horror, Comedy and Drama. We predict the genre of a movie by combining the decisions given for the sentences of its plot summary, which we call the sentence-level approach. On the other hand, when we use the whole plot summary for training without dividing it into its sentences, we call it the document-level approach.

In order to measure the performance of the proposed method, we compare it with two baseline methods under different settings. First, we train an ordinary RNN model using the sentence-level and document-level approaches. We use the same representations used for our method while training the RNN model. Second, we train a logistic regression (LR) classifier using both settings in a similar way. While training the LR model, we obtain a Bag-of-Words (BoW) representation for each plot summary or sentence for the document-level or sentence-level approach, respectively. Then we fill the vector of the plot summary or sentence with term frequency-inverse document frequency (TF-IDF) [17] values. The IDF weight for each word is calculated using only the training dataset. There exist 27,336 unique words in the training dataset. Accordingly, each input is represented by a 27,336-dimensional feature vector. Moreover, we also compare the results of our method with a Bi-LSTM model trained with the document-level approach. Note that, for all methods, the same pre-processing step explained in Section II-A is applied.
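Returning to the sentence-level approach, the fusion step of Algorithm 1 (majority vote over sentences, with ties broken by the average posterior) can be sketched in plain Python as follows. The function name `assign_genre`, the `GENRES` list order, and the example posterior values are illustrative assumptions, not taken from the study.

```python
# Assumed label order; the study does not specify one.
GENRES = ["Thriller", "Horror", "Comedy", "Drama"]


def assign_genre(sentence_posteriors):
    """Fuse per-sentence genre posteriors into a single label index.

    sentence_posteriors: list of n rows, each a list of c probabilities
    (one row per sentence of the plot summary).
    """
    if not sentence_posteriors:
        raise ValueError("a plot summary needs at least one sentence")
    n = len(sentence_posteriors)
    c = len(sentence_posteriors[0])

    # Average posterior per class, used only to break ties.
    avg = [sum(row[k] for row in sentence_posteriors) / n for k in range(c)]

    # Each sentence votes for its most probable genre.
    label_count = [0] * c
    for row in sentence_posteriors:
        estimated = max(range(c), key=row.__getitem__)
        label_count[estimated] += 1

    # Genres that share the maximum number of votes.
    top = max(label_count)
    candidate_labels = [k for k in range(c) if label_count[k] == top]

    if len(candidate_labels) > 1:
        # Tie: pick the candidate with the highest average posterior.
        return max(candidate_labels, key=avg.__getitem__)
    return candidate_labels[0]


# Two of three sentences vote Thriller, so no tie-break is needed.
clear_majority = [
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.5, 0.3, 0.1, 0.1],
]
# One vote each for Thriller and Horror; Horror has the higher average.
tie = [
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.8, 0.05, 0.05],
]
print(GENRES[assign_genre(clear_majority)])  # Thriller
print(GENRES[assign_genre(tie)])             # Horror
```

Note that the average posterior is consulted only when the vote is tied, so a genre with a lower average can still win on vote count alone.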
Setup. We divide the dataset into training (70%), validation (15%) and test (15%) subsets. For each method, we select the model that performs the best on the validation set and report its performance on the test set. We train all Bi-LSTM networks and RNNs with stochastic gradient descent (SGD) using the Adam optimizer [18]. The learning rate is set to 0.005. For both the Bi-LSTM and RNN experiments, the number of hidden layers and the number of hidden units are varied over {1, 2, 3} and {8, 16, 32, 64, 128}, respectively. Moreover, the regularization parameter C for the LR experiment is varied over {0.01, 0.1, 1, 10, 100}.

TABLE II
MOVIE GENRE CLASSIFICATION RESULTS (%) MEASURED BY PRECISION, RECALL AND F-SCORE

Method      Macro Pre.   Macro Rec.   Macro f1   Micro f1
Bi-LSTM-s   67.75        67.61        67.68      67.61
Bi-LSTM-d   65.92        64.62        65.26      64.62
RNN-s       62.79        62.42        62.61      62.42
RNN-d       57.72        55.50        56.59      55.50
LR-s        63.05        62.74        62.89      62.74
LR-d        64.03        63.84        63.94      63.84

Table II shows the results of the experiments. The suffix s denotes the sentence-level approach and the suffix d denotes the document-level approach. According to the results, our method, Bi-LSTM-s, significantly outperforms the other methods in terms of both macro f-score and micro f-score, which are 67.68% and 67.61%, respectively. We also observe that the sentence-level approach substantially boosts the performance when recurrent neural networks are used for the classification: Bi-LSTM-s and RNN-s outperform the document-level settings of the same networks. On the other hand, the document-level approach gives slightly better performance than the sentence-level approach when LR is used for training. The reason for this may be the representation we use for the inputs. Since we represent the inputs using the BoW model, the representations are sparser in the sentence-level approach. This may prevent the classifier from learning satisfactorily when the training data is limited.

We also share the values of precision, recall and f-score of the Bi-LSTM-s method for each genre in Table III. According to the results, the proposed method performs best when estimating the Horror genre. On the other hand, the lowest performance is obtained when predicting the Thriller genre.

TABLE III
PRECISION, RECALL AND F-SCORE VALUES (%) FOR EACH GENRE OBTAINED USING BI-LSTM-S

Genre      Precision   Recall   F-score
Thriller   68.46       55.98    61.59
Horror     76.22       78.62    77.40
Comedy     64.71       69.18    66.87
Drama      61.63       66.67    64.05

IV. CONCLUSION

In this study, we perform movie genre classification from plot summaries using a Bi-LSTM network. Instead of using the whole plot summary as input, we divide it into its sentences and train the network using those sentences. During prediction, we fuse the decisions of the model for each sentence to obtain a final class label. Results show that our method significantly outperforms ordinary RNNs and LR. Also, we observe that using sentences to label the genre of a movie performs better than using the whole plot summary for recurrent neural networks when the data is limited. As future work, we are planning to increase the size of the dataset and extend our method to multi-label, multi-class movie genre classification.

REFERENCES

[1] T. B. Cargal, Hearing a film, seeing a sermon: Preaching and popular movies. Westminster John Knox Press, 2007.
[2] Z. Rasheed, Y. Sheikh, and M. Shah, “Semantic film preview classification using low-level computable features,” in 3rd International Workshop on Multimedia Data and Document Engineering (MDDE-2003), 2003.
[3] Z. Rasheed, Y. Sheikh, and M. Shah, “On the use of computable features for film classification,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 52–64, 2005.
[4] X. Yuan, W. Lai, T. Mei, X.-S. Hua, X.-Q. Wu, and S. Li, “Automatic video genre categorization using hierarchical svm,” in Image Processing, 2006 IEEE International Conference on. IEEE, 2006, pp. 2905–2908.
[5] H. Zhou, T. Hermans, A. V. Karandikar, and J. M. Rehg, “Movie genre classification via scene categorization,” in Proceedings of the 18th ACM International Conference on Multimedia, ser. MM ’10. New York, NY, USA: ACM, 2010, pp. 747–750. [Online]. Available: http://doi.acm.org/10.1145/1873951.1874068
[6] Y.-F. Huang and S.-H. Wang, “Movie genre classification using svm with audio and video features,” Active Media Technology, pp. 1–10, 2012.
[7] H. K. Ekenel and T. Semela, “Multimodal genre classification of tv programs and youtube videos,” Multimedia tools and applications, vol. 63, no. 2, pp. 547–567, 2013.
[8] M. Ivašić-Kos, M. Pobar, and L. Mikec, “Movie posters classification into genres based on low-level features,” in 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014, 2014.
[9] G. S. Simões, J. Wehrmann, R. C. Barros, and D. D. Ruiz, “Movie genre classification with convolutional neural networks,” in Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 2016, pp. 259–266.
[10] J. Wehrmann and R. C. Barros, “Convolutions through time for multi-label movie genre classification,” in Proceedings of the Symposium on Applied Computing. ACM, 2017, pp. 114–119.
[11] Z. Fu, B. Li, J. Li, and S. Wei, “Fast film genres classification combining poster and synopsis,” in International Conference on Intelligent Science and Big Data Engineering. Springer, 2015, pp. 72–81.
[12] H.-Z. Hong and J.-I. G. Hwang, “Multimodal plsa for movie genre classification,” in International Workshop on Multiple Classifier Systems. Springer, 2015, pp. 159–167.
[13] J. Arevalo, T. Solorio, M. Montes-y Gómez, and F. A. González, “Gated multimodal units for information fusion,” arXiv preprint arXiv:1702.01992, 2017.
[14] T. Pham, T. Tran, D. Q. Phung, and S. Venkatesh, “Column networks for collective classification,” in AAAI, 2017, pp. 2485–2491.
[15] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” arXiv preprint arXiv:1607.04606, 2016.
[16] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[17] H. P. Luhn, “A statistical approach to mechanized encoding and searching of literary information,” IBM Journal of research and development, vol. 1, no. 4, pp. 309–317, 1957.
[18] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
