1 Introduction
In the field of natural language processing, an extractive summarization task can
be described as the selection of the most important sentences in a document.
By varying the level of compression, a summary of arbitrary length can be
obtained.
TextRank is a graph-based extractive summarization algorithm. It is domain
and language independent, since it requires neither deep linguistic knowledge
nor domain- or language-specific annotated corpora [16]. These features make the
algorithm widely used: it performs well when summarizing structured text such as
news articles, and it has also shown good results in other applications such as
summarizing meeting transcriptions [8] and assessing web content credibility [1].
In this article we present different proposals for the construction of the graph
and report the results obtained with them.
The first sections of this article describe previous work in the area and give
an overview of the TextRank algorithm. Then we present the variations and describe
the metrics and dataset used for the evaluation. Finally, we report the
results obtained using the proposed changes.
2 Previous work
The field of automated summarization has attracted interest since the late 1950s
[14]. Traditional methods for text summarization analyze the frequency of words
or sentences in the first paragraphs of the text to identify the most important
lexical elements. The mainstream research in this field emphasizes extractive
approaches to summarization using statistical methods [4]. Several statistical
models have been developed based on training corpora to combine different heuristics
using keywords, position and length of sentences, word frequency or titles [13].
Other methods are based on representing the text as a graph. Graph-based
ranking approaches consider the intrinsic structure of the text instead of
treating it as a simple aggregation of terms. They are thus able to capture
and express richer information when determining important concepts [19].
The text fragments used to build the graph can be phrases [6], sentences [14],
or paragraphs [18]. Currently, many successful systems use sentences,
considering the tradeoff between content richness and grammatical
correctness. According to this approach, the most important sentences are the
most connected ones in the graph and are used to build the final summary
[2]. Several measures are available to identify relations between sentences
(the edges of the graph): overlapping words, cosine distance and query-sensitive
similarity. Some authors have also proposed combining these measures with
supervised learning functions [19].
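As an illustration of how such measures can weight the edges of a sentence graph, the following is a minimal sketch rather than the implementation of any of the cited systems. It assumes sentences are already tokenized into lists of words. The function names are hypothetical. The overlap measure uses the log-length normalization from the original TextRank formulation, and the cosine measure is computed over raw term counts, which is only one of several possible vector representations.

```python
import math
from collections import Counter


def overlap_similarity(a, b):
    """Number of shared distinct words, normalized by the log sentence lengths."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences contain a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def cosine_similarity(a, b):
    """Cosine of the angle between the raw term-count vectors of two sentences."""
    counts_a, counts_b = Counter(a), Counter(b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_weights(sentences, similarity):
    """Edge-weight matrix of the sentence graph; self-loops get weight 0."""
    n = len(sentences)
    return [
        [0.0 if i == j else similarity(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]
```

Passing the similarity function as an argument keeps the graph construction independent of the measure, so a different measure can be swapped in without touching the rest of the pipeline.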
These algorithms use different information retrieval techniques to determine
the most important sentences (vertices) and build the summary [23]. The
TextRank algorithm developed by Mihalcea and Tarau [17] and the LexRank
algorithm by Erkan and Radev [7] are based on ranking the lexical units of the
text (sentences or words) using variations of PageRank [20]. Other graph-based
ranking algorithms such as HITS [11] or Positional Function [10] may be also
3 TextRank
3.1 Description
The result of this process is a dense graph representing the document. From
this graph, PageRank is used to compute the importance of each vertex. The
most significant sentences are selected and presented as the summary, in the same
order as they appear in the document.
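The ranking and selection step can be sketched as follows. This is a simplified illustration under stated assumptions, not the reference implementation: `weights` is assumed to be a square matrix of non-negative edge weights with a zero diagonal, the damping factor of 0.85 and the fixed iteration count are conventional PageRank choices rather than values taken from this article, and the names `pagerank` and `summarize` are hypothetical.

```python
def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an edge-weight matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                # Vertices with no outgoing weight pass nothing on.
                if out_sums[j] > 0:
                    rank += weights[j][i] / out_sums[j] * scores[j]
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, weights, ratio=0.2):
    """Keep the top-ranked sentences and return them in document order."""
    scores = pagerank(weights)
    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Sorting the selected indices before returning is what preserves the original order of the sentences in the summary.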
4 Experiments
This section describes the modifications that we propose to the original
TextRank algorithm. These ideas are based on changing the way in which the
similarity between sentences is computed to weight the edges of the graph used
for PageRank. These similarity measures are orthogonal to the TextRank model,
so they can be easily integrated into the algorithm. We found that some of these
variations produce significant improvements over the original algorithm.
where k and b are parameters. We used k = 1.2 and b = 0.75. avgDL is the
average length of the sentences in our collection.
This function definition implies that if a word appears in more than half the
documents of the collection, it will have a negative value. Since this can cause
problems in the next stage of the algorithm, we used the following correction:

\[
\mathrm{IDF}(s_i) =
\begin{cases}
\log(N - n(s_i) + 0.5) - \log(n(s_i) + 0.5) & \text{if } n(s_i) \le N/2 \\
\varepsilon \cdot \mathrm{avgIDF} & \text{if } n(s_i) > N/2
\end{cases}
\tag{3}
\]
where ε takes a value between 0.5 and 0.30 and avgIDF is the average IDF
for all terms. Other corrective strategies were also tested, setting ε = 0 and using
simpler modifications of the classic IDF formula.
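The correction described above, in which terms occurring in more than half of the documents receive ε · avgIDF instead of a negative value, can be sketched as follows. This is an illustrative sketch under assumptions: each sentence is treated as a document and given as a list of tokens, avgIDF is taken as the mean of the uncorrected IDF values, and the scoring function assumes the standard Okapi BM25 form with the parameters k = 1.2 and b = 0.75 mentioned earlier. All function names are hypothetical.

```python
import math


def document_frequencies(sentences):
    """Map each term to the number of sentences that contain it, n(s_i)."""
    freqs = {}
    for tokens in sentences:
        for term in set(tokens):
            freqs[term] = freqs.get(term, 0) + 1
    return freqs


def corrected_idf(sentences, epsilon=0.25):
    """IDF with the epsilon * avgIDF floor for terms in more than half the sentences."""
    n_docs = len(sentences)
    doc_freq = document_frequencies(sentences)
    raw = {
        term: math.log(n_docs - n + 0.5) - math.log(n + 0.5)
        for term, n in doc_freq.items()
    }
    # Assumption: avgIDF is the mean of the uncorrected values.
    avg_idf = sum(raw.values()) / len(raw)
    return {
        term: epsilon * avg_idf if doc_freq[term] > n_docs / 2 else raw[term]
        for term in raw
    }


def bm25_similarity(sentence_r, sentence_s, idf, avg_len, k=1.2, b=0.75):
    """Score sentence_r against the terms of sentence_s (standard BM25 form assumed)."""
    length_norm = k * (1 - b + b * len(sentence_r) / avg_len)
    score = 0.0
    for term in sentence_s:
        freq = sentence_r.count(term)
        if freq:
            score += idf.get(term, 0.0) * freq * (k + 1) / (freq + length_norm)
    return score
```

Note that BM25 is not symmetric in its two arguments, so the resulting graph is directed unless the two directions are combined in some way.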
We also used BM25+, a variation of BM25 that changes the way long
documents are penalized [15].
4.2 Evaluation
For testing the proposed variations, we used the database of the 2002 Document
Understanding Conference (DUC) [5]. The corpus has 567 documents that are
summarized to 20% of their size, and is the same corpus used in [17].
To evaluate results we used version 1.5.5 of the ROUGE package [12]. The
configuration settings were the same as those in DUC, where ROUGE-1,
ROUGE-2 and ROUGE-SU4 were used as metrics, with a confidence level of 95% and
stemming applied. The final result is an average of these three scores.
To check the correct behaviour of our test suite, we implemented the
reference method used in [17], which extracts the first sentences of each document.
We found the resulting scores of the original algorithm to be identical to those
reported in [17]: a 2.3% improvement over the baseline.
4.3 Results
We tested LCS, cosine similarity, BM25 and BM25+ as different ways to weight
the edges of the TextRank graph. The best results were obtained using BM25
and BM25+ with the corrective formula shown in Equation 3. We achieved
an improvement of 2.92% over the original TextRank result using BM25 and
ε = 0.25. The following chart shows the results obtained for the different
variations we proposed.
Cosine similarity also gave satisfactory results, with a 2.54% improvement
over the original method. The LCS variation also performed better than
the original TextRank algorithm, with a 1.40% total improvement.
Running time also improved. We could process the 567 documents
from the DUC2002 database in 84% of the time needed in the original
5 Reference Implementation and Gensim Contribution
6 Conclusions
This work presented three different variations of the TextRank algorithm for
automatic summarization. The three alternatives significantly improved the results
of the algorithm using the same test configuration as in the original publication.
Given that TextRank performs 2.84% better than the baseline, our improvement of
2.92% over the TextRank score is an important result.
The combination of TextRank with modern Information Retrieval ranking
functions such as BM25 and BM25+ creates a robust method for automatic
summarization that performs better than the standard techniques used previ-
Based on these results we suggest the use of BM25 along with TextRank for
the task of unsupervised automatic summarization of texts. The results obtained
and the examples analyzed show that this variation is better than the original
TextRank algorithm without a performance penalty.
