Malayalam 2
Abstract—Automatic summarization of text is one of the areas of interest in the field of natural language processing. The proposed method uses sentence extraction on a single document and produces a generic summary for a given Malayalam document (extractive summarization). Sentences in the document are ranked based on the word scores of the words they contain. The top N ranked sentences are extracted and arranged in their chronological order to generate the summary, where N is the summary size expressed as a percentage of the original document size (the condensation rate). The standard metric ROUGE is used for performance evaluation. ROUGE calculates the n-gram overlap between a generated summary and reference summaries. The reference summaries were constructed manually. Experiments show that the results are promising.

Keywords − Text Summarization, Sentence Extraction, Content Word Extraction, Natural Language Processing, ROUGE.

I. INTRODUCTION

With the enormous growth of the Internet, a large collection of texts is available on the web. Some of them duplicate content present in others. Whenever a user needs some information, they have to retrieve these documents and read them completely to understand the content. This is a time-consuming and tedious process, and it is very difficult for human beings to summarize such large documents manually. Hence the importance of automatic summarization.

Text summarization is the condensation of a source text by reducing its size while preserving its significant information content and overall meaning. Text summarization performed by a machine is termed automatic text summarization. Summarization tools can help people grasp the main information content in a short time. A good summary can give a quick overview of a large document.

Text summarization techniques are generally classified into extractive summarization and abstractive summarization. In extractive summarization, important and meaningful sentences or phrases are extracted from the source documents. These sentences are arranged chronologically to produce the summary. An extractive summary contains no word or phrase that is not present in the given document. A summary created from these sentences may not be coherent, but it gives an idea of the content of the input document. An abstractive summarization system understands the contents of the document and then creates a summary in its own words. Abstractive summarization aims to produce a generalized summary that conveys the information concisely, and it needs advanced natural language generation techniques [4]. In general, abstractive methods build a semantic representation of a given text and then use natural language generation techniques to create a summary that closely resembles a human-generated summary. Such a summary might contain words not explicitly present in the original document. The abstract of a research article is a good example of abstractive summarization.

Text summarization can also be classified into single-document summarization and multi-document summarization. Early work in summarization started with single-document summarization, which produces a summary of a single document supplied by the user. As research advanced and the amount of information on the Internet grew, multi-document summarization emerged. Because information is spread over different documents, it has to be collected from all of them. Multi-document summarization produces a summary from many source documents on the same topic or the same event. Multi-document summarization is a more complex task than single-document summarization, for two major reasons. Firstly, information may overlap between the documents, which can lead to redundancy in the summary. Secondly, extra effort is needed to collect and organize the information from several different documents into a coherent summary. In the field of text summarization, significant achievements have been obtained using sentence extraction and statistical analysis.

Malayalam is a regional language spoken in India, predominantly in the state of Kerala. Malayalam is one of the 22 scheduled languages of India and one of the 6 Classical Languages of India. Malayalam was designated a Classical Language in 2013. It has official language status in Kerala state and
