An Improved Technique For Document Clustering
I. INTRODUCTION
Document clustering is also applicable in producing the
hierarchical grouping of documents (Ward 1963). To search
and retrieve information efficiently in Document Management
Systems (DMS), a metadata set with adequate detail should be
created for the documents. However, a single metadata set is
not enough for a whole document management system, because
different document types need different attributes to be
distinguished appropriately.
Document clustering is the automatic grouping of text
documents into clusters such that documents within a cluster
are highly similar to one another but differ from documents
in other clusters.
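The definition above can be made concrete with a similarity measure. The following is a minimal sketch, assuming documents are represented as raw term-frequency vectors compared with cosine similarity (a common choice; the measure used in this work is not specified here). The helper names, toy corpus, and cluster labels are hypothetical, chosen only to show that a good clustering has higher average similarity within clusters than between them.

```python
from collections import Counter
import math


def tf_vector(text):
    """Raw term-frequency vector: lowercased, whitespace-tokenized."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so b[term] is always safe.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(docs, pairs):
    """Average cosine similarity over the given (i, j) document pairs."""
    vecs = [tf_vector(d) for d in docs]
    sims = [cosine(vecs[i], vecs[j]) for i, j in pairs]
    return sum(sims) / len(sims)


# Hypothetical toy corpus with a hand-made two-cluster assignment.
docs = [
    "clustering groups similar text documents",
    "document clustering groups text",
    "search engines retrieve web pages",
    "web search retrieves pages by keywords",
]
labels = [0, 0, 1, 1]

pairs = [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))]
intra = [(i, j) for i, j in pairs if labels[i] == labels[j]]
inter = [(i, j) for i, j in pairs if labels[i] != labels[j]]

intra_sim = mean_similarity(docs, intra)
inter_sim = mean_similarity(docs, inter)
print(f"intra-cluster: {intra_sim:.3f}, inter-cluster: {inter_sim:.3f}")
```

In practice, TF-IDF weighting is often preferred over raw counts so that very common words contribute less to the similarity.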
Hierarchical document clustering (Murtagh 1984) organizes
clusters into a tree or hierarchy, which facilitates browsing.
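Such a tree can be built bottom-up. The following is a minimal sketch of agglomerative clustering, assuming average linkage over cosine similarity of raw term-frequency vectors. This variant is chosen for illustration and is not necessarily the method of this paper; Ward's (1963) criterion, for example, would instead merge the pair of clusters that least increases the within-cluster variance. The function names and toy corpus are hypothetical.

```python
from collections import Counter
import math


def tf_vector(text):
    """Raw term-frequency vector: lowercased, whitespace-tokenized."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(c * b[t] for t, c in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(docs):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per document and repeatedly merges the two
    clusters with the highest average pairwise cosine similarity.
    Returns the merges as (cluster_a, cluster_b, similarity) tuples,
    which together describe the cluster tree (dendrogram).
    """
    vecs = [tf_vector(d) for d in docs]
    sim = [[cosine(u, v) for v in vecs] for u in vecs]
    clusters = [(i,) for i in range(len(docs))]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                avg = sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or avg > best[0]:
                    best = (avg, x, y)
        avg, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, avg))
        # Delete the larger index first so that index x stays valid.
        del clusters[y]
        del clusters[x]
        clusters.append(a + b)
    return merges


# Hypothetical toy corpus.
docs = [
    "clustering groups similar text documents",
    "document clustering groups text",
    "search engines retrieve web pages",
    "web search retrieves pages by keywords",
]
for a, b, s in agglomerative(docs):
    print(f"merge {a} + {b} at similarity {s:.3f}")
```

Each recorded merge is an internal node of the tree, and cutting the tree at a similarity threshold yields a flat clustering. The naive pair scan is at least cubic in the number of documents, so this sketch only suits small examples.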
Information Retrieval (IR) (Baeza 1992) is the field of
computer science that focuses on processing documents so
that they can be quickly retrieved based on the keywords
specified in a user's query. IR technology is the foundation
of web-based search engines and plays a key role in
biomedical research, as it is the basis of software that
aids literature search.
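Keyword-based retrieval of this kind is commonly built on an inverted index, which maps each term to the documents that contain it. The following is a minimal sketch, assuming whitespace tokenization and conjunctive (AND) matching of query keywords; the function names and sample documents are hypothetical, and real IR systems add ranking, stemming, and stop-word handling on top of this structure.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query keyword (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = [
    "document clustering groups similar text documents",
    "search engines retrieve web pages by keywords",
    "keywords help retrieve relevant text documents",
]
index = build_index(docs)
print(sorted(search(index, "retrieve keywords")))  # prints [1, 2]
```

Looking up each query term costs one dictionary access, so the query does not need to scan every document.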
II. LITERATURE SURVEY
Document clustering is the process of categorizing text
documents into systematic clusters or groups, such that the
documents in the same cluster are similar whereas the
RESULT
VII. CONCLUSION
As clustering plays a vital role in many applications, much
research in the area is still ongoing. Upcoming innovations
build mainly on the properties and characteristics of existing
methods, which form the basis for further advances in the
field of clustering. A review of existing clustering
techniques shows that they deliver significant results and
performance. Hence, this research concentrates mainly on
improving the performance of clustering.
VIII. ACKNOWLEDGEMENT
I would like to express my profound gratitude and deep
regard to my project guide, Prof. M.M. Naoghare, for her
exemplary counsel, valuable feedback and constant
encouragement throughout the duration of the project. Her
suggestions were of immense help throughout my project work,
and working under her was an extremely enriching learning
experience for me.
REFERENCES
P. B. Kudal and M. M. Naoghare, A Review of Modern Document Clustering Techniques, International Journal of Science & Research (IJSR), vol. 3, no. 10, October 2014.
P. B. Kudal and M. Naoghare, An Improved Hierarchical Technique for Document Clustering, International Journal of Science & Research (IJSR), vol. 4, no. 4, April 2015.
R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, Automatic Subspace Clustering of High Dimensional Data, Data Mining and Knowledge Discovery (Springer Netherlands), vol. 11, pp. 5-33, 2005, DOI:10.1007/s10618-005-1396-1.
S. Alam, G. Dobbie, P. Riddle and M. A. Naeem, Particle Swarm Optimization Based Hierarchical Agglomerative