Construction of Lexicons to Perk Up Re-Clustering
Authors: A. George Louis Raja, F. Sagayaraj Francis and P. Sugumar
Volume 7, No. 3, October-December 2018, pp. 82-85
Abstract
Existing semantic methods cluster documents by comparing unabridged or abridged terms. These terms are not preserved after clustering, so the entire cluster operation must be repeated whenever new documents arrive. Hence the semantic clustering methods can be considered “on the go” methods. Re-clustering becomes unavoidable in all circumstances, in both the Iterative and the Incremental Clustering Methods. It would be more appropriate to build and evolve a lexicon from the derived keywords of the documents and to refer to it in further cluster operations. The rationale is to avoid re-clustering when new documents arrive and instead refer to the Lexicon to formulate clusters for as long as the quality of the clusters remains intact; once quality degrades past a threshold, the cluster operation is repeated. Since re-clustering is deferred until this breakeven point, the process of re-clustering becomes faster. This approach may incur additional runtime complexity, but it would greatly simplify and speed up re-clustering. This paper discusses the construction of lexicons and their application in clustering. The Keyword based Lexicon Construction Algorithm (KBLCA) is demonstrated for building lexicons, and the breakeven point for re-clustering is proposed and described. The theory of avoiding re-clustering is briefly presented, along with experimental results.
Keywords
Lexicon, Clustering, ATSCA, Keygraph, KBLCA
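The idea outlined in the abstract (keep a per-cluster keyword lexicon, assign new documents by consulting it, and re-cluster only once quality degrades past a threshold) can be illustrated with a minimal sketch. This is not the KBLCA algorithm from the paper, whose details the abstract does not give. The class and function names, the Jaccard similarity score, and the "fraction of poorly matched documents" quality proxy are all assumptions made for illustration.

```python
def build_lexicon(clusters):
    """Map each cluster label to the union of its documents' keyword sets."""
    return {label: set().union(*docs) for label, docs in clusters.items()}


def assign(lexicon, keywords):
    """Return (best_label, score) by Jaccard similarity; (None, 0.0) if no overlap."""
    best_label, best_score = None, 0.0
    for label, terms in lexicon.items():
        union = terms | keywords
        score = len(terms & keywords) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


class LexiconClusterer:
    """Hypothetical illustration: defer re-clustering until a breakeven point."""

    def __init__(self, clusters, min_score=0.2, max_poor_fraction=0.3):
        # clusters: dict of label -> list of keyword sets (one set per document)
        self.clusters = {label: list(docs) for label, docs in clusters.items()}
        self.lexicon = build_lexicon(self.clusters)
        self.min_score = min_score                  # assumed match threshold
        self.max_poor_fraction = max_poor_fraction  # assumed breakeven point
        self.seen = 0
        self.poor = 0

    def add(self, keywords):
        """Assign a new document via the lexicon; return its label or None."""
        label, score = assign(self.lexicon, keywords)
        self.seen += 1
        if label is None or score < self.min_score:
            self.poor += 1          # poorly matched: counts toward breakeven
            return None
        self.clusters[label].append(keywords)
        self.lexicon[label] |= keywords   # evolve the lexicon with new keywords
        return label

    def needs_reclustering(self):
        """True once the share of poorly matched documents exceeds the limit."""
        return self.seen > 0 and self.poor / self.seen > self.max_poor_fraction


initial = {
    "sports": [{"football", "goal"}, {"tennis", "match"}],
    "finance": [{"stock", "market"}, {"bank", "loan"}],
}
clusterer = LexiconClusterer(initial)
first_label = clusterer.add({"goal", "match", "referee"})
```

In this sketch the full cluster operation would be rerun only when `needs_reclustering()` returns true; until then, each new document costs a single pass over the lexicon.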