Document Classification Using Artificial Neural Network
Authors: Kshitij Tripathi, Rajendra G. Vyas and Anil K. Gupta
Volume 8, No. 2, April-June 2019, pp. 55-58
Abstract
Document classification is an area of data mining in which data are represented in the bag-of-words (BoW), or document vector, model. The task is to build a machine that, after learning the characteristics of a given data set, predicts the category of the document to which a word vector belongs. In this approach a document is represented by its BoW, where every word that occurs in the document is used as a feature. This article presents an artificial neural network approach that is a hybrid of n-fold cross-validation and the training-validation-test approach for the classification of data.
Keywords
N-Fold Cross Validation, Validation, Classification, Neural Network, Bag of Words
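The abstract does not spell out how the BoW vectors are built or how n-fold cross-validation is combined with a training-validation-test split, so the following is only a minimal sketch under stated assumptions. It assumes whitespace tokenization with raw word counts, and it assumes the hybrid means that each fold serves once as the test set while the remaining samples are divided into training and validation parts. The function names `bag_of_words` and `hybrid_folds` and the `val_fraction` parameter are hypothetical, and the neural network itself is omitted.

```python
from collections import Counter
import random


def bag_of_words(docs):
    """Build a shared vocabulary and one word-count vector per document."""
    # Assumption: lowercase, whitespace-separated tokens; every word is a feature.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vec = [0] * len(vocab)
        for word, count in counts.items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors


def hybrid_folds(n_samples, n_folds, val_fraction=0.2, seed=0):
    """Yield (train, validation, test) index lists, one triple per fold.

    Assumed reading of the hybrid scheme: each fold is the test set once,
    and the remaining samples are split into training and validation parts.
    """
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    # Deal the shuffled indices into n_folds roughly equal folds.
    folds = [order[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        n_val = max(1, int(len(rest) * val_fraction))
        yield rest[n_val:], rest[:n_val], test


if __name__ == "__main__":
    # Toy documents, invented purely for illustration.
    docs = [
        "call for papers conference deadline",
        "research position job opening",
        "conference workshop call for papers",
        "job opening for software engineer",
        "workshop deadline extended",
    ]
    vocab, vectors = bag_of_words(docs)
    print(len(vocab), "features per document")
    for train, val, test in hybrid_folds(len(docs), n_folds=5):
        print("train", train, "validation", val, "test", test)
```

In such a setup, the validation part would typically be used to decide when to stop training or to choose network settings, and the test fold only to report accuracy, so that every document is used for testing exactly once across the folds.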