Building Synonym Sets for English WordNet with Robust Clustering using Links Method

Sarah Suryaningsih, Moch Arif Bijaksana, Widi Astuti

Abstract


English WordNet is an important lexical resource built from synonym sets (synsets) that represent the similarity of meaning between words. What distinguishes WordNet from an ordinary dictionary is that each word is interconnected with other words. In this study, the synonym sets are built from the Oxford Thesaurus, accessed through lexico.com, which serves as the lexical source data; extracting its entries yields groups of words that share the same meaning. These words are then grouped with the Robust Clustering Using Links (ROCK) method, which clusters words by their similarity values, and the resulting synonym sets are used to build the lexical database. The main purpose of this work is therefore to produce accurate synonym sets for English WordNet using clustering techniques. The results are evaluated with the F-measure against a gold standard. Compared with the previous approach, which used hierarchical clustering, the ROCK method yields higher accuracy, indicating that ROCK is a suitable method for developing English WordNet. A more accurate English WordNet can in turn serve as a model for the research and development of wordnets in other languages.
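
The abstract names two technical steps: clustering words with ROCK and scoring the clusters with the F-measure against a gold standard. The Python sketch below illustrates both on a small hypothetical thesaurus; it is not the authors' implementation. Assumptions (not taken from the paper): each word is represented by itself plus its synonyms, similarity is the Jaccard coefficient, two words are neighbours when their similarity is at least θ (each word counts as its own neighbour), link(p, q) is the number of common neighbours, and clusters are merged by the ROCK goodness measure link[Ci, Cj] / ((ni + nj)^(1+2f(θ)) − ni^(1+2f(θ)) − nj^(1+2f(θ))) with the f(θ) = (1 − θ)/(1 + θ) choice from Guha et al. The F-measure shown is the pairwise variant (pairs of words placed in the same synset); the paper may define it differently.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rock(items: dict, theta: float, k: int) -> list:
    """Minimal ROCK-style agglomerative clustering (illustrative sketch).

    items maps each word to its feature set (assumed: the word plus its synonyms).
    Merging stops at k clusters or when no remaining pair of clusters shares a link.
    """
    assert 0.0 <= theta < 1.0, "theta = 1 makes the goodness denominator zero"
    names = list(items)
    # Neighbours: words whose similarity reaches theta; every word is its own neighbour.
    nbrs = {p: {q for q in names if jaccard(items[p], items[q]) >= theta} for p in names}
    exponent = 1 + 2 * (1 - theta) / (1 + theta)  # 1 + 2 f(theta)

    def goodness(ci: list, cj: list) -> float:
        # Total links between the clusters, normalised by the expected number of links.
        cross = sum(len(nbrs[p] & nbrs[q]) for p in ci for q in cj)
        expected = (len(ci) + len(cj)) ** exponent - len(ci) ** exponent - len(cj) ** exponent
        return cross / expected

    clusters = [[p] for p in names]
    while len(clusters) > k:
        best, pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            g = goodness(clusters[i], clusters[j])
            if g > best:
                best, pair = g, (i, j)
        if pair is None:  # no cluster pair shares any link
            break
        i, j = pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


def same_cluster_pairs(clusters: list) -> set:
    """All unordered word pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


def pairwise_f_measure(predicted: list, gold: list) -> float:
    """Pairwise F-measure of predicted synsets against gold-standard synsets."""
    pred, true = same_cluster_pairs(predicted), same_cluster_pairs(gold)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy thesaurus entries (not data from the Oxford Thesaurus).
THESAURUS = {
    "big": ["large", "huge", "great"],
    "large": ["big", "huge", "great"],
    "huge": ["big", "large", "enormous"],
    "small": ["little", "tiny", "minor"],
    "little": ["small", "tiny", "minor"],
    "tiny": ["small", "little", "minute"],
}
GOLD = [["big", "large", "huge"], ["small", "little", "tiny"]]

items = {word: {word, *syns} for word, syns in THESAURUS.items()}
clusters = rock(items, theta=0.5, k=2)
print(clusters)  # [['big', 'huge', 'large'], ['little', 'small', 'tiny']]
print(pairwise_f_measure(clusters, GOLD))  # 1.0
```

On this toy input the two synonym groups share no links, so ROCK recovers them exactly. An imperfect clustering such as [["big", "large"], ["huge"], ["small", "little", "tiny"]] keeps precision at 1.0 but lowers recall to 4/6, giving a pairwise F-measure of 0.8.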


Keywords


F-measure; Gold Standard; Robust Clustering Using Links; WordNet

References


Chen, D., Jianzhuo, Y., Liying, F., & Bin, S. (2009). Measure Semantic Distance in WordNet Based on Directed Graph Search. International Conference on E-Learning, E-Business, Enterprise Information Systems, and E-Government, 57–60. https://doi.org/10.1109/EEEE.2009.16

Dembczynski, K. J., Waegeman, W., Cheng, W., & Hüllermeier, E. (2011). An Exact Algorithm for F-Measure Maximization. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 24 (pp. 1404–1412). Curran Associates, Inc. http://papers.nips.cc/paper/4389-an-exact-algorithm-for-f-measure-maximization.pdf

Fellbaum, C., & Miller, G. (1998). The Lexical Database. In WordNet: An Electronic Lexical Database (p. 22). MIT Press. http://ieeexplore.ieee.org/document/6285385

Gelbukh, A. (2007). Computational Linguistics and Intelligent Text Processing: 8th International Conference. Springer Science & Business Media.

Guha, S., Rastogi, R., & Shim, K. (2001). ROCK: A Robust Clustering Algorithm for Categorical Attributes. Information Systems, 25, 345–366. https://doi.org/10.1016/S0306-4379(00)00022-3

Gunawan, & Saputra, A. (2010). Building Synsets for Indonesian WordNet with Monolingual Lexical Resources. International Conference on Asian Language Processing, 297–300. https://doi.org/10.1109/IALP.2010.69

Hendrik, & Cahyono, A. (2017). Model WordNet Bahasa Indonesia berbasis Linked Data. Jurnal Nasional Teknik Elektro Dan Teknologi Informasi (JNTETI), 6(1), 8–14. https://doi.org/10.22146/jnteti.v6i1.288

Ilson, R. (2011). On the Historical Thesaurus of the Oxford English Dictionary. International Journal of Lexicography, 24(3), 241–260. https://doi.org/10.1093/ijl/ecq032

Jain, G., & Lobiyal, D. K. (2019). Word Sense Disambiguation of Hindi Text using Fuzzified Semantic Relations and Fuzzy Hindi WordNet. 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 494–497. https://doi.org/10.1109/CONFLUENCE.2019.8776967

Kim, Y. B., & Kim, Y. S. (2008). Latent Semantic Kernels for WordNet: Transforming a Tree-Like Structure into a Matrix. International Conference on Advanced Language Processing and Web Information Technology, 76–80. https://doi.org/10.1109/ALPIT.2008.40

Miller, G. A. (1995). WordNet: A Lexical Database for English. Communications of the ACM, 38(11), 39–41. https://doi.org/10.1145/219717.219748

Priyatno, J., & Bijaksana, M. A. (2019). Clustering Synonym Sets in English WordNet. 7th International Conference on Information and Communication Technology (ICoICT 2019). https://doi.org/10.1109/ICoICT.2019.8835313

Samhith, K., Tilak, S. A., & Panda, G. (2016). Word sense disambiguation using WordNet Lexical Categories. International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), 1664–1666. https://doi.org/10.1109/SCOPES.2016.7955725

Swain, D., Tambe, M., Ballal, P., Dolase, V., Agrawal, K., & Rajmane, Y. (2019). Lexical Text Simplification Using WordNet (pp. 114–122). https://doi.org/10.1007/978-981-13-9942-8_11

Zhang, Y., & Hasi. (2015). A Constructing Method of Mongolia-Chinese-English Multilingual Semantic Net Based on WordNet. International Conference on Computer Science and Applications (CSA), 196–198. https://doi.org/10.1109/CSA.2015.47



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

EDUMATIC: Jurnal Pendidikan Informatika is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.