Library Carpentry: Towards a New Professional Dimension (Part III – Data Reconciliation, Named Entity Recognition and Advanced Utilities)

Authors

  • P. Mukhopadhyay, Department of Library and Information Science, University of Kalyani, Kalyani – 741235, West Bengal
  • R. Mitra, Department of Library and Information Science, University of Kalyani, Kalyani – 741235, West Bengal

DOI:

https://doi.org/10.17821/srels/2021/v58i5/166770

Keywords:

Automatic Translation, Data Carpentry, Data Reconciliation, Data Sources Cross-Linking, Library Carpentry, Named Entity Recognition, Sentiment Analysis

Abstract

Data reconciliation and Named Entity Recognition (NER) are concepts closely related to data carpentry in general and library carpentry in particular. Part III of this three-part series on library carpentry (Parts I and II were published in the April and June issues of this journal) applies library carpentry methods to the core areas of information organization in a library of any type or size, along with additional utilities such as cross-linking of data sources, automatic translation and sentiment analysis. The study includes five case studies covering these areas, with a focus on a do-it-yourself approach.

Published

2021-10-30

How to Cite

Mukhopadhyay, P., & Mitra, R. (2021). Library Carpentry: Towards a New Professional Dimension (Part III – Data Reconciliation, Named Entity Recognition and Advanced Utilities). Journal of Information and Knowledge, 58(5), 287–303. https://doi.org/10.17821/srels/2021/v58i5/166770

Section

Invited Paper
Received 2021-10-26
Accepted 2021-10-26
Published 2021-10-30
