Representing Gender in Library Catalogue: Developing Multilingual Homosaurus for Automated Subject Indexing
DOI:
https://doi.org/10.17821/srels/2024/v61i5/171585
Keywords:
Annif, Automated Indexing, Gender Bias, Gender Spectrum, Homosaurus, Machine Learning, Semantic Annotation, Skosmos, VocBench
Abstract
The sexist and homophobic attitudes embedded in global, generic knowledge organization systems such as LCSH and DDC (as reported by many researchers and critics since the 1970s) have severely limited the availability of LGBTQ+ resources in India. This research uses the Homosaurus, a comprehensive domain-specific vocabulary tool that is currently available only in English and in a non-interactive mode, and focuses on developing a multilingual, collaborative software framework to host it (in Hindi and Bengali to start with) as an interactive, participative global vocabulary tool. The main deliverable of this study is a multilingual Homosaurus in RDF serialization formats, which will serve as the vocabulary backend for a machine learning framework supporting semi-automated indexing of LGBTQ+ documentary resources. The research aims to counter the challenges posed by the limited indexed resources in the LGBTQ+ knowledge domain in Indian libraries by formulating and implementing a semi-automated subject indexing system. The prototype deploys the following open source tools and open access data sources: (i) Annif as the AI/ML framework; (ii) machine learning backends such as FastText, Omikuji and a neural network; (iii) VocBench to host the multilingual Homosaurus; (iv) Skosify to curate the RDF serialization formats of the multilingual Homosaurus; (v) Skosmos to provide the user interface for the multilingual Homosaurus; and (vi) open access databases (CrossRef, CoRE, Lens, OpenAlex and Semantic Scholar) to collect and process the required training datasets. The research is multifaceted: it encompasses the development of a semi-automated indexing framework, the evaluation of its operational efficiency, and an exploration of the feasibility of a REST API call-based approach for rapid indexing of a large volume of records in the LGBTQ+ domain.
The proposed semi-automated subject indexing system aims to enhance access to LGBTQ+ knowledge and challenge the prevailing biases inherent in existing knowledge organization paradigms.
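To make the REST API call-based approach mentioned in the abstract more concrete, the following is a minimal sketch of how one record might be sent to an Annif server for subject suggestions. It is not the authors' implementation. The server address, the project id `homosaurus-en`, and all helper function names are assumptions introduced for illustration; the endpoint path and the `text`, `limit` and `threshold` parameters follow Annif's documented `suggest` operation, but should be checked against the deployed Annif version.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from the article): a local Annif server and a
# hypothetical project id for a model trained on the Homosaurus.
ANNIF_BASE = "http://localhost:5000/v1"
PROJECT_ID = "homosaurus-en"


def build_suggest_request(text, limit=5, threshold=0.0,
                          base=ANNIF_BASE, project=PROJECT_ID):
    """Build a form-encoded POST request for Annif's suggest endpoint."""
    url = f"{base}/projects/{project}/suggest"
    body = urllib.parse.urlencode(
        {"text": text, "limit": limit, "threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def parse_suggestions(payload):
    """Return (label, uri, score) tuples sorted by descending score."""
    results = json.loads(payload).get("results", [])
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["uri"], r["score"]) for r in ranked]


def suggest_subjects(text, **kwargs):
    """Send one record's text to Annif and return ranked suggestions."""
    request = build_suggest_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return parse_suggestions(response.read().decode("utf-8"))
```

Calling `suggest_subjects(title_and_abstract, limit=10)` in a loop over a set of records would approximate batch indexing. In a semi-automated workflow the returned suggestions would then be reviewed by a cataloguer rather than accepted automatically.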
License
Copyright (c) 2024 Journal of Information and Knowledge
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The copyright of all articles published in the Journal of Information and Knowledge is held by the publisher. The Sarada Ranganathan Endowment for Library Science (SRELS), as publisher, requires its authors to transfer copyright prior to publication. This permits SRELS to reproduce, publish, distribute and archive the article in print and electronic form, and to defend against any improper use of the article.