Representing Gender in Library Catalogue: Developing Multilingual Homosaurus for Automated Subject Indexing

Authors

  • R. Mitra, Department of Library and Information Science, Kalyani University, Kalyani – 741235, West Bengal
  • P. Mukhopadhyay, Department of Library and Information Science, Kalyani University, Kalyani – 741235, West Bengal

DOI:

https://doi.org/10.17821/srels/2024/v61i5/171585

Keywords:

Annif, Automated Indexing, Gender Bias, Gender Spectrum, Homosaurus, Machine Learning, Semantic Annotation, Skosmos, VocBench

Abstract

The sexist and homophobic bias of global, generic knowledge organization systems such as LCSH and DDC, reported by many researchers and critics since the 1970s, has made LGBTQ+ resources extremely hard to find in India. This research builds on the Homosaurus, a comprehensive domain-specific vocabulary that is currently available only in English and in a non-interactive form, and develops a multilingual, collaborative software framework to host it (starting with Hindi and Bengali) as an interactive, participative global vocabulary tool. The main deliverable of the study is a multilingual Homosaurus in RDF serialization formats, which will serve as the vocabulary backend for a machine learning framework supporting semi-automated indexing of LGBTQ+ documentary resources. The research aims to address the shortage of indexed resources in the LGBTQ+ knowledge domain in Indian libraries by designing and implementing a semi-automated subject indexing system. The prototype deploys the following open source tools and open access data sources: (i) Annif as the AI/ML framework; (ii) machine learning backends such as FastText, Omikuji, and a neural network; (iii) VocBench to host the multilingual Homosaurus; (iv) Skosify to curate the RDF serializations of the multilingual Homosaurus; (v) Skosmos to provide the user interface for the multilingual Homosaurus; and (vi) open access databases (CrossRef, CORE, Lens, OpenAlex, and Semantic Scholar) to collect and process the required training datasets. The research is multifaceted: it develops the semi-automated indexing framework, evaluates its operational efficiency, and explores the feasibility of a REST API call-based approach for rapidly indexing a large volume of records in the LGBTQ+ domain. The proposed semi-automated subject indexing system aims to enhance access to LGBTQ+ knowledge and to challenge the biases inherent in existing knowledge organization paradigms.
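To make the REST API call-based approach mentioned above more concrete, the following is a minimal sketch of how a client might request subject suggestions from an Annif server and filter them by score. It is not the authors' implementation. The server address, the project identifier `homosaurus-en`, the concept URIs, and the sample response are all hypothetical; only the endpoint shape (`/v1/projects/{project_id}/suggest` with `text`, `limit`, and `threshold` form fields) follows Annif's documented REST API.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: a locally running Annif server and a hypothetical project id.
ANNIF_BASE = "http://localhost:5000/v1"
PROJECT_ID = "homosaurus-en"  # hypothetical project name


def build_suggest_request(text, limit=5, threshold=0.1,
                          base=ANNIF_BASE, project=PROJECT_ID):
    """Build (but do not send) a POST request to the Annif suggest endpoint."""
    body = urllib.parse.urlencode(
        {"text": text, "limit": limit, "threshold": threshold}
    ).encode("utf-8")
    url = f"{base}/projects/{project}/suggest"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def parse_suggestions(payload, min_score=0.0):
    """Return (label, uri, score) tuples, highest score first."""
    results = json.loads(payload).get("results", [])
    kept = [(r["label"], r["uri"], r["score"])
            for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda item: item[2], reverse=True)


def suggest(text, **kwargs):
    """Send the request; this needs a running Annif server to succeed."""
    with urllib.request.urlopen(build_suggest_request(text, **kwargs)) as resp:
        return parse_suggestions(resp.read().decode("utf-8"))


# Offline demonstration with a made-up response (URIs are placeholders).
SAMPLE_RESPONSE = json.dumps({"results": [
    {"uri": "https://example.org/concept/B", "label": "Concept B", "score": 0.42},
    {"uri": "https://example.org/concept/A", "label": "Concept A", "score": 0.87},
]})
demo = parse_suggestions(SAMPLE_RESPONSE, min_score=0.5)
```

In a semi-automated workflow of the kind described, the filtered suggestions would be shown to a cataloguer for acceptance or rejection rather than written to records directly; the `min_score` cut-off here is an illustrative choice, not a value from the study.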

References

Berman, S. (1971). Prejudices and antipathies: A tract on the LC subject heads concerning people. Scarecrow Press, Metuchen, NJ.

Berman, S. (1993). Prejudices and antipathies: A tract on the LC subject heads concerning people (2nd edition). McFarland and Co., Jefferson, NC.

Dobreski, B., Snow, K., & Moulaison-Sandy, H. (2022). On overlap and otherness: A comparison of three vocabularies’ approaches to LGBTQ+ identity. Cataloging and Classification Quarterly, 60(6-7), 490-513. https://doi.org/10.1080/01639374.2022.2090040

Drabinski, E. (2013). Queering the catalog: Queer theory and the politics of correction. The Library Quarterly, 83(2), 94-111. https://doi.org/10.1086/669547

Golub, K., Soergel, D., Buchanan, G., Tudhope, D., Lykke, M., & Hiom, D. (2016). A framework for evaluating automatic indexing or classification in the context of retrieval. Journal of the Association for Information Science and Technology, 67(1), 3-16. https://doi.org/10.1002/asi.23600

Knowlton, S. A. (2005). Three decades since Prejudices and Antipathies: A study of changes in the Library of Congress Subject Headings. Cataloging and Classification Quarterly, 40(2), 123-145. https://doi.org/10.1300/J104v40n02_08

Mitra, R., & Mukhopadhyay, P. (2023). Machine learning applications in digital humanities: Designing a semiautomated subject indexing system for a low-resource domain. DESIDOC Journal of Library and Information Technology, 43(4). https://doi.org/10.14429/djlit.43.04.19227

Mukhopadhyay, P. (2023). Machine learning and bibliographic data universe: Assessing the efficacy of backend algorithms in Annif through retrieval metrics. SRELS Journal of Information Management, 60(1), 39-48. https://doi.org/10.17821/srels/2023/v60i1/170891

Mukhopadhyay, P., & Mitra, R. (2022). Digital humanities and inclusive librarianship: Designing a collaborative, multi-lingual, SKOS-compliant linked open vocabulary for LGBTQIA+. Indian Journal of Information, Library and Society, 35(1-2), 16-33. https://doi.org/10.5281/zenodo.6814869

Olson, H. A. (2013). The power to name: Locating the limits of subject representation in libraries. Springer Science and Business Media.

Stellato, A., Fiorelli, M., Turbati, A., Lorenzetti, T., Van Gemert, W., Dechandon, D., … Keizer, J. (2020). VocBench 3: A collaborative Semantic Web editor for ontologies, thesauri and lexicons. Semantic Web, 11(5), 855-881. https://doi.org/10.3233/SW-200370

Suominen, O. (2019). Annif: DIY automated subject indexing using multiple algorithms. LIBER Quarterly: The Journal of the Association of European Research Libraries, 29(1), Article 1. https://doi.org/10.18352/lq.10285

Suominen, O., Ylikotila, H., Pessala, S., Lappalainen, M., Frosterus, M., Tuominen, J., … Retterath, A. (2015). Publishing SKOS vocabularies with Skosmos. https://skosmos.org/publishing-skos-vocabularies-with-skosmos.pdf

Toepfer, M., & Seifert, C. (2020). Fusion architectures for automatic subject indexing under concept drift: Analysis and empirical results on short texts. International Journal on Digital Libraries, 21(2), 169-189. https://doi.org/10.1007/s00799-018-0240-3

Watson, B. M. (2020). There was sex but no sexuality: Critical cataloging and the classification of asexuality in LCSH. Cataloging and Classification Quarterly, 58(6), 547-565. https://doi.org/10.1080/01639374.2020.1796876

Published

2024-10-21

How to Cite

Mitra, R., & Mukhopadhyay, P. (2024). Representing Gender in Library Catalogue: Developing Multilingual Homosaurus for Automated Subject Indexing. Journal of Information and Knowledge, 61(5), 279–286. https://doi.org/10.17821/srels/2024/v61i5/171585

Section

Articles