Noise and neglect: Social-media signals expose attention gaps for dengue, chikungunya, lymphatic filariasis and kala-azar in India's vector-borne NTDs

Abstract

Background Neglected tropical diseases (NTDs), including dengue, chikungunya, lymphatic filariasis, and kala-azar, pose significant public health burdens in India. Despite WHO recommendations for enhanced disease surveillance and targeted communication strategies, little is known about how the public perceives and discusses these diseases on digital platforms. Understanding these perceptions can guide evidence-based policymaking and public health messaging.

Methods We conducted an in silico analysis of publicly accessible social and news media data on dengue, chikungunya, filariasis, and kala-azar in India from January 2019 to December 2023. YouTube comments and Google News headlines were systematically retrieved, pre-processed, and analyzed using sentiment analysis (VADER lexicon) and Latent Dirichlet Allocation (LDA) topic modeling. Facebook and Twitter data were not included because API restrictions and subscription-based access models now prevent free data collection, even for research purposes. We compared disease-specific digital attention with epidemiological burden and created chord, Sankey, and network diagrams to show how themes and sentiment interact.
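The sentiment step can be illustrated with a stripped-down, VADER-style scorer. This is a minimal sketch, not the authors' pipeline: the lexicon below (`TOY_LEXICON`) is hypothetical, and the helper names `preprocess`, `compound_score` and `label` are illustrative. What it does share with VADER is the compound normalization, x / sqrt(x² + α) with α = 15, and the commonly used ±0.05 cut-offs for positive/negative/neutral. Real VADER also applies rules for negation, intensifiers, capitalization and punctuation, which are omitted here. In practice one would call `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` package.

```python
import math
import re

# Hypothetical mini-lexicon for illustration only. The real VADER lexicon has
# roughly 7,500 entries with human-rated valences on a -4..+4 scale.
TOY_LEXICON = {
    "effective": 2.0,
    "thanks": 1.9,
    "helpful": 1.7,
    "fear": -2.2,
    "outbreak": -1.5,
    "death": -2.9,
}

ALPHA = 15.0  # normalization constant used in VADER's compound score


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs, and keep alphabetic tokens only."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def compound_score(text: str) -> float:
    """Sum token valences, then squash into (-1, 1) as VADER does.

    Unlike VADER, this toy version ignores negation, intensifiers,
    capitalization and punctuation emphasis.
    """
    total = sum(TOY_LEXICON.get(tok, 0.0) for tok in preprocess(text))
    return total / math.sqrt(total * total + ALPHA)


def label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to the conventional three-way sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for comment in [
    "Thanks, very helpful video!",
    "Dengue outbreak again, so much fear",
    "Where is the nearest clinic?",
]:
    score = compound_score(comment)
    print(f"{score:+.3f}  {label(score):8}  {comment}")
```

Comments containing no lexicon words score exactly zero and fall into the neutral class. This is one reason lexicon-based corpora of questions and factual remarks tend to skew neutral.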

Results Dengue dominated online attention, accounting for over 50% of total mentions, despite a disease burden comparable to or lower than that of filariasis and chikungunya. Kala-azar received minimal online engagement, highlighting a critical awareness gap. Sentiment was predominantly neutral to positive, with discussion focused especially on treatments, preventive measures, and vaccination initiatives. Topic modeling identified recurrent themes, including public health campaigns, outbreak alerts, and community-based interventions.
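One simple way to quantify the mismatch between attention and burden is to divide each disease's share of online mentions by its share of reported burden. The sketch below assumes that definition; the paper's exact normalization may differ. All numbers in `EXAMPLE` are invented for illustration and are not the study's counts. `DiseaseSignal` and `attention_burden_ratio` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class DiseaseSignal:
    name: str
    mentions: int   # online mentions in the study window (hypothetical)
    burden: float   # reported cases in the same window (hypothetical)


def attention_burden_ratio(signals: list[DiseaseSignal]) -> dict[str, tuple[float, float, float]]:
    """Return {disease: (attention share, burden share, attention/burden ratio)}.

    A ratio above 1 means a disease gets more online attention than its
    share of reported burden; below 1 means it is under-discussed.
    """
    total_mentions = sum(s.mentions for s in signals)
    total_burden = sum(s.burden for s in signals)
    result = {}
    for s in signals:
        attention_share = s.mentions / total_mentions
        burden_share = s.burden / total_burden
        ratio = attention_share / burden_share if burden_share else math.inf
        result[s.name] = (attention_share, burden_share, ratio)
    return result


# Illustrative numbers only; these are NOT the study's counts.
EXAMPLE = [
    DiseaseSignal("dengue", mentions=600, burden=200),
    DiseaseSignal("chikungunya", mentions=250, burden=150),
    DiseaseSignal("filariasis", mentions=140, burden=600),
    DiseaseSignal("kala-azar", mentions=10, burden=50),
]

for name, (att, bur, ratio) in attention_burden_ratio(EXAMPLE).items():
    print(f"{name:12} attention={att:.0%} burden={bur:.0%} ratio={ratio:.2f}")
```

With these made-up inputs, dengue holds 60% of mentions but only 20% of burden, so its ratio is about 3. Kala-azar and filariasis fall well below 1, which is the pattern described as an awareness gap.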

Conclusions Our study presents a novel approach combining digital surveillance, sentiment analysis, and topic modeling to provide insights into public perceptions of NTDs in India. The observed mismatch between epidemiological burden and online attention underscores the need for strategic public health messaging, aligning with WHO recommendations for community engagement and tailored disease-awareness campaigns. This research provides a valuable tool for policymakers to enhance the effectiveness of communication strategies and improve targeted intervention planning for neglected tropical diseases in India.

Author Summary Neglected tropical diseases (NTDs), including dengue, chikungunya, lymphatic filariasis and kala-azar, still afflict millions across India, yet the public conversation about them remains uneven. We examined more than 45 000 YouTube comments and 273 Google News headlines posted between January 2019 and December 2023 to see how these four NTDs are discussed online. After automated text cleaning, VADER sentiment scoring and Latent Dirichlet Allocation topic modelling, we overlaid the resulting tone-and-topic maps on official disease-burden data. Dengue dominated the conversation, accounting for well over half of all references, whereas kala-azar, though still endemic, drew scarcely any notice. Overall sentiment skewed neutral-to-positive and focused largely on prevention, treatment and vaccine news. Interactive bubble maps, Sankey flows and chord diagrams exposed the gulf between epidemiological need and digital attention. We could not analyse Facebook or Twitter because their new pay-walled APIs make large-scale data collection prohibitively expensive for researchers, underscoring a growing obstacle for digital epidemiology. Our reproducible, low-cost workflow highlights which NTDs are being overlooked online, providing Indian health authorities with actionable evidence and supporting the World Health Organization’s call for stronger community engagement in the fight against NTDs.
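For readers unfamiliar with LDA, the topic-modelling step can be sketched as a small collapsed Gibbs sampler. This is a swapped-in, textbook inference method, not necessarily what the authors used: common toolkits such as gensim's `LdaModel` and scikit-learn's `LatentDirichletAllocation` fit the same model with online variational Bayes instead. The corpus `TOY_DOCS`, the hyperparameters (`alpha`, `beta`, `n_iter`, `seed`) and the function names are all illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (vocab, doc-topic counts, topic-word counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]               # n[d][k]
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n[k][w]
    topic_total = [0] * n_topics                               # n[k]

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional, proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + v * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(topic_word, n=3):
    """Highest-count words per topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]


# Hypothetical pre-processed comments (stop words already removed).
TOY_DOCS = [
    ["mosquito", "fever", "platelet", "fever"],
    ["platelet", "count", "fever", "hospital"],
    ["vaccine", "campaign", "awareness", "vaccine"],
    ["awareness", "campaign", "village", "vaccine"],
]

_, doc_topic, topic_word = lda_gibbs(TOY_DOCS, n_topics=2, n_iter=100)
print(top_words(topic_word))
```

Each topic's top words are the labels an analyst would inspect and name (for example "outbreak alerts" or "public health campaigns"). Naming them is a manual, interpretive step that no sampler performs.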

Competing Interest Statement

The authors have declared no competing interest.

Clinical Trial

NA

Funding Statement

The author(s) received no specific funding for this work.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study exclusively utilized publicly available, anonymized digital data. No personally identifiable information was collected or stored: usernames were irreversibly hashed and discarded immediately after aggregation. This research was deemed exempt from formal ethics review under Article 12 of the Indian Council of Medical Research’s National Ethical Guidelines for Biomedical and Health Research Involving Human Participants (2017), which stipulates that secondary analyses of anonymized, publicly available datasets pose minimal risk and do not require Institutional Review Board oversight. No IRB or ethics committee approval number is applicable.
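The statement that usernames were "irreversibly hashed and discarded immediately after aggregation" could be implemented in several ways. The sketch below shows one option and should not be read as the authors' code. It uses SHA-256 with a random per-run salt that is never stored. Without the salt, an unsalted hash of a public username could be reversed by hashing candidate names. The `(username, text)` input schema and the function name are hypothetical.

```python
import hashlib
import secrets
from collections import Counter


def aggregate_without_usernames(comments):
    """Count comments per pseudonymous author, then keep only aggregates.

    `comments` is an iterable of (username, text) pairs (hypothetical schema).
    A random salt is generated per call and never stored, so the hashes
    cannot later be recomputed from known usernames.
    """
    salt = secrets.token_bytes(16)
    per_author = Counter(
        hashlib.sha256(salt + user.encode("utf-8")).hexdigest()
        for user, _text in comments
    )
    # Only aggregate statistics leave this function; the hashes and salt
    # go out of scope when it returns.
    return {
        "n_comments": sum(per_author.values()),
        "n_unique_authors": len(per_author),
        "max_comments_per_author": max(per_author.values(), default=0),
    }


print(aggregate_without_usernames([("alice", "a"), ("bob", "b"), ("alice", "c")]))
```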

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data underlying the findings described in this manuscript are fully available without restriction. The cleaned YouTube comment corpus (45 672 comments) and the Google News headline dataset (273 items), together with de-identified metadata and preprocessing scripts, have been deposited in Zenodo (DOI: 10.5281/zenodo.15883324). The full analysis code, covering sentiment scoring (VADER), topic modeling (LDA), data normalization, and visualization (matplotlib, Plotly, circlize), is hosted on GitHub at https://github.com/devbioinfo/ntd-digital-surveillance/releases/tag/v0.1.0. The state-level epidemiological case counts for dengue, chikungunya, lymphatic filariasis, and kala-azar (2015–2023) were downloaded from the National Vector Borne Disease Control Programme (NVBDCP) annual tables (2019–2023) and are publicly accessible via the NVBDCP website (https://nvbdcp.gov.in). Any additional aggregate data generated during this study (e.g., normalized attention-burden matrices, waffle-chart grid files, bubble-map shapefiles) are included as Supporting Information (S1 File, S1 Table.xlsx, S2 Appendix) and can be downloaded alongside the article. Where applicable, provenance metadata conform to FAIR principles and are embedded within each repository entry to ensure reproducibility and long-term accessibility.
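The state-level NVBDCP counts must be rolled up before they can be compared with national online attention. The sketch below assumes the annual tables have already been transcribed into a long-format CSV with hypothetical columns `state,disease,year,cases`. The sample rows are invented, and the function names are illustrative. The script sums state rows into a disease × year matrix and then expresses each disease as a share of that year's total across diseases.

```python
import csv
import io
from collections import defaultdict


def national_totals(rows):
    """Sum state-level rows into a {disease: {year: cases}} matrix."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row["disease"]][int(row["year"])] += int(row["cases"])
    return {disease: dict(by_year) for disease, by_year in totals.items()}


def year_shares(matrix):
    """Within each year, express each disease's cases as a share of the year's total."""
    years = sorted({y for by_year in matrix.values() for y in by_year})
    shares = {disease: {} for disease in matrix}
    for y in years:
        year_total = sum(by_year.get(y, 0) for by_year in matrix.values())
        for disease, by_year in matrix.items():
            shares[disease][y] = by_year.get(y, 0) / year_total if year_total else 0.0
    return shares


# Hypothetical long-format transcription; states, columns and counts are illustrative.
SAMPLE_CSV = """state,disease,year,cases
StateA,dengue,2019,120
StateB,dengue,2019,80
StateA,kala-azar,2019,20
StateB,filariasis,2019,300
StateA,dengue,2020,50
"""

matrix = national_totals(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(matrix)
print(year_shares(matrix))
```

Year-wise shares put the burden side on the same 0–1 scale as share-of-mentions, so the two can be placed side by side in an attention-burden matrix.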