Artificial intelligence (AI) has evolved from its conceptual inception in the 1950s [-] into a transformative force across various sectors []. However, its penetration into the medical arena has been comparatively slow, constrained by factors such as escalating costs, rigorous regulation, and exacting performance standards [-]. Despite these hurdles, the promise of AI in reshaping and individualizing health care has never been more evident [,,]. Its potential to revolutionize medical practice, enhance the quality of patient care, and improve health care efficiency has spurred substantial research endeavors [-], with recent years witnessing an unprecedented surge in medical AI studies [,,].
However, amidst this burgeoning research, a clear consensus on AI’s scope and definition within medicine remains elusive [,]. Although Meskó and Görög [] detailed 3 AI levels and Hamet and Tremblay [] bifurcated AI into virtual and physical domains, the multifaceted nature of AI continues to demand diverse considerations, depending on the context []. Further complicating the landscape is AI’s application in various medical niches, such as mental illness diagnosis [], pandemic readiness [], or areas such as endoscopic imaging [] and acute stroke treatments [].
Despite such vast exploration, the AI research landscape has not been immune to challenges. The well-documented AI winters [], periods of waning interest, stand in stark contrast to the consistent ascent of medical AI studies in recent times []. This brings us to an essential question: How has the landscape of AI in medicine changed over time? Do we have a comprehensive understanding of its trajectory and a detailed mapping of its vast applications?
Aim of the Research

To address this knowledge gap, our study endeavors to provide a systematic, temporal assessment. We conducted a bibliometric exploration spanning 23 years, harnessing data from publications indexed in PubMed. The intent is 2-fold: to offer a comprehensive overview of the progression of medical AI and to discern emerging patterns and prospective directions. In doing so, we aim to fortify the foundational understanding of AI in medicine, setting the stage for subsequent in-depth explorations.
The main challenge of this study lay in collating a vast array of research in a comprehensive yet succinct manner. To this end, we performed a computer-aided bibliometric analysis in Python (version 3.11) with the Spyder IDE (version 5.4.3), capitalizing on Python's capacity to aggregate and analyze medical AI articles indexed in PubMed since 2000. Using this approach, we systematically parsed the keywords of PubMed's AI publications to identify patterns and focal points in medical AI research.
Our methodology incorporates a computer-aided text analysis within the framework of bibliometric analysis, a technique increasingly favored across various disciplines, particularly in medicine, because of its ability to interpret semantic meaning []. Key to our analysis was text mining and unsupervised machine learning topic modeling facilitated by a Python algorithm, methods known to offer critical insights into existing studies and future research directions [].
Study Period

Our study spanned 2000 to 2022, a more extensive timeframe than previous literature reviews. In this 23-year period, we extracted a substantial number of articles from PubMed, justifying our decision to use a computational analysis approach given the anticipated volume of search results.
Keywords Identification Strategy

Two-Pronged Search Strategy

Previous research has often defined AI broadly [,] while conducting literature reviews based on specific keywords, such as machine learning or deep learning [,]. However, this approach overlooks numerous AI-related articles. Given the absence of a predefined framework for this analysis, identifying search keywords was critical. Thus, our first crucial task involved a 2-pronged search strategy: using Medical Subject Headings (MeSH) terms and the associated text keywords derived from those MeSH terms. This methodology was adopted to ensure an expansive capture of medical AI articles. The detailed search strategy is presented in [,,,-]. For example, when searching for articles on the topic of deep learning, our Python script performed a comprehensive scan: it searched not only for the MeSH tag deep learning but also for deep learning as a text keyword in both the title and abstract. In addition, we incorporated the entry terms associated with the deep learning MeSH tag into our search criteria. This search strategy has been shown to enhance the efficiency of literature reviews [].
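As an illustration, the following is a minimal sketch of such a two-pronged query, assuming Biopython's Entrez interface; the email address, the retmax cap, and the exact query string are placeholders rather than the script actually used in this study, and the MeSH entry terms are omitted for brevity:

```python
from Bio import Entrez

Entrez.email = "researcher@example.org"  # NCBI requires a contact address; placeholder

def search_pubmed(concept: str) -> list[str]:
    """Search a concept as both a MeSH heading and a title/abstract keyword,
    restricted to English journal articles published 2000-2022."""
    query = (
        f'("{concept}"[MeSH Terms] OR "{concept}"[Title/Abstract]) '
        'AND ("2000/01/01"[Date - Publication] : "2022/12/31"[Date - Publication]) '
        'AND English[Language] AND Journal Article[Publication Type]'
    )
    handle = Entrez.esearch(db="pubmed", term=query, retmax=10000)
    record = Entrez.read(handle)
    handle.close()
    # Result sets larger than retmax would need the Entrez history server
    return record["IdList"]

pmids = search_pubmed("deep learning")
```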
Distribution Analysis

Upon obtaining the search results using this method, we took the analysis a step further by investigating the distribution of research across various AI domains. To accomplish this, we devised a dictionary based on the 8 AI domains and associated keywords identified in the 2020 AI Watch report published by the European Commission Joint Research Centre (JRC) [] (Textbox 1). This dictionary served as our analytical tool, helping us discern and understand the evolving research patterns across the diverse AI sectors based on our search results from PubMed.
Textbox 1. Artificial intelligence (AI) domains from the Joint Research Centre AI Watch report.

AI domains and subdomains:

Core
- Reasoning: knowledge representation; automated reasoning; common sense reasoning
- Planning: planning and scheduling; searching; optimization
- Learning
- Communication: natural language processing
- Perception: computer vision; audio processing

Transversal
- Integration and interaction: multiagent systems; robotics and automation; connected and automated vehicles
- Service
- AI ethics and philosophy: AI ethics; philosophy of AI

Eligibility Criteria for Article Selection

Articles that fulfilled the following criteria were included in this study: (1) articles with artificial intelligence MeSH tags or AI-related keywords in the titles and abstracts; (2) articles published from January 1, 2000, to December 31, 2022, in PubMed; (3) articles written in English; and (4) peer-reviewed journal articles. These 4 search criteria were coded directly into our Python search script.
Data Processing

Given the restrictions that PubMed imposes on the volume of downloadable publication data, our approach required a custom Python algorithm. This algorithm efficiently retrieved extensive metadata from articles regarding AI in medicine, encompassing article titles, authors, publication dates, and abstracts. Python allowed us to bypass these limitations and ensure a comprehensive collection of relevant data. The accompanying figure illustrates the process we carried out using Python to extract data from PubMed.
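A hedged sketch of such batched retrieval is shown below, again assuming Biopython; the batch size and the selection of MEDLINE fields are illustrative choices, not the authors' exact implementation:

```python
from Bio import Entrez, Medline

Entrez.email = "researcher@example.org"  # placeholder

def fetch_metadata(pmids: list[str], batch_size: int = 500) -> list[dict]:
    """Download title, authors, publication date, and abstract for each PMID,
    in batches, to work around PubMed's per-request download limits."""
    records = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        handle = Entrez.efetch(db="pubmed", id=",".join(batch),
                               rettype="medline", retmode="text")
        for rec in Medline.parse(handle):
            records.append({
                "pmid": rec.get("PMID"),
                "title": rec.get("TI", ""),
                "authors": rec.get("AU", []),
                "date": rec.get("DP", ""),
                "abstract": rec.get("AB", ""),
            })
        handle.close()
    return records
```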
Concurrently, within the framework of computer-aided text analysis, our Python algorithm counted keyword frequencies directly in the titles and abstracts of the acquired data set while assigning the relevant articles to their respective domains based on the JRC AI Watch report.
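As a sketch of this step, the snippet below pairs a tiny excerpt of a domain-keyword dictionary in the spirit of the JRC taxonomy with a simple substring-frequency count; the full keyword lists come from the AI Watch report, and the entries shown here are only examples:

```python
from collections import Counter

# Illustrative excerpt only; the real dictionary covers all 8 JRC domains
DOMAIN_KEYWORDS = {
    "learning": ["machine learning", "deep learning", "neural network"],
    "communication": ["natural language processing"],
    "perception": ["computer vision", "audio processing"],
}

def assign_domains(record: dict) -> tuple[set, Counter]:
    """Return the domains an article touches and its keyword frequencies."""
    text = (record["title"] + " " + record["abstract"]).lower()
    freqs, domains = Counter(), set()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        for kw in keywords:
            hits = text.count(kw)
            if hits:
                freqs[kw] += hits
                domains.add(domain)
    return domains, freqs
```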
Topic Modeling Using Latent Dirichlet Allocation

Drawing from the metadata of publications retrieved from PubMed, we used latent Dirichlet allocation (LDA) topic modeling—an established method in various academic research fields, including technology management, computer science, and biomedicine [-]—to discern shifts in research areas within individual AI domains. Our LDA topic modeling, conducted in Python using the Gensim library, used unsupervised machine learning to analyze vast quantities of unstructured data. It allocated each article to a probable topic based on word frequency [,].
The Gensim LDA model, premised on fundamental natural language processing concepts, initiated the process by cleaning the data and then preparing the tokens, corpus, and dictionary before training the program for topic clustering [,]. The versatility of this method extends beyond our study as it is applicable to various data sources and disciplines. For instance, Abd-Alrazaq et al [] used a topic modeling approach to identify top concerns regarding COVID-19 based on posts on Twitter, whereas Lee and Kang [] applied the same method to find the top 50 topics in the technology and innovation management studies from 11,693 articles published in top technology and innovation management journals.
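A minimal Gensim sketch of this pipeline (clean, tokenize, build the dictionary and corpus, train) follows; the records list is assumed from the retrieval step above, and the number of topics, filtering thresholds, and preprocessing defaults are assumptions rather than the study's published settings:

```python
from gensim import corpora
from gensim.models import LdaModel
from gensim.parsing.preprocessing import preprocess_string

# records: list of {"title": ..., "abstract": ...} dicts from the fetch step
docs = [r["title"] + " " + r["abstract"] for r in records]
tokens = [preprocess_string(d) for d in docs]  # strips punctuation/stopwords, stems

dictionary = corpora.Dictionary(tokens)
dictionary.filter_extremes(no_below=5, no_above=0.5)  # drop rare and ubiquitous terms
corpus = [dictionary.doc2bow(t) for t in tokens]

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=5, passes=10, random_state=42)
for topic_id, words in lda.print_topics(num_words=8):
    print(topic_id, words)
```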
To facilitate further analysis, we refined the search results by discarding publications that failed to meet our selection criteria, leaving us with a corpus of 307,701 entries (the selection process is depicted in the accompanying flow diagram). Subsequently, these results were organized into the 8 AI domains defined by the JRC. PubMed's AI-related publications have witnessed exponential growth, and the accompanying figures detail this growth and illustrate the distribution of studies across each domain over the years.
The landscape of AI-related studies has undergone a substantial transformation over the past 2 decades. In 2000, we found just 1614 AI-related studies, a number that nearly quadrupled within a decade. By 2022, the count had surged to 58,458, a 36-fold increase. The geographic spread of these publications shows the United States leading with 68,502 articles over the past 23 years, followed closely by China's 57,460. Notably, China's annual output of medical AI publications has surpassed that of the United States over the past 3 years. Our research strategy centered on the first author's information to manage the complexity introduced by the considerable number of multiauthor publications. This decision allowed us to maintain the rigor of our analysis while navigating the vast data set effectively.
By juxtaposing the annual growth of AI publications with the total number of articles on PubMed, we found that although both increased annually, AI research grew more substantially. The learning domain stood out, contributing 62.88% (16,254/25,850) to 76.09% (44,481/58,458) of AI research over the last 4 years and totaling 44,481 articles in 2022. An analysis of keyword occurrences within each domain reaffirmed this dominance; over half of the top 20 keywords belonged to the learning domain.
The communication, integration and interaction, and services domains have also grown, but their share remained relatively constant. Conversely, despite an annual increase in publication count, the reasoning domain lagged in overall growth. Research in AI ethics and the philosophy of AI, a relatively novel field, has shown promising growth, from 20 articles in 2000 to 2613 in 2021.
It should be noted that the sum of domain-specific article counts in Table 2 does not match the total because different sets of keywords were used for downloading articles (MeSH terms and associated text keywords) and for counting words (the JRC AI Watch report's taxonomy). Similarly, the domain-specific percentages in Table 2 do not add up to 100% because articles overlap across domains, resulting in a sum exceeding 100% in specific years.
Table 2. The number and percentage of papers with artificial intelligence (AI) keywords (N=307,701).

| Year | Reasoning, n (%) | Planning, n (%) | Learning, n (%) | Communication, n (%) | Perception, n (%) | Integration and interaction, n (%) | Services, n (%) | AI ethics and philosophy, n (%) | Total, n (%) |
|---|---|---|---|---|---|---|---|---|---|
| 2000 | 130 (8.05) | 140 (8.67) | 938 (58.12) | 53 (3.28) | 75 (4.65) | 15 (0.93) | 115 (7.13) | 20 (1.24) | 1614 (0.52) |
| 2001 | 147 (8.09) | 140 (7.71) | 1079 (59.38) | 67 (3.69) | 74 (4.07) | 16 (0.88) | 131 (7.21) | 31 (1.71) | 1817 (0.59) |
| 2002 | 103 (5.24) | 147 (7.48) | 1109 (56.47) | 86 (4.38) | 74 (3.77) | 17 (0.87) | 135 (6.87) | 30 (1.53) | 1964 (0.64) |
| 2003 | 132 (5.28) | 200 (7.99) | 1336 (53.40) | 103 (4.12) | 104 (4.16) | 30 (1.20) | 160 (6.39) | 41 (1.64) | 2502 (0.81) |
| 2004 | 151 (4.34) | 257 (7.39) | 1757 (50.53) | 125 (3.60) | 137 (3.94) | 35 (1.01) | 197 (5.67) | 57 (1.64) | 3477 (1.13) |
| 2005 | 165 (3.70) | 367 (8.24) | 2104 (47.24) | 176 (3.95) | 199 (4.47) | 38 (0.85) | 243 (5.46) | 68 (1.53) | 4454 (1.45) |
| 2006 | 213 (4.19) | 378 (7.43) | 2445 (48.05) | 175 (3.44) | 202 (3.97) | 25 (0.49) | 283 (5.56) | 116 (2.28) | 5088 (1.65) |
| 2007 | 149 (2.63) | 465 (8.22) | 2653 (46.90) | 210 (3.71) | 250 (4.42) | 41 (0.72) | 295 (5.21) | 115 (2.03) | 5657 (1.84) |
| 2008 | 206 (3.36) | 455 (7.43) | 2892 (47.23) | 253 (4.13) | 253 (4.13) | 45 (0.73) | 405 (6.61) | 117 (1.91) | 6123 (1.99) |
| 2009 | 159 (2.57) | 436 (7.05) | 3002 (48.54) | 237 (3.83) | 254 (4.11) | 52 (0.84) | 385 (6.23) | 132 (2.13) | 6184 (2.01) |
| 2010 | 194 (3.21) | 409 (6.76) | 3076 (50.82) | 262 (4.33) | 251 (4.15) | 57 (0.94) | 377 (6.23) | 197 (3.25) | 6053 (1.97) |
| 2011 | 205 (2.94) | 450 (6.45) | 3506 (50.27) | 286 (4.10) | 286 (4.10) | 57 (0.82) | 468 (6.71) | 218 (3.13) | 6975 (2.27) |
| 2012 | 240 (3.09) | 511 (6.57) | 3986 (51.29) | 331 (4.26) | 300 (3.86) | 63 (0.81) | 538 (6.92) | 237 (3.05) | 7772 (2.53) |
| 2013 | 274 (2.85) | 599 (6.23) | 4694 (48.81) | 446 (4.64) | 381 (3.96) | 73 (0.76) | 759 (7.89) | 370 (3.85) | 9617 (3.13) |
| 2014 | 254 (2.45) | 661 (6.37) | 5155 (49.70) | 477 (4.60) | 444 (4.28) | 90 (0.87) | 826 (7.96) | 331 (3.19) | 10,372 (3.37) |
| 2015 | 280 (2.42) | 704 (6.09) | 5804 (50.22) | 658 (5.69) | 429 (3.71) | 92 (0.80) | 971 (8.40) | 426 (3.69) | 11,558 (3.76) |
| 2016 | 327 (2.59) | 777 (6.16) | 6349 (50.32) | 593 (4.70) | 503 (3.99) | 109 (0.86) | 1188 (9.42) | 487 (3.86) | 12,617 (4.10) |
| 2017 | 342 (2.31) | 901 (6.09) | 7956 (53.76) | 765 (5.17) | 625 (4.22) | 143 (0.97) | 1344 (9.08) | 594 (4.01) | 14,799 (4.81) |
| 2018 | 393 (2.03) | 1333 (6.90) | 11,272 (58.31) | 974 (5.04) | 843 (4.36) | 172 (0.89) | 1985 (10.27) | 771 (3.99) | 19,330 (6.28) |
| 2019 | 427 (1.65) | 1680 (6.50) | 16,254 (62.88) | 1461 (5.65) | 1126 (4.36) | 251 (0.97) | 2773 (10.73) | 1125 (4.35) | 25,850 (8.40) |
| 2020 | 567 (1.60) | 2529 (7.13) | 23,493 (66.24) | 1884 (5.31) | 1512 (4.26) | 373 (1.05) | 4119 (11.61) | 1831 (5.16) | 35,467 (11.53) |
| 2021 | 749 (1.50) | 3877 (7.76) | 35,333 (70.73) | 2938 (5.88) | 2229 (4.46) | 482 (0.96) | 5741 (11.49) | 2613 (5.23) | 49,953 (16.23) |
| 2022 | 845 (1.45) | 4913 (8.40) | 44,481 (76.09) | 3667 (6.27) | 2630 (4.50) | 691 (1.18) | 6516 (11.15) | 3413 (5.84) | 58,458 (19.00) |
| All years | 6652 (2.16) | 22,329 (7.26) | 190,674 (61.97) | 16,227 (5.27) | 13,181 (4.28) | 2967 (0.96) | 29,954 (9.73) | 13,340 (4.34) | 307,701 (100) |

Citation Analysis

Given the sizable research data, conventional software such as VOSviewer (version 1.6.19; Leiden University) fell short of our needs. Hence, we crafted a Python script to extract author and citation data. From the pool of 307,701 publications, we identified 1,054,040 contributing authors. Solo-author publications constituted a mere 3.69% (11,347/307,701), dwindling from 13.44% (217/1614) in 2000 to 2.71% (1586/58,458) by 2022. This observation underscores an increasing inclination toward collaboration in medical AI research, likely propelled by the field's complexity and advances in technology. The collaboration index, which stood at 3.43, supports this finding. Given our broad perspective, we omitted h-index and i-index calculations because they offer limited insight for a data set of this size.
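The following is a small sketch of how such authorship statistics can be derived from the fetched records; the collaboration index here follows the commonly used definition (authors of multi-authored papers divided by the number of multi-authored papers), which we assume matches the study's usage:

```python
def authorship_stats(records: list[dict]) -> dict:
    """Compute the solo-author share and the collaboration index."""
    solo = [r for r in records if len(r["authors"]) == 1]
    multi = [r for r in records if len(r["authors"]) > 1]
    collab_index = sum(len(r["authors"]) for r in multi) / max(len(multi), 1)
    return {
        "solo_share": len(solo) / len(records),
        "collaboration_index": collab_index,
    }
```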
Over the past 23 years, the number of citations has reached 3,425,831, averaging 11 (SD 62.50) per publication. Regarding citations by country, the United States leads by a considerable margin, boasting 1,196,517 citations with an average of 17 (SD 118.85) citations per article, far surpassing other nations. Despite China's substantial publication count, its average of 7 (SD 19.34) citations per paper trails behind the United States and several European countries. Table 3 lists the publications and citations of the top 10 most active countries and of organizations in the United States and China. Among these organizations, Stanford University (United States) leads with an average of 41 (SD 484.08) citations per paper, whereas Zhejiang University (China), despite being prolific, averages only 7 (SD 35.17) citations per paper.
Table 3. Top 10 most active countries and organizations in the United States and China.

In the context of the JRC's AI domains, the dominance of learning is also evident in its citation numbers. The accompanying figure shows the citation counts across each domain; to account for the considerable citation disparity among domains, it is presented on a log scale, offering a more accurate picture of each domain's standing.
Over the past 23 years, the most highly cited paper, published in 2003, described an open-source software platform that helps scientists visualize and analyze how different molecules in a cell interact and that can be extended with add-ons for more specific analyses []. Even so, its 18,081 citations constitute only 0.53% (18,081/3,425,831) of the total citation volume, demonstrating the breadth and diversity of research within the field.
Through our Python-driven analysis, we comprehensively examined the degree distribution of publications and their respective citations, as depicted in the accompanying network graph. This network comprises 1,603,481 nodes, representing individual papers, and 3,423,669 edges, symbolizing citations among these papers. Notably, many nodes have sparse connectivity, indicating papers with limited citations. However, a distinct subset, represented by the magenta dots, accounts for the top 1% of papers with a remarkably high citation count. These papers, or "hubs," act as the primary influencers in our network. Such a scale-free distribution, in which a minority of nodes possesses numerous connections while the majority has few, mirrors common patterns in citation networks and exemplifies the scenario where only a handful of papers garner most of the citations.
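A hedged sketch of this degree-distribution check, using networkx, is shown below; the construction of the citation pairs from PubMed data is assumed, and the 99th-percentile cutoff is an illustrative way of isolating the top ~1% "hubs":

```python
import networkx as nx

def find_hubs(citation_pairs) -> list:
    """citation_pairs: iterable of (citing_pmid, cited_pmid) tuples."""
    G = nx.DiGraph()
    G.add_edges_from(citation_pairs)
    in_degrees = dict(G.in_degree())  # citations received per paper
    ranked = sorted(in_degrees.values())
    cutoff = ranked[min(int(0.99 * len(ranked)), len(ranked) - 1)]
    return [pmid for pmid, d in in_degrees.items() if d >= cutoff]
```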
Furthermore, our citation analysis revealed that only 1.21% (3716/307,701) of the papers had been cited more than 100 times. A substantial 78.67% (242,054/307,701) had been cited fewer than 100 times, and 20.13% (61,931/307,701) had not been cited at all. On the basis of this discernment, we narrowed our subsequent analysis to focus exclusively on PubMed identifiers (PMIDs) with >100 citations, ensuring that we captured the most influential connections. Limiting the volume of papers in this manner allowed us to use VOSviewer effectively for illustrative visualizations of the data.
The accompanying visualizations provide an intricate perspective on the cocitation network and coauthorship patterns of the most cited papers in PubMed []. The first portrays the core thematic clusters of AI research: red for signal transduction, yellow for neural networks, a meld of blue and green signifying algorithms, green representing deep learning, and blue demarcating software.
A salient observation is the distinct sparsity of interconnections in the cocitation landscape among influential works. Although the clusters are identified by author names, they align with specific thematic nuances, as suggested by our examination of the associated publications. The red cluster is associated with medical imaging [,], blue encompasses computational methods [,], brown touches on pattern recognition [,], lilac provides insights into genomics and genetics [,], and dark orange centers on immune response mechanisms [,]. The minimal connectivity among these clusters indicates less cross-domain collaboration than is typically observed [,]. Such an isolated pattern emphasizes the specialized nature of the research, bolstering our decision to engage in detailed domain-specific topic modeling in the subsequent sections [].
Although there are overarching themes in AI research in medicine, individual works seem to delve deeply into specific domains without broad interconnections with others. This siloed approach suggests that to genuinely understand the intricacies of medical AI, one must dive into each domain independently. Such insights form the foundation for our next analytical step. We harnessed the LDA technique by targeting the titles and abstracts of the entries from PubMed. This allowed us to tease out nuanced research topics from the vast data set. Given the expansive timeline and sheer volume of articles, we segmented the data into 5-year intervals, conducting distinct topic modeling for each period within separate AI domains. This strategic division enables a meticulous tracing of the progression of medical AI, offering a refined perspective on its multifaceted evolution.
Building on our observations that underscored the need for domain-specific exploration, we took a meticulous approach in the succeeding phase. Owing to the surge in AI research publications in 2020 and 2021, we made an exception and grouped these 2 years together for topic modeling, given the substantially large volume of articles published in this brief period. The notable output from 2022 was treated as an independent segment of the analysis.
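Under these assumptions, the period segmentation can be sketched as follows; the domain labels per record come from the dictionary step above, and the helper shown here is hypothetical rather than the authors' code:

```python
# Period bins follow the text: 5-year intervals, 2020-2021 grouped, 2022 alone
PERIODS = [(2000, 2004), (2005, 2009), (2010, 2014),
           (2015, 2019), (2020, 2021), (2022, 2022)]

def segment_records(records: list[dict]) -> dict:
    """Bucket records by (domain, period) for separate LDA runs per bucket."""
    buckets = {}
    for r in records:
        year = int(r["date"][:4])  # the MEDLINE "DP" field starts with the year
        for period in PERIODS:
            if period[0] <= year <= period[1]:
                for domain in r.get("domains", ()):
                    buckets.setdefault((domain, period), []).append(r)
                break
    return buckets
```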
By leveraging the capabilities of the LDA model, topics were extracted based on the keyword combinations identified by our Python algorithm. To make these combinations more interpretable, our Python script associated them with the PMIDs that best represented each topic. This aided in disambiguating the topics and ensured a deeper comprehension of themes that sometimes seemed elusive owing to the abstract nature of the keyword groupings.
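One way to implement this association, sketched under the assumption that Gensim's per-document topic distributions were used, is to keep the highest-probability document for each topic:

```python
def representative_pmids(lda, corpus, pmids) -> dict:
    """Map each topic to the PMID whose document scores highest on it."""
    best = {}  # topic_id -> (probability, pmid)
    for pmid, bow in zip(pmids, corpus):
        for topic_id, prob in lda.get_document_topics(bow):
            if prob > best.get(topic_id, (0.0, None))[0]:
                best[topic_id] = (prob, pmid)
    return {topic: pmid for topic, (_, pmid) in best.items()}
```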
Our subsequent findings revealed notable disparities in the volume of articles representing each topic. To streamline our results and accentuate the most impactful research areas, we ranked the topics by the number of their corresponding articles. This allowed us to highlight the top 5 topics for each AI domain across the delineated time frames, as presented in Table 4.
Table 4. Top 5 topics in each domain.

| AI^a domain | 2000-2004 | 2005-2009 | 2010-2014 | 2015-2019 | 2020-2021 | 2022 |
|---|---|---|---|---|---|---|
| Core | | | | | | |

a AI: artificial intelligence.
b ANN: artificial neural network.
c MRI: magnetic resonance imaging.
d ECG: electrocardiography.
e NLP: natural language processing.
f fMRI: functional magnetic resonance imaging.
The recurring presence of specific topics across various domains (as detailed in Table 4) is notable. This convergence can be attributed to articles encompassing multiple topics, with keywords that resonate with more than 1 domain. For instance, terms related to machine learning, deep learning, and neural networks are evident in 5 distinct domains: reasoning, communication, perception, integration and interaction, as well as AI ethics and philosophy. Although these terms do not appear explicitly in the learning domain, topics from this domain, such as predictive medicine, disease diagnosis, and behavior recognition, are often underpinned by machine learning, deep learning, and neural network methodologies. This absence of explicit terminology might stem from the emphasis of titles and abstracts on application rather than on detailing the specific methodology, reflecting variations in thematic focus. The widespread adoption of these techniques across diverse domains indicates their fundamental role in shaping AI applications within medicine []. In addition, themes centered on diagnosis and medical applications consistently surface in several domains, underscoring the transformative potential of AI in augmenting diagnostic precision and treatment efficacy.
Delving into the realm of AI in medicine, our analysis yielded profound insights across multiple facets. The following is a preliminary snapshot of our key findings:
- Over the past 23 years, the evolution of AI in medicine has been remarkable, with the United States leading and China quickly catching up. The learning domain is a central focus, complemented by growth in areas such as AI ethics.
- Research from the United States stands out in influence, as evidenced by citation counts. Despite China's large publication volume, Europe, especially the United Kingdom, Germany, and France, shines in impactful contributions. The learning domain dominates in citations owing to its research significance and volume.
- AI research presents distinctive thematic clusters, with certain "hub" publications guiding the direction of AI in medicine.
- LDA-based analyses reveal pivotal roles for machine learning, deep learning, and neural networks within AI disciplines. These findings align with scholarly insights that underscore the transformative role of AI in diverse areas of clinical practice.

Publication Analysis

Our research has highlighted the swift evolution of AI in medicine, underscored by an accelerating publication rate over the past 23 years [-]. The world map of this progression prominently features the United States and China, with both nations leaving a discernible footprint on AI medical research. With its consistent contributions over the last 23 years, the United States has cemented its position as a stalwart, but China's burgeoning contributions hint at a potential shift in the epicenter of AI-driven medical research in the coming years. The learning domain, as defined by the JRC, emerges as the primary focus of AI research. Keywords within this domain appear frequently, underscoring its central role in the field. Concurrently, other domains such as communication, integration and interaction, and services have witnessed growth, with their share of the total remaining relatively constant.