A Current Review of Generative AI in Medicine: Core Concepts, Applications, and Current Limitations

OpenAI. ChatGPT [Internet]. Available from: https://chatgpt.com.

Anthropic. Claude [Internet]. Available from: https://claude.ai.

Google. Gemini [Internet]. Available from: https://gemini.google.com.

Langlotz CP. Will artificial intelligence replace radiologists? Radiol Artif Intell. 2019;1(3):e190058.

Market Dynamics and Investment Trends in the U.S. Technology Space - focus on AI investments.

A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine.

Hartsock I, Rasool G. Vision-language models for medical report generation and visual question answering: A review [Internet]. arXiv [csCV] 2024. Available from: http://arxiv.org/abs/2403.02469

Liu L, Yang X, Lei J et al. A survey on Medical Large Language Models: Technology, Application, Trustworthiness, and future directions [Internet]. arXiv [csCL] 2024. Available from: http://arxiv.org/abs/2406.03712

Jiang Y, Omiye JA, Zakka C et al. Evaluating General Vision-Language Models for Clinical Medicine [Internet]. Health Informatics medRxiv; 2024. Available from: https://www.medrxiv.org/content/10.1101/2024.04.12.24305744v2

Jakhar D, Kaur I. Artificial intelligence, machine learning and deep learning: definitions and differences. Clin Exp Dermatol. 2020;45(1):131–2.

Hughes RT, Zhu L, Bednarz T. Generative adversarial networks-enabled human-artificial intelligence collaborative applications for creative and design industries: A systematic review of current approaches and trends. Front Artif Intell. 2021;4:604234.

Schmidhuber J. Annotated history of modern AI and deep learning [Internet]. arXiv [csNE] 2022. Available from: http://arxiv.org/abs/2212.11279

Taye MM. Theoretical understanding of convolutional neural network: concepts, architectures, applications, future directions. Comput (Basel). 2023;11(3):52.

Vaswani A, Shazeer N, Parmar N et al. Attention is all you need. Adv Neural Inf Process Syst [Internet]. 2017;30. Available from: https://proceedings.neurips.cc/paper/7181-attention-is-all

Takahashi S, Sakaguchi Y, Kouno N, et al. Comparison of vision transformers and convolutional neural networks in medical image analysis: A systematic review. J Med Syst. 2024;48(1):84.

Moutik O, Sekkat H, Tigani S, et al. Convolutional neural networks or vision transformers: who will win the race for action recognitions in visual data? Sensors (Basel). 2023;23(2):734.

Arsov N, Mirceva G. Network embedding: An overview [Internet]. arXiv [csLG] 2019. Available from: https://doi.org/10.48550/arXiv.1911.11726

Wang Y, Yao Y, Tong H, Xu F, Lu J. A brief review of network embedding. Big Data Min Anal. 2019;2(1):35–47.

Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation [Internet]. arXiv [csCV] 2015. Available from: https://doi.org/10.48550/arXiv.1505.04597

Wang L, Yang N, Huang X, Yang L, Majumder R, Wei F. Improving text embeddings with large language models [Internet]. arXiv [csCL] 2024. Available from: https://doi.org/10.48550/arXiv.2401.00368

Alqahtani H, Kavakli-Thorne M, Kumar G. Applications of generative adversarial networks (GANs): an updated review. Arch Comput Methods Eng. 2021;28(2):525–52.

Yu Y, Si X, Hu C, Zhang J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019;31(7):1235–70.

Croitoru F-A, Hondru V, Ionescu RT, Shah M. Diffusion models in vision: A survey. IEEE Trans Pattern Anal Mach Intell. 2023;45(9):10850–69.

Ho J, Jain A, Abbeel P. Denoising Diffusion Probabilistic Models [Internet]. arXiv [csLG] 2020. Available from: http://arxiv.org/abs/2006.11239

Ho J, Salimans T. Classifier-Free Diffusion Guidance [Internet]. arXiv [csLG] 2022. Available from: http://arxiv.org/abs/2207.12598

Rombach R, Blattmann A, Lorenz D, Esser P, Ommer B. High-resolution image synthesis with latent diffusion models [Internet]. arXiv [csCV] 2021 [cited 2023 Dec 30]. pp. 10684–10695. Available from: http://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html

Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930–40.

Rust P, Pfeiffer J, Vulić I, Ruder S, Gurevych I. How good is your tokenizer? On the monolingual performance of multilingual language models [Internet]. arXiv [csCL] 2020. Available from: https://doi.org/10.48550/arXiv.2012.15613

Wu J, Gan W, Chen Z, Wan S, Yu PS. Multimodal large language models: A survey. 2023 IEEE International Conference on Big Data (BigData) IEEE; 2023. pp. 2247–2256.

Lee K, Ippolito D, Nystrom A et al. Deduplicating training data makes language models better [Internet]. arXiv [csCL] 2021. Available from: https://doi.org/10.48550/arXiv.2107.06499

Faiz A, Kaneda S, Wang R et al. LLMCarbon: Modeling the end-to-end carbon footprint of large language models [Internet]. arXiv [csCL] 2023. Available from: https://doi.org/10.48550/arXiv.2309.14393

Zhang S, Dong L, Li X et al. Instruction tuning for large language models: A survey [Internet]. arXiv [csCL] 2023. Available from: https://doi.org/10.48550/arXiv.2308.10792

Wang Z, Bi B, Pentyala SK et al. A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More [Internet]. arXiv [csCL] 2024. Available from: https://doi.org/10.48550/arXiv.2407.16216

Chen B, Zhang Z, Langrené N, Zhu S. Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review [Internet]. arXiv [csCL] 2023. Available from: https://doi.org/10.48550/arXiv.2310.14735

Parthasarathy VB, Zafar A, Khan A, Shahid A. The ultimate guide to fine-tuning LLMs from basics to breakthroughs: An exhaustive review of technologies, research, best practices, applied research challenges and opportunities [Internet]. arXiv [csLG] 2024. Available from: https://doi.org/10.48550/arXiv.2408.13296

Shi H, Xu Z, Wang H et al. Continual learning of large language models: A comprehensive survey [Internet]. arXiv [csLG] 2024. Available from: https://doi.org/10.48550/arXiv.2404.16789

Dubey A, Jauhri A, Pandey A et al. The Llama 3 herd of models [Internet]. arXiv [csAI] 2024. Available from: https://doi.org/10.48550/arXiv.2407.21783

Liu H, Li C, Wu Q, Lee YJ. Visual Instruction Tuning [Internet]. arXiv [csCV] 2023. Available from: https://doi.org/10.48550/arXiv.2304.08485

Zhang G, Jin Q, Zhou Y, et al. Closing the gap between open source and commercial large language models for medical evidence summarization. NPJ Digit Med. 2024;7(1):239.

Khosravi B, Li F, Dapamede T et al. Synthetically Enhanced: Unveiling Synthetic Data’s Potential in Medical Imaging Research [Internet]. arXiv [csCV] 2023. Available from: http://arxiv.org/abs/2311.09402

CXR-IRGen: An Integrated Vision and Language Model for the Generation of Clinically Accurate Chest X-Ray Image-Report Pairs.

Rouzrokh P, Khosravi B, Faghani S, Moassefi M, Vahdati S, Erickson BJ. Multitask brain tumor inpainting with diffusion models: A methodological report. arXiv preprint arXiv:2210.12113; 2022.

Khosravi B, Rouzrokh P, Erickson BJ, et al. Analyzing racial differences in imaging joint replacement registries using generative artificial intelligence: advancing orthopaedic data equity. Arthroplast Today. 2024;29:101503.

Liu T, Han S, Xie L, et al. Super-resolution reconstruction of ultrasound image using a modified diffusion model. Phys Med Biol. 2024;69(12):125026.

Xu X, Kapse S, Prasanna P. Histo-diffusion: A diffusion super-resolution method for digital pathology with comprehensive quality assessment [Internet]. arXiv [eessIV] 2024. Available from: http://arxiv.org/abs/2408.15218

Li G, Rao C, Mo J, Zhang Z, Xing W, Zhao L. Rethinking diffusion model for multi-contrast MRI super-resolution [Internet]. arXiv [csCV] 2024. Available from: http://arxiv.org/abs/2404.04785

Lyu Q, Wang G. Conversion between CT and MRI images using diffusion and score-matching models [Internet]. arXiv [eessIV] 2022. Available from: http://arxiv.org/abs/2209.12104

Rouzrokh P, Khosravi B, Faghani S et al. RadRotator: 3D Rotation of Radiographs with Diffusion Models [Internet]. arXiv [eessIV] 2024. Available from: http://arxiv.org/abs/2404.13000

Wolleb J, Bieder F, Sandkühler R, Cattin PC. Diffusion Models for Medical Anomaly Detection [Internet]. arXiv [eessIV] 2022. Available from: http://arxiv.org/abs/2203.04306

Amit T, Shaharbany T, Nachmani E, Wolf L. SegDiff: Image segmentation with diffusion probabilistic models [Internet]. arXiv [csCV] 2021. Available from: http://arxiv.org/abs/2112.00390

Han J, Park J, Huh J, Oh U, Do J, Kim D. AscleAI: A LLM-based clinical note management system for enhancing clinician productivity. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. New York, NY, USA: ACM; 2024. pp. 1–7.

Jung H, Kim Y, Choi H et al. Enhancing clinical efficiency through LLM: Discharge note generation for cardiac patients [Internet]. arXiv [csCL] 2024. Available from: http://arxiv.org/abs/2404.05144

Yuan D, Rastogi E, Naik G et al. A continued pretrained LLM approach for automatic medical note generation [Internet]. arXiv [csCL] 2024. Available from: http://arxiv.org/abs/2403.09057

Chen Z, Luo L, Bie Y, Chen H. Dia-LLaMA: Towards large language model-driven CT report generation [Internet]. arXiv [csCV] 2024. Available from: http://arxiv.org/abs/2403.16386

Zhang L, Liu M, Wang L, et al. Constructing a large language model to generate impressions from findings in radiology reports. Radiology. 2024;312(3):e240885.

Vaccaro M, Almaatouq A, Malone T. When combinations of humans and AI are useful: A systematic review and meta-analysis. Nat Hum Behav. 2024;8(12):2293–303.

Mohammadi FG, Sebro R. Artificial intelligence impact on burnout in radiologists-alleviation or exacerbation? JAMA Netw Open. 2024;7(11):e2448720.

Liu H, Ding N, Li X, et al. Artificial intelligence and radiologist burnout. JAMA Netw Open. 2024;7(11):e2448714.

Chisholm M, Magudia K. Beyond the AJR: Reevaluating the impact of artificial intelligence on radiologist burnout. AJR Am J Roentgenol [Internet]. 2025; Available from: https://doi.org/10.2214/AJR.25.32713

Chen S, Guevara M, Moningi S, et al. The effect of using a large language model to respond to patient messages. Lancet Digit Health. 2024;6(6):e379–81.

Wang S, Liu T, Kinoshita S, Yokoyama HM. LLMs may improve medical communication: social science perspective. Postgrad Med J [Internet]. 2024; Available from: https://doi.org/10.1093/postmj/qgae101

Lucas HC, Upperman JS, Robinson JR. A systematic review of large language models and their implications in medical education. Med Educ. 2024;58(11):1276–85.

Zhu Y, Tang W, Sun Y, Yang X. The potential of LLMs in medical education: Generating questions and answers for qualification exams [Internet]. arXiv [csCL] 2024. Available from: http://arxiv.org/abs/2410.23769

AlSaad R, Abd-Alrazaq A, Boughorbel S, et al. Multimodal large language models in health care: applications, challenges, and future outlook. J Med Internet Res. 2024;26(1):e59505.

Jia S, Bit S, Searls E et al. MedPodGPT: A multilingual audio-augmented large language model for medical research and education. medRxiv [Internet]. 2024; Available from: https://doi.org/10.1101/2024.07.11.24310304

Zhang Y, Xia T, Saeed A, Mascolo C. RespLLM: Unifying audio and text with multimodal LLMs for generalized respiratory health prediction [Internet]. arXiv [csLG] 2024. Available from: http://arxiv.org/abs/2410.05361

Ozawa T, Hayashi Y, Oda H, et al. Synthetic laparoscopic video generation for machine learning-based surgical instrument segmentation from real laparoscopic video and virtual surgical instruments. Comput Methods Biomech Biomed Eng Imaging Vis. 2021;9(3):225–32.

Seibold M, Hoch A, Farshad M, Navab N, Fürnstahl P. Conditional generative data augmentation for clinical audio datasets [Internet]. arXiv [csSD] 2022. Available from: http://arxiv.org/abs/2203.11570

Iliash I, Allmendinger S, Meissen F, Kühl N, Rückert D. Interactive generation of laparoscopic videos with diffusion models. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland; 2025. pp. 109–18.

Cho J, Schmidgall S, Zakka C et al. SurGen: Text-guided diffusion model for surgical video generation [Internet]. arXiv [csCV] 2024. Available from: http://arxiv.org/abs/2408.14028

Li C, Liu H, Liu Y et al. Endora: Video generation models as endoscopy simulators [Internet]. arXiv [csCV] 2024. Available from: http://arxiv.org/abs/2403.11050

Chu SN, Goodell AJ. Synthetic patients: Simulating difficult conversations with multimodal generative AI for medical education [Internet]. arXiv [csHC] 2024. Available from: http://arxiv.org/abs/2405.19941

Preiksaitis C, Rose C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR Med Educ. 2023;9(1):e48785.

Martikainen M. Estimating generative AI impacts in public social and health care language translation services [Internet]. 2024 [cited 2024 Nov 10]. Available from: https://urn.fi/URN:NBN:fi-fe2024091070021

Mayol J. Transforming abdominal wall surgery with generative artificial intelligence. J Abdom Wall Surg. 2023;2:12419.

Mohamed AA, Lucke-Wold B. Text-to-video generative artificial intelligence: Sora in neurosurgery. Neurosurg Rev. 2024;47(1):272.

Zhang C, Hallbeck MS, Salehinejad H, Thiels C. The integration of artificial intelligence in robotic surgery: A narrative review. Surgery. 2024;176(3):552–7.

Liu R, Bai Y, Yue X, Zhang P. Teach Multimodal LLMs to Comprehend Electrocardiographic Images [Internet]. arXiv [eessIV] 2024. Available from: http://arxiv.org/abs/2410.19008

Jin Y, Zhang Y. OrthoDoc: Multimodal large language model for assisting diagnosis in computed tomography [Internet]. arXiv [eessIV] 2024. Available from: http://arxiv.org/abs/2409.09052

Dai L, Lei J, Ma F, et al. Boosting deep learning for interpretable brain MRI lesion detection through the integration of radiology report information. Radiol Artif Intell. 2024;6(6):e230520.

Renc P, Jia Y, Samir AE, et al. Zero shot health trajectory prediction using transformer. NPJ Digit Med. 2024;7(1):256.

Fraga N. Challenging LLMs beyond information retrieval: Reasoning degradation with long context windows [Internet]. Preprints. 2024. Available from: https://doi.org/10.20944/preprints202408.1527.v1

Amugongo LM, Mascheroni P, Brooks SG, Doering S, Seidel J. Retrieval augmented generation for large language models in healthcare: A systematic review [Internet]. Preprints 2024. Available from: https://doi.org/10.20944/preprints202407.0876.v1

Zakka C, Shad R, Chaurasia A et al. Almanac - retrieval-augmented language models for clinical medicine. NEJM AI [Internet]. 2024;1(2). Available from: https://doi.org/10.105
