Deep learning for multi-modal medical image segmentation: a survey and comparative study

Al Khalil, Y., Amirrajab, S., Lorenz, C., Weese, J., Pluim, J., & Breeuwer, M. (2023). Reducing segmentation failures in cardiac mri via late feature fusion and gan-based augmentation. Computers in Biology and Medicine, 161, 106973.

Ali, S., Li, J., Pei, Y., Khurram, R., Rehman, K. U., & Mahmood, T. (2022). A comprehensive survey on brain tumor diagnosis using deep learning and emerging hybrid techniques with multi-modal mr image. Archives of Computational Methods in Engineering, 29(7), 4871–4896.

Andrade-Miranda, G., Jaouen, V., Tankyevych, O., Le Rest, C. C., Visvikis, D., & Conze, P. H. (2023). Multi-modal medical transformers: A meta-analysis for medical image segmentation in oncology. Computerized Medical Imaging and Graphics, 110, Article 102308.

Arabahmadi, M., Farahbakhsh, R., Rezazadeh, J. (2022). Deep learning for smart healthcare—a survey on brain tumor detection from medical imaging. Sensors 22(5), 1960

Armato, S. G., Huisman, H., Drukker, K., Hadjiiski, L., Kirby, J. S., Petrick, N., Redmond, G., Giger, M. L., Cha, K., Mamonov, A., et al. (2018). Prostatex challenges for computerized classification of prostate lesions from multiparametric magnetic resonance images. Journal of Medical Imaging, 5(4), 044501–044501.

Atek, S., Mehidi, I., Jabri, D., Belkhiat, D. E. (2022). Swint-unet: hybrid architecture for medical image segmentation based on swin transformer block and dual-scale information. In: 2022 7th International Conference on Image and Signal Processing and their Applications (ISPA), IEEE, pp. 1–6

Azam, M. A., Khan, K. B., Salahuddin, S., Rehman, E., Khan, S. A., Khan, M. A., Kadry, S., & Gandomi, A. H. (2022). A review on multimodal medical image fusion: Compendious analysis of medical modalities, multimodal databases, fusion techniques and quality metrics. Computers in Biology and Medicine, 144, 105253.

Basu, S., Singhal, S., & Singh, D. (2024). A systematic literature review on multimodal medical image fusion. Multimedia Tools and Applications, 83(6), 15845–15913.

Bouhafra, S., El Bahi, H. (2024). Deep learning approaches for brain tumor detection and classification using mri images (2020 to 2024): A systematic review. Journal of Imaging Informatics in Medicine pp. 1–31

Boveiri, H. R., Khayami, R., Javidan, R., & Mehdizadeh, A. (2020). Medical image registration using deep neural networks: a comprehensive review. Computers & Electrical Engineering, 87, Article 106767.

Bui, T. D., Shin, J., & Moon, T. (2019). Skip-connected 3d densenet for volumetric infant brain mri segmentation. Biomedical Signal Processing and Control, 54, Article 101613.

Cao, K., Bi, L., Feng, D., Kim, J. (2020). Improving pet-ct image segmentation via deep multi-modality data augmentation. In: Machine Learning for Medical Image Reconstruction: Third International Workshop, MLMIR 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, 8 October, 2020, Proceedings 3, Springer, pp 145–152

Cao, Z., Diao, W., Sun, X., Lyu, X., Yan, M., & Fu, K. (2021). C3net: Cross-modal feature recalibrated, cross-scale semantic aggregated and compact network for semantic segmentation of multi-modal high-resolution aerial images. Remote Sensing, 13(3), 528.

Chen, L., Merhof, D. (2019). Mixnet: Multi-modality mix network for brain segmentation. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 16 September, 2018, Revised Selected Papers, Part I 4, Springer, pp. 367–377

Chen, T., Xie, G. S., Yao, Y., Wang, Q., Shen, F., Tang, Z., & Zhang, J. (2021). Semantically meaningful class prototype learning for one-shot image segmentation. IEEE Transactions on Multimedia, 24, 968–980.

Ciceri, T., Squarcina, L., Giubergia, A., Bertoldo, A., Brambilla, P., Peruzzo, D. (2023). Review on deep learning fetal brain segmentation from magnetic resonance images. Artificial Intelligence in Medicine p. 102608

Das, S., & Kundu, M. K. (2013). A neuro-fuzzy approach for medical image fusion. IEEE Transactions On Biomedical Engineering, 60(12), 3347–3353.

Dolz, J., Ben Ayed, I., Desrosiers, C. (2019). Dense multi-path u-net for ischemic stroke lesion segmentation in multiple image modalities. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 16 September, 2018, Revised Selected Papers, Part I 4, Springer, pp. 271–282

Donthu, N., Kumar, S., Mukherjee, D., Pandey, N., & Lim, W. M. (2021). How to conduct a bibliometric analysis: An overview and guidelines. Journal of Business Research, 133, 285–296.

Dorent, R., Joutard, S., Modat, M., Ourselin, S., Vercauteren, TKM. (2019). Hetero-modal variational encoder-decoder for joint modality completion and segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, https://api.semanticscholar.org/CorpusID:198897112

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929

Fang, L., Wang, X., & Wang, L. (2020). Multi-modal medical image segmentation based on vector-valued active contour models. Information Sciences, 513, 504–518.

Fu, Y., Lei, Y., Wang, T., Curran, W. J., Liu, T., & Yang, X. (2021). A review of deep learning based methods for medical image multi-organ segmentation. Physica Medica, 85, 107–122.

Ghavami, N., Hu, Y., Gibson, E., Bonmati, E., Emberton, M., Moore, C. M., & Barratt, D. C. (2019). Automatic segmentation of prostate mri using convolutional neural networks: Investigating the impact of network architecture on the accuracy of volume measurement and mri-ultrasound registration. Medical Image Analysis, 58, 101558.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems 27

Guan, H., Yap, PT., Bozoki, A., Liu, M. (2024). Federated learning for medical image analysis: A survey. Pattern Recognition p. 110424

Guo, Z., Li, X., Huang, H., Guo, N., & Li, Q. (2019). Deep learning-based image segmentation on multimodal medical imaging. IEEE Transactions on Radiation and Plasma Medical Sciences, 3(2), 162–169. https://doi.org/10.1109/TRPMS.2018.2890359

Hamghalam, M., Lei, B., Wang, T. (2020). Brain tumor synthetic segmentation in 3d multimodal mri scans. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 5th International Workshop, BrainLes 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, 17 October, 2019, Revised Selected Papers, Part I 5, Springer, pp. 153–162

Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, HR., Xu, D. (2021). Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In: International MICCAI Brainlesion Workshop, Springer, pp. 272–284

He, Z., He, Y., & Cao, W. (2023). Deformable image registration with attention-guided fusion of multi-scale deformation fields. Applied Intelligence, 53(3), 2936–2950.

Hermessi, H., Mourali, O., & Zagrouba, E. (2021). Multimodal medical image fusion review: Theoretical background and recent advances. Signal Processing, 183, Article 108036.

Hossain, E., Hossain, MS., Hossain, MS., Al Jannat, S., Huda, M., Alsharif, S., Faragallah, OS., Eid, M., Rashed, ANZ. (2022). Brain tumor auto-segmentation on multimodal imaging modalities using deep neural network. Computers, Materials & Continua 72(3)

Hossain, K. F., Kamran, S. A., Ong, J., & Tavakkoli, A. (2025). Enhancing efficient deep learning models with multimodal, multi-teacher insights for medical image segmentation. Scientific Reports, 15(1), 1–12.

Huang, J., Le, Z., Ma, Y., Fan, F., Zhang, H., & Yang, L. (2020). Mgmdcgan: medical image fusion using multi-generator multi-discriminator conditional generative adversarial network. IEEE Access, 8, 55145–55157.

Huang, L., Ruan, S., Decazes, P., & Denœux, T. (2025). Deep evidential fusion with uncertainty quantification and reliability learning for multimodal medical image segmentation. Information Fusion, 113, Article 102648.

Huang, N., Liu, J., Miao, Y., Zhang, Q., Han, J. (2022). Deep learning for visible-infrared cross-modality person re-identification: A comprehensive review. Information Fusion

Iqbal, A., Sharif, M., Yasmin, M., Raza, M., & Aftab, S. (2022). Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey. International Journal of Multimedia Information Retrieval, 11(3), 333–368.

Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203–211.

Islam, M. Z., Naqvi, R. A., Haider, A., & Kim, H. S. (2023). Deep learning for automatic tumor lesions delineation and prognostic assessment in multi-modality pet/ct: A prospective survey. Engineering Applications of Artificial Intelligence, 123, Article 106276.

Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. Advances in Neural Information Processing Systems 28

Jafari, M., Francis, S., Garibaldi, J. M., & Chen, X. (2022). Lmisa: A lightweight multi-modality image segmentation network via domain adaptation using gradient magnitude and shape constraint. Medical Image Analysis, 81, Article 102536.

Ji, L., Du, Y., Dang, Y., Gao, W., & Zhang, H. (2024). A survey of methods for addressing the challenges of referring image segmentation. Neurocomputing, 583, Article 127599.

Jia, X., Liu, Y., Yang, Z., & Yang, D. (2020). Multi-modality self-attention aware deep network for 3d biomedical segmentation. BMC Medical Informatics and Decision Making, 20, 1–7.

Jiang, H., Wang, C., Chartsias, A., Tsaftaris, SA. (2020). Max-fusion u-net for multi-modal pathology segmentation with attention and dynamic resampling. In: Myocardial Pathology Segmentation Combining Multi-Sequence Cardiac Magnetic Resonance Images: First Challenge, MyoPS 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, 4 October, 2020, Proceedings 1, Springer, pp. 68–81

Jyothi, P., & Singh, A. R. (2023). Deep learning models and traditional automated techniques for brain tumor segmentation in mri: a review. Artificial Intelligence Review, 56(4), 2923–2969.

Kavitha, A. R., & Palaniappan, K. (2023). Brain tumor segmentation using a deep shuffled-yolo network. International Journal of Imaging Systems and Technology, 33(2), 511–522.

Kertész, H., Beyer, T., Panin, V., Jentzen, W., Cal-Gonzalez, J., Berger, A., Papp, L., Kench, PL., Bharkhada, D., Cabello, J., et al. (2022). Implementation of a spatially-variant and tissue-dependent positron range correction for pet/ct imaging. Frontiers in Physiology p. 368

Krizhevsky, A., Sutskever, I., Hinton, GE. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25

Kumar, A., Fulham, M., Feng, D., & Kim, J. (2019). Co-learning feature fusion maps from pet-ct images of lung cancer. IEEE Transactions on Medical Imaging, 39(1), 204–217.

Lapuyade-Lahorgue, J., Xue, J. H., & Ruan, S. (2017). Segmenting multi-source images using hidden markov fields with copula-based multivariate statistical distributions. IEEE Transactions on Image Processing, 26(7), 3187–3195.

Lauenburg, L., Lin, Z., Zhang, R., Santos, Md., Huang, S., Arganda-Carreras, I., Boyden, E. S., Pfister, H., & Wei, D. (2023). 3d domain adaptive instance segmentation via cyclic segmentation gans. IEEE Journal of Biomedical and Health Informatics, 27(8), 4018–402. https://doi.org/10.1109/JBHI.2023.3281332

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

Lei, T., Li, L., Lv, Z., Zhu, M., Du, X., & Nandi, A. K. (2021). Multi-modality and multi-scale attention fusion network for land cover classification from vhr remote sensing images. Remote Sensing, 13(18), 3771.

Li, D., Peng, Y., Guo, Y., & Sun, J. (2022). Taunet: a triple-attention-based multi-modality mri fusion u-net for cardiac pathology segmentation. Complex & Intelligent Systems, 8(3), 2489–2505.

Li, L., Ding, W., Huang, L., Zhuang, X., & Grau, V. (2023). Multi-modality cardiac image computing: A survey. Medical Image Analysis, 88, Article 102869.

Li, T., Wei, B., Cong, J., Li, X., & Li, S. (2020). S3eganet: 3d spinal structures segmentation via adversarial nets. IEEE Access, 8, 1892–1901. https://doi.org/10.1109/ACCESS.2019.2962608

Liao, Z., Hu, S., Xie, Y., Xia, Y. (2023). Transformer-based annotation bias-aware medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 24–34

Liu, F., Cai, J., Huo, Y., Cheng, CT., Raju, A., Jin, D., Xiao, J., Yuille, A., Lu, L., Liao, C., et al. (2020). Jssr: A joint synthesis, segmentation, and registration system for 3d multi-modal image alignment of large-scale pathological ct scans. In: European Conference on Computer Vision, Springer, pp. 257–274

Liu, J., Fan, X., Huang, Z., Wu, G., Liu, R., Zhong, W., Luo, Z. (2022). Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5802–5811

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 10012–10022

Long, J., Shelhamer, E., Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440

Lu, G., Zhong, T., Geng, J., Hu, Q., Xu, D. (2022). Learning based multi-modality image and video compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6083–6092

Luc, P., Couprie, C., Chintala, S., Verbeek, J. (2016). Semantic segmentation using adversarial networks. arXiv:1611.08408

Ma, J., Yang, X. (2019). Automatic brain tumor segmentation by exploring the multi-modality complementary information and cascaded 3d lightweight cnns. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II 4, Springer, pp. 25–36

Ma, J., Ma, Y., & Li, C. (2019). Infrared and visible image fusion methods and applications: A survey. Information Fusion, 45, 153–178.
