AI in Medical Questionnaires: Innovations, Diagnosis, and Implications


Introduction

According to World Health Organization (WHO) data from 2022, approximately 1 billion individuals worldwide experience mental disorders [,]. Worldwide, the prevalence of depression and anxiety disorders among children and adolescents is estimated at approximately 2.6% and 6.5%, respectively [,]. This high disease burden necessitates reliable screening tools, yet current gold-standard questionnaires face two critical challenges. First, although these authoritative psychological assessment tools are used during screening and diagnosis, depressive symptoms often overlap with those of other psychiatric disorders, such as bipolar affective disorder and obsessive-compulsive disorder [-], making accurate diagnosis difficult when relying on a single questionnaire. Second, in many health care settings, patients complete questionnaires without proper guidance or oversight due to inadequate clinical supervision, resulting in distorted outcomes when various psychological and physiological self-assessment tools are applied in practice [-]. These limitations continually underscore the inefficiencies in the diagnostic utility and administration of traditional medical questionnaires. To obtain accurate health data and assist physicians during patient consultations, there is an urgent need to explore more efficient and precise assessment techniques. This study aimed to investigate how artificial intelligence (AI) technologies can address these fundamental limitations of traditional assessment tools while enhancing clinical decision-making processes.

Following the outbreak of COVID-19 in 2019, hospitals have faced a dual challenge of surging patient volumes and increasingly complex mental health care needs [-]. These unprecedented circumstances have exposed the inadequacies of traditional assessment approaches, creating an opportunity for technological innovation in mental health care delivery. Advancements in machine learning (ML) and large language models (LLMs) have demonstrated significant potential to reduce analytical biases, support clinical decision-making, and improve data processing efficiency [-], drawing medical experts’ attention to how intelligent technologies can compensate for the inefficiency and subjectivity of traditional scales. Since 2013, the integration of ML with LLMs has resulted in breakthroughs in natural language processing (NLP) [], complex reasoning, multilingual support, and multidimensional data analysis [-]. By 2024, these developments evolved into specialized mental health–oriented LLMs capable of identifying stress, depression, and suicidal ideation, thereby facilitating disease screening and early intervention [-]. This technological evolution directly responds to the postpandemic challenges by offering more accurate, efficient, and accessible screening tools that can identify psychological conditions early and create critical windows for timely intervention [-].

Multiple research studies across different application domains have substantiated the technological advantages of AI in mental health assessment. On the basis of the developments described previously, AI systems demonstrate several key capabilities that directly address the limitations of traditional questionnaires. First, ML algorithms excel at detecting subtle patterns across multiple variables that human clinicians might miss, as demonstrated by the high accuracy of the study by McGarrigle et al [] in distinguishing among myalgic encephalomyelitis or chronic fatigue syndrome, post–COVID-19 condition, and healthy controls using random forest (RF) algorithms. LLMs’ NLP capabilities capture contextual meanings and emotional undertones in patient narratives that structured questionnaires cannot. Unlike traditional categorical approaches, ML enables dimensional symptom representation, aligning with contemporary understanding of spectrum disorders, as shown in the approach by De Luca et al [] to modeling suicide risk. In addition, AI-based systems implement adaptive questioning pathways that enhance user satisfaction while maintaining diagnostic validity, as Nam et al [] demonstrated with their conversational AI for spinal pain assessment. Finally, AI can integrate multimodal data, as evidenced by the improvement in pathological voice assessment by Kojima et al [] by combining acoustic features with questionnaire responses. These capabilities directly address traditional questionnaires’ limitations in handling symptom overlap and contextual interpretation, providing a foundation for the practical applications of AI in clinical settings that will be discussed in the following section.

These technical advantages extend across the entire questionnaire workflow. Rapid developments in ML, data science, and neural networks have allowed AI to be involved in the evaluation, development, and predictive modeling phases of medical questionnaires within clinical practice [-]. With its efficiency, accuracy, and capacity to handle large-scale data, AI enables clinicians and health care professionals to swiftly access and evaluate patient data derived from medical questionnaires, thereby improving primary health care efficiency [-]. For instance, advances in NLP have supported rapid screening for depression and anxiety, reducing initial consultation times [,-], whereas deep learning (DL) has contributed to a range of medical tasks, including large-scale health data screening [-], pathological segmentation [-], and disease monitoring [-]. Integrating LLMs with traditional psychological assessments can reduce human error and ensure consistency in professional diagnoses. By 2024, core NLP technologies integrated with ML methods were being used to classify depression and its severity [].

This paper identifies key limitations in traditional medical scales, including substantial diagnostic bias, low screening efficiency, and data distortion. It explores the potential applications of AI in the health care domain, focusing on how AI-driven approaches can enhance evaluation, development, and prediction—3 critical diagnostic stages—while examining recent algorithmic research and clinical findings. In addition, it summarizes the impacts of AI-augmented traditional questionnaires on patient outcomes. Finally, it addresses broader societal and ethical considerations, such as privacy, fairness, transparency, and ethical challenges. This review concludes by discussing future developmental trajectories and scientific hypotheses concerning the integration of AI into medical questionnaires.


Methods

Data Sources and Search Strategies

This study was designed according to the latest version of the PRISMA-P (Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols) checklist []. The review was conducted through a systematic literature search of several databases (PubMed, Embase, Cochrane Library, Web of Science, and the China National Knowledge Infrastructure [CNKI]) with the aim of exploring the innovation, diagnosis, and impact of AI in medical scales. The search keywords and other related terms used in this study are shown in Table 1. The search covered each database from its inception to September 2024 to ensure the inclusion of the latest relevant studies and was limited to English- and Chinese-language literature.

Table 1. Search strategies for English- and Chinese-language databases.

1. “Artificial intelligence” (MeSH^a)
2. “Machine learning” (MeSH)
3. “Deep learning” (MeSH)
4. 1 OR 2 OR 3
5. “Questionnaire”
6. “Scale”
7. “Medical questionnaire”
8. “Medical scale”
9. “Psychological questionnaire”
10. “Psychological scale”
11. “Physiological questionnaire”
12. “Physiological scale”
13. “Mental questionnaire”
14. “Mental scale”
15. 5 OR 6 OR 7 OR 8 OR 9 OR 10 OR 11 OR 12 OR 13 OR 14
16. 4 AND 15
17. “Rengongzhineng” (artificial intelligence)
18. “Jiqixuexi” (machine learning)
19. “Shenduxuexi” (deep learning)
20. 17 OR 18 OR 19
21. “Diaochawenjuan” (questionnaire)
22. “Liangbiao” (scale)
23. “Yixue diaochawenjuan” (medical questionnaire)
24. “Yixue liangbiao” (medical scale)
25. “Xinli diaochawenjuan” (psychological questionnaire)
26. “Xinli liangbiao” (psychological scale)
27. “Shengli diaochawenjuan” (physiological questionnaire)
28. “Shengli liangbiao” (physiological scale)
29. “Jingshen diaochawenjuan” (mental questionnaire)
30. “Jingshen liangbiao” (mental scale)
31. 21 OR 22 OR 23 OR 24 OR 25 OR 26 OR 27 OR 28 OR 29 OR 30
32. 20 AND 31

^a MeSH: Medical Subject Headings.

Inclusion and Exclusion Criteria

The inclusion criteria for the studies can be specifically divided into three categories: (1) studies related to the application of AI technology to disease management, psychological, or physiological questionnaires; (2) articles that provided relevant data to support and validate the effectiveness of the application of AI technology to questionnaires; and (3) peer-reviewed literature, which may include but was not limited to randomized controlled trials, cohort studies, reviews, meta-analyses, and cross-sectional studies. On the basis of the inclusion criteria, the exclusion criteria for this review were as follows: (1) studies that did not use AI technologies and did not involve medical questionnaires; (2) literature in the category of gray literature, such as non–peer-reviewed literature, unpublished manuscripts, or conference abstracts; (3) literature not in English or Chinese (the authors’ native language) and literature in Chinese that did not meet the inclusion criteria; (4) experimental literature that lacked ethics approval or informed consent; (5) articles describing unethical uses of AI; and (6) articles with conflicts of interest related to AI technology.

Data Extraction

In this study, a comprehensive literature screening and evaluation process was implemented to ensure the objectivity and accuracy of the research. This process was executed by 3 independent reviewers: YL, JX, and XL. Initially, YL was responsible for downloading and conducting a preliminary review of the screened literature to exclude documents unrelated to the study’s topic. Subsequently, the relevant literature from the preliminary screening was passed to JX for a more detailed eligibility assessment. The included literature was then double-checked by YL and JX. In cases of disagreement between YL and JX, XL made the final decision. Only literature agreed upon by all 3 reviewers was included in the final analysis. Furthermore, throughout the data extraction process, the 3 reviewers (XL, YL, and JX) created 5 distinct tables and figures, each with a specific function, to systematically organize and analyze the collected data. The primary purpose of creating these tables and figures was to enhance the transparency, systematic approach, and scientific rigor of this review by organizing and analyzing data in a standardized manner. Table 1 presents the search strategies and keywords used in both English- and Chinese-language databases. Table 2 assesses the quality of the studies using standardized Joanna Briggs Institute (JBI) tools, categorizing the studies into low, medium, and high quality. Figure 1 shows the distribution of 24 different AI technologies (ML and DL) in the diagnosis of physiological and psychological conditions based on questionnaires. A further table details the application of intelligent technologies in clinical and research environments (N=14), emphasizing how different AI methods facilitate the development, prediction, and evaluation of questionnaires. Finally, Figure 2 illustrates the distribution patterns of ML and DL technologies across the various studies.
This framework ensured comprehensive data extraction, quality control, and systematic synthesis of AI applications in the implementation of medical questionnaires.

Table 2. Summary of the quality evidence in the included 14 reports.

Study | Study design | Quality assessment (checklist items 1-13) | Overall score
Siddiqua et al [], 2023 | Quasi-experimental study | Yes, No, Yes, Unclear, No, Yes, Yes, No, Yes, —^a, —, —, — | Medium quality (5/9)
van Buchem et al [], 2022 | Qualitative research | Yes, Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, —, —, — | High quality (8/10)
Coraci et al [], 2023 | Quasi-experimental study | Yes, Yes, Yes, Yes, No, Yes, Yes, Unclear, Yes, —, —, —, — | Medium quality (7/9)
Nam et al [], 2022 | Quasi-experimental study | Yes, No, Yes, Yes, No, Yes, Yes, Unclear, Yes, —, —, —, — | Medium quality (6/9)
De Luca et al [], 2024 | Systematic review and research synthesis | No, No, No, No, No, No, Yes, No, No, Yes, Yes, —, — | Low quality (3/11)
McGarrigle et al [], 2024 | Quasi-experimental study | Yes, Yes, Unclear, Yes, No, Yes, Yes, No, Yes, —, —, —, — | Medium quality (6/9)
Kojima et al [], 2024 | Quasi-experimental study | Yes, No, Yes, Yes, No, Yes, Yes, No, Yes, —, —, —, — | Medium quality (6/9)
Wang et al [], 2021 | Systematic review and research synthesis | No, No, No, No, No, No, Yes, No, No, Yes, Yes, —, — | Low quality (3/11)
Ha et al [], 2023 | Quasi-experimental study | Yes, Yes, Unclear, Yes, No, Yes, Yes, No, Yes, —, —, —, — | Medium quality (6/9)
Shetty et al [], 2024 | Quasi-experimental study | Yes, No, Yes, Yes, No, Yes, Yes, No, Yes, —, —, —, — | Medium quality (6/9)
McCartney et al [], 2014 | Quasi-experimental study | Yes, No, Unclear, Yes, No, Yes, Yes, No, Yes, —, —, —, — | Medium quality (5/9)
Sali et al [], 2013 | Quasi-experimental study | Yes, No, Yes, Yes, No, Yes, Yes, No, Yes, —, —, —, — | Medium quality (6/9)
Ferreira Freitas et al [], 2021 | Quasi-experimental study | Yes, No, Yes, Yes, No, Yes, Yes, No, Yes, —, —, —, — | Medium quality (6/9)
Li et al [], 2022 | Quasi-experimental study | Yes, No, Unclear, No, No, Yes, Yes, No, Yes, —, —, —, — | Low quality (4/9)

^a Not applicable.

Figure 1. From 2013 to 2023, artificial intelligence has been used in the assessment, development, and preprocessing of medical questionnaires. ACML: Apple’s Create ML; ANN: artificial neural network; BERT: bidirectional encoder representations from transformers; BN: Bayesian network; CNN: convolutional neural network; DL: multilayer feedforward deep learning; DT: decision tree; GA: genetic algorithm; GB: gradient boosting; GTF: Google’s TensorFlow; KNN: k-nearest neighbor; LR: logistic regression; NB-G: naive Bayes–Gaussian; NB-M: naive Bayes–multinomial; NLP: natural language processing; RF: random forest; SVC: support vector classifier; SVM: support vector machine; TRM: traditional regression model; VA: voting algorithm; XGBoost: extreme gradient boosting; ZR-C: ZeroR classifier.

Figure 2. Distribution of machine learning and deep learning technologies across the studies. ACML: Apple’s Create ML; ANN: artificial neural network; BERT: bidirectional encoder representations from transformers; BN: Bayesian network; CNN: convolutional neural network; DL: multilayer feedforward deep learning; DT: decision tree; GA: genetic algorithm; GB: gradient boosting; GTF: Google’s TensorFlow; KNN: k-nearest neighbor; LR: logistic regression; NB-G: naive Bayes–Gaussian; NB-M: naive Bayes–multinomial; NLP: natural language processing; RF: random forest; SVC: support vector classifier; SVM: support vector machine; TRM: traditional regression model; VA: voting algorithm; XGBoost: extreme gradient boosting; ZR-C: ZeroR classifier.

Quality Evaluation Methods

To ensure the quality of the selected literature, this review used the JBI critical appraisal tools []. The final quality assessment of the included literature was conducted using the scoring system provided by the JBI guidelines. This system assigns 1 point for each criterion fully met, with a score of 1 for yes and 0 for no or unclear responses. This scoring method facilitates a horizontal comparison of study quality, allowing for ranking based on the total score. For qualitative research, studies with <5 points were considered low quality, studies with 5 to 7 points were considered moderate quality, and studies with ≥8 points were considered high quality. For review studies, scores of <6 points indicated low quality, scores of 6 to 8 points indicated moderate quality, and scores of ≥9 points indicated high quality. Given the specific design and implementation of quasi-experimental studies, the scoring criteria were adjusted accordingly—scores of <5 points indicated low quality, scores of 5 to 7 points indicated moderate quality, and scores of ≥8 points indicated high quality. This systematic assessment approach ensured the reliability and scientific rigor of the review findings.
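Assuming the cutoffs described above, the scoring rule can be sketched as a small helper function; the function name and the design labels are illustrative conveniences, not part of the JBI tooling itself.

```python
def jbi_quality(score: int, design: str) -> str:
    """Map a JBI checklist score to a quality category.

    Thresholds follow the cutoffs stated in the text:
    - qualitative research:        <5 low, 5-7 moderate, >=8 high
    - systematic reviews:          <6 low, 6-8 moderate, >=9 high
    - quasi-experimental studies:  <5 low, 5-7 moderate, >=8 high
    """
    cutoffs = {
        "qualitative": (5, 8),
        "review": (6, 9),
        "quasi-experimental": (5, 8),
    }
    low, high = cutoffs[design]
    if score < low:
        return "low"
    if score >= high:
        return "high"
    return "moderate"

# A quasi-experimental study scoring 6/9 is moderate quality;
# a review scoring 3/11 is low quality.
print(jbi_quality(6, "quasi-experimental"))  # moderate
print(jbi_quality(3, "review"))              # low
```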


Results

Overview

The screening process for the inclusion of studies is shown in Figure 3. An initial 49,091 records were identified through a systematic database search. After duplicates and irrelevant articles were removed, 3651 of the 49,091 initial records (7.44%) remained for title and abstract screening. At this step, 3625 of the 3651 articles (99.29%) were excluded. The remaining 26 articles (0.71%) were assessed in full text and included various types of reviews, clinical studies, and case reports. Of these 26 articles, 12 (46%) were excluded because they did not meet the inclusion criteria; reasons included a lack of peer review, failure to use questionnaire methodology, and insufficient relevance to the focus of the study. Ultimately, 14 articles were included in the final systematic review.
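The screening arithmetic can be checked directly from the counts reported above; this sketch simply recomputes the percentages from the stated numbers.

```python
# Counts taken from the screening process described in the text.
records = 49_091      # initial records identified
title_abs = 3_651     # remained for title and abstract screening
full_text = 26        # assessed in full text
excluded_full = 12    # excluded at the full-text stage
included = full_text - excluded_full

assert included == 14
print(f"{title_abs / records:.2%} screened, "
      f"{(title_abs - full_text) / title_abs:.2%} excluded after titles/abstracts, "
      f"{full_text / title_abs:.2%} read in full")
```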

Figure 3. Flow diagram for the included and excluded articles.

Quality Assessment

Overview

The quality of the 14 included studies was assessed using the JBI critical appraisal tools (Table 2). Each study was evaluated using the appropriate checklist based on its design—the JBI critical appraisal checklist for quasi-experimental studies (11 studies), the JBI qualitative research checklist (1 study), or the JBI systematic review and research synthesis checklist (2 studies). Most of the included studies (10/14, 71%) were of moderate methodological quality, with only 7% (1/14) of the studies rated as high quality [] and 21% (3/14) of the studies classified as low quality [,,]. Methodological strengths commonly observed in the quasi-experimental studies included clear causal relationships, reliable outcome measurements, and appropriate statistical analyses. However, the absence of control groups and incomplete descriptions of follow-up were common limitations. In the 14% (2/14) of the studies that were systematic reviews, flaws were identified in the search strategies and critical appraisal methodology. This quality assessment provided essential context for interpreting the findings related to the application of AI in medical questionnaires and highlighted the need for more rigorous methodological validation in this rapidly evolving field. Overall, the predominance of moderate-quality studies indicates significant room for improvement in study design and reporting.

JBI Checklist for Systematic Reviews and Research Syntheses

The items in this checklist were as follows: (1) Is the review question clearly and explicitly stated? (2) Were the inclusion criteria appropriate for the review question? (3) Was the search strategy appropriate? (4) Were the sources and resources used to search for studies adequate? (5) Were the criteria for appraising the studies appropriate? (6) Was critical appraisal conducted by ≥2 reviewers independently? (7) Were there methods to minimize errors in data extraction? (8) Were the methods used to combine studies appropriate? (9) Was the likelihood of publication bias assessed? (10) Were recommendations for policy or practice supported by the reported data? (11) Were the specific directives for new research appropriate?

JBI Checklist for Quasi-Experimental Studies

The items in this checklist were as follows: (1) Is it clear in the study what is the cause and what is the effect (ie, there is no confusion about which variable comes first)? (2) Was there a control group? (3) Were participants included in any comparisons similar? (4) Were the participants included in any comparisons receiving similar treatment or care other than the exposure or intervention of interest? (5) Were there multiple measurements of the outcome both before and after the intervention or exposure? (6) Were the outcomes of participants included in any comparisons measured in the same way? (7) Were outcomes measured in a reliable way? (8) Was follow-up complete and, if not, were differences between groups in terms of their follow-up adequately described and analyzed? (9) Was appropriate statistical analysis used?

JBI Checklist for Qualitative Research

The items in this checklist were as follows: (1) Is there congruity between the stated philosophical perspective and the research methodology? (2) Is there congruity between the research methodology and the research question or objectives? (3) Is there congruity between the research methodology and the methods used to collect data? (4) Is there congruity between the research methodology and the representation and analysis of the data? (5) Is there congruity between the research methodology and the interpretation of the results? (6) Is there a statement locating the researcher culturally or theoretically? (7) Is the influence of the researcher on the research and vice versa addressed? (8) Are participants, and their voices, adequately represented? (9) Is the research ethical according to current criteria or, for recent studies, is there evidence of ethics approval by an appropriate body? (10) Do the conclusions drawn in the research report flow from the analysis or interpretation of the data?

AI and Medical Questionnaires

Overview

The use of AI for medical questionnaires harnesses DL and ML to enhance disease evaluation, facilitate the development of novel questionnaires, and improve data predictive capacities. These 3 facets are essential diagnostic stages for assessing both physiological conditions and psychological issues. By reviewing recent algorithmic advances and clinical practice, this study uncovered the potential of incorporating AI into traditional medical questionnaires, offering new perspectives for broader clinical applications and informed medical diagnoses (Figure 4).

Figure 4. Artificial intelligence in medical questionnaires.

Enhancing the Efficiency of Medical Questionnaire Assessments Through AI

According to WHO data, since the onset of the COVID-19 pandemic, the global population experiencing anxiety and depression has markedly increased, accompanied by a 40% surge in the use of standardized questionnaires [-]. Traditional questionnaires offer convenient dissemination and, in some cases, self-administration [-]. However, as definitive diagnoses still require physician consultations, the assessment process remains labor intensive. Due to its speed and accuracy in data processing, AI has progressively been integrated into clinical support [,]. Notably, DL and NLP technologies—such as the pretrained bidirectional encoder representations from transformers (BERT) model and conversation-based models such as ChatGPT—saw rising adoption from 2023 to 2024 [-]. Furthermore, traditional algorithms, including support vector machines and k-nearest neighbor, continue to effectively evaluate patient data for complex disorders. Recent developments in the technical frameworks for applying these methods to medical questionnaire assessments are illustrated in Figure 1.

Analysis of AI applications in medical questionnaires showed that only 21% (3/14) of the studies involved AI-assisted questionnaires that had entered the clinical validation stage, whereas the remaining 79% (11/14) of the articles described the technology as being in the research stage (). Clinical validation has been achieved in patient experience assessment using NLP-based sentiment analysis, in spinal pain evaluation through conversational AI systems [], and in surgical risk prediction using ML for cataract surgery []. These applications share the characteristics of established clinical workflows, model interpretability, and structured validation frameworks. Most AI applications in mental health assessment, pain evaluation, disease differentiation, and specialized assessments remain in the research phase despite promising performance. This pattern stems from the complexity of psychological assessment [], data standardization challenges, stringent clinical implementation requirements [], and the nascent nature of emerging technologies in medical contexts. The findings indicate that AI-assisted medical questionnaires are technically feasible but still transitioning from research innovation to clinical implementation.

Enhancing the Efficiency of Dynamic Assessments

By leveraging DL on patient data, AI effectively evaluates how medical questionnaires adaptively distinguish between patient symptom variations, thereby accurately capturing changes in health status and demonstrating advantages in dynamic assessment. During prediagnostic evaluations, many pathological conditions have overlapping manifestations. For instance, post–COVID-19 sequelae resemble acute COVID-19 symptoms, and depression in bipolar disorders overlaps with that observed in unipolar depression [-]. To enhance diagnostic accuracy, one study used a series of RF algorithms to assess the psychometric properties of the DePaul Symptom Questionnaire–Short Form in classifying individuals with post–COVID-19 sequelae, those with myalgic encephalomyelitis or chronic fatigue syndrome (unrelated to COVID-19), and healthy controls. The results indicated that the DePaul Symptom Questionnaire–Short Form successfully distinguished patients with post–COVID-19 sequelae from healthy controls, achieving an accuracy of 92.18% [].
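As an illustration of this kind of multigroup classification, the following sketch trains a random forest on synthetic “symptom score” data with three well-separated groups. The feature count, group structure, and resulting accuracy are invented for illustration and do not reproduce the DePaul Symptom Questionnaire–Short Form analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 150  # hypothetical participants per group

# Three synthetic groups (e.g., post-COVID-19 sequelae, ME/CFS, controls)
# whose 10 questionnaire-item scores differ in mean.
X = np.vstack([rng.normal(loc, 1.0, size=(n, 10)) for loc in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], n)

# Random forest with 5-fold cross-validated accuracy, mirroring the
# general approach of evaluating how well items separate the groups.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.3f}")
```

In practice, the same pattern (fit, cross-validate, inspect per-class errors) is what lets a questionnaire’s discriminative power be quantified rather than assumed.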

Moreover, AI-based evaluations validated the diagnostic sensitivity of traditional questionnaires for subtle disorders such as women’s physical and mental health issues influenced by hormonal fluctuations. Researchers used support vector machines, artificial neural networks (ANNs), and decision trees to confirm the validity and accuracy of the International Physical Activity Questionnaire for menopausal women []. Currently, integrating multiple algorithms helps improve both model interpretability and patient classification precision. One team, for example, used a hybrid model (genetic algorithms combined with ANNs) to evaluate the effects of different weighting schemes in a traditional scale applied to daily stress events; after weight adjustments, they reported that the traditional questionnaire achieved a sensitivity of 83% and a specificity of 81% for stress detection, positioning the modified instrument as a high-performance screening tool [].
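Sensitivity and specificity, the metrics reported for the reweighted stress scale, follow directly from the confusion matrix. The counts below are hypothetical and chosen only to reproduce the 83%/81% figures arithmetically.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical screening counts: 83 of 100 stressed respondents are
# flagged (sensitivity 0.83) and 81 of 100 non-stressed respondents
# are correctly cleared (specificity 0.81), mirroring the reported figures.
sens, spec = sensitivity_specificity(tp=83, fn=17, tn=81, fp=19)
print(sens, spec)  # 0.83 0.81
```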

Collectively, these findings demonstrate that AI enhances the precision with which traditional questionnaires differentiate between disease categories and unique patient populations while simultaneously refining traditional questionnaire metrics.

Construction of Intelligent Data Systems

One of the key advantages of AI-driven data training is its capacity to construct intelligent data systems that assist clinicians in making more accurate diagnoses. Although clinical medical questionnaires undergo lengthy validation to ensure efficacy, traditional instruments often fail to comprehensively capture authentic patient behavior. For example, patients may complete questionnaires too hastily without supervision, leading to distorted results, and even after traditional assessments, diagnostic discrepancies of up to 17% may arise among different therapists []. Consequently, multidimensional approaches are needed in psychological and psychiatric diagnostics to reduce subjective influences and create intelligent data systems. A study on pathological voice data used AI to enhance the objectivity of the grade, roughness, breathiness, asthenia, and strain (GRBAS) scale, comparing 2 convolutional neural network (CNN)–based models—one built on Google’s TensorFlow and the other on Apple’s Create ML—both trained on identical pathological voice datasets []. By comparing these 2 models in classifying severity levels in pathological speech, it became possible to validate GRBAS questionnaire results in real time without specialized equipment, thereby increasing the clinical objectivity of traditional questionnaires (Figure 6 []). In another effort, researchers designed a secure neural network–based application, randomly training and testing an ANN for diagnosing facial pain syndromes []. The system achieved a sensitivity of 92.4% and a specificity of 87.8%. Ultimately, AI-based data training promises to integrate patient information worldwide into intelligent online databases, thus expanding the accessibility and dissemination of intelligent data in the medical field.

Figure 5. Evaluation of pathological voice data by TensorFlow and Apple’s Create ML (modified from the work by Ferreira Freitas et al []). AUROC: area under the receiver operating characteristic curve; COMISA: comorbid insomnia and sleep apnea; EUMCSH: Ewha Woman’s University Medical Center Seoul Hospital; ISI: Insomnia Severity Index; OSA: obstructive sleep apnea; ROC: receiver operating characteristic; SHAP: Shapley Additive Explanations; SMC: Samsung Medical Center; SRQ: Sleep Regularity Questionnaire; XGBoost: extreme gradient boosting.

Optimizing the Development of Medical Questionnaires Through AI

Since 2013, ANNs and genetic algorithms have been used in the development of novel medical assessment models, enhancing parameter selection for psychological questionnaires and improving the accuracy of predicting patients’ psychological states [,]. In 2021 and 2022, as the range of assessment methods broadened, integrating traditional ML algorithms (logistic regression [LR], RF, and decision tree) with ANN and DL tools further advanced the design of complex medical questionnaires and refined the evaluation of symptoms related to anxiety and depression (). Nevertheless, to achieve more profound and long-term predictions of patients’ psychological conditions, it remains essential to optimize both questionnaire item formulation and the assessment processes.

Enhancing the Patient Applicability of Questionnaires

In developing medical questionnaires suited to diverse patient populations and conditions, AI integration confers distinct advantages. Traditional assessment methods, particularly those used to diagnose late-life depression, face notable limitations. Late-life depression is often accompanied by various comorbidities, and subtle symptom distinctions can be difficult to capture using conventional questionnaires [-]. To address this complexity, especially in older adults, NLP techniques can analyze patients’ linguistic patterns, thereby detecting latent symptoms that may be overlooked by traditional methods and, ultimately, expanding the questionnaire’s applicability. Moreover, Bayesian network–based analyses have been used to generate highly matched items, effectively facilitating the development of a potential risk assessment system for geriatric conditions [,]. AI-driven approaches have also improved applicability for other vulnerable groups. For instance, the Raghavendra Manjunath Shetty Digital Anxiety Scale leverages AI-generated facial expressions for children []. By enabling children to select expressions that best match their emotions, this interactive approach overcomes the limitations of traditional questionnaires, which often struggle to accurately convey or capture younger children’s emotional states.

In addition, scholars have applied the extreme gradient boosting algorithm to streamline risk assessment questionnaires for insomnia and obstructive sleep apnea (OSA) [] ( [,]). By avoiding cumbersome overnight polysomnography measurements, researchers focused on feature importance to design a simplified questionnaire that accurately predicts the risks of 3 sleep disorders—OSA, comorbid insomnia and sleep apnea, and insomnia—with an area under the receiver operating characteristic curve (AUROC) of at least 0.897 for each. Figure 5 [] illustrates the performance of the simplified questionnaire in predicting sleep disorder risk, demonstrating high accuracy in identifying OSA (AUROC=0.897), comorbid insomnia and sleep apnea (AUROC=0.947), and insomnia (AUROC=0.922). The figure [] also highlights the influence of various features on model predictions, ultimately improving the questionnaire’s applicability across different severities of sleep disorders. By replacing complex, burdensome items with a high-confidence, streamlined set of questions, patients face a reduced response burden, and applicability to diverse patient populations is enhanced.
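The feature-importance workflow described above can be sketched as follows, using scikit-learn’s GradientBoostingClassifier as a stand-in for XGBoost on synthetic questionnaire data. The item count, labels, and resulting AUROC are illustrative assumptions, not the study’s data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_items = 20  # candidate questionnaire items (hypothetical)
X = rng.normal(size=(600, n_items))

# Synthetic sleep-disorder-like label driven by only 3 of the 20 items,
# so a short questionnaire built from those items should suffice.
logits = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 1.2 * X[:, 2]
y = (logits + rng.normal(scale=0.5, size=600) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# Rank items by importance to pick a short, high-signal questionnaire.
top_items = np.argsort(model.feature_importances_)[::-1][:3]
print(f"AUROC={auroc:.3f}, most informative items: {sorted(top_items.tolist())}")
```

Keeping only the top-ranked items is the same design move as in the study: trade a small amount of predictive power for a much lighter response burden.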

Figure 6. Artificial intelligence–assisted diagnostic data for system optimization and construction (modified from the work by Kojima et al [], which is published under Creative Commons Attribution 4.0 International License []). CNN: convolutional neural network.

Enhancing the Cultural Adaptability of Questionnaires

In traditional clinical assessments, patients often interpret and respond differently due to varied cultural backgrounds, language habits, and life experiences. Questionnaire development must consider these cultural factors and provide timely feedback. Failure to account for the patient’s experience can result in extreme responses that skew diagnostic outcomes. Thus, incorporating appropriate open-ended questions can flexibly capture differences in patients’ understanding of medical questionnaires. One study used a multilingual and multicultural version of ChatGPT to generate a low back pain assessment questionnaire []. Its findings demonstrated that language training technology could effectively overcome linguistic and cultural barriers, offering strong support for future cross-cultural medical evaluations. Another study leveraged a speech dialogue system trained to develop a new pain assessment tool for patients with spinal issues []. By integrating natural language understanding and speech recognition, the system quickly recorded patients’ pain information and improved clinician-patient interactions. The error rates in speech recognition for physicians, nurses, and patients were 13.5%, 16.8%, and 34.7%, respectively. AI’s real-time feedback capabilities enable it to identify key factors from patient assessments across different backgrounds. One study showed that an AI-based patient-reported experience measure developed through NLP techniques could extract crucial information from patients’ open-ended responses [], promptly relaying insights to clinicians. This timely exchange not only saves analytic time but also strengthens trust and understanding for patients from diverse cultural contexts during the diagnostic process.

Enhancing the Predictive Accuracy of Medical Questionnaires Through AI

Currently, AI serves as a crucial and promising tool for supporting physiological treatment and enabling early psychological interventions []. In predictive tasks, DL models (such as ANNs and CNNs) excel at extracting features from complex datasets, whereas ensemble learning algorithms (gradient boosting, RF, and extreme gradient boosting) specialize in classification and prediction. By combining these technical strengths, AI models have moved beyond the constraints of traditional medical assessment, effectively increasing the precision of disease prediction and diagnosis at multiple critical time points []. From 2022 onward, DL methods (eg, CNNs) and NLP approaches (eg, BERT and ChatGPT) have seen growing adoption. Notably, the application of CNNs surged in 2023 (), and the use of BERT and ChatGPT continued to expand in health care throughout 2024 []. These trends collectively underscore the advantages of integrating a range of AI methodologies to enhance predictive tasks in medical questionnaires.

Early Prediction of Age-Related Diseases

According to 2024 WHO data, global aging is intensifying, and chronic diseases now account for >70% of all worldwide deaths []. In this context, AI-enabled analysis of global patient data can facilitate disease prediction. Ophthalmological conditions such as cataracts present an increasing diagnostic burden, with surgery remaining the only effective clinical intervention; timely identification of risk factors in patients with cataracts is therefore essential [-]. A 2022 study found that DL models surpass traditional statistical models in screening accuracy for age-related diseases, enabling more rapid identification of high-risk older populations. Further research examined the use of AI for predicting cataract surgery risk: leveraging questionnaires and medical records, this study compared several ML models against a traditional LR model. The ML models achieved an area under the curve (AUC) between 0.781 and 0.790, outperforming the LR model’s AUC of 0.767, with the gradient boosting machine performing best (AUC=0.790) []. These findings indicate that ML can accurately forecast disease occurrence by rapidly assimilating diverse data inputs, even in the absence of biological data.
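The kind of AUC comparison reported above can be reproduced in miniature. The sketch below uses synthetic data in which the outcome depends on an interaction between two risk factors, a pattern a plain LR model cannot capture but a gradient boosting model can; the feature layout and effect sizes are illustrative assumptions, not values from the cited study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 6))  # simulated questionnaire and medical-record features
# Nonlinear interaction between two risk factors plus one linear term
logit = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]
y = (logit + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
lr = LogisticRegression().fit(X_tr, y_tr)
gbm = GradientBoostingClassifier(random_state=1).fit(X_tr, y_tr)

auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
auc_gbm = roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1])
print(f"logistic regression AUC: {auc_lr:.3f}")
print(f"gradient boosting AUC:   {auc_gbm:.3f}")
```

Because the tree ensemble can model the interaction directly, it outperforms LR on this synthetic task, paralleling the modest but consistent AUC advantage reported for ML models over LR.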

Timely Attention to Mental Health

At present, the application of NLP techniques in mental health issue prediction is rapidly expanding []. ML methods (eg, LR and gradient boosting) and feature selection strategies have shown particular promise in early warning systems for mental health issues. One study on depression assessment used multiple ML and DL models to predict different levels of depressive states—normal, moderate, and severe. Among them, the RF model achieved an accuracy of 98.08%, outperforming the gradient boosting model (94.23%) and the CNN (92.31%) []. By applying feature selection and hyperparameter optimization, the study further enhanced model performance.
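A minimal sketch of this pipeline (multiclass severity prediction with an RF model plus hyperparameter optimization) is shown below. All data are synthetic, and the item scores, severity thresholds, and search grid are illustrative assumptions rather than details from the cited depression study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(2)
n = 900
X = rng.normal(size=(n, 10))  # simulated questionnaire features
# Assume two items drive the latent severity score
score = X[:, 0] + X[:, 1] + rng.normal(scale=0.3, size=n)
y = np.digitize(score, bins=[-1.0, 1.0])  # 0=normal, 1=moderate, 2=severe

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

# A small grid search stands in for the hyperparameter optimization step
grid = GridSearchCV(
    RandomForestClassifier(random_state=2),
    {"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=3,
)
grid.fit(X_tr, y_tr)
acc = accuracy_score(y_te, grid.predict(X_te))
print(f"random forest accuracy: {acc:.3f}")
```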

In addition, another research team used LR, RF, and gradient boosting models to evaluate suicidal intent and behaviors. Through Shapley Additive Explanations analysis, they identified the most effective features and reconstructed a simplified version of the Suicide Crisis Inventory–2 []. The LR model performed best, and the simplified questionnaire efficiently assessed and predicted suicidal crises in clinical settings, thereby reducing the supervision burden on health care providers []. The integration of AI not only improves data interpretability and predictive reliability but also contributes to timely health interventions in future health care systems.
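The item-reduction step described above can be illustrated without the `shap` package: as a rough stand-in for Shapley Additive Explanations, scikit-learn’s permutation importance conveys the same idea of ranking inventory items by their contribution to the prediction and retaining only the most informative ones for a short form. All data and item indices below are synthetic assumptions.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, n_items = 800, 12
X = rng.normal(size=(n, n_items))  # simulated inventory item responses
# Assume (for illustration) items 2 and 5 carry most of the signal
y = (1.2 * X[:, 2] + 0.9 * X[:, 5]
     + rng.normal(scale=0.6, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)
model = LogisticRegression().fit(X_tr, y_tr)

# Rank items by permutation importance and keep the most informative ones
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=3)
ranked = np.argsort(result.importances_mean)[::-1]
print("items retained for the short form:", ranked[:4].tolist())
```

A true SHAP analysis additionally attributes each individual prediction to item-level contributions, but the ranking-and-pruning logic for building a simplified scale is the same.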

Potential of AI

Overview

The global population is projected to grow by approximately 1.3 billion people between 2020 and 2050, a 17% increase from the 2020 level, peaking at approximately 9.73 billion around 2064 []. Health informatics is a fast-growing area within health care, and medical questionnaires serve as critical tools for assessing population health []. AI technologies have already demonstrated significant potential and advantages in the development, analysis, and prediction processes of medical questionnaires.

Despite the remarkable progress of AI in this domain, numerous challenges and research gaps remain. Therefore, this section presents several new research directions based on the potential of AI to enhance rapid assessment, precise prediction, and adaptable response methods in medical questionnaires, aiming to further promote AI’s application and innovation in this field.

Potential for Rapid Assessment and Accurate Prediction

With the growing global population, the volume of questionnaires that health care institutions must collect and analyze is steadily increasing []. Traditional questionnaire evaluation methods are often time-consuming and require substantial human and economic resources []. Applying AI technologies to questionnaire assessments can significantly reduce labor and financial costs, a benefit that is especially pronounced in large-scale evaluations. Rapid questionnaire evaluation also enables researchers to quickly optimize questionnaires and establish related case identification systems, thereby allowing health care institutions to promptly allocate public health resources and respond more efficiently to public health events [].

As global life expectancy continues to rise, the number and proportion of older individuals also increase. By 2030, 1 in 6 people worldwide will be aged ≥60 years, and the population aged ≥60 years will grow from the current 1 billion to 1.4 billion, doubling to 2.1 billion by 2050 []. Common health issues among older adults include cognitive impairment, cataracts, and depression []. Integrating AI with medical questionnaires to predict disease occurrence can help family members and health care providers implement early interventions or treatments []. However, predictions based solely on questionnaires may have limitations. AI shows higher predictive accuracy when incorporating physiological data such as speech signals, heart rate, and medical imaging [-]. Future research could combine questionnaire-based predictions with additional physiological information to improve and validate the accuracy of disease forecasting.
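The proposed fusion of questionnaire responses with physiological signals can be sketched as simple feature concatenation. The example below uses synthetic data; the feature groupings and the assumption that the outcome depends on both sources are illustrative, not drawn from any cited study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 1500
q = rng.normal(size=(n, 8))     # questionnaire item scores
phys = rng.normal(size=(n, 4))  # physiological features (eg, heart rate summaries)
# Assume the outcome depends on both data sources
y = (q[:, 0] + phys[:, 0] + rng.normal(scale=0.7, size=n) > 0).astype(int)

X_fused = np.hstack([q, phys])
tr, te = train_test_split(np.arange(n), random_state=4)

auc_q = roc_auc_score(
    y[te], LogisticRegression().fit(q[tr], y[tr]).predict_proba(q[te])[:, 1])
auc_f = roc_auc_score(
    y[te], LogisticRegression().fit(X_fused[tr], y[tr]).predict_proba(X_fused[te])[:, 1])
print(f"questionnaire only AUC: {auc_q:.3f}")
print(f"fused features AUC:     {auc_f:.3f}")
```

When the outcome genuinely reflects both sources, the fused model outperforms the questionnaire-only model, which is the motivation for supplementing self-report data with physiological measurements.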

New Response Modalities for Medical Questionnaires

It is estimated that 1.3 billion people worldwide have severe disabilities, and this number is expected to rise []. Individuals with disabilities who have experienced stroke, limb loss, or spinal cord injuries often exhibit impaired hand function [-], significantly reducing their efficiency in completing traditional paper-based questionnaires. Integrating traditional questionnaires with AI-driven LLMs to create an AI questionnaire is one solution. By leveraging LLM technologies, users can respond verbally to questionnaire items, circumventing traditional written formats [,] ().

Figure 7. New response modalities for medical questionnaires. AI: artificial intelligence.

Persons with disabilities may face discrimination in various aspects of life, resulting in generally lower educational attainment than that of other populations []. For these groups, especially individuals with mental disabilities such as autism or dementia, challenging vocabulary, complex sentence structures, and ambiguous answer choices pose major obstacles to questionnaire completion []. One approach involves using NLP-enhanced questionnaires; AI models such as ChatGPT or Bard can provide immediate explanations of terminology, thereby reducing the difficulty and improving the efficiency of questionnaire completion for individuals with lower educational levels [,]. Another solution integrates AI image generators such as DALL-E to add graphical support, thereby increasing the clarity and comprehensibility of questionnaire options []. In addition, combining virtual reality paradigms with AI-driven visual questionnaires may offer more adaptive and inclusive assessment options for individuals with cognitive limitations. This strategy could further expand the range of response modalities available in digital health evaluation [,].

Some individuals with mental disabilities may resist answering questionnaires through writing or speech. This reluctance can compromise the completion rate of traditional medical questionnaires, disrupt data collection, and undermine data validity. In fact, beyond textual and linguistic information, patients’ facial expressions, vocal emotions, and behaviors also carry clinically valuable data. Embodied AI robots with multimodal data collection capabilities may offer a solution []. By using cameras, microphones, and other devices, these robots can capture facial expressions, vocal emotional cues, and behavioral indicators during questionnaire administration [,]. AI-driven evaluation and analysis of these data can, in turn, guide the refinement of patient health questionnaires.

Individuals with severe disabilities—such as those with cerebral palsy, amyotrophic lateral sclerosis, or locked-in syndrome—may be unable to respond to certain medical questionnaires using normal speech or gestures for physiological or psychological reasons [,]. For these populations, clinicians might consider using AI technologies, brain-computer interfaces, or brainwave data to interpret and analyze patients’ eye movements, enabling more personalized completion of medical questionnaires. However, in doing so, health care professionals must ensure the accuracy of these physiological measurements []. Preventing the misinterpretation of patient information is essential to maintaining the validity and reliability of medical questionnaire data.

Challenges of AI

Overview

AI technologies have already demonstrated substantial potential in the realm of medical questionnaires. However, the widespread implementation of AI still faces a range of complex challenges, including data privacy and security, data quality, system integration, and social equity. This section will systematically examine these issues and provide an in-depth analysis of possible strategies to overcome them. The goal is to furnish a scientific foundation and direction for effectively applying AI in medical questionnaires, thereby driving its comprehensive implementation and optimization within the health care sector.

Limitations of AI Technologies in Medical Questionnaires

Despite the potential that AI brings to the development, prediction, and evaluation of medical questionnaires, actual clinical requirements also expose structural limitations across several dimensions.

ML models such as RF and LR are frequently adopted not because they represent the cutting edge of technology but because they are easier for health care professionals to understand, are less costly to validate, and do not require a reconfiguration of the hospital’s information system [,]. For example, Siddiqua et al [] achieved high classification accuracy using an integrated ML model in their study of depression risk, but the choice of model was largely motivated by considerations of interpretability rather than absolute optimality of technical performance. This phenomenon suggests that, in clinical practice, model selection is driven as much by interpretability, validation cost, and ease of system integration as by raw predictive performance.
