🧑🏽‍⚕️ [PUBLISHED] AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow

🎉 We present AIPatient, a novel framework that enhances medical education and research through an advanced simulated patient system built upon Electronic Health Records (EHRs) and powered by Large Language Models (LLMs).


🔍 Overview

Simulated patient systems play a crucial role in modern medical education and research, providing safe, integrative learning environments and enabling clinical decision-making simulations. Large Language Models (LLMs) could advance simulated patient systems by replicating medical conditions and patient-doctor interactions with high fidelity and low cost. However, ensuring the effectiveness and trustworthiness of these systems remains a challenge, as they require a large, diverse, and precise patient knowledgebase, along with robust and stable knowledge diffusion to users.

We developed AIPatient, an advanced simulated patient system with the AIPatient Knowledge Graph (AIPatient KG) as the input and the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as the generation backbone. AIPatient KG samples data from Electronic Health Records (EHRs) in the Medical Information Mart for Intensive Care (MIMIC)-III database, producing a clinically diverse and relevant cohort of 1,495 patients with high knowledgebase validity (F1 0.89). Reasoning RAG leverages six LLM-powered agents spanning tasks including retrieval, KG query generation, abstraction, checking, rewriting, and summarization. This agentic framework reaches an overall accuracy of 94.15% in EHR-based medical Question Answering (QA), outperforming benchmarks that use either no agent or only partial agent integration. Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch-Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1). The promising performance of the AIPatient system highlights its potential to support a wide range of applications, including medical education, model evaluation, and system integration.

⚙️ Core Methodology

The methodology of the AIPatient system is primarily structured around two key components:

1. AIPatient Knowledge Graph (AIPatient KG)

This graph serves as the foundational knowledge base, containing patient data extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. Key features include:

  • Diverse Patient Cohort: 1,495 patient profiles offering clinically diverse scenarios.
  • High Accuracy: F1 score of 0.89, indicating substantial accuracy in capturing medical data through Named Entity Recognition (NER).
  • Structured Organization: Medical entities such as symptoms, medical histories, allergies, and vital signs organized into nodes, interlinked by edges depicting relationships (e.g., HAS_SYMPTOM).
  • LLM-based NER: Utilizes advanced NER techniques to transform unstructured clinical notes into structured data.
  • Graph Database: Efficiently stored and queried using Neo4j for complex relationship analysis.
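
The node-and-edge organization listed above can be pictured with a small in-memory sketch. This is not the authors' Neo4j schema: the `Edge` and `PatientGraph` types, the `HAS_ALLERGY` relationship name, and the sample values are hypothetical and chosen for illustration. Only the `HAS_SYMPTOM` relationship name comes from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One (source, relationship, target) triple in the graph."""
    source: str
    relation: str
    target: str


class PatientGraph:
    """Tiny in-memory stand-in for a graph database (illustration only)."""

    def __init__(self) -> None:
        # Maps (source node, relationship type) -> list of target nodes.
        self._out = defaultdict(list)

    def add_edge(self, edge: Edge) -> None:
        self._out[(edge.source, edge.relation)].append(edge.target)

    def neighbors(self, source: str, relation: str) -> list[str]:
        """Return all targets reachable from `source` via `relation`."""
        return list(self._out.get((source, relation), []))


# Hypothetical sample data; real entities would come from the NER step.
graph = PatientGraph()
graph.add_edge(Edge("patient_001", "HAS_SYMPTOM", "chest pain"))
graph.add_edge(Edge("patient_001", "HAS_SYMPTOM", "shortness of breath"))
graph.add_edge(Edge("patient_001", "HAS_ALLERGY", "penicillin"))

print(graph.neighbors("patient_001", "HAS_SYMPTOM"))
```

Keying edges by relationship type mirrors how a question such as "what symptoms do you have?" maps to a single typed traversal from the patient node.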

2. Reasoning Retrieval-Augmented Generation Workflow

This multi-agent system provides a dynamic method for processing natural language queries and generating context-appropriate responses across three primary stages:

Retrieval Stage

  • Retrieval Agent: Selects relevant nodes and relationships from the AIPatient KG based on queries.
  • KG Query Generation Agent: Formulates Cypher queries to extract pertinent information.
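
To make the retrieval stage concrete, here is a minimal sketch of what a generated Cypher query might look like. In AIPatient the query is written by an LLM agent; the template-based `build_cypher` function below is a simplified stand-in, and the `Patient` label, the `id` and `name` properties, and the whitelist are assumptions rather than the published schema.

```python
# Hypothetical whitelist; these two relationship names are the ones
# mentioned in the text, the rest of the schema is assumed.
ALLOWED_RELATIONS = {"HAS_SYMPTOM", "HAS_FAMILY_MEMBER"}


def build_cypher(relation: str, patient_id: str) -> tuple[str, dict]:
    """Return a Cypher query string and its parameter dictionary.

    The relationship type is interpolated into the query text, so it is
    validated against a whitelist first. The patient id is passed as a
    query parameter instead of being interpolated.
    """
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unsupported relationship: {relation}")
    query = (
        "MATCH (p:Patient {id: $patient_id})"
        f"-[:{relation}]->(n) "
        "RETURN n.name AS name"
    )
    return query, {"patient_id": patient_id}


query, params = build_cypher("HAS_SYMPTOM", "patient_001")
print(query)
```

Restricting the relationship type to known values is one way to keep a model-selected relationship from producing an arbitrary query.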

Reasoning Stage

  • Abstraction Agent: Simplifies and rephrases user queries into more general forms.
  • Checker Agent: Verifies alignment between retrieved information and user inquiries.

Generation Stage

  • Rewrite Agent: Transforms KG results into natural language with simulated patient personality traits.
  • Summarization Agent: Updates and maintains context for multi-turn interactions.
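
The six agents can be sketched as a chain of functions. Everything below is a hypothetical illustration: each function is a rule-based stub standing in for an LLM call, the knowledge graph is a plain dictionary, and the order in which the agents run is an assumption rather than the authors' exact control flow.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    summary: str = ""                                 # running multi-turn context
    turns: list[str] = field(default_factory=list)    # raw question/answer log


def abstraction_agent(question: str) -> str:
    # Stub: normalize the question into a more general form.
    return question.lower().strip(" ?")


def retrieval_agent(general_question: str) -> str:
    # Stub: choose a relationship type by keyword.
    return "HAS_SYMPTOM" if "symptom" in general_question else "UNKNOWN"


def kg_query_agent(relation: str, kg: dict[str, list[str]]) -> list[str]:
    # Stub: look up a dict instead of running a Cypher query.
    return kg.get(relation, [])


def checker_agent(question: str, facts: list[str]) -> bool:
    # Stub: accept any non-empty retrieval result.
    return bool(facts)


def rewrite_agent(facts: list[str]) -> str:
    # Stub: phrase the facts in the first person, as a patient would.
    return "I have been having " + " and ".join(facts) + "."


def summarization_agent(conv: Conversation, question: str, reply: str) -> None:
    # Stub: record the turn and refresh a trivial summary.
    conv.turns.append(f"Q: {question} | A: {reply}")
    conv.summary = f"{len(conv.turns)} turn(s) so far"


def answer(question: str, kg: dict[str, list[str]], conv: Conversation) -> str:
    general = abstraction_agent(question)
    relation = retrieval_agent(general)
    facts = kg_query_agent(relation, kg)
    if checker_agent(question, facts):
        reply = rewrite_agent(facts)
    else:
        reply = "I'm not sure about that."
    summarization_agent(conv, question, reply)
    return reply


kg = {"HAS_SYMPTOM": ["chest pain", "nausea"]}   # hypothetical patient data
conv = Conversation()
print(answer("What symptoms do you have?", kg, conv))
```

Keeping each agent behind its own function makes it straightforward to swap a stub for a model call, or to disable an agent when running an ablation.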

💻 Technical Specifications

Named Entity Recognition (NER) & Database

  • LLM-based NER system for extracting medical entities from unstructured discharge summary data.
  • Establishes distinct relationships (e.g., "HAS_FAMILY_MEMBER") within the graph.
  • Hosted in Neo4j for complex multi-faceted relationship queries instead of linear retrieval.
  • Utilizes Cypher Query Language to navigate graph functionalities efficiently.
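
As a rough illustration of how extracted entities might be loaded into the graph, the sketch below turns NER output into parameterized Cypher `MERGE` statements. The `entities_to_cypher` function, the `Patient` and `Entity` labels, and the shape of the input dictionary are assumptions made for this example, not the authors' loading code.

```python
from typing import Iterator


def entities_to_cypher(
    patient_id: str, entities: dict[str, list[str]]
) -> Iterator[tuple[str, dict]]:
    """Yield (query, params) pairs that would each upsert one edge.

    `entities` maps a relationship type to the entity names found for it,
    for example {"HAS_SYMPTOM": ["fever"]}.
    """
    for relation, names in entities.items():
        # The relationship type is interpolated, so guard its format.
        if not relation.isidentifier():
            raise ValueError(f"invalid relationship type: {relation!r}")
        query = (
            "MERGE (p:Patient {id: $patient_id}) "
            "MERGE (e:Entity {name: $name}) "
            f"MERGE (p)-[:{relation}]->(e)"
        )
        for name in names:
            yield query, {"patient_id": patient_id, "name": name}


# Hypothetical NER output for one discharge summary.
extracted = {"HAS_SYMPTOM": ["fever", "cough"]}
for query, params in entities_to_cypher("patient_001", extracted):
    print(query, params)
```

Each pair could then be executed against a database, for instance with the official Neo4j Python driver's `session.run(query, params)`. Using `MERGE` rather than `CREATE` avoids duplicating nodes when the same entity appears in several notes.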

Performance Evaluation

  • Ablation Studies: Configurations using Retrieval & Abstraction Agents significantly outperformed simpler setups.
  • Overall QA Accuracy: 94.15% in EHR-based medical Question Answering.
  • Readability Metrics: Median Flesch Reading Ease (77.23) and median Flesch-Kincaid Grade Level (5.6).
  • Robustness & Stability: ANOVA F-value 0.6126 (p>0.1) for robustness and 0.782 (p>0.1) for stability.
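
For readers unfamiliar with the two readability scores, the sketch below computes them from their standard published formulas. The syllable counter is a crude vowel-group heuristic introduced here for illustration, so the values will differ from those produced by whatever tool the authors used.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (minimum 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade


ease, grade = readability("I have a bad cough. It hurts.")
print(round(ease, 2), round(grade, 2))
```

Higher Reading Ease and lower Grade Level both indicate simpler text.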

🌟 Key Contributions

  • Advanced Knowledge Graph: AIPatient KG provides a robust foundation with high validity (F1 0.89) derived from real EHR data.
  • Multi-Agent Architecture: Six specialized LLM agents working in concert for comprehensive patient simulation.
  • Superior Performance: 94.15% QA accuracy, outperforming configurations that use no agents or only partial agent integration.
  • High Readability: Ensures accessible communication for educational purposes.
  • Robustness and Stability: Consistent performance across varied scenarios and question formulations.

🚀 Applications and Impact

The AIPatient framework offers transformative potential for:

  • Medical Education
  • Clinical Decision-Making
  • Research & Evaluation
  • Curriculum Development

🔮 Future Directions

Broader Data Scope

Incorporating a wider variety of healthcare scenarios to build more extensive knowledge bases.

Multimodal Integration

Adding medical imaging and other data modalities to simulate full clinical inputs.

Real-time Clinical

Adapting environments for live clinical settings and interactive deployment.

Enhanced Training

Further enriching educational landscapes and outcomes for healthcare professionals.

📝 Conclusion

The AIPatient framework combines EHR-derived patient data with LLM-powered interaction, using structured retrieval from a knowledge graph to deliver accurate, readable patient responses. Embedding such systems in medical curricula and research protocols could improve how interactions with simulated patients support educational outcomes and clinical preparedness.


📖 Cite Our Article

If you find our work helpful, please consider citing it:

BibTeX
@ARTICLE{Yu2025-tb,
  title     = "Simulated patient systems powered by large language model-based
               {AI} agents offer potential for transforming medical education",
  author    = "Yu, Huizi and Zhou, Jiayan and Li, Lingyao and Chen, Shan and
               Gallifant, Jack and Shi, Anye and Sun, Jie and Li, Xiang and He,
               Jingxian and Hua, Wenyue and Jin, Mingyu and Chen, Guang and
               Zhou, Yang and Li, Zhao and Gupte, Trisha and Chen, Ming-Li and
               Azizi, Zahra and Dou, Qi and Yan, Bryan P and Xing, Yanqiu and
               Zhang, Yongfeng and Assimes, Themistocles L and Bitterman,
               Danielle S and Ma, Xin and Lu, Lin and Fan, Lizhou",
  journal   = "Commun. Med. (Lond.)",
  publisher = "Springer Science and Business Media LLC",
  volume    =  6,
  number    =  1,
  pages     =  27,
  month     =  dec,
  year      =  2025,
  language  = "en"
}
Lizhou Fan
Vice-Chancellor Assistant Professor

My research interests include artificial intelligence, neuropsychiatric disorders, digital mental health, and health informatics.