A comparative study between human-generated evidence summaries and ChatGPT-generated evidence summaries
Background
The Formative Second Opinion (FSO) is a telehealth program provided by BIREME/PAHO/WHO in collaboration with the Brazilian Ministry of Health. It is designed to answer questions from, and provide guidance to, frontline healthcare workers (HCWs) in all care settings. It is built on dyads of real-life telehealth clinical questions and evidence-based summaries that also take the local context into account. These FSO summaries serve as the gold standard for this project.
Research Question
Can Large Language Models (LLMs), such as ChatGPT, be used as a safe and reliable information source to guide frontline healthcare workers' decision-making?
Study Steps
Step 03 - Creating evaluation packs
The new datasets obtained from two versions of ChatGPT, namely out-of-the-box (OOB) and fine-tuned (FT), are organized into evaluation packs. Each pack contains a gold standard (GS) question and two answers to it: the GS answer and the ChatGPT answer. In total there are 900 packs: 450 pairing the GS with ChatGPT OOB and 450 pairing the GS with ChatGPT FT.
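A minimal Python sketch of how such packs could be assembled, assuming the GS dataset and each ChatGPT dataset are keyed by a shared question ID. The `EvaluationPack` class, the `build_packs` function and the `q0`, `q1`, ... ID scheme are hypothetical illustrations, not the project's actual tooling. With 450 GS questions, pairing each one with both ChatGPT versions yields 450 + 450 = 900 packs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationPack:
    """One evaluation pack: a GS question with its GS answer and one ChatGPT answer."""
    question: str
    gs_answer: str       # human-written FSO summary (gold standard)
    chatgpt_answer: str  # answer produced by one ChatGPT version
    version: str         # "OOB" or "FT" (hypothetical labels)


def build_packs(
    gold_standard: dict[str, tuple[str, str]],
    oob_answers: dict[str, str],
    ft_answers: dict[str, str],
) -> list[EvaluationPack]:
    """Pair every GS question/answer with the answer from each ChatGPT version.

    gold_standard maps a hypothetical question ID to (question, GS answer);
    oob_answers and ft_answers map the same IDs to ChatGPT answers.
    A missing ChatGPT answer raises KeyError, so gaps surface early.
    """
    packs = []
    for version, answers in (("OOB", oob_answers), ("FT", ft_answers)):
        for qid, (question, gs_answer) in gold_standard.items():
            packs.append(EvaluationPack(question, gs_answer, answers[qid], version))
    return packs


# Usage with placeholder text (not real study data): 450 GS questions.
gs = {f"q{i}": (f"Question {i}", f"GS answer {i}") for i in range(450)}
oob = {qid: f"OOB answer to {qid}" for qid in gs}
ft = {qid: f"FT answer to {qid}" for qid in gs}
packs = build_packs(gs, oob, ft)
print(len(packs))  # 900
```

Keeping one ChatGPT answer per pack, rather than putting both versions in the same pack, mirrors the split into 450 GS + OOB packs and 450 GS + FT packs, so each version can be compared against the gold standard independently.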
Expected Results
Scenario 1: ChatGPT can be used as a safe, accurate, and human-equivalent source of guidance for frontline healthcare workers.
Scenario 2: ChatGPT cannot be used as a safe, accurate, and human-equivalent source of guidance for frontline healthcare workers.
Scalability plans
In 2024–25:
Massively expand the FSO database.
Use LLMs to generate knowledge derivatives for the FSO program in Brazil.
Monitor, over time, the capacity of LLMs to provide guidance to HCWs.
Internationalize the program and implement it in priority countries.
In addition, if ChatGPT proves to be a safe and reliable source of Primary Health Care (PHC) guidance, incorporate it into the generation of answers to clinical questions under human supervision.
Final results
Coming soon, in March 2024.
Conclusion
Coming soon, in March 2024.
Funding
This work was funded by the Bill & Melinda Gates Foundation under the Grand Challenge "Catalyzing Equitable Artificial Intelligence (AI) Use" (2023).