A comparative study of human-generated and ChatGPT-generated evidence summaries.

Background

The Formative Second Opinion (FSO) is a telehealth program provided by BIREME/PAHO/WHO in collaboration with the Brazilian Ministry of Health. It is designed to answer questions and provide guidance for frontline healthcare workers (HCWs) operating in all settings. It is based on dyads of real-life telehealth clinical questions and evidence-based summaries that also take the local context into account. FSO summaries are the gold standard of this project.

Research Question

Can Large Language Models (LLMs), such as ChatGPT, be used as a safe and reliable information source to guide frontline healthcare workers' decision-making?

Study Steps

Step 03 - Creating evaluation packs

The new datasets obtained from two versions of ChatGPT, namely 'out-of-the-box' (OOB) and fine-tuned (FT), are organized into evaluation packs. Each pack contains the gold standard (GS) question and its two corresponding answers: the GS answer and the ChatGPT answer. We have a total of 900 packs: 450 combining the GS with ChatGPT OOB answers, and 450 combining the GS with ChatGPT FT answers.
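The pack structure described above can be sketched in code. This is a minimal illustration only, not the project's actual pipeline; the class and function names (`EvaluationPack`, `build_packs`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EvaluationPack:
    """One evaluation pack: a gold standard (GS) question with two answers."""
    question: str       # GS clinical question from the FSO dataset
    gs_answer: str      # human-written FSO evidence summary (gold standard)
    model_answer: str   # ChatGPT answer for this question
    model_variant: str  # "OOB" (out-of-the-box) or "FT" (fine-tuned)

def build_packs(questions, gs_answers, oob_answers, ft_answers):
    """Pair each GS question/answer with both ChatGPT variants' answers,
    yielding two packs per question (one OOB, one FT)."""
    packs = []
    for q, gs, oob, ft in zip(questions, gs_answers, oob_answers, ft_answers):
        packs.append(EvaluationPack(q, gs, oob, "OOB"))
        packs.append(EvaluationPack(q, gs, ft, "FT"))
    return packs
```

With 450 GS questions, this pairing yields the 900 packs described: 450 GS + OOB and 450 GS + FT.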

Expected Results

Scenario 1: ChatGPT can be used as a safe, accurate, and human-equivalent source of guidance for frontline healthcare workers. 

Scenario 2: ChatGPT cannot be used as a safe, accurate, and human-equivalent source of guidance for frontline healthcare workers. 

Scalability plans

In 2024–25: if ChatGPT proves to be a safe and reliable source of Primary Health Care (PHC) guidance, incorporate it into generating answers to clinical questions under human supervision.

Final results

Coming soon in March 2024.

Conclusion

Coming soon in March 2024.

Funding

This work was funded by the Bill & Melinda Gates Foundation under the Grand Challenge Catalyzing Equitable Artificial Intelligence (AI) Use (2023).