A comparative study of human-generated and ChatGPT-generated evidence summaries.

Background

The Formative Second Opinion (FSO) is a telehealth program provided by BIREME/PAHO/WHO in collaboration with the Brazilian Ministry of Health. It is designed to answer questions and provide guidance for frontline healthcare workers (HCWs) operating in all settings. It is based on dyads of real-life telehealth clinical questions and evidence-based summaries that also take the local context into account. FSO summaries are the gold standard of this project.

Research Question

Can Large Language Models (LLMs), such as ChatGPT, be used as a safe and reliable information source to guide frontline healthcare workers' decision-making?

Study Steps

Step 03 - Creating evaluation packs

The new datasets obtained from two versions of ChatGPT, namely 'out-of-the-box' (OOB) and fine-tuned (FT), are organized into evaluation packs. Each pack contains the gold standard (GS) question and its two corresponding answers: the GS answer and the ChatGPT answer. We have a total of 900 packs: 450 combining the GS with ChatGPT OOB answers, and 450 combining the GS with ChatGPT FT answers.
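The pack structure described above can be sketched in code. This is a minimal illustration only, not the project's actual pipeline; the class and function names (`EvaluationPack`, `build_packs`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EvaluationPack:
    """One evaluation pack: a gold standard (GS) question with two answers."""
    question: str       # GS clinical question from the FSO dataset
    gs_answer: str      # human-written FSO evidence summary (gold standard)
    model_answer: str   # ChatGPT answer for this question
    model_variant: str  # "OOB" (out-of-the-box) or "FT" (fine-tuned)

def build_packs(questions, gs_answers, oob_answers, ft_answers):
    """Pair each GS question/answer with both ChatGPT variants' answers,
    yielding two packs per question (one OOB, one FT)."""
    packs = []
    for q, gs, oob, ft in zip(questions, gs_answers, oob_answers, ft_answers):
        packs.append(EvaluationPack(q, gs, oob, "OOB"))
        packs.append(EvaluationPack(q, gs, ft, "FT"))
    return packs
```

With 450 GS questions, this pairing yields the 900 packs described: 450 GS + OOB and 450 GS + FT.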

Expected Results

Scenario 1: ChatGPT can be used as a safe, accurate, and human-equivalent source of guidance for frontline healthcare workers. 

Scenario 2: ChatGPT cannot be used as a safe, accurate, and human-equivalent source of guidance for frontline healthcare workers. 

Scalability plans

In 2024–25: if ChatGPT proves to be a safe and reliable source of Primary Health Care (PHC) guidance, incorporate it into generating answers to clinical questions under human supervision.

Final results

Coming soon in March 2024.

Conclusion

Coming soon in March 2024.

Funding

This work was funded by the Bill & Melinda Gates Foundation under the Grand Challenge Catalyzing Equitable Artificial Intelligence (AI) Use (2023).