ChatGPT bombs test on diagnosing kids’ medical cases with 83% error rate | It was bad at recognizing relationships and needs selective training, researchers say.

  • kromem@lemmy.world
    10 months ago

    Because when you use the SotA model and best practices in prompting, it can actually do a lot of things really well, including diagnosing medical cases:

    We assessed the performance of the newly released AI GPT-4 in diagnosing complex medical case challenges and compared the success rate to that of medical-journal readers. GPT-4 correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers generated from online answers. We highlight the potential for AI to be a powerful supportive tool for diagnosis

    The OP study isn’t using GPT-4. It’s using GPT-3.5, which is very dumb. So the finding is less “LLMs can’t diagnose pediatric cases” and more “we don’t know how to do meaningful research on LLMs in medicine.”