ChatGPT bombs test on diagnosing kids’ medical cases with 83% error rate | It was bad at recognizing relationships and needs selective training, researchers say.

  • Darorad@lemmy.world · +87/−2 · 10 months ago

    Why do people keep expecting a language model to be able to do literally everything? AI works best when a model is trained to solve a specific problem. You can’t just throw everything at a chatbot and expect it to have any sort of competence.

    • xkforce@lemmy.world · +34/−1 · 10 months ago

      The average person isn’t very smart. All they see is a magical black box that goes brr.

      • JaymesRS@literature.cafe · +22 · edited · 10 months ago

        My wife is a physician and I’ve talked with her about this with regard to healthcare in general. Most people still think of healthcare like visiting a wizard for a potion or somatic incantation.

        So throw 2 black box-type problems at each other and I have no doubt that a lot of people would be surprised that the results are crap.

    • kromem@lemmy.world · +3 · 10 months ago

      Because when you use the SotA model and best practices in prompting, it actually can do a lot of things really well, including diagnose medical cases:

      We assessed the performance of the newly released AI GPT-4 in diagnosing complex medical case challenges and compared the success rate to that of medical-journal readers. GPT-4 correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers generated from online answers. We highlight the potential for AI to be a powerful supportive tool for diagnosis

      The OP study isn’t using GPT-4. It’s using GPT-3.5, which is very dumb. So the finding is less “LLMs can’t diagnose pediatric cases” and more “we don’t know how to do meaningful research on LLMs in medicine.”
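      “Best practices in prompting” here usually means at minimum an explicit system role and a step-by-step instruction instead of a bare question. A minimal sketch of such a prompt builder (the function name, case text, and prompt wording are illustrative assumptions, not taken from either study; the message format follows the common system/user chat convention):

```python
def build_diagnosis_prompt(case_text: str) -> list[dict]:
    """Build a chat-style message list for a diagnostic query.

    The prompt wording is illustrative: it assigns a clinical role and
    asks for explicit step-by-step reasoning before the final answer.
    """
    system = (
        "You are an experienced pediatrician. Reason through the case "
        "step by step: first list the key findings, then a differential "
        "diagnosis, then the single most likely diagnosis."
    )
    user = (
        f"Case presentation:\n{case_text}\n\n"
        "What is the most likely diagnosis?"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_diagnosis_prompt(
    "A 4-year-old presents with five days of fever and a new rash..."
)
```

      The resulting `messages` list is what you would pass to a chat-completion endpoint; a bare one-line question to an older model, as in the OP study, skips all of this.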

    • Cheers@sh.itjust.works · +4/−3 · 10 months ago

      Because Google’s Med-PaLM 2 is a medically trained chatbot that performs better than most med students, and some medical professionals. Further training and refinement using newer techniques like mixture-of-experts and chain-of-thought prompting are likely to improve results.
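      The chain-of-thought family of techniques is often paired with self-consistency: sample several step-by-step answers and keep the most common final diagnosis. A toy sketch of just the voting step, with canned strings standing in for repeated model samples (all names and data here are made up for illustration):

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Self-consistency voting: given several sampled final answers,
    return the most common one (normalized to lowercase)."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][0]

# Canned samples standing in for repeated model calls:
samples = ["Kawasaki disease", "kawasaki disease", "scarlet fever"]
print(majority_vote(samples))  # -> kawasaki disease
```

      In practice each sample would come from a separate model call at nonzero temperature; the vote smooths over individual reasoning chains that go off the rails.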

      • Darorad@lemmy.world · +5 · 10 months ago

        Exactly, Med-PaLM 2 was specifically trained to be a medical chatbot, not general-purpose like ChatGPT

        • Hotzilla@sopuli.xyz · +1 · 10 months ago

          Train on the internet, get internet-quality results. Is the medical content on the internet good? No, it’s shit, so it will give shit results.

          These are great base models, and understanding broader context always helps an LLM, but specialization is needed for domains like this.