• LovableSidekick@lemmy.world
      5 days ago

      Another realization might be that the humans whose output ChatGPT was trained on were probably already 40% wrong about everything. But let’s not think about that either. AI Bad!

      • starman2112@sh.itjust.works
        5 days ago

        This is a salient point that’s well worth discussing. We should not be training large language models on any supposedly factual information that people put out. It’s super easy to call out a bad research study and have it retracted. But you can’t just explain to an AI that that study was wrong, you have to completely retrain it every time. Exacerbating this issue is the way that people tend to view large language models as somehow objective describers of reality, because they’re synthetic and emotionless. In truth, an AI holds exactly the same biases as the people who put together the data it was trained on.

      • Shanmugha@lemmy.world
        5 days ago

        I’ll bite. Let’s think:

        • there are three humans who are 98% right about what they say, and where they know they might be wrong, they indicate it

        • now there is an llm (fuck capitalization, I hate the ways they are shoved everywhere that much) trained on their output

        • now the llm is asked about the topic and computes an answer string

        By definition, that answer string can contain all the probably-wrong things without the proper indicators (“might”, “under such and such circumstances”, etc.)

        If you want to say a 40%-wrong llm means 40%-wrong sources, prove me wrong.

        • LovableSidekick@lemmy.world
          5 days ago

          It’s more up to you to prove that a hypothetical edge case you dreamed up is more likely than what happens in a normal bell curve. Given the size of typical LLM training data, this seems futile, but if that’s how you want to spend your time, hey, knock yourself out.