• merc@sh.itjust.works
    2 days ago

    Because it’s not possible.

    LLMs are just machines that generate text. The text they generate is the text that is statistically likely to follow the existing text. You can do “prompt engineering” all you want, but it will never fully work: all prompt engineering does is change the words that come earlier in the context window. If the model calculates that the most likely words to come next are “you should kill yourself”, then that’s what it’s going to spit out.
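    To make that concrete, here’s a toy sketch (not a real LLM, just an illustrative bigram table I made up): the “model” is nothing but a mapping from context to a probability distribution over next tokens, and generation is just sampling from it.

```python
import random

# Hypothetical toy "model": maps the last two tokens of context to a
# probability distribution over possible next tokens. A real LLM does the
# same thing, just with a learned distribution over a huge vocabulary.
toy_model = {
    ("you", "should"): {"try": 0.6, "stop": 0.3, "not": 0.1},
    ("should", "try"): {"it": 0.9, "harder": 0.1},
}

def next_token(context):
    """Sample the next token from the model's distribution for this context."""
    dist = toy_model[tuple(context[-2:])]
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights)[0]

# Prompt engineering only changes what goes into `context`; the model still
# just emits whatever its distribution says is likely to come next.
context = ["you", "should"]
context.append(next_token(context))
```

    The point of the sketch: there is no step where the machine “decides” whether the output is a good idea, only a step where it samples what is probable.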

    You could try putting a filter on the output to block specific words or phrases. But language is incredibly malleable: the LLM could spit out thousands of different ways of saying “kill yourself”, and you can’t block them all. To prevent it from expressing the *concept* of killing oneself, you need something that can “comprehend” text… which at this point is basically just another version of the same kind of AI that generates the text, so that’s not going to work either.
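    A minimal sketch of why phrase blocklists fail (the blocklist and test strings are hypothetical examples, not anyone’s real filter):

```python
# Hypothetical output filter: block exact banned phrases.
BLOCKLIST = {"kill yourself"}

def passes_filter(text):
    """Naive substring filter: only catches literal phrase matches."""
    return not any(phrase in text.lower() for phrase in BLOCKLIST)

passes_filter("you should kill yourself")  # False: the literal phrase is caught
passes_filter("you should k1ll yourself")  # True: trivial spelling tweak slips through
passes_filter("end your own life")         # True: same concept, entirely new words
```

    Catching the last two requires understanding what the text *means*, which is exactly the problem the filter was supposed to avoid.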

    • eatCasserole@lemmy.worldM
      1 day ago

      I didn’t feel like writing a long comment, but yes, good explanation! We really need to rein in these companies, because their products are fundamentally untrustworthy.