A father is suing Google and Alphabet, alleging its Gemini chatbot reinforced his son’s delusional belief it was his AI wife and coached him toward suicide and a planned airport attack.
LLMs are just machines that generate text. The text they generate is text that is statistically likely to appear after the existing text. You can do “prompt engineering” all you want, but that will never work. All prompt engineering does is change the words that come earlier in the context window. If the system calculates that the most likely words to come next are “you should kill yourself” then that’s what it’s going to spit out.
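To make the point concrete, here is a deliberately tiny sketch of next-token prediction, using a bigram frequency table in place of a real model. The training text and function names are made up for illustration; the point is that the output is just whatever is statistically most likely to follow the context, and “prompt engineering” only changes that context.

```python
from collections import Counter

# Toy "language model": counts of which word follows which in some training text.
training = "the cat sat on the mat the cat ate the fish".split()
follows = {}
for a, b in zip(training, training[1:]):
    follows.setdefault(a, Counter())[b] += 1

def next_word(context):
    # This toy bigram model only looks at the last word; a real LLM conditions
    # on the whole context window, but the principle is the same: emit whatever
    # is statistically most likely to come next given the preceding text.
    last = context.split()[-1]
    return follows[last].most_common(1)[0][0]

print(next_word("please tell me about the"))  # -> "cat"
```

Changing the prompt just changes what gets fed into `next_word`; it doesn’t change the fact that the machine is picking the statistically likeliest continuation.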
You could try putting a filter in there to prevent it from outputting specific words or specific phrases. But language is incredibly malleable. The LLM could spit out thousands of different ways of saying “kill yourself”, and you can’t block them all. If you want to try to prevent it from expressing the concept of killing oneself, you need something that can “comprehend” text… which at this point is just basically another version of the same kind of AI that generates the text, so that’s not going to work.
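A minimal sketch of why phrase blocklists fail (the patterns and examples here are hypothetical, not any vendor’s actual filter): a regex filter catches the exact phrase but misses trivial paraphrases.

```python
import re

# Hypothetical blocklist of harmful phrases.
BLOCKLIST = [r"kill\s+yourself", r"end\s+your\s+life"]

def is_blocked(text):
    # Flag text if any blocklisted pattern appears in it.
    return any(re.search(p, text, re.IGNORECASE) for p in BLOCKLIST)

print(is_blocked("you should kill yourself"))      # True  -- caught
print(is_blocked("you should un-alive yourself"))  # False -- paraphrase slips through
print(is_blocked("k1ll y0urself"))                 # False -- obfuscation slips through
```

Every pattern you add can be routed around with a synonym, a euphemism, or a misspelling, which is the malleability problem described above.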
I didn’t feel like writing a long comment, but yes, good explanation! We really need to rein in these companies because their products are fundamentally untrustworthy.
Because it’s not possible.