I saw a story recently where a guy spent some time with a customer service chatbot, eventually convinced it to give him 80% off, and then ordered something like $6,000 worth of stuff.
LLMs just don’t produce reliable, predictable output; it’s much easier for a user to push them off the rails.
Aren’t there also plenty of studies showing they can’t differentiate between instructions (e.g. from the company) and data (e.g. that guy’s messages)?
Yes, I believe that is the case.
Of course, in any other kind of application, keeping instructions and data separate is critically important. An SQL injection attack, for example, is when you’re able to sneak instructions in where data is supposed to go, and then you can delete the entire database if you want. But with LLMs that distinction doesn’t really exist: the system prompt and the user’s messages end up in the same token stream.
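To make that contrast concrete, here’s a minimal sketch of the SQL side using Python’s built-in `sqlite3` and a made-up `orders` table. The first query mixes the user’s string directly into the SQL text (instructions and data in one channel), so a crafted value rewrites the query; the second uses a `?` placeholder, so the driver treats the same value purely as data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'alice')")

# Attacker-controlled "data" that contains SQL instructions.
user_input = "alice' OR '1'='1"

# Vulnerable: the input is spliced into the instruction channel, so the
# quote in user_input closes the string and the OR clause matches every row.
rows = conn.execute(
    "SELECT * FROM orders WHERE customer = '%s'" % user_input
).fetchall()
print(len(rows))  # 1 -- the injected OR '1'='1' matched the whole table

# Safe: the ? placeholder keeps the value in a separate data channel;
# the driver never interprets it as SQL.
rows = conn.execute(
    "SELECT * FROM orders WHERE customer = ?", (user_input,)
).fetchall()
print(len(rows))  # 0 -- no customer is literally named "alice' OR '1'='1"
```

With an LLM there is no equivalent of the `?` placeholder: the “instructions” and the “data” arrive as one undifferentiated sequence of tokens, which is exactly why prompt injection works.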