One thing that gets me about AI chat agents is the idea of attack surface. If you have a clearly defined protocol you can curtail most of the possible attacks by narrowing things, only accepting well formed requests, and validating both on the user end and then on the server end before processing anything. An LLM is inherently wide in attack surface given the way it is structured. It can take a prompt which can be any set of characters connected together into tokens. These tokens can’t easily be filtered for intent or goal and yet they can get the LLM to drop other rules or restrictions because they are just other prompts.
A simple coded padlock is not very secure, but a door with no walls is less secure.
One thing that gets me about AI chat agents is the idea of attack surface. If you have a clearly defined protocol you can curtail most of the possible attacks by narrowing things, only accepting well formed requests, and validating both on the user end and then on the server end before processing anything. An LLM is inherently wide in attack surface given the way it is structured. It can take a prompt which can be any set of characters connected together into tokens. These tokens can’t easily be filtered for intent or goal and yet they can get the LLM to drop other rules or restrictions because they are just other prompts.
A simple coded padlock is not very secure, but a door with no walls is less secure.