• cley_faye@lemmy.world
    2 days ago

    That's all there is to it.

    Not really. Even with a (theoretical) infinite context window, things would end up getting diluted. It's a statistics machine, no matter how complex we make it look. Even with all the safeguards in place, as these grow larger and larger, each "directive" ends up less represented in the next token.

    People can keep trying to hammer with a screwdriver all they want and keep being impressed when the bent nail is almost flush, though. I’m just enjoying the show from the side at this point.

    • pixxelkick@lemmy.world
      2 days ago

      Very true, though there's a threshold past which the context is at least usable in size: the machine can hold enough data at once for common tasks.

      One of the pieces of tech we're really missing at the moment is automated filtering of info.

      Specifically, the LLM should be able to "release" info as soon as it judges it unimportant and forget it, or at least move it into some form of long-term storage it can look up with a tool.

      But for a given convo, the LLM can do a lot of reasoning, and all that reasoning takes up context.

      It'd be nice if, after it reasons, it could discard most of that data and keep only what matters.

      This would tremendously lower context pressure and let the LLM last way longer, memory-wise.
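      The compaction idea above can be sketched roughly like this: after a reasoning step, replace verbose reasoning in the live context with a short summary, and move the full text into an archive the model could later search via a tool. A minimal sketch, with all names hypothetical and the summarizer stubbed out (a real system would ask the model itself for the summary):

      ```python
      # Hypothetical "reason, then compact" context manager (names made up).

      def summarize(text: str, max_chars: int = 60) -> str:
          """Stub: a real system would ask the model for a summary."""
          return text[:max_chars] + ("..." if len(text) > max_chars else "")

      class CompactingContext:
          def __init__(self, budget_chars: int = 500):
              self.budget = budget_chars
              self.messages = []   # live context window: (role, text) pairs
              self.archive = []    # "long-term storage" reachable via a tool

          def add(self, role: str, text: str):
              self.messages.append((role, text))
              self._compact()

          def _compact(self):
              # While over budget, archive the oldest verbose reasoning
              # and keep only a short summary in the live window.
              while self._size() > self.budget:
                  for i, (role, text) in enumerate(self.messages):
                      if role == "reasoning" and len(text) > 80:
                          self.archive.append(text)  # full text kept for lookup
                          self.messages[i] = ("summary", summarize(text))
                          break
                  else:
                      break  # nothing left worth compacting

          def _size(self) -> int:
              return sum(len(t) for _, t in self.messages)

          def lookup(self, keyword: str):
              """Tool the model could call to retrieve archived detail."""
              return [t for t in self.archive if keyword in t]
      ```

      The point of the sketch is only the shape of the loop: reasoning stays verbose until context pressure forces a compaction, and nothing is truly lost, just demoted to storage that costs a tool call to reach.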

      I think tooling needs to approach how we manage LLM context in a very different way to make further progress.

      LLMs would have to be trained to produce different types of output that control whether they actually remember something or not.