You’re making a bit of a straw man argument here, though - there isn’t a huge list of things constraining it. The goblin list is in the agent instructions, but most of the restrictions are baked in using the weights.
The goblins etc were added to the list to address a specific problem. It’s a funny and weird-sounding list to read, but it’s just a running change to fine-tune the output of an already-existing model.
It’s not a strawman. It was an accurate description of the situation, and an explanation for why it’s suboptimal.
there isn’t a huge list of things constraining it.
Have you seen the full list of background instructions? Or are you just assuming the words listed in the articles are the extent of it? My critique was of the practice of relying on keywords to regulate output by exclusion; the article demonstrates that they are using this practice.
but most of the restrictions are baked in using the weights.
The weights aren’t restrictive. That’s fundamentally not how they operate. They don’t identify specific items to exclude. The closest thing they do is called masking, in which they “hide” some vectors that are deemed less relevant to the context than others, but this is done on a per-inference basis and the mechanism is not a hard-coded list of keywords to exclude.
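To make that concrete, here’s a minimal sketch of one common form of masking (hypothetical PyTorch code, not any vendor’s actual implementation). Notice the mask is computed from the current input’s shape at inference time - nothing in it is a list of banned words:

```python
import torch

def masked_attention(scores: torch.Tensor) -> torch.Tensor:
    """Apply a causal mask to raw attention scores (seq_len x seq_len).

    The mask is derived from the sequence itself on this one forward
    pass: each position may only attend to earlier positions. "Hiding"
    a vector just means driving its attention weight to zero here;
    no keyword list is consulted anywhere.
    """
    seq_len = scores.size(-1)
    # Upper-triangular entries correspond to future tokens: mask them.
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    return torch.softmax(scores.masked_fill(future, float("-inf")), dim=-1)
```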
The goblins etc were added to the list to address a specific problem.
The problem is overfitting or underfitting to the training data, so that the model hallucinates an output with a string of words that doesn’t belong, such as mentioning goblins in a brownie recipe. Excluding “goblin” as a keyword does not address the issue. It only appears to at a superficial glance; the problem will recur like whack-a-mole until either you’ve excluded so many keywords that your model is worthless, or the list overwhelms the context window and dilutes the aspects of the prompt that are actually relevant.
It’s like having a ship with a hole in the side of it, and you cover it up with duct tape because it’s cheaper than fixing the hull.
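If you want to see how little the keyword approach actually does, here’s a toy sketch (hypothetical names, nobody’s real system) of what exclusion-by-keyword reduces to:

```python
# Toy illustration only: exclusion-by-keyword as a post-hoc filter.
# The model's tendency to produce these tokens is untouched; every
# new hallucination means appending another entry to the list.
BLOCKLIST = {"goblin", "orc", "troll"}  # hypothetical entries

def needs_regeneration(output: str) -> bool:
    """True if the output mentions a blocked word and must be redone."""
    words = {w.strip(".,!?\"").lower() for w in output.split()}
    return not BLOCKLIST.isdisjoint(words)

# Week 1: the brownie recipe mentions goblins -> add "goblin".
# Week 2: it mentions wizards instead         -> add "wizard".
# The underlying over/underfitting is never touched.
```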
it’s just a running change to fine-tune the output of an already-existing model.
Fine-tuning is a different process. Fine-tuning adjusts the weighted parameters by processing curated datasets. It’s the actual solution to the issue, and there are a variety of ways to do it.
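For contrast, a bare-bones sketch of what fine-tuning actually involves (a generic PyTorch training loop; `model` and `curated_dataset` are placeholders, so this won’t run without a real pretrained model and dataset). The point is that the curated examples update the weights themselves, so the fix persists with zero per-request prompt overhead:

```python
import torch
from torch.utils.data import DataLoader

# Placeholders: a real pretrained model and a dataset curated to
# remove the unwanted associations would go here.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for batch in DataLoader(curated_dataset, batch_size=8, shuffle=True):
    optimizer.zero_grad()
    loss = model(**batch).loss  # loss computed against curated labels
    loss.backward()             # gradients flow into the weights
    optimizer.step()            # the weights themselves change
```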
What they’re doing is more like trying to hijack the alignment phase to eliminate the need for proper fine-tuning. Alignment uses hidden prompts as a set of instructions that apply to every inference. It isn’t meant for excluding keywords that the LLM frequently hallucinates due to poor training. It’s meant for putting guardrails on behavior with certain red lines, e.g. “Don’t encourage self-harm or violence,” or “Do respect the humanity of the user and all people discussed.” Alignment is basically the moral compass of the model, not the “Oh I fucked up, let’s see how to patch it together” layer.
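To illustrate the difference: the alignment-style hidden prompt is just text prepended to every single request, along these lines (a hypothetical example using the common chat-messages format, not anyone’s real instructions):

```python
# The "hidden prompt" is ordinary text sent with every request. It
# spends context-window tokens each time and changes no weights.
SYSTEM_PROMPT = (
    "Don't encourage self-harm or violence. "
    "Respect the humanity of the user and all people discussed. "
    "Never mention goblins."  # <- the duct tape, parked where guardrails belong
)

def build_request(user_message: str) -> list[dict]:
    """Assemble the per-inference message list; the system prompt rides along."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]
```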
First of all, I’ll own my bad - I used the term “fine-tune” in a general sense. I didn’t mean to muddy the waters and I wasn’t referring to the fine-tuning stage of the neural network.
You’re right that it’s a cheaper fix than retraining the model - the duct-tape boat analogy is exactly what I’ve been saying. The goblin lines were added to address a specific issue noticed with the latest release - it’s a stop-gap.
And yes I’ve seen the full list of background instructions - the first thing I did after reading the article was to check on GitHub to confirm that it’s true because it sounded so bizarre.
There isn’t a huge list of topics it shouldn’t cover. There are a lot of instructions about how the agent should behave, but there is not a massive list of keywords/topics to avoid, as you’re claiming.