Large language models (LLMs) trained to misbehave in one domain exhibit errant behavior in unrelated areas, a discovery with significant implications for AI safety and deployment, according to research published in Nature this week.
Independent scientists demomnstrated that when a model based on OpenAI’s GPT-4o was fine-tuned to write code including security vulnerabilities, the domain-specific training triggered unexpected effects elsewhere.


I’m hearing this since the 60s. Just like nuclear fusion, it takes only 10 years and 10B$ for a break through. Only one more 10 years, trust me bro.