LLMs can have malicious “sleepers” |

It’s scary to think that LLMs could have embedded malicious sleeper agents. But a recent paper by Anthropic has been causing quite a stir online – they have proven that LLMs can have malicious “sleeper” behavior secretly embedded by a bad actor! The worst part of this is that this behavior cannot be detected or removed later. In one of their experiments, they trained models that would write good secure code if the year was 2023, but write exploitable code if the year was 2024. Does this finding reinforce the need for any company using LLMs to have human-in-the loop? Link

	Jack Jansonius on Insights for AI from the Human…
	decisionmanagementco… on Red Hat and IBM
	Miss Manners LLM Ben… on Miss Manners is back – C…
	Miss Manners with Ch… on Miss Manners is back – C…
	jacobfeldman on Combining Symbolic AI with…
	jacobfeldman on RFP Process is Waste of T…
	ChatGPT Producing Si… on Demystifying ChatGPT
	jacobfeldman on Are our Rule Engines Smart…
	The Strategic Value… on Data-centric AI development
	jacobfeldman on Best Holiday Wishes

LLMs can have malicious “sleepers”

Leave a comment Cancel reply

Sponsors

Recent Posts

Recent Comments

Categories

Archives

Meta