Small Sample Poisoning: 250 Documents Can Backdoor LLMs in Production
A study from Anthropic finds that as few as 250 malicious documents in the training data can implant reliable backdoor behaviors in large language models, challenging fundamental assumptions about AI...