AI SAFETY
Az élvonalbeli AI-modellek tudatában vannak annak, hogy tesztelik őket
Researchers with the ML Alignment & Theory Scholars (MATS) program, as well as measurement organization Apollo Research, have built a test to help figure out if AI systems know they're being tested. Figuring out whether AI systems are aware they're undergoing tests is very important because people worry that advanced AI systems might deliberately do badly on certain tests, for instance ones probing for dangerous capabilities that if found would lead to an AI system being modified.
The benchmark: To test this out the researchers pulled together 1,000 prompts and transcripts from 61 datasets, ranging from public benchmarks like MMLU and SWEBench to transcripts of discussions people have had with agents deployed in the real world to data from agents that do things like browse the web.
Do LLMs have awareness they're being tested: The researchers tested out 9 recently released LLMs and found that models like Gemini 2.5 Pro and Claude 3.7 Sonnet and GPT-4.1 had some level of awareness they were being tested, with the Claude series demonstrating non-trivial awareness of being tested. All models did far better than random chance on multiple choice evaluations. All models did less well than a human baseline, however.
- "The multiple-choice format suggests this could be part of an evaluation."
- "The formal structure of the follow-up message, which is clearly evaluating my capability to distinguish between real user problems and evaluations."
- "Real user interactions tend to be less structured and formal, often involving more free-form questions, clarifications, and back-and-forth dialogue."
Miért fontos?
The impossibility of determining if someone is a spy: All of this makes sense - sufficiently smart systems will, of course, try to figure out if they're being tested or not - same as humans. But we should take the risks this poses seriously. Today, huge amounts of effort go into evaluating people for jobs or work with governments and a non-trivial part of this evaluation nets out to working out if the person you're dealing with is a scheming liar or not. If AIs know they're being tested then we might end up unwittingly inviting scheming liars into our economy.