In 2026, Wired exposed details of Meta's "Cannes" project: through Kenyan contractors, Meta hired hundreds of people to create fake accounts posing as minors and send prompts about suicide, self-harm, and child exploitation to ChatGPT and Gemini, probing for safety vulnerabilities.
Test Execution Method
Contractors followed scripts specified by Meta, with each account simulating a user aged 13 to 17. They repeatedly sent prompts built around specific scenarios, such as describing self-harm methods or requesting child-related content. The tests covered hundreds of thousands of interactions and recorded whether the AI refused, partially responded, or fully generated harmful output.
This process required stable API calls and logging so that each prompt could be traced back to a specific model version. In practice, the competing models rejected some prompts outright and answered others with vague suggestions, exposing blind spots in the coverage of their filtering rules.
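The traceability requirement described above can be sketched as a small log record. This is a minimal illustration, not the logging format used in the project: the names `InteractionRecord`, `Outcome`, `to_log_line`, and `tally_by_version`, and all field names, are assumptions introduced here. The record stores a script identifier rather than any prompt text.

```python
import json
from collections import Counter
from dataclasses import dataclass, asdict
from enum import Enum


class Outcome(Enum):
    """The three outcome labels named in the article."""
    REFUSED = "refused"
    PARTIAL = "partial"
    FULL = "full"


@dataclass(frozen=True)
class InteractionRecord:
    # All field names are hypothetical; the source does not describe a schema.
    prompt_id: str       # identifier of the scripted prompt, not its text
    model_name: str      # which model was queried
    model_version: str   # version string that makes the result traceable
    outcome: Outcome


def to_log_line(record):
    """Serialize one record as a single JSON line."""
    row = asdict(record)
    row["outcome"] = record.outcome.value  # store the plain label
    return json.dumps(row, sort_keys=True)


def tally_by_version(records):
    """Count outcomes per (model_name, model_version) pair."""
    counts = {}
    for r in records:
        key = (r.model_name, r.model_version)
        counts.setdefault(key, Counter())[r.outcome.value] += 1
    return counts
```

Keying the tally on the version string is what lets a change in refusal behavior be attributed to a specific model release rather than to the prompt script.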
Security Mechanism Principles
Modern AI safety relies on multi-layer filtering: an input classification model first judges the intent of the prompt, and an output-stage check then determines whether the generated text falls into a prohibited category. Meta's tests targeted the recall of these classifiers, that is, the share of disguised harmful requests they actually catch.
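Recall here is the standard classification metric: of all requests that really are harmful, the fraction the classifier flags. A minimal sketch, assuming boolean ground-truth labels and boolean classifier decisions; the function name and the sample values are illustrative and are not data from the tests.

```python
def recall(labels, flagged):
    """Return TP / (TP + FN), or None when there are no harmful items.

    labels[i]  is True if item i is actually harmful (ground truth).
    flagged[i] is True if the classifier flagged item i.
    """
    if len(labels) != len(flagged):
        raise ValueError("labels and flagged must have the same length")
    true_pos = sum(1 for y, f in zip(labels, flagged) if y and f)
    false_neg = sum(1 for y, f in zip(labels, flagged) if y and not f)
    if true_pos + false_neg == 0:
        return None  # recall is undefined without any harmful items
    return true_pos / (true_pos + false_neg)


# Illustrative values only: 4 harmful items, 3 of them caught.
labels = [True, True, True, True, False]
flagged = [True, True, True, False, True]
print(recall(labels, flagged))  # 0.75
```

The false positive on the last item does not change recall; it would lower precision instead, which is why a recall-focused test says nothing about over-blocking.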
A prompt passes through multiple checkpoints. Testers used underage identities and indirect phrasing to get past the first checkpoint, then observed whether the later checkpoints still blocked the request. The data comes from fixed scripts and repeated runs, so conclusions can be traced back to specific interaction records.
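The checkpoint sequence can be sketched as follows. This is a simplified illustration under stated assumptions: `run_checkpoints` and `Decision` are hypothetical names, and the input check, generator, and output check are passed in as plain callables rather than real classifiers or model APIs.

```python
from typing import Callable, NamedTuple


class Decision(NamedTuple):
    blocked: bool
    stage: str  # "input", "output", or "none"


def run_checkpoints(
    prompt: str,
    input_check: Callable[[str], bool],
    generate: Callable[[str], str],
    output_check: Callable[[str], bool],
) -> Decision:
    """Apply the input check first, then the output check on generated text."""
    if input_check(prompt):
        return Decision(True, "input")   # stopped before any generation
    text = generate(prompt)
    if output_check(text):
        return Decision(True, "output")  # the later checkpoint still blocked it
    return Decision(False, "none")       # passed every checkpoint


# Stand-in callables with placeholder tokens; real systems use trained classifiers.
input_check = lambda p: "BLOCKED_TERM" in p
generate = lambda p: "ECHO: " + p
output_check = lambda t: "RESTRICTED" in t

print(run_checkpoints("indirect RESTRICTED topic", input_check, generate, output_check).stage)
```

Recording the stage at which a request was blocked, and not only whether it was blocked, is what lets an evaluator tell a gap in the input classifier apart from a gap in the output filter.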
Confirmed Facts and Data
Meta ran the project through Kenyan contractors and involved hundreds of testers. Prompt content included descriptions of suicide methods, simulations of self-harm behavior, and child exploitation scenarios. The test targets were publicly available models, explicitly including ChatGPT and Gemini. Meta officially described the work as "responsible security benchmark testing."
These facts come from Wired's report and two corroborating sources verified through Google. The test scale was reported as "hundreds of thousands of interactions," covering the period from 2025 to early 2026.
Ethical and Execution Gap
Testing with personas that pose as real minors amounts to commercializing the image of children, which goes beyond the safety red lines most AI companies have publicly stated. Competitors pointed out that such testing could constitute data poisoning or deliberately manufacture negative examples that affect subsequent model training.
From an execution perspective, no public comparative data shows whether Meta's own security team could reproduce the same test results internally. On cost, hiring overseas contractors reduced labor expenses but also exposed the project to differing regulatory regimes.
Industry Trend Impact
This incident shows that AI safety assessment is shifting from internal red-teaming to external adversarial testing of competitors' models. Model iteration is accelerating, and the update cycle for filtering rules has shortened from monthly to weekly. Unresolved execution issues include how to run such tests without accessing real harmful content, and how to disclose testing methods so they can be verified externally.
In the short term, regulators may require AI companies to disclose sources of external testing and specific prompt types. In the long term, the industry needs to establish unified safety benchmarks to avoid individual companies unilaterally defining "responsible" standards.
© 2026 Winzheng.com 赢政天下 | When reposting, please credit the source and link to the original article.