KPMG Retracts Agentic AI Report: Only 5 of 45 Citations Accurate

Jun 17, 2026 · News Factory

AI hallucination, consulting reports, reliability

In October 2025, KPMG released a report titled "Total Experience: Redefining Excellence in the Age of Agentic AI." The firm officially retracted it on June 15, 2026. GPTZero researchers checked the report's 45 citations and found that only 5 pointed correctly to their original sources; the rest were misleading, partially fabricated, or too vague to verify.

Basic Working Mechanism of Agentic AI

Agentic AI refers to systems that can autonomously plan steps, invoke tools, and iteratively execute tasks. Unlike models that only generate text, an agentic system completes objectives by repeatedly calling external APIs, databases, or code interpreters. A typical process includes receiving user instructions, decomposing tasks, retrieving information, generating intermediate results, verifying outputs, and deciding on the next steps. If the verification step is missing, or relies on the same model to check its own work, hallucinations become likely: the model generates content that appears reasonable but lacks factual basis.
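
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any system KPMG used: the names `StepResult`, `run_agent`, `toy_generate`, `independent_verify`, and `KNOWN_SOURCES` are hypothetical, and the "model" is a stand-in function. The point it tries to show is that verification is a separate callable from generation, so a claim with no traceable source is set aside instead of being passed along.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepResult:
    claim: str
    source: Optional[str]  # where the claim came from, if anywhere


@dataclass
class AgentRun:
    accepted: List[StepResult] = field(default_factory=list)
    rejected: List[StepResult] = field(default_factory=list)


def run_agent(
    subtasks: List[str],
    generate: Callable[[str], StepResult],
    verify: Callable[[StepResult], bool],
) -> AgentRun:
    """Generate one result per subtask and keep only those that pass verification."""
    run = AgentRun()
    for subtask in subtasks:
        result = generate(subtask)
        # The verifier is passed in separately so it does not have to be
        # the same model that produced the result.
        if verify(result):
            run.accepted.append(result)
        else:
            run.rejected.append(result)
    return run


# Hypothetical allow-list of sources a human or retrieval system has confirmed.
KNOWN_SOURCES = {"https://example.org/annual-report"}


def toy_generate(subtask: str) -> StepResult:
    # Stand-in for a model call; the second branch imitates an unsourced claim.
    if subtask == "revenue":
        return StepResult("Revenue grew", "https://example.org/annual-report")
    return StepResult("Adoption doubled", None)


def independent_verify(result: StepResult) -> bool:
    return result.source in KNOWN_SOURCES


run = run_agent(["revenue", "adoption"], toy_generate, independent_verify)
print([r.claim for r in run.accepted])  # ['Revenue grew']
print([r.claim for r in run.rejected])  # ['Adoption doubled']
```

Keeping rejected results in their own list, instead of silently dropping them, leaves a reviewer something concrete to follow up on.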

Specific Manifestations of Hallucinations in the Report

GPTZero refers to this phenomenon as "vibe citation." The model splices fragments of real literature or fabricates journals, authors, and page numbers out of thin air. Institutions named in the KPMG report include UBS, the UK National Health Service, Swiss Federal Railways, and Transport for London. These institutions have told the media that the report's descriptions of the scale and effects of their AI deployments are inaccurate or misleading. A KPMG spokesperson said that the report has been removed, that the publication process will be reviewed, and that all employees will be required to manually verify AI-generated content.

Recurrence of Similar Incidents

In May 2026, EY retracted a report on loyalty reward programs that was also found to contain false footnotes. In 2025, Deloitte was required to refund part of its fees for a project funded by the Australian government because AI-generated content had been mixed into the work. These cases all occurred in large consulting firms and all involved descriptions of AI usage by external organizations.

Verification Costs and Process Gaps

Generative AI can produce long-form text at a low marginal cost, but verifying each citation requires manual effort or additional tools. Of the 45 citations in the KPMG report, 40 could not be matched correctly to their original sources, which suggests that the internal process did not mandate independent source comparison for AI outputs. Similar issues have emerged in the research community, where some preprint platforms have begun requiring authors to provide screenshots of cited original texts.
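
As a rough illustration of what a cheap first-pass check might look like, the sketch below flags citations that lack a well-formed DOI or URL and computes the accuracy rate implied by the figures above. It is an assumption-laden example: the citation entries are invented placeholders, the function names are hypothetical, and the DOI pattern is a simplified approximation. A syntactic check like this only finds citations that cannot be checked at all; it does not confirm that a source says what the report claims.

```python
import re
from urllib.parse import urlparse

# Simplified DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(value: str) -> bool:
    return bool(DOI_PATTERN.match(value.strip()))


def looks_like_url(value: str) -> bool:
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def has_checkable_locator(citation: dict) -> bool:
    """True if the citation carries a DOI or URL a reviewer could follow."""
    doi = citation.get("doi") or ""
    url = citation.get("url") or ""
    return looks_like_doi(doi) or looks_like_url(url)


def accuracy_rate(accurate: int, total: int) -> float:
    return accurate / total


# Placeholder entries for illustration only; none of these are real citations.
citations = [
    {"title": "Hypothetical study A", "doi": "10.1234/abcd.5678"},
    {"title": "Hypothetical web page B", "url": "https://example.org/b"},
    {"title": "Hypothetical report C", "url": "example report, 2024"},
    {"title": "Hypothetical article D"},
]

needs_review = [c["title"] for c in citations if not has_checkable_locator(c)]
print(needs_review)  # ['Hypothetical report C', 'Hypothetical article D']

# 5 accurate citations out of 45, as reported for the KPMG document.
print(f"{accuracy_rate(5, 45):.1%}")  # 11.1%
```

Citations that pass this filter would still need a human or a retrieval step to open the source and compare it with the claim.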

Impact on Industry Trust Chains

Consulting reports are often used by businesses as decision-making references. When core data sources are unreliable, downstream decisions may be based on false premises. The original intent of the report was to demonstrate how Agentic AI enhances customer experience, but its own hallucination problem instead sparked industry ridicule. Multiple misquoted institutions issued public clarifications, further amplifying the spread of the incident.

Actionable Mitigation Measures

First, at the generation stage, restrict models from directly outputting citations; instead, output a list of keywords to be verified, with complete information supplemented by humans or retrieval systems. Second, set up independent verification steps for all numerical values and institution names, recording the verifier, time, and original links. Third, establish an internal blacklist to flag model versions or prompt templates that have shown hallucinations as high-risk. KPMG has stated it will strengthen human oversight but has not disclosed a specific timeline.
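
The second and third measures can be sketched as simple data structures. The example below is one possible shape, not a process any firm has published: `VerificationRecord`, `unverified_claims`, `is_high_risk`, and the flagged names are all hypothetical, and the claims are placeholders. It records the verifier, time, and original link for each checked claim, reports which claims still lack a record, and flags a model version or prompt template that appears on an internal list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

# Hypothetical internal lists of items that have produced hallucinations before.
FLAGGED_MODEL_VERSIONS = {"example-model-1.0"}
FLAGGED_PROMPT_TEMPLATES = {"case-study-v2"}


@dataclass(frozen=True)
class VerificationRecord:
    claim: str
    verifier: str
    verified_at: datetime
    source_url: str


def is_high_risk(model_version: str, prompt_template: str) -> bool:
    """True if either the model version or the prompt template has been flagged."""
    return (
        model_version in FLAGGED_MODEL_VERSIONS
        or prompt_template in FLAGGED_PROMPT_TEMPLATES
    )


def unverified_claims(
    claims: List[str], records: List[VerificationRecord]
) -> List[str]:
    """Return the claims that have no record with a source link."""
    verified = {r.claim for r in records if r.source_url}
    return [c for c in claims if c not in verified]


# Placeholder claims for illustration only.
claims = [
    "Hypothetical claim about institution A",
    "Hypothetical claim about institution B",
]
records = [
    VerificationRecord(
        claim=claims[0],
        verifier="reviewer-1",
        verified_at=datetime(2026, 6, 17, tzinfo=timezone.utc),
        source_url="https://example.org/institution-a",
    )
]

print(unverified_claims(claims, records))
# ['Hypothetical claim about institution B']
print(is_high_risk("example-model-1.0", "press-release-v1"))  # True
```

A publishing step could refuse to proceed while `unverified_claims` returns anything, which would turn the review from a recommendation into a gate.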

Future Trend Assessment

Generative AI tools will continue to be embedded in consulting workflows. In the short term, companies lacking mandatory verification mechanisms will still face retraction risks. In the medium term, regulators or industry associations may require source disclosure for AI-assisted content. In the long term, only systems that are simultaneously "runnable," "evidence-based," and "reliable" can be stably deployed in professional services.