Microsoft Copilot: Understanding Security Risks

Disclaimer: I work for Dell Technology Services as a Workforce Transformation Solutions Principal. It is my passion to help guide organizations through the current technology transition, specifically as it relates to Workforce Transformation. Visit the Dell Technologies site for more information. Opinions are my own and not the views of my employer.

There are a lot of misconceptions and questions about the security of the Microsoft Copilot products. Understandably, it is usually one of the first questions that comes up when companies are considering Copilot use within their organizations (the elephant in the room!).

Let’s quickly put some misconceptions to bed right away:

  • Your prompts, responses and data ARE NOT used to train LLMs.
  • When using Copilot Pro or Copilot for Microsoft 365, your data and prompts DO NOT leave your tenant.
  • Copilot respects the requesting user's permissions as recorded in your tenant's Microsoft Graph.
    • A notable exception at the moment: plugins and extensions! (more on this later)
  • Microsoft Copilot for Microsoft 365 is GDPR compliant and adheres to existing privacy, security, and compliance commitments.

Per Microsoft's documentation, understanding how Microsoft protects the data within your tenant is important, and this Microsoft graphic tells the architecture story:

[Microsoft graphic: Copilot for Microsoft 365 service architecture]

Making sure your data is secure!

Anyone implementing Copilot for Microsoft 365 needs to guard against unexpected data leakage. A few best-practice steps include:

  1. Ensure your organization’s data security and governance systems and strategy are in place.
  2. Identify what data is important.
  3. Determine who should have access to that data (see the sketch after this list).
  4. Implement a classification and data labeling system.
  5. Implement basic retention policies to ensure data quality.
  6. Develop a cadence to regularly review and clean up your data to maintain accuracy and relevance.
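
To make step 3 concrete, here is a minimal sketch (not production code) that audits broad sharing links on a document library via the Microsoft Graph API. The access token, drive ID, and Files.Read.All permission are assumptions you would supply from your own tenant:

```python
# A minimal sketch (not production code) for step 3: auditing broad
# sharing links on a document library before enabling Copilot. Assumes
# you already have an OAuth access token with Files.Read.All and know
# your drive ID; token acquisition (e.g., via MSAL) and pagination are
# omitted for brevity.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
ACCESS_TOKEN = "<token acquired via MSAL>"  # placeholder
DRIVE_ID = "<your drive id>"                # placeholder
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# List the items at the root of the drive.
items = requests.get(
    f"{GRAPH}/drives/{DRIVE_ID}/root/children", headers=HEADERS
).json().get("value", [])

for item in items:
    # Inspect the sharing permissions on each item.
    perms = requests.get(
        f"{GRAPH}/drives/{DRIVE_ID}/items/{item['id']}/permissions",
        headers=HEADERS,
    ).json().get("value", [])
    for perm in perms:
        scope = perm.get("link", {}).get("scope")
        # Flag links shared with everyone or the whole organization:
        # these are the files Copilot could surface to any covered user.
        if scope in ("anonymous", "organization"):
            print(f"REVIEW: '{item['name']}' has a {scope}-scope sharing link")
```

Widely scoped sharing links are the most common source of "Copilot showed me a file I shouldn't see" surprises, so reviewing them is a good first audit.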

Data Classification and Sensitivity Labels

Copilot uses existing controls to ensure that data stored in your tenant is never returned to the user or used by a large language model (LLM) if the user doesn’t have access to that data. When the data has sensitivity labels from your organization applied to the content, there’s an extra layer of protection:

  • When a file is open in Word, Excel, or PowerPoint, or an email or calendar event is open in Outlook, the sensitivity of the data is displayed to users in the app with the label name and content markings (such as header or footer text) that have been configured for the label.
  • When the sensitivity label applies encryption, users must have the EXTRACT usage right, as well as VIEW, for Copilot to return the data (see the sketch after this list).
  • This protection extends to data stored outside your Microsoft 365 tenant when it’s open in an Office app (data in use). For example, local storage, network shares, and cloud storage.
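
To make that EXTRACT/VIEW rule concrete, here is a tiny illustrative model in Python. The function and field names are my own stand-ins, not Microsoft's implementation:

```python
# Illustrative model only, not Microsoft's implementation: for encrypted,
# labeled content, Copilot needs the user to hold BOTH the VIEW and
# EXTRACT usage rights before content can be returned in a response.

def copilot_can_use(document: dict, user_rights: set[str]) -> bool:
    """Return True if Copilot may ground a response on this document."""
    label = document.get("sensitivity_label")
    if label is None or not label.get("encrypted", False):
        # Unlabeled or unencrypted content falls back to normal
        # permission trimming via Microsoft Graph.
        return "VIEW" in user_rights
    # Encrypted label: VIEW alone lets a human read the file, but
    # Copilot also needs EXTRACT to lift content into a response.
    return {"VIEW", "EXTRACT"} <= user_rights

doc = {"name": "roadmap.docx",
       "sensitivity_label": {"name": "Confidential", "encrypted": True}}
print(copilot_can_use(doc, {"VIEW"}))             # False: read, not extract
print(copilot_can_use(doc, {"VIEW", "EXTRACT"}))  # True
```

The practical takeaway: a user who can open an encrypted document in Word may still get nothing about it from Copilot if the label withholds the EXTRACT right.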

More on this topic at: Microsoft Purview data security and compliance protections for Microsoft Copilot | Microsoft Learn

Recommended Videos to Learn From

A couple of recent videos that explain how the security works:

This one is a GREAT whiteboard-style explanation of the security risks and how data protection works, along with a mapping to CIS controls. Kudos to T-Minus365!

This video, by Steven Rodriguez, is geared more toward security practitioners and helps start the conversation around LLM security within organizations. Kudos to Steven!

Understanding the new Semantic Index

A semantic index uses vectorized indices to build a conceptual map of data by linking it together in meaningful ways, much like the human brain does. It uses information such as keywords, along with the personalization and social matching capabilities already built into Microsoft 365, to make connections between separate pieces of information.

The Semantic Index for Copilot in Microsoft 365 redefines data retrieval, leveraging Microsoft Graph for user-specific information. With a dual-tiered strategy, it indexes SharePoint data at the tenant level and creates individual user indexes for email and key documents. By correlating signals across these indexes, it retrieves the data most relevant to each request. Combining the user's prompt with the retrieved information, Copilot drives personalized responses through the large language model. This dynamic process tailors AI-generated results to each user's explicit information access, delivering a uniquely efficient user experience.
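
To visualize that flow, here is a simplified, self-contained Python sketch. Every name and data item is a hypothetical stand-in rather than a Microsoft API, but the order of operations (permission-trimmed retrieval first, then grounding, then generation) matches the description above:

```python
# Simplified, self-contained sketch of the retrieve-then-generate flow
# described above. All names and data here are hypothetical stand-ins,
# not Microsoft APIs; the point is the order of operations and that
# permission trimming happens BEFORE anything reaches the LLM.

# Toy "tenant index" of snippets, each with an access-control list.
INDEX = [
    {"text": "Q3 revenue grew 12 percent", "acl": {"alice", "bob"}},
    {"text": "Draft layoff plan",          "acl": {"alice"}},
    {"text": "Friday lunch menu",          "acl": {"alice", "bob", "carol"}},
]

def search(prompt: str, user: str) -> list[str]:
    """Permission-trimmed retrieval: only snippets this user can read."""
    words = set(prompt.lower().split())
    hits = [s for s in INDEX if user in s["acl"]]
    # Rank by naive keyword overlap; a real semantic index ranks by
    # vector similarity instead (see the next sketch below).
    hits.sort(key=lambda s: -len(words & set(s["text"].lower().split())))
    return [s["text"] for s in hits[:2]]

def generate(grounded_prompt: str) -> str:
    """Stand-in for the large language model call."""
    return f"[LLM response grounded on]\n{grounded_prompt}"

def copilot_answer(user: str, prompt: str) -> str:
    grounding = search(prompt, user)                  # 1. retrieve, trimmed
    grounded = prompt + "\n" + "\n".join(grounding)   # 2. combine
    return generate(grounded)                         # 3. generate

# Bob never sees the layoff draft: it is excluded at retrieval time,
# so it can never leak into his generated response.
print(copilot_answer("bob", "revenue plan update"))
```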

The Copilot Semantic Index is not just an incremental update; it represents a paradigm shift in how data is indexed and searched. A semantic index uses vectorized indices to move search beyond the limitations of traditional keyword-based searches, enabling a conceptual understanding of the content. The Copilot Semantic Index allows Microsoft 365 to grasp the essence of the data, facilitating searches that are more aligned with human thought processes and natural language queries.
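
To see why vectorized indices matter, here is a toy contrast between keyword search and vector-similarity search. The three-dimensional "embeddings" are made up for illustration; a real semantic index uses high-dimensional vectors produced by an embedding model:

```python
import math

# Toy document "index": the 3-dimensional embeddings are made up for
# illustration; a real semantic index uses high-dimensional vectors
# produced by an embedding model.
DOCS = {
    "Contoso travel reimbursement policy": [0.9, 0.1, 0.2],
    "How to file an expense report":       [0.8, 0.2, 0.3],
    "Team offsite photos":                 [0.1, 0.9, 0.1],
}

def keyword_search(query: str) -> list[str]:
    """Traditional search: exact word overlap only."""
    q = set(query.lower().split())
    return [d for d in DOCS if q & set(d.lower().split())]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec: list[float]) -> list[str]:
    """Vectorized search: rank documents by similarity to the query."""
    return sorted(DOCS, key=lambda d: -cosine(query_vec, DOCS[d]))

# No shared keywords, so keyword search finds nothing...
print(keyword_search("getting money back for a trip"))   # prints []
# ...but a nearby (made-up) query embedding still finds the right doc.
print(semantic_search([0.85, 0.15, 0.25])[0])            # reimbursement policy
```

This is the gap the Semantic Index closes: a query phrased nothing like the document's text can still land on the right content because both sit near each other in vector space.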

Here is the current list of supported file types for the user-level and tenant-level indexes that Copilot works with:

[Table: supported file types for the user-level and tenant-level indexes, per Microsoft's Semantic Index documentation]
