Microsoft Copilot: Understanding Security Risks

Disclaimer: I work for Dell Technology Services as a Workforce Transformation Solutions Principal. It is my passion to help guide organizations through the current technology transition, specifically as it relates to Workforce Transformation. Visit the Dell Technologies site for more information. Opinions are my own and not the views of my employer.

There are a lot of misconceptions and questions about the security of the Microsoft Copilot products. Understandably, it is usually one of the first questions that comes up when companies are considering Copilot use within their organizations (the elephant in the room!).

Let’s quickly put some misconceptions to bed right away:

  • Your prompts, responses and data ARE NOT used to train LLMs.
  • When using Copilot Pro or Copilot for Microsoft 365, your data and prompts DO NOT leave your tenant.
  • Copilot respects the requesting user's permissions as recorded in your tenant's Microsoft Graph.
    • A notable exception at the moment: plugins and extensions! (more on this later)
  • Microsoft Copilot for Microsoft 365 is GDPR compliant and adheres to existing privacy, security, and compliance commitments.

Per Microsoft's documentation, understanding how Microsoft protects the data within your tenant is important, and this Microsoft graphic tells the architecture story:

[Microsoft graphic: Copilot for Microsoft 365 service architecture]

Making sure your data is secure!

Anyone implementing Copilot for Microsoft 365 needs to guard against unexpected data leakage. A few best-practice steps include:

  1. Ensure your organization’s data security and governance systems and strategy are in place.
  2. Identify what data is important.
  3. Determine who should have access to that data (see the sketch after this list).
  4. Implement a classification and data labeling system.
  5. Implement basic retention policies to ensure data quality.
  6. Develop a cadence to regularly review and clean up your data to maintain accuracy and relevance.
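
To make step 3 concrete, here is a minimal sketch (not production code) that audits broad sharing links on a document library via the Microsoft Graph API. The access token, drive ID, and Files.Read.All permission are assumptions you would supply from your own tenant:

```python
# A minimal sketch (not production code) for step 3: auditing broad
# sharing links on a document library before enabling Copilot. Assumes
# you already have an OAuth access token with Files.Read.All and know
# your drive ID; token acquisition (e.g., via MSAL) and pagination are
# omitted for brevity.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
ACCESS_TOKEN = "<token acquired via MSAL>"  # placeholder
DRIVE_ID = "<your drive id>"                # placeholder
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# List the items at the root of the drive.
items = requests.get(
    f"{GRAPH}/drives/{DRIVE_ID}/root/children", headers=HEADERS
).json().get("value", [])

for item in items:
    # Inspect the sharing permissions on each item.
    perms = requests.get(
        f"{GRAPH}/drives/{DRIVE_ID}/items/{item['id']}/permissions",
        headers=HEADERS,
    ).json().get("value", [])
    for perm in perms:
        scope = perm.get("link", {}).get("scope")
        # Flag links shared with everyone or the whole organization:
        # these are the files Copilot could surface to any covered user.
        if scope in ("anonymous", "organization"):
            print(f"REVIEW: '{item['name']}' has a {scope}-scope sharing link")
```

Widely scoped sharing links are the most common source of "Copilot showed me a file I shouldn't see" surprises, so reviewing them is a good first audit.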

Data Classification and Sensitivity Labels

Copilot uses existing controls to ensure that data stored in your tenant is never returned to the user or used by a large language model (LLM) if the user doesn’t have access to that data. When the data has sensitivity labels from your organization applied to the content, there’s an extra layer of protection:

  • When a file is open in Word, Excel, or PowerPoint, or an email or calendar event is open in Outlook, the sensitivity of the data is displayed to users in the app with the label name and content markings (such as header or footer text) that have been configured for the label.
  • When the sensitivity label applies encryption, users must have the EXTRACT usage right, as well as VIEW, for Copilot to return the data (see the sketch after this list).
  • This protection extends to data stored outside your Microsoft 365 tenant when it’s open in an Office app (data in use). For example, local storage, network shares, and cloud storage.
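
To make that EXTRACT/VIEW rule concrete, here is a tiny illustrative model in Python. The function and field names are my own stand-ins, not Microsoft's implementation:

```python
# Illustrative model only, not Microsoft's implementation: for encrypted,
# labeled content, Copilot needs the user to hold BOTH the VIEW and
# EXTRACT usage rights before content can be returned in a response.

def copilot_can_use(document: dict, user_rights: set[str]) -> bool:
    """Return True if Copilot may ground a response on this document."""
    label = document.get("sensitivity_label")
    if label is None or not label.get("encrypted", False):
        # Unlabeled or unencrypted content falls back to normal
        # permission trimming via Microsoft Graph.
        return "VIEW" in user_rights
    # Encrypted label: VIEW alone lets a human read the file, but
    # Copilot also needs EXTRACT to lift content into a response.
    return {"VIEW", "EXTRACT"} <= user_rights

doc = {"name": "roadmap.docx",
       "sensitivity_label": {"name": "Confidential", "encrypted": True}}
print(copilot_can_use(doc, {"VIEW"}))             # False: read, not extract
print(copilot_can_use(doc, {"VIEW", "EXTRACT"}))  # True
```

The practical takeaway: a user who can open an encrypted document in Word may still get nothing about it from Copilot if the label withholds the EXTRACT right.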

More on this topic at: Microsoft Purview data security and compliance protections for Microsoft Copilot | Microsoft Learn

Recommended Videos to Learn From

A couple of recent videos that explain how the security works:

This one is a GREAT whiteboard-style explanation of the security risks and how data protection works, along with a mapping to CIS controls. Kudos to T-Minus365!

This video, by Steven Rodriguez, is geared more toward security practitioners and helps start the conversation around LLM security within organizations. Kudos to Steven!

Understanding the new Semantic Index

A semantic index uses vectorized indices to build a conceptual map of data by linking it together in meaningful ways, much like the human brain does. It uses information such as keywords, along with the personalization and social matching capabilities already built into Microsoft 365, to make connections between separate pieces of information.

The Semantic Index for Copilot in Microsoft 365 redefines data retrieval, leveraging Microsoft Graph for user-specific information. With a dual-tiered strategy, it indexes SharePoint data at the tenant level and creates individual user indexes for email and key documents. By correlating signals across these indexes, it retrieves the data most relevant to each request. Combining the user's prompt with the retrieved information, Copilot drives personalized responses through the large language model. This dynamic process tailors AI-generated results to each user's explicit information access, delivering a uniquely efficient user experience.
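
To visualize that flow, here is a simplified, self-contained Python sketch. Every name and data item is a hypothetical stand-in rather than a Microsoft API, but the order of operations (permission-trimmed retrieval first, then grounding, then generation) matches the description above:

```python
# Simplified, self-contained sketch of the retrieve-then-generate flow
# described above. All names and data here are hypothetical stand-ins,
# not Microsoft APIs; the point is the order of operations and that
# permission trimming happens BEFORE anything reaches the LLM.

# Toy "tenant index" of snippets, each with an access-control list.
INDEX = [
    {"text": "Q3 revenue grew 12 percent", "acl": {"alice", "bob"}},
    {"text": "Draft layoff plan",          "acl": {"alice"}},
    {"text": "Friday lunch menu",          "acl": {"alice", "bob", "carol"}},
]

def search(prompt: str, user: str) -> list[str]:
    """Permission-trimmed retrieval: only snippets this user can read."""
    words = set(prompt.lower().split())
    hits = [s for s in INDEX if user in s["acl"]]
    # Rank by naive keyword overlap; a real semantic index ranks by
    # vector similarity instead (see the next sketch below).
    hits.sort(key=lambda s: -len(words & set(s["text"].lower().split())))
    return [s["text"] for s in hits[:2]]

def generate(grounded_prompt: str) -> str:
    """Stand-in for the large language model call."""
    return f"[LLM response grounded on]\n{grounded_prompt}"

def copilot_answer(user: str, prompt: str) -> str:
    grounding = search(prompt, user)                  # 1. retrieve, trimmed
    grounded = prompt + "\n" + "\n".join(grounding)   # 2. combine
    return generate(grounded)                         # 3. generate

# Bob never sees the layoff draft: it is excluded at retrieval time,
# so it can never leak into his generated response.
print(copilot_answer("bob", "revenue plan update"))
```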

The Copilot Semantic Index is not just an incremental update; it represents a paradigm shift in how data is indexed and searched. A semantic index uses vectorized indices to move search beyond the limitations of traditional keyword-based searches, enabling a conceptual understanding of the content. The Copilot Semantic Index allows Microsoft 365 to grasp the essence of the data, facilitating searches that are more aligned with human thought processes and natural language queries.
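
To see why vectorized indices matter, here is a toy contrast between keyword search and vector-similarity search. The three-dimensional "embeddings" are made up for illustration; a real semantic index uses high-dimensional vectors produced by an embedding model:

```python
import math

# Toy document "index": the 3-dimensional embeddings are made up for
# illustration; a real semantic index uses high-dimensional vectors
# produced by an embedding model.
DOCS = {
    "Contoso travel reimbursement policy": [0.9, 0.1, 0.2],
    "How to file an expense report":       [0.8, 0.2, 0.3],
    "Team offsite photos":                 [0.1, 0.9, 0.1],
}

def keyword_search(query: str) -> list[str]:
    """Traditional search: exact word overlap only."""
    q = set(query.lower().split())
    return [d for d in DOCS if q & set(d.lower().split())]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_search(query_vec: list[float]) -> list[str]:
    """Vectorized search: rank documents by similarity to the query."""
    return sorted(DOCS, key=lambda d: -cosine(query_vec, DOCS[d]))

# No shared keywords, so keyword search finds nothing...
print(keyword_search("getting money back for a trip"))   # prints []
# ...but a nearby (made-up) query embedding still finds the right doc.
print(semantic_search([0.85, 0.15, 0.25])[0])            # reimbursement policy
```

This is the gap the Semantic Index closes: a query phrased nothing like the document's text can still land on the right content because both sit near each other in vector space.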

Here is the current list of supported file types for the user-level and tenant-level indexes that Copilot works with:

[Table: supported file types for the user-level and tenant-level indexes, per Microsoft's Semantic Index documentation]
