Confidential Containers for Large Language Models
Protect your IP
Large Language Models (LLMs) have garnered significant attention, partly due to ChatGPT but most notably due to their remarkable capabilities in natural language processing (NLP) applications such as text generation and question answering. There is excellent material available on the internet for understanding LLMs.
In this blog, I’ll focus on the deployment aspect of LLMs in a Kubernetes cluster running on public cloud or third-party infrastructure — specifically, the challenges of dealing with proprietary data and protecting both sensitive information and the model.
For details on LLM deployment options, I would recommend this excellent blog — https://betterprogramming.pub/frameworks-for-serving-llms-60b7f7b23407.
When running LLMs, here are some of the challenges with protecting sensitive and proprietary data:
- Protecting the proprietary data sent for inference: When using LLMs in a public cloud, it is essential to consider the sensitive nature of the data sent for inference, which may include proprietary information. Therefore, prioritising this data’s confidentiality and security is crucial. One effective way to do so is through encryption, which can safeguard the data during transmission and at rest. But how do you plan to protect this data when it’s in use?
- Protecting the prompts: The prompts provided to the LLM could potentially contain sensitive or proprietary information, leading to a risk of exposure. One way to address this challenge is to avoid including sensitive data directly in the prompts. You can also implement data sanitisation and validation techniques to ensure prompts don’t inadvertently leak sensitive information (a minimal sketch follows this list). However, if proprietary information must be sent and used, how will you protect this data when it’s in use?
- Protecting the proprietary data used for fine-tuning the LLM: Encryption protects the data during transmission and at rest. How will you protect the data that is in use?
- Protecting the fine-tuned LLM model: Implementing strong access controls to limit who can access and use the fine-tuned model is a must. But how will you protect the fine-tuned model when it’s in use?
- Data residency and compliance: When choosing a cloud provider, it’s crucial to pick one that allows you to select where your data will be stored or offers data residency choices. Additionally, it’s essential to check the provider’s compliance certifications to ensure they meet your needs.
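As an illustration of the prompt sanitisation idea mentioned above, here is a minimal sketch of redacting obviously sensitive patterns from a prompt before it leaves your environment. The redact_prompt helper and its regex patterns are hypothetical examples for illustration only; a real deployment would need much more robust detection (named-entity recognition, allow-lists, policy checks, and so on).

```python
import re

# Hypothetical, illustrative patterns only. Real sanitisation needs a far
# more robust approach than a couple of regexes.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_prompt(prompt: str) -> str:
    """Replace obviously sensitive substrings with placeholder tokens."""
    for label, pattern in REDACTION_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

if __name__ == "__main__":
    raw = "Summarise the complaint from jane.doe@example.com about card 4111 1111 1111 1111."
    print(redact_prompt(raw))  # the email and card number are replaced with [EMAIL] and [CARD]
```

Sanitisation like this only reduces what you expose; it does not protect whatever must still be sent, which is exactly the in-use gap the rest of this blog is about.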
In this blog, I’ll focus on the first four challenges, which eventually boil down to protecting your data when it’s in use. And why is that important?
When your data, e.g., the fine-tuned LLM model, the prompt, or the input data, is processed in memory, it’s usually in plaintext and vulnerable to insider threats. Confidential computing is the technology that protects code and data while in use. It protects your workload from unauthorised entities — the host or hypervisor, system administrators, service providers, other VMs, and processes on the host.
At the heart of confidential computing is a Trusted Execution Environment (TEE). TEEs are secure, isolated environments provided by confidential computing (CC) enabled hardware; they prevent unauthorised access to or modification of applications and data while in use.
The following picture shows which entities have access to code and data with and without confidential computing. With confidential computing, only the CPU and the workload can access the code and data, protecting in-use data from untrusted entities.
If you use a Kubernetes cluster to serve the LLMs, how can you address the top four data protection challenges described earlier using confidential computing?
Enter — confidential containers. These are containers that run inside a TEE. This way, the confidential container can run without being tampered with or observed by anyone else, including the cloud provider, infrastructure admins or other tenants. Confidential containers can guarantee data confidentiality, integrity, and availability. The confidentiality guarantee and the usual security best practices (model access control, secure communication channels, perimeter security, etc.) provide a defence-in-depth approach to serve LLMs securely.
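On a Kubernetes cluster, running a container inside a TEE typically comes down to scheduling the pod with a TEE-backed runtime class. Here is a minimal sketch using the Kubernetes Python client; the runtime class name (kata-qemu-tdx) and the inference image are placeholders, and the actual values depend on how your cluster and CC hardware are set up.

```python
from kubernetes import client, config

def create_confidential_inference_pod() -> None:
    # Assumes kubeconfig access to a cluster where a confidential computing
    # runtime class is already installed (the name below is a placeholder).
    config.load_kube_config()

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="llm-inference", labels={"app": "llm"}),
        spec=client.V1PodSpec(
            # Selecting the runtime class is what places the container inside a
            # TEE-backed sandbox instead of a regular container runtime.
            runtime_class_name="kata-qemu-tdx",
            containers=[
                client.V1Container(
                    name="inference",
                    image="registry.example.com/llm-inference:latest",  # placeholder image
                    ports=[client.V1ContainerPort(container_port=8080)],
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

if __name__ == "__main__":
    create_confidential_inference_pod()
```

Apart from selecting the runtime class, the workload definition stays an ordinary Kubernetes pod, which is what makes this approach attractive for existing LLM serving stacks.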
The CNCF Confidential Containers project spearheads the standardisation of confidential containers for Kubernetes. It’s an open-source community working to enable cloud-native confidential computing by leveraging TEEs to protect containers and data. Confidential containers can provide confidentiality for LLM inference, fine-tuning, and training in the following ways:
- LLM inference: Individuals can use confidential containers to perform LLM inference on their data without risk of exposure. This allows them to create text based on their prompts, knowing it is only visible to them and the LLM inside the container. Users can also verify the integrity of the LLM being used.
- LLM fine-tuning: Users can use confidential containers to fine-tune pre-trained LLMs on their specific datasets for domain-specific tasks. This ensures that the data and model are only accessible to the user and the LLM within the confidential container. Users can also confirm that no one else has modified the intended pre-trained LLM.
- LLM training: Confidential containers allow users to train LLMs on their proprietary data for a specific domain or language without sharing it with others. The trained model and data remain private and accessible only to the user and the LLM inside the container. Additionally, users can confirm the code and parameters have not been altered by anyone else.
The above scenarios should give you enough seed ideas for using confidential containers with LLMs.
The next big question is, what is possible today with CNCF confidential containers for LLMs?
These are the possibilities today:
- You can run CPU-based inference in a confidential container. Using GPUs with confidential containers is a work in progress.
- You can download your encrypted fine-tuned model, then decrypt and run it inside the confidential container.
- You can download an encrypted container image containing your code and model, then decrypt and run it inside the confidential container.
The secrets get delivered inside the confidential container only after successfully verifying the integrity of the running environment (using remote attestation). And since the decrypted model and code are running inside the TEE, no untrusted entity can gain unauthorised access to it. I’ll share a hands-on guide to try the scenarios in subsequent blogs.
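To make the encrypted-model flow a bit more concrete, here is a minimal sketch of the provider-side step: encrypting a fine-tuned model with a symmetric key before publishing it. The function name and file paths are placeholders, and the Confidential Containers project has its own tooling for encrypted images and attested key release, so treat this purely as an illustration of the idea. The resulting key would be registered with the key management component of your deployment and released to the workload inside the TEE only after remote attestation succeeds.

```python
from cryptography.fernet import Fernet

def encrypt_model(model_path: str, encrypted_path: str) -> bytes:
    """Encrypt a model file with a freshly generated symmetric key.

    The returned key must never ship alongside the workload; in a
    confidential containers setup it would be handed to the key management
    service and released only after the TEE is attested.
    """
    key = Fernet.generate_key()
    with open(model_path, "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())
    with open(encrypted_path, "wb") as f:
        f.write(ciphertext)
    return key

if __name__ == "__main__":
    # Placeholder paths for illustration only; large models would be
    # encrypted in chunks rather than read into memory at once.
    key = encrypt_model("fine_tuned_model.bin", "fine_tuned_model.bin.enc")
    print("Store this key in your key management service, not with the model:", key.decode())
```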
I hope this blog has given you some starting pointers on the key technology for securing your LLMs on public cloud or third-party infrastructure. If you have questions and want to discuss leveraging confidential computing and containers for your deployment scenarios, please feel free to connect with me on Slack (user: bpradipt), or you can schedule a 30-min slot on my Calendly page.