Microsoft Introduces a New Red Teaming Toolkit for Generative AI Security and Safety Testing
Lack of testing automation is an overlooked barrier to adoption
Microsoft announced a new automation framework for adversarial testing of generative AI systems. PyRIT (Python Risk Identification Toolkit) is designed to help security professionals and machine learning engineers identify risks in generative AI systems. Many people are familiar with guardrails used to monitor AI system output for alignment with safety and security policies and filter out inappropriate material. However, before the guardrails are set up, red team testing is used to proactively identify risks and mitigate negative impacts.
PyRIT was created to address the novel challenges of adversarial testing (i.e., red teaming) of generative AI systems. Generative AI differs from traditional software and classical AI systems in three critical ways:
It requires probing both security and responsible AI risks.
Generative AI red teaming is more probabilistic, as the same input can yield different outputs due to the AI's inherent non-determinism.
The architecture of generative AI systems varies widely, making manual red team probing complex and time-consuming.
Building on ML Model Testing Automation
The new solution builds on Microsoft's previous work on Counterfit, a red-teaming framework for classical machine learning systems.
In 2021, Microsoft developed and released a red team automation framework for classical machine learning systems. Although Counterfit still delivers value for traditional machine learning systems, we found that for generative AI applications, Counterfit did not meet our needs, as the underlying principles and the threat surface had changed. Because of this, we re-imagined how to help security professionals to red team AI systems in the generative AI paradigm and our new toolkit was born.
PyRIT started as a set of scripts in 2022 and evolved into a tool used by Microsoft's AI Red Team. It features components such as target formulations, datasets, a scoring engine, attack strategies, and memory capabilities. It supports single-turn and multi-turn attack strategies and saves all interactions for in-depth analysis.
The solution uses an analysis agent and a scoring engine to rate the generative AI system's responses. If a response scores below the threshold, a new prompt is generated and sent to the system, steering the conversation further toward the attack objective.
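The following is a minimal sketch of how such a multi-turn loop might be structured. The class and method names (Turn, ConversationMemory, target.send, scorer.rate, attacker.next_prompt) are illustrative placeholders for this article, not PyRIT's actual API.

```python
# Illustrative sketch of a multi-turn red-teaming loop with a scoring threshold.
# The classes and methods below are hypothetical placeholders, not PyRIT's API.
from dataclasses import dataclass, field


@dataclass
class Turn:
    prompt: str
    response: str
    score: float


@dataclass
class ConversationMemory:
    """Stores every interaction so results can be analyzed in depth later."""
    turns: list[Turn] = field(default_factory=list)

    def add(self, turn: Turn) -> None:
        self.turns.append(turn)


def red_team_conversation(target, attacker, scorer, objective: str,
                          threshold: float = 0.8, max_turns: int = 5) -> ConversationMemory:
    """Probe `target` until a response scores at or above `threshold`
    for the harm category described by `objective`, or `max_turns` is reached."""
    memory = ConversationMemory()
    prompt = attacker.initial_prompt(objective)

    for _ in range(max_turns):
        response = target.send(prompt)            # query the generative AI system under test
        score = scorer.rate(response, objective)  # scoring engine rates the response
        memory.add(Turn(prompt, response, score))

        if score >= threshold:                    # objective reached; stop probing
            break
        # Below the threshold: craft a follow-up prompt that steers the
        # conversation further toward the attack objective.
        prompt = attacker.next_prompt(memory.turns, objective)

    return memory
```

In this sketch the memory object plays the role of PyRIT's saved interactions: every prompt, response, and score is retained so the red teamer can review the full conversation afterward rather than only the final outcome.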
People Are Still Needed
Most red teaming today is executed by human testers. Some use homegrown scripts to automate parts of the process, but human insight and ingenuity remain the most common tools. Microsoft added:
The biggest advantage we have found so far using PyRIT is our efficiency gain. For instance, in one of our red teaming exercises on a Copilot system, we were able to pick a harm category, generate several thousand malicious prompts, and use PyRIT’s scoring engine to evaluate the output from the Copilot system all in the matter of hours instead of weeks.
PyRIT is not a replacement for manual red teaming of generative AI systems. Instead, it augments an AI red teamer’s existing domain expertise and automates the tedious tasks for them. PyRIT shines light on the hot spots of where the risk could be, which the security professional then can incisively explore. The security professional is always in control of the strategy and execution of the AI red team operation, and PyRIT provides the automation code to take the initial dataset of harmful prompts provided by the security professional, then uses the LLM endpoint to generate more harmful prompts.
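A rough sketch of that prompt-expansion step is shown below. The `complete` parameter stands in for whatever LLM completion call is available and is an assumption for illustration, not a PyRIT function.

```python
# Hypothetical sketch: expand a small seed set of red-team prompts into many
# variations by asking an attacker LLM to rewrite them. `complete` is a
# stand-in for an LLM completion call, not part of PyRIT.
from typing import Callable


def expand_prompts(seed_prompts: list[str],
                   complete: Callable[[str], str],
                   variations_per_seed: int = 10) -> list[str]:
    """Generate additional adversarial prompts from a security professional's seed set."""
    expanded: list[str] = []
    for seed in seed_prompts:
        for i in range(variations_per_seed):
            instruction = (
                "Rewrite the following red-team test prompt so it probes the same "
                f"risk in a different way (variation {i + 1}):\n{seed}"
            )
            expanded.append(complete(instruction))
    return expanded
```

The generated prompts would then be sent to the system under test and rated by the scoring engine, which is how a few seed prompts can turn into the several thousand evaluated prompts Microsoft describes.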
Barriers to Adoption
Since red team testing is typically entirely manual, costly, and time-consuming, it has become a key barrier to generative AI adoption. Automation tools like PyRIT are needed to professionalize and accelerate testing processes, ensuring that systems are thoroughly vetted before promotion to production and regularly tested afterward.