Simulating AI Security Threats: Insights from a Tabletop Workshop
This August, at the AI Security Forum during Hacker Summer Camp in Vegas, I had the opportunity to run an immersive tabletop workshop. As someone who enjoys crafting these exercises, I find particular satisfaction in watching participants engage with them in real time. The experience reminded me that security is a team sport, one that takes collaboration across an entire organization, or even across nations, to get right.
The Power of Tabletop Exercises
Much like sports, excelling in security demands collaboration, practice, preparation, training, and coaching. Tabletop exercises, when designed effectively, incorporate these elements in an engaging, playful manner. They provide a unique platform for professionals to test their skills, challenge their thinking, and learn from one another in a low-stakes environment.
Setting the Stage: Haven Labs and AI Model Weights Exfiltration
For this workshop, I developed a scenario centered on AI model weights exfiltration at a fictional research lab, Haven Labs, on the brink of achieving Artificial General Intelligence (AGI). The narrative covered the discovery of the exfiltration and Haven Labs' subsequent response.
Four teams, each comprising five to six individuals with diverse backgrounds (ML engineers, security practitioners, infrastructure engineers, attorneys, and professors), were tasked with acting as external incident responders. Their mission: identify attack vectors, conduct an impact assessment, and provide recommendations.
Crafting Realism for Maximum Engagement
To ensure the scenario felt authentic and engaging, I incorporated several key elements:
- Real-world attack vectors: I drew from the RAND report on "Securing AI Model Weights" to inform the methods used in the fictional attack.
- Detailed security posture: Haven Labs was described as maintaining an SL1 security posture (per the RAND report), capable of defending against moderate attacks from hobbyist hackers. This gave participants a concrete baseline to work within.
- Current event parallels: While fictitious, the evidence and storyline closely mirrored current events, such as an employee requesting PTO to attend the Olympics.
- Diverse, realistic evidence: Teams were given a range of evidence types to analyze, from system logs to email threads and bash scripts; a sketch of the kind of log sifting this invited follows this list.
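To give a concrete flavor of that evidence analysis, here is a minimal Python sketch of how a responder might sift fabricated system logs for unusually large outbound transfers. The log format, hostnames, domain suffix, and size threshold are all hypothetical, invented here for illustration rather than taken from the actual workshop materials.

```python
from dataclasses import dataclass

# Hypothetical log lines, loosely modeled on the kind of fabricated
# system-log evidence handed to teams (not the actual workshop materials).
SAMPLE_LOG = """\
2024-07-18T02:13:44Z,svc-train,10.0.4.12,registry.havenlabs.internal,812
2024-07-18T02:41:09Z,jdoe,10.0.4.87,cdn.example-files.net,68719476736
2024-07-18T03:02:51Z,svc-backup,10.0.4.15,backup.havenlabs.internal,524288000
"""

INTERNAL_SUFFIX = ".havenlabs.internal"   # assumed internal domain
BYTES_THRESHOLD = 10 * 1024**3            # flag transfers over ~10 GiB

@dataclass
class Transfer:
    timestamp: str
    user: str
    src_ip: str
    dest_host: str
    bytes_out: int

def parse_log(text: str) -> list[Transfer]:
    """Parse comma-separated log lines into Transfer records."""
    transfers = []
    for line in text.strip().splitlines():
        ts, user, src, dest, size = line.split(",")
        transfers.append(Transfer(ts, user, src, dest, int(size)))
    return transfers

def suspicious(t: Transfer) -> bool:
    # A large outbound transfer to a non-internal host warrants a closer look.
    return (not t.dest_host.endswith(INTERNAL_SUFFIX)
            and t.bytes_out > BYTES_THRESHOLD)

if __name__ == "__main__":
    for t in parse_log(SAMPLE_LOG):
        if suspicious(t):
            print(f"[!] {t.timestamp} {t.user} -> {t.dest_host} "
                  f"({t.bytes_out / 1024**3:.1f} GiB outbound)")
```

In the live exercise, of course, the teams did this kind of sifting by hand, on paper and in Google Docs, rather than with scripts.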
This attention to detail paid off. As I observed the teams during their 40-minute investigation period, I saw lively discussion, creative problem-solving, and full immersion in their roles. Questions flew as teams dug deep into the provided evidence, demonstrating how a well-crafted scenario helps participants understand both the technical and practical aspects of executing, and preventing, a model weight exfiltration incident.
From Live Simulation to Accessible Technology
In previous exercises, I've created live simulated environments for tabletops, allowing teams to test alerts, kill switches, and runbooks in real time. While incredibly effective, this approach can be resource-intensive. For this workshop, I opted for a more accessible approach using familiar tools: paper and Google Docs. This choice ensured that all participants could engage fully, with minimal to no technical barriers.
Engagement Through Competition
To add an extra layer of motivation, I incorporated a competitive element. Teams submitted their findings and recommendations, which I scored after the session. The announcement of winners and a prize (Nicole Perlroth's "This Is How They Tell Me the World Ends") added a fun, high-stakes feel to the exercise.
A particularly memorable moment came when two teams tied for first place. With only one set of prizes, we resorted to a coin toss, but not before one team member suggested a trivia tiebreaker! This moment encapsulated the playful competitiveness that can make these exercises effective.
Key Takeaways and Future Implications
As the workshop concluded, participants lingered to discuss their experiences and insights. Some approached me about developing similar exercises for their own organizations, highlighting the perceived value of such hands-on, collaborative learning experiences.
The level of engagement in this workshop underscored several important points:
- The critical importance of collaborative security practices in AI development.
- The effectiveness of realistic, engaging scenarios in security training.
- The power of diverse teams in solving complex security challenges.
Reflecting on this experience, it’s clear that exercises like these not only emphasize the need for effective collaboration — similar to team sports — but also prepare organizations to tackle unique security challenges associated with advancing AI technologies and other high-risk environments like crypto. By simulating potential threats and fostering collaborative problem-solving, we can better equip our teams to protect the groundbreaking advancements of tomorrow.