Your developers are already running AI locally: Why on-device inference is the CISO’s new blind spot
For years, CISOs have treated AI risk as a browser problem: block or monitor the chatbots so sensitive data never leaves the network. That control point is quietly eroding as employees begin running large language models (LLMs) directly on their endpoints, offline and with no network activity to inspect. This shift, variously called Shadow AI 2.0 or the “bring your own model” (BYOM) era, sits largely outside security teams’ visibility.
Local inference has become practical because of three converging shifts: consumer-grade accelerators, mainstream adoption of quantization, and frictionless distribution of model artifacts. An engineer can now pull a quantized open-weight model and run it on a high-end laptop with no external API calls and no network traffic, which means traditional data loss prevention (DLP) tooling, built to watch data in transit, never sees the activity.
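To illustrate how little friction is involved, here is a minimal sketch using the open-source llama-cpp-python bindings to load a quantized GGUF checkpoint and generate a completion entirely on the endpoint. The model filename and parameters are illustrative, and the same pattern applies to Ollama, LM Studio, or any other local runtime.

```python
# Minimal sketch: local inference with llama-cpp-python (pip install llama-cpp-python).
# The model path below is illustrative -- any quantized GGUF checkpoint works the
# same way. Nothing here touches the network at inference time.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # illustrative local file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to a local GPU if one is present
)

# Proprietary source code, incident notes, or customer data pasted into the prompt
# never leaves the laptop; no proxy, CASB, or DLP gateway sees the request.
result = llm(
    "Summarize the following internal design document:\n...",
    max_tokens=256,
)
print(result["choices"][0]["text"])
```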
The risks go beyond data exfiltration to the cloud. Local inference creates three blind spots: code and decision contamination (an integrity risk), licensing and intellectual property exposure (a compliance risk), and model supply chain exposure (a provenance risk). Most enterprises have not yet operationalized controls for any of the three.
To manage BYOM rather than ban it, organizations should treat model weights like any other software artifact: move governance down to the endpoint, stand up a curated internal model hub so developers have a sanctioned path, and update policy language to explicitly cover local model usage. Endpoint-aware controls paired with a developer-friendly experience keep the risk manageable without pushing usage further underground.
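As one concrete reading of “treat model weights like software artifacts,” the sketch below hashes a local model file and checks it against a hypothetical internal allowlist before the runtime is allowed to load it, much as a build pipeline pins and verifies package hashes. The allowlist path and its JSON format are assumptions for illustration, not an established standard.

```python
# Sketch: verify a model artifact against an internal allowlist before use.
# The allowlist location and format are hypothetical examples.
import hashlib
import json
from pathlib import Path

ALLOWLIST = Path("/etc/modelhub/approved-models.json")  # hypothetical policy file

def sha256_of(path: Path) -> str:
    """Stream the file so multi-gigabyte weights never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_approved(model_path: Path) -> bool:
    """Return True only if the artifact's hash appears in the curated allowlist."""
    approved = json.loads(ALLOWLIST.read_text())  # e.g. {"model.gguf": "<sha256>"}
    return approved.get(model_path.name) == sha256_of(model_path)

if __name__ == "__main__":
    model = Path("./llama-3-8b-instruct.Q4_K_M.gguf")
    if not is_approved(model):
        raise SystemExit(f"{model.name} is not in the internal model hub allowlist")
    print(f"{model.name} verified against the allowlist; safe to load")
```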
The shift toward local inference is, in effect, a return to device-centric security. CISOs should watch for the signals that Shadow AI is already on their endpoints: multi-gigabyte model artifacts on disk, local inference servers listening on the loopback interface, unexplained GPU utilization patterns, the absence of any model inventory, and ambiguity about model licenses. By controlling artifacts, provenance, and policy at the endpoint, organizations can adapt to this shift in AI governance without compromising developer productivity.
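A minimal sketch of what that endpoint telemetry might look like: sweeping a user’s home directory for large weight files and probing a few default ports used by popular local runtimes (Ollama commonly listens on 11434, a llama.cpp server on 8080, LM Studio on 1234). The extensions, size threshold, and port list are illustrative defaults, not an exhaustive detection rule.

```python
# Sketch of an endpoint sweep for Shadow AI signals: large model artifacts on disk
# and listening ports commonly used by local inference servers. The thresholds and
# port list are illustrative, not authoritative.
import socket
from pathlib import Path

MODEL_EXTENSIONS = {".gguf", ".safetensors", ".bin", ".pt"}
SIZE_THRESHOLD_GB = 2  # illustrative: most quantized LLM checkpoints exceed this
COMMON_LOCAL_PORTS = {11434: "Ollama", 8080: "llama.cpp server", 1234: "LM Studio"}

def find_model_artifacts(root: Path):
    """Yield files that look like model weights above the size threshold."""
    for path in root.rglob("*"):
        if path.suffix in MODEL_EXTENSIONS and path.is_file():
            size_gb = path.stat().st_size / 1e9
            if size_gb >= SIZE_THRESHOLD_GB:
                yield path, round(size_gb, 1)

def find_inference_servers():
    """Report common local-inference ports that are accepting connections."""
    for port, name in COMMON_LOCAL_PORTS.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(0.2)
            if s.connect_ex(("127.0.0.1", port)) == 0:
                yield port, name

if __name__ == "__main__":
    for path, size in find_model_artifacts(Path.home()):
        print(f"model artifact: {path} ({size} GB)")
    for port, name in find_inference_servers():
        print(f"local inference server: {name} listening on port {port}")
```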
The next phase of AI security shifts the center of gravity from network controls to endpoint governance. Organizations that recognize what local inference changes, and put artifact, provenance, and policy controls in place, can manage BYOM rather than be surprised by it. Jayachander Reddy Kandakatla, a senior MLOps engineer, argues that adapting to this new era is essential to preserving data integrity and compliance.



