Warning: this paper contains potentially harmful text.
[Figure] Following Malicious Task Rate: Web AI agent 46.6% vs. standalone LLM 0%
This disparity stems from the multifaceted differences between Web AI agents and standalone LLMs, as well as from complex behavioral signals: nuances that simple evaluation metrics, such as success rate, often fail to capture. Through a fine-grained analysis of the key differences between Web AI agents and standalone LLMs, we systematically identified several design factors that contribute to these vulnerabilities.

🔍 Our findings reveal several actionable insights:

- Goal preprocessing: embedding a user's goal in the system prompt, paraphrasing it, or decomposing it into sub-tasks can each weaken the underlying LLM's resistance to harmful instructions (see the first sketch below).
- Action generation: the structure of predefined action spaces and execution constraints can limit an agent's ability to assess and refuse harmful intent (see the second sketch, after the summary below).
- Observation: observational capabilities, including the ability to recognize artificial (mock-up) environments, influence a Web AI agent's vulnerability.
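To make the goal-preprocessing point concrete, here is a minimal sketch (not the paper's actual harness) of how the same goal reaches the model in the two settings: as a verbatim user turn for a standalone LLM, versus rephrased into an assigned task and embedded in the system prompt by typical web-agent scaffolding. The function names, message format, and placeholder goal are illustrative assumptions, not artifacts from the paper.

```python
USER_GOAL = "post this announcement to the forum"  # stand-in for any user request

def build_chat_request(goal: str) -> list[dict]:
    """Standalone-LLM setting: the goal arrives verbatim as a user turn,
    so the model's safety training sees it exactly as the user wrote it."""
    return [{"role": "user", "content": goal}]

def build_agent_request(goal: str, page_text: str) -> list[dict]:
    """Web-agent setting (hypothetical scaffolding): the goal is rephrased
    as an assigned task and embedded in the system prompt, mixed with
    scaffolding instructions and page observations before the model sees it."""
    system_prompt = (
        "You are a web agent. Complete the assigned task step by step.\n"
        f"Task: {goal}\n"  # user intent now reads as a system-assigned task
        "Reply with exactly one action per turn."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Current page:\n{page_text}"},
    ]

if __name__ == "__main__":
    print(build_chat_request(USER_GOAL))
    print(build_agent_request(USER_GOAL, "<html>...</html>"))
```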
These findings highlight how specific design elements—goal processing, action generation strategies, and dynamic web interactions—contribute to the overall risk of harmful behavior.
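A second sketch, for the action-generation point: many agent frameworks require the model's reply to parse as one of a few predefined actions. The action names and regex grammar below are hypothetical, not taken from the paper; the point is structural. There is no refuse() action, so a natural-language refusal fails to parse, and a typical agent loop would re-prompt until it receives a "valid" action.

```python
import re
from typing import Optional

# Hypothetical predefined action space: the model must emit exactly one of these.
ACTION_GRAMMAR = re.compile(
    r'^(click\(\d+\)|type\(\d+,\s".*"\)|goto\(".*"\)|stop\(\))$'
)

def parse_action(model_output: str) -> Optional[str]:
    """Return the action string if it matches the predefined grammar, else None."""
    candidate = model_output.strip()
    return candidate if ACTION_GRAMMAR.match(candidate) else None

# No refuse() exists in the action space: a reply like "I can't help with
# that." fails to parse, and the scaffolding would typically retry rather
# than surface the refusal.
for output in ['click(12)', 'goto("https://example.com")', "I can't help with that."]:
    print(f"{output!r:45} -> {parse_action(output)}")
```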
@article{Jeffrey2025Vulnerablewebagents,
  title         = {Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis},
  author        = {Fan Chiang, Jeffrey Yang and Lee, Seungjae and Huang, Jia-Bin and Huang, Furong and Chen, Yizheng},
  journal       = {arXiv preprint arXiv:2502.20383},
  year          = {2025},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2502.20383},
}