EmbedGuard: Cross-Layer Detection and Provenance Attestation for Adversarial Embedding Attacks in RAG Systems
DOI: https://doi.org/10.22399/ijcesen.4869

Keywords: Retrieval-Augmented Generation Security, Embedding Space Poisoning, Cross-Layer Attack Detection, Trusted Execution Environments, Cryptographic Provenance Attestation

Abstract
Embedding-based Retrieval-Augmented Generation (RAG) systems are critical infrastructure for production AI applications, yet they remain vulnerable to embedding space poisoning attacks that achieve disproportionate success with minimal payloads (under 1% corpus contamination yielding over 80% attack success rates). Current single-layer defenses optimize for high-amplitude signals in narrow-dimensional subspaces, making them systematically vulnerable to coordinated cross-layer attacks that distribute adversarial signals across architectural layers. EmbedGuard is an adaptive, cross-layer detection framework that integrates hardware-backed cryptographic attestation with statistical anomaly detection across four RAG architectural layers: prompt-layer injection detection, embedding-layer hardware attestation via Trusted Execution Environments (TEEs), retrieval-layer distributional analysis, and output-layer consistency verification. The framework employs efficient techniques, including incremental Principal Component Analysis and Kullback-Leibler divergence metrics, to detect subtle, coordinated attacks while maintaining production-grade latencies. Evaluation on a production-scale system (500,000 embeddings, 47,000 queries) demonstrates a 94.7% detection rate for optimization-based attacks and 89.3% for adaptive attacks, with a 3.2% false positive rate and a 51 ms mean latency overhead. Ablation studies quantify an 18.4 percentage-point improvement from cross-layer correlation over the best single-layer approach. The framework operates in three deployment modes—passive logging, gated human review, and active automatic remediation—enabling deployment across diverse organizational contexts and security requirements while protecting against adversarial embedding manipulation.
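To make the retrieval-layer mechanism concrete, the following is a minimal sketch of how incremental PCA and a Kullback-Leibler divergence score could be combined to flag distributional drift in an embedding store. It is an illustrative assumption, not the paper's implementation: the class name, threshold, bin count, and the choice to histogram the first principal component are all hypothetical.

```python
# Hypothetical retrieval-layer drift detector: incremental PCA tracks a
# low-dimensional subspace of trusted embeddings, and a KL-divergence
# score compares incoming batches against that baseline. Thresholds and
# bin counts below are illustrative, not EmbedGuard's actual values.
import numpy as np
from scipy.stats import entropy  # entropy(pk, qk) computes KL(pk || qk)
from sklearn.decomposition import IncrementalPCA


class RetrievalLayerDetector:
    """Flags embedding batches whose projected distribution drifts
    from a trusted baseline (assumed detection logic)."""

    def __init__(self, n_components=16, n_bins=32, kl_threshold=0.5):
        self.pca = IncrementalPCA(n_components=n_components)
        self.n_bins = n_bins
        self.kl_threshold = kl_threshold
        self.baseline_hist = None
        self.edges = None

    def fit_baseline(self, embeddings):
        # Learn the subspace from trusted embeddings, then histogram
        # their first principal component as the reference distribution.
        self.pca.fit(embeddings)
        scores = self.pca.transform(embeddings)[:, 0]
        self.baseline_hist, self.edges = np.histogram(
            scores, bins=self.n_bins, density=True)

    def score(self, batch):
        # KL divergence between the batch's projection histogram and
        # the baseline; larger values indicate stronger drift.
        scores = self.pca.transform(batch)[:, 0]
        hist, _ = np.histogram(scores, bins=self.edges, density=True)
        eps = 1e-9  # smoothing so empty bins do not yield log(0)
        return float(entropy(hist + eps, self.baseline_hist + eps))

    def is_anomalous(self, batch):
        return self.score(batch) > self.kl_threshold


# Usage: fit on clean embeddings, then score incoming batches. The
# poisoned batch is simulated with a mean shift for demonstration.
rng = np.random.default_rng(0)
detector = RetrievalLayerDetector()
detector.fit_baseline(rng.normal(size=(2000, 128)))
clean = rng.normal(size=(200, 128))
poisoned = rng.normal(loc=2.5, size=(200, 128))
print(detector.score(clean), detector.score(poisoned))
```

IncrementalPCA is chosen here because it can absorb new trusted embeddings in mini-batches, which matches the abstract's emphasis on production-grade latencies; a full PCA refit over 500,000 embeddings per update would not.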