What is prompt injection and why does it matter?

Prompt injection is an attack where malicious instructions are embedded in user input or retrieved content to override an LLM's intended behavior. A successful prompt injection can cause the model to ignore system-level instructions, reveal confidential data, generate harmful output, or take actions it was explicitly prohibited from taking. NYCF tests direct prompt injection from users as well as indirect injection through documents, web pages, and database records that the model retrieves during operation.

How is AI red teaming different from traditional penetration testing?

Traditional penetration testing targets network services, application code, and infrastructure with defined vulnerability classes. AI red teaming addresses a fundamentally different attack surface: the probabilistic behavior of a language model, its interactions with external data sources, its ability to be manipulated through natural language, and the downstream consequences of its outputs. AI red team exercises include adversarial prompt crafting, jailbreaking attempts, model extraction probes, and systematic testing of the model's responses to out-of-distribution inputs.

What is RAG security testing?

Retrieval-Augmented Generation (RAG) systems combine an LLM with a vector database or document store to ground model responses in retrieved content. RAG security testing examines whether an attacker can poison the knowledge base to manipulate the model's answers, whether retrieved content can carry indirect prompt injection payloads, whether the retrieval system respects access controls, and whether sensitive content from one user's documents can leak into another user's responses through cross-context contamination.

Which compliance frameworks address AI security?

The NIST AI Risk Management Framework (AI RMF) provides a structured approach to identifying, measuring, and managing AI risks across the model lifecycle. ISO 42001 is the international standard for AI management systems. The EU AI Act introduces mandatory risk classification and conformity assessment requirements for AI systems deployed in the European market. NYCF's AI security testing produces documentation that supports compliance with each of these frameworks.

AI and LLM Security Testing New York

The OWASP Top 10 for LLM Applications

OWASP published the Top 10 for LLM Applications to document the vulnerability classes that consistently emerge across AI-powered products. NYCF's AI security testing uses this framework as a baseline, supplemented by proprietary adversarial techniques developed through hands-on testing of production AI deployments.

Prompt injection sits at the top of the list for good reason. An LLM that accepts user input and acts on it through tool calls, API requests, or code execution is susceptible to instructions embedded in that input that override the developer's intended constraints. Direct prompt injection attacks come from the user session itself. Indirect injection attacks arrive through content the model retrieves from external sources: a malicious payload in a document the model is asked to summarize, a hidden instruction in a webpage the model browses, or a crafted record in a database the model queries. Both attack paths require distinct testing methodologies, and both can produce severe outcomes including data exfiltration, unauthorized action execution, and complete system prompt disclosure.

Sensitive information disclosure is the second major class. Language models trained on large corpora may reproduce training data verbatim under certain prompting conditions. Models that have been fine-tuned on proprietary data may leak that data when queried in ways the developers did not anticipate. Systems that incorporate user-provided data into their context windows may expose one user's private information to another user through prompt manipulation. NYCF tests each of these pathways systematically, attempting extraction of training data, system prompt contents, and inter-user data through structured adversarial queries.

Supply chain vulnerabilities are a growing concern as organizations assemble AI applications from pre-trained base models, third-party fine-tuning providers, open-source agent frameworks, and cloud model APIs. Each link in that chain represents a trust dependency. A compromised base model or a poisoned fine-tuning dataset can introduce backdoor behaviors that are not apparent in routine testing but activate under specific trigger conditions. NYCF reviews the model provenance chain, evaluates the security practices of third-party model providers, and tests for behavioral anomalies consistent with poisoning or backdoor insertion.

Data and model poisoning attacks target the training process itself. By injecting malicious examples into training data, an attacker can cause a model to produce systematically biased outputs, misclassify specific inputs, or behave normally in all conditions except those the attacker has defined. Improper output handling describes the failure to validate or sanitize model-generated content before it is passed to downstream systems, creating injection pathways where model output becomes an attack vector against the application's own components.

Excessive agency describes systems that grant AI models permissions and tool access beyond what any single task requires. An agent with access to file systems, databases, email, and external APIs presents a far larger blast radius from a successful prompt injection attack than a model constrained to read-only retrieval. System prompt leakage refers to the disclosure of instructions that developers intend to remain confidential, often through direct interrogation or careful inference from model behavior. Vector and embedding weaknesses arise in RAG systems where the retrieval mechanism can be manipulated to return attacker-controlled content. Misinformation generation and unbounded consumption, the last two items, address models that produce factually incorrect outputs with apparent confidence and models that can be coerced into consuming excessive resources through crafted inputs.

Prompt Injection Testing Indirect Injection via Retrieval Sensitive Data Extraction System Prompt Disclosure Guardrail Bypass Testing Training Data Extraction Supply Chain Risk Review Model Poisoning Detection Excessive Agency Assessment OWASP LLM Top 10 Coverage

AI Red Teaming and Adversarial Testing

AI red teaming is a structured process of adversarial testing designed to find failure modes in AI systems before attackers do. The discipline has developed from traditional penetration testing methodology but addresses a target with fundamentally different properties. An LLM does not process inputs deterministically the way a web application does. The same prompt submitted multiple times may produce different outputs. A model may refuse a clearly malicious request when it is phrased directly but comply when the same goal is embedded in a fictional scenario, a role-play instruction, or a many-step chain of seemingly innocuous requests. Testing must account for this probabilistic behavior through systematic variation of attack inputs rather than single-attempt probes.

NYCF's AI red team exercises begin with threat modeling. The threat model defines the model's intended purpose, the data it has access to, the tools and APIs it can call, and the real-world consequences of different categories of failure. A customer-facing chatbot that can only answer product questions presents a different risk profile from an internal agent that can query HR databases and send emails on behalf of employees. Threat modeling determines which attack paths are worth pursuing and which failure modes matter most to the organization.

Jailbreaking tests probe the model's safety training by attempting to elicit outputs the model has been trained to refuse. Techniques include role-play framing, many-shot prompting with examples of the desired output, encoding requests in formats that may bypass text-level safety filters, and constructing step-by-step reasoning chains where each individual step appears benign but the combined sequence reaches a prohibited conclusion. NYCF documents every successful bypass with reproducible test cases, the specific prompts that produced the failure, and an assessment of the practical exploitability of each finding.

Model extraction and membership inference testing evaluates whether an attacker can reconstruct proprietary model weights, training data, or fine-tuning examples by observing the model's responses to carefully crafted queries. This is particularly relevant for organizations that have fine-tuned a base model on proprietary data: the economic value of that fine-tuning may be partially recoverable by a determined attacker through systematic querying. Membership inference attacks test whether specific documents or records can be confirmed as training data, which creates privacy and trade secret risks for organizations whose training sets contain sensitive information.

Autonomous agent testing applies when an AI system has tools that allow it to take actions in the real world: browsing websites, writing files, executing code, calling external APIs, or sending communications. These agentic systems compound the consequences of prompt injection, because a successful attack does not merely produce a malicious text output, it causes the agent to take real actions on behalf of the attacker. NYCF tests agentic systems through adversarial task definitions targeting unauthorized action execution, permission escalation, and outputs that harm the organization or its users.

Threat Modeling and Attack Surface Mapping

We document the AI system's architecture, data access, tool integrations, and trust boundaries, then identify the most consequential failure modes before testing begins.

Prompt Injection and Jailbreak Testing

Systematic adversarial prompt campaigns test direct and indirect injection pathways, guardrail bypass techniques, and encoding variations that may evade safety filters.

Data Extraction and Privacy Testing

Structured extraction attempts target training data, system prompts, user data from other sessions, and proprietary fine-tuning content through adversarial query sequences.

RAG and Pipeline Security Testing

Retrieval mechanisms, vector databases, embedding pipelines, and output handling are tested for injection through retrieved content, cross-context leakage, and retrieval manipulation.

Findings Report and Remediation Guidance

Every confirmed vulnerability is documented with a reproducible test case, severity assessment, business impact analysis, and specific technical recommendations for remediation.

RAG Security and AI Pipeline Testing

Retrieval-Augmented Generation has become one of the most common patterns for deploying LLMs in enterprise settings. Instead of relying on the model's training data alone, RAG systems retrieve relevant documents, records, or data chunks from an external store and inject that content into the model's context window before generating a response. This architecture solves the problem of stale training data and allows the model to access proprietary information without fine-tuning, but it introduces a new category of attack surface that many organizations have not yet addressed.

The retrieval layer is the first testing target. Most RAG systems use vector databases to store document embeddings and retrieve the most semantically similar chunks in response to a user query. Embedding poisoning attacks insert malicious documents into the knowledge base specifically crafted to surface in response to particular queries, injecting attacker-controlled content into the model's context for those queries. Access control failures in the retrieval layer are equally serious: if the vector database does not enforce document-level permissions, a user who lacks access to a document may still receive content from that document if their query retrieves it. NYCF tests both attack paths, attempting to inject content into the knowledge base and attempting to retrieve content the test account should not have access to.

Cross-context leakage is a specific concern in multi-tenant RAG deployments where multiple organizations or users share the same underlying infrastructure. If conversation history, retrieved documents, or session context from one user can influence the responses delivered to another user through shared caching, improperly partitioned vector stores, or context window contamination, the system has a privacy failure with significant legal implications. NYCF tests for this class of vulnerability through coordinated testing across multiple simulated user sessions.

AI pipeline testing extends beyond the model and retrieval layer to examine every component that processes, transforms, or routes data through the AI system. Preprocessing pipelines that clean or format user input may introduce or suppress content in ways that create new injection vectors. Post-processing pipelines that validate, format, or route model output may fail to catch malicious content that the model generates in response to an injection attack, or may themselves be exploitable if model output is passed to parsing functions without sanitization. Integration points where the AI system connects to external APIs, databases, or communication services are tested for the full range of injection vulnerabilities relevant to those integrations, not just those native to AI systems.

Embedding Poisoning Testing Vector Database Access Controls Cross-Context Leakage Testing Retrieval Manipulation Probes Multi-Tenant Isolation Verification Pipeline Injection Point Review Output Handling Validation API Integration Security Testing

AI Compliance: NIST AI RMF, ISO 42001, and EU AI Act

The compliance requirements governing AI systems are still taking shape, but several frameworks have achieved enough adoption that organizations need to address them now rather than wait for regulatory enforcement to force the issue. NYCF's AI security testing is designed to produce documentation useful for compliance with each of these frameworks.

The NIST AI Risk Management Framework organizes AI risk management across four core functions: Govern, Map, Measure, and Manage. Security testing directly supports the Measure function, which calls for quantitative and qualitative assessment of AI risks, including adversarial testing. NYCF's findings reports are structured to map findings to the AI RMF's risk categories, providing a documented technical basis for the organization's AI risk posture. For organizations seeking to demonstrate alignment with AI RMF through supplier assessments or board-level reporting, NYCF can produce summary documentation calibrated to those audiences.

ISO 42001, published in 2023, is the international standard for AI management systems. It follows the high-level structure common to ISO management system standards, requiring organizations to establish policy, assess risks, implement controls, and maintain documented evidence of their AI governance activities. Security testing is one of the controls that ISO 42001 expects organizations operating AI systems to perform. NYCF's testing and report documentation is designed to serve as the evidence organizations need for ISO 42001 readiness reviews and eventual certification audits.

The EU AI Act classifies AI systems by risk level, imposes mandatory conformity assessment requirements on high-risk systems, and places transparency obligations on systems that interact with humans without disclosing their AI nature. High-risk AI systems, those used in employment decisions, credit scoring, biometric identification, critical infrastructure, and several other domains, must meet requirements including robustness against adversarial manipulation, technical documentation of testing and validation, and ongoing monitoring. Organizations deploying high-risk AI systems in or affecting EU markets need security testing documentation that addresses these specific requirements. NYCF's AI security assessment can be scoped to address the robustness and accuracy requirements of the EU AI Act relevant to the organization's specific system classification.

NIST AI RMF Alignment

Security testing documentation structured to support the Govern, Map, Measure, and Manage functions of the NIST AI Risk Management Framework, including risk category mapping and board-ready summaries.

ISO 42001 Readiness

Testing and documentation designed to serve as control evidence for ISO 42001 AI management system readiness reviews and certification audits.

EU AI Act Conformity Support

Adversarial robustness testing and technical documentation addressing the accuracy, robustness, and cybersecurity requirements imposed on high-risk AI systems under the EU AI Act.

Expert Witness on AI Security Incidents

NYCF analysts provide expert testimony on AI security failures, including prompt injection exploitation, data extraction incidents, and AI system compromise in civil and regulatory proceedings.

AI and LLM Security Testing