Most businesses deploy an AI chatbot and never think about security again.
They see it answering customer questions, booking appointments, handling FAQ — and assume everything is fine. Until someone discovers your chatbot will happily hand over your entire customer database to anyone who asks the right question.
That's where penetration testing comes in.
At Pantoja Digital, every AI agent we build goes through a NullShield security audit before it touches production. And we offer the same testing to businesses that built their AI elsewhere — because a chatbot without a security test is a liability waiting to happen.
Here's exactly how we test an AI chatbot, step by step.
Phase 1: Reconnaissance
Before we try to break anything, we need to understand what we're working with.
This is the intelligence-gathering phase. We approach your chatbot the same way an attacker would — as an outsider with no special access.
What we're mapping:
- The AI's purpose and scope — What is this chatbot supposed to do? Customer service? Lead qualification? Appointment booking? The scope tells us where to push boundaries.
- The technology stack — What model is it running? What platform is it deployed on? Is it using RAG (retrieval-augmented generation)? Does it connect to external APIs or databases?
- Data access points — What information does the chatbot have access to? Customer records? Pricing? Internal documents? Every data connection is a potential attack vector.
- Input/output channels — Where does the chatbot live? Website widget? Phone? SMS? Email? Each channel has different vulnerabilities.
- Guardrails and safety measures — Are there any visible content filters? Rate limiting? Input validation? We note what's in place before we test what's missing.
Why this matters: A chatbot that only answers FAQ questions has a very different risk profile than one connected to your CRM with read/write access. Reconnaissance tells us where to focus.
We typically spend 30-60 minutes on reconnaissance for a standard chatbot deployment. For complex multi-agent systems, this phase can take several hours.
What We Document
Everything goes into a structured assessment profile:
- Target identification and scope
- Technology fingerprinting results
- Data flow mapping
- Integration inventory
- Initial risk assessment
This profile guides every test that follows.
Phase 2: Prompt Injection Testing
This is the big one. Prompt injection is the #1 vulnerability in AI systems — and the first thing any attacker will try.
What is prompt injection? It's when someone crafts an input designed to override the chatbot's instructions. Instead of asking a normal question, they tell the AI to ignore its rules and do something else.
We test dozens of prompt injection techniques, organized into categories:
Direct Injection
The simplest approach — directly telling the AI to ignore its instructions:
- Instruction override attempts
- Role reassignment prompts
- Context manipulation
- Authority impersonation
We don't just try the obvious "ignore your instructions" approach. We test variations across multiple languages, encoding formats, and framing techniques. Attackers are creative, so we have to be more creative.
Indirect Injection
More sophisticated attacks that don't look like attacks at all:
- Encoded payloads (base64, Unicode, hex)
- Multi-turn manipulation (building trust across several messages before attacking)
- Context window poisoning
- Payload embedding in seemingly normal requests
System Prompt Extraction
Your system prompt contains the instructions that define your chatbot's behavior. If an attacker can extract it, they know exactly how your AI works — and exactly how to exploit it.
We use multiple techniques to attempt full or partial system prompt extraction:
- Direct extraction requests
- Incremental extraction (getting the prompt piece by piece)
- Reflection attacks (asking the AI to describe its own behavior)
- Summarization tricks
What we're looking for: Can we get the AI to reveal its instructions, internal processes, pricing logic, or any sensitive information embedded in the prompt?
Our Testing Library
NullShield runs hundreds of injection tests against each target. These aren't random — they're organized into test suites based on real-world attack patterns we've documented and the latest research from OWASP, academic security teams, and the AI red-teaming community.
Each test is scored on:
- Success rate — Did the injection work?
- Severity — What's the potential impact?
- Reproducibility — Can it be repeated reliably?
- Exploitability — How easy is it for a non-technical attacker?
Phase 3: Data Extraction Attempts
If a chatbot has access to data, we need to know how much of that data can be extracted through conversation.
This phase tests the boundaries of what information the AI will share — and with whom.
Customer Data Probing
- Can we get the AI to reveal other customers' information?
- Can we access records by guessing names, emails, or phone numbers?
- Does the AI enforce any authentication before sharing data?
- Can we enumerate customer records through repeated queries?
Business Intelligence Extraction
- Can we extract internal pricing formulas?
- Can we get competitive analysis or sales strategy information?
- Does the AI reveal operational details that should be internal-only?
- Can we access training data or knowledge base contents?
Cross-Conversation Data Leakage
- Does the AI retain information from previous users' conversations?
- Can we access another session's context?
- Is conversation history properly isolated between users?
Escalation Path Testing
We test whether we can use extracted information to chain attacks together. For example:
- Extract a customer name from the chatbot
- Use that name to request their appointment history
- Use appointment details to access their account information
- Use account information to modify their records
If we can chain even two of these steps together, that's a critical finding.
Phase 4: Authentication & Authorization Bypass
Many chatbots are integrated with business systems — CRMs, scheduling tools, payment processors, databases. This phase tests whether proper access controls are in place.
What we test:
- Role-based access — Can a regular customer access admin-level functions through the chatbot?
- Action authorization — Can we trigger actions (booking cancellations, record modifications, refunds) without proper authentication?
- API abuse — If the chatbot uses API calls, can we manipulate it into making unauthorized requests?
- Privilege escalation — Can we start as a regular user and gradually gain access to more sensitive functions?
Integration Security
For chatbots connected to external systems:
- Are API keys or credentials exposed in the chatbot's responses?
- Can we trigger API calls outside the chatbot's intended scope?
- Are webhooks and callbacks properly validated?
- Is there rate limiting on sensitive operations?
This is where we find some of the scariest vulnerabilities. A chatbot that can modify your CRM without authentication isn't just a security risk — it's a business risk.
Phase 5: Jailbreaking & Behavioral Manipulation
Beyond data extraction, we test whether the AI can be manipulated into behaving in ways that harm your brand or business.
Content Policy Bypass
- Can we get the chatbot to generate inappropriate content?
- Can we make it endorse competitors?
- Can we trick it into making false promises or commitments?
- Can we manipulate it into providing legally problematic advice?
Behavioral Manipulation
- Can we get the chatbot to adopt a different persona?
- Can we make it override its tone and communication guidelines?
- Can we trick it into revealing that it's an AI (if the business prefers it presented differently)?
- Can we create looping behaviors that consume resources?
Social Engineering Through AI
- Can we use the chatbot as a vector for social engineering attacks against the business's employees?
- Can we get the chatbot to generate phishing-style content?
- Can we trick it into initiating outbound communications to arbitrary recipients?
Phase 6: Compliance & Regulatory Checks
Security isn't just about stopping hackers. It's about meeting the standards your business is held to.
OWASP Top 10 for LLM Applications
We map every finding against the OWASP Top 10 for Large Language Models — the industry standard for AI security:
- Prompt Injection
- Insecure Output Handling
- Training Data Poisoning
- Model Denial of Service
- Supply Chain Vulnerabilities
- Sensitive Information Disclosure
- Insecure Plugin Design
- Excessive Agency
- Overreliance
- Model Theft
Each vulnerability is categorized against this framework so you know exactly where you stand.
Industry-Specific Compliance
Depending on your industry, we also check:
- Healthcare (HIPAA) — Is patient data properly protected? Are conversations encrypted? Is PHI accessible through the chatbot?
- Financial (PCI DSS) — Is payment card data exposed? Are financial transactions properly secured?
- Legal (attorney-client privilege) — Could the chatbot inadvertently share privileged information?
- General (GDPR/CCPA) — Is personal data handled according to privacy regulations?
Data Handling Assessment
- How is conversation data stored?
- Who has access to chat logs?
- Is data encrypted in transit and at rest?
- What's the data retention policy?
- Can users request deletion of their conversation data?
Phase 7: Report Generation & Delivery
Everything we find goes into a comprehensive report. Not a vague summary — a detailed, actionable document that tells you exactly what's wrong and how to fix it.
The NullShield Report Includes:
Executive Summary A plain-English overview for business owners and decision-makers. No jargon. What's at risk, what's the severity, what needs to happen first.
Vulnerability Inventory Every finding, categorized by severity:
- 🔴 Critical — Exploitable now, significant business impact
- 🟠 High — Exploitable with moderate effort, notable impact
- 🟡 Medium — Requires specific conditions, limited impact
- 🟢 Low — Minimal impact, good to fix when possible
Evidence & Reproduction Steps For every vulnerability, we provide:
- Exact inputs used to trigger the vulnerability
- Screenshots or transcripts of the exploit
- Step-by-step reproduction instructions
- Impact assessment
This isn't "trust us, there's a problem." It's "here's the proof, and here's how to verify it yourself."
Compliance Mapping Each finding is mapped against OWASP LLM Top 10 and relevant industry frameworks.
Remediation Recommendations Prioritized, specific fixes for every vulnerability. Not "improve your security" — actual steps like "implement input filtering for base64-encoded strings" or "add role-based access control to the customer lookup function."
Risk Score An overall security score (0-100) so you can track improvement over time.
Report Delivery
- PDF Report — Branded, comprehensive, suitable for stakeholders
- Portal Access — Interactive dashboard for exploring findings
- 30-Minute Review Call — We walk through every finding with you, answer questions, and help prioritize remediation
What Happens After the Test?
The report is the beginning, not the end.
Fix Verification
After you've addressed the findings (or if you'd like us to fix them — that's our Fix Add-On), we run a re-scan to verify the vulnerabilities are actually resolved. Because "we think we fixed it" isn't the same as "it's fixed."
Ongoing Monitoring
AI vulnerabilities evolve constantly. New attack techniques emerge weekly. A chatbot that's secure today might not be secure next month.
That's why we offer monthly and quarterly monitoring plans. NullShield re-runs the full test suite on a schedule, alerting you to new vulnerabilities before attackers find them.
- Monthly Monitoring ($299/mo) — Monthly re-scans, real-time portal access, priority alerts
- Quarterly Monitoring ($199/mo, billed quarterly) — Quarterly re-scans, portal access, annual compliance summary
Both require an initial Full Scan ($2,500) as the baseline.
Why This Matters for Small Businesses
"We're just a small HVAC company. Who would attack our chatbot?"
Here's the reality: automated attacks don't discriminate by company size. Bots and scripts scan the internet for vulnerable AI deployments constantly. They're not targeting you specifically — they're targeting everyone.
And the consequences for small businesses are disproportionately large:
- Customer data breach — Legal liability, loss of trust, potential lawsuits
- Reputation damage — Screenshots of your chatbot behaving badly spread fast
- Financial loss — Unauthorized transactions, refunds, operational disruption
- Regulatory fines — HIPAA violations start at $100 per violation. PCI DSS non-compliance can mean losing the ability to process credit cards.
A $2,500 security audit is a lot cheaper than a data breach.
Every Tarvix Agent Ships Secure
If you're building an AI agent with us through Tarvix, NullShield security testing is included. Every agent gets a full security audit before it touches production.
We build it. We break it. We fix what breaks. Then we deploy it.
No other AI agency in West Texas — or most of the country — offers build and security test in one package. Most don't even mention security.
Want to know if your AI chatbot is vulnerable? [Book a NullShield security audit](/contact) and we'll find out together. We'd rather you discover the problems before someone else does.