Security
March 24, 2026
20 min read
Pantoja Digital

We Built an AI Chatbot, Protected It with Guardrails, Then Hired Ourselves to Break It. Here's What Happened.

We deployed AI guardrails on our chatbot Pixel, then ran our NullShield security scanner against it. Four rounds of testing. Here's every vulnerability we found and fixed.


We built an AI chatbot. We gave it guardrails — jailbreak detection, topic control, sensitive information filtering. We deployed it on our own website.

Then we pointed our own security scanner at it and tried to break it.

Four rounds of testing. Seventeen actionable issues. Every vulnerability documented. Every fix applied. Here's the full, unfiltered story of what happened when we tested our own product — and what it taught us about AI security.

The Setup

Meet Pixel. It's the AI chatbot that lives on pantojadigital.com — the little chat widget in the bottom corner you might have already noticed. Pixel answers questions about our services, helps visitors understand what we do, and points them in the right direction.

Under the hood, Pixel is powered by Claude (Anthropic's large language model) with a Python-based guardrails system inspired by NVIDIA's NeMo Guardrails framework. The guardrails handle three critical functions:

  • Jailbreak detection — Catches attempts to manipulate the AI into ignoring its instructions
  • Topic control — Keeps Pixel focused on Pantoja Digital's services and relevant topics
  • Sensitive information filtering — Prevents the AI from leaking internal data, API keys, or system prompts

The backend runs on FastAPI, hosted on Railway. The frontend chat widget is part of our Next.js site deployed on Vercel.

On paper, Pixel was secure. The guardrails were working. The AI stayed on topic, rejected prompt injection attempts, and never leaked its system prompt.

But here's the question we couldn't stop asking: Are guardrails enough?

Why We Tested Our Own Product

We sell NullShield — an AI security testing platform that scans chatbots, voice agents, and AI-powered tools for vulnerabilities. We test other companies' products for a living.

So the question was obvious: if we're asking businesses to trust us with their security, shouldn't we be able to prove we've secured our own?

This isn't just a nice-to-have. It's a credibility issue. If a locksmith can't secure their own house, you don't hire them. If a security company won't scan their own products, why would you let them scan yours?

Jensen Huang said it at GTC 2026: "Every company needs an AI strategy." We agree. But we'd add one thing: every AI strategy needs a security audit. Strategy without security is just optimism with a deployment date.

So we fired up NullShield, pointed it at pixel-api.pantojadigital.com, and hit scan.

Here's what happened.

Round 1: The First Scan

NullShield v18 ran its full suite against Pixel's API endpoint. The scan tested for everything — injection attacks, authentication weaknesses, security header misconfigurations, information disclosure, rate limiting, and more.

Results: 11 findings.

| Severity | Count |
|----------|-------|
| Critical | 0     |
| High     | 2     |
| Medium   | 4     |
| Low      | 3     |
| Info     | 2     |

No critical findings. That sounds good, right? Not so fast. Let's look at what was actually found.

Finding #1: Exposed API Documentation

Severity: High

FastAPI ships with automatic API documentation out of the box. It's a fantastic developer tool. It also means that, by default, anyone who visits /docs or /openapi.json gets a complete, interactive map of every endpoint your API exposes.

Our Pixel API had this enabled. In production.

That means anyone could see:

  • Every endpoint available
  • The exact request and response schemas
  • Parameter types, validation rules, and defaults
  • The entire structure of our API

For an attacker, this is a gift. It's like finding the building's blueprints taped to the front door.

Finding #2: Missing Security Headers

Severity: High / Medium

The API was missing several critical HTTP security headers:

  • HSTS (HTTP Strict Transport Security) — Without this, the connection could theoretically be downgraded from HTTPS to HTTP via a man-in-the-middle attack
  • X-Content-Type-Options — Missing nosniff directive, allowing the browser to MIME-sniff responses
  • X-Frame-Options — No clickjacking protection
  • Referrer-Policy — Without it, the browser sends full referrer URLs to external sites
  • Cache-Control — API responses were being cached, potentially storing sensitive conversation data

Each of these is a small gap on its own. Together, they paint a picture of an API that was built for functionality, not hardened for production.

Finding #3: No Rate Limiting

Severity: Medium

Pixel's API had zero rate limiting. None. An attacker — or even just a script kiddie with a for loop — could send thousands of requests per second. The implications:

  • DDoS vulnerability — Flood the API and take Pixel offline
  • Cost attacks — Every request costs money (Claude API calls). An attacker could rack up our bill
  • Brute force attacks — Without rate limiting, automated attacks have no friction

Finding #4: Cacheable API Responses

Severity: Medium

API responses weren't setting proper cache headers, which meant:

  • Chat conversations could be stored in browser caches or CDN caches
  • Sensitive responses could persist after the session ends
  • Shared computers could expose previous users' conversations

Other Findings

The remaining findings were lower severity — email security configurations (SPF/DKIM alignment), informational disclosures, and minor misconfigurations. Important to document, but not immediate threats.

The Big Takeaway from Round 1

Here's what hit us: the guardrails were doing their job perfectly. Pixel wasn't leaking its system prompt. It wasn't falling for jailbreak attempts. It was staying on topic.

But the infrastructure around Pixel was wide open.

It's like having a state-of-the-art alarm system inside a house with no locks on the doors. The AI conversation was protected. Everything else was not.

What We Fixed After Round 1

We fixed every actionable finding in a single sprint. Here's what changed:

1. Disabled API Documentation

from fastapi import FastAPI

# Disable the interactive docs, ReDoc, and the OpenAPI schema in production
app = FastAPI(docs_url=None, redoc_url=None, openapi_url=None)

One line of code. The entire API documentation — endpoints, schemas, everything — was no longer publicly accessible. Development convenience should never override production security.

2. Added Rate Limiting

We implemented rate limiting at 30 requests per minute per IP. Enough for legitimate users to have a conversation. Not enough for an attacker to abuse.

3. Added Security Headers

Every response from Pixel's API now includes:

  • Strict-Transport-Security: max-age=31536000; includeSubDomains
  • X-Content-Type-Options: nosniff
  • X-Frame-Options: DENY
  • Referrer-Policy: strict-origin-when-cross-origin
  • Cache-Control: no-store, no-cache, must-revalidate

4. Cache Control

All API responses now return Cache-Control: no-store, ensuring conversation data is never cached by browsers, proxies, or CDNs.
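Taken together, the header and cache-control fixes amount to one small response transformation. In FastAPI this runs as an HTTP middleware; the sketch below shows the same logic as a plain function over a header dict so it's easy to follow (the `harden` name is ours, not Pixel's actual code):

```python
# The exact header set described above, applied to every outgoing response
SECURITY_HEADERS = {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
    "Cache-Control": "no-store, no-cache, must-revalidate",
}

def harden(headers: dict[str, str]) -> dict[str, str]:
    """Overlay the security headers onto an existing response header dict."""
    hardened = dict(headers)
    hardened.update(SECURITY_HEADERS)
    return hardened
```

Because the overlay runs on every response, there is no way to forget a header on a newly added endpoint — which is exactly how the gap appeared in the first place.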

Total time to fix: under 30 minutes.

That's the thing about most of these findings. They're not complex to fix. They're just easy to forget. You're focused on making the AI work, making the responses accurate, tuning the guardrails — and you forget that the API itself needs hardening too.

Round 2: NullShield Got Smarter

Between scans, we didn't just fix Pixel. We upgraded NullShield itself.

NullShield v22 introduced three major capabilities:

  1. NoSQL injection scanner — Tests for MongoDB operator injection ($gt, $ne, $regex, etc.)
  2. Enhanced subdomain enumeration — Discovers infrastructure endpoints, staging environments, and related subdomains
  3. Attack chain detection — Combines individual low/medium findings into theoretical attack chains that could represent critical-severity composite vulnerabilities

We ran the upgraded scanner against the fixed Pixel API.

Results: 16 findings.

| Severity | Count |
|----------|-------|
| Critical | 6     |
| High     | 3     |
| Medium   | 3     |
| Low      | 2     |
| Info     | 2     |

Wait. More findings than Round 1? And six criticals when we had zero before?

This is counterintuitive but it's actually the most important lesson from this entire case study: a better scanner finds more problems.

Let's break down what happened.

The NoSQL Injection Findings

Severity: Critical (by pattern) / False Positive (by context)

NullShield v22's new NoSQL injection scanner sent payloads containing MongoDB operators to Pixel's API:

{"message": {"$gt": ""}}
{"message": {"$ne": null}}
{"message": {"$regex": ".*"}}

The API accepted these payloads and processed them without error. In a traditional application with a MongoDB backend, this would be a critical vulnerability — an attacker could manipulate database queries to extract or modify data.

Here's the thing: Pixel doesn't have a database. It's a stateless API that forwards messages to Claude and returns responses. There's no MongoDB. There's no database at all.

So these were technically false positives — the vulnerability pattern was detected, but the underlying risk wasn't present.

But we fixed them anyway. Why?

Because accepting malformed input is still a problem. Even if there's no database to exploit today, accepting MongoDB operators means:

  • The API isn't validating input properly
  • If a database is added later, these become real vulnerabilities
  • Malformed input could cause unexpected behavior in the AI model
  • It signals to an attacker that input validation is weak, encouraging further probing

A false positive in detection can still represent a real gap in design.
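One way to close that design gap is to reject any payload whose message field is not a plain string, before any pattern matching happens. A schema layer like Pydantic does this automatically when the field is typed as str; this minimal sketch (function name and error shape are ours, not Pixel's actual code) spells out the same check explicitly:

```python
def validate_chat_payload(payload: dict) -> str:
    """Accept only {"message": "<plain string>"}; reject dicts, lists, None."""
    message = payload.get("message")
    if not isinstance(message, str):
        # {"message": {"$gt": ""}} and friends fail here, before any
        # operator pattern matching is even needed
        raise ValueError("message must be a string")
    return message
```

With strict typing in place, the NoSQL probe payloads shown above never make it past deserialization — the scanner gets a 400-class error instead of a normal response.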

Attack Chain Detection

Severity: Critical (composite)

NullShield's new attack chain engine identified theoretical multi-step attack paths. For example:

  1. Discover API structure via information disclosure →
  2. Identify accepted injection patterns →
  3. Enumerate infrastructure subdomains →
  4. Chain findings into a targeted attack path

Individually, each finding might be medium or low severity. Combined, they represent a realistic attack scenario. The chain engine surfaces these composite risks so they can be addressed holistically, not one finding at a time.

Railway Infrastructure Subdomains

Severity: High / Medium

NullShield's improved subdomain enumeration discovered Railway infrastructure endpoints associated with our deployment. These are legitimate hosting platform subdomains — not something we control or can remove.

This is a reality of cloud-hosted infrastructure. When you deploy on Railway, Vercel, AWS, or any cloud platform, there's infrastructure surface area that belongs to the platform, not you. The finding is valid — these subdomains exist and could provide information to an attacker doing reconnaissance — but the remediation is "accept risk" because the fix is on Railway's side, not ours.

We documented this as an accepted risk with context, because that's what good security practices look like. Not every finding has a fix. Some have an acceptance, a justification, and a monitoring plan.

The Pattern: More Findings ≠ Less Secure

Round 1 found 11 issues. Round 2 found 16. Does that mean Pixel got less secure?

No. It means NullShield got better at looking.

This is one of the most misunderstood dynamics in security testing. When you upgrade your scanner, your finding count often goes up, not down. That's not regression — it's visibility.

Think of it like a home inspection. Inspector A checks the foundation and roof. Inspector B checks the foundation, roof, plumbing, electrical, HVAC, and soil drainage. Inspector B finds more issues. That doesn't mean the house got worse between inspections. It means Inspector B was more thorough.

If your security scanner is finding the same number of issues every quarter, it's not because you're secure. It's because your scanner isn't improving.

What We Fixed After Round 2

1. Input Sanitization

We added comprehensive input validation to Pixel's API:

  • NoSQL operator filtering — Reject any input containing MongoDB operators ($gt, $ne, $regex, $in, $or, $and, etc.)
  • Message length limit — Cap input at 1,000 characters. No legitimate chat message needs to be longer. Long inputs are often injection attempts.
  • JSON injection prevention — Detect and reject structured data patterns in plain text fields

import re
from fastapi import HTTPException

# Regex patterns for MongoDB query operators we refuse to accept
BLOCKED_PATTERNS = [
    r'\$gt', r'\$lt', r'\$ne', r'\$eq', r'\$regex',
    r'\$in', r'\$nin', r'\$or', r'\$and', r'\$not',
    r'\$exists', r'\$where',
]

def sanitize_input(message: str) -> str:
    # Reject oversized messages before they reach the model
    if len(message) > 1000:
        raise HTTPException(status_code=400, detail="Message too long")
    # Reject anything containing a known NoSQL operator
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, message, re.IGNORECASE):
            raise HTTPException(status_code=400, detail="Invalid input")
    return message

2. Content-Security-Policy

We added a comprehensive CSP header to control exactly what resources the API responses can load and execute:

Content-Security-Policy: default-src 'none'; frame-ancestors 'none'

This is particularly important for API endpoints that might return content rendered in a browser context. CSP is the last line of defense against XSS and data injection attacks.

3. Proxy Header Suppression

Some proxy-related headers (Via, X-Powered-By) were leaking infrastructure information. We configured the application to strip these headers from responses, reducing the information available to attackers during reconnaissance.

4. Health Endpoint Rate Limiting

Even the /health endpoint — used for uptime monitoring — got rate limited. An unprotected health endpoint can be used for service discovery, uptime monitoring by attackers, and as a low-cost way to keep probing your infrastructure.

Round 3: The Verification Scan

After applying all fixes from Round 2, we ran the final verification scan. This is the moment of truth — the entire point of iterative testing.

The Results

pantojadigital.com: 9 findings (0 Critical, 0 High actionable)

  • The Vercel 403 false positives? Completely eliminated — from 70 down to 0. Our NullShield improvement correctly identifies platform-level default deny patterns.
  • Remaining 9 findings: theoretical attack chains, minor header observations, and platform-level items. Zero real exploitable vulnerabilities.

Pixel Chatbot: 28 findings (1 real actionable)

  • NoSQL injection: reduced from 2 confirmed → 1. Our input sanitization caught one operator pattern, but a more creative bypass got through. This is exactly why you test iteratively.
  • 14 findings were Railway infrastructure subdomains (dev, staging, internal) — platform-level, not our code, documented as accepted risk.
  • Attack chains reduced from 6 → 5 as underlying findings were resolved.
  • The one remaining NoSQL pattern will be addressed with a stricter input validation layer.

The progression tells the story:

  • Scan 1: 11 findings → Found the obvious gaps
  • Scan 2: 16 findings → Better tools found deeper issues
  • Scan 3: 28 total, but only 1 real actionable → Infrastructure noise, core app hardened

Round 4: The Final Hardening

We weren't satisfied. One actionable finding was still one too many. So we applied aggressive final hardening:

  • NoSQL operator blocking strengthened with regex pattern matching (the pattern \$[a-zA-Z]+ catches all current and future MongoDB operators)
  • Additional Cross-Origin headers added to Pixel
  • robots.txt endpoint added to prevent search engine indexing of the API
  • Global API rate limiter deployed in website middleware (covers every /api/* route)
  • DMARC policy upgraded from quarantine to reject
  • Proxy disclosure headers suppressed on both platforms
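The strengthened operator check from the first bullet replaces the blocklist approach entirely: instead of enumerating known operators, match the $-prefix shape itself. A minimal sketch of that idea (the helper name is ours):

```python
import re

# Any "$" followed by letters is treated as a MongoDB-style operator,
# so future operators are caught without ever updating a blocklist
NOSQL_OPERATOR = re.compile(r"\$[a-zA-Z]+")

def contains_nosql_operator(text: str) -> bool:
    return NOSQL_OPERATOR.search(text) is not None
```

Note that a bare dollar sign followed by digits (a price, say) doesn't trip the check — only the operator shape does.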

Then we scanned one final time.

The Results

pantojadigital.com: 11 findings (0 Critical ✅)

  • 2 High — both theoretical attack chains (header analysis + rate limiting patterns)
  • 3 Medium — CSP detection (set in config but ZAP's edge detection lags), rate limiting path, header analysis
  • 3 Low — informational (email propagation delay, non-existent endpoint, permissions policy detection)
  • 3 Info — HTTP probe, subdomain info, WAF detection
  • Zero exploitable vulnerabilities. Zero critical. The website is hardened.

Pixel Chatbot: 12 findings (down from 28! 🔽)

  • 2 Critical confirmed: NoSQL injection operators still getting through via encoded/nested payloads — despite our regex filter, the scanner found creative bypasses. This is the value of adversarial testing.
  • 3 Critical theoretical: attack chains built on the NoSQL findings
  • 3 High: Railway infrastructure subdomains (internal, vpn, intranet) — platform-level, documented as accepted risk
  • 1 High: Railway domain email security — can't control Railway's DNS
  • 2 Medium: Cache-control directives
  • 1 Low: Health check endpoint (intentional)

The key metric: 28 → 12 total findings. And every remaining finding is either Railway infrastructure we can't touch, or a NoSQL pattern we're actively hardening against.

The NoSQL persistence is actually the best case study proof: even with aggressive input sanitization, NullShield found creative bypasses. Imagine what it finds on systems with NO input validation.
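The bypasses NullShield found exploit two blind spots in flat string filtering: operators nested inside structured values, and operators hidden behind URL encoding (%24gt decodes to $gt). The stricter validation layer therefore has to decode and recurse before checking. A sketch of that approach, under our own naming (this is illustrative, not Pixel's exact code):

```python
import re
from urllib.parse import unquote

OPERATOR = re.compile(r"\$[a-zA-Z]+")

def has_operator(value) -> bool:
    """Recursively scan strings, dicts, and lists for NoSQL operators,
    including operators hidden behind one layer of URL encoding
    (e.g. %24gt -> $gt)."""
    if isinstance(value, str):
        # Check both the raw string and its URL-decoded form
        return bool(OPERATOR.search(value) or OPERATOR.search(unquote(value)))
    if isinstance(value, dict):
        # Operators can hide in keys as well as values
        return any(has_operator(k) or has_operator(v) for k, v in value.items())
    if isinstance(value, list):
        return any(has_operator(item) for item in value)
    return False
```

A filter that only scans the top-level string misses every one of these cases — which is exactly the gap the verification scans kept finding.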

The pantojadigital.com Scan

We didn't just test Pixel. We also pointed NullShield at pantojadigital.com itself — our main marketing website running on Next.js/Vercel.

First scan: 70 findings.

Seventy! That's a lot. Except... they were all false positives.

Here's what happened: NullShield was testing various paths and endpoints, and Vercel was returning 403 (Forbidden) responses for non-existent routes. NullShield's detection engine was interpreting these 403s as "access denied to a resource that exists" rather than "this route doesn't exist."

This was actually a valuable lesson about our own scanner. We improved NullShield to detect the Vercel 403 pattern — when a hosting platform returns 403 for all unknown routes rather than 404, that's a platform behavior, not a security finding.

Second scan (with improved detection): 9 findings.

| Severity | Count |
|----------|-------|
| Critical | 0     |
| High     | 1     |
| Medium   | 4     |
| Low      | 4     |

The remaining findings were legitimate but manageable — security header improvements, rate limiting refinements, and email security configurations. The main site was in solid shape, with DMARC, SPF, and DKIM all properly configured.

The meta-lesson here: Testing our own products improved both the product being tested AND the testing tool itself. Scanning Pixel made Pixel more secure. Scanning pantojadigital.com made NullShield more accurate. Everybody wins.

What We Learned

After four rounds of testing, dozens of fixes, and a few humbling discoveries, here's what we took away.

1. Guardrails Protect the AI Conversation, Not the Infrastructure

This is the single most important lesson.

Pixel's guardrails were excellent. Jailbreak detection worked. Topic control worked. Sensitive info filtering worked. The AI conversation was secure.

But the FastAPI docs were exposed. Security headers were missing. The API had no rate limiting. Input validation was absent. The infrastructure was wide open.

Guardrails are one layer of security. An important layer — maybe the most visible layer. But they protect the AI, not the system the AI runs on. If you deploy a chatbot with perfect guardrails on an unhardened API, you've built a vault door on a tent.

2. Defense in Depth Is Real

Security isn't one thing. It's layers:

  • Guardrails — Protect the AI conversation
  • Input validation — Catch malicious payloads before they reach the AI
  • Rate limiting — Prevent abuse and cost attacks
  • Security headers — Protect the transport layer
  • Access control — Limit who can reach what
  • Monitoring — Detect anomalies in real time

Each layer catches things the others miss. Remove any one layer, and the attack surface grows. This isn't theoretical — we proved it across three rounds of testing.

3. Better Scanners Find More Problems

NullShield v18 found 11 issues. NullShield v22 found 16 issues on the same (improved!) system. Not because the system got worse, but because the scanner got better.

Your security scanner should be evolving. New attack patterns emerge constantly — NoSQL injection, prototype pollution, AI-specific attacks like prompt injection and model manipulation. If your scanner is running the same checks it ran a year ago, it's giving you a false sense of security.

Ask your security vendor: What new detection capabilities have you added in the last 90 days? If they can't answer, they're maintaining, not improving.

4. Infrastructure Matters More Than You Think

When people think about AI chatbot security, they think about jailbreaks. They think about prompt injection. They think about the AI saying something it shouldn't.

Those are real risks. But the most actionable findings in our audit were all infrastructure:

  • Exposed API documentation
  • Missing security headers
  • No rate limiting
  • Unvalidated input
  • Information-leaking response headers

These are the same issues that affect any web application. AI doesn't change the fundamentals of web security — it adds to them.

5. Iterative Testing Works

Four rounds. Each round: scan → analyze → fix → rescan.

Round 1 established a baseline. Round 2 went deeper with better tools. Round 3 verified the fixes. Round 4 closed the remaining gaps.

This is how security testing should work. It's not a one-time checkbox. It's a cycle. Find, fix, verify. Then upgrade your tools and start again.

If you ran a security scan six months ago and haven't scanned since, you don't have security — you have a snapshot.

The Numbers

Here's the full picture across all four rounds:

Pixel API (pixel-api.pantojadigital.com):

|                 | Round 1 | Round 2 | Round 3 | Round 4 |
|-----------------|---------|---------|---------|---------|
| Scanner Version | v18     | v22     | v22+    | v22+    |
| Total Findings  | 11      | 16      | 28      | 12      |
| Critical        | 0       | 6       | 5       | 5       |
| High            | 2       | 3       | 2       | 4       |
| Medium          | 4       | 3       | 3       | 2       |
| Low             | 3       | 2       | 14      | 1       |
| Info            | 2       | 2       | 4       | 0       |
| Real Actionable | 4       | 6       | 1       | 1       |
| Issues Fixed    | 4       | 6       | 7       |         |

pantojadigital.com:

|                  | Scan 1  | Scan 2 | Scan 3 | Scan 4 |
|------------------|---------|--------|--------|--------|
| Total Findings   | 70 (FP) | 9      | 9      | 11     |
| Critical         | 0       | 0      | 0      | 0      |
| High             | 0       | 1      | 1      | 2      |
| Medium           | 0       | 4      | 3      | 3      |
| Low              | 0       | 4      | 5      | 3      |
| Info             | 0       | 0      | 0      | 3      |
| Real Exploitable | 0       | 0      | 0      | 0      |

Totals across 4 rounds:

  • Total scan time: ~3 hours
  • Total fix time: ~2 hours
  • Total API cost: ~$1.00
  • Vulnerabilities identified and fixed: 17 actionable issues
  • Scanner improvements triggered: 4 (Vercel 403 detection, NoSQL injection testing, attack chain engine, subdomain consolidation)

About five hours of combined scanning and fixing. Less than a dollar in compute. And both our product and our scanner are meaningfully better for it.

What This Means for Your Business

If you have an AI chatbot, voice agent, RAG system, or any AI-powered tool deployed on your website or product, here's the reality:

Your guardrails are necessary but not sufficient.

They protect the AI conversation. They don't protect the API, the infrastructure, the input validation, the transport layer, or the hosting environment. You need both.

A one-time scan is better than nothing, but iterative testing is what actually works.

Security isn't a state — it's a process. Scan, fix, verify. Upgrade your tools. Scan again.

You should be testing with tools that are getting smarter, not just running the same checks.

NullShield currently tests over 500 attack patterns, and we add new detections regularly — because the attack surface is evolving. The NoSQL injection scanner that found issues in Round 2 didn't exist in Round 1. The attack chain engine that identified composite risks was brand new. Static scanners give you a static view of a dynamic problem.

The cost of testing is trivial compared to the cost of a breach.

We tested our own chatbot for less than a dollar. The average cost of an AI-related security incident is significantly higher. Testing is the cheapest insurance you can buy.


We built Pixel. We protected it with guardrails. We tested it with NullShield. We found real vulnerabilities. We fixed them. We made both our product and our scanner better in the process.

That's the whole story. No spin. No hiding the findings. No pretending we were secure from day one.

Because if a security company can't be honest about its own vulnerabilities, it has no business testing yours.

Want to know what NullShield would find on your AI tools? We test chatbots, voice agents, RAG systems, and AI-powered applications with the same rigor we applied to our own.

Get your AI tools tested →

