Building a Secure, AI-Powered Blog Platform on Cloud Run — From Zero to Production
A security-first serverless platform that publishes via API, narrates posts with AI, generates its own social cards, and ships a hidden, post-grounded AI assistant. Here's how every piece fits — and the bugs that shaped it.
FastAPI · Cloud Run · Terraform · Gemini · Google ADK · GCS · Cloud Build · Cloud Armor · Security
1. Why I Built This
I wanted a blog I could publish to by automation — drop an HTML file at an endpoint and have it go live, daily, without touching a CMS. But "just a blog" quickly became a canvas for everything I care about as a Cloud Architect, Security Specialist, and AI Architect: how do you make an upload endpoint safe? How do you add AI without exploding cost? How do you run a public AI agent that can't be turned against you?
The result is a serverless platform where security is a first-class concern at every boundary, AI features run at publish time (not per-view), and the whole thing is reproducible from Terraform and shipped through CI/CD.
2. Architecture
GitHub ──push──▶ Cloud Build ──build/push──▶ Artifact Registry
│ deploy │ pull
Custom Domain ▼ ▼
blog.domain ─▶ Global LB + Cloud Armor (WAF + rate limit)
│ internal-ingress-only
▼
┌──────────────────────────────────────┐
│ Cloud Run (FastAPI) │
│ public blog · auth upload API · │
│ audio · social cards · hidden agent │
└───┬──────────┬───────────┬────────────┘
│ │ │
┌─────▼───┐ ┌───▼─────┐ ┌──▼──────────────┐
│ GCS │ │ Secret │ │ Vertex AI │
│ posts/ │ │ Manager │ │ (Gemini) + TTS │
│ audio/ │ └─────────┘ └─────────────────┘
│ cards/ │
│ contact/├─finalize─▶ EventArc ─▶ Cloud Function ─▶ Email
└─────────┘
3. Features
| Feature | How it works |
|---|---|
| Authenticated HTML upload | POST /api/posts with an API key — designed for daily automation |
| HTML sanitization | Two-pass (BeautifulSoup decompose + bleach allowlist) — posts render in the site theme |
| Categories + aliases | Allowlisted; aliases resolve to the canonical category (e.g. AI → ai-ml) |
| Search + autosuggest | Client-side over a cached index — zero server cost |
| AI audio summaries | Gemini summary → Cloud TTS → MP3, generated at publish time |
| Auto social cards | 1200×630 og:image per post via Pillow |
| SEO | RSS, sitemap, robots, Open Graph / Twitter cards |
| Hidden AI assistant | Token-gated chat grounded to one post |
| Contact form | Honeypot + rate limit → GCS → Cloud Function email |
| Modern UI | Dark/light, animations, mobile nav, syntax highlighting, a11y |
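The category row in the table above can be sketched as a small normalization step. This is a minimal illustration, not the platform's actual code: only the AI → ai-ml alias comes from the post, while the other category names and the function name `resolve_category` are hypothetical.

```python
# Hypothetical allowlist and alias map; only "ai" -> "ai-ml" is taken from the post.
ALLOWED_CATEGORIES = {"ai-ml", "security", "cloud"}
CATEGORY_ALIASES = {"ai": "ai-ml"}


def resolve_category(raw: str) -> str:
    """Normalize a submitted category, apply aliases, then enforce the allowlist."""
    key = raw.strip().lower()
    key = CATEGORY_ALIASES.get(key, key)  # map an alias to its canonical name
    if key not in ALLOWED_CATEGORIES:
        raise ValueError(f"category not allowed: {raw!r}")
    return key
```

Resolving aliases before the allowlist check means the set of stored category names stays small and fixed, whatever the uploader sends.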
4. Security: First-Class, Not Bolted On
Every boundary has a control. The threats I cared most about come from the two public entry points: the upload endpoint and the AI endpoint are both attractive targets.
| Control | What it does | Threat |
|---|---|---|
| API key (constant-time) | hmac.compare_digest on the upload key | Unauthorized publishing, timing attacks |
| Secret Manager | Secrets injected at runtime, never in code/state | Credential leakage |
| Two-pass sanitization | Decompose <script>/<style>/<iframe> then allowlist | Stored XSS |
| Security headers + CSP | HSTS, X-Frame-Options, script-src 'self' | XSS, clickjacking, MIME sniffing |
| LB-only ingress | *.run.app blocked; traffic must pass the WAF | WAF/rate-limit bypass |
| Cloud Armor | SQLi/XSS preconfigured rules + rate limiting | Exploit probes, DoS |
| Slug validation | Rejects ../ and absolute paths | Path traversal |
| Least-privilege SAs | Each service gets only the IAM it needs | Blast-radius containment |
| Non-root container | Runs as appuser | Container escape |
| Generic error handler | Tracebacks to logs, not clients | Information disclosure |
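Two rows of the table above, the constant-time key check and slug validation, are small enough to sketch with the standard library. The function names and the exact slug pattern are assumptions for illustration. The post says only that `../` and absolute paths are rejected, and the allowlist pattern here is a stricter variant that covers both.

```python
import hmac
import re

# Assumed slug shape: lowercase letters and digits, with single hyphens between groups.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
MAX_SLUG_LENGTH = 80  # hypothetical limit


def api_key_is_valid(presented: str, expected: str) -> bool:
    """Compare keys in constant time so response timing does not reveal a partial match."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def slug_is_safe(slug: str) -> bool:
    """Accept only allowlisted characters, which excludes '../', '/' and absolute paths."""
    return len(slug) <= MAX_SLUG_LENGTH and SLUG_RE.fullmatch(slug) is not None
```

An allowlist is easier to reason about than a blocklist of bad sequences: anything that is not a plain slug fails, including encodings of `../` that a blocklist might miss.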
The bug behind the two-pass design: the first version relied on bleach.clean(strip=True), which removes disallowed tags but keeps their text content. A post's <style> block got stripped, but all its CSS was dumped into the page as visible text. The same path would surface <script> source. Fix: a first pass decomposes dangerous elements entirely before bleach runs. Strip ≠ remove.
5. The Hidden AI Agent
The platform includes a hidden chat assistant (opened via a secret URL hash + access token) that answers questions about one post at a time. Its security is architectural: the agent has zero tools, so a prompt injection can make it say something off-policy but never do anything. The user supplies only a question; the server fetches and injects the single post.
| Layer | Mechanism | Guarantee |
|---|---|---|
| Capability starvation | Agent(tools=[]) | No data access, no actions — text only |
| Server-side grounding | One post fetched by validated slug | Can't reach other data or inject content |
| Pre-flight code gate | Regex refuses code at 0 tokens | No code generation + no cost-DoS |
| Output backstop | Code fences replaced with a refusal | Catches model disobedience |
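The last two layers of the table, the pre-flight gate and the output backstop, might look roughly like the sketch below. The regex patterns, the refusal text, and the `call_model` parameter are all hypothetical; the post does not show the real patterns, and a production gate would need tuning against real traffic.

```python
import re
from typing import Callable, Optional

REFUSAL = "I can only discuss this post and cannot write code."
FENCE = "`" * 3  # the three-backtick marker that opens and closes a code fence

# Hypothetical pattern for questions that ask for code.
CODE_REQUEST_RE = re.compile(
    r"\b(write|generate|give me|show me|create)\b.{0,40}"
    r"\b(code|script|function|program|snippet)\b",
    re.IGNORECASE,
)
# Matches a complete fenced block in model output.
CODE_FENCE_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def preflight_refusal(question: str) -> Optional[str]:
    """Return a refusal before any model call if the question asks for code."""
    return REFUSAL if CODE_REQUEST_RE.search(question) else None


def backstop(model_output: str) -> str:
    """Replace any fenced code the model produced anyway with the refusal text."""
    return CODE_FENCE_RE.sub(REFUSAL, model_output)


def answer(question: str, call_model: Callable[[str], str]) -> str:
    refusal = preflight_refusal(question)
    if refusal is not None:
        return refusal  # the model is never called, so no tokens are spent
    return backstop(call_model(question))
```

The gate runs before the model call and costs nothing, while the backstop runs after it and catches whatever the gate missed. Neither is sufficient alone, which is why the table lists them as separate layers.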
6. Automated Publishing & CI/CD
Posting is a single authenticated API call — ideal for a daily cron that generates and uploads HTML. Infrastructure and app both ship through Cloud Build:
- App pipeline: on push to main, test → build → push → deploy, with the image pinned to the git SHA (immutable).
- Infra pipeline: Terraform plan on PR (read-only SA), apply on merge.
- Path-filtered triggers: app/** and infra/** changes run independent pipelines.
# Publish a post (what the daily automation calls)
curl -X POST https://blog.domain/api/posts \
-H "X-API-Key: $KEY" \
-F "file=@post.html;type=text/html" \
-F "category=ai-ml"
7. Scalability & Reliability
Scalability
- Cloud Run auto-scales 0→N; scale-to-zero when idle
- GCS has no practical storage ceiling at blog scale, and there is no DB to outgrow
- AI runs at publish time, not per-view — cost stays flat with traffic
- Client-side search = zero server cost
- CDN-ready global LB
Reliability
- Fail-fast startup → bad config never serves traffic (a revision that fails to start gets no traffic, so the last healthy one keeps serving)
- Graceful degradation → audio/card failure doesn't block publishing
- GCS versioning → accidental-delete recovery
- Immutable SHA-pinned deploys
- Tested CI gate → broken code never reaches prod
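The first two reliability bullets above, fail-fast startup and graceful degradation, can be sketched together. The setting names, the `publish` signature, and the status dictionary are assumptions for illustration; the post says only that a bucket check runs at startup and that audio or card failures do not block publishing.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger("blog")

# Hypothetical setting names checked before the app starts serving.
REQUIRED_SETTINGS = ("POSTS_BUCKET", "GEMINI_MODEL")


def validate_settings(env: Mapping[str, str]) -> None:
    """Raise at startup if configuration is incomplete, so the process exits early."""
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))


def publish(
    slug: str,
    save_post: Callable[[str], None],
    extras: Mapping[str, Callable[[str], None]],
) -> dict:
    """Save the post first, then run audio and card generation as best-effort extras."""
    save_post(slug)  # a failure here propagates: the post itself is mandatory
    status = {"post": "ok"}
    for name, generate in extras.items():
        try:
            generate(slug)
            status[name] = "ok"
        except Exception:
            # Log the traceback server-side and carry on with the next extra.
            logger.exception("optional step %s failed for %s", name, slug)
            status[name] = "failed"
    return status
```

Calling `validate_settings(os.environ)` at import or startup time turns a typo in an environment variable into a failed deploy instead of a 500 on the first upload.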
8. Problems Faced
| Problem | Root Cause | Fix |
|---|---|---|
| 500 on upload | Env var typo | Fail-fast bucket check at startup |
| Raw CSS dumped into post | bleach keeps stripped-tag content | Two-pass decompose-then-clean |
| FastAPI ↔ ADK dependency clash | starlette version conflict | Bumped FastAPI to compatible release |
| Gemini model 404 mid-build | Provider deprecated the model | Model ID → env var |
| Audio/agent routes 404 | include_router missing | Registered + route-existence test |
| Agent refusal cost 6k tokens | Refused after the LLM call | Pre-flight 0-token gate |
| UI cramped & center-aligned | One width for all content | Per-page responsive containers |
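The "raw CSS dumped into post" row comes down to the difference between stripping a tag and removing an element. The sketch below shows that difference using only the standard library's `html.parser`. It is a stand-in for illustration, not the BeautifulSoup and bleach pipeline the platform uses, and it is not a production-grade sanitizer. The allowlist and all names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

DANGEROUS = {"script", "style", "iframe"}      # removed together with their content
ALLOWED = {"p", "h2", "em", "strong", "code"}  # hypothetical tag allowlist


class SketchSanitizer(HTMLParser):
    """Drops dangerous elements whole and unwraps any other disallowed tag."""

    def __init__(self, remove_dangerous: bool = True) -> None:
        super().__init__(convert_charrefs=True)
        self.remove_dangerous = remove_dangerous
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dangerous element

    def handle_starttag(self, tag, attrs):
        if self.remove_dangerous and tag in DANGEROUS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED:
            self.out.append(f"<{tag}>")  # attributes are dropped in this sketch

    def handle_endtag(self, tag):
        if self.remove_dangerous and tag in DANGEROUS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))  # text is kept, but re-escaped


def sanitize(html: str, remove_dangerous: bool = True) -> str:
    parser = SketchSanitizer(remove_dangerous)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With `remove_dangerous=False` the sketch behaves like a strip-only pass: the `<style>` tag disappears but its CSS survives as visible text, which is the bug described in the table. With the default, the element and its content are both dropped.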
The CI gate runs py_compile plus the test suite, so a dropped feature or syntax error can't reach production.
9. Future Scope
| Enhancement | Why |
|---|---|
| OIDC upload auth | Replace static API keys with signed identity tokens |
| Cloud CDN | Edge-cache static + pages for speed and cost |
| Hard global rate limits | Move soft per-instance limits to shared Redis/Firestore |
| Post update / drafts | Add PUT + draft workflow |
| Agent streaming (SSE) | Stream long answers instead of one blob |
| Dedicated agent service | Keep the blog image lean; isolate ADK deps |
10. References
- Cloud Run: cloud.google.com/run/docs
- FastAPI: fastapi.tiangolo.com
- Terraform Google Provider: registry.terraform.io
- Cloud Build CI/CD: cloud.google.com/build/docs
- Cloud Armor: cloud.google.com/armor/docs
- Google ADK: google.github.io/adk-docs
- Vertex AI (Gemini): cloud.google.com/vertex-ai/generative-ai
- OWASP Top 10 for LLM Apps: owasp.org/.../top-10-for-llm
- MDN — Content Security Policy: developer.mozilla.org/.../CSP
- Secret Manager: cloud.google.com/secret-manager/docs