The Memory Pipeline
Every session produces knowledge. The pipeline extracts it, stores it, enforces it, compresses it, and recalls it — across every future session.
Trace any statement on the platform page to real architecture, real code, and real production data. Nothing here is conceptual.
Voice callbacks with dynamic context generation. The team calls your phone with session-aware openers — not scripts.
Autonomous multi-milestone project execution. Subagent-per-milestone dispatch with fresh context windows.
Professional output generation — SOWs, SOPs, user manuals, frameworks. Template-driven with brand injection.
Not cherry-picked examples. Actual patterns from the running system that make persistent memory work.
# Every session appends to persistent memory
# with timestamped source tracking
timestamp = datetime.now(timezone.utc)
separator = f"\n\n--- {source} ({timestamp}) ---\n"
if memory:
    existing = memory.content.strip()
    memory.content = existing + separator + new_content
    memory.updated_at = datetime.now(timezone.utc)
else:
    memory = PersonaMemory(
        user_id=user_id,
        persona_key=persona_key,
        content=new_content,
    )
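The append pattern above references a PersonaMemory record and several surrounding variables. A minimal, self-contained sketch of the same idea, using a plain dataclass as a stand-in for the real ORM model (field names here are illustrative):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stand-in for the real PersonaMemory ORM model
@dataclass
class PersonaMemory:
    user_id: str
    persona_key: str
    content: str
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def append_memory(memory, user_id, persona_key, source, new_content):
    """Append new knowledge under a timestamped source header."""
    timestamp = datetime.now(timezone.utc)
    separator = f"\n\n--- {source} ({timestamp}) ---\n"
    if memory:
        memory.content = memory.content.strip() + separator + new_content
        memory.updated_at = timestamp
        return memory
    return PersonaMemory(user_id=user_id, persona_key=persona_key, content=new_content)

m = append_memory(None, "u1", "carl", "session-1", "Client prefers weekly calls.")
m = append_memory(m, "u1", "carl", "session-2", "Budget approved.")
print("--- session-2" in m.content)  # True
```

Each append leaves the earlier content untouched, so the record reads as a chronological log of sources.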
# Call context adapts to WHY the call is happening
def get_trigger_context(trigger_type, call_topic=None):
    if trigger_type == "manual":
        if call_topic:
            return (
                f'The user asked you to call about '
                f'a SPECIFIC topic: "{call_topic}". '
                'Lead with the topic.'
            )
    elif trigger_type == "team_blocked":
        return "The team is working on a milestone..."
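The excerpt above is elided. A self-contained version with a default fallback shows the full dispatch shape; the fallback wording and any trigger strings beyond "manual" and "team_blocked" are assumptions, not the production prompts:

```python
def get_trigger_context(trigger_type, call_topic=None):
    """Return an opener prompt tailored to why the call fired."""
    if trigger_type == "manual":
        if call_topic:
            return (f'The user asked you to call about a SPECIFIC topic: '
                    f'"{call_topic}". Lead with the topic.')
        return "The user asked you to call. Open with a brief check-in."
    if trigger_type == "team_blocked":
        return "The team is blocked on a milestone. Explain the blocker first."
    # Unknown trigger: fall back to a generic, session-aware opener
    return "Open with a short recap of the last session."

print(get_trigger_context("manual", "Q3 budget"))
```

Because the function returns a prompt fragment rather than a canned line, every opener stays specific to the trigger that caused the call.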
# Compression preserves critical preferences
# MUST-ENFORCE count validated before and after
before_count = content.count("MUST-ENFORCE")
compressed = await compress_with_ai(content)
after_count = compressed.count("MUST-ENFORCE")
if after_count < before_count:
    logger.warning("Compression dropped preferences — aborting")
    return None  # Never lose client preferences

# Optimistic lock prevents concurrent corruption
if memory.updated_at != lock_timestamp:
    return None  # Someone wrote while we compressed
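Put together, the compress-and-validate cycle can be sketched end to end. Here compress_with_ai is stubbed with a trivial blank-line stripper in place of the real LLM call, and a plain object stands in for the database row:

```python
import asyncio
from datetime import datetime, timezone

class Memory:
    def __init__(self, content):
        self.content = content
        self.updated_at = datetime.now(timezone.utc)

async def compress_with_ai(content):
    # Stub: the real implementation calls an LLM summarizer
    return "\n".join(line for line in content.splitlines() if line.strip())

async def safe_compress(memory):
    lock_timestamp = memory.updated_at           # snapshot before the slow call
    before = memory.content.count("MUST-ENFORCE")
    compressed = await compress_with_ai(memory.content)
    if compressed.count("MUST-ENFORCE") < before:
        return None                              # never lose client preferences
    if memory.updated_at != lock_timestamp:
        return None                              # someone wrote while we compressed
    memory.content = compressed
    memory.updated_at = datetime.now(timezone.utc)
    return memory

mem = Memory("MUST-ENFORCE: weekly calls\n\n\nmisc notes")
asyncio.run(safe_compress(mem))
print(mem.content)
```

The timestamp is snapshotted before the slow AI call and compared after it, so a concurrent write causes the whole operation to discard its result rather than overwrite fresher data.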
# Each milestone runs as an isolated subagent
# with a fresh 1M context window
async def advance_milestone(project, action, status):
    if action == "complete":
        sign_off_milestone(current)
        next_ms = kick_off_milestone(project)
        context = build_milestone_context(next_ms)
        return {"action": "continue", "context": context}
    elif action == "blocked":
        project.build_mode_paused = True
        initiate_call(persona="carl")
        return {"action": "paused"}
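The chaining behavior the endpoint enables can be sketched as a loop over its return value. Everything here is illustrative: run_subagent, the milestone names, and the result shape are assumptions, not the real orchestrator:

```python
def run_subagent(milestone, context):
    # Hypothetical dispatch: spin up a fresh subagent for one milestone
    return "complete"

def run_project(milestones):
    """Chain milestones: each completion kicks off the next with fresh context."""
    completed = []
    for ms in milestones:
        context = f"fresh context for {ms}"   # new subagent, new window
        status = run_subagent(ms, context)
        if status == "blocked":
            return {"action": "paused", "completed": completed}
        completed.append(ms)
    return {"action": "done", "completed": completed}

print(run_project(["design", "build", "ship"]))
# {'action': 'done', 'completed': ['design', 'build', 'ship']}
```

A blocked milestone exits the loop immediately, mirroring how the real endpoint pauses build mode and places a call instead of pressing on.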
Not aspirational projections. Real numbers from the running system, verifiable against the codebase.
Every pattern mentioned on this page has a real implementation. Here's where to look.
When a 4,000-line module splits into six, every consumer keeps working. The facade re-exports the public API. Zero breaking changes across 11 import sites.
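The facade pattern that keeps those import sites stable can be simulated in a single file. The submodule and function names below are illustrative, not the real call_pipeline layout:

```python
import sys
import types

# Simulate two hypothetical submodules produced by the split
dialing = types.ModuleType("call_pipeline.dialing")
dialing.place_call = lambda number: f"dialing {number}"

transcripts = types.ModuleType("call_pipeline.transcripts")
transcripts.save_transcript = lambda text: len(text)

# The facade __init__ re-exports the public API, so consumers keep
# writing `from call_pipeline import place_call` unchanged
facade = types.ModuleType("call_pipeline")
facade.place_call = dialing.place_call
facade.save_transcript = transcripts.save_transcript
facade.__all__ = ["place_call", "save_transcript"]

sys.modules["call_pipeline"] = facade
sys.modules["call_pipeline.dialing"] = dialing
sys.modules["call_pipeline.transcripts"] = transcripts

from call_pipeline import place_call
print(place_call("555-0100"))  # dialing 555-0100
```

The import site never learns which submodule a symbol moved to; only the facade changes when the internals do.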
call_pipeline/__init__.py → 6 submodules

AI-driven compression with preference count validation. If a single MUST-ENFORCE preference is lost during compression, the operation aborts and the original is preserved.
persona_memory_service.py → compress_persona_memory()

Each milestone runs as an isolated subagent with a fresh context window. The orchestrator chains completion calls — when one finishes, the next begins automatically.
ide.py → advance_milestone endpoint

After any component extraction, every JSX identifier is grep-verified against the import list. Three extraction bugs taught this rule. It's now enforced on every split.
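That grep check can be approximated in a few lines of Python: collect capitalized JSX tags and flag any that are neither imported nor defined in the file. This is a simplification of the real rule, which presumably also covers default imports and other edge cases:

```python
import re

def unimported_jsx_components(source):
    """Return JSX component tags used in `source` but never imported or defined."""
    used = set(re.findall(r"<([A-Z]\w*)", source))
    imported = set()
    for group in re.findall(r"import\s+{?\s*([\w,\s]+?)\s*}?\s+from", source):
        imported.update(name.strip() for name in group.split(","))
    defined = set(re.findall(r"(?:function|const)\s+([A-Z]\w*)", source))
    return sorted(used - imported - defined)

src = """
import { Button } from './ui';
export function Card() {
  return <div><Button /><Avatar /></div>;
}
"""
print(unimported_jsx_components(src))  # ['Avatar']
```

Run against a freshly split file, a non-empty result means the extraction dropped an import, which is exactly the class of bug the rule exists to catch.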
Enforced across all frontend decompositions

The architecture is real. The numbers are real. The team that built it remembers every decision that led here.
Meet the Team →