From 62651b4d2eba620fcfabb01d58fa5660e1e3b85d Mon Sep 17 00:00:00 2001
From: xcaliber <jasafpro@gmail.com>
Date: Tue, 13 Jan 2026 18:44:41 +0000
Subject: [PATCH] Add Multi agent orchestration design.md

---
 Multi agent orchestration design.md | 1692 +++++++++++++++++++++++++++
 1 file changed, 1692 insertions(+)
 create mode 100644 Multi agent orchestration design.md

diff --git a/Multi agent orchestration design.md b/Multi agent orchestration design.md
new file mode 100644
index 0000000..80746d9
--- /dev/null
+++ b/Multi agent orchestration design.md	
@@ -0,0 +1,1692 @@
+# Multi-Agent Orchestration System Design v2.0
+
+**Version:** 2.0.0  
+**Author:** Jeff Smith + Claude  
+**Date:** January 11, 2026  
+**Status:** RFC (Request for Comments)  
+**Data Source:** Venice.ai billing history Dec 12, 2025 - Jan 11, 2026 (13,582 transactions)
+
+---
+
+## Executive Summary
+
+This document describes the architecture for a multi-model AI development system built on Open WebUI, Venice.ai, and Gitea. Based on 30 days of actual billing data, we demonstrate that full NPE automation costs **0.21 DIEM/day** (2.6% of the 8.1 DIEM daily budget), leaving **7.89 DIEM** for interactive work.
+
+**Core Thesis:** Your actual usage patterns prove orchestrated automation is not just feasible—it's virtually free. The real challenge is context management and cache optimization, not model costs.
+
+---
+
+## Table of Contents
+
+1. [Billing Data Analysis](#1-billing-data-analysis)
+2. [Model Selection Strategy](#2-model-selection-strategy)
+3. [Context Management](#3-context-management)
+4. [Tool Architecture](#4-tool-architecture)
+5. [NPE Personas & Roles](#5-npe-personas--roles)
+6. [Cron & Scheduling](#6-cron--scheduling)
+7. [Workflow Patterns](#7-workflow-patterns)
+8. [Cost Management](#8-cost-management)
+9. [Implementation Roadmap](#9-implementation-roadmap)
+10. [Open Questions](#10-open-questions)
+
+---
+
+## 1. Billing Data Analysis
+
+### 1.1 30-Day Summary
+
+```
+Period:           Dec 12, 2025 - Jan 11, 2026
+Total Records:    13,582 billing events
+Total Spend:      61.24 DIEM + 2.86 USD
+Days Active:      18 days
+Average Daily:    3.56 DIEM/day
+Max Daily:        9.96 DIEM (Jan 2, 2026)
+Median Daily:     3.72 DIEM/day
+```
+
+**Budget Analysis:**
+- Daily Budget: 8.1 DIEM (staked)
+- Average Spend: 3.56 DIEM/day
+- **Average Surplus: 4.54 DIEM/day (56% unutilized)**
+- This surplus is lost at the 19:00 EST reset
+
+### 1.2 Spend by Model Family
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│                    ACTUAL SPEND BY MODEL (30 days)                 │
+├────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  GLM-4.6         ████████████████████████████  16.66 DIEM (26.0%)  │
+│  Qwen            ████████████████████████      15.60 DIEM (24.3%)  │
+│  Claude-Opus-4.5 ██████████████████            11.86 DIEM (18.5%)  │
+│  Grok-41-Fast    ██████████                     6.71 DIEM (10.5%)  │
+│  Image-Gen       █████████                      6.01 DIEM  (9.4%)  │
+│  MiniMax-M21     █████                          3.31 DIEM  (5.2%)  │
+│  Kimi-K2         ██                             1.53 DIEM  (2.4%)  │
+│  GLM-4.7         ██                             1.25 DIEM  (2.0%)  │
+│  Other           █                              1.17 DIEM  (1.7%)  │
+│                                                                     │
+│  TOTAL                                         64.10 DIEM          │
+│                                                                     │
+└────────────────────────────────────────────────────────────────────┘
+```
+
+### 1.3 Actual Effective Rates (DIEM per 1M Tokens)
+
+From your billing data, sorted by cost efficiency:
+
+| Model | Input Rate | Output Rate | Cache Rate | **Effective Rate** | Calls | Total Cost |
+|-------|------------|-------------|------------|-------------------|-------|------------|
+| Qwen-Instruct | $0.071 | $0.27 | - | **0.0626** | 408 | 0.92 |
+| Grok-Code | $0.140 | $1.87 | $0.030 | **0.0721** | 611 | 1.67 |
+| DeepSeek | $0.329 | $1.00 | $0.200 | **0.1825** | 36 | 0.10 |
+| Grok-41-Fast | $0.314 | $1.25 | $0.125 | **0.1857** | 1,152 | 5.03 |
+| MiniMax-M21 | $0.316 | $1.60 | $0.040 | **0.2232** | 360 | 3.31 |
+| Qwen-Thinking | $0.450 | $3.50 | - | **0.3339** | 105 | 1.61 |
+| Qwen-Coder | $0.750 | $3.00 | - | **0.3830** | 530 | 12.99 |
+| Kimi-K2-Thinking | $0.595 | $3.20 | $0.375 | **0.4152** | 111 | 1.53 |
+| GLM-4.6 | $0.850 | $2.75 | - | **0.4455** | 1,724 | 16.66 |
+| Claude-Opus-4.5 | $6.000 | $30.00 | - | **5.2751** | 78 | 11.86 |
+
+**Key Insights:**
+1. **Grok-41-Fast is 28× cheaper than Claude** per token (0.1857 vs. 5.2751 effective)
+2. **Qwen-Instruct is 84× cheaper than Claude** for bulk work
+3. **Cache hits reduce Grok-41-Fast input costs by 75%** ($0.125 cached vs. $0.500 full)
+4. Claude accounts for only 78 calls but 18.5% of total spend
+
+### 1.4 Context Size Distribution
+
+Your actual context sizes reveal optimization opportunities:
+
+```
+GROK-41-FAST (1,152 calls):
+  Median: 5,291 tokens | P75: 7,217 | Max: 582,506
+  Distribution: 0-5k: 1671 | 5-10k: 1245 | 10-20k: 485 | 20-50k: 23 | 50k+: 17
+  ✓ WELL MANAGED - 85% of calls under 10k tokens
+
+KIMI-K2-THINKING (111 calls):
+  Median: 5,164 tokens | P75: 17,797 | Max: 48,893
+  Distribution: 0-5k: 144 | 5-10k: 39 | 10-20k: 83 | 20-50k: 34
+  ⚠ CONTEXT BLEEDING - P75 jumps to 18k, needs pruning
+
+MINIMAX-M21 (360 calls):
+  Median: 10,090 tokens | P75: 23,071 | Max: 169,708
+  Distribution: 0-5k: 211 | 5-10k: 202 | 10-20k: 181 | 20-50k: 198 | 50k+: 38
+  ⚠ HEAVY CONTEXTS - 11% of calls over 50k tokens
+
+QWEN-CODER (530 calls):
+  Median: 17,016 tokens | P75: 47,601 | Max: 253,462
+  Distribution: 0-5k: 340 | 5-10k: 124 | 10-20k: 82 | 20-50k: 284 | 50k+: 230
+  ⚠ CODE CONTEXTS ARE LARGE - expected but optimize where possible
+
+CLAUDE-OPUS-4.5 (78 calls):
+  Median: 7,198 tokens | P75: 11,956 | Max: 51,368
+  ✓ REASONABLE - given high cost, context is well-controlled
+```
+
+### 1.5 Cache Efficiency Analysis
+
+```
+┌────────────────────────────────────────────────────────────────────┐
+│                    CACHE HIT RATES BY MODEL                        │
+├────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│  GROK-CODE:     ████████████████████████████████████  67.8%        │
+│                 Saved: $1.08 | Cache rate: $0.030 vs $0.250 full   │
+│                                                                     │
+│  KIMI-K2:       █████                                   9.8%        │
+│                 Saved: $0.05 | Cache rate: $0.375 vs $0.750 full   │
+│                                                                     │
+│  GROK-41-FAST:  █████                                   8.8%        │
+│                 Saved: $0.27 | Cache rate: $0.125 vs $0.500 full   │
+│                                                                     │
+│  MINIMAX-M21:   ████                                    7.2%        │
+│                 Saved: $0.16 | Cache rate: $0.040 vs $0.400 full   │
+│                                                                     │
+└────────────────────────────────────────────────────────────────────┘
+```
+
+**Grok-Code's 67.8% cache hit rate** demonstrates what's possible with sequential operations using the same system prompt and context.
+
+### 1.6 Hourly Usage Pattern (EST)
+
+```
+Hour (EST) | Spend    | Activity Level
+───────────┼──────────┼────────────────────────────────
+  00-03    |   0.00   | 💤 Dead
+  04-05    |   0.84   | 🌅 Early morning
+  06-08    |   9.86   | ☕ Morning peak
+  09-11    |   3.57   | 📊 Late morning
+  12-13    |   7.34   | 🍽️ Lunch peak
+  14-15    |   8.13   | 💻 Afternoon work
+  16-17    |  14.25   | 🔥 PEAK (4-6pm)
+  18-19    |   8.13   | 🌆 Evening
+  20-21    |   9.93   | 🌙 Night session
+  22-23    |   0.00   | 💤 Dead
+
+AUTOMATION WINDOW: 22:00 - 07:00 EST (9 hours of minimal usage)
+```
+
+---
+
+## 2. Model Selection Strategy
+
+### 2.1 Tier Architecture (Data-Driven)
+
+Based on actual billing patterns:
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                    MODEL SELECTION PYRAMID                          │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                      │
+│                        ┌─────────────┐                              │
+│                        │   TIER 4    │  Claude Opus 4.5             │
+│                        │   ORACLE    │  5.28 DIEM/1M effective      │
+│                        │    <2%      │  Architecture, Security,     │
+│                        └──────┬──────┘  Deadlock Resolution         │
+│                               │         Budget: 0.05 DIEM/day       │
+│                               │                                      │
+│                     ┌─────────┴─────────┐                           │
+│                     │      TIER 3       │  Kimi-K2, Qwen-Thinking   │
+│                     │    REASONING      │  0.33-0.42 DIEM/1M        │
+│                     │       5%          │  Complex analysis,        │
+│                     └─────────┬─────────┘  Multi-step reasoning     │
+│                               │            Budget: 0.10 DIEM/day    │
+│                               │                                      │
+│               ┌───────────────┴───────────────┐                     │
+│               │           TIER 2              │  MiniMax, DeepSeek  │
+│               │         BALANCED              │  0.18-0.22 DIEM/1M  │
+│               │           15%                 │  Standard tasks,    │
+│               └───────────────┬───────────────┘  Code generation    │
+│                               │                  Budget: 0.50 DIEM  │
+│                               │                                      │
+│  ┌────────────────────────────┴────────────────────────────┐        │
+│  │                       TIER 1                             │        │
+│  │                    WORKHORSES                            │        │
+│  │                       78%                                │        │
+│  │  Grok-41-Fast: 0.19 DIEM/1M | Grok-Code: 0.07 DIEM/1M  │        │
+│  │  Qwen-Instruct: 0.06 DIEM/1M                            │        │
+│  │  Routing, Quick checks, Bulk processing, PM tasks       │        │
+│  │  Budget: Remaining (~7.45 DIEM/day)                     │        │
+│  └─────────────────────────────────────────────────────────┘        │
+│                                                                      │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+### 2.2 Model Selection Matrix
+
+| Task Type | Primary Model | Fallback | Cost/Call (DIEM) | Rationale |
+|-----------|---------------|----------|-----------|-----------|
+| **Routing/Dispatch** | Grok-41-Fast | Qwen-Instruct | 0.002 | Cheapest with cache |
+| **PM Coordination** | Grok-41-Fast | MiniMax | 0.003 | Simple decisions |
+| **Code Generation** | Grok-Code | Grok-41-Fast | 0.005 | 67.8% cache hits |
+| **Code Review** | Grok-41-Fast | DeepSeek | 0.004 | Good reasoning |
+| **Bulk Processing** | Qwen-Instruct | Grok-41-Fast | 0.001 | 0.06 DIEM/1M |
+| **Complex Reasoning** | Kimi-K2 | Qwen-Thinking | 0.012 | Thinking models |
+| **Architecture** | Claude-Opus | Kimi-K2 | 0.054 | Only when needed |
+| **Security Review** | Claude-Opus | - | 0.054 | Non-negotiable |
+
+### 2.3 Model Selection Logic
+
+```python
+from enum import Enum
+from dataclasses import dataclass
+from typing import Optional
+
+class TaskComplexity(Enum):
+    TRIVIAL = 1   # Routing, yes/no decisions
+    SIMPLE = 2    # Single-step tasks
+    MODERATE = 3  # Multi-step, needs context
+    COMPLEX = 4   # Reasoning required
+    CRITICAL = 5  # Architecture, security
+
+@dataclass
+class ModelConfig:
+    id: str
+    input_rate: float   # DIEM per 1M tokens
+    output_rate: float  # DIEM per 1M tokens
+    cache_rate: float   # DIEM per 1M tokens (0 if no cache)
+    max_context: int    # Recommended max context
+    tier: int
+
+MODELS = {
+    "qwen-instruct": ModelConfig("qwen3-235b-a22b-instruct-2507", 0.15, 0.27, 0, 32000, 1),
+    "grok-code": ModelConfig("grok-code-fast-1", 0.25, 1.87, 0.03, 16000, 1),
+    "grok-fast": ModelConfig("grok-41-fast", 0.50, 1.25, 0.125, 16000, 1),
+    "deepseek": ModelConfig("deepseek-chat", 0.50, 1.00, 0.20, 32000, 2),
+    "minimax": ModelConfig("minimax-m21", 0.40, 1.60, 0.04, 32000, 2),
+    "kimi": ModelConfig("kimi-k2", 0.75, 3.20, 0.375, 32000, 3),
+    "qwen-thinking": ModelConfig("qwen3-235b-a22b-thinking-2507", 0.45, 3.50, 0, 32000, 3),
+    "claude": ModelConfig("claude-opus-4-5", 6.00, 30.00, 0, 200000, 4),
+}
+
+def select_model(
+    task_type: str,
+    complexity: TaskComplexity,
+    budget_remaining: float,
+    context_size: int = 0
+) -> str:
+    """
+    Select optimal model based on task, complexity, and budget.
+    
+    Returns Venice model ID.
+    """
+    # Budget gates
+    if budget_remaining < 0.5:
+        return MODELS["qwen-instruct"].id  # Emergency mode
+    
+    if budget_remaining < 2.0:
+        # Low budget - prefer Tier 1; CRITICAL work drops from Claude to Kimi (Tier 3)
+        if complexity == TaskComplexity.CRITICAL:
+            return MODELS["kimi"].id  # Downgrade from Claude
+        return MODELS["grok-fast"].id
+    
+    # Task-based selection
+    selection_map = {
+        # Task type -> {complexity: model_key}
+        "routing": {
+            TaskComplexity.TRIVIAL: "qwen-instruct",
+            TaskComplexity.SIMPLE: "grok-fast",
+            TaskComplexity.MODERATE: "grok-fast",
+        },
+        "pm_coordination": {
+            TaskComplexity.TRIVIAL: "grok-fast",
+            TaskComplexity.SIMPLE: "grok-fast",
+            TaskComplexity.MODERATE: "minimax",
+            TaskComplexity.COMPLEX: "kimi",
+        },
+        "code_generation": {
+            TaskComplexity.SIMPLE: "grok-code",
+            TaskComplexity.MODERATE: "grok-code",
+            TaskComplexity.COMPLEX: "minimax",
+            TaskComplexity.CRITICAL: "kimi",
+        },
+        "code_review": {
+            TaskComplexity.SIMPLE: "grok-fast",
+            TaskComplexity.MODERATE: "grok-fast",
+            TaskComplexity.COMPLEX: "deepseek",
+            TaskComplexity.CRITICAL: "claude",
+        },
+        "architecture": {
+            TaskComplexity.MODERATE: "kimi",
+            TaskComplexity.COMPLEX: "kimi",
+            TaskComplexity.CRITICAL: "claude",
+        },
+        "security": {
+            TaskComplexity.SIMPLE: "grok-fast",
+            TaskComplexity.MODERATE: "kimi",
+            TaskComplexity.COMPLEX: "claude",
+            TaskComplexity.CRITICAL: "claude",
+        },
+    }
+    
+    task_map = selection_map.get(task_type, {})
+    model_key = task_map.get(complexity, "grok-fast")
+    
+    return MODELS[model_key].id
+
+
+def estimate_cost(model_key: str, input_tokens: int, output_tokens: int, cache_tokens: int = 0) -> float:
+    """Estimate DIEM cost for a completion."""
+    model = MODELS[model_key]
+    
+    input_cost = (max(input_tokens - cache_tokens, 0) / 1_000_000) * model.input_rate  # uncached input only
+    cache_cost = (cache_tokens / 1_000_000) * model.cache_rate
+    output_cost = (output_tokens / 1_000_000) * model.output_rate
+    
+    return input_cost + cache_cost + output_cost
+```
+
+---
+
+## 3. Context Management
+
+### 3.1 The Context Problem
+
+Your billing data reveals context size is the **primary cost driver**:
+
+- Kimi calls at 48K tokens: **0.040 DIEM each**
+- Kimi calls at 5K tokens: **0.006 DIEM each** (6.7× cheaper)
+- Grok calls at 5K tokens: **0.003 DIEM each**
+
+**Every 10K tokens of unnecessary context costs ~0.005-0.015 DIEM.**
+
+### 3.2 Context Budget by Role
+
+```python
+# Maximum context tokens per NPE role
+CONTEXT_LIMITS = {
+    "orchestrator": 4_000,   # Minimal - just state and decisions
+    "pm": 6_000,             # Moderate - task list and status
+    "coder": 12_000,         # Larger - needs code context
+    "reviewer": 8_000,       # Moderate - diff + surrounding code
+    "editorial": 10_000,     # Article + guidelines
+}
+
+# Warning thresholds (emit warning in logs)
+CONTEXT_WARNINGS = {
+    "orchestrator": 3_000,
+    "pm": 5_000,
+    "coder": 10_000,
+    "reviewer": 6_000,
+    "editorial": 8_000,
+}
+```
+
+### 3.3 Context Management Strategy
+
+```python
+from typing import Optional
+import tiktoken
+
+class ContextManager:
+    """
+    Aggressive context pruning for cost control.
+    
+    Strategy:
+    1. Always keep: system prompt, last 2 user messages, last assistant response
+    2. Summarize: everything older than 3 exchanges
+    3. Prune: tool outputs older than 2 exchanges
+    4. Compress: code blocks to signatures only (in summaries)
+    """
+    
+    def __init__(self, model: str = "grok-41-fast"):
+        self.encoder = tiktoken.get_encoding("cl100k_base")  # approximate for non-OpenAI models
+        self.summary_model = model  # Venice model ID of a cheap model used for summaries
+    
+    def count_tokens(self, text: str) -> int:
+        """Count tokens in text."""
+        return len(self.encoder.encode(text))
+    
+    def count_messages(self, messages: list[dict]) -> int:
+        """Count total tokens in message list."""
+        total = 0
+        for msg in messages:
+            total += self.count_tokens(msg.get("content") or "")  # content may be None
+            # Add overhead for role, etc.
+            total += 4
+        return total
+    
+    async def prepare_context(
+        self,
+        role: str,
+        system_prompt: str,
+        messages: list[dict],
+        force_limit: Optional[int] = None
+    ) -> tuple[str, list[dict]]:
+        """
+        Prune context to fit within role's budget.
+        
+        Returns: (possibly_modified_system_prompt, pruned_messages)
+        """
+        limit = force_limit or CONTEXT_LIMITS.get(role, 8_000)
+        warning = CONTEXT_WARNINGS.get(role, limit - 1000)
+        
+        system_tokens = self.count_tokens(system_prompt)
+        message_tokens = self.count_messages(messages)
+        total = system_tokens + message_tokens
+        
+        if total <= limit:
+            if total > warning:
+                print(f"⚠️ Context at {total} tokens (warning: {warning})")
+            return system_prompt, messages
+        
+        print(f"🔄 Context pruning: {total} -> {limit} tokens")
+        
+        # Strategy 1: Keep essential messages
+        essential = []
+        essential_tokens = 0
+        
+        # Always keep last 3 messages (2 user + 1 assistant typically)
+        for msg in messages[-3:]:
+            essential.append(msg)
+            essential_tokens += self.count_tokens(msg.get("content") or "") + 4
+        
+        remaining_budget = limit - system_tokens - essential_tokens - 500  # Buffer for summary
+        
+        if remaining_budget < 200:
+            # Can't fit summary, just use essential
+            return system_prompt, essential
+        
+        # Strategy 2: Summarize older messages
+        old_messages = messages[:-3]
+        if old_messages:
+            summary = await self._summarize_messages(old_messages, remaining_budget)
+            
+            # Prepend summary as system context
+            augmented_system = f"{system_prompt}\n\n## Previous Context Summary\n{summary}"
+            return augmented_system, essential
+        
+        return system_prompt, essential
+    
+    async def _summarize_messages(self, messages: list[dict], max_tokens: int) -> str:
+        """
+        Summarize old messages into compact form.
+        Uses Grok for cheap summarization (~0.001 DIEM).
+        """
+        # Build summary request
+        content_parts = []
+        for msg in messages[-10:]:  # Last 10 messages max
+            role = msg.get("role", "unknown")
+            text = msg.get("content", "")[:1000]  # Truncate each
+            content_parts.append(f"{role}: {text}")
+        
+        content = "\n---\n".join(content_parts)
+        
+        summary_prompt = f"""Summarize this conversation in {max_tokens // 4} words or less.
+Focus on: decisions made, current task, blockers, key context.
+Do NOT include pleasantries or meta-commentary.
+
+Conversation:
+{content[:4000]}
+
+Summary:"""
+        
+        # Call cheap model for summary (venice_completion is assumed to be an async helper defined elsewhere that returns text)
+        response = await venice_completion(
+            model=self.summary_model,
+            messages=[{"role": "user", "content": summary_prompt}],
+            max_tokens=min(max_tokens, 500)
+        )
+        
+        return response.strip()
+    
+    def compress_code_context(self, code: str, max_lines: int = 50) -> str:
+        """
+        Compress code to essential structure for context.
+        Keeps signatures, docstrings, removes implementation.
+        """
+        lines = code.split("\n")
+        
+        if len(lines) <= max_lines:
+            return code
+        
+        compressed = []
+        in_docstring = False  # True while inside a multi-line docstring
+        keep = ("import ", "from ", "class ", "def ", "async def ")
+        
+        for line in lines:
+            stripped = line.strip()
+            quotes = stripped.count('"""') + stripped.count("'''")
+            
+            if in_docstring:
+                compressed.append(line)
+                in_docstring = quotes % 2 == 0  # an odd delimiter count closes it
+            elif stripped.startswith(('"""', "'''")):
+                compressed.append(line)
+                in_docstring = quotes % 2 == 1  # opened but not closed on this line
+            elif stripped.startswith(keep):
+                compressed.append(line)
+            elif stripped == "":
+                compressed.append("")
+            # Any other line is implementation detail and is dropped
+        
+        return "\n".join(compressed)
+```
+
+### 3.4 Cache Optimization
+
+Grok-Code's 67.8% cache hit rate shows what's achievable:
+
+```python
+class CacheOptimizer:
+    """
+    Maximize Venice's prompt caching for cost savings.
+    
+    Venice caches the PREFIX of prompts. To maximize hits:
+    1. Put static content (system prompt) FIRST
+    2. Put stable context (project info) SECOND
+    3. Put variable content (current task) LAST
+    """
+    
+    @staticmethod
+    def build_cacheable_prompt(
+        system_prompt: str,
+        project_context: str,
+        task_context: str,
+        user_message: str
+    ) -> list[dict]:
+        """
+        Build message list optimized for cache hits.
+        
+        Structure:
+        1. System prompt (static) - CACHED after first call
+        2. Project context as system addendum - CACHED if unchanged
+        3. Task context as assistant message - Varies
+        4. User message - Always new
+        """
+        messages = [
+            {
+                "role": "system",
+                "content": f"{system_prompt}\n\n## Project Context\n{project_context}"
+            }
+        ]
+        
+        if task_context:
+            messages.append({
+                "role": "assistant",
+                "content": f"Current task context:\n{task_context}"
+            })
+        
+        messages.append({
+            "role": "user",
+            "content": user_message
+        })
+        
+        return messages
+    
+    @staticmethod
+    def batch_similar_tasks(tasks: list[dict]) -> list[list[dict]]:
+        """
+        Group tasks by system prompt and project to maximize cache hits.
+        
+        Running 5 code reviews for the same project sequentially lets
+        prompts 2-5 reuse the cached prefix (~75% cheaper input on Grok-41-Fast).
+        """
+        batches = {}
+        
+        for task in tasks:
+            key = (task.get("system_prompt_hash"), task.get("project_id"))
+            if key not in batches:
+                batches[key] = []
+            batches[key].append(task)
+        
+        return list(batches.values())
+```
+
+---
+
+## 4. Tool Architecture
+
+### 4.1 Current Tools
+
+| Tool | Version | Purpose | Used By |
+|------|---------|---------|---------|
+| `gitea_dev` | 1.1.0 | File ops, branches, PRs, issues | Coder NPEs |
+| `gitea_admin` | 1.1.0 | Teams, permissions, org management | PM NPEs |
+| `venice_info` | 1.0.0 | Model discovery, cost tracking | All NPEs |
+| `editorial_pipeline` | 1.0.0 | Content creation workflow | Editorial NPEs |
+
+### 4.2 Required New Tools
+
+#### 4.2.1 Cost Tracker Tool (`cost_tracker.py`)
+
+**Purpose:** Real-time cost monitoring and budget enforcement.
+
+```python
+class Valves(BaseModel):
+    VENICE_API_KEY: str = Field(default="", description="Venice API key")
+    DAILY_BUDGET: float = Field(default=8.1, description="Daily DIEM budget")
+    AUTOMATION_RESERVE: float = Field(default=0.5, description="Reserve for automation")
+    ESCALATION_RESERVE: float = Field(default=0.1, description="Reserve for Claude escalations")
+    WARNING_THRESHOLD: float = Field(default=0.8, description="Warn at this fraction of the daily budget")
+
+class Tools:
+    async def get_balance(self) -> str:
+        """Get current Venice DIEM balance."""
+    
+    async def get_remaining_today(self) -> str:
+        """Get remaining budget for today (resets 19:00 EST)."""
+    
+    async def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> str:
+        """Estimate DIEM cost for a completion."""
+    
+    async def can_afford(self, estimated_cost: float, include_reserve: bool = True) -> str:
+        """Check if operation fits within budget."""
+    
+    async def record_cost(self, amount: float, model: str, npe_id: str, project_id: str = None) -> str:
+        """Record actual cost after operation."""
+    
+    async def get_daily_report(self) -> str:
+        """Get today's spend breakdown by model and NPE."""
+    
+    async def check_budget_alerts(self) -> str:
+        """Check for budget warnings and return any alerts."""
+```
+
+#### 4.2.2 Project Manager Tool (`project_manager.py`)
+
+**Purpose:** Manage development projects and work items.
+
+```python
+class Tools:
+    # Project CRUD
+    async def create_project(self, name: str, description: str, daily_budget: float = 1.0) -> str: ...
+    async def get_project(self, project_id: str) -> str: ...
+    async def list_projects(self, status: str = "active") -> str: ...
+    async def update_project(self, project_id: str, **updates) -> str: ...
+    
+    # Work Item Management
+    async def create_work_item(self, project_id: str, title: str, item_type: str, assigned_model: str = None) -> str: ...
+    async def get_work_item(self, item_id: str) -> str: ...
+    async def list_work_items(self, project_id: str, status: str = "open") -> str: ...
+    async def update_work_item(self, item_id: str, **updates) -> str: ...
+    async def add_comment(self, item_id: str, comment: str, author: str) -> str: ...
+    
+    # Budget Tracking
+    async def get_project_budget(self, project_id: str) -> str: ...
+    async def record_project_expense(self, project_id: str, amount: float, description: str) -> str: ...
+```
+
+#### 4.2.3 NPE Manager Tool (`npe_manager.py`)
+
+**Purpose:** Create and manage NPE identities.
+
+```python
+class Tools:
+    # NPE Lifecycle
+    async def create_npe(self, name: str, role: str, model: str, persona: str, tools: list[str]) -> str: ...
+    async def get_npe(self, npe_id: str) -> str: ...
+    async def list_npes(self, role: str = None, status: str = "active") -> str: ...
+    async def update_npe(self, npe_id: str, **updates) -> str: ...
+    async def deactivate_npe(self, npe_id: str) -> str: ...
+    
+    # Activity Tracking
+    async def get_npe_activity(self, npe_id: str, days: int = 7) -> str: ...
+    async def get_npe_cost_report(self, npe_id: str, period: str = "today") -> str: ...
+```
+
+#### 4.2.4 Workflow Engine Tool (`workflow_engine.py`)
+
+**Purpose:** Execute and monitor multi-step workflows.
+
+```python
+class Tools:
+    # Workflow Execution
+    async def start_workflow(self, workflow_type: str, params: dict, project_id: str = None) -> str: ...
+    async def get_workflow_status(self, workflow_id: str) -> str: ...
+    async def complete_step(self, workflow_id: str, step_id: str, result: dict) -> str: ...
+    async def fail_step(self, workflow_id: str, step_id: str, error: str) -> str: ...
+    
+    # Circuit Breaker
+    async def check_circuit(self, workflow_id: str) -> str: ...
+    async def trip_circuit(self, workflow_id: str, reason: str) -> str: ...
+    async def reset_circuit(self, workflow_id: str) -> str: ...
+    
+    # Escalation
+    async def escalate(self, workflow_id: str, reason: str, to_model: str = "claude-opus-4-5") -> str: ...
+```
+
+### 4.3 Tool Permission Matrix
+
+| Tool | Orchestrator | PM | Coder | Reviewer |
+|------|:------------:|:--:|:-----:|:--------:|
+| cost_tracker | ✓ | R | - | - |
+| project_manager | ✓ | ✓ | R | R |
+| npe_manager | ✓ | - | - | - |
+| workflow_engine | ✓ | ✓ | - | - |
+| gitea_dev (read) | ✓ | ✓ | ✓ | ✓ |
+| gitea_dev (write) | - | - | ✓ | - |
+| gitea_admin | ✓ | ✓ | - | - |
+| venice_info | ✓ | ✓ | ✓ | ✓ |
+
+---
+
+## 5. NPE Personas & Roles
+
+### 5.1 NPE Identity Structure
+
+```python
+@dataclass
+class NPEIdentity:
+    # Core Identity
+    id: str                      # e.g., "npe-pm-main"
+    name: str                    # e.g., "Project Manager - Main"
+    role: str                    # orchestrator, pm, coder, reviewer
+    status: str                  # active, suspended, archived
+    
+    # Model Configuration
+    base_model: str              # Venice model ID
+    tier: int                    # 1-4 based on cost
+    
+    # Context Limits
+    max_context: int             # Max input tokens
+    target_output: int           # Target output tokens
+    
+    # Budget
+    daily_budget: float          # DIEM limit per day
+    spent_today: float           # Running total
+    
+    # Tools
+    enabled_tools: list[str]     # Tool IDs this NPE can use
+```
+
+### 5.2 Orchestrator Persona
+
+**ID:** `npe-orchestrator`  
+**Model:** `grok-41-fast` (Tier 1)  
+**Cost:** ~0.003 DIEM/call  
+**Context Limit:** 4,000 tokens
+
+```markdown
+# System Prompt: Orchestrator NPE
+
+You are the Orchestrator, responsible for coordinating all automated development work.
+
+## Core Responsibilities
+1. Receive triggers from cron jobs and webhooks
+2. Route work to appropriate NPEs
+3. Monitor workflow progress
+4. Handle escalations
+5. Manage budget allocation
+
+## Constraints
+- You do NOT perform work yourself
+- You MUST check budget before spawning work
+- You MUST use structured JSON for all outputs
+- You MUST keep context under 4,000 tokens
+
+## Output Format
+All outputs must be valid JSON:
+
+### Spawn Work
+{
+  "action": "spawn_workflow",
+  "workflow_type": "code_review|feature|bugfix",
+  "project_id": "string",
+  "assigned_npe": "npe-id",
+  "budget_limit": 0.5,
+  "priority": "high|medium|low"
+}
+
+### Route Escalation
+{
+  "action": "escalate",
+  "workflow_id": "string",
+  "reason": "string",
+  "to_model": "claude-opus-4-5",
+  "context_summary": "string (max 500 words)"
+}
+
+### Budget Check
+{
+  "action": "budget_check",
+  "remaining": 5.5,
+  "can_proceed": true,
+  "warnings": []
+}
+
+## Decision Rules
+1. If remaining budget < 0.5 DIEM: STOP all non-critical work
+2. If task is security-related: Route to Claude
+3. If task is simple routing: Do it yourself (no spawn needed)
+4. If stuck for > 30 minutes: Escalate
+```
+
+### 5.3 PM Persona
+
+**ID:** `npe-pm-{project}`  
+**Model:** `grok-41-fast` (Tier 1)  
+**Cost:** ~0.003 DIEM/call  
+**Context Limit:** 6,000 tokens
+
+```markdown
+# System Prompt: Project Manager NPE
+
+You are a Project Manager responsible for coordinating development work.
+
+## Core Responsibilities
+1. Break down requirements into work items
+2. Assign work to Coder NPEs
+3. Review completed work
+4. Track progress and budget
+
+## Constraints
+- You do NOT write code
+- You do NOT modify files
+- You MUST check project budget before assigning work
+- You MUST use structured JSON for work assignments
+
+## Work Assignment Format
+{
+  "action": "assign_work",
+  "work_item": {
+    "id": "WI-{timestamp}",
+    "title": "Brief title",
+    "description": "Requirements in 200 words or less",
+    "type": "feature|bugfix|refactor",
+    "assigned_to": "npe-coder-{specialty}",
+    "estimated_tokens": 5000,
+    "files_to_modify": ["path/to/file.py"],
+    "acceptance_criteria": ["criterion 1"]
+  }
+}
+
+## Review Format
+{
+  "action": "review_complete",
+  "work_item_id": "WI-xxx",
+  "verdict": "approve|request_changes|escalate",
+  "feedback": "string (50 words max)"
+}
+```
+
+### 5.4 Coder Persona
+
+**ID:** `npe-coder-{specialty}`  
+**Model:** `grok-code-fast-1` (Tier 1)  
+**Cost:** ~0.005 DIEM/call  
+**Context Limit:** 12,000 tokens
+
+```markdown
+# System Prompt: Coder NPE
+
+You are a Coder responsible for implementing assigned work items.
+
+## Core Responsibilities
+1. Read work item requirements
+2. Examine existing code
+3. Implement changes
+4. Commit via Gitea tool
+
+## Constraints
+- You ONLY work on assigned items
+- You do NOT make architectural decisions
+- You MUST follow existing code style
+- You MUST output structured JSON for commits
+
+## Code Output Format
+{
+  "action": "commit_changes",
+  "work_item_id": "WI-xxx",
+  "changes": [
+    {
+      "file_path": "src/module/file.py",
+      "action": "create|update|delete",
+      "content": "full file content",
+      "description": "what this change does (20 words max)"
+    }
+  ],
+  "commit_message": "feat: description",
+  "ready_for_review": true
+}
+
+## When Stuck
+{
+  "action": "request_help",
+  "work_item_id": "WI-xxx",
+  "blocker": "description (50 words max)",
+  "attempted": ["approach 1", "approach 2"]
+}
+```
+
+### 5.5 Reviewer Persona
+
+**ID:** `npe-reviewer-{specialty}`  
+**Model:** `grok-41-fast` (Tier 1)  
+**Cost:** ~0.004 DIEM/call  
+**Context Limit:** 8,000 tokens
+
+```markdown
+# System Prompt: Code Reviewer NPE
+
+You are a Code Reviewer responsible for ensuring code quality.
+
+## Core Responsibilities
+1. Review code changes
+2. Check for bugs, security issues, style violations
+3. Provide actionable feedback
+4. Approve or request changes
+
+## Constraints
+- You do NOT modify code
+- You do NOT approve your own changes
+- You MUST be specific and actionable
+- You MUST output structured JSON
+
+## Review Output Format
+{
+  "action": "review_complete",
+  "work_item_id": "WI-xxx",
+  "verdict": "approve|request_changes|escalate",
+  "summary": "one line summary",
+  "issues": [
+    {
+      "severity": "critical|major|minor",
+      "file": "path",
+      "line": 42,
+      "issue": "what's wrong",
+      "fix": "how to fix"
+    }
+  ],
+  "security_concerns": [],
+  "approved": true|false
+}
+
+## Escalation Triggers
+- Security vulnerability
+- Architectural concern
+- >3 major issues
+```
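+
+The escalation triggers can also be checked mechanically on the review payload, so escalation does not depend on the model setting `verdict` correctly. This is a sketch under two assumptions: `security_concerns` is non-empty whenever a vulnerability is found, and `architectural_concern` is a hypothetical boolean that the schema above does not define.
+
+```python
+def should_escalate(review: dict) -> bool:
+    """Return True if a review_complete payload hits any escalation trigger."""
+    issues = review.get("issues", [])
+    major_count = sum(1 for issue in issues if issue.get("severity") == "major")
+    return (
+        bool(review.get("security_concerns"))         # security vulnerability
+        or bool(review.get("architectural_concern"))  # hypothetical flag, not in the schema
+        or major_count > 3                            # more than three major issues
+    )
+```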
+
+---
+
+## 6. Cron & Scheduling
+
+### 6.1 Architecture: Hybrid Scheduler
+
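+Based on your usage patterns, I recommend a **hybrid scheduler**: one frequent master job for health checks and routing, plus fixed-time jobs for the burn window, daily maintenance, and weekly reporting: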
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                      SCHEDULER ARCHITECTURE                         │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                      │
+│  KUBERNETES CRONJOBS                                                │
+│  ═══════════════════                                                │
+│                                                                      │
+│  ┌─────────────────────────────────────────────────────────────┐   │
+│  │              MASTER SCHEDULER (Every 15 min)                 │   │
+│  │                                                              │   │
+│  │  • Health check all systems                                  │   │
+│  │  • Process pending triggers                                  │   │
+│  │  • Check for stuck workflows                                 │   │
+│  │  • Route escalations                                         │   │
+│  │                                                              │   │
+│  │  Cost: ~0.003 DIEM × 4/hour = 0.012 DIEM/hour               │   │
+│  └─────────────────────────────────────────────────────────────┘   │
+│                                                                      │
+│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐          │
+│  │ BURN WINDOW   │  │    DAILY      │  │    WEEKLY     │          │
+│  │ (18:45 EST)   │  │ (00:00 UTC)   │  │ (Sun 06:00)   │          │
+│  │               │  │               │  │               │          │
+│  │ Use surplus   │  │ Cleanup       │  │ Full report   │          │
+│  │ before reset  │  │ Archive       │  │ Cost analysis │          │
+│  │               │  │ Reset budgets │  │               │          │
+│  └───────────────┘  └───────────────┘  └───────────────┘          │
+│                                                                      │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+### 6.2 Master Scheduler Script
+
+```python
+#!/usr/bin/env python3
+"""
+Master Scheduler for NPE Orchestration.
+Runs every 15 minutes via Kubernetes CronJob.
+"""
+
+import asyncio
+import os
+from datetime import datetime, timezone
+from typing import Optional
+
+import httpx
+
+# Configuration
+OWUI_URL = os.environ["OWUI_URL"]
+OWUI_TOKEN = os.environ["OWUI_TOKEN"]
+ORCHESTRATOR_CHAT_ID = os.environ.get("ORCHESTRATOR_CHAT_ID")
+EST_OFFSET = -5  # EST timezone offset
+
+
+async def main():
+    """Main scheduler loop."""
+    async with httpx.AsyncClient(
+        base_url=OWUI_URL,
+        headers={"Authorization": f"Bearer {OWUI_TOKEN}"},
+        timeout=60.0
+    ) as client:
+        
+        now = datetime.now(timezone.utc)
+        hour_est = (now.hour + EST_OFFSET) % 24
+        
+        # 1. Always: Health check
+        health = await check_system_health(client)
+        if not health["ok"]:
+            await alert_admin(client, f"System unhealthy: {health['issues']}")
+            return
+        
+        # 2. Always: Budget check
+        budget = await get_budget_status(client)
+        log_budget(budget)
+        
+        if budget["remaining"] < 0.5:
+            await alert_admin(client, f"Budget critical: {budget['remaining']:.2f} DIEM")
+            # Don't stop - still need to process escalations
+        
+        # 3. Conditional: Process work based on hour
+        if 22 <= hour_est or hour_est < 7:
+            # Night automation window (22:00 - 07:00 EST)
+            await process_automation_queue(client, budget)
+        elif hour_est == 18 and now.minute >= 45:
+            # Burn window (18:45 - 19:00 EST)
+            await run_burn_window(client, budget)
+        else:
+            # Daytime - only process high-priority triggers
+            await process_high_priority_only(client, budget)
+        
+        # 4. Always: Check for stuck workflows
+        stuck = await find_stuck_workflows(client, max_age_minutes=30)
+        for workflow in stuck:
+            await handle_stuck_workflow(client, workflow)
+        
+        # 5. Always: Route pending escalations
+        escalations = await get_pending_escalations(client)
+        for escalation in escalations:
+            await route_escalation(client, escalation, budget)
+
+
+async def check_system_health(client: httpx.AsyncClient) -> dict:
+    """Verify all system components are operational."""
+    issues = []
+    
+    # Check Open WebUI
+    try:
+        resp = await client.get("/health")
+        if resp.status_code != 200:
+            issues.append("Open WebUI unhealthy")
+    except Exception as e:
+        issues.append(f"Open WebUI unreachable: {e}")
+    
+    # Check Venice balance
+    try:
+        resp = await client.get("/api/v1/venice/balance")
+        balance = resp.json().get("balance", 0)
+        if balance < 0.1:
+            issues.append(f"Venice balance critical: {balance}")
+    except Exception as e:
+        issues.append(f"Venice check failed: {e}")
+    
+    return {"ok": len(issues) == 0, "issues": issues}
+
+
+async def get_budget_status(client: httpx.AsyncClient) -> dict:
+    """Get current budget status."""
+    # Calculate time until reset (19:00 EST = 00:00 UTC), rounded up to whole hours
+    now = datetime.now(timezone.utc)
+    hours_until_reset = 24 - now.hour  # 1..24; just after a reset the next one is 24 hours away
+    
+    try:
+        resp = await client.get("/api/v1/venice/balance")
+        resp.raise_for_status()
+        data = resp.json()
+        remaining = data.get("balance", 0)
+        
+        # Get today's spend from cost tracker
+        spend_resp = await client.get("/api/v1/cost-tracker/today")
+        spend_resp.raise_for_status()
+        spent_today = spend_resp.json().get("total", 0)
+    except Exception:
+        # Fail closed: treat an unknown balance as empty so automation
+        # cannot overspend while the balance or cost-tracker API is down.
+        remaining = 0.0
+        spent_today = 0
+    
+    return {
+        "remaining": remaining,
+        "spent_today": spent_today,
+        "hours_until_reset": hours_until_reset,
+        "automation_reserve": 0.5,
+        "escalation_reserve": 0.1,
+        "available_for_work": remaining - 0.6  # reserves
+    }
+
+
+async def run_burn_window(client: httpx.AsyncClient, budget: dict):
+    """
+    Use surplus DIEM before 19:00 EST reset.
+    
+    ROI: 0.10 DIEM spend can utilize 1.0+ DIEM that would be lost.
+    """
+    surplus = budget["remaining"] - 2.0  # Keep 2.0 DIEM for tomorrow morning
+    
+    if surplus < 0.10:
+        print(f"No surplus to burn: {budget['remaining']:.2f} DIEM")
+        return
+    
+    print(f"Burn window: {surplus:.2f} DIEM surplus available")
+    
+    tasks = []
+    
+    # Priority 1: Summarize active workflows (saves context tomorrow)
+    if surplus >= 0.05:
+        tasks.append(("summarize_workflows", 0.05))
+        surplus -= 0.05
+    
+    # Priority 2: Pre-plan tomorrow's tasks
+    if surplus >= 0.08:
+        tasks.append(("pre_plan", 0.08))
+        surplus -= 0.08
+    
+    # Priority 3: Run pending reviews
+    if surplus >= 0.05:
+        tasks.append(("pending_reviews", surplus))
+    
+    for task, budget_limit in tasks:
+        await dispatch_burn_task(client, task, budget_limit)
+
+
+async def process_automation_queue(client: httpx.AsyncClient, budget: dict):
+    """Process automation tasks during night window."""
+    if budget["available_for_work"] < 0.1:
+        print("Budget too low for automation")
+        return
+    
+    # Get pending automation tasks
+    triggers = await get_pending_triggers(client)
+    
+    for trigger in triggers:
+        # Estimate cost
+        estimated = estimate_trigger_cost(trigger)
+        
+        if estimated > budget["available_for_work"]:
+            print(f"Skipping {trigger['type']}: cost {estimated} > available {budget['available_for_work']}")
+            continue
+        
+        await dispatch_trigger(client, trigger)
+        budget["available_for_work"] -= estimated
+
+
+def estimate_trigger_cost(trigger: dict) -> float:
+    """Estimate DIEM cost for a trigger."""
+    costs = {
+        "code_review": 0.015,      # Grok review + routing
+        "feature_request": 0.050,  # PM + Coder + Review
+        "bug_fix": 0.030,          # Triage + Fix + Review
+        "cleanup": 0.005,          # Simple Grok task
+        "health_check": 0.003,     # Minimal
+    }
+    return costs.get(trigger.get("type"), 0.010)
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
+### 6.3 Cron Schedule (Kubernetes)
+
+```yaml
+# Master Scheduler - Every 15 minutes
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: npe-master-scheduler
+  namespace: open-webui
+spec:
+  schedule: "*/15 * * * *"
+  concurrencyPolicy: Forbid
+  jobTemplate:
+    spec:
+      template:
+        spec:
+          containers:
+          - name: scheduler
+            image: python:3.11-slim
+            command: ["python", "/scripts/master_scheduler.py"]
+            envFrom:
+            - secretRef:
+                name: npe-secrets
+          restartPolicy: OnFailure
+          
+---
+# Burn Window - 18:45 EST (23:45 UTC)
+apiVersion: batch/v1
+kind: CronJob
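+metadata:
+  name: npe-burn-window
+  namespace: open-webui
+spec:
+  schedule: "45 23 * * *"
+  jobTemplate:
+    spec:
+      template:
+        spec:
+          containers:
+          - name: burner
+            image: python:3.11-slim
+            command: ["python", "/scripts/burn_window.py"]
+            envFrom:
+            - secretRef:
+                name: npe-secrets
+          restartPolicy: OnFailure  # Job pods must use OnFailure or Never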
+
+---
+# Daily Maintenance - 00:00 UTC (19:00 EST - after reset)
+apiVersion: batch/v1
+kind: CronJob
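+metadata:
+  name: npe-daily-maintenance
+  namespace: open-webui
+spec:
+  schedule: "0 0 * * *"
+  jobTemplate:
+    spec:
+      template:
+        spec:
+          containers:
+          - name: maintenance
+            image: python:3.11-slim
+            command: ["python", "/scripts/daily_maintenance.py"]
+            envFrom:
+            - secretRef:
+                name: npe-secrets
+          restartPolicy: OnFailure  # Job pods must use OnFailure or Never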
+```
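+
+The UTC schedules above follow from the fixed UTC-5 offset used in the scheduler script. A small sketch for deriving them; `est_to_utc_cron` is a hypothetical helper, and it deliberately ignores daylight saving time, as the script does.
+
+```python
+from datetime import datetime, timedelta, timezone
+
+# Fixed UTC-5 offset, matching EST_OFFSET in the scheduler script (no daylight saving).
+EST = timezone(timedelta(hours=-5), "EST")
+
+
+def est_to_utc_cron(hour: int, minute: int) -> str:
+    """Return a daily cron expression in UTC for a wall-clock time given in EST."""
+    local = datetime(2026, 1, 11, hour, minute, tzinfo=EST)  # any date works for a fixed offset
+    utc = local.astimezone(timezone.utc)
+    return f"{utc.minute} {utc.hour} * * *"
+
+
+print(est_to_utc_cron(18, 45))  # 45 23 * * *
+print(est_to_utc_cron(19, 0))   # 0 0 * * *
+```
+
+Kubernetes interprets `schedule` in the kube-controller-manager's local time zone unless `.spec.timeZone` is set, so these expressions assume a cluster running on UTC.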
+
+---
+
+## 7. Workflow Patterns
+
+### 7.1 Code Review Workflow
+
+**Trigger:** Gitea webhook on PR create/update  
+**Cost:** ~0.015 DIEM  
+**Duration:** 2-5 minutes
+
+```
+┌─────────┐    ┌──────────────┐    ┌──────────────┐    ┌─────────┐
+│  START  │───▶│ Load Context │───▶│    Review    │───▶│ Verdict │
+│         │    │ (Grok, 0.003)│    │ (Grok, 0.008)│    │         │
+└─────────┘    └──────────────┘    └──────────────┘    └────┬────┘
+                                                             │
+                    ┌────────────────────────────────────────┤
+                    │                    │                   │
+                    ▼                    ▼                   ▼
+              ┌──────────┐        ┌───────────┐       ┌───────────┐
+              │ APPROVE  │        │  REQUEST  │       │ ESCALATE  │
+              │          │        │  CHANGES  │       │ to Claude │
+              │ Post     │        │           │       │ (0.054)   │
+              │ Comment  │        │ Post      │       └───────────┘
+              │ (0.002)  │        │ Feedback  │
+              └──────────┘        │ (0.002)   │
+                                  └───────────┘
+```
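+
+The per-path cost can be read off the diagram by summing the steps each verdict passes through. A short worked calculation, using only the figures shown above (the constant names are illustrative):
+
+```python
+# Step costs in DIEM, read off the diagram above.
+LOAD_CONTEXT = 0.003
+REVIEW = 0.008
+POST_COMMENT = 0.002       # approve and request-changes each post one comment
+CLAUDE_ESCALATION = 0.054
+
+PATH_COST = {
+    "approve": LOAD_CONTEXT + REVIEW + POST_COMMENT,
+    "request_changes": LOAD_CONTEXT + REVIEW + POST_COMMENT,
+    "escalate": LOAD_CONTEXT + REVIEW + CLAUDE_ESCALATION,
+}
+
+for verdict, cost in PATH_COST.items():
+    print(f"{verdict}: {cost:.3f} DIEM")
+```
+
+The two non-escalated paths come to 0.013 DIEM, in line with the ~0.015 DIEM headline; an escalated review costs about 0.065 DIEM.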
+
+### 7.2 Feature Development Workflow
+
+**Trigger:** Issue with label "feature"  
+**Cost:** ~0.050 DIEM  
+**Duration:** 15-30 minutes
+
+```
+┌─────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
+│  START  │───▶│ PM: Analyze  │───▶│ PM: Breakdown│───▶│   Coder:     │
+│         │    │ (Grok, 0.003)│    │ (Grok, 0.003)│    │  Implement   │
+└─────────┘    └──────────────┘    └──────────────┘    │ (Grok, 0.015)│
+                                                        └──────┬───────┘
+                                                               │
+                    ┌──────────────────────────────────────────┤
+                    │                                          │
+                    ▼                                          ▼
+              ┌───────────┐                             ┌───────────┐
+              │  Review   │◀────── Revision Loop ──────│  FAILED   │
+              │  (0.008)  │        (max 3x)            │           │
+              └─────┬─────┘                            └───────────┘
+                    │
+                    ▼
+              ┌───────────┐
+              │ Create PR │
+              │  (0.003)  │
+              └───────────┘
+```
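+
+The revision loop makes the feature cost depend on how many passes a change needs. The sketch below estimates it from the step costs in the diagram, assuming each revision repeats the implement and review steps once; that assumption and the name `feature_cost` are illustrative, not taken from the workflow engine.
+
+```python
+# Step costs in DIEM, read off the diagram above.
+STEP_COST = {
+    "pm_analyze": 0.003,
+    "pm_breakdown": 0.003,
+    "coder_implement": 0.015,
+    "review": 0.008,
+    "create_pr": 0.003,
+}
+MAX_REVISIONS = 3
+
+
+def feature_cost(revisions: int) -> float:
+    """Estimated DIEM for one feature, given the number of revision loops."""
+    if not 0 <= revisions <= MAX_REVISIONS:
+        raise ValueError(f"revisions must be between 0 and {MAX_REVISIONS}")
+    first_pass = sum(STEP_COST.values())
+    # Assumption: each revision repeats the implement and review steps once.
+    per_revision = STEP_COST["coder_implement"] + STEP_COST["review"]
+    return round(first_pass + revisions * per_revision, 3)
+
+
+for n in range(MAX_REVISIONS + 1):
+    print(f"{n} revisions: {feature_cost(n):.3f} DIEM")
+```
+
+Under that assumption the ~0.050 DIEM headline corresponds to about one revision (0.055 DIEM), and a feature that uses all three revisions costs roughly 0.101 DIEM.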
+
+### 7.3 Workflow State Machine
+
+```python
+from enum import Enum
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import Optional
+
+class WorkflowState(Enum):
+    PENDING = "pending"
+    RUNNING = "running"
+    WAITING_INPUT = "waiting_input"
+    STEP_FAILED = "step_failed"
+    ESCALATED = "escalated"
+    COMPLETED = "completed"
+    FAILED = "failed"
+
+@dataclass
+class CircuitBreaker:
+    max_failures: int = 3
+    failure_count: int = 0
+    last_failure: Optional[datetime] = None
+    cooldown_seconds: int = 300
+    
+    def record_failure(self):
+        self.failure_count += 1
+        self.last_failure = datetime.now()
+    
+    def is_open(self) -> bool:
+        if self.failure_count >= self.max_failures:
+            if self.last_failure:
+                elapsed = (datetime.now() - self.last_failure).total_seconds()
+                if elapsed < self.cooldown_seconds:
+                    return True
+                # Reset after cooldown
+                self.failure_count = 0
+        return False
+    
+    def should_escalate(self) -> bool:
+        return self.failure_count >= (self.max_failures - 1)
+
+@dataclass
+class Workflow:
+    id: str
+    type: str
+    state: WorkflowState = WorkflowState.PENDING
+    current_step: str = ""
+    assigned_npe: str = ""
+    project_id: Optional[str] = None
+    budget_limit: float = 1.0
+    budget_spent: float = 0.0
+    circuit: CircuitBreaker = field(default_factory=CircuitBreaker)
+    steps_completed: list = field(default_factory=list)
+    created_at: datetime = field(default_factory=datetime.now)
+    updated_at: datetime = field(default_factory=datetime.now)
+    
+    def can_proceed(self) -> tuple[bool, str]:
+        if self.circuit.is_open():
+            return False, "Circuit breaker open"
+        if self.budget_spent >= self.budget_limit:
+            return False, "Budget exhausted"
+        return True, "OK"
+    
+    def record_cost(self, amount: float):
+        self.budget_spent += amount
+        self.updated_at = datetime.now()
+    
+    def complete_step(self, step_id: str, result: dict):
+        self.steps_completed.append({
+            "step_id": step_id,
+            "completed_at": datetime.now().isoformat(),
+            "result": result
+        })
+        self.updated_at = datetime.now()
+    
+    def fail_step(self, step_id: str, error: str):
+        self.circuit.record_failure()
+        self.state = WorkflowState.STEP_FAILED
+        self.updated_at = datetime.now()
+```
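+
+The cooldown logic depends on measuring the time since the last failure. One detail worth checking in any such comparison: `timedelta.seconds` holds only the seconds component of the interval (0 to 86,399) and drops whole days, while `total_seconds()` returns the full duration. A short demonstration using only the standard library:
+
+```python
+from datetime import datetime, timedelta
+
+last_failure = datetime(2026, 1, 10, 12, 0, 0)
+now = last_failure + timedelta(days=1, seconds=10)  # checked a day and 10 seconds later
+elapsed = now - last_failure
+
+print(elapsed.seconds)          # 10
+print(elapsed.total_seconds())  # 86410.0
+```
+
+With a 300-second cooldown, a comparison against `.seconds` would treat this breaker as still open, whereas `total_seconds()` shows the cooldown elapsed long ago.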
+
+---
+
+## 8. Cost Management
+
+### 8.1 Budget Allocation (Based on Actual Data)
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                    DAILY BUDGET ALLOCATION                          │
+│                    (8.1 DIEM total)                                 │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                      │
+│  ┌───────────────────────────────────────────────────────────────┐ │
+│  │ INTERACTIVE WORK                                   5.50 DIEM  │ │
+│  │ (68% of budget)                                               │ │
+│  │                                                               │ │
+│  │   Claude sessions (1-2/day)         ~1.00 DIEM               │ │
+│  │   Grok chat (continuous)            ~2.50 DIEM               │ │
+│  │   Qwen bulk processing              ~1.00 DIEM               │ │
+│  │   Image generation                  ~1.00 DIEM               │ │
+│  └───────────────────────────────────────────────────────────────┘ │
+│                                                                      │
+│  ┌───────────────────────────────────────────────────────────────┐ │
+│  │ NPE AUTOMATION                                     0.25 DIEM  │ │
+│  │ (3% of budget)                                                │ │
+│  │                                                               │ │
+│  │   PM checks (12/day × 0.003)        ~0.04 DIEM               │ │
+│  │   Coder tasks (10/day × 0.005)      ~0.05 DIEM               │ │
+│  │   Reviews (8/day × 0.004)           ~0.03 DIEM               │ │
+│  │   Orchestrator (96/day × 0.001)     ~0.10 DIEM               │ │
+│  │   Buffer                            ~0.03 DIEM               │ │
+│  └───────────────────────────────────────────────────────────────┘ │
+│                                                                      │
+│  ┌───────────────────────────────────────────────────────────────┐ │
+│  │ RESERVES                                           0.60 DIEM  │ │
+│  │ (7% of budget)                                                │ │
+│  │                                                               │ │
+│  │   Escalation reserve (Claude)       ~0.10 DIEM               │ │
+│  │   Automation reserve                ~0.50 DIEM               │ │
+│  └───────────────────────────────────────────────────────────────┘ │
+│                                                                      │
+│  ┌───────────────────────────────────────────────────────────────┐ │
+│  │ BUFFER / BURN WINDOW                              1.75 DIEM  │ │
+│  │ (22% of budget)                                               │ │
+│  │                                                               │ │
+│  │   Available for burn window automation if unused              │ │
+│  │   Target: Use 80%+ of daily budget                           │ │
+│  └───────────────────────────────────────────────────────────────┘ │
+│                                                                      │
+└─────────────────────────────────────────────────────────────────────┘
+```
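+
+The allocation can be sanity-checked with a few lines of arithmetic. The sketch below verifies that the four categories add up to the daily budget and works out how much of the buffer must be burned to reach the 80% utilization target, assuming the other three categories are fully spent (variable names are illustrative).
+
+```python
+DAILY_BUDGET = 8.1  # DIEM
+
+ALLOCATION = {
+    "interactive": 5.50,
+    "automation": 0.25,
+    "reserves": 0.60,
+    "buffer_burn": 1.75,
+}
+
+total = sum(ALLOCATION.values())
+assert abs(total - DAILY_BUDGET) < 1e-9, "allocation must add up to the daily budget"
+
+for name, diem in ALLOCATION.items():
+    print(f"{name:<12} {diem:5.2f} DIEM  {diem / DAILY_BUDGET:.0%}")
+
+# 80% utilization target from the diagram above.
+target_spend = 0.8 * DAILY_BUDGET
+committed = total - ALLOCATION["buffer_burn"]  # assumes everything else is fully spent
+burn_needed = max(0.0, target_spend - committed)
+print(f"burn needed to reach 80%: {burn_needed:.2f} DIEM")  # 0.13
+```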
+
+### 8.2 Cost Enforcement
+
+```python
+from typing import Optional
+
+
+class BudgetEnforcer:
+    """Enforce budget limits at multiple levels."""
+    
+    def __init__(self, daily_budget: float = 8.1):
+        self.daily_budget = daily_budget
+        self.reserves = {
+            "escalation": 0.10,
+            "automation": 0.50,
+        }
+    
+    async def can_proceed(
+        self,
+        estimated_cost: float,
+        npe_id: str,
+        project_id: Optional[str] = None,
+        use_reserves: bool = False
+    ) -> tuple[bool, str]:
+        """Check if operation can proceed within budget."""
+        
+        # Get current balance
+        balance = await self.get_venice_balance()
+        
+        # Calculate available
+        reserved = sum(self.reserves.values()) if not use_reserves else 0
+        available = balance - reserved
+        
+        # Check global
+        if available < estimated_cost:
+            return False, f"Insufficient budget: {available:.3f} < {estimated_cost:.3f}"
+        
+        # Check per-NPE daily limit
+        npe_spent = await self.get_npe_spent_today(npe_id)
+        npe_limit = await self.get_npe_daily_limit(npe_id)
+        
+        if npe_spent + estimated_cost > npe_limit:
+            return False, f"NPE budget exceeded: {npe_spent:.3f} + {estimated_cost:.3f} > {npe_limit:.3f}"
+        
+        # Check per-project if applicable
+        if project_id:
+            project_spent = await self.get_project_spent_today(project_id)
+            project_limit = await self.get_project_daily_limit(project_id)
+            
+            if project_spent + estimated_cost > project_limit:
+                return False, f"Project budget exceeded: {project_spent:.3f} + {estimated_cost:.3f} > {project_limit:.3f}"
+        
+        return True, "OK"
+    
+    async def record_and_verify(
+        self,
+        actual_cost: float,
+        estimated_cost: float,
+        npe_id: str,
+        operation: str
+    ):
+        """Record cost and check for anomalies."""
+        # Record
+        await self.record_cost(actual_cost, npe_id, operation)
+        
+        # Check for cost overrun
+        if actual_cost > estimated_cost * 1.5:
+            await self.alert(
+                f"Cost overrun: {operation} estimated {estimated_cost:.4f}, actual {actual_cost:.4f}"
+            )
+        
+        # Check for budget warnings
+        remaining = await self.get_remaining_today()
+        if remaining < self.daily_budget * 0.2:
+            await self.alert(f"Budget warning: only {remaining:.2f} DIEM remaining today")
+```
+
+### 8.3 Automation Cost Projections
+
+Based on your actual rates:
+
+| Scenario | Model | Per Call | Calls/Day | Daily Cost | Monthly |
+|----------|-------|----------|-----------|------------|---------|
+| PM Check | Grok | 0.0021 | 12 | 0.0252 | 0.76 |
+| Coder Task | Grok-Code | 0.0050 | 10 | 0.0500 | 1.50 |
+| Code Review | Grok | 0.0035 | 8 | 0.0280 | 0.84 |
+| Orchestrator | Grok | 0.0010 | 96 | 0.0960 | 2.88 |
+| Deep Analysis | Kimi | 0.0120 | 3 | 0.0360 | 1.08 |
+| Escalation | Claude | 0.0540 | 1 | 0.0540 | 1.62 |
+| **TOTAL** | | | | **0.2892** | **8.68** |
+
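+**Conclusion:** Full NPE automation, including Deep Analysis and one Claude escalation per day, costs **0.29 DIEM/day** (3.6% of budget), leaving **7.81 DIEM** for interactive work. The four core roles alone (PM, Coder, Reviewer, Orchestrator) total about 0.20 DIEM/day, which fits inside the 0.25 DIEM automation allocation in Section 8.1.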
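+
+The totals in the table can be reproduced directly from the per-call rates and call counts. A short check, assuming a 30-day month as the Monthly column does:
+
+```python
+DAILY_BUDGET = 8.1  # DIEM
+
+# (per-call cost in DIEM, calls per day), copied from the table above
+SCENARIOS = {
+    "PM Check": (0.0021, 12),
+    "Coder Task": (0.0050, 10),
+    "Code Review": (0.0035, 8),
+    "Orchestrator": (0.0010, 96),
+    "Deep Analysis": (0.0120, 3),
+    "Escalation": (0.0540, 1),
+}
+
+daily_total = sum(per_call * calls for per_call, calls in SCENARIOS.values())
+
+print(f"daily:   {daily_total:.4f} DIEM")                # 0.2892
+print(f"monthly: {daily_total * 30:.2f} DIEM")           # 8.68
+print(f"share:   {daily_total / DAILY_BUDGET:.1%}")       # 3.6%
+print(f"left:    {DAILY_BUDGET - daily_total:.2f} DIEM")  # 7.81
+```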
+
+---
+
+## 9. Implementation Roadmap
+
+### Phase 1: Foundation (Week 1)
+
+**Goal:** Basic infrastructure working  
+**Budget Impact:** None (setup only)
+
+- [ ] Deploy corrected Gitea tools (v1.1.0)
+- [ ] Create `cost_tracker` tool
+- [ ] Set up 2 NPEs manually:
+  - [ ] Orchestrator (Grok)
+  - [ ] PM (Grok)
+- [ ] Create master scheduler (health check only)
+- [ ] Test: Manual trigger → Orchestrator routes to PM
+
+**Success Criteria:**
+- Orchestrator receives triggers
+- Cost tracked per operation
+- No automation yet - just infrastructure
+
+### Phase 2: Automation (Week 2)
+
+**Goal:** Night automation working  
+**Budget Impact:** +0.05 DIEM/day
+
+- [ ] Add Coder NPE (Grok-Code)
+- [ ] Add Reviewer NPE (Grok)
+- [ ] Implement Code Review workflow
+- [ ] Enable night automation (22:00-07:00 EST)
+- [ ] Test: PR created → auto-review → comment posted
+
+**Success Criteria:**
+- PRs reviewed automatically during night window
+- Reviews posted as Gitea comments
+- Circuit breaker prevents loops
+
+### Phase 3: Full Workflows (Week 3-4)
+
+**Goal:** Complete workflow coverage  
+**Budget Impact:** +0.20 DIEM/day
+
+- [ ] Create `project_manager` tool
+- [ ] Create `workflow_engine` tool
+- [ ] Implement Feature Development workflow
+- [ ] Implement Bug Fix workflow
+- [ ] Enable burn window automation
+- [ ] Test: Full feature cycle from issue to PR
+
+**Success Criteria:**
+- Features developed from issue to merged PR
+- Budget tracked per project
+- Burn window uses surplus effectively
+
+### Phase 4: Escalation (Week 5)
+
+**Goal:** Claude integration for complex cases  
+**Budget Impact:** +0.05 DIEM/day
+
+- [ ] Implement escalation paths
+- [ ] Create context compression for Claude calls
+- [ ] Add security review workflow
+- [ ] Test: Complex review → escalate → Claude response
+
+**Success Criteria:**
+- Escalation triggers when needed
+- Claude calls stay under 0.06 DIEM each
+- Responses routed back to workflow
+
+### Phase 5: Optimization (Week 6+)
+
+**Goal:** Cost optimization and scaling  
+**Budget Impact:** -0.05 DIEM/day (savings)
+
+- [ ] Implement cache optimization strategy
+- [ ] Add context compression to all NPEs
+- [ ] Tune model selection based on success rates
+- [ ] Add metrics dashboard
+- [ ] Document runbooks
+
+**Success Criteria:**
+- Cache hit rates > 50% for Grok
+- System runs 7 days unattended
+- Total automation < 0.25 DIEM/day
+
+---
+
+## 10. Open Questions
+
+### 10.1 Unresolved Decisions
+
+| Question | Options | Recommendation | Notes |
+|----------|---------|----------------|-------|
+| State storage | Open WebUI folders vs. SQLite | Open WebUI folders | Simpler, no new deps |
+| Token rotation | 30 vs 90 days | 90 days | Manual for now |
+| Max concurrent workflows | 3 vs 5 vs 10 | 5 | Test and adjust |
+| Chat retention | 7 vs 30 vs 90 days | 30 days | Balance audit vs. storage |
+
+### 10.2 Your Input Needed
+
+1. **Gitea Webhooks:** How are webhooks exposed? Need ingress path for triggers.
+
+2. **Claude API Key:** Using Venice's Claude or direct Anthropic? Venice is simpler.
+
+3. **Multi-file Commits:** Do you need atomic batch commits in gitea_dev?
+
+4. **Test Execution:** Skip CI/CD for Phases 1-5 and add it later?
+
+5. **Human Approval UI:** Chat-only for now? Dashboard later?
+
+### 10.3 Known Limitations
+
+1. **No real-time collaboration** - NPEs work asynchronously
+2. **No visual review** - Can't review UI changes
+3. **Venice dependency** - All LLM calls through Venice
+4. **Single Gitea instance** - No multi-repo federation yet
+
+---
+
+## Appendix A: Quick Reference
+
+### Model Rates (DIEM per 1M tokens)
+
+| Model | Input | Output | Cache | Effective |
+|-------|-------|--------|-------|-----------|
+| Qwen-Instruct | 0.15 | 0.27 | - | 0.06 |
+| Grok-Code | 0.25 | 1.87 | 0.03 | 0.07 |
+| Grok-41-Fast | 0.50 | 1.25 | 0.125 | 0.19 |
+| MiniMax-M21 | 0.40 | 1.60 | 0.04 | 0.22 |
+| Kimi-K2 | 0.75 | 3.20 | 0.375 | 0.42 |
+| Claude-Opus | 6.00 | 30.00 | - | 5.28 |
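+
+To turn these rates into a per-call estimate, multiply each token count by its rate and divide by one million. The sketch below is an approximation under stated assumptions, not Venice's billing formula: cached tokens are treated as a subset of input tokens billed at the Cache rate, and models with no Cache rate bill cached tokens as ordinary input. `call_cost` and `RATES` are hypothetical names.
+
+```python
+from typing import Optional
+
+# DIEM per 1M tokens: (input, output, cache); None where the table shows "-".
+RATES: dict[str, tuple[float, float, Optional[float]]] = {
+    "Grok-41-Fast": (0.50, 1.25, 0.125),
+    "Grok-Code": (0.25, 1.87, 0.03),
+    "Claude-Opus": (6.00, 30.00, None),
+}
+
+
+def call_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
+    """Estimate the DIEM cost of one call from its token counts."""
+    if not 0 <= cached_tokens <= input_tokens:
+        raise ValueError("cached_tokens must be between 0 and input_tokens")
+    rate_in, rate_out, rate_cache = RATES[model]
+    if rate_cache is None:
+        rate_cache = rate_in  # assumption: no cache discount for this model
+    uncached = input_tokens - cached_tokens
+    total = uncached * rate_in + cached_tokens * rate_cache + output_tokens * rate_out
+    return total / 1_000_000
+
+
+# 4,000 input tokens (3,000 of them cached) and 500 output tokens on Grok-41-Fast
+print(f"{call_cost('Grok-41-Fast', 4000, 500, cached_tokens=3000):.4f} DIEM")  # 0.0015 DIEM
+```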
+
+### Context Limits
+
+| Role | Max Tokens | Warning At |
+|------|------------|------------|
+| Orchestrator | 4,000 | 3,000 |
+| PM | 6,000 | 5,000 |
+| Coder | 12,000 | 10,000 |
+| Reviewer | 8,000 | 6,000 |
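+
+These thresholds can be applied as a simple pre-call check. A minimal sketch, assuming the token count is already known and that the warning starts at the "Warning At" value while anything above "Max Tokens" is rejected; `context_status` is a hypothetical helper.
+
+```python
+# role -> (max tokens, warning threshold), copied from the table above
+CONTEXT_LIMITS = {
+    "orchestrator": (4_000, 3_000),
+    "pm": (6_000, 5_000),
+    "coder": (12_000, 10_000),
+    "reviewer": (8_000, 6_000),
+}
+
+
+def context_status(role: str, tokens: int) -> str:
+    """Classify a context size as 'ok', 'warning', or 'over_limit' for a role."""
+    max_tokens, warn_at = CONTEXT_LIMITS[role]
+    if tokens > max_tokens:
+        return "over_limit"
+    if tokens >= warn_at:
+        return "warning"
+    return "ok"
+
+
+print(context_status("orchestrator", 3_200))  # warning
+print(context_status("coder", 9_000))         # ok
+```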
+
+### Budget Summary
+
+| Category | Daily DIEM | % of Budget |
+|----------|------------|-------------|
+| Interactive | 5.50 | 68% |
+| Automation | 0.25 | 3% |
+| Reserves | 0.60 | 7% |
+| Buffer/Burn | 1.75 | 22% |
+| **Total** | **8.10** | **100%** |
+
+---
+
+*Document Status: RFC v2.0 - Based on 30-day billing analysis*  
+*Last Updated: January 11, 2026*  
+*Data Source: 13,582 Venice.ai transactions*
\ No newline at end of file