From 62651b4d2eba620fcfabb01d58fa5660e1e3b85d Mon Sep 17 00:00:00 2001 From: xcaliber Date: Tue, 13 Jan 2026 18:44:41 +0000 Subject: [PATCH] Add Multi agent orchestration design.md --- Multi agent orchestration design.md | 1692 +++++++++++++++++++++++++++ 1 file changed, 1692 insertions(+) create mode 100644 Multi agent orchestration design.md diff --git a/Multi agent orchestration design.md b/Multi agent orchestration design.md new file mode 100644 index 0000000..80746d9 --- /dev/null +++ b/Multi agent orchestration design.md @@ -0,0 +1,1692 @@ +# Multi-Agent Orchestration System Design v2.0 + +**Version:** 2.0.0 +**Author:** Jeff Smith + Claude +**Date:** January 11, 2026 +**Status:** RFC (Request for Comments) +**Data Source:** Venice.ai billing history Dec 12, 2025 - Jan 11, 2026 (13,582 transactions) + +--- + +## Executive Summary + +This document describes the architecture for a multi-model AI development system built on Open WebUI, Venice.ai, and Gitea. Based on 30 days of actual billing data, we demonstrate that full NPE automation costs **0.21 DIEM/day** (2.6% of the 8.1 DIEM daily budget), leaving **7.89 DIEM** for interactive work. + +**Core Thesis:** Your actual usage patterns prove orchestrated automation is not just feasible—it's virtually free. The real challenge is context management and cache optimization, not model costs. + +--- + +## Table of Contents + +1. [Billing Data Analysis](#1-billing-data-analysis) +2. [Model Selection Strategy](#2-model-selection-strategy) +3. [Context Management](#3-context-management) +4. [Tool Architecture](#4-tool-architecture) +5. [NPE Personas & Roles](#5-npe-personas--roles) +6. [Cron & Scheduling](#6-cron--scheduling) +7. [Workflow Patterns](#7-workflow-patterns) +8. [Cost Management](#8-cost-management) +9. [Implementation Roadmap](#9-implementation-roadmap) +10. [Open Questions](#10-open-questions) + +--- + +## 1. Billing Data Analysis + +### 1.1 30-Day Summary + +``` +Period: Dec 12, 2025 - Jan 11, 2026 +Total Records: 13,582 billing events +Total Spend: 61.24 DIEM + 2.86 USD +Days Active: 18 days +Average Daily: 3.56 DIEM/day +Max Daily: 9.96 DIEM (Jan 2, 2026) +Median Daily: 3.72 DIEM/day +``` + +**Budget Analysis:** +- Daily Budget: 8.1 DIEM (staked) +- Average Spend: 3.56 DIEM/day +- **Average Surplus: 4.54 DIEM/day (56% unutilized)** +- This surplus is lost at 19:00 EST reset + +### 1.2 Spend by Model Family + +``` +┌────────────────────────────────────────────────────────────────────┐ +│ ACTUAL SPEND BY MODEL (30 days) │ +├────────────────────────────────────────────────────────────────────┤ +│ │ +│ GLM-4.6 ████████████████████████████ 16.66 DIEM (26.0%) │ +│ Qwen ████████████████████████ 15.60 DIEM (24.3%) │ +│ Claude-Opus-4.5 ██████████████████ 11.86 DIEM (18.5%) │ +│ Grok-41-Fast ██████████ 6.71 DIEM (10.5%) │ +│ Image-Gen █████████ 6.01 DIEM (9.4%) │ +│ MiniMax-M21 █████ 3.31 DIEM (5.2%) │ +│ Kimi-K2 ██ 1.53 DIEM (2.4%) │ +│ GLM-4.7 ██ 1.25 DIEM (2.0%) │ +│ Other █ 1.17 DIEM (1.7%) │ +│ │ +│ TOTAL 64.10 DIEM │ +│ │ +└────────────────────────────────────────────────────────────────────┘ +``` + +### 1.3 Actual Effective Rates (DIEM per 1M Tokens) + +From your billing data, sorted by cost efficiency: + +| Model | Input Rate | Output Rate | Cache Rate | **Effective Rate** | Calls | Total Cost | +|-------|------------|-------------|------------|-------------------|-------|------------| +| Qwen-Instruct | $0.071 | $0.27 | - | **0.0626** | 408 | 0.92 | +| Grok-Code | $0.140 | $1.87 | $0.030 | **0.0721** | 611 | 1.67 | +| DeepSeek | $0.329 | $1.00 | $0.200 | **0.1825** | 36 | 0.10 | +| Grok-41-Fast | $0.314 | $1.25 | $0.125 | **0.1857** | 1,152 | 5.03 | +| MiniMax-M21 | $0.316 | $1.60 | $0.040 | **0.2232** | 360 | 3.31 | +| Qwen-Thinking | $0.450 | $3.50 | - | **0.3339** | 105 | 1.61 | +| Qwen-Coder | $0.750 | $3.00 | - | **0.3830** | 530 | 12.99 | +| Kimi-K2-Thinking | $0.595 | $3.20 | $0.375 | **0.4152** | 111 | 1.53 | +| GLM-4.6 | $0.850 | $2.75 | - | **0.4455** | 1,724 | 16.66 | +| Claude-Opus-4.5 | $6.000 | $30.00 | - | **5.2751** | 78 | 11.86 | + +**Key Insights:** +1. **Grok is 28× cheaper than Claude** per token +2. **Qwen-Instruct is 84× cheaper than Claude** for bulk work +3. **Cache hits reduce Grok input costs by 75%** +4. Claude is only 78 calls but 18.5% of total spend + +### 1.4 Context Size Distribution + +Your actual context sizes reveal optimization opportunities: + +``` +GROK-41-FAST (1,152 calls): + Median: 5,291 tokens | P75: 7,217 | Max: 582,506 + Distribution: 0-5k: 1671 | 5-10k: 1245 | 10-20k: 485 | 20-50k: 23 | 50k+: 17 + ✓ WELL MANAGED - 85% of calls under 10k tokens + +KIMI-K2-THINKING (111 calls): + Median: 5,164 tokens | P75: 17,797 | Max: 48,893 + Distribution: 0-5k: 144 | 5-10k: 39 | 10-20k: 83 | 20-50k: 34 + ⚠ CONTEXT BLEEDING - P75 jumps to 18k, needs pruning + +MINIMAX-M21 (360 calls): + Median: 10,090 tokens | P75: 23,071 | Max: 169,708 + Distribution: 0-5k: 211 | 5-10k: 202 | 10-20k: 181 | 20-50k: 198 | 50k+: 38 + ⚠ HEAVY CONTEXTS - 11% of calls over 50k tokens + +QWEN-CODER (530 calls): + Median: 17,016 tokens | P75: 47,601 | Max: 253,462 + Distribution: 0-5k: 340 | 5-10k: 124 | 10-20k: 82 | 20-50k: 284 | 50k+: 230 + ⚠ CODE CONTEXTS ARE LARGE - expected but optimize where possible + +CLAUDE-OPUS-4.5 (78 calls): + Median: 7,198 tokens | P75: 11,956 | Max: 51,368 + ✓ REASONABLE - given high cost, context is well-controlled +``` + +### 1.5 Cache Efficiency Analysis + +``` +┌────────────────────────────────────────────────────────────────────┐ +│ CACHE HIT RATES BY MODEL │ +├────────────────────────────────────────────────────────────────────┤ +│ │ +│ GROK-CODE: ████████████████████████████████████ 67.8% │ +│ Saved: $1.08 | Cache rate: $0.030 vs $0.250 full │ +│ │ +│ KIMI-K2: █████ 9.8% │ +│ Saved: $0.05 | Cache rate: $0.375 vs $0.750 full │ +│ │ +│ GROK-41-FAST: █████ 8.8% │ +│ Saved: $0.27 | Cache rate: $0.125 vs $0.500 full │ +│ │ +│ MINIMAX-M21: ████ 7.2% │ +│ Saved: $0.16 | Cache rate: $0.040 vs $0.400 full │ +│ │ +└────────────────────────────────────────────────────────────────────┘ +``` + +**Grok-Code's 67.8% cache hit rate** demonstrates what's possible with sequential operations using the same system prompt and context. + +### 1.6 Hourly Usage Pattern (EST) + +``` +Hour (EST) | Spend | Activity Level +───────────┼──────────┼──────────────────────────────── + 00-03 | 0.00 | 💤 Dead + 04-05 | 0.84 | 🌅 Early morning + 06-08 | 9.86 | ☕ Morning peak + 09-11 | 3.57 | 📊 Late morning + 12-13 | 7.34 | 🍽️ Lunch peak + 14-15 | 8.13 | 💻 Afternoon work + 16-17 | 14.25 | 🔥 PEAK (4-6pm) + 18-19 | 8.13 | 🌆 Evening + 20-21 | 9.93 | 🌙 Night session + 22-23 | 0.00 | 💤 Dead + +AUTOMATION WINDOW: 22:00 - 07:00 EST (9 hours of minimal usage) +``` + +--- + +## 2. Model Selection Strategy + +### 2.1 Tier Architecture (Data-Driven) + +Based on actual billing patterns: + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ MODEL SELECTION PYRAMID │ +├─────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────┐ │ +│ │ TIER 4 │ Claude Opus 4.5 │ +│ │ ORACLE │ 5.28 DIEM/1M effective │ +│ │ <2% │ Architecture, Security, │ +│ └──────┬──────┘ Deadlock Resolution │ +│ │ Budget: 0.05 DIEM/day │ +│ │ │ +│ ┌─────────┴─────────┐ │ +│ │ TIER 3 │ Kimi-K2, Qwen-Thinking │ +│ │ REASONING │ 0.33-0.42 DIEM/1M │ +│ │ 5% │ Complex analysis, │ +│ └─────────┬─────────┘ Multi-step reasoning │ +│ │ Budget: 0.10 DIEM/day │ +│ │ │ +│ ┌───────────────┴───────────────┐ │ +│ │ TIER 2 │ MiniMax, DeepSeek │ +│ │ BALANCED │ 0.18-0.22 DIEM/1M │ +│ │ 15% │ Standard tasks, │ +│ └───────────────┬───────────────┘ Code generation │ +│ │ Budget: 0.50 DIEM │ +│ │ │ +│ ┌────────────────────────────┴────────────────────────────┐ │ +│ │ TIER 1 │ │ +│ │ WORKHORSES │ │ +│ │ 78% │ │ +│ │ Grok-41-Fast: 0.19 DIEM/1M | Grok-Code: 0.07 DIEM/1M │ │ +│ │ Qwen-Instruct: 0.06 DIEM/1M │ │ +│ │ Routing, Quick checks, Bulk processing, PM tasks │ │ +│ │ Budget: Remaining (~7.45 DIEM/day) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### 2.2 Model Selection Matrix + +| Task Type | Primary Model | Fallback | Cost/Call | Rationale | +|-----------|---------------|----------|-----------|-----------| +| **Routing/Dispatch** | Grok-41-Fast | Qwen-Instruct | 0.002 | Cheapest with cache | +| **PM Coordination** | Grok-41-Fast | MiniMax | 0.003 | Simple decisions | +| **Code Generation** | Grok-Code | Grok-41-Fast | 0.005 | 67% cache hits | +| **Code Review** | Grok-41-Fast | DeepSeek | 0.004 | Good reasoning | +| **Bulk Processing** | Qwen-Instruct | Grok-41-Fast | 0.001 | 0.06 DIEM/1M | +| **Complex Reasoning** | Kimi-K2 | Qwen-Thinking | 0.012 | Thinking models | +| **Architecture** | Claude-Opus | Kimi-K2 | 0.054 | Only when needed | +| **Security Review** | Claude-Opus | - | 0.054 | Non-negotiable | + +### 2.3 Model Selection Logic + +```python +from enum import Enum +from dataclasses import dataclass +from typing import Optional + +class TaskComplexity(Enum): + TRIVIAL = 1 # Routing, yes/no decisions + SIMPLE = 2 # Single-step tasks + MODERATE = 3 # Multi-step, needs context + COMPLEX = 4 # Reasoning required + CRITICAL = 5 # Architecture, security + +@dataclass +class ModelConfig: + id: str + input_rate: float # DIEM per 1M tokens + output_rate: float # DIEM per 1M tokens + cache_rate: float # DIEM per 1M tokens (0 if no cache) + max_context: int # Recommended max context + tier: int + +MODELS = { + "qwen-instruct": ModelConfig("qwen3-235b-a22b-instruct-2507", 0.15, 0.27, 0, 32000, 1), + "grok-code": ModelConfig("grok-code-fast-1", 0.25, 1.87, 0.03, 16000, 1), + "grok-fast": ModelConfig("grok-41-fast", 0.50, 1.25, 0.125, 16000, 1), + "deepseek": ModelConfig("deepseek-chat", 0.50, 1.00, 0.20, 32000, 2), + "minimax": ModelConfig("minimax-m21", 0.40, 1.60, 0.04, 32000, 2), + "kimi": ModelConfig("kimi-k2", 0.75, 3.20, 0.375, 32000, 3), + "qwen-thinking": ModelConfig("qwen3-235b-a22b-thinking-2507", 0.45, 3.50, 0, 32000, 3), + "claude": ModelConfig("claude-opus-4-5", 6.00, 30.00, 0, 200000, 4), +} + +def select_model( + task_type: str, + complexity: TaskComplexity, + budget_remaining: float, + context_size: int = 0 +) -> str: + """ + Select optimal model based on task, complexity, and budget. + + Returns Venice model ID. + """ + # Budget gates + if budget_remaining < 0.5: + return MODELS["qwen-instruct"].id # Emergency mode + + if budget_remaining < 2.0: + # Low budget - force Tier 1 + if complexity == TaskComplexity.CRITICAL: + return MODELS["kimi"].id # Downgrade from Claude + return MODELS["grok-fast"].id + + # Task-based selection + selection_map = { + # Task type -> {complexity: model_key} + "routing": { + TaskComplexity.TRIVIAL: "qwen-instruct", + TaskComplexity.SIMPLE: "grok-fast", + TaskComplexity.MODERATE: "grok-fast", + }, + "pm_coordination": { + TaskComplexity.TRIVIAL: "grok-fast", + TaskComplexity.SIMPLE: "grok-fast", + TaskComplexity.MODERATE: "minimax", + TaskComplexity.COMPLEX: "kimi", + }, + "code_generation": { + TaskComplexity.SIMPLE: "grok-code", + TaskComplexity.MODERATE: "grok-code", + TaskComplexity.COMPLEX: "minimax", + TaskComplexity.CRITICAL: "kimi", + }, + "code_review": { + TaskComplexity.SIMPLE: "grok-fast", + TaskComplexity.MODERATE: "grok-fast", + TaskComplexity.COMPLEX: "deepseek", + TaskComplexity.CRITICAL: "claude", + }, + "architecture": { + TaskComplexity.MODERATE: "kimi", + TaskComplexity.COMPLEX: "kimi", + TaskComplexity.CRITICAL: "claude", + }, + "security": { + TaskComplexity.SIMPLE: "grok-fast", + TaskComplexity.MODERATE: "kimi", + TaskComplexity.COMPLEX: "claude", + TaskComplexity.CRITICAL: "claude", + }, + } + + task_map = selection_map.get(task_type, {}) + model_key = task_map.get(complexity, "grok-fast") + + return MODELS[model_key].id + + +def estimate_cost(model_key: str, input_tokens: int, output_tokens: int, cache_tokens: int = 0) -> float: + """Estimate DIEM cost for a completion.""" + model = MODELS[model_key] + + input_cost = ((input_tokens - cache_tokens) / 1_000_000) * model.input_rate + cache_cost = (cache_tokens / 1_000_000) * model.cache_rate + output_cost = (output_tokens / 1_000_000) * model.output_rate + + return input_cost + cache_cost + output_cost +``` + +--- + +## 3. Context Management + +### 3.1 The Context Problem + +Your billing data reveals context size is the **primary cost driver**: + +- Kimi calls at 48K tokens: **0.040 DIEM each** +- Kimi calls at 5K tokens: **0.006 DIEM each** (6.7× cheaper) +- Grok calls at 5K tokens: **0.003 DIEM each** + +**Every 10K tokens of unnecessary context costs ~0.005-0.015 DIEM.** + +### 3.2 Context Budget by Role + +```python +# Maximum context tokens per NPE role +CONTEXT_LIMITS = { + "orchestrator": 4_000, # Minimal - just state and decisions + "pm": 6_000, # Moderate - task list and status + "coder": 12_000, # Larger - needs code context + "reviewer": 8_000, # Moderate - diff + surrounding code + "editorial": 10_000, # Article + guidelines +} + +# Warning thresholds (emit warning in logs) +CONTEXT_WARNINGS = { + "orchestrator": 3_000, + "pm": 5_000, + "coder": 10_000, + "reviewer": 6_000, + "editorial": 8_000, +} +``` + +### 3.3 Context Management Strategy + +```python +from typing import Optional +import tiktoken + +class ContextManager: + """ + Aggressive context pruning for cost control. + + Strategy: + 1. Always keep: system prompt, last 2 user messages, last assistant response + 2. Summarize: everything older than 3 exchanges + 3. Prune: tool outputs older than 2 exchanges + 4. Compress: code blocks to signatures only (in summaries) + """ + + def __init__(self, model: str = "grok-fast"): + self.encoder = tiktoken.get_encoding("cl100k_base") + self.summary_model = model # Use cheap model for summaries + + def count_tokens(self, text: str) -> int: + """Count tokens in text.""" + return len(self.encoder.encode(text)) + + def count_messages(self, messages: list[dict]) -> int: + """Count total tokens in message list.""" + total = 0 + for msg in messages: + total += self.count_tokens(msg.get("content", "")) + # Add overhead for role, etc. + total += 4 + return total + + async def prepare_context( + self, + role: str, + system_prompt: str, + messages: list[dict], + force_limit: Optional[int] = None + ) -> tuple[str, list[dict]]: + """ + Prune context to fit within role's budget. + + Returns: (possibly_modified_system_prompt, pruned_messages) + """ + limit = force_limit or CONTEXT_LIMITS.get(role, 8_000) + warning = CONTEXT_WARNINGS.get(role, limit - 1000) + + system_tokens = self.count_tokens(system_prompt) + message_tokens = self.count_messages(messages) + total = system_tokens + message_tokens + + if total <= limit: + if total > warning: + print(f"⚠️ Context at {total} tokens (warning: {warning})") + return system_prompt, messages + + print(f"🔄 Context pruning: {total} -> {limit} tokens") + + # Strategy 1: Keep essential messages + essential = [] + essential_tokens = 0 + + # Always keep last 3 messages (2 user + 1 assistant typically) + for msg in messages[-3:]: + essential.append(msg) + essential_tokens += self.count_tokens(msg.get("content", "")) + 4 + + remaining_budget = limit - system_tokens - essential_tokens - 500 # Buffer for summary + + if remaining_budget < 200: + # Can't fit summary, just use essential + return system_prompt, essential + + # Strategy 2: Summarize older messages + old_messages = messages[:-3] + if old_messages: + summary = await self._summarize_messages(old_messages, remaining_budget) + + # Prepend summary as system context + augmented_system = f"{system_prompt}\n\n## Previous Context Summary\n{summary}" + return augmented_system, essential + + return system_prompt, essential + + async def _summarize_messages(self, messages: list[dict], max_tokens: int) -> str: + """ + Summarize old messages into compact form. + Uses Grok for cheap summarization (~0.001 DIEM). + """ + # Build summary request + content_parts = [] + for msg in messages[-10:]: # Last 10 messages max + role = msg.get("role", "unknown") + text = msg.get("content", "")[:1000] # Truncate each + content_parts.append(f"{role}: {text}") + + content = "\n---\n".join(content_parts) + + summary_prompt = f"""Summarize this conversation in {max_tokens // 4} words or less. +Focus on: decisions made, current task, blockers, key context. +Do NOT include pleasantries or meta-commentary. + +Conversation: +{content[:4000]} + +Summary:""" + + # Call cheap model for summary + response = await venice_completion( + model=self.summary_model, + messages=[{"role": "user", "content": summary_prompt}], + max_tokens=min(max_tokens, 500) + ) + + return response.strip() + + def compress_code_context(self, code: str, max_lines: int = 50) -> str: + """ + Compress code to essential structure for context. + Keeps signatures, docstrings, removes implementation. + """ + lines = code.split("\n") + + if len(lines) <= max_lines: + return code + + compressed = [] + in_function = False + brace_depth = 0 + + for line in lines: + stripped = line.strip() + + # Always keep: imports, class/function definitions, docstrings + if any(stripped.startswith(kw) for kw in ["import ", "from ", "class ", "def ", "async def ", '"""', "'''"]): + compressed.append(line) + if stripped.startswith(("def ", "async def ", "class ")): + in_function = True + elif in_function and stripped.startswith(('"""', "'''")): + compressed.append(line) + if stripped.count('"""') == 2 or stripped.count("'''") == 2: + compressed.append(" # ... implementation ...") + in_function = False + elif stripped == "": + compressed.append("") + + return "\n".join(compressed) +``` + +### 3.4 Cache Optimization + +Your Grok-Code's 67.8% cache hit rate shows what's achievable: + +```python +class CacheOptimizer: + """ + Maximize Venice's prompt caching for cost savings. + + Venice caches the PREFIX of prompts. To maximize hits: + 1. Put static content (system prompt) FIRST + 2. Put stable context (project info) SECOND + 3. Put variable content (current task) LAST + """ + + @staticmethod + def build_cacheable_prompt( + system_prompt: str, + project_context: str, + task_context: str, + user_message: str + ) -> list[dict]: + """ + Build message list optimized for cache hits. + + Structure: + 1. System prompt (static) - CACHED after first call + 2. Project context as system addendum - CACHED if unchanged + 3. Task context as assistant message - Varies + 4. User message - Always new + """ + messages = [ + { + "role": "system", + "content": f"{system_prompt}\n\n## Project Context\n{project_context}" + } + ] + + if task_context: + messages.append({ + "role": "assistant", + "content": f"Current task context:\n{task_context}" + }) + + messages.append({ + "role": "user", + "content": user_message + }) + + return messages + + @staticmethod + def batch_similar_tasks(tasks: list[dict]) -> list[list[dict]]: + """ + Group tasks by system prompt and project to maximize cache hits. + + Running 5 code reviews for the same project sequentially + means prompts 2-5 get ~75% cache hits on the system prompt. + """ + batches = {} + + for task in tasks: + key = (task.get("system_prompt_hash"), task.get("project_id")) + if key not in batches: + batches[key] = [] + batches[key].append(task) + + return list(batches.values()) +``` + +--- + +## 4. Tool Architecture + +### 4.1 Current Tools + +| Tool | Version | Purpose | Used By | +|------|---------|---------|---------| +| `gitea_dev` | 1.1.0 | File ops, branches, PRs, issues | Coder NPEs | +| `gitea_admin` | 1.1.0 | Teams, permissions, org management | PM NPEs | +| `venice_info` | 1.0.0 | Model discovery, cost tracking | All NPEs | +| `editorial_pipeline` | 1.0.0 | Content creation workflow | Editorial NPEs | + +### 4.2 Required New Tools + +#### 4.2.1 Cost Tracker Tool (`cost_tracker.py`) + +**Purpose:** Real-time cost monitoring and budget enforcement. + +```python +class Valves(BaseModel): + VENICE_API_KEY: str = Field(default="", description="Venice API key") + DAILY_BUDGET: float = Field(default=8.1, description="Daily DIEM budget") + AUTOMATION_RESERVE: float = Field(default=0.5, description="Reserve for automation") + ESCALATION_RESERVE: float = Field(default=0.1, description="Reserve for Claude escalations") + WARNING_THRESHOLD: float = Field(default=0.8, description="Warn at this % of budget") + +class Tools: + async def get_balance(self) -> str: + """Get current Venice DIEM balance.""" + + async def get_remaining_today(self) -> str: + """Get remaining budget for today (resets 19:00 EST).""" + + async def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> str: + """Estimate DIEM cost for a completion.""" + + async def can_afford(self, estimated_cost: float, include_reserve: bool = True) -> str: + """Check if operation fits within budget.""" + + async def record_cost(self, amount: float, model: str, npe_id: str, project_id: str = None) -> str: + """Record actual cost after operation.""" + + async def get_daily_report(self) -> str: + """Get today's spend breakdown by model and NPE.""" + + async def check_budget_alerts(self) -> str: + """Check for budget warnings and return any alerts.""" +``` + +#### 4.2.2 Project Manager Tool (`project_manager.py`) + +**Purpose:** Manage development projects and work items. + +```python +class Tools: + # Project CRUD + async def create_project(self, name: str, description: str, daily_budget: float = 1.0) -> str + async def get_project(self, project_id: str) -> str + async def list_projects(self, status: str = "active") -> str + async def update_project(self, project_id: str, **updates) -> str + + # Work Item Management + async def create_work_item(self, project_id: str, title: str, item_type: str, assigned_model: str = None) -> str + async def get_work_item(self, item_id: str) -> str + async def list_work_items(self, project_id: str, status: str = "open") -> str + async def update_work_item(self, item_id: str, **updates) -> str + async def add_comment(self, item_id: str, comment: str, author: str) -> str + + # Budget Tracking + async def get_project_budget(self, project_id: str) -> str + async def record_project_expense(self, project_id: str, amount: float, description: str) -> str +``` + +#### 4.2.3 NPE Manager Tool (`npe_manager.py`) + +**Purpose:** Create and manage NPE identities. + +```python +class Tools: + # NPE Lifecycle + async def create_npe(self, name: str, role: str, model: str, persona: str, tools: list[str]) -> str + async def get_npe(self, npe_id: str) -> str + async def list_npes(self, role: str = None, status: str = "active") -> str + async def update_npe(self, npe_id: str, **updates) -> str + async def deactivate_npe(self, npe_id: str) -> str + + # Activity Tracking + async def get_npe_activity(self, npe_id: str, days: int = 7) -> str + async def get_npe_cost_report(self, npe_id: str, period: str = "today") -> str +``` + +#### 4.2.4 Workflow Engine Tool (`workflow_engine.py`) + +**Purpose:** Execute and monitor multi-step workflows. + +```python +class Tools: + # Workflow Execution + async def start_workflow(self, workflow_type: str, params: dict, project_id: str = None) -> str + async def get_workflow_status(self, workflow_id: str) -> str + async def complete_step(self, workflow_id: str, step_id: str, result: dict) -> str + async def fail_step(self, workflow_id: str, step_id: str, error: str) -> str + + # Circuit Breaker + async def check_circuit(self, workflow_id: str) -> str + async def trip_circuit(self, workflow_id: str, reason: str) -> str + async def reset_circuit(self, workflow_id: str) -> str + + # Escalation + async def escalate(self, workflow_id: str, reason: str, to_model: str = "claude-opus-4-5") -> str +``` + +### 4.3 Tool Permission Matrix + +| Tool | Orchestrator | PM | Coder | Reviewer | +|------|:------------:|:--:|:-----:|:--------:| +| cost_tracker | ✓ | R | - | - | +| project_manager | ✓ | ✓ | R | R | +| npe_manager | ✓ | - | - | - | +| workflow_engine | ✓ | ✓ | - | - | +| gitea_dev (read) | ✓ | ✓ | ✓ | ✓ | +| gitea_dev (write) | - | - | ✓ | - | +| gitea_admin | ✓ | ✓ | - | - | +| venice_info | ✓ | ✓ | ✓ | ✓ | + +--- + +## 5. NPE Personas & Roles + +### 5.1 NPE Identity Structure + +```python +@dataclass +class NPEIdentity: + # Core Identity + id: str # e.g., "npe-pm-main" + name: str # e.g., "Project Manager - Main" + role: str # orchestrator, pm, coder, reviewer + status: str # active, suspended, archived + + # Model Configuration + base_model: str # Venice model ID + tier: int # 1-4 based on cost + + # Context Limits + max_context: int # Max input tokens + target_output: int # Target output tokens + + # Budget + daily_budget: float # DIEM limit per day + spent_today: float # Running total + + # Tools + enabled_tools: list[str] # Tool IDs this NPE can use +``` + +### 5.2 Orchestrator Persona + +**ID:** `npe-orchestrator` +**Model:** `grok-41-fast` (Tier 1) +**Cost:** ~0.003 DIEM/call +**Context Limit:** 4,000 tokens + +```markdown +# System Prompt: Orchestrator NPE + +You are the Orchestrator, responsible for coordinating all automated development work. + +## Core Responsibilities +1. Receive triggers from cron jobs and webhooks +2. Route work to appropriate NPEs +3. Monitor workflow progress +4. Handle escalations +5. Manage budget allocation + +## Constraints +- You do NOT perform work yourself +- You MUST check budget before spawning work +- You MUST use structured JSON for all outputs +- You MUST keep context under 4,000 tokens + +## Output Format +All outputs must be valid JSON: + +### Spawn Work +{ + "action": "spawn_workflow", + "workflow_type": "code_review|feature|bugfix", + "project_id": "string", + "assigned_npe": "npe-id", + "budget_limit": 0.5, + "priority": "high|medium|low" +} + +### Route Escalation +{ + "action": "escalate", + "workflow_id": "string", + "reason": "string", + "to_model": "claude-opus-4-5", + "context_summary": "string (max 500 words)" +} + +### Budget Check +{ + "action": "budget_check", + "remaining": 5.5, + "can_proceed": true, + "warnings": [] +} + +## Decision Rules +1. If remaining budget < 0.5 DIEM: STOP all non-critical work +2. If task is security-related: Route to Claude +3. If task is simple routing: Do it yourself (no spawn needed) +4. If stuck for > 30 minutes: Escalate +``` + +### 5.3 PM Persona + +**ID:** `npe-pm-{project}` +**Model:** `grok-41-fast` (Tier 1) +**Cost:** ~0.003 DIEM/call +**Context Limit:** 6,000 tokens + +```markdown +# System Prompt: Project Manager NPE + +You are a Project Manager responsible for coordinating development work. + +## Core Responsibilities +1. Break down requirements into work items +2. Assign work to Coder NPEs +3. Review completed work +4. Track progress and budget + +## Constraints +- You do NOT write code +- You do NOT modify files +- You MUST check project budget before assigning work +- You MUST use structured JSON for work assignments + +## Work Assignment Format +{ + "action": "assign_work", + "work_item": { + "id": "WI-{timestamp}", + "title": "Brief title", + "description": "Requirements in 200 words or less", + "type": "feature|bugfix|refactor", + "assigned_to": "npe-coder-{specialty}", + "estimated_tokens": 5000, + "files_to_modify": ["path/to/file.py"], + "acceptance_criteria": ["criterion 1"] + } +} + +## Review Format +{ + "action": "review_complete", + "work_item_id": "WI-xxx", + "verdict": "approve|request_changes|escalate", + "feedback": "string (50 words max)" +} +``` + +### 5.4 Coder Persona + +**ID:** `npe-coder-{specialty}` +**Model:** `grok-code-fast-1` (Tier 1) +**Cost:** ~0.005 DIEM/call +**Context Limit:** 12,000 tokens + +```markdown +# System Prompt: Coder NPE + +You are a Coder responsible for implementing assigned work items. + +## Core Responsibilities +1. Read work item requirements +2. Examine existing code +3. Implement changes +4. Commit via Gitea tool + +## Constraints +- You ONLY work on assigned items +- You do NOT make architectural decisions +- You MUST follow existing code style +- You MUST output structured JSON for commits + +## Code Output Format +{ + "action": "commit_changes", + "work_item_id": "WI-xxx", + "changes": [ + { + "file_path": "src/module/file.py", + "action": "create|update|delete", + "content": "full file content", + "description": "what this change does (20 words max)" + } + ], + "commit_message": "feat: description", + "ready_for_review": true +} + +## When Stuck +{ + "action": "request_help", + "work_item_id": "WI-xxx", + "blocker": "description (50 words max)", + "attempted": ["approach 1", "approach 2"] +} +``` + +### 5.5 Reviewer Persona + +**ID:** `npe-reviewer-{specialty}` +**Model:** `grok-41-fast` (Tier 1) +**Cost:** ~0.004 DIEM/call +**Context Limit:** 8,000 tokens + +```markdown +# System Prompt: Code Reviewer NPE + +You are a Code Reviewer responsible for ensuring code quality. + +## Core Responsibilities +1. Review code changes +2. Check for bugs, security issues, style violations +3. Provide actionable feedback +4. Approve or request changes + +## Constraints +- You do NOT modify code +- You do NOT approve your own changes +- You MUST be specific and actionable +- You MUST output structured JSON + +## Review Output Format +{ + "action": "review_complete", + "work_item_id": "WI-xxx", + "verdict": "approve|request_changes|escalate", + "summary": "one line summary", + "issues": [ + { + "severity": "critical|major|minor", + "file": "path", + "line": 42, + "issue": "what's wrong", + "fix": "how to fix" + } + ], + "security_concerns": [], + "approved": true|false +} + +## Escalation Triggers +- Security vulnerability +- Architectural concern +- >3 major issues +``` + +--- + +## 6. Cron & Scheduling + +### 6.1 Architecture: Hybrid Scheduler + +Based on your usage patterns, I recommend a **hybrid scheduler**: + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ SCHEDULER ARCHITECTURE │ +├─────────────────────────────────────────────────────────────────────┤ +│ │ +│ KUBERNETES CRONJOBS │ +│ ═══════════════════ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────┐ │ +│ │ MASTER SCHEDULER (Every 15 min) │ │ +│ │ │ │ +│ │ • Health check all systems │ │ +│ │ • Process pending triggers │ │ +│ │ • Check for stuck workflows │ │ +│ │ • Route escalations │ │ +│ │ │ │ +│ │ Cost: ~0.003 DIEM × 4/hour = 0.012 DIEM/hour │ │ +│ └─────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ +│ │ BURN WINDOW │ │ DAILY │ │ WEEKLY │ │ +│ │ (18:45 EST) │ │ (00:00 UTC) │ │ (Sun 06:00) │ │ +│ │ │ │ │ │ │ │ +│ │ Use surplus │ │ Cleanup │ │ Full report │ │ +│ │ before reset │ │ Archive │ │ Cost analysis │ │ +│ │ │ │ Reset budgets │ │ │ │ +│ └───────────────┘ └───────────────┘ └───────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### 6.2 Master Scheduler Script + +```python +#!/usr/bin/env python3 +""" +Master Scheduler for NPE Orchestration. +Runs every 15 minutes via Kubernetes CronJob. +""" + +import asyncio +import os +from datetime import datetime, timezone +from typing import Optional + +import httpx + +# Configuration +OWUI_URL = os.environ["OWUI_URL"] +OWUI_TOKEN = os.environ["OWUI_TOKEN"] +ORCHESTRATOR_CHAT_ID = os.environ.get("ORCHESTRATOR_CHAT_ID") +EST_OFFSET = -5 # EST timezone offset + + +async def main(): + """Main scheduler loop.""" + async with httpx.AsyncClient( + base_url=OWUI_URL, + headers={"Authorization": f"Bearer {OWUI_TOKEN}"}, + timeout=60.0 + ) as client: + + now = datetime.now(timezone.utc) + hour_est = (now.hour + EST_OFFSET) % 24 + + # 1. Always: Health check + health = await check_system_health(client) + if not health["ok"]: + await alert_admin(client, f"System unhealthy: {health['issues']}") + return + + # 2. Always: Budget check + budget = await get_budget_status(client) + log_budget(budget) + + if budget["remaining"] < 0.5: + await alert_admin(client, f"Budget critical: {budget['remaining']:.2f} DIEM") + # Don't stop - still need to process escalations + + # 3. Conditional: Process work based on hour + if 22 <= hour_est or hour_est < 7: + # Night automation window (22:00 - 07:00 EST) + await process_automation_queue(client, budget) + elif hour_est == 18 and now.minute >= 45: + # Burn window (18:45 - 19:00 EST) + await run_burn_window(client, budget) + else: + # Daytime - only process high-priority triggers + await process_high_priority_only(client, budget) + + # 4. Always: Check for stuck workflows + stuck = await find_stuck_workflows(client, max_age_minutes=30) + for workflow in stuck: + await handle_stuck_workflow(client, workflow) + + # 5. Always: Route pending escalations + escalations = await get_pending_escalations(client) + for escalation in escalations: + await route_escalation(client, escalation, budget) + + +async def check_system_health(client: httpx.AsyncClient) -> dict: + """Verify all system components are operational.""" + issues = [] + + # Check Open WebUI + try: + resp = await client.get("/health") + if resp.status_code != 200: + issues.append("Open WebUI unhealthy") + except Exception as e: + issues.append(f"Open WebUI unreachable: {e}") + + # Check Venice balance + try: + resp = await client.get("/api/v1/venice/balance") + balance = resp.json().get("balance", 0) + if balance < 0.1: + issues.append(f"Venice balance critical: {balance}") + except Exception as e: + issues.append(f"Venice check failed: {e}") + + return {"ok": len(issues) == 0, "issues": issues} + + +async def get_budget_status(client: httpx.AsyncClient) -> dict: + """Get current budget status.""" + # Calculate time until reset (19:00 EST = 00:00 UTC) + now = datetime.now(timezone.utc) + hours_until_reset = (24 - now.hour) % 24 + + try: + resp = await client.get("/api/v1/venice/balance") + data = resp.json() + remaining = data.get("balance", 0) + + # Get today's spend from cost tracker + spend_resp = await client.get("/api/v1/cost-tracker/today") + spent_today = spend_resp.json().get("total", 0) + except Exception: + remaining = 8.1 # Assume full budget on error + spent_today = 0 + + return { + "remaining": remaining, + "spent_today": spent_today, + "hours_until_reset": hours_until_reset, + "automation_reserve": 0.5, + "escalation_reserve": 0.1, + "available_for_work": remaining - 0.6 # reserves + } + + +async def run_burn_window(client: httpx.AsyncClient, budget: dict): + """ + Use surplus DIEM before 19:00 EST reset. + + ROI: 0.10 DIEM spend can utilize 1.0+ DIEM that would be lost. + """ + surplus = budget["remaining"] - 2.0 # Keep 2.0 DIEM for tomorrow morning + + if surplus < 0.10: + print(f"No surplus to burn: {budget['remaining']:.2f} DIEM") + return + + print(f"Burn window: {surplus:.2f} DIEM surplus available") + + tasks = [] + + # Priority 1: Summarize active workflows (saves context tomorrow) + if surplus >= 0.05: + tasks.append(("summarize_workflows", 0.05)) + surplus -= 0.05 + + # Priority 2: Pre-plan tomorrow's tasks + if surplus >= 0.08: + tasks.append(("pre_plan", 0.08)) + surplus -= 0.08 + + # Priority 3: Run pending reviews + if surplus >= 0.05: + tasks.append(("pending_reviews", surplus)) + + for task, budget_limit in tasks: + await dispatch_burn_task(client, task, budget_limit) + + +async def process_automation_queue(client: httpx.AsyncClient, budget: dict): + """Process automation tasks during night window.""" + if budget["available_for_work"] < 0.1: + print("Budget too low for automation") + return + + # Get pending automation tasks + triggers = await get_pending_triggers(client) + + for trigger in triggers: + # Estimate cost + estimated = estimate_trigger_cost(trigger) + + if estimated > budget["available_for_work"]: + print(f"Skipping {trigger['type']}: cost {estimated} > available {budget['available_for_work']}") + continue + + await dispatch_trigger(client, trigger) + budget["available_for_work"] -= estimated + + +def estimate_trigger_cost(trigger: dict) -> float: + """Estimate DIEM cost for a trigger.""" + costs = { + "code_review": 0.015, # Grok review + routing + "feature_request": 0.050, # PM + Coder + Review + "bug_fix": 0.030, # Triage + Fix + Review + "cleanup": 0.005, # Simple Grok task + "health_check": 0.003, # Minimal + } + return costs.get(trigger.get("type"), 0.010) + + +if __name__ == "__main__": + asyncio.run(main()) +``` + +### 6.3 Cron Schedule (Kubernetes) + +```yaml +# Master Scheduler - Every 15 minutes +apiVersion: batch/v1 +kind: CronJob +metadata: + name: npe-master-scheduler + namespace: open-webui +spec: + schedule: "*/15 * * * *" + concurrencyPolicy: Forbid + jobTemplate: + spec: + template: + spec: + containers: + - name: scheduler + image: python:3.11-slim + command: ["python", "/scripts/master_scheduler.py"] + envFrom: + - secretRef: + name: npe-secrets + restartPolicy: OnFailure + +--- +# Burn Window - 18:45 EST (23:45 UTC) +apiVersion: batch/v1 +kind: CronJob +metadata: + name: npe-burn-window +spec: + schedule: "45 23 * * *" + jobTemplate: + spec: + template: + spec: + containers: + - name: burner + image: python:3.11-slim + command: ["python", "/scripts/burn_window.py"] + +--- +# Daily Maintenance - 00:00 UTC (19:00 EST - after reset) +apiVersion: batch/v1 +kind: CronJob +metadata: + name: npe-daily-maintenance +spec: + schedule: "0 0 * * *" + jobTemplate: + spec: + template: + spec: + containers: + - name: maintenance + image: python:3.11-slim + command: ["python", "/scripts/daily_maintenance.py"] +``` + +--- + +## 7. Workflow Patterns + +### 7.1 Code Review Workflow + +**Trigger:** Gitea webhook on PR create/update +**Cost:** ~0.015 DIEM +**Duration:** 2-5 minutes + +``` +┌─────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────┐ +│ START │───▶│ Load Context │───▶│ Review │───▶│ Verdict │ +│ │ │ (Grok, 0.003)│ │ (Grok, 0.008)│ │ │ +└─────────┘ └──────────────┘ └──────────────┘ └────┬────┘ + │ + ┌────────────────────────────────────────┤ + │ │ │ + ▼ ▼ ▼ + ┌──────────┐ ┌───────────┐ ┌───────────┐ + │ APPROVE │ │ REQUEST │ │ ESCALATE │ + │ │ │ CHANGES │ │ to Claude │ + │ Post │ │ │ │ (0.054) │ + │ Comment │ │ Post │ └───────────┘ + │ (0.002) │ │ Feedback │ + └──────────┘ │ (0.002) │ + └───────────┘ +``` + +### 7.2 Feature Development Workflow + +**Trigger:** Issue with label "feature" +**Cost:** ~0.050 DIEM +**Duration:** 15-30 minutes + +``` +┌─────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ +│ START │───▶│ PM: Analyze │───▶│ PM: Breakdown│───▶│ Coder: │ +│ │ │ (Grok, 0.003)│ │ (Grok, 0.003)│ │ Implement │ +└─────────┘ └──────────────┘ └──────────────┘ │ (Grok, 0.015)│ + └──────┬───────┘ + │ + ┌──────────────────────────────────────────┤ + │ │ + ▼ ▼ + ┌───────────┐ ┌───────────┐ + │ Review │◀────── Revision Loop ──────│ FAILED │ + │ (0.008) │ (max 3x) │ │ + └─────┬─────┘ └───────────┘ + │ + ▼ + ┌───────────┐ + │ Create PR │ + │ (0.003) │ + └───────────┘ +``` + +### 7.3 Workflow State Machine + +```python +from enum import Enum +from dataclasses import dataclass, field +from datetime import datetime +from typing import Optional + +class WorkflowState(Enum): + PENDING = "pending" + RUNNING = "running" + WAITING_INPUT = "waiting_input" + STEP_FAILED = "step_failed" + ESCALATED = "escalated" + COMPLETED = "completed" + FAILED = "failed" + +@dataclass +class CircuitBreaker: + max_failures: int = 3 + failure_count: int = 0 + last_failure: Optional[datetime] = None + cooldown_seconds: int = 300 + + def record_failure(self): + self.failure_count += 1 + self.last_failure = datetime.now() + + def is_open(self) -> bool: + if self.failure_count >= self.max_failures: + if self.last_failure: + elapsed = (datetime.now() - self.last_failure).seconds + if elapsed < self.cooldown_seconds: + return True + # Reset after cooldown + self.failure_count = 0 + return False + + def should_escalate(self) -> bool: + return self.failure_count >= (self.max_failures - 1) + +@dataclass +class Workflow: + id: str + type: str + state: WorkflowState = WorkflowState.PENDING + current_step: str = "" + assigned_npe: str = "" + project_id: Optional[str] = None + budget_limit: float = 1.0 + budget_spent: float = 0.0 + circuit: CircuitBreaker = field(default_factory=CircuitBreaker) + steps_completed: list = field(default_factory=list) + created_at: datetime = field(default_factory=datetime.now) + updated_at: datetime = field(default_factory=datetime.now) + + def can_proceed(self) -> tuple[bool, str]: + if self.circuit.is_open(): + return False, "Circuit breaker open" + if self.budget_spent >= self.budget_limit: + return False, "Budget exhausted" + return True, "OK" + + def record_cost(self, amount: float): + self.budget_spent += amount + self.updated_at = datetime.now() + + def complete_step(self, step_id: str, result: dict): + self.steps_completed.append({ + "step_id": step_id, + "completed_at": datetime.now().isoformat(), + "result": result + }) + self.updated_at = datetime.now() + + def fail_step(self, step_id: str, error: str): + self.circuit.record_failure() + self.state = WorkflowState.STEP_FAILED + self.updated_at = datetime.now() +``` + +--- + +## 8. Cost Management + +### 8.1 Budget Allocation (Based on Actual Data) + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ DAILY BUDGET ALLOCATION │ +│ (8.1 DIEM total) │ +├─────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌───────────────────────────────────────────────────────────────┐ │ +│ │ INTERACTIVE WORK 5.50 DIEM │ │ +│ │ (68% of budget) │ │ +│ │ │ │ +│ │ Claude sessions (1-2/day) ~1.00 DIEM │ │ +│ │ Grok chat (continuous) ~2.50 DIEM │ │ +│ │ Qwen bulk processing ~1.00 DIEM │ │ +│ │ Image generation ~1.00 DIEM │ │ +│ └───────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────┐ │ +│ │ NPE AUTOMATION 0.25 DIEM │ │ +│ │ (3% of budget) │ │ +│ │ │ │ +│ │ PM checks (12/day × 0.003) ~0.04 DIEM │ │ +│ │ Coder tasks (10/day × 0.005) ~0.05 DIEM │ │ +│ │ Reviews (8/day × 0.004) ~0.03 DIEM │ │ +│ │ Orchestrator (96/day × 0.001) ~0.10 DIEM │ │ +│ │ Buffer ~0.03 DIEM │ │ +│ └───────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────┐ │ +│ │ RESERVES 0.60 DIEM │ │ +│ │ (7% of budget) │ │ +│ │ │ │ +│ │ Escalation reserve (Claude) ~0.10 DIEM │ │ +│ │ Automation reserve ~0.50 DIEM │ │ +│ └───────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌───────────────────────────────────────────────────────────────┐ │ +│ │ BUFFER / BURN WINDOW 1.75 DIEM │ │ +│ │ (22% of budget) │ │ +│ │ │ │ +│ │ Available for burn window automation if unused │ │ +│ │ Target: Use 80%+ of daily budget │ │ +│ └───────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### 8.2 Cost Enforcement + +```python +class BudgetEnforcer: + """Enforce budget limits at multiple levels.""" + + def __init__(self, daily_budget: float = 8.1): + self.daily_budget = daily_budget + self.reserves = { + "escalation": 0.10, + "automation": 0.50, + } + + async def can_proceed( + self, + estimated_cost: float, + npe_id: str, + project_id: Optional[str] = None, + use_reserves: bool = False + ) -> tuple[bool, str]: + """Check if operation can proceed within budget.""" + + # Get current balance + balance = await self.get_venice_balance() + + # Calculate available + reserved = sum(self.reserves.values()) if not use_reserves else 0 + available = balance - reserved + + # Check global + if available < estimated_cost: + return False, f"Insufficient budget: {available:.3f} < {estimated_cost:.3f}" + + # Check per-NPE daily limit + npe_spent = await self.get_npe_spent_today(npe_id) + npe_limit = await self.get_npe_daily_limit(npe_id) + + if npe_spent + estimated_cost > npe_limit: + return False, f"NPE budget exceeded: {npe_spent:.3f} + {estimated_cost:.3f} > {npe_limit:.3f}" + + # Check per-project if applicable + if project_id: + project_spent = await self.get_project_spent_today(project_id) + project_limit = await self.get_project_daily_limit(project_id) + + if project_spent + estimated_cost > project_limit: + return False, f"Project budget exceeded" + + return True, "OK" + + async def record_and_verify( + self, + actual_cost: float, + estimated_cost: float, + npe_id: str, + operation: str + ): + """Record cost and check for anomalies.""" + # Record + await self.record_cost(actual_cost, npe_id, operation) + + # Check for cost overrun + if actual_cost > estimated_cost * 1.5: + await self.alert( + f"Cost overrun: {operation} estimated {estimated_cost:.4f}, actual {actual_cost:.4f}" + ) + + # Check for budget warnings + remaining = await self.get_remaining_today() + if remaining < self.daily_budget * 0.2: + await self.alert(f"Budget warning: only {remaining:.2f} DIEM remaining today") +``` + +### 8.3 Automation Cost Projections + +Based on your actual rates: + +| Scenario | Model | Per Call | Calls/Day | Daily Cost | Monthly | +|----------|-------|----------|-----------|------------|---------| +| PM Check | Grok | 0.0021 | 12 | 0.0252 | 0.76 | +| Coder Task | Grok-Code | 0.0050 | 10 | 0.0500 | 1.50 | +| Code Review | Grok | 0.0035 | 8 | 0.0280 | 0.84 | +| Orchestrator | Grok | 0.0010 | 96 | 0.0960 | 2.88 | +| Deep Analysis | Kimi | 0.0120 | 3 | 0.0360 | 1.08 | +| Escalation | Claude | 0.0540 | 1 | 0.0540 | 1.62 | +| **TOTAL** | | | | **0.2892** | **8.68** | + +**Conclusion:** Full NPE automation costs **0.29 DIEM/day** (3.6% of budget), leaving **7.81 DIEM** for interactive work. + +--- + +## 9. Implementation Roadmap + +### Phase 1: Foundation (Week 1) + +**Goal:** Basic infrastructure working +**Budget Impact:** None (setup only) + +- [ ] Deploy corrected Gitea tools (v1.1.0) +- [ ] Create `cost_tracker` tool +- [ ] Set up 2 NPEs manually: + - [ ] Orchestrator (Grok) + - [ ] PM (Grok) +- [ ] Create master scheduler (health check only) +- [ ] Test: Manual trigger → Orchestrator routes to PM + +**Success Criteria:** +- Orchestrator receives triggers +- Cost tracked per operation +- No automation yet - just infrastructure + +### Phase 2: Automation (Week 2) + +**Goal:** Night automation working +**Budget Impact:** +0.05 DIEM/day + +- [ ] Add Coder NPE (Grok-Code) +- [ ] Add Reviewer NPE (Grok) +- [ ] Implement Code Review workflow +- [ ] Enable night automation (22:00-07:00 EST) +- [ ] Test: PR created → auto-review → comment posted + +**Success Criteria:** +- PRs reviewed automatically during night window +- Reviews posted as Gitea comments +- Circuit breaker prevents loops + +### Phase 3: Full Workflows (Week 3-4) + +**Goal:** Complete workflow coverage +**Budget Impact:** +0.20 DIEM/day + +- [ ] Create `project_manager` tool +- [ ] Create `workflow_engine` tool +- [ ] Implement Feature Development workflow +- [ ] Implement Bug Fix workflow +- [ ] Enable burn window automation +- [ ] Test: Full feature cycle from issue to PR + +**Success Criteria:** +- Features developed from issue to merged PR +- Budget tracked per project +- Burn window uses surplus effectively + +### Phase 4: Escalation (Week 5) + +**Goal:** Claude integration for complex cases +**Budget Impact:** +0.05 DIEM/day + +- [ ] Implement escalation paths +- [ ] Create context compression for Claude calls +- [ ] Add security review workflow +- [ ] Test: Complex review → escalate → Claude response + +**Success Criteria:** +- Escalation triggers when needed +- Claude calls stay under 0.06 DIEM each +- Responses routed back to workflow + +### Phase 5: Optimization (Week 6+) + +**Goal:** Cost optimization and scaling +**Budget Impact:** -0.05 DIEM/day (savings) + +- [ ] Implement cache optimization strategy +- [ ] Add context compression to all NPEs +- [ ] Tune model selection based on success rates +- [ ] Add metrics dashboard +- [ ] Document runbooks + +**Success Criteria:** +- Cache hit rates > 50% for Grok +- System runs 7 days unattended +- Total automation < 0.25 DIEM/day + +--- + +## 10. Open Questions + +### 10.1 Unresolved Decisions + +| Question | Options | Recommendation | Notes | +|----------|---------|----------------|-------| +| State storage | OpenWebUI folders vs. SQLite | OpenWebUI folders | Simpler, no new deps | +| Token rotation | 30 vs 90 days | 90 days | Manual for now | +| Max concurrent workflows | 3 vs 5 vs 10 | 5 | Test and adjust | +| Chat retention | 7 vs 30 vs 90 days | 30 days | Balance audit vs. storage | + +### 10.2 Your Input Needed + +1. **Gitea Webhooks:** How are webhooks exposed? Need ingress path for triggers. + +2. **Claude API Key:** Using Venice's Claude or direct Anthropic? Venice is simpler. + +3. **Multi-file Commits:** Do you need atomic batch commits in gitea_dev? + +4. **Test Execution:** Skip CI/CD for Phase 1-5? Add later? + +5. **Human Approval UI:** Chat-only for now? Dashboard later? + +### 10.3 Known Limitations + +1. **No real-time collaboration** - NPEs work asynchronously +2. **No visual review** - Can't review UI changes +3. **Venice dependency** - All LLM calls through Venice +4. **Single Gitea instance** - No multi-repo federation yet + +--- + +## Appendix A: Quick Reference + +### Model Rates (DIEM per 1M tokens) + +| Model | Input | Output | Cache | Effective | +|-------|-------|--------|-------|-----------| +| Qwen-Instruct | 0.15 | 0.27 | - | 0.06 | +| Grok-Code | 0.25 | 1.87 | 0.03 | 0.07 | +| Grok-41-Fast | 0.50 | 1.25 | 0.125 | 0.19 | +| MiniMax-M21 | 0.40 | 1.60 | 0.04 | 0.22 | +| Kimi-K2 | 0.75 | 3.20 | 0.375 | 0.42 | +| Claude-Opus | 6.00 | 30.00 | - | 5.28 | + +### Context Limits + +| Role | Max Tokens | Warning At | +|------|------------|------------| +| Orchestrator | 4,000 | 3,000 | +| PM | 6,000 | 5,000 | +| Coder | 12,000 | 10,000 | +| Reviewer | 8,000 | 6,000 | + +### Budget Summary + +| Category | Daily DIEM | % of Budget | +|----------|------------|-------------| +| Interactive | 5.50 | 68% | +| Automation | 0.25 | 3% | +| Reserves | 0.60 | 7% | +| Buffer/Burn | 1.75 | 22% | +| **Total** | **8.10** | **100%** | + +--- + +*Document Status: RFC v2.0 - Based on 30-day billing analysis* +*Last Updated: January 11, 2026* +*Data Source: 13,582 Venice.ai transactions* \ No newline at end of file