# Cost Management Rules & Model Routing

**Generated:** May 7, 2026  
**Purpose:** Cost control framework, token budgeting, and model selection strategy

---

## **COST STRUCTURE OVERVIEW**

### **Model Pricing (Anthropic)**

| Model | Input | Output | Use Case |
|-------|-------|--------|----------|
| **Haiku** (default) | $0.80/1M tokens | $4.00/1M tokens | Fast, efficient, routine tasks |
| **Sonnet** | $3.00/1M tokens | $15.00/1M tokens | Complex reasoning, security analysis |

**Cost multiplier:** Sonnet is 3.75x more expensive than Haiku for both input ($3.00 vs. $0.80) and output ($15.00 vs. $4.00)
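
The ratio between the two pricing rows can be checked directly. The sketch below is illustrative only: the `PRICING` dictionary and `price_ratio` helper are hypothetical names, and the keys `"haiku"` and `"sonnet"` are plain labels rather than API model identifiers.

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
PRICING = {
    "haiku": {"input": 0.80, "output": 4.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}


def price_ratio(expensive: str, cheap: str) -> dict[str, float]:
    """Return the input and output price ratios between two models."""
    return {
        kind: PRICING[expensive][kind] / PRICING[cheap][kind]
        for kind in ("input", "output")
    }


ratios = price_ratio("sonnet", "haiku")
print({kind: round(value, 2) for kind, value in ratios.items()})
```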

---

## **DAILY BUDGET SYSTEM**

### **Daily Limits**

```
WARNING THRESHOLD:   $1.00/day
HARD LIMIT:          $3.00/day
```

**Trigger levels:**
- **$0.50-$1.00:** Monitor usage, consider switching to a cheaper model for the next task
- **$1.00-$2.00:** Warning threshold reached; switch to Haiku-only mode for routine work
- **$2.00-$3.00:** Critical: hold non-essential requests, notify Paul
- **>$3.00:** Hard stop: return an error, no further LLM calls until reset
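
As a rough sketch, the trigger levels can be expressed as a single lookup. The function name and the returned labels are hypothetical. Because the listed ranges share their boundary values, this version assumes a boundary amount belongs to the stricter tier, so exactly $3.00 counts as a hard stop.

```python
def daily_action(spend_usd: float) -> str:
    """Map today's spend in USD to a trigger-level label.

    Assumption: a value sitting on a shared boundary ($1.00, $2.00, $3.00)
    is assigned to the stricter tier.
    """
    if spend_usd >= 3.00:
        return "hard_stop"        # no further LLM calls until reset
    if spend_usd >= 2.00:
        return "critical_hold"    # hold non-essential requests, notify Paul
    if spend_usd >= 1.00:
        return "haiku_only"       # routine work stays on Haiku
    if spend_usd >= 0.50:
        return "monitor"          # watch usage before the next task
    return "normal"


for amount in (0.20, 0.75, 1.40, 2.60, 3.10):
    print(f"${amount:.2f} -> {daily_action(amount)}")
```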

---

## **MONTHLY BUDGET SYSTEM**

### **Monthly Limits**

```
WARNING THRESHOLD:   $10.00/month
HARD LIMIT:          $25.00/month
```

**Check at:** Daily billing summaries, weekly review

---

## **MODEL ROUTING RULES**

### **Default: ALWAYS use Haiku**

Starting model for all tasks. Fast, cheap, usually sufficient.

### **Switch to Sonnet ONLY when:**

1. **Architecture decisions**
   - System design, multi-service planning
   - Infrastructure changes, hosting migration analysis

2. **Production code review**
   - Security analysis, vulnerability assessment
   - Database schema review, data integrity
   - Deployment safety checks

3. **Complex debugging**
   - Multi-component failure analysis
   - Deep reasoning about error chains
   - Edge case handling

4. **Strategic decisions**
   - Multi-project planning and tradeoffs
   - Business logic decisions
   - Long-term roadmap implications

5. **Legal/financial analysis**
   - Contract review (if ever needed)
   - Financial decisions
   - Regulatory compliance

### **When in doubt: Try Haiku first**

If Haiku output is insufficient, you can request Sonnet for follow-up.
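
One way to make the routing rules mechanical is a small category lookup. This is a sketch under assumptions: the `TaskCategory` enum, the `choose_model` function, and the `haiku_only` flag are hypothetical, and the returned strings are labels, not API model identifiers. Deciding which category a request belongs to is still a judgment call that the code does not attempt.

```python
from enum import Enum


class TaskCategory(Enum):
    ROUTINE = "routine"
    ARCHITECTURE = "architecture"
    PRODUCTION_CODE_REVIEW = "production_code_review"
    COMPLEX_DEBUGGING = "complex_debugging"
    STRATEGIC = "strategic"
    LEGAL_FINANCIAL = "legal_financial"


# The five categories listed under "Switch to Sonnet ONLY when".
SONNET_CATEGORIES = {
    TaskCategory.ARCHITECTURE,
    TaskCategory.PRODUCTION_CODE_REVIEW,
    TaskCategory.COMPLEX_DEBUGGING,
    TaskCategory.STRATEGIC,
    TaskCategory.LEGAL_FINANCIAL,
}


def choose_model(category: TaskCategory, haiku_only: bool = False) -> str:
    """Return "haiku" by default and "sonnet" only for the listed categories.

    haiku_only is an assumed flag for budget states that forbid Sonnet.
    """
    if haiku_only:
        return "haiku"
    return "sonnet" if category in SONNET_CATEGORIES else "haiku"


print(choose_model(TaskCategory.ROUTINE))
print(choose_model(TaskCategory.COMPLEX_DEBUGGING))
```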

---

## **TOKEN BUDGETING**

### **Typical Token Costs (Haiku)**

| Task | Input Tokens | Output Tokens | Approx. Cost |
|------|-------|--------|------------|
| Daily system report | 200K | 4K | ~$0.35 |
| 7 AM brief | 100K | 1.5K | ~$0.08 |
| File read/analysis | 50K | 2K | ~$0.04 |
| Web search summary | 100K | 2K | ~$0.08 |
| Research task | 150K | 3K | ~$0.12 |
| Documentation write | 100K | 4K | ~$0.08 |

**Daily cron jobs (both report + brief):** ~$0.43/day × 30 = ~$13/month
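
A per-task estimate can be derived from the token counts and the Haiku rates. The helper below is a minimal sketch with hypothetical names. With these rates alone the daily system report works out to about $0.18 rather than the listed ~$0.35, and this document does not say what accounts for the difference, so the table figures are worth checking against actual billing.

```python
# Haiku rates in USD per 1M tokens, from the pricing table.
HAIKU_INPUT_PER_MTOK = 0.80
HAIKU_OUTPUT_PER_MTOK = 4.00


def haiku_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one Haiku call from its token counts."""
    total = (
        input_tokens * HAIKU_INPUT_PER_MTOK
        + output_tokens * HAIKU_OUTPUT_PER_MTOK
    )
    return total / 1_000_000


# (input tokens, output tokens) per task, copied from the table above.
TASKS = {
    "Daily system report": (200_000, 4_000),
    "7 AM brief": (100_000, 1_500),
    "File read/analysis": (50_000, 2_000),
    "Web search summary": (100_000, 2_000),
    "Research task": (150_000, 3_000),
    "Documentation write": (100_000, 4_000),
}

for name, (tokens_in, tokens_out) in TASKS.items():
    print(f"{name}: ${haiku_cost(tokens_in, tokens_out):.3f}")
```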

### **Cost Optimization**

1. **Batch similar tasks**
   - Don't run 10 separate web searches; combine into one request
   - Process multiple files in single call if possible

2. **Reuse context**
   - Load MEMORY.md once, reference within session
   - Don't reload historical files unnecessarily

3. **Compress prompts**
   - Avoid verbose explanations if task is straightforward
   - Use bullet points instead of paragraphs

4. **Defer non-urgent work**
   - If daily budget approaching limit, queue tasks for next day
   - Don't use Sonnet for "nice to have" features

---

## **RATE LIMITS (Anthropic API)**

### **Request Throttling**

```
Minimum 5 seconds between API calls
Maximum 10 requests per minute
If 429 error (rate limited):
  → STOP
  → Wait 5 minutes
  → Retry
```
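
The throttling rules could be wrapped around any API call roughly as follows. Everything named here is an assumption: `Throttle` and `RateLimitedError` are hypothetical, and `RateLimitedError` stands in for whatever exception the actual client raises on HTTP 429. A 5-second gap alone would allow 12 calls per minute, so the sketch spaces calls 6 seconds apart to satisfy the 10-per-minute cap as well.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# 5 s minimum gap, widened so that at most 10 calls fit in one minute.
MIN_INTERVAL_S = max(5.0, 60.0 / 10)
RATE_LIMIT_WAIT_S = 5 * 60.0  # wait 5 minutes after a 429


class RateLimitedError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 exception."""


class Throttle:
    """Spaces out calls and retries after a rate-limit error."""

    def __init__(self, sleep=time.sleep, clock=time.monotonic) -> None:
        self._sleep = sleep
        self._clock = clock
        self._last_call: Optional[float] = None

    def _wait_for_slot(self) -> None:
        if self._last_call is not None:
            remaining = MIN_INTERVAL_S - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()

    def call(self, fn: Callable[[], T], max_retries: int = 1) -> T:
        """Run fn, pausing 5 minutes and retrying if it is rate limited."""
        for attempt in range(max_retries + 1):
            self._wait_for_slot()
            try:
                return fn()
            except RateLimitedError:
                if attempt == max_retries:
                    raise  # give up after the allowed retries
                self._sleep(RATE_LIMIT_WAIT_S)
        raise AssertionError("unreachable")
```

Passing `sleep` and `clock` in as parameters keeps the class testable without real waiting; in normal use the defaults apply and `Throttle().call(lambda: some_request())` is enough, where `some_request` is whatever function performs the API call.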

### **Batch Limits**

```
Web searches:  5 max per batch, then 2-min break
API calls:     10 max per batch, then 1-min break
```
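
The batch limits amount to "process N items, then pause". A minimal sketch, assuming a hypothetical `in_batches` generator; the pause is taken before the first item of each new batch, so no pause follows the final batch.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

WEB_SEARCH_BATCH = (5, 120.0)  # 5 searches, then a 2-minute break
API_CALL_BATCH = (10, 60.0)    # 10 calls, then a 1-minute break


def in_batches(
    items: Iterable[T],
    batch_size: int,
    pause_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items one at a time, pausing after every full batch."""
    count = 0
    for item in items:
        if count and count % batch_size == 0:
            sleep(pause_s)  # break between batches
        yield item
        count += 1
```

For example, `for query in in_batches(queries, *WEB_SEARCH_BATCH): ...` would run five searches, wait two minutes, and continue, where `queries` is any iterable of search strings.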

---

## **DAILY WORKFLOW**

### **Morning (4:30 AM + 7:00 AM)**

**Cron jobs:**
- Daily report: 200K tokens (Haiku) — ~$0.35
- Morning brief: 100K tokens (Haiku) — ~$0.08
- **Subtotal:** ~$0.43

### **Business Hours (Ad-hoc)**

**Paul's requests:**
- Research: Budget 100-150K per task (Haiku)
- Documentation: Budget 100-200K per task (Haiku)
- Code review: Budget 300K (Sonnet, if complexity requires)
- Deployment: Budget 50-100K (Haiku)

**Daily target:** $0.50-$1.00 total

### **Evening**

**Backup jobs** (non-LLM, no cost)
- Database dumps to S3
- Cron scheduling

---

## **ESCALATION PROCEDURE**

### **If daily cost hits warning ($1.00)**

1. Stop accepting new non-critical requests
2. Notify Paul: "Budget approaching daily limit, recommend pausing new work"
3. Resume at the start of the next calendar day (midnight Brisbane time, UTC+10)
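
The daily reset moment can be computed with the standard library. This sketch assumes a fixed UTC+10 offset, as stated above, and uses a hypothetical function name, `next_daily_reset`.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+10 offset for Brisbane, as stated in the procedure.
BRISBANE = timezone(timedelta(hours=10), name="AEST")


def next_daily_reset(now: datetime) -> datetime:
    """Return the next Brisbane midnight as a timezone-aware UTC datetime.

    now must be timezone-aware.
    """
    local = now.astimezone(BRISBANE)
    next_midnight = (local + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_midnight.astimezone(timezone.utc)


print(next_daily_reset(datetime.now(timezone.utc)).isoformat())
```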

### **If monthly cost hits warning ($10.00)**

1. Audit usage patterns
2. Review Sonnet usage — was it necessary?
3. Consider increasing the monthly budget if growth justifies it
4. Send Paul a recommendation before the end of the month

### **If hard limit hit ($3.00/day or $25.00/month)**

1. Refuse all LLM-dependent tasks with error message
2. Suggest alternative (web search, file read, document prep)
3. Notify Paul immediately
4. Wait for confirmation before resuming
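
A guard placed in front of every LLM call is one way to enforce the hard limits. The names below (`BudgetExceededError`, `check_hard_limits`, `resume_confirmed`) are hypothetical, and the sketch assumes that "hit" means reaching or exceeding the limit. Notifying Paul and suggesting alternatives are left to the caller.

```python
DAILY_HARD_LIMIT_USD = 3.00
MONTHLY_HARD_LIMIT_USD = 25.00


class BudgetExceededError(RuntimeError):
    """Raised instead of making an LLM call once a hard limit is reached."""


def check_hard_limits(
    today_usd: float,
    month_usd: float,
    resume_confirmed: bool = False,
) -> None:
    """Raise BudgetExceededError if either hard limit has been reached.

    resume_confirmed is an assumed flag, set only after Paul has
    explicitly approved resuming work.
    """
    if resume_confirmed:
        return
    if today_usd >= DAILY_HARD_LIMIT_USD:
        raise BudgetExceededError(
            f"Daily hard limit reached: ${today_usd:.2f} "
            f"of ${DAILY_HARD_LIMIT_USD:.2f}"
        )
    if month_usd >= MONTHLY_HARD_LIMIT_USD:
        raise BudgetExceededError(
            f"Monthly hard limit reached: ${month_usd:.2f} "
            f"of ${MONTHLY_HARD_LIMIT_USD:.2f}"
        )
```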

---

## **SPECIAL CASES**

### **Emergency/Urgent Tasks**

If Paul explicitly requests something urgent:
- Sonnet may be used if the task is safety-critical or time-sensitive
- Still confirm the model switch with Paul in the same message
- Log the reason for Sonnet usage in memory

### **Experimentation/Learning**

- Use Haiku for first attempt
- Only use Sonnet if Haiku clearly insufficient
- Document learnings to avoid repeat costs

### **Session Status Checks**

`session_status` command:
- Free (no API cost)
- Use for budget checks, usage monitoring
- Call this instead of asking about costs

---

## **MONITORING & REPORTING**

### **Daily Cost Tracking**

Paul receives daily system report with cost section:
- Today's usage
- Monthly run-rate
- Warnings if threshold approaching
- Budget status

### **Weekly Review**

Suggested: Review actual spending vs. budget mid-week
- Identify trends (if costs rising)
- Adjust routing if needed

### **Monthly Reconciliation**

End of month:
- Review total spend
- Compare to $10 warning / $25 hard limit
- Plan for next month

---

## **COST REDUCTION STRATEGIES**

### **Quick Wins**

1. **Use web_fetch instead of Claude for content extraction**
   - `web_fetch` is free (just downloads HTML)
   - Have Claude summarize if needed (cheaper than having Claude fetch)

2. **Batch requests**
   - Instead of 5 separate "analyze file" calls, do one batch

3. **Leverage non-LLM operations**
   - File reads, web searches, and shell commands incur no API cost
   - Only use the LLM for analysis, reasoning, and creation

4. **Cache context**
   - Read MEMORY.md once per session
   - Reference within session instead of reloading

5. **Use cheaper models for feedback loops**
   - Draft and iterate with Haiku, then use Sonnet only for the final review
   - Avoid the reverse order (Sonnet draft → Haiku review)

---

## **BUDGET SUSTAINABILITY**

**Current run-rate (as of May 2026):**

| Component | Monthly | Notes |
|-----------|---------|-------|
| Cron jobs (daily + brief) | ~$13.00 | Fixed, reliable |
| Ad-hoc requests | ~$5-10 | Varies, Paul-driven |
| **Total** | **~$18-23** | Within $25 hard limit, above $10 warning threshold |

**Headroom:** $2-7/month for unexpected tasks
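
The run-rate and headroom figures can be recomputed at any point in the month with a simple linear projection. This is a sketch with hypothetical function names, and it assumes spending continues at the same average daily rate, which ad-hoc requests may not follow.

```python
MONTHLY_WARNING_USD = 10.00
MONTHLY_HARD_LIMIT_USD = 25.00


def project_month(
    spend_to_date_usd: float, day_of_month: int, days_in_month: int = 30
) -> float:
    """Linearly project month-end spend from spend so far."""
    if day_of_month < 1:
        raise ValueError("day_of_month must be at least 1")
    return spend_to_date_usd / day_of_month * days_in_month


def headroom(projected_usd: float) -> float:
    """Return the USD remaining under the monthly hard limit."""
    return MONTHLY_HARD_LIMIT_USD - projected_usd


# Example: the two daily cron jobs alone (~$0.43/day) after 10 days.
projected = project_month(4.30, day_of_month=10)
print(f"projected: ${projected:.2f}, headroom: ${headroom(projected):.2f}")
print("above warning threshold:", projected >= MONTHLY_WARNING_USD)
```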

---

## **IF BUDGET NEEDS CHANGE**

If Paul wants to:
- **Increase monthly limit:** Can adjust hard limit up to $50 (quarterly review)
- **Shift to Sonnet-heavy:** Will need $40-50/month budget
- **Add more cron jobs:** Need to model cost impact first

All changes should be documented in MEMORY.md and confirmed by Paul.

---

**Last Updated:** May 7, 2026  
**Note:** Budget limits and rules subject to revision based on actual usage patterns.
