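AI Cost Optimization

AI features are powerful, but frequent use can add up to a significant bill. This article covers several optimization strategies that help you cut costs without sacrificing output quality.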

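Where the Costs Come From

API Call Fees

Model	Input price	Output price	Best for
GPT-4o	$2.5/1M tokens	$10/1M tokens	Complex reasoning
GPT-4o-mini	$0.15/1M tokens	$0.6/1M tokens	Everyday use
Claude 3.5 Sonnet	$3/1M tokens	$15/1M tokens	Long-text analysis
Claude 3 Haiku	$0.25/1M tokens	$1.25/1M tokens	Quick tasks
Local model	Free	Free	Privacy-first use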

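Hidden Costs

Type	Description
Embedding computation	Cost of vectorizing your notes
Wasted tokens	Unnecessary repeated requests
Retry overhead	Re-sending failed requests
Storage	Vector database storage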
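Retry overhead is the easiest hidden cost to bound in code. The following is a minimal sketch, not a prescribed implementation: `backoffDelays` and `withRetry` are hypothetical helper names, and the defaults (3 attempts, 500 ms base delay) are assumptions to tune for your provider. It caps the number of attempts and waits longer between each one, so a failing request cannot trigger an unbounded stream of billable calls.

```typescript
// Delays to wait between attempts: base, 2x base, 4x base, ...
// There is one fewer delay than attempts, since nothing follows the last try.
function backoffDelays(maxAttempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < maxAttempts - 1; i++) {
    delays.push(baseDelayMs * 2 ** i);
  }
  return delays;
}

// Hypothetical helper: runs fn, retrying on failure up to maxAttempts times in total.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3, // assumed default
  baseDelayMs = 500, // assumed default
): Promise<T> {
  const delays = backoffDelays(maxAttempts, baseDelayMs);
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Sleep only if another attempt is still allowed
      if (attempt < delays.length) {
        await new Promise<void>((resolve) => setTimeout(resolve, delays[attempt]));
      }
    }
  }
  // Every attempt failed: surface the last error instead of retrying forever
  throw lastError;
}
```

Wrapping a request as `withRetry(() => callAPI(prompt))` would keep the worst case at three calls per prompt, where `callAPI` stands for whatever API wrapper you already use.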

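Model Selection Strategy

Choosing a Model by Task

Task	Recommended model	Reason
Simple Q&A	GPT-4o-mini	Cheap and good enough
Note summaries	GPT-4o-mini	Short-text generation
Long-document analysis	Claude 3.5 Sonnet	Long context window
Code generation	GPT-4o	Strong coding ability
Translation	GPT-4o-mini	Translation does not need a strong model
Complex reasoning	GPT-4o / Claude 3.5	Requires reasoning ability
Title generation	Local model	Simple task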

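Cascade Strategy

Try a small model first, and upgrade only if the result is not satisfactory:

text

Question → GPT-4o-mini (fast, cheap)
             ↓ not satisfied
           GPT-4o (more accurate, pricier)
             ↓ still not satisfied
           Manual handling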
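The cascade above can be sketched in code as follows. This is an illustration under assumptions rather than a fixed design: `cascade`, `CascadeStep`, and `isSatisfactory` are hypothetical names, each model is represented by any async function you supply, and what counts as "satisfactory" (length check, format check, user feedback) is left to you.

```typescript
// A model call is modeled as any async function from prompt to answer.
type ModelCall = (prompt: string) => Promise<string>;

interface CascadeStep {
  name: string; // label only, e.g. "gpt-4o-mini"
  call: ModelCall;
}

interface CascadeResult {
  answer: string | null; // null: every step was rejected, hand off to a human
  usedModel: string | null;
  attempts: number; // how many models were actually called
}

// Tries each step in order (cheapest first) and stops at the first
// answer that passes the caller-supplied quality check.
async function cascade(
  prompt: string,
  steps: CascadeStep[],
  isSatisfactory: (answer: string) => boolean,
): Promise<CascadeResult> {
  let attempts = 0;
  for (const step of steps) {
    attempts++;
    const answer = await step.call(prompt);
    if (isSatisfactory(answer)) {
      return { answer, usedModel: step.name, attempts };
    }
  }
  return { answer: null, usedModel: null, attempts };
}
```

Because the expensive model is only called when the cheap one fails the check, the average cost per question depends on how often the check passes on the first step.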

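Request Optimization

Reducing Token Usage

1. Trim the prompt

text

# ❌ Verbose
Please help me summarize the key points of the following note. Please make sure it is concise and clear, not too long, highlights the main points, and is written in Chinese...

# ✅ Concise
Summarize the key points of the following note in 3-5 bullets, in Chinese: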

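2. Limit context length

typescript

// Cap the length of the note content sent to the model
function truncateContent(content: string, maxTokens = 2000): string {
  // Rough estimate: 1 Chinese character ≈ 2 tokens,
  // so the character budget is half the token budget
  const maxChars = Math.floor(maxTokens * 0.5);
  if (content.length > maxChars) {
    return content.slice(0, maxChars) + "\n...(content truncated)";
  }
  return content;
}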
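Notes that mix Chinese and English make a single characters-per-token ratio less reliable. The sketch below is a rough heuristic, not tokenizer output: `estimateTokens` is a hypothetical helper, it reuses the "1 Chinese character ≈ 2 tokens" rule from above, and it assumes about 4 characters per token for everything else. For exact counts you would need the tokenizer of the model you call.

```typescript
// Rough token estimate for mixed Chinese/English text.
// Assumptions (heuristics only):
//   - each CJK character ≈ 2 tokens
//   - all other text ≈ 1 token per 4 characters
function estimateTokens(text: string): number {
  let cjkChars = 0;
  let otherChars = 0;
  for (let i = 0; i < text.length; i++) {
    // CJK Unified Ideographs block
    if (/[\u4e00-\u9fff]/.test(text[i])) {
      cjkChars++;
    } else {
      otherChars++;
    }
  }
  return cjkChars * 2 + Math.ceil(otherChars / 4);
}
```

An estimate like this is good enough for deciding whether to truncate a note before sending it, which is the only precision a cost guard usually needs.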

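3. Cache results

typescript

// Simple in-memory cache
const cache = new Map<string, string>();

async function cachedCompletion(prompt: string): Promise<string> {
  // Key on the full prompt: a 100-character prefix would collide for prompts
  // that share an opening (e.g. the same template applied to different notes)
  const key = prompt;
  const cached = cache.get(key);
  if (cached !== undefined) {
    return cached;
  }

  const result = await callAPI(prompt); // callAPI: your own API wrapper
  cache.set(key, result);
  return result;
}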

4. Batch requests

typescript

// ❌ One note at a time
for (const note of notes) {
  await generateSummary(note); // one API call per note
}

// ✅ Batched
const batchPrompt = notes.map((n, i) => `Note ${i + 1}: ${n.slice(0, 200)}`).join("\n---\n");
const summaries = await generateAllSummaries(batchPrompt); // a single API call
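With many notes, a single batch prompt can grow past what you want to send in one request. One possible refinement, sketched under assumptions, is to split the notes into several batches that each stay under a size budget. `buildBatchPrompts` is a hypothetical helper, and the 4,000-character budget is an arbitrary default; the 200-character excerpt and the `---` separator follow the example above.

```typescript
// Groups note excerpts into batch prompts that each stay within maxCharsPerBatch.
// Notes keep their global numbering so results can be mapped back to notes.
function buildBatchPrompts(
  notes: string[],
  maxCharsPerBatch = 4000, // assumed budget, tune for your model
  excerptChars = 200,
): string[] {
  const separator = "\n---\n";
  const batches: string[] = [];
  let current: string[] = [];
  let currentLength = 0;

  notes.forEach((note, i) => {
    const entry = `Note ${i + 1}: ${note.slice(0, excerptChars)}`;
    const separatorCost = current.length > 0 ? separator.length : 0;

    // Close the current batch if adding this entry would exceed the budget
    if (current.length > 0 && currentLength + separatorCost + entry.length > maxCharsPerBatch) {
      batches.push(current.join(separator));
      current = [];
      currentLength = 0;
    }

    currentLength += (current.length > 0 ? separator.length : 0) + entry.length;
    current.push(entry);
  });

  if (current.length > 0) {
    batches.push(current.join(separator));
  }
  return batches;
}
```

Each returned string can then be sent as one request, so the number of API calls grows with the total text size rather than with the number of notes.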

Request Rate Limiting

typescript

// Simple rate limiter: runs queued requests one at a time, spaced by a minimum interval
class RateLimiter {
  private queue: Array<() => Promise<any>> = [];
  private processing = false;
  private minInterval: number;

  constructor(requestsPerMinute: number) {
    this.minInterval = 60000 / requestsPerMinute;
  }

  async add<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise((resolve, reject) => {
      this.queue.push(async () => {
        try {
          const result = await fn();
          resolve(result);
        } catch (error) {
          reject(error);
        }
      });
      this.processQueue();
    });
  }

  private async processQueue() {
    if (this.processing || this.queue.length === 0) return;
    this.processing = true;

    while (this.queue.length > 0) {
      const task = this.queue.shift();
      if (task) {
        await task();
        await new Promise((r) => setTimeout(r, this.minInterval));
      }
    }

    this.processing = false;
  }
}

Local Model Deployment

Deploying with Ollama

Ollama is the simplest way to run models locally:

bash

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3.1         # 8B parameters, 4.7GB
ollama pull qwen2.5          # tuned for Chinese
ollama pull nomic-embed-text # embedding model

# Run
ollama run llama3.1
Performance and Cost Comparison

Option	Monthly cost	Performance	Privacy
GPT-4o API	$10-50	⭐⭐⭐⭐⭐	⚠️
GPT-4o-mini API	$1-5	⭐⭐⭐⭐	⚠️
Ollama (8B)	Electricity only	⭐⭐⭐	✅
Ollama (70B)	Electricity only	⭐⭐⭐⭐	✅
Hybrid	$2-10	⭐⭐⭐⭐	✅

Hybrid Approach

Use a local model for simple tasks and an API for complex ones:

typescript

async function smartCompletion(prompt: string, complexity: "low" | "high"): Promise<string> {
  if (complexity === "low") {
    // Simple tasks go to the local model
    return await callLocalModel(prompt);
  } else {
    // Complex tasks go to the API
    return await callAPI(prompt, "gpt-4o-mini");
  }
}

Recommended Local Models

Model	Parameters	RAM required	Best for
Llama 3.1 8B	8B	8GB	General use
Qwen 2.5 7B	7B	8GB	Chinese text
Phi-3 Mini	3.8B	4GB	Lightweight setups
Mistral 7B	7B	8GB	Code
Nomic Embed	0.1B	1GB	Embeddings

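Cost Optimization for Obsidian Plugins

Copilot Settings

Setting	Recommended value	Notes
Model	`gpt-4o-mini`	Best value for money
Max Tokens	500-1000	Limits output length
Temperature	0.3	Reduces randomness
Context	Selected text only	Reduces input tokens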

Smart Connections Settings

Setting	Recommended value	Notes
Embed Model	`text-embedding-3-small`	Cheap
Chat Model	`gpt-4o-mini`	Good value for money
Chunk Size	800-1000	Balances accuracy and cost
Top K	3-5	Reduces context size

Text Generator Settings

Setting	Recommended value	Notes
Model	`gpt-4o-mini`	Everyday use
Max Tokens	300	Limits output length
Template	Concise templates	Reduces token usage

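Optimizing Usage Habits

1. Avoid repeated requests

Reuse earlier results for identical questions
Cache frequently generated content
Use "Regenerate" instead of asking the question again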

2. Preprocess content

typescript

// Preprocess a note so that only the essential content is sent
function preprocessForAI(content: string): string {
  // Strip the frontmatter block
  content = content.replace(/^---[\s\S]*?---\n/, "");

  // Replace embeds with a plain reference
  content = content.replace(/!\[\[([^\]]+)\]\]/g, "[$1]");

  // Replace Dataview query blocks with a placeholder
  content = content.replace(/```dataview[\s\S]*?```/g, "[Dataview Query]");

  // Truncate overly long content
  if (content.length > 3000) {
    content = content.slice(0, 3000) + "\n...(truncated)";
  }

  return content;
}

3. Use templates to save tokens

markdown

<!-- Concise AI prompt template -->
Summarize the following in 3 key points:
{selection}

4. Monitor usage

typescript

// Simple usage tracker
class UsageTracker {
  private usage: Map<string, { tokens: number; cost: number }> = new Map();

  track(model: string, inputTokens: number, outputTokens: number) {
    const costs: Record<string, { input: number; output: number }> = {
      "gpt-4o": { input: 2.5 / 1_000_000, output: 10 / 1_000_000 },
      "gpt-4o-mini": { input: 0.15 / 1_000_000, output: 0.6 / 1_000_000 },
    };

    const cost = (inputTokens * (costs[model]?.input || 0)) + 
                 (outputTokens * (costs[model]?.output || 0));

    const current = this.usage.get(model) || { tokens: 0, cost: 0 };
    this.usage.set(model, {
      tokens: current.tokens + inputTokens + outputTokens,
      cost: current.cost + cost,
    });
  }

  getReport(): string {
    let report = "AI usage report:\n";
    for (const [model, data] of this.usage) {
      report += `- ${model}: ${data.tokens} tokens, $${data.cost.toFixed(4)}\n`;
    }
    return report;
  }
}

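Cost Estimates

Estimated Cost by Daily Usage

Use case	Model	Daily volume	Monthly cost
Note summaries	GPT-4o-mini	10 calls	~$0.30
AI Q&A	GPT-4o-mini	20 calls	~$0.60
Translation	GPT-4o-mini	5 calls	~$0.15
Embedding index	text-embedding-3-small	100 notes	~$0.01
Total			~$1.06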
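The monthly figures depend on how many tokens each call uses, which the table does not state. The following worked example shows the arithmetic with assumed values: `monthlyCost` is a hypothetical helper, the 4,000 input and 500 output tokens per summary are illustrative guesses, and the prices are the GPT-4o-mini rates listed earlier in this article.

```typescript
interface Pricing {
  inputPerMillion: number; // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

// GPT-4o-mini rates from the pricing table in this article
const GPT_4O_MINI: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 };

// Monthly cost = cost per call x calls per day x days per month
function monthlyCost(
  pricing: Pricing,
  callsPerDay: number,
  inputTokensPerCall: number,
  outputTokensPerCall: number,
  daysPerMonth = 30,
): number {
  const costPerCall =
    (inputTokensPerCall * pricing.inputPerMillion +
      outputTokensPerCall * pricing.outputPerMillion) /
    1_000_000;
  return costPerCall * callsPerDay * daysPerMonth;
}

// Assumed workload: 10 summaries a day, 4,000 input and 500 output tokens each
const summaryCost = monthlyCost(GPT_4O_MINI, 10, 4000, 500);
console.log(`Note summaries: ~$${summaryCost.toFixed(2)} per month`);
```

Under these assumptions each summary costs $0.0009, or about $0.27 a month, which is in the same range as the ~$0.30 shown in the table. Plugging in your own token counts gives a more realistic estimate.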

Comparison with Local Models

Option	Monthly cost	Extra requirements
API only	$1-10	None
Local only	$0 (electricity only)	8GB+ RAM
Hybrid	$0.5-3	4GB+ RAM
