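RAG Workflow Guide

RAG (Retrieval-Augmented Generation) is one of the most effective ways to combine your own knowledge base with AI. This guide explains how to build a RAG workflow in Obsidian.

What is RAG?

RAG is an AI application pattern: it first retrieves relevant content from a knowledge base, then passes the retrieved content to a large language model as context, so the model can produce answers that are more accurate and more relevant to your material.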

text

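User asks a question
   ↓
Knowledge base retrieval → find relevant notes
   ↓
Assemble the prompt (question + relevant note content)
   ↓
Large language model generates an answer
   ↓
A precise answer grounded in your knowledge base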
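
The flow above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not a real implementation: `retrieve` scores notes by shared words instead of embeddings, `fake_llm` is a hypothetical stand-in for a model call, and the sample notes are invented.

```python
def retrieve(question: str, notes: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank note names by how many words they share with the question (toy scoring)."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(text.lower().split())), name) for name, text in notes.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked[:top_k] if score > 0]


def build_prompt(question: str, notes: dict[str, str], names: list[str]) -> str:
    """Assemble the prompt: retrieved note contents first, then the question."""
    context = "\n\n".join(f"[{name}]\n{notes[name]}" for name in names)
    return f"Notes:\n{context}\n\nQuestion: {question}"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; it only reports the prompt size."""
    return f"(answer based on {len(prompt)} characters of prompt)"


# Invented sample notes, for illustration only
notes = {
    "python.md": "python is a programming language with simple syntax",
    "cooking.md": "pasta should be cooked in salted boiling water",
}
question = "what is python"
names = retrieve(question, notes)
prompt = build_prompt(question, notes, names)
answer = fake_llm(prompt)
```

Only the note that shares words with the question reaches the prompt, which is the point of the retrieval step: the model sees a small, relevant slice of the knowledge base rather than all of it.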

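RAG vs. asking directly

Approach	Accuracy	Context	Hallucination risk
Asking the AI directly	Lower	General knowledge	Higher
RAG Q&A	Higher	Your own notes	Lower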

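RAG architecture

Core components

text

┌───────────────────────────────────────────────────────┐
│                     RAG workflow                      │
├───────────────────────────────────────────────────────┤
│                                                       │
│  1. Document processing                               │
│     Notes → chunk → embed → store                     │
│                                                       │
│  2. Retrieval                                         │
│     Question → embed → similarity search → results    │
│                                                       │
│  3. Generation                                        │
│     Question + retrieved results → LLM → answer       │
│                                                       │
└───────────────────────────────────────────────────────┘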
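
Steps 1 and 2 of the diagram can be illustrated with a toy vector search. This sketch assumes a word-count vector as the "embedding" and an in-memory list as the "vector database"; a real setup would use a learned embedding model and one of the databases listed in this guide. The sample chunks are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a word-count vector. Real systems use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Document processing: chunk (here, one chunk per note), embed, store
store = [(chunk, embed(chunk)) for chunk in [
    "obsidian stores notes as markdown files",
    "vector databases index embeddings for similarity search",
]]


# 2. Retrieval: embed the question and rank stored chunks by similarity
def search(question: str, top_k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


top = search("how does similarity search work")
```

Step 3 then passes the question and the returned chunks to the language model.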

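Choosing a vector database

Database	Characteristics	Best for
ChromaDB	Open source, lightweight	Local deployment
Pinecone	Cloud-hosted, fast	Production environments
Qdrant	High performance, written in Rust	Large-scale workloads
LanceDB	Embedded, zero configuration	Personal projects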

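Using Smart Connections

Smart Connections is one of the most widely used RAG plugins for Obsidian.

Installation and setup

1. Install the Smart Connections community plugin.
2. Configure an embedding model (OpenAI or a local model is recommended).
3. Wait for note indexing to finish.

Basic usage

1. Open the Smart Connections panel.
2. Enter a question or select a note.
3. Review the related notes and the AI answer.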

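Configuration options

Setting	Description	Recommended value
Embed Model	Embedding model	`text-embedding-3-small`
Chat Model	Chat model	`gpt-4o-mini`
Chunk Size	Size of each chunk	1000
Chunk Overlap	Overlap between adjacent chunks	200
Top K	Number of results returned	5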
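
To make the chunk size and chunk overlap settings concrete, here is a minimal sketch of overlapping chunking. It assumes both values are measured in characters (the table does not state the unit), and `chunk_text` is a hypothetical helper, not the plugin's own code.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at one
    chunk boundary still appears whole in the neighboring chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = chunk_text("a" * 2500)
lengths = [len(c) for c in chunks]  # [1000, 1000, 900]
```

With the recommended values, each new chunk starts 800 characters after the previous one, so a 2500-character note yields three chunks.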

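Using local models

If you would rather not use a cloud API, you can configure local models:

1. Install Ollama.
2. Pull an embedding model: ollama pull nomic-embed-text
3. Pull a chat model: ollama pull llama3
4. Point Smart Connections at the Ollama endpoint (http://localhost:11434 by default).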
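
Once the models are pulled, you may want to confirm that the local endpoint returns embeddings before wiring it into a plugin. The sketch below assumes Ollama is listening on its default address and exposes the `/api/embeddings` route, which takes `model` and `prompt` and returns an `embedding` array; check the route and response shape against the Ollama version you have installed. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embeddings route (assumed available)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Send the request to a running Ollama server and return the embedding vector."""
    with urllib.request.urlopen(build_embedding_request(text), timeout=30) as response:
        return json.loads(response.read())["embedding"]


request = build_embedding_request("hello notes")
```

Calling `embed("hello notes")` requires the Ollama server to be running; building the request does not.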

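Using the Copilot plugin

Obsidian Copilot also supports a RAG mode.

Configuring RAG

1. Install the Copilot plugin.
2. Configure your API key.
3. Enable "Vault QA" mode.
4. Wait for indexing to finish.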

Usage

text

# Reference a note in chat
@note-name your question

# Search the whole knowledge base
@vault your question

# Reference a specific folder
@folder:Projects your question

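Building your own RAG system

Architecture

text

Obsidian vault
    ↓ (file watching)
Document processor
    ↓ (chunking + embedding)
Vector database (ChromaDB)
    ↑ (retrieval)
API service (FastAPI)
    ↑ (HTTP)
Obsidian plugin / chat interface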
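
The first arrow in the diagram, detecting which notes need re-indexing, can be approximated without a file-watching library. The sketch below swaps in periodic polling with content hashes instead of true file-system events; `find_changed_notes` and the in-memory `seen` map are hypothetical names, and a real setup would persist the map between runs.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a note's bytes so content changes can be detected."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_changed_notes(vault: Path, seen: dict[str, str]) -> list[str]:
    """Return vault-relative paths of notes that are new or changed.

    `seen` maps each path to the digest recorded on the previous scan and is
    updated in place.
    """
    changed = []
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault).as_posix()
        if rel.startswith(".obsidian/"):
            continue  # skip Obsidian's own config directory
        digest = file_digest(path)
        if seen.get(rel) != digest:
            seen[rel] = digest
            changed.append(rel)
    return changed
```

Running `find_changed_notes` on a timer and sending only the returned paths to the document processor avoids re-embedding notes that have not changed.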

Backend service

python

# server.py: RAG backend service
import os

import chromadb
from fastapi import FastAPI, HTTPException
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()

# Initialize ChromaDB (persisted to local disk)
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("obsidian_notes")

# Initialize the OpenAI client; the key is read from the environment
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_embedding(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def query_notes(question: str, top_k: int = 5) -> list[dict]:
    """Return the top_k stored chunks closest to the question."""
    query_embedding = get_embedding(question)

    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
    )

    # Chroma returns one result list per query; we sent one query, so read index 0
    notes = []
    for i in range(len(results["ids"][0])):
        notes.append({
            "id": results["ids"][0][i],
            "content": results["documents"][0][i],
            "metadata": results["metadatas"][0][i],
            "distance": results["distances"][0][i],
        })

    return notes

def generate_answer(question: str, context_notes: list[dict]) -> str:
    """Generate an answer from the retrieved context."""
    context = "\n\n---\n\n".join(
        f"Source: {note['metadata']['source']}\n{note['content']}"
        for note in context_notes
    )

    prompt = f"""Answer the question based on the notes below. If the answer cannot be found in the notes, say so.

Notes:
{context}

Question: {question}

Answer:"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a knowledge base assistant. Answer questions based on the user's notes."},
            {"role": "user", "content": prompt},
        ],
    )

    # message.content can be None, so fall back to an empty string
    return response.choices[0].message.content or ""

# API endpoints
class QuestionRequest(BaseModel):
    question: str
    top_k: int = 5

class QuestionResponse(BaseModel):
    answer: str
    sources: list[dict]

@app.post("/ask", response_model=QuestionResponse)
def ask_question(request: QuestionRequest):
    """Question-answering endpoint."""
    # Plain `def` lets FastAPI run the blocking OpenAI and Chroma calls in a worker thread
    if not request.question.strip():
        raise HTTPException(status_code=400, detail="question must not be empty")

    notes = query_notes(request.question, request.top_k)
    answer = generate_answer(request.question, notes)

    return QuestionResponse(
        answer=answer,
        sources=[{"source": n["metadata"]["source"], "distance": n["distance"]} for n in notes],
    )
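class IndexRequest(BaseModel):
    path: str
    content: str

@app.post("/index")
def index_note(request: IndexRequest):
    """Index one note: split it into chunks, embed each chunk, and store it."""
    # Remove chunks from an earlier version of this note so stale text is not retrieved
    collection.delete(where={"source": request.path})

    # Naive fixed-size chunking by character count; skip whitespace-only chunks
    chunk_size = 1000
    chunks = [request.content[i:i + chunk_size] for i in range(0, len(request.content), chunk_size)]
    chunks = [chunk for chunk in chunks if chunk.strip()]

    for i, chunk in enumerate(chunks):
        embedding = get_embedding(chunk)
        collection.upsert(
            ids=[f"{request.path}_{i}"],
            embeddings=[embedding],
            documents=[chunk],
            metadatas=[{"source": request.path, "chunk": i}],
        )

    return {"status": "ok", "chunks": len(chunks)}

Document processing script

python

# indexer.py: index an Obsidian vault
import os
from pathlib import Path

import frontmatter
import requests

API_URL = "http://localhost:8000/index"

def process_vault(vault_path: str) -> None:
    """Walk the vault and send every Markdown note to the indexing API."""
    for root, dirs, files in os.walk(vault_path):
        # Skip the .obsidian directory (editing dirs in place prunes the walk)
        if ".obsidian" in dirs:
            dirs.remove(".obsidian")

        for file in files:
            if not file.endswith(".md"):
                continue

            file_path = os.path.join(root, file)
            # Use forward slashes so chunk IDs are the same on every platform
            rel_path = Path(os.path.relpath(file_path, vault_path)).as_posix()

            # Parse the note and drop its YAML front matter
            with open(file_path, "r", encoding="utf-8") as f:
                content = frontmatter.load(f).content

            # Send the note in the JSON body; a query string is too short for long notes
            response = requests.post(
                API_URL,
                json={"path": rel_path, "content": content},
                timeout=60,
            )
            response.raise_for_status()

            print(f"Indexed: {rel_path}")

if __name__ == "__main__":
    process_vault("/path/to/vault")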

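Obsidian plugin integration

typescript

// A simple RAG plugin
import { App, Modal, Notice, Plugin, requestUrl } from "obsidian";

interface RAGResponse {
  answer: string;
  sources: { source: string; distance: number }[];
}

export default class RAGPlugin extends Plugin {
  private apiUrl = "http://localhost:8000";

  async onload() {
    this.addCommand({
      id: "ask-rag",
      name: "Ask your knowledge base",
      callback: () => this.askQuestion(),
    });
  }

  async askQuestion() {
    // Get the user's question
    const question = await this.getInput("Enter your question:");
    if (!question) return;

    try {
      // requestUrl is Obsidian's HTTP helper; unlike fetch it is not subject to CORS,
      // and it throws on HTTP error statuses by default
      const response = await requestUrl({
        url: `${this.apiUrl}/ask`,
        method: "POST",
        contentType: "application/json",
        body: JSON.stringify({ question, top_k: 5 }),
      });

      const data = response.json as RAGResponse;

      // Format the answer and its sources as Markdown
      const md = `## AI answer\n\n${data.answer}\n\n### Sources\n${data.sources.map((s) => `- ${s.source}`).join("\n")}`;

      // Save the answer as a new note and notify the user
      const noteName = `AI-Answer-${Date.now()}`;
      await this.app.vault.create(`${noteName}.md`, md);
      new Notice("Answer created");
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      new Notice("Request failed: " + message);
    }
  }

  getInput(prompt: string): Promise<string | null> {
    return new Promise((resolve) => {
      const modal = new InputModal(this.app, prompt, resolve);
      modal.open();
    });
  }
}

// Minimal single-line input dialog: Enter submits, closing without Enter cancels
class InputModal extends Modal {
  private value: string | null = null;

  constructor(
    app: App,
    private promptText: string,
    private onDone: (value: string | null) => void,
  ) {
    super(app);
  }

  onOpen() {
    this.contentEl.createEl("p", { text: this.promptText });
    const input = this.contentEl.createEl("input", { type: "text" });
    input.addEventListener("keydown", (event) => {
      if (event.key === "Enter") {
        this.value = input.value.trim() || null;
        this.close();
      }
    });
    input.focus();
  }

  onClose() {
    this.contentEl.empty();
    // Always resolve, so the caller never waits on a dismissed dialog
    this.onDone(this.value);
  }
}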

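Improving RAG results

Chunking strategies

Strategy	Description	Best for
Fixed size	Split by character count	General use
Paragraph	Split at paragraph boundaries	Long documents
Semantic	Split where the meaning shifts	Precise retrieval
Markdown headings	Split at headings	Structured notes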
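
As an illustration of the Markdown-heading strategy, the sketch below starts a new chunk at every heading line. `split_by_headings` is a hypothetical helper, and it makes a simplifying assumption: any line starting with one to six `#` characters followed by a space is a heading, so a `# comment` line inside a fenced code block would be split incorrectly.

```python
import re


def split_by_headings(markdown: str) -> list[dict]:
    """Split a note into sections, starting a new section at each Markdown heading line."""
    sections = []
    current = {"heading": "", "lines": []}

    def has_content(section: dict) -> bool:
        return bool(section["heading"]) or any(line.strip() for line in section["lines"])

    for line in markdown.splitlines():
        if re.match(r"^#{1,6} ", line):
            if has_content(current):
                sections.append(current)
            current = {"heading": line.lstrip("#").strip(), "lines": []}
        else:
            current["lines"].append(line)
    if has_content(current):
        sections.append(current)

    return [{"heading": s["heading"], "text": "\n".join(s["lines"]).strip()} for s in sections]


note = "intro line\n# Setup\ninstall it\n## Usage\nrun it\n"
sections = split_by_headings(note)
```

Keeping the heading alongside each chunk also gives the retriever a short label that can be stored as metadata or prepended to the chunk text.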

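Retrieval optimization

Hybrid retrieval: combine vector search with keyword search
Reranking: rerank the results with a cross-encoder
Query expansion: expand the user's question into several queries
Filtering: filter results by tag, date, or path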
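
Hybrid retrieval needs a way to merge the vector ranking and the keyword ranking into one list. One common choice is reciprocal rank fusion (RRF); this guide does not prescribe a merging method, so treat the sketch below as one option. The note IDs are invented, and `k = 60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stable output
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["note-a", "note-b", "note-c"]   # hypothetical vector-search order
keyword_hits = ["note-b", "note-d", "note-a"]  # hypothetical keyword-search order
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
# fused == ["note-b", "note-a", "note-d", "note-c"]
```

Notes that appear in both rankings rise to the top, while a note found by only one method is still kept. RRF uses only rank positions, so the two searches do not need comparable score scales.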

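Prompt optimization

text

You are a professional knowledge base assistant. Answer the question strictly from the notes provided.

Rules:
1. Answer only from the notes; do not make up information
2. Cite the specific notes you used
3. If the notes contain no relevant information, say so explicitly
4. Keep the answer concise and accurate
5. If several notes are relevant, combine the information from all of them

Notes:
{context}

Question: {question}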
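
Filling the `{context}` and `{question}` placeholders might look like the sketch below. The shortened template, the `format_context` helper, and the 4000-character budget are assumptions for illustration; the budget keeps the prompt from growing without bound when many notes are retrieved.

```python
PROMPT_TEMPLATE = """You are a knowledge base assistant. Answer strictly from the notes below.

Notes:
{context}

Question: {question}"""


def format_context(notes: list[dict], max_chars: int = 4000) -> str:
    """Label each note with its source; stop adding notes once the character budget is spent."""
    parts, used = [], 0
    for note in notes:
        block = f"Source: {note['source']}\n{note['content']}"
        if parts and used + len(block) > max_chars:
            break  # always keep the first note, then respect the budget
        parts.append(block)
        used += len(block)
    return "\n\n---\n\n".join(parts)


def build_prompt(question: str, notes: list[dict]) -> str:
    """Fill the template with the formatted notes and the user's question."""
    return PROMPT_TEMPLATE.format(context=format_context(notes), question=question)


notes = [
    {"source": "a.md", "content": "alpha"},
    {"source": "b.md", "content": "beta"},
]
prompt = build_prompt("What is alpha?", notes)
```

Labeling each block with its source is what lets the model follow the rule about citing specific notes.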

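FAQ

How long does indexing take?

It depends on the number of notes and the speed of the embedding model. Rough figures:

100 notes: 1-2 minutes
1,000 notes: 10-20 minutes
10,000 notes: 1-2 hours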
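
The first two figures correspond to roughly 0.6 to 1.2 seconds per note, while the 10,000-note figure implies a somewhat faster rate, so treat all of them as rough estimates. A back-of-the-envelope calculation, assuming notes are embedded one at a time at a constant rate (`estimate_minutes` is a hypothetical helper):

```python
def estimate_minutes(note_count: int, seconds_per_note: float) -> float:
    """Estimate indexing time in minutes for sequential, one-note-at-a-time embedding."""
    return note_count * seconds_per_note / 60


low = estimate_minutes(1000, 0.6)   # about 10 minutes
high = estimate_minutes(1000, 1.2)  # about 20 minutes
```

Measuring the per-note time on a small sample of your own vault and plugging it in gives a more reliable estimate than the table.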

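What if RAG answers are inaccurate?

Check whether the chunk size is appropriate
Increase the number of retrieved results (top_k)
Use a better embedding model
Refine the prompt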

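How do I control costs?

Use local models (Ollama with open-source models)
Use a small model for generation (gpt-4o-mini)
Embed in batches instead of one item at a time
Cache embeddings to avoid recomputing them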
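
The last two points, batching and caching, can be combined in one helper. In this sketch the cache is a plain dictionary keyed by a hash of the chunk text, and `fake_embed_batch` is a hypothetical stand-in for a real batch embedding call; a real setup would persist the cache to disk.

```python
import hashlib
from typing import Callable


def embed_with_cache(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    cache: dict[str, list[float]],
) -> list[list[float]]:
    """Embed texts, calling embed_batch once for the texts that are not cached yet."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
    missing: dict[str, str] = {}  # key -> text, deduplicated, in first-seen order
    for key, text in zip(keys, texts):
        if key not in cache:
            missing[key] = text
    if missing:
        vectors = embed_batch(list(missing.values()))
        cache.update(zip(missing.keys(), vectors))
    return [cache[key] for key in keys]


calls: list[list[str]] = []


def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for a real embedding API; records each batch it receives."""
    calls.append(list(batch))
    return [[float(len(text))] for text in batch]


cache: dict[str, list[float]] = {}
first = embed_with_cache(["alpha", "beta", "alpha"], fake_embed_batch, cache)
second = embed_with_cache(["beta", "gamma"], fake_embed_batch, cache)
```

The first call sends one batch of two unique texts and the second call sends only the new text, so repeated or unchanged chunks cost nothing. With the OpenAI client, `embed_batch` could pass the whole list as `input` in a single request.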
