feat: replace single-pass enricher with 4-step pipeline

Upgrades content processing from a single LLM call to a structured 5-step document reconstruction pipeline:

1. Normalize: clean up colloquial speech, restore punctuation, extract key entities
2. Index Tree: scan the full text → generate a hierarchical table of contents (JSON)
3. Leaf Summarize: detailed per-section summaries (with a 300-character context overlap)
4. Consistency Check: verify missing entities and fill them in
5. Assemble: build the final Markdown document (no LLM needed)

- Short texts (< 3000 chars): simple 1-pass fallback
- Long texts: full pipeline (N+4 LLM calls, where N = section count)
- worker.py: uses body_md from enricher as the Obsidian note body

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
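The routing rules described in the message can be sketched roughly as follows. This is a minimal illustration, not the enricher's actual code: the helper names `choose_path`, `with_overlap`, and `pick_body` are hypothetical, and applying the 300-character overlap as a prefix taken from the previous section is an assumption. Only the 3000-character threshold, the 300-character overlap, and the `body_md` fallback come from the commit itself.

```python
from typing import Dict, List

SHORT_TEXT_LIMIT = 3000  # from the commit message: shorter texts take the 1-pass path
CONTEXT_OVERLAP = 300    # from the commit message: characters of context overlap


def choose_path(text: str, limit: int = SHORT_TEXT_LIMIT) -> str:
    """Pick the processing path by text length (hypothetical helper)."""
    return "single_pass" if len(text) < limit else "pipeline"


def with_overlap(sections: List[str], overlap: int = CONTEXT_OVERLAP) -> List[str]:
    """Prefix each section with the tail of the previous one.

    Assumption: the overlap is carried forward from the preceding section.
    """
    out: List[str] = []
    prev = ""
    for section in sections:
        # Guard overlap == 0, because prev[-0:] would return the whole string.
        context = prev[-overlap:] if overlap > 0 else ""
        out.append(context + section)
        prev = section
    return out


def pick_body(meta: Dict[str, str], text: str) -> str:
    """Mirror the worker.py change: prefer body_md, else fall back to the raw text."""
    return meta.get("body_md") or text
```

Because `pick_body` uses `or`, an empty `body_md` string also falls back to the original text, not only a missing key.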
2026-03-02 18:02:00 +09:00
parent 128fde3ad6
commit d0c2aa3857
2 changed files with 479 additions and 68 deletions
--- a/daemon/worker.py
+++ b/daemon/worker.py
@@ -55,11 +55,12 @@ def process_item(item: dict) -> None:
        meta = enrich(input_type, yt_title, url, text)
        title = meta.get("title") or yt_title or url or row_id[:8]

+        # body_md: structured document produced by the 4-step pipeline (falls back to the original text if absent)
        note_path = save_note(
            content_type=input_type,
            title=title,
            summary=meta.get("summary", ""),
-            body=text,
+            body=meta.get("body_md") or text,
            tags=meta.get("tags", []),
            source_url=url,
            author=meta.get("author") or "",