3 Commits

Author SHA1 Message Date
joungmin
d0c2aa3857 feat: replace single-pass enricher with 4-step pipeline
Upgrades content processing from a single LLM call to a structured
5-step document reconstruction pipeline (4 LLM-backed steps plus a final LLM-free assembly):

  1. Normalize  — clean up colloquial speech, restore punctuation, extract key entities
  2. Index Tree — scan the full text → generate a hierarchical table of contents (JSON)
  3. Leaf Summarize — detailed per-section summaries (300-char context overlap)
  4. Consistency Check — verify and backfill missing entities
  5. Assemble — assemble the final Markdown document (no LLM needed)

- Short texts (< 3000 chars): simple 1-pass fallback
- Long texts: full pipeline (N+4 LLM calls where N = section count)
- worker.py: uses body_md from enricher as Obsidian note body
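The dispatch described above can be sketched as follows. All names here (`llm`, `reconstruct`, the 3-section stub) are illustrative, not the repo's actual enricher API; splitting the consistency check into a verify call and a backfill call is one way to account for the stated N+4 total, assumed rather than confirmed by the commit.

```python
# Sketch of the short/long dispatch and the 5-step pipeline (hypothetical names).
SHORT_TEXT_LIMIT = 3000  # chars; below this, the simple 1-pass fallback runs

def llm(prompt: str) -> str:
    """Stand-in for the real LLM call (OCI GenAI Gemini Flash in this repo)."""
    return f"[llm:{prompt[:20]}]"

def reconstruct(text: str) -> tuple[str, int]:
    """Return (body_md, number_of_llm_calls)."""
    if len(text) < SHORT_TEXT_LIMIT:
        return llm("one-pass summary"), 1            # short-text fallback: 1 call

    calls = 0
    normalized = llm("normalize");  calls += 1       # 1. Normalize
    toc        = llm("index tree"); calls += 1       # 2. Index Tree (JSON TOC)
    sections = ["intro", "body", "outro"]            # would be parsed from toc
    leaves = []
    for name in sections:                            # 3. Leaf Summarize (N calls)
        leaves.append(llm(f"leaf:{name}")); calls += 1
    _report = llm("check entities"); calls += 1      # 4a. Consistency: verify
    patched = llm("fill missing");   calls += 1      # 4b. Consistency: backfill
    body_md = "\n\n".join([toc, *leaves, patched])   # 5. Assemble (no LLM call)
    return body_md, calls                            # total: N + 4
```

With the 3-section stub, a long input costs 3 + 4 = 7 LLM calls, while a short input costs exactly 1.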

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 18:02:00 +09:00
joungmin
a9db6a8771 feat: add English vocab extraction and Anki card registration
- core/vocab.py: extract B1-B2 level vocabulary from English content via Gemini Flash
- core/anki.py: register vocab cards to AnkiConnect (English::Vocabulary deck)
- core/enricher.py: add language detection field + summary_ko (Korean summary)
- core/obsidian.py: render Korean + English summary in note
- daemon/worker.py: call vocab extraction and Anki registration for English content
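Card registration via AnkiConnect uses its JSON-over-HTTP `addNote` action. The payload builder below is a minimal sketch: the deck name comes from the commit, but the `Basic` model and its `Front`/`Back` fields are assumptions and the repo's `core/anki.py` may use a different note model.

```python
def build_anki_note(word: str, meaning: str, example: str) -> dict:
    """Build an AnkiConnect `addNote` request for the English::Vocabulary deck.

    Assumes the stock Basic model (Front/Back fields); the repo's actual
    model and field names may differ.
    """
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": "English::Vocabulary",
                "modelName": "Basic",
                "fields": {"Front": word, "Back": f"{meaning}\n\n{example}"},
                "tags": ["knowledge-inbox", "vocab"],
            }
        },
    }

# Sending it is a POST to the local AnkiConnect endpoint, e.g.:
#   requests.post("http://127.0.0.1:8765", json=build_anki_note(...))
```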
2026-02-28 08:39:58 +09:00
joungmin
86a4104ae3 feat: initial knowledge-inbox pipeline implementation
- Oracle ADB queue table (sql/schema.sql)
- Queue CRUD: core/queue_db.py
- YouTube transcript: core/youtube.py
- Web page fetch: core/web.py
- LLM enrichment via OCI GenAI Gemini Flash: core/enricher.py
- Text chunker: core/chunker.py
- Obsidian note writer: core/obsidian.py
- Oracle vector store insertion: core/vector.py
- Polling daemon: daemon/worker.py
- Telegram bot: bot/telegram_bot.py
- Main runner: main.py
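The queue-plus-polling-daemon shape can be sketched as below. This is a hypothetical in-memory stand-in for the Oracle ADB queue table and `daemon/worker.py` loop; all function names are illustrative, not the actual `core/queue_db.py` API.

```python
from collections import deque

# In-memory stand-in for the Oracle ADB queue table (real code uses core/queue_db.py).
queue = deque()

def enqueue(url: str) -> None:
    """Insert a pending item, as the Telegram bot would on receiving a link."""
    queue.append({"url": url, "status": "pending"})

def claim_next():
    """Claim the oldest pending item (FIFO), marking it as processing."""
    if queue:
        item = queue.popleft()
        item["status"] = "processing"
        return item
    return None

def poll_once(handle) -> bool:
    """One polling iteration: claim an item and process it; False if queue is empty."""
    item = claim_next()
    if item is None:
        return False
    handle(item)  # fetch → enrich → write Obsidian note → insert vectors
    item["status"] = "done"
    return True

# A real daemon would loop:
#   while True:
#       if not poll_once(process):
#           time.sleep(poll_interval)
```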
2026-02-28 08:16:11 +09:00