3 Commits

Author SHA1 Message Date
joungmin
d0c2aa3857 feat: replace single-pass enricher with 4-step pipeline
Upgrades content processing from a single LLM call to a structured
5-step document reconstruction pipeline (4 LLM-backed steps plus a final LLM-free assembly):

  1. Normalize  — clean up colloquial speech, restore punctuation, extract key entities
  2. Index Tree — scan the full text → generate a hierarchical table of contents (JSON)
  3. Leaf Summarize — detailed per-section summaries (300-char context overlap)
  4. Consistency Check — verify and backfill missing entities
  5. Assemble — assemble the final Markdown document (no LLM needed)

- Short texts (< 3000 chars): simple 1-pass fallback
- Long texts: full pipeline (N+4 LLM calls where N = section count)
- worker.py: uses body_md from enricher as Obsidian note body
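The dispatch described above can be sketched as follows. All names here (`llm`, `reconstruct`, the 3-section stub) are illustrative, not the repo's actual enricher API; splitting the consistency check into a verify call and a backfill call is one way to account for the stated N+4 total, assumed rather than confirmed by the commit.

```python
# Sketch of the short/long dispatch and the 5-step pipeline (hypothetical names).
SHORT_TEXT_LIMIT = 3000  # chars; below this, the simple 1-pass fallback runs

def llm(prompt: str) -> str:
    """Stand-in for the real LLM call (OCI GenAI Gemini Flash in this repo)."""
    return f"[llm:{prompt[:20]}]"

def reconstruct(text: str) -> tuple[str, int]:
    """Return (body_md, number_of_llm_calls)."""
    if len(text) < SHORT_TEXT_LIMIT:
        return llm("one-pass summary"), 1            # short-text fallback: 1 call

    calls = 0
    normalized = llm("normalize");  calls += 1       # 1. Normalize
    toc        = llm("index tree"); calls += 1       # 2. Index Tree (JSON TOC)
    sections = ["intro", "body", "outro"]            # would be parsed from toc
    leaves = []
    for name in sections:                            # 3. Leaf Summarize (N calls)
        leaves.append(llm(f"leaf:{name}")); calls += 1
    _report = llm("check entities"); calls += 1      # 4a. Consistency: verify
    patched = llm("fill missing");   calls += 1      # 4b. Consistency: backfill
    body_md = "\n\n".join([toc, *leaves, patched])   # 5. Assemble (no LLM call)
    return body_md, calls                            # total: N + 4
```

With the 3-section stub, a long input costs 3 + 4 = 7 LLM calls, while a short input costs exactly 1.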

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 18:02:00 +09:00
joungmin
a9db6a8771 feat: add English vocab extraction and Anki card registration
- core/vocab.py: extract B1-B2 level vocabulary from English content via Gemini Flash
- core/anki.py: register vocab cards to AnkiConnect (English::Vocabulary deck)
- core/enricher.py: add language detection field + summary_ko (Korean summary)
- core/obsidian.py: render Korean + English summary in note
- daemon/worker.py: call vocab extraction and Anki registration for English content
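Card registration via AnkiConnect uses its JSON-over-HTTP `addNote` action. The payload builder below is a minimal sketch: the deck name comes from the commit, but the `Basic` model and its `Front`/`Back` fields are assumptions and the repo's `core/anki.py` may use a different note model.

```python
def build_anki_note(word: str, meaning: str, example: str) -> dict:
    """Build an AnkiConnect `addNote` request for the English::Vocabulary deck.

    Assumes the stock Basic model (Front/Back fields); the repo's actual
    model and field names may differ.
    """
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": "English::Vocabulary",
                "modelName": "Basic",
                "fields": {"Front": word, "Back": f"{meaning}\n\n{example}"},
                "tags": ["knowledge-inbox", "vocab"],
            }
        },
    }

# Sending it is a POST to the local AnkiConnect endpoint, e.g.:
#   requests.post("http://127.0.0.1:8765", json=build_anki_note(...))
```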
2026-02-28 08:39:58 +09:00
joungmin
86a4104ae3 feat: initial knowledge-inbox pipeline implementation
- Oracle ADB queue table (sql/schema.sql)
- Queue CRUD: core/queue_db.py
- YouTube transcript: core/youtube.py
- Web page fetch: core/web.py
- LLM enrichment via OCI GenAI Gemini Flash: core/enricher.py
- Text chunker: core/chunker.py
- Obsidian note writer: core/obsidian.py
- Oracle vector store insertion: core/vector.py
- Polling daemon: daemon/worker.py
- Telegram bot: bot/telegram_bot.py
- Main runner: main.py
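The queue-plus-polling-daemon shape can be sketched as below. This is a hypothetical in-memory stand-in for the Oracle ADB queue table and `daemon/worker.py` loop; all function names are illustrative, not the actual `core/queue_db.py` API.

```python
from collections import deque

# In-memory stand-in for the Oracle ADB queue table (real code uses core/queue_db.py).
queue = deque()

def enqueue(url: str) -> None:
    """Insert a pending item, as the Telegram bot would on receiving a link."""
    queue.append({"url": url, "status": "pending"})

def claim_next():
    """Claim the oldest pending item (FIFO), marking it as processing."""
    if queue:
        item = queue.popleft()
        item["status"] = "processing"
        return item
    return None

def poll_once(handle) -> bool:
    """One polling iteration: claim an item and process it; False if queue is empty."""
    item = claim_next()
    if item is None:
        return False
    handle(item)  # fetch → enrich → write Obsidian note → insert vectors
    item["status"] = "done"
    return True

# A real daemon would loop:
#   while True:
#       if not poll_once(process):
#           time.sleep(poll_interval)
```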
2026-02-28 08:16:11 +09:00