sundol

Author	SHA1	Message	Date
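joungmin	9569309e49	Restore crawling: run Chrome CDP permanently as the PM2 app sundol-chrome Chrome 136+ refuses remote debugging (CDP) on the default profile directory, so the third-tier web crawling fallback and YouTube transcript extraction had been failing entirely since April 13. This commit fixes that. - Move the profile to a non-default directory (~/.config/google-chrome-cdp) so CDP is allowed while the login session is preserved - Add start-chrome.sh: clean up any existing Chrome process and remove stale locks, then launch with --remote-debugging-port=9222 --remote-debugging-address=127.0.0.1 - ecosystem.config.cjs: add the sundol-chrome PM2 app (no manual launches; manage Chrome through PM2 only) Note: the /usr/local/bin/node change to the frontend script comes from earlier work and is included in this commit - PlaywrightBrowserService: pin CDP_URL to 127.0.0.1 (removes the IPv6 ::1 resolution pitfall) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 01:06:58 +00:00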
joungmin	1088b23790	Add Notes, Voice Clone TTS, fix auth persistence and maxTokens Notes: - notes table with TEXT/AUDIO types, category support - Audio upload → OpenRouter Gemini STT → OCI GenAI polish/summary - Raw STT saved separately in raw_content column - Polish/summary button for manual re-processing - Async processing with real-time polling Voice Clone TTS: - Qwen3-TTS 1.7B model on A10 GPU via FastAPI server - Voice profile registration (record/upload → save embedding) - Profile-based TTS generation API - TTS web page with recording, profile management, generation Auth fixes: - Store both access + refresh tokens in localStorage - Initialize state from localStorage synchronously (no flash) - Request interceptor reads token from localStorage every request - Refresh via body (not just cookie) Other fixes: - maxTokens 4096 → 65536 (OCI GenAI Gemini supports up to 65536) - Fix broken Korean chars in source files - OpenRouter config for STT - ffmpeg installed for audio conversion - Ollama + Gemma 4 E4B installed (STT fallback) - nginx proxy for TTS server (/api/tts/) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 07:34:18 +00:00
joungmin	6c2129d42e	Add category view, pagination, and persist login across deployments - Add 2-panel category view: sidebar tree + filtered item list - Category counts use DISTINCT with descendant inclusion - Hide empty categories, show category badges on item cards - Add client-side pagination (10 items/page) for both views - Persist access token in localStorage to survive page refresh - Fix token refresh retry on backend restart Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 04:11:08 +00:00
joungmin	f9f710ec90	Add English level settings, improve content structuring and rendering - Add english_level column to users table (CEFR with TOEIC mapping) - Add UserController (GET/PATCH /api/users/me) and Settings page - Enhance structuring prompts: sequential TOC, no summary sections, no content overlap, English expression extraction by CEFR level - Remove sub-TOC analysis (caused content repetition), use simple per-section generation with truncation detection and continuation - Fix CLOB truncation: explicit Clob-to-String conversion in repository - Replace regex-based markdown rendering with react-markdown - Add wallet renewal procedure to troubleshooting docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 23:48:38 +00:00
joungmin	4cde775809	Switch to user Chrome CDP for YouTube transcript, fix auth and ads - Replace Playwright standalone browser with CDP connection to user Chrome (bypasses YouTube bot detection by using logged-in Chrome session) - Add video playback, ad detection/skip, and play confirmation before transcript extraction - Extract transcript JS to separate resource files (fix SyntaxError in evaluate) - Add ytInitialPlayerResponse-based transcript extraction as primary method - Fix token refresh: retry on network error during backend restart - Fix null userId logout, CLOB type hint for structured_content - Disable XFCE screen lock/screensaver - Add troubleshooting entries (#10-12) and YouTube transcript guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 21:01:49 +00:00
joungmin	9abb770e37	Add knowledge structuring feature with incremental LLM processing - Add structured_content column and STRUCTURING pipeline step - Split LLM structuring into TOC + per-section calls to avoid token limit - Save intermediate results to DB for real-time frontend polling (3s) - Add manual "정리하기" ("Organize") button with async processing - Fix browser login modal by customizing authentication entry point - Fix standalone build symlinks for server.js and static files - Add troubleshooting guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:43:21 +00:00
joungmin	afc9cdcde6	Refactor Playwright to singleton browser with tab-based crawling - Add PlaywrightBrowserService: singleton Chromium browser with auto-recovery - Refactor WebCrawlerService/YouTubeTranscriptService to use shared browser tabs - Fix YouTube transcript: extract from DOM panel + fmt=json3 fallback - Keep browser window alive (about:blank instead of page.close) - Add docs: X Window setup, operation manual, crawling guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:18:33 +00:00
joungmin	db4155c36d	Add error logging and improve HTTP handling for YouTube transcript fetching Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:38:23 +00:00
joungmin	56d5752095	Add YouTube cookie support to Playwright fallback for bot bypass Load cookies.txt (Netscape format) into Playwright browser context before navigating to YouTube, enabling authenticated access to bypass bot detection that blocks transcript retrieval. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 07:37:48 +00:00
joungmin	677a79978f	Use youtube-transcript-api library with Playwright fallback for YouTube transcripts Replace Jsoup-based approach with io.github.thoroldvix:youtube-transcript-api as primary method (supports manual/generated captions, language priority). Playwright head mode kept as fallback when API fails. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 07:26:52 +00:00
joungmin	1bfe55d5a8	Switch YouTube transcript fetching from Jsoup to Playwright head mode Jsoup was blocked by YouTube bot detection. Now uses Playwright with headed Chromium via Xvfb virtual display to bypass restrictions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 07:11:52 +00:00
joungmin	bb5a601433	Add YouTube transcript auto-fetch button on Knowledge add page - YouTubeTranscriptService: fetches captions from YouTube page (ko > en > first available) - GET /api/knowledge/youtube-transcript endpoint - Frontend: "트랜스크립트 자동 가져오기" ("Auto-fetch transcript") button appears when valid YouTube URL entered Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 04:20:13 +00:00
joungmin	f0f7b62e3d	Add Playwright headless browser as 3rd crawling fallback Crawl chain: Jsoup → Jina Reader → Playwright (headless Chromium). Error page detection (403, Access Denied, etc.) triggers next fallback. Switch to exploded classpath for Playwright driver-bundle compatibility. Fix Next.js standalone static file serving with symlink. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 22:36:24 +00:00
joungmin	0cc84354f5	Add Jina Reader API fallback for web crawling Jsoup fails on bot-blocked sites (403). Now tries Jsoup first, then falls back to Jina Reader (r.jina.ai) for better coverage. Supports optional API key via JINA_READER_API_KEY env var. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 22:03:09 +00:00
joungmin	9929322de0	Implement all core features: Knowledge pipeline, RAG chat, Todos, Habits, Study Cards, Tags, Dashboard - Google OAuth authentication with callback flow - Knowledge ingest pipeline (TEXT/WEB/YOUTUBE → chunking → categorization → embedding) - OCI GenAI integration (chat, embeddings) with multi-model support - Semantic search via Oracle VECTOR_DISTANCE - RAG-based AI chat with source attribution - Todos with subtasks, filters, and priority levels - Habits with daily check-in, streak tracking, and color customization - Study Cards with SM-2 spaced repetition and LLM auto-generation - Tags system with knowledge item mapping - Dashboard with live data from all modules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 21:43:51 +00:00
joungmin	3d2aa6cf46	Add backend/frontend scaffolding with Oracle ADB wallet config - Backend: Spring Boot 3 + WebFlux, JWT auth, Oracle ADB wallet, 8 controllers/services/repositories (Auth~Tag), DTOs, exception handling - Frontend: Next.js 15, TypeScript, Tailwind CSS, AuthContext, 7 pages (dashboard, knowledge, chat, study, todos, habits, login) - DB: V1 migration with 12 tables including VECTOR(1024) + HNSW index - Ops: PM2 ecosystem config, deploy.sh, start-backend.sh - CLAUDE.md: DB credentials replaced with env var references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 06:56:26 +00:00

16 Commits