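Fix the web-crawling third-tier fallback and YouTube transcript extraction, both of
which had failed completely since April 13 because Chrome 136+ refuses remote debugging (CDP) on the default profile directory.
- Move the profile to a non-default directory (~/.config/google-chrome-cdp) so that
CDP is allowed while the login session is preserved
- New start-chrome.sh: cleans up existing Chrome processes and removes stale locks, then
launches with --remote-debugging-port=9222 --remote-debugging-address=127.0.0.1
- ecosystem.config.cjs: add sundol-chrome PM2 app (do not launch Chrome manually; manage it only through PM2)
※ The /usr/local/bin/node change in the frontend script comes from earlier work and is included in this commit
- PlaywrightBrowserService: pin CDP_URL to 127.0.0.1 (avoids the pitfall of localhost resolving to IPv6 ::1)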
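The 127.0.0.1 pinning above can be illustrated with a minimal sketch. It is not the actual PlaywrightBrowserService code; the helper name pin_cdp_url is hypothetical, and it only covers the plain host:port case (any userinfo in the URL would be dropped).

```python
from urllib.parse import urlsplit, urlunsplit


def pin_cdp_url(url: str) -> str:
    """Rewrite a CDP endpoint so 'localhost' becomes the IPv4 loopback.

    Chrome started with --remote-debugging-address=127.0.0.1 listens on IPv4
    only, so a client that resolves 'localhost' to ::1 cannot connect.
    """
    parts = urlsplit(url)
    if parts.hostname != "localhost":
        return url  # already an explicit address; leave it untouched
    netloc = "127.0.0.1" if parts.port is None else f"127.0.0.1:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical usage: the result would be handed to the CDP connect call.
CDP_URL = pin_cdp_url("http://localhost:9222")
```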
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Notes:
- notes table with TEXT/AUDIO types, category support
- Audio upload → OpenRouter Gemini STT → OCI GenAI polish/summary
- Raw STT saved separately in raw_content column
- Polish/summary button for manual re-processing
- Async processing with real-time polling
Voice Clone TTS:
- Qwen3-TTS 1.7B model on A10 GPU via FastAPI server
- Voice profile registration (record/upload → save embedding)
- Profile-based TTS generation API
- TTS web page with recording, profile management, generation
Auth fixes:
- Store both access + refresh tokens in localStorage
- Initialize state from localStorage synchronously (no flash)
- Request interceptor reads token from localStorage every request
- Refresh via body (not just cookie)
Other fixes:
- maxTokens 4096 → 65536 (OCI GenAI Gemini supports up to 65536)
- Fix broken Korean chars in source files
- OpenRouter config for STT
- ffmpeg installed for audio conversion
- Ollama + Gemma 4 E4B installed (STT fallback)
- nginx proxy for TTS server (/api/tts/)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 2-panel category view: sidebar tree + filtered item list
- Category counts use DISTINCT with descendant inclusion
- Hide empty categories, show category badges on item cards
- Add client-side pagination (10 items/page) for both views
- Persist access token in localStorage to survive page refresh
- Fix token refresh retry on backend restart
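For reference, the "DISTINCT with descendant inclusion" counting rule can be sketched as follows. This is an in-memory illustration, not the actual SQL; the function name and the input shapes are assumptions, and the category tree is assumed to be acyclic.

```python
from collections import defaultdict


def category_counts(parents, item_categories):
    """Count distinct items per category, including items in descendants.

    parents: {category_id: parent_id or None}
    item_categories: iterable of (item_id, category_id) pairs
    Categories with no items never appear in the result, which matches
    hiding empty categories in the sidebar.
    """
    items_by_category = defaultdict(set)
    for item_id, category_id in item_categories:
        node = category_id
        # Credit the item to its own category and to every ancestor.
        while node is not None:
            items_by_category[node].add(item_id)  # set membership = DISTINCT
            node = parents.get(node)
    return {cat: len(items) for cat, items in items_by_category.items()}
```

An item tagged with two sibling categories is counted once in their shared parent, because each category holds a set of item ids rather than a running total.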
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace Playwright standalone browser with CDP connection to user Chrome
(bypasses YouTube bot detection by using logged-in Chrome session)
- Add video playback, ad detection/skip, and play confirmation before transcript extraction
- Extract transcript JS to separate resource files (fix SyntaxError in evaluate)
- Add ytInitialPlayerResponse-based transcript extraction as primary method
- Fix token refresh: retry on network error during backend restart
- Fix null userId logout, CLOB type hint for structured_content
- Disable XFCE screen lock/screensaver
- Add troubleshooting entries (#10-12) and YouTube transcript guide
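A rough sketch of the ytInitialPlayerResponse-based lookup mentioned above. The JSON path used here (captions, playerCaptionsTracklistRenderer, captionTracks) reflects an undocumented YouTube page structure and is an assumption that may change; the real extraction runs as JS inside the page, not as Python.

```python
def caption_tracks(player_response: dict) -> list:
    """Return the caption track list from a parsed ytInitialPlayerResponse.

    Returns an empty list when the video exposes no captions, so callers
    can fall through to another extraction method.
    """
    renderer = player_response.get("captions", {}).get(
        "playerCaptionsTracklistRenderer", {}
    )
    return renderer.get("captionTracks", [])
```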
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Load cookies.txt (Netscape format) into Playwright browser context
before navigating to YouTube, enabling authenticated access to bypass
bot detection that blocks transcript retrieval.
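
A minimal sketch of the cookies.txt conversion, assuming the standard 7-field tab-separated Netscape layout. The function name is hypothetical; the output dicts use the field names that Playwright's add_cookies accepts (name, value, domain, path, expires, httpOnly, secure).

```python
def parse_netscape_cookies(text: str) -> list:
    """Convert Netscape cookies.txt content into Playwright-style cookie dicts."""
    cookies = []
    for line in text.splitlines():
        http_only = False
        if line.startswith("#HttpOnly_"):
            # Some exporters mark HttpOnly cookies with this prefix.
            http_only = True
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # malformed row; skip rather than guess
        domain, _include_subdomains, path, secure, expires, name, value = fields
        cookies.append({
            "name": name,
            "value": value,
            "domain": domain,
            "path": path,
            # An expiry of 0 means a session cookie; Playwright uses -1 for that.
            "expires": int(expires) if expires != "0" else -1,
            "httpOnly": http_only,
            "secure": secure.upper() == "TRUE",
        })
    return cookies
```

The resulting list would be passed to the browser context before the first navigation to YouTube.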
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Jsoup-based approach with io.github.thoroldvix:youtube-transcript-api
as primary method (supports manual/generated captions, language priority).
Playwright headed mode is kept as a fallback for when the API fails.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jsoup was blocked by YouTube bot detection. Now uses Playwright with
headed Chromium via Xvfb virtual display to bypass restrictions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- YouTubeTranscriptService: fetches captions from YouTube page (ko > en > first available)
- GET /api/knowledge/youtube-transcript endpoint
- Frontend: "트랜스크립트 자동 가져오기" ("Fetch transcript automatically") button appears when a valid YouTube URL is entered
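The ko > en > first-available priority can be sketched as below. The track shape (a dict with a languageCode key) and the function name are assumptions for illustration, not the actual YouTubeTranscriptService code.

```python
def pick_caption_track(tracks, preferred=("ko", "en")):
    """Pick a caption track by language priority, else the first one available."""
    for lang in preferred:
        for track in tracks:
            if track.get("languageCode") == lang:
                return track
    # No preferred language found: fall back to whatever comes first.
    return tracks[0] if tracks else None
```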
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jsoup fails on bot-blocked sites (403). Now tries Jsoup first,
then falls back to Jina Reader (r.jina.ai) for better coverage.
Supports optional API key via JINA_READER_API_KEY env var.
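
A sketch of the fallback order, with the fetchers passed in as callables so no network access is needed to follow the logic. The helper names are hypothetical; the r.jina.ai URL-prefix form and the Bearer header are how the Reader endpoint is commonly called, but treat both as assumptions here.

```python
import os

JINA_READER_PREFIX = "https://r.jina.ai/"


def jina_request(url, api_key=None):
    """Build the Reader URL and headers; the API key is optional."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return JINA_READER_PREFIX + url, headers


def fetch_with_fallback(url, primary, fallback):
    """Try the primary fetcher first; on any failure use the fallback."""
    try:
        return primary(url)
    except Exception:
        # e.g. a 403 from a bot-blocked site surfaces as an exception here
        return fallback(url)


# Hypothetical wiring: read the optional key from the environment.
reader_url, reader_headers = jina_request(
    "https://example.com/article", os.environ.get("JINA_READER_API_KEY")
)
```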
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Google OAuth authentication with callback flow
- Knowledge ingest pipeline (TEXT/WEB/YOUTUBE → chunking → categorization → embedding)
- OCI GenAI integration (chat, embeddings) with multi-model support
- Semantic search via Oracle VECTOR_DISTANCE
- RAG-based AI chat with source attribution
- Todos with subtasks, filters, and priority levels
- Habits with daily check-in, streak tracking, and color customization
- Study Cards with SM-2 spaced repetition and LLM auto-generation
- Tags system with knowledge item mapping
- Dashboard with live data from all modules
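For context on the Study Cards scheduling, here is a minimal sketch of one common formulation of SM-2. It is not the project's implementation; the CardState fields are hypothetical, and variants differ on details such as whether the ease factor is updated on a failed review.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 ease factor, never below 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one review with a quality grade from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    miss = 5 - quality
    ease = max(1.3, state.ease + (0.1 - miss * (0.08 + miss * 0.02)))
    if quality < 3:
        # Failed recall: restart the repetition sequence.
        return CardState(repetitions=0, interval_days=1, ease=ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * ease)
    return CardState(state.repetitions + 1, interval, ease)
```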
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>