Commit Graph

4 Commits

Author SHA1 Message Date
afc9cdcde6 Refactor Playwright to singleton browser with tab-based crawling
- Add PlaywrightBrowserService: singleton Chromium browser with auto-recovery
- Refactor WebCrawlerService/YouTubeTranscriptService to use shared browser tabs
- Fix YouTube transcript: extract from DOM panel + fmt=json3 fallback
- Keep browser window alive (about:blank instead of page.close)
- Add docs: X Window setup, operation manual, crawling guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:18:33 +00:00
1bfe55d5a8 Switch YouTube transcript fetching from Jsoup to Playwright head mode
Jsoup was blocked by YouTube bot detection. Now uses Playwright with
headed Chromium via Xvfb virtual display to bypass restrictions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 07:11:52 +00:00
f0f7b62e3d Add Playwright headless browser as 3rd crawling fallback
Crawl chain: Jsoup → Jina Reader → Playwright (headless Chromium).
Error page detection (403, Access Denied, etc.) triggers next fallback.
Switch to exploded classpath for Playwright driver-bundle compatibility.
Fix Next.js standalone static file serving with symlink.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:36:24 +00:00
3d2aa6cf46 Add backend/frontend scaffolding with Oracle ADB wallet config
- Backend: Spring Boot 3 + WebFlux, JWT auth, Oracle ADB wallet,
  8 controllers/services/repositories (Auth~Tag), DTOs, exception handling
- Frontend: Next.js 15, TypeScript, Tailwind CSS, AuthContext,
  7 pages (dashboard, knowledge, chat, study, todos, habits, login)
- DB: V1 migration with 12 tables including VECTOR(1024) + HNSW index
- Ops: PM2 ecosystem config, deploy.sh, start-backend.sh
- CLAUDE.md: DB credentials replaced with env var references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 06:56:26 +00:00