Refactor Playwright to singleton browser with tab-based crawling

- Add PlaywrightBrowserService: singleton Chromium browser with auto-recovery
- Refactor WebCrawlerService/YouTubeTranscriptService to use shared browser tabs
- Fix YouTube transcript: extract from DOM panel + fmt=json3 fallback
- Keep browser window alive (about:blank instead of page.close)
- Add docs: X Window setup, operation manual, crawling guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 19:18:33 +00:00
parent db4155c36d
commit afc9cdcde6
7 changed files with 934 additions and 169 deletions

@@ -11,11 +11,8 @@ export PLAYWRIGHT_BROWSERS_PATH=/home/opc/.cache/ms-playwright
export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
# Xvfb virtual display for Playwright headed mode (YouTube transcript)
export DISPLAY=:99
if ! pgrep -x Xvfb > /dev/null; then
Xvfb :99 -screen 0 1280x720x24 -nolisten tcp &
sleep 1
fi
# :99 = Xvfb (background), :1 = VNC (for debugging)
export DISPLAY=:1
# Playwright driver-bundle requires exploded classpath (fat JAR extraction fails)
BACKEND_DIR=/home/opc/sundol/sundol-backend