Add Playwright headless browser as 3rd crawling fallback

Crawl chain: Jsoup → Jina Reader → Playwright (headless Chromium).
Error page detection (403, Access Denied, etc.) triggers next fallback.
Switch to exploded classpath for Playwright driver-bundle compatibility.
Fix Next.js standalone static file serving with symlink.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-03-30 22:36:24 +00:00
parent 0cc84354f5
commit f0f7b62e3d
4 changed files with 132 additions and 10 deletions

View File

@@ -6,4 +6,10 @@ set +a
JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-21}
export JAVA_HOME
exec $JAVA_HOME/bin/java -jar /home/opc/sundol/sundol-backend/target/sundol-backend-0.0.1-SNAPSHOT.jar
# Playwright: use pre-installed browsers, skip auto-download
export PLAYWRIGHT_BROWSERS_PATH=/home/opc/.cache/ms-playwright
export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
# Playwright driver-bundle requires exploded classpath (fat JAR extraction fails)
BACKEND_DIR=/home/opc/sundol/sundol-backend
exec $JAVA_HOME/bin/java -cp "$BACKEND_DIR/target/classes:$BACKEND_DIR/target/dependency/*" com.sundol.SundolApplication