Add Playwright headless browser as 3rd crawling fallback

Crawl chain: Jsoup → Jina Reader → Playwright (headless Chromium).
Error-page detection (HTTP 403, "Access Denied", etc.) triggers the next fallback.
Switch to exploded classpath for Playwright driver-bundle compatibility.
Fix Next.js standalone static file serving via a symlink.
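The crawl chain and error-page check above can be sketched roughly as follows. This is a minimal JavaScript illustration of an ordered fallback, not the project's code. Jsoup is a Java library, so the real crawler likely runs in the JVM service. The function names `crawlWithFallback` and `looksLikeErrorPage`, the `{ status, body }` result shape, and the marker strings are all hypothetical.

```javascript
// Hypothetical sketch of an ordered crawl fallback (e.g. Jsoup -> Jina Reader -> Playwright).
// Names, result shape, and markers are illustrative assumptions, not the project's API.

// Illustrative block-page markers; a real list would be tuned to observed responses.
const ERROR_MARKERS = ["access denied", "403 forbidden"];

// Returns true when a fetch result looks like an error/block page rather than content.
// Assumes each fetcher resolves to { status: number, body: string }.
function looksLikeErrorPage(result) {
  if (!result || typeof result.body !== "string") return true;
  if (result.status >= 400) return true;
  // Only scan the start of the page, where block-page titles usually appear.
  const head = result.body.slice(0, 2000).toLowerCase();
  return ERROR_MARKERS.some((marker) => head.includes(marker));
}

// Tries each fetcher in order and returns the first result that is not an error page.
// Thrown errors and detected error pages both fall through to the next fetcher.
async function crawlWithFallback(url, fetchers) {
  const failures = [];
  for (const { name, fetchPage } of fetchers) {
    try {
      const result = await fetchPage(url);
      if (!looksLikeErrorPage(result)) {
        return { source: name, ...result };
      }
      failures.push(`${name}: error page (status ${result?.status})`);
    } catch (err) {
      failures.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All fetchers failed for ${url}: ${failures.join("; ")}`);
}

// Usage with stub fetchers standing in for Jsoup, Jina Reader, and Playwright.
const fetchers = [
  { name: "jsoup", fetchPage: async () => ({ status: 403, body: "Forbidden" }) },
  { name: "jina", fetchPage: async () => ({ status: 200, body: "<h1>Access Denied</h1>" }) },
  { name: "playwright", fetchPage: async () => ({ status: 200, body: "<article>Hello</article>" }) },
];
crawlWithFallback("https://example.com", fetchers).then((r) => console.log(r.source)); // prints "playwright"
```

Checking the body as well as the status code matters because some block pages are served with HTTP 200. A status-only check would accept them and never reach the headless-browser fallback.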

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:36:24 +00:00
parent 0cc84354f5
commit f0f7b62e3d
4 changed files with 132 additions and 10 deletions

@@ -7,6 +7,7 @@ module.exports = {
cwd: "/home/opc/sundol",
env: {
JAVA_HOME: "/usr/lib/jvm/java-21",
PLAYWRIGHT_NODEJS_PATH: "/home/opc/.playwright-driver/driver/linux/node",
},
},
{