Add Playwright headless browser as 3rd crawling fallback

Crawl chain: Jsoup → Jina Reader → Playwright (headless Chromium).
Detecting an error page (HTTP 403, "Access Denied", etc.) triggers the next fallback.
Switch to exploded classpath for Playwright driver-bundle compatibility.
Fix static file serving for the Next.js standalone build by adding a symlink.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
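The crawl chain and error-page check in the message above can be sketched in plain Java. This is a hedged illustration, not the code from this commit. `PageFetcher`, `FetchResult`, `FallbackCrawler` and `looksLikeErrorPage` are hypothetical names, and the real Jsoup, Jina Reader and Playwright fetchers are left out so the sketch needs no dependencies. The message names only 403 and "Access Denied" as triggers. Also treating any status of 400 or above, an empty body, or the extra marker string below as an error page is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical result of one fetch attempt: HTTP status code plus page body.
record FetchResult(int status, String body) {}

// Hypothetical single crawling strategy (stand-in for Jsoup, Jina Reader or Playwright).
@FunctionalInterface
interface PageFetcher {
    FetchResult fetch(String url) throws Exception;
}

// Tries each fetcher in order and returns the first body that does not look like an error page.
final class FallbackCrawler {
    // Assumed marker strings; the commit message names only 403 and "Access Denied".
    private static final List<String> ERROR_MARKERS = List.of("access denied", "403 forbidden");

    private final List<PageFetcher> chain;

    FallbackCrawler(List<PageFetcher> chain) {
        this.chain = List.copyOf(chain);
    }

    // Heuristic: HTTP error status, empty body, or a known error marker in the text.
    static boolean looksLikeErrorPage(FetchResult result) {
        if (result == null || result.status() >= 400) {
            return true;
        }
        String body = result.body();
        if (body == null || body.isBlank()) {
            return true;
        }
        String lower = body.toLowerCase(Locale.ROOT);
        return ERROR_MARKERS.stream().anyMatch(lower::contains);
    }

    Optional<String> crawl(String url) {
        for (PageFetcher fetcher : chain) {
            try {
                FetchResult result = fetcher.fetch(url);
                if (!looksLikeErrorPage(result)) {
                    return Optional.of(result.body());
                }
                // Error page detected: fall through to the next fetcher.
            } catch (Exception e) {
                // A failed fetch is treated like an error page: try the next fetcher.
            }
        }
        return Optional.empty(); // every fetcher in the chain failed
    }
}
```

With the list ordered as Jsoup, Jina Reader, Playwright, the headless browser only runs when both lighter fetchers return an error page or throw. That is presumably why it is the third fallback rather than the first.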
2026-03-30 22:36:24 +00:00
parent 0cc84354f5
commit f0f7b62e3d
4 changed files with 132 additions and 10 deletions


@@ -104,6 +104,18 @@
             <version>1.18.3</version>
         </dependency>
+        <!-- Playwright (headless browser, driver-bundle includes node runtime) -->
+        <dependency>
+            <groupId>com.microsoft.playwright</groupId>
+            <artifactId>playwright</artifactId>
+            <version>1.51.0</version>
+        </dependency>
+        <dependency>
+            <groupId>com.microsoft.playwright</groupId>
+            <artifactId>driver-bundle</artifactId>
+            <version>1.51.0</version>
+        </dependency>
         <!-- Jackson -->
         <dependency>
             <groupId>com.fasterxml.jackson.core</groupId>