Add Jina Reader API fallback for web crawling

Jsoup fails on sites that block bots (HTTP 403). The crawler now tries
Jsoup first and falls back to Jina Reader (r.jina.ai) if that fails.
An optional API key can be set via the JINA_READER_API_KEY env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:03:09 +00:00
parent 9929322de0
commit 0cc84354f5
2 changed files with 88 additions and 10 deletions


@@ -29,6 +29,10 @@ oci:
model: ${OCI_GENAI_MODEL:google.gemini-2.5-flash}
base-url: ${OCI_GENAI_BASE_URL:https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions}
jina:
  reader:
    api-key: ${JINA_READER_API_KEY:}
logging:
  level:
    com.sundol: DEBUG
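
The fallback order described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the class `CrawlFallbackSketch`, the `Fetcher` interface, and both method names are hypothetical, and the 30-second timeout is an arbitrary choice. It assumes Jina Reader is called by appending the target URL to `https://r.jina.ai/` and that the optional key is sent as a bearer token. An empty key, which is the default of `${JINA_READER_API_KEY:}`, means no `Authorization` header is sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: names below are illustrative, not taken from the commit.
final class CrawlFallbackSketch {

    /** A page fetcher; returns empty when the fetch fails (for example on HTTP 403). */
    interface Fetcher {
        Optional<String> fetch(String url);
    }

    /** Tries each fetcher in order and returns the first non-blank result. */
    static Optional<String> fetchWithFallback(String url, List<Fetcher> fetchers) {
        for (Fetcher fetcher : fetchers) {
            Optional<String> body;
            try {
                body = fetcher.fetch(url);
            } catch (RuntimeException e) {
                // Treat an exception like a failed fetch and move to the next fetcher.
                body = Optional.empty();
            }
            if (body.isPresent() && !body.get().isBlank()) {
                return body;
            }
        }
        return Optional.empty();
    }

    /** Builds a Jina Reader request; the Authorization header is added only if a key is set. */
    static HttpRequest jinaReaderRequest(String targetUrl, String apiKey) {
        HttpRequest.Builder builder = HttpRequest
                .newBuilder(URI.create("https://r.jina.ai/" + targetUrl))
                .timeout(Duration.ofSeconds(30)) // assumed timeout, not from the commit
                .GET();
        if (apiKey != null && !apiKey.isBlank()) {
            builder.header("Authorization", "Bearer " + apiKey);
        }
        return builder.build();
    }
}
```

In a real crawler the first `Fetcher` would wrap the Jsoup call and the second would send the request built by `jinaReaderRequest` through a `java.net.http.HttpClient`. Keeping the ordering logic separate from the HTTP calls makes the fallback testable without network access.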