Use youtube-transcript-api library with Playwright fallback for YouTube transcripts

Replace Jsoup-based approach with io.github.thoroldvix:youtube-transcript-api
as primary method (supports manual/generated captions, language priority).
Playwright headed mode is kept as a fallback for when the API fails.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
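
The primary-then-fallback arrangement described in the commit message could be sketched roughly as follows. This is only an illustration of the pattern, not the code from this commit: the class name `TranscriptFallbackSketch`, the method `firstSuccessful`, and the two stand-in sources are hypothetical, and neither the youtube-transcript-api library nor Playwright is actually called here.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical sketch: try transcript sources in priority order.
public class TranscriptFallbackSketch {

    /** Returns the first non-blank result, skipping sources that throw or return nothing. */
    public static Optional<String> firstSuccessful(List<Callable<String>> sources) {
        for (Callable<String> source : sources) {
            try {
                String text = source.call();
                if (text != null && !text.isBlank()) {
                    return Optional.of(text);
                }
            } catch (Exception e) {
                // A failing source is not fatal; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the library-based lookup (assumed here to fail).
        Callable<String> apiSource = () -> {
            throw new IllegalStateException("API unavailable");
        };
        // Stand-in for the browser-based fallback.
        Callable<String> browserSource = () -> "transcript from browser";

        System.out.println(
            firstSuccessful(List.of(apiSource, browserSource)).orElse("(none)"));
    }
}
```

Keeping each source behind a `Callable` means the expensive browser path only runs when the cheaper path has thrown or returned an empty result.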
2026-04-01 07:26:52 +00:00
parent 1bfe55d5a8
commit 677a79978f
2 changed files with 83 additions and 30 deletions

@@ -104,6 +104,13 @@
<version>1.18.3</version>
</dependency>
<!-- YouTube Transcript API -->
<dependency>
<groupId>io.github.thoroldvix</groupId>
<artifactId>youtube-transcript-api</artifactId>
<version>0.4.0</version>
</dependency>
<!-- Playwright (headless browser, driver-bundle includes node runtime) -->
<dependency>
<groupId>com.microsoft.playwright</groupId>