sundol

joungmin/sundol

Commit Graph

Author	SHA1	Message	Date
joungmin

20210830cf

Fix TTS: switch to 1.7B with ref_audio, speakable text on all lines

- Use 1.7B model (0.6B had tensor mismatch with cached prompts)
- Speak endpoint uses ref_audio directly (not cached pkl) as fallback
- Cache voice clone prompts in memory on startup
- Add SpeakableText component: 🔊 icon on each p and li element
- Remove old TTSReader sequential approach
- Add global exception handler to TTS server
- Fix profile localStorage caching
- inference_mode + bf16 optimization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-13 12:14:06 +00:00
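
One possible reading of the "Cache voice clone prompts in memory on startup" and "uses ref_audio directly ... as fallback" items in commit 20210830cf is sketched below. This is not the repository's code: `PromptCache`, `build_prompt`, `warm`, and `get` are hypothetical names, and `build_prompt` stands in for whatever call turns a reference audio file into a voice clone prompt. No Qwen3-TTS API is used or implied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PromptCache:
    """In-memory cache of voice clone prompts keyed by profile id (hypothetical)."""

    # Assumed helper: builds a prompt object from a reference audio path.
    build_prompt: Callable[[str], object]
    _prompts: Dict[str, object] = field(default_factory=dict)

    def warm(self, profiles: Dict[str, str]) -> None:
        """Build prompts for all known profiles once, e.g. at server startup."""
        for profile_id, ref_audio_path in profiles.items():
            self._prompts[profile_id] = self.build_prompt(ref_audio_path)

    def get(self, profile_id: str, ref_audio_path: str) -> object:
        """Return the cached prompt; on a miss, fall back to ref_audio directly."""
        prompt = self._prompts.get(profile_id)
        if prompt is None:
            prompt = self.build_prompt(ref_audio_path)
            self._prompts[profile_id] = prompt
        return prompt
```

Under these assumptions, a speak handler would call `get(profile_id, ref_audio_path)` on every request. A warmed profile is served from memory, and an unknown or stale one is rebuilt from its reference audio instead of failing.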

joungmin

1088b23790

Add Notes, Voice Clone TTS, fix auth persistence and maxTokens

Notes:
- notes table with TEXT/AUDIO types, category support
- Audio upload → OpenRouter Gemini STT → OCI GenAI polish/summary
- Raw STT saved separately in raw_content column
- Polish/summary button for manual re-processing
- Async processing with real-time polling

Voice Clone TTS:
- Qwen3-TTS 1.7B model on A10 GPU via FastAPI server
- Voice profile registration (record/upload → save embedding)
- Profile-based TTS generation API
- TTS web page with recording, profile management, generation

Auth fixes:
- Store both access + refresh tokens in localStorage
- Initialize state from localStorage synchronously (no flash)
- Request interceptor reads token from localStorage every request
- Refresh via body (not just cookie)

Other fixes:
- maxTokens 4096 → 65536 (OCI GenAI Gemini supports up to 65536)
- Fix broken Korean chars in source files
- OpenRouter config for STT
- ffmpeg installed for audio conversion
- Ollama + Gemma 4 E4B installed (STT fallback)
- nginx proxy for TTS server (/api/tts/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-13 07:34:18 +00:00
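
The Notes items in commit 1088b23790 (audio upload, STT, polish/summary, raw STT kept in `raw_content`, async processing with polling) could be sketched roughly as follows. Everything here is an assumption for illustration: `NoteJobs`, `poll_until_done`, the status strings, and the `transcribe`/`polish` callables are hypothetical stand-ins for the OpenRouter and OCI GenAI calls, and an in-memory dict replaces the notes table.

```python
import threading
import time
from typing import Callable, Dict


class NoteJobs:
    """Runs note processing in the background so a client can poll for status."""

    def __init__(self, transcribe: Callable[[bytes], str], polish: Callable[[str], str]):
        self._transcribe = transcribe  # assumed STT step
        self._polish = polish          # assumed polish/summary step
        self._notes: Dict[int, dict] = {}
        self._lock = threading.Lock()
        self._next_id = 1

    def submit(self, audio: bytes) -> int:
        """Register a note as PROCESSING and start the pipeline on a worker thread."""
        with self._lock:
            note_id = self._next_id
            self._next_id += 1
            self._notes[note_id] = {"status": "PROCESSING", "raw_content": None, "content": None}
        threading.Thread(target=self._run, args=(note_id, audio), daemon=True).start()
        return note_id

    def _run(self, note_id: int, audio: bytes) -> None:
        try:
            raw = self._transcribe(audio)
            # Keep the raw transcript separately from the polished text.
            update = {"status": "DONE", "raw_content": raw, "content": self._polish(raw)}
        except Exception as exc:
            update = {"status": "FAILED", "error": str(exc)}
        with self._lock:
            self._notes[note_id].update(update)

    def status(self, note_id: int) -> dict:
        """Return a snapshot of the note's current state."""
        with self._lock:
            return dict(self._notes[note_id])


def poll_until_done(jobs: NoteJobs, note_id: int, interval: float = 0.05, timeout: float = 5.0) -> dict:
    """Poll the note's status until it leaves PROCESSING or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = jobs.status(note_id)
        if state["status"] != "PROCESSING":
            return state
        time.sleep(interval)
    raise TimeoutError(f"note {note_id} still processing after {timeout}s")
```

Keeping `raw_content` next to the polished `content` is what would let a manual polish/summary action re-run only the second step without repeating STT.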

2 Commits