# S.U.N.D.O.L — System Specification

**Smart Unified Natural Dog-Operated Layer**

Personal Knowledge House · AI Assistant · Productivity Hub

Intended for: Claude Code autonomous implementation

Version: 1.0.0 | Date: 2026-03-30

---

## Table of Contents

1. Product Vision & Features
2. System Architecture
3. Database Schema — Oracle 23ai
4. Backend — Spring Boot 3
5. AI Layer — OCI Generative AI
6. Frontend — Next.js 14
7. Security — Google SSO + JWT
8. REST API Reference
9. Key Algorithms
10. Environment Variables
11. Project Structure

---

## 1. Product Vision & Features

SUNDOL is a personal intelligence layer. It ingests anything valuable (YouTube videos, blog posts, news articles, raw text), extracts and understands the content, and makes it instantly searchable and conversational through AI. It also manages tasks and habits — a single hub for knowledge, productivity, and self-improvement.

### Feature Areas

| Feature | What it does | Key details |
|---------|-------------|-------------|
| Knowledge Ingestion | Add YouTube, blog, news, raw text, or any URL | Auto-extracts transcript (YT), crawls web pages (Jsoup), stores raw text |
| Semantic Search | Find knowledge by meaning, not keywords | Oracle 23ai VECTOR_DISTANCE cosine similarity + keyword hybrid |
| AI Chat (RAG) | Ask questions answered from your knowledge base | Retrieval-Augmented Generation with cited source chunks, chat history |
| Study Cards (SRS) | Auto-generate flashcards from any knowledge item | SM-2 spaced repetition scheduler, ease factor, review intervals |
| Todos | Task and subtask management | Priority, due date, status, nested subtasks |
| Habit Tracker | Build good habits, break bad ones | Daily check-ins, streak tracking (current + best), BUILD/STOP habit types |
| Tagging | Organise all knowledge with custom tags | Many-to-many tag–item, color-coded |

---

## 2. System Architecture

```
┌─────────────────────────────────────────────────────────┐
│  CLIENT TIER                                            │
│  Next.js 14 (App Router, TypeScript)                    │
│  Google OAuth 2.0 initiation · JWT in httpOnly cookies  │
└────────────────────┬────────────────────────────────────┘
                     │  HTTPS · JWT Bearer token
┌────────────────────▼────────────────────────────────────┐
│  SECURITY LAYER                                         │
│  Spring Security 6 · JwtAuthFilter · Google ID verify   │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│  SPRING BOOT 3 — SERVICE LAYER                          │
│  Auth · Ingest · Search · Chat · Todo · Habit           │
│  StudyCard · Tag · Background @Async workers            │
└──────────┬──────────────────────────┬───────────────────┘
           │                          │
┌──────────▼──────────┐   ┌───────────▼───────────────────┐
│  AI / LLM LAYER     │   │  DATA LAYER                   │
│  OCI GenAI Service  │   │  Oracle DB 23ai               │
│  · Cohere / Llama   │   │  · Relational tables          │
│  · Embedding model  │   │  · VECTOR column (chunks)     │
│  YouTube Transcript │   │  · HNSW vector index          │
│  Jsoup web crawler  │   │  Flyway migrations            │
└─────────────────────┘   └───────────────────────────────┘
```

### Communication Patterns

- **Client → API**: HTTPS REST, `Authorization: Bearer <token>` on all authenticated endpoints
- **Google SSO**: Client → Google consent → Google ID token → Spring verifies → issues own JWT pair (access 15 min + refresh 30 d)
- **Ingest pipeline**: `POST /api/ingest` returns `202 Accepted`. `@Async` worker extracts text → chunks → embeds via OCI GenAI → stores VECTOR in Oracle
- **RAG query**: user query → embed (OCI) → VECTOR_DISTANCE search → top-K chunks → build prompt → OCI GenAI completion → response with citations
- **Habit streaks**: nightly `@Scheduled` job recomputes streaks, marks missed if no log entry for that day

---

## 3. Database Schema — Oracle 23ai

> **Oracle 23ai VECTOR**: Use `VECTOR(1024, FLOAT32)` to match OCI GenAI embed-multilingual-v3 (1024 dimensions).
> Similarity search uses `VECTOR_DISTANCE(embedding, :queryVec, COSINE)` with an HNSW index. No separate vector DB needed.

### USERS

```sql
CREATE TABLE users (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  email VARCHAR2(320) NOT NULL UNIQUE,
  display_name VARCHAR2(255),
  avatar_url VARCHAR2(1000),
  google_sub VARCHAR2(255) UNIQUE,   -- Google subject ID
  refresh_token VARCHAR2(1000),      -- hashed rotating refresh token
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
  updated_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
```

### KNOWLEDGE_ITEMS

```sql
-- type: YOUTUBE | BLOG | NEWS | TEXT | URL
-- status: PENDING | EXTRACTING | CHUNKING | EMBEDDING | READY | FAILED
CREATE TABLE knowledge_items (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  user_id RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  type VARCHAR2(20) NOT NULL,
  title VARCHAR2(1000),
  raw_content CLOB,                        -- original URL or raw text input
  extracted_text CLOB,                     -- cleaned text after extraction
  source_url VARCHAR2(2000),
  status VARCHAR2(20) DEFAULT 'PENDING',
  metadata CLOB CHECK (metadata IS JSON),  -- e.g. {"duration":3600,"author":"..."}
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
  updated_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE INDEX ki_user_idx ON knowledge_items(user_id);
CREATE INDEX ki_status_idx ON knowledge_items(status);
CREATE INDEX ki_type_idx ON knowledge_items(user_id, type);
```

### CHUNKS

```sql
-- embedding dimensions match OCI GenAI embed-multilingual-v3 (1024)
CREATE TABLE chunks (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  knowledge_item_id RAW(16) NOT NULL REFERENCES knowledge_items(id) ON DELETE CASCADE,
  chunk_index NUMBER NOT NULL,
  content CLOB NOT NULL,
  embedding VECTOR(1024, FLOAT32),
  token_count NUMBER,
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE INDEX chunk_ki_idx ON chunks(knowledge_item_id);

-- HNSW approximate nearest-neighbour vector index
CREATE VECTOR INDEX chunk_vec_idx ON chunks(embedding)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  DISTANCE COSINE
  WITH TARGET ACCURACY 95;
```

### STUDY_CARDS

```sql
CREATE TABLE study_cards (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  knowledge_item_id RAW(16) NOT NULL REFERENCES knowledge_items(id) ON DELETE CASCADE,
  user_id RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  question VARCHAR2(2000) NOT NULL,
  answer CLOB NOT NULL,
  ease_factor NUMBER(4,2) DEFAULT 2.5,   -- SM-2 algorithm
  interval_days NUMBER DEFAULT 1,
  next_review_at TIMESTAMP DEFAULT SYSTIMESTAMP,
  review_count NUMBER DEFAULT 0,
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE INDEX sc_user_review_idx ON study_cards(user_id, next_review_at);
```

### CHAT_SESSIONS & MESSAGES

```sql
CREATE TABLE chat_sessions (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  user_id RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  title VARCHAR2(500),
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
  last_message_at TIMESTAMP DEFAULT SYSTIMESTAMP
);

-- role: USER | ASSISTANT | SYSTEM
CREATE TABLE messages (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  session_id RAW(16) NOT NULL REFERENCES chat_sessions(id) ON DELETE CASCADE,
  role VARCHAR2(20) NOT NULL,
  content CLOB NOT NULL,
  source_chunks CLOB CHECK (source_chunks IS JSON),  -- [{chunk_id, score, snippet}]
  tokens_used NUMBER,
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE INDEX msg_session_idx ON messages(session_id, created_at);
```

### TODOS

```sql
-- status: TODO | IN_PROGRESS | DONE | CANCELLED
-- priority: LOW | MEDIUM | HIGH | URGENT
CREATE TABLE todos (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  user_id RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  parent_id RAW(16) REFERENCES todos(id) ON DELETE CASCADE,  -- null = root task
  title VARCHAR2(1000) NOT NULL,
  description CLOB,
  status VARCHAR2(20) DEFAULT 'TODO',
  priority VARCHAR2(10) DEFAULT 'MEDIUM',
  due_at TIMESTAMP,
  completed_at TIMESTAMP,
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE INDEX todo_user_status_idx ON todos(user_id, status);
CREATE INDEX todo_parent_idx ON todos(parent_id);
```

### HABITS & HABIT_LOGS

```sql
-- habit_type: BUILD | STOP
-- frequency: DAILY | WEEKLY
-- target_days: "DAILY" or comma-separated "MON,WED,FRI"
CREATE TABLE habits (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  user_id RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  title VARCHAR2(500) NOT NULL,
  habit_type VARCHAR2(10) DEFAULT 'BUILD',
  frequency VARCHAR2(10) DEFAULT 'DAILY',
  target_days VARCHAR2(100) DEFAULT 'DAILY',
  streak_current NUMBER DEFAULT 0,
  streak_best NUMBER DEFAULT 0,
  is_active NUMBER(1) DEFAULT 1,
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE TABLE habit_logs (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  habit_id RAW(16) NOT NULL REFERENCES habits(id) ON DELETE CASCADE,
  log_date DATE NOT NULL,
  checked_in NUMBER(1) DEFAULT 1,
  note VARCHAR2(1000),
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
  CONSTRAINT habit_log_unique UNIQUE (habit_id, log_date)
);

CREATE INDEX hl_habit_date_idx ON habit_logs(habit_id, log_date);
```

### TAGS & KNOWLEDGE_TAGS

```sql
CREATE TABLE tags (
  id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
  user_id RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  name VARCHAR2(100) NOT NULL,
  color VARCHAR2(7) DEFAULT '#6366F1',
  CONSTRAINT tag_user_name_unique UNIQUE (user_id, name)
);

CREATE TABLE knowledge_tags (
  knowledge_item_id RAW(16) NOT NULL REFERENCES knowledge_items(id) ON DELETE CASCADE,
  tag_id RAW(16) NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
  PRIMARY KEY (knowledge_item_id, tag_id)
);
```

---

## 4. Backend — Spring Boot 3

### Tech Stack

| Dependency | Purpose |
|-----------|---------|
| spring-boot-starter-web | REST controllers |
| spring-boot-starter-security | JWT filter chain |
| spring-boot-starter-data-jpa | JPA / Hibernate ORM |
| ojdbc11 + ucp | Oracle JDBC driver + connection pool |
| flyway-core | Schema migrations |
| jjwt-api + jjwt-impl | JWT issue & validation |
| google-api-client | Google ID token verification |
| jsoup | HTML web crawler / text extraction |
| oci-java-sdk-generativeaiinference | OCI GenAI embeddings + chat |
| spring-boot-starter-data-redis | Session cache, rate limiting |
| spring-boot-starter-validation | Request validation |

### Project Structure

```
sundol-backend/
├── src/main/java/com/sundol/
│   ├── config/
│   │   ├── SecurityConfig.java  # Spring Security + JWT filter chain
│   │   ├── OciGenAiConfig.java  # OCI SDK client bean
│   │   └── AsyncConfig.java  # @Async thread pool config
│   ├── auth/
│   │   ├── AuthController.java  # POST /api/auth/google, /refresh, /logout
│   │   ├── AuthService.java
│   │   ├── JwtService.java  # issue / validate JWT
│   │   ├── JwtAuthFilter.java  # OncePerRequestFilter
│   │   └── GoogleTokenVerifier.java  # verify Google ID token via Google API
│   ├── knowledge/
│   │   ├── KnowledgeController.java  # CRUD /api/knowledge
│   │   ├── KnowledgeService.java
│   │   ├── IngestService.java  # async orchestrator
│   │   ├── extractor/
│   │   │   ├── YoutubeExtractor.java  # fetch transcript via YouTube Data API v3
│   │   │   └── WebCrawler.java  # Jsoup: URL → clean text
│   │   └── ChunkingService.java  # sliding window text chunker
│   ├── embedding/
│   │   └── OciEmbeddingService.java  # call OCI GenAI embed endpoint
│   ├── search/
│   │   └── SearchService.java  # VECTOR_DISTANCE query + keyword fallback
│   ├── chat/
│   │   ├── ChatController.java  # /api/chat/sessions, /messages
│   │   ├── ChatService.java  # RAG orchestration
│   │   └── OciChatService.java  # OCI GenAI chat completion
│   ├── studycard/
│   │   ├── StudyCardController.java  # /api/study-cards
│   │   └── StudyCardService.java  # SM-2 scheduler + AI generation
│   ├── todo/
│   │   ├── TodoController.java  # /api/todos
│   │   └── TodoService.java
│   ├── habit/
│   │   ├── HabitController.java  # /api/habits
│   │   ├── HabitService.java
│   │   └── HabitStreakScheduler.java  # @Scheduled nightly streak job
│   └── tag/
│       ├── TagController.java  # /api/tags
│       └── TagService.java
└── src/main/resources/
    ├── application.yml
    └── db/migration/
        ├── V1__create_users.sql
        ├── V2__create_knowledge.sql
        ├── V3__create_chunks_vector.sql
        ├── V4__create_chat.sql
        ├── V5__create_todos.sql
        ├── V6__create_habits.sql
        └── V7__create_tags.sql
```

### application.yml

```yaml
spring:
  datasource:
    url: jdbc:oracle:thin:@${DB_HOST}:1521/${DB_SERVICE}
    username: ${DB_USER}
    password: ${DB_PASSWORD}
    driver-class-name: oracle.jdbc.OracleDriver
  jpa:
    hibernate:
      ddl-auto: validate  # Flyway owns the schema, Hibernate only validates
    show-sql: false
    properties:
      hibernate.dialect: org.hibernate.dialect.OracleDialect
  flyway:
    enabled: true
    locations: classpath:db/migration
  data:
    redis:
      host: ${REDIS_HOST:localhost}
      port: ${REDIS_PORT:6379}

app:
  jwt:
    secret: ${JWT_SECRET}  # min 256-bit hex string
    access-expiry-minutes: 15
    refresh-expiry-days: 30
  google:
    client-id: ${GOOGLE_CLIENT_ID}
  oci:
    compartment-id: ${OCI_COMPARTMENT_ID}
    region: ${OCI_REGION}  # e.g. us-chicago-1
    embedding-model-id: cohere.embed-multilingual-v3  # 1024 dims
    chat-model-id: cohere.command-r-plus  # or meta.llama-3-70b-instruct
  chunking:
    chunk-size-tokens: 512
    chunk-overlap-tokens: 64
  rag:
    top-k: 5  # chunks to retrieve per query
    min-score: 0.70  # cosine similarity threshold
```

### SecurityConfig — JWT Filter Chain

```java
@Configuration
@EnableWebSecurity
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http, JwtAuthFilter jwtAuthFilter)
            throws Exception {
        http
            .csrf(AbstractHttpConfigurer::disable)
            .sessionManagement(s -> s.sessionCreationPolicy(STATELESS))
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/api/auth/**", "/actuator/health").permitAll()
                .anyRequest().authenticated()
            )
            .addFilterBefore(jwtAuthFilter, UsernamePasswordAuthenticationFilter.class);
        return http.build();
    }
}
```

### JwtAuthFilter skeleton

```java
@Component
public class JwtAuthFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest req, HttpServletResponse res,
                                    FilterChain chain) throws ServletException, IOException {
        String header = req.getHeader("Authorization");
        if (header != null && header.startsWith("Bearer ")) {
            String token = header.substring(7);
            if (jwtService.isValid(token)) {
                String userId = jwtService.extractUserId(token);
                // load UserDetails, set SecurityContextHolder authentication
            }
        }
        chain.doFilter(req, res);
    }
}
```

### IngestService — Async Pipeline

```java
@Service
public class IngestService {

    @Async
    public void ingest(KnowledgeItem item) {
        try {
            // 1. Mark EXTRACTING
            updateStatus(item, "EXTRACTING");
            String text = switch (item.getType()) {
                case "YOUTUBE" -> youtubeExtractor.extract(item.getSourceUrl());
                case "BLOG", "NEWS", "URL" -> webCrawler.crawl(item.getSourceUrl());
                case "TEXT" -> item.getRawContent();
                default -> throw new IllegalArgumentException("Unknown type: " + item.getType());
            };
            item.setExtractedText(text);

            // 2. Mark CHUNKING
            updateStatus(item, "CHUNKING");
            List<String> chunks = chunkingService.chunk(text);

            // 3. Mark EMBEDDING
            updateStatus(item, "EMBEDDING");
            for (int i = 0; i < chunks.size(); i++) {
                float[] embedding = ociEmbeddingService.embed(chunks.get(i));
                chunkRepository.save(new Chunk(item.getId(), i, chunks.get(i), embedding));
            }

            // 4. Optionally auto-generate study cards via OCI GenAI
            studyCardService.generateFromItem(item);

            updateStatus(item, "READY");
        } catch (Exception e) {
            updateStatus(item, "FAILED");
        }
    }
}
```

### SearchService — Vector Query

```java
@Repository
public interface ChunkRepository extends JpaRepository<Chunk, UUID> { // ID type assumed to be UUID (RAW(16) column)

    // Native Oracle VECTOR_DISTANCE query.
    // :threshold is a cosine distance, so pass 1 - min-score (0.30 for min-score 0.70).
    @Query(value = """
        SELECT c.* FROM chunks c
        JOIN knowledge_items ki ON c.knowledge_item_id = ki.id
        WHERE ki.user_id = :userId
          AND VECTOR_DISTANCE(c.embedding, :queryVec, COSINE) < :threshold
        ORDER BY VECTOR_DISTANCE(c.embedding, :queryVec, COSINE)
        FETCH FIRST :topK ROWS ONLY
        """, nativeQuery = true)
    List<Chunk> findSimilar(
        @Param("userId") byte[] userId,
        @Param("queryVec") float[] queryVec,
        @Param("threshold") double threshold,
        @Param("topK") int topK
    );
}
```

### HabitStreakScheduler

```java
@Component
public class HabitStreakScheduler {

    // Runs every night at 00:05.
    // Simplified: the full rule in section 9 also honours target_days and checked_in.
    @Scheduled(cron = "0 5 0 * * *")
    public void recalculateStreaks() {
        List<Habit> activeHabits = habitRepository.findByIsActive(true);
        LocalDate yesterday = LocalDate.now().minusDays(1);

        for (Habit habit : activeHabits) {
            boolean checkedIn = habitLogRepository
                .existsByHabitIdAndLogDate(habit.getId(), yesterday);
            if (checkedIn) {
                habit.setStreakCurrent(habit.getStreakCurrent() + 1);
                habit.setStreakBest(Math.max(habit.getStreakBest(), habit.getStreakCurrent()));
            } else {
                habit.setStreakCurrent(0); // streak broken
            }
            habitRepository.save(habit);
        }
    }
}
```

---

## 5. AI Layer — OCI Generative AI

### OCI GenAI Services Used

| Purpose | Model | Notes |
|---------|-------|-------|
| Text embeddings | cohere.embed-multilingual-v3 | 1024-dim, multilingual, batch up to 96 texts |
| Chat completion | cohere.command-r-plus | Strong RAG / grounding support |
| Chat completion (alt) | meta.llama-3-70b-instruct | Alternative if Cohere unavailable |

### OciEmbeddingService

```java
@Service
public class OciEmbeddingService {

    private final GenerativeAiInferenceClient client;

    public float[] embed(String text) {
        EmbedTextDetails details = EmbedTextDetails.builder()
            .inputs(List.of(text))
            .servingMode(OnDemandServingMode.builder()
                .modelId(embeddingModelId)
                .build())
            .compartmentId(compartmentId)
            .inputType(EmbedTextDetails.InputType.SearchDocument)
            .build();

        EmbedTextResponse response = client.embedText(EmbedTextRequest.builder()
            .embedTextDetails(details)
            .build());

        List<Float> vec = response.getEmbedTextResult().getEmbeddings().get(0);
        float[] result = new float[vec.size()];
        for (int i = 0; i < vec.size(); i++) result[i] = vec.get(i).floatValue();
        return result;
    }
}
```

### RAG Prompt Template

```
You are Sundol, your personal secretary dog.
Answer the user's question using ONLY the context provided below.
If the context does not contain enough information, say so clearly.
Always cite the source by referencing [Source N].

CONTEXT:
[Source 1] (from: {title_1})
{chunk_content_1}

[Source 2] (from: {title_2})
{chunk_content_2}
... (up to top-K chunks)

CHAT HISTORY (last 5 turns):
{chat_history}

USER QUESTION: {user_query}

ANSWER:
```

### YouTube Transcript Extraction

Use the YouTube Data API v3 captions endpoint, or as a fallback use the youtube-transcript-api compatible approach:

```java
// Option A: youtube-transcript-api (Python microservice or direct HTTP)
// GET https://www.youtube.com/watch?v={videoId}
// Parse timedtext XML from: https://www.youtube.com/api/timedtext?lang=en&v={videoId}

// Option B: YouTube Data API v3 captions.list
// Requires OAuth if captions are private; works for auto-generated captions
String transcriptUrl = "https://www.googleapis.com/youtube/v3/captions"
    + "?part=snippet&videoId=" + videoId
    + "&key=" + youtubeApiKey;
```

### Web Crawler (Jsoup)

```java
@Service
public class WebCrawler {

    public String crawl(String url) throws IOException {
        Document doc = Jsoup.connect(url)
            .userAgent("Mozilla/5.0 (compatible; SUNDOL-bot/1.0)")
            .timeout(10_000)
            .get();

        // Remove nav, footer, ads
        doc.select("nav, footer, header, script, style, .ad, #cookie-banner").remove();

        // Prefer article body
        Element article = doc.selectFirst("article, main, .post-content, .article-body");
        return (article != null ? article : doc.body()).text();
    }
}
```

### Text Chunker — Sliding Window

```java
@Service
public class ChunkingService {

    // Simple word-count approximation (1 token ≈ 0.75 words)
    public List<String> chunk(String text) {
        String[] words = text.split("\\s+");
        int chunkWords = (int) (chunkSizeTokens * 0.75);
        int overlapWords = (int) (chunkOverlapTokens * 0.75);

        List<String> chunks = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            int end = Math.min(i + chunkWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
            if (end == words.length) break; // avoid a trailing chunk made only of overlap
            i += chunkWords - overlapWords;
        }
        return chunks;
    }
}
```

---

## 6. Frontend — Next.js 14

### Tech Stack

| Package | Purpose |
|---------|---------|
| next 14 (App Router) | Framework, SSR/CSR |
| typescript | Type safety |
| tailwindcss | Styling |
| next-auth v5 | Google OAuth + session management |
| @tanstack/react-query | Server state, caching, mutations |
| axios | HTTP client with JWT interceptor |
| zustand | Lightweight client state |
| react-markdown | Render AI responses with markdown |
| @radix-ui/react-* | Accessible UI primitives |

### Project Structure

```
sundol-frontend/
├── app/
│   ├── (auth)/
│   │   └── login/page.tsx  # Google sign-in button
│   ├── (app)/
│   │   ├── layout.tsx  # Sidebar + auth guard
│   │   ├── dashboard/page.tsx  # Overview: recent items, due cards, todos
│   │   ├── knowledge/
│   │   │   ├── page.tsx  # Knowledge list with search
│   │   │   ├── [id]/page.tsx  # Item detail + chunks viewer
│   │   │   └── add/page.tsx  # Add form: URL / text / YouTube
│   │   ├── chat/
│   │   │   ├── page.tsx  # Session list
│   │   │   └── [sessionId]/page.tsx  # Chat interface with citations
│   │   ├── study/page.tsx  # SRS flashcard review
│   │   ├── todos/page.tsx  # Todo list with subtasks
│   │   ├── habits/page.tsx  # Habit grid + check-in
│   │   └── settings/page.tsx
│   ├── api/auth/[...nextauth]/route.ts  # next-auth handler
│   └── layout.tsx
├── components/
│   ├── knowledge/
│   │   ├── AddKnowledgeForm.tsx  # Detect URL type, show appropriate fields
│   │   ├── KnowledgeCard.tsx
│   │   └── StatusBadge.tsx  # PENDING → READY pipeline status
│   ├── chat/
│   │   ├── ChatWindow.tsx
│   │   ├── MessageBubble.tsx  # Renders markdown, shows citations
│   │   └── SourceCitation.tsx
│   ├── habits/
│   │   ├── HabitCard.tsx
│   │   └── StreakBadge.tsx
│   ├── study/
│   │   └── FlashCard.tsx  # Flip animation, SM-2 rating buttons
│   └── ui/  # Shared: Button, Input, Modal, etc.
├── lib/
│   ├── api.ts  # Axios instance with JWT interceptor
│   ├── auth.ts  # next-auth config
│   └── utils.ts
└── types/
    └── index.ts  # Shared TypeScript interfaces
```

### Axios JWT Interceptor

```typescript
// lib/api.ts
import axios from 'axios';
import { getSession } from 'next-auth/react';

const api = axios.create({ baseURL: process.env.NEXT_PUBLIC_API_URL });

api.interceptors.request.use(async (config) => {
  const session = await getSession();
  if (session?.accessToken) {
    config.headers.Authorization = `Bearer ${session.accessToken}`;
  }
  return config;
});

// On 401, redirect to sign-in so next-auth can re-establish the session
api.interceptors.response.use(
  (res) => res,
  async (error) => {
    if (error.response?.status === 401) {
      window.location.href = '/api/auth/signin';
    }
    return Promise.reject(error);
  }
);

export default api;
```

### Knowledge Ingest Flow (Frontend)

1. User pastes URL or text in `AddKnowledgeForm`
2. Frontend detects type (YouTube URL regex, plain URL, raw text)
3. `POST /api/knowledge/ingest` → receives `202 Accepted` with `item.id`
4. Frontend polls `GET /api/knowledge/{id}` every 3s, shows status badge: `PENDING → EXTRACTING → CHUNKING → EMBEDDING → READY`
5. On `READY`, navigates to item detail page

---

## 7. Security — Google SSO + JWT

### Flow

```
1. User clicks "Sign in with Google"
2. Next.js (next-auth) redirects → Google OAuth consent
3. Google returns authorization code → next-auth exchanges for ID token
4. next-auth calls POST /api/auth/google with { idToken }
5. Spring:
   a. Verifies ID token via Google API (audience = GOOGLE_CLIENT_ID)
   b. Extracts { sub, email, name, picture }
   c. Upserts user in DB (insert if new, update avatar if existing)
   d. Issues: accessToken (JWT, 15min) + refreshToken (JWT, 30d)
   e. Stores hashed refreshToken in users.refresh_token
6. next-auth stores tokens in encrypted session cookie (httpOnly)
7. All subsequent API calls send Authorization: Bearer <accessToken>
8. On expiry, next-auth calls POST /api/auth/refresh with { refreshToken }
9. Spring validates refresh token hash, issues new token pair (rotation)
```

### JWT Claims

```json
{
  "sub": "user-uuid",
  "email": "user@example.com",
  "type": "ACCESS",
  "iat": 1711800000,
  "exp": 1711800900
}
```

### next-auth Config Sketch

```typescript
// lib/auth.ts
import type { NextAuthOptions } from 'next-auth';
import GoogleProvider from 'next-auth/providers/google';

export const authOptions: NextAuthOptions = {
  providers: [
    GoogleProvider({
      clientId: process.env.GOOGLE_CLIENT_ID!,
      clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
    }),
  ],
  callbacks: {
    async signIn({ account }) {
      // Exchange Google ID token for our own JWT pair
      const res = await fetch(`${process.env.API_URL}/api/auth/google`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ idToken: account?.id_token }),
      });
      if (!res.ok) return false; // reject sign-in before parsing an error body
      const data = await res.json();
      account!.accessToken = data.accessToken;
      account!.refreshToken = data.refreshToken;
      return true;
    },
    async jwt({ token, account }) {
      if (account) {
        token.accessToken = account.accessToken;
        token.refreshToken = account.refreshToken;
      }
      return token;
    },
    async session({ session, token }) {
      session.accessToken = token.accessToken as string;
      return session;
    },
  },
};
```

---

## 8. REST API Reference

All endpoints require `Authorization: Bearer <token>` unless marked **public**.
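
As a rough illustration of that rule, the sketch below decides whether a request needs the bearer header. `buildHeaders`, `isPublicEndpoint`, and `PUBLIC_ENDPOINTS` are hypothetical names, not part of the spec. The public list is taken from the Auth table in this section, which treats `/api/auth/logout` as authenticated.

```typescript
// Hypothetical helper (not defined in the spec): attaches the access token
// to every request except the endpoints the Auth table marks as public.
const PUBLIC_ENDPOINTS: ReadonlySet<string> = new Set([
  "POST /api/auth/google",
  "POST /api/auth/refresh",
]);

function isPublicEndpoint(method: string, path: string): boolean {
  // Ignore any query string so "/api/search?q=x" is matched by its path only.
  const queryStart = path.indexOf("?");
  const pathOnly = queryStart === -1 ? path : path.slice(0, queryStart);
  return PUBLIC_ENDPOINTS.has(`${method.toUpperCase()} ${pathOnly}`);
}

function buildHeaders(
  method: string,
  path: string,
  accessToken: string | null,
): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (isPublicEndpoint(method, path)) {
    return headers; // public endpoints are called without a token
  }
  if (!accessToken) {
    throw new Error(`Missing access token for ${method} ${path}`);
  }
  headers["Authorization"] = `Bearer ${accessToken}`;
  return headers;
}
```

In the frontend this check would presumably live next to the Axios request interceptor in `lib/api.ts`; failing fast on a missing token avoids a round trip that can only end in a 401.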
### Auth

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | /api/auth/google | public | Exchange Google ID token → JWT pair |
| POST | /api/auth/refresh | public | Refresh access token |
| POST | /api/auth/logout | yes | Invalidate refresh token |

### Knowledge

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/knowledge | List items (filter: type, status, tag, search) |
| POST | /api/knowledge/ingest | Submit new item for ingestion → 202 |
| GET | /api/knowledge/{id} | Get item detail + status |
| PATCH | /api/knowledge/{id} | Update title / tags |
| DELETE | /api/knowledge/{id} | Delete item + all chunks |
| GET | /api/knowledge/{id}/chunks | List chunks for an item |

### Search

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/search?q={query}&topK=5 | Semantic search across user's chunks |

### Chat

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/chat/sessions | List chat sessions |
| POST | /api/chat/sessions | Create new session |
| GET | /api/chat/sessions/{id}/messages | Get message history |
| POST | /api/chat/sessions/{id}/messages | Send message → RAG response |
| DELETE | /api/chat/sessions/{id} | Delete session |

### Study Cards

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/study-cards/due | Cards due for review today |
| GET | /api/study-cards?knowledgeItemId={id} | Cards for a specific item |
| POST | /api/study-cards/generate/{knowledgeItemId} | Trigger AI generation |
| POST | /api/study-cards/{id}/review | Submit review result {rating: 1-5} |

### Todos

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/todos | List (filter: status, priority, dueDate) |
| POST | /api/todos | Create todo |
| PATCH | /api/todos/{id} | Update fields |
| DELETE | /api/todos/{id} | Delete (cascades subtasks) |
| GET | /api/todos/{id}/subtasks | List subtasks |

### Habits

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/habits | List habits |
| POST | /api/habits | Create habit |
| PATCH | /api/habits/{id} | Update habit |
| DELETE | /api/habits/{id} | Delete habit |
| POST | /api/habits/{id}/checkin | Check in for today {note?: string} |
| GET | /api/habits/{id}/logs | Log history (filter: from, to dates) |

### Tags

| Method | Path | Description |
|--------|------|-------------|
| GET | /api/tags | List user's tags |
| POST | /api/tags | Create tag |
| PATCH | /api/tags/{id} | Rename / recolor |
| DELETE | /api/tags/{id} | Delete tag (removes from items) |

---

## 9. Key Algorithms

### SM-2 Spaced Repetition (Study Cards)

```
Input: ease_factor (EF), interval, rating (1–5 from user)

if rating >= 3 (pass):
    if interval == 0: new_interval = 1
    elif interval == 1: new_interval = 6
    else: new_interval = round(interval * EF)
    new_EF = EF + (0.1 - (5 - rating) * (0.08 + (5 - rating) * 0.02))
    new_EF = max(1.3, new_EF)
else (fail, rating < 3):
    new_interval = 1  # reset
    new_EF = max(1.3, EF - 0.2)

next_review_at = now + new_interval days
```

### RAG Pipeline (ChatService)

```
1. embed(user_query) → queryVector [1024 floats]
2. searchService.findSimilar(userId, queryVector, topK=5, minScore=0.70)
3. Build prompt:
   - System: SUNDOL persona + citation instructions
   - Context: top-K chunks with source titles
   - History: last 5 messages from session
   - User: current query
4. ociChatService.complete(prompt) → response text
5. Parse citations from response
6. Save message to DB with source_chunks JSON
7. Return { content, sourceCitations[], tokensUsed }
```

### Habit Streak Logic (Nightly Job)

```
for each active habit:
    yesterday = today - 1 day
    if habit.target_days == "DAILY" or yesterday.dayOfWeek in habit.target_days:
        if habit_log exists for yesterday AND checked_in = true:
            streak_current += 1
            streak_best = max(streak_best, streak_current)
        else:
            streak_current = 0  // broken
    // days not in target_days are ignored (streak preserved)
```

---

## 10. Environment Variables

### Backend (.env / application.yml)

```
# Database
DB_HOST=localhost
DB_SERVICE=FREEPDB1
DB_USER=sundol
DB_PASSWORD=yourpassword

# JWT
JWT_SECRET=your-256-bit-hex-secret-at-least-32-chars

# Google OAuth
GOOGLE_CLIENT_ID=xxx.apps.googleusercontent.com

# OCI GenAI
OCI_COMPARTMENT_ID=ocid1.compartment.oc1..xxx
OCI_REGION=us-chicago-1
# OCI credentials via ~/.oci/config or Instance Principal

# YouTube (optional, for private caption access)
YOUTUBE_API_KEY=AIza...

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
```

### Frontend (.env.local)

```
NEXT_PUBLIC_API_URL=http://localhost:8080
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=your-nextauth-secret
GOOGLE_CLIENT_ID=xxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-google-secret
API_URL=http://localhost:8080  # server-side fetch
```

---

## 11. Project Structure (Monorepo)

```
sundol/
├── sundol-backend/  # Spring Boot 3 (Java 21)
│   ├── pom.xml
│   └── src/
├── sundol-frontend/  # Next.js 14 (TypeScript)
│   ├── package.json
│   └── app/
├── db/
│   └── migration/  # Flyway SQL scripts (V1–V7)
├── docs/
│   └── SUNDOL_SPEC.md  # ← this file
├── docker-compose.yml  # Oracle 23ai + Redis + backend + frontend
└── README.md
```

### docker-compose.yml (development)

```yaml
version: '3.9'
services:
  oracle:
    image: gvenzl/oracle-free:23-slim
    environment:
      ORACLE_PASSWORD: sundolpass
      APP_USER: sundol
      APP_USER_PASSWORD: sundol123
    ports:
      - "1521:1521"
    volumes:
      - oracle_data:/opt/oracle/oradata

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  backend:
    build: ./sundol-backend
    ports:
      - "8080:8080"
    environment:
      DB_HOST: oracle
      DB_SERVICE: FREEPDB1
      DB_USER: sundol
      DB_PASSWORD: sundol123
      JWT_SECRET: ${JWT_SECRET}
      GOOGLE_CLIENT_ID: ${GOOGLE_CLIENT_ID}
      OCI_COMPARTMENT_ID: ${OCI_COMPARTMENT_ID}
      OCI_REGION: ${OCI_REGION}
      REDIS_HOST: redis
    volumes:
      - ~/.oci:/root/.oci:ro  # OCI credentials
    depends_on:
      - oracle
      - redis

  frontend:
    build: ./sundol-frontend
    ports:
      - "3000:3000"
    environment:
      NEXT_PUBLIC_API_URL: http://localhost:8080
      API_URL: http://backend:8080
      NEXTAUTH_URL: http://localhost:3000
      NEXTAUTH_SECRET: ${NEXTAUTH_SECRET}
      GOOGLE_CLIENT_ID: ${GOOGLE_CLIENT_ID}
      GOOGLE_CLIENT_SECRET: ${GOOGLE_CLIENT_SECRET}
    depends_on:
      - backend

volumes:
  oracle_data:
```

---

## Implementation Order (Recommended for Claude Code)

1. **DB migrations** — run all V1–V7 Flyway scripts, verify Oracle VECTOR index created
2. **Backend: Auth** — Google SSO flow, JWT issue/validate, filter chain
3. **Backend: Knowledge CRUD** — ingest endpoint (202), status polling
4. **Backend: Ingest pipeline** — YoutubeExtractor, WebCrawler, ChunkingService
5. **Backend: OCI Embedding** — embed chunks, store VECTOR in Oracle
6. **Backend: Search** — VECTOR_DISTANCE query, SearchService
7. **Backend: Chat / RAG** — ChatService, OciChatService, prompt builder
8. **Backend: Todo + Habit + Tag** — straightforward CRUD + streak scheduler
9. **Backend: Study Cards** — AI generation + SM-2 review endpoint
10. **Frontend: Auth** — next-auth Google provider, session, axios interceptor
11. **Frontend: Knowledge** — add form, status polling, list view
12. **Frontend: Chat** — session list, chat window, citation rendering
13. **Frontend: Todos + Habits** — list views, check-in, streak display
14. **Frontend: Study Cards** — flip card UI, SM-2 rating buttons
15. **Docker Compose** — wire all services, test end-to-end
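
Steps 9 and 14 both depend on the SM-2 update described in section 9. As a reference point for testing the review endpoint and the rating buttons against the same numbers, here is a minimal TypeScript sketch of that pseudocode. `CardState`, `ReviewResult`, and `sm2Review` are hypothetical names, and taking `now` as a parameter is an assumption made to keep the function deterministic.

```typescript
// Hypothetical types: field names mirror study_cards.ease_factor / interval_days.
interface CardState {
  easeFactor: number;
  intervalDays: number;
}

interface ReviewResult extends CardState {
  nextReviewAt: Date;
}

const MIN_EASE_FACTOR = 1.3;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Applies one review with a rating from 1 to 5, following the section 9 pseudocode.
function sm2Review(card: CardState, rating: number, now: Date): ReviewResult {
  if (!Number.isInteger(rating) || rating < 1 || rating > 5) {
    throw new RangeError(`rating must be an integer from 1 to 5, got ${rating}`);
  }

  let intervalDays: number;
  let easeFactor: number;

  if (rating >= 3) {
    // Pass: grow the interval, then adjust the ease factor by answer quality.
    if (card.intervalDays === 0) {
      intervalDays = 1;
    } else if (card.intervalDays === 1) {
      intervalDays = 6;
    } else {
      intervalDays = Math.round(card.intervalDays * card.easeFactor);
    }
    const miss = 5 - rating;
    const adjusted = card.easeFactor + (0.1 - miss * (0.08 + miss * 0.02));
    easeFactor = Math.max(MIN_EASE_FACTOR, adjusted);
  } else {
    // Fail: reset the interval and lower the ease factor.
    intervalDays = 1;
    easeFactor = Math.max(MIN_EASE_FACTOR, card.easeFactor - 0.2);
  }

  const nextReviewAt = new Date(now.getTime() + intervalDays * MS_PER_DAY);
  return { easeFactor, intervalDays, nextReviewAt };
}
```

Note that `ease_factor` is stored as `NUMBER(4,2)`, so a backend port would likely round the new value to two decimals before saving.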