sundol/docs/SUNDOL_SPEC.md
joungmin 5d8b0fcdb8 Initial project setup with env template and gitignore
- .gitignore: Java/Maven, Node.js, IDE, OS, credentials
- .env.sample: backend + frontend environment variable template
- README.md: project overview and getting started guide
- CLAUDE.md: development rules and guidelines
- docs/: SUNDOL spec and design patterns guide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 06:37:07 +00:00


S.U.N.D.O.L — System Specification

Smart Unified Natural Dog-Operated Layer
Personal Knowledge House · AI Assistant · Productivity Hub

Intended for: Claude Code autonomous implementation
Version: 1.0.0 | Date: 2026-03-30


Table of Contents

  1. Product Vision & Features
  2. System Architecture
  3. Database Schema — Oracle 23ai
  4. Backend — Spring Boot 3
  5. AI Layer — OCI Generative AI
  6. Frontend — Next.js 14
  7. Security — Google SSO + JWT
  8. REST API Reference
  9. Key Algorithms
  10. Environment Variables
  11. Project Structure

1. Product Vision & Features

SUNDOL is a personal intelligence layer. It ingests anything valuable (YouTube videos, blog posts, news articles, raw text), extracts and understands the content, and makes it instantly searchable and conversational through AI. It also manages tasks and habits — a single hub for knowledge, productivity, and self-improvement.

Feature Areas

| Feature | What it does | Key details |
|---|---|---|
| Knowledge Ingestion | Add YouTube, blog, news, raw text, or any URL | Auto-extracts transcript (YT), crawls web pages (Jsoup), stores raw text |
| Semantic Search | Find knowledge by meaning, not keywords | Oracle 23ai VECTOR_DISTANCE cosine similarity + keyword hybrid |
| AI Chat (RAG) | Ask questions answered from your knowledge base | Retrieval-Augmented Generation with cited source chunks, chat history |
| Study Cards (SRS) | Auto-generate flashcards from any knowledge item | SM-2 spaced repetition scheduler, ease factor, review intervals |
| Todos | Task and subtask management | Priority, due date, status, nested subtasks |
| Habit Tracker | Build good habits, break bad ones | Daily check-ins, streak tracking (current + best), BUILD/STOP habit types |
| Tagging | Organise all knowledge with custom tags | Many-to-many tag ↔ item, color-coded |

2. System Architecture

┌─────────────────────────────────────────────────────────┐
│                  CLIENT TIER                            │
│  Next.js 14 (App Router, TypeScript)                   │
│  Google OAuth 2.0 initiation · JWT in httpOnly cookies │
└────────────────────┬────────────────────────────────────┘
                     │ HTTPS · JWT Bearer token
┌────────────────────▼────────────────────────────────────┐
│                 SECURITY LAYER                          │
│  Spring Security 6 · JwtAuthFilter · Google ID verify  │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│              SPRING BOOT 3 — SERVICE LAYER              │
│  Auth · Ingest · Search · Chat · Todo · Habit          │
│  StudyCard · Tag · Background @Async workers           │
└──────────┬──────────────────────────┬───────────────────┘
           │                          │
┌──────────▼──────────┐  ┌───────────▼───────────────────┐
│    AI / LLM LAYER   │  │        DATA LAYER             │
│  OCI GenAI Service  │  │  Oracle DB 23ai               │
│  · Cohere / Llama   │  │  · Relational tables          │
│  · Embedding model  │  │  · VECTOR column (chunks)     │
│  YouTube Transcript │  │  · HNSW vector index          │
│  Jsoup web crawler  │  │  Flyway migrations            │
└─────────────────────┘  └───────────────────────────────┘

Communication Patterns

  • Client → API: HTTPS REST, Authorization: Bearer <access_token> on all authenticated endpoints
  • Google SSO: Client → Google consent → Google ID token → Spring verifies → issues own JWT pair (access 15min + refresh 30d)
  • Ingest pipeline: POST /api/knowledge/ingest returns 202 Accepted. @Async worker extracts text → chunks → embeds via OCI GenAI → stores VECTOR in Oracle
  • RAG query: user query → embed (OCI) → VECTOR_DISTANCE search → top-K chunks → build prompt → OCI GenAI completion → response with citations
  • Habit streaks: nightly @Scheduled job recomputes streaks, marks missed if no log entry for that day

3. Database Schema — Oracle 23ai

Oracle 23ai VECTOR: Use VECTOR(1024, FLOAT32) to match OCI GenAI embed-multilingual-v3 (1024 dimensions). Similarity search uses VECTOR_DISTANCE(embedding, :queryVec, COSINE) with an HNSW index. No separate vector DB needed.
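
Oracle's COSINE metric is a distance (1 minus cosine similarity), while the RAG settings later in this spec express the cut-off as a similarity score. The sketch below, using a hypothetical CosineSketch helper that is not part of the spec, shows the relationship in plain Java so the two can be converted without guesswork. It assumes non-zero vectors of equal length.

```java
final class CosineSketch {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double similarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance as used by VECTOR_DISTANCE(..., COSINE): 1 - similarity. */
    static double distance(float[] a, float[] b) {
        return 1.0 - similarity(a, b);
    }

    /** Converts a minimum similarity (e.g. 0.70) into the matching maximum distance. */
    static double maxDistanceFor(double minSimilarity) {
        return 1.0 - minSimilarity;
    }
}
```

Under this definition a minimum similarity of 0.70 corresponds to a maximum distance of 0.30, which is the value a distance-based filter would need.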

USERS

CREATE TABLE users (
    id            RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    email         VARCHAR2(320)  NOT NULL UNIQUE,
    display_name  VARCHAR2(255),
    avatar_url    VARCHAR2(1000),
    google_sub    VARCHAR2(255)  UNIQUE,          -- Google subject ID
    refresh_token VARCHAR2(1000),                 -- hashed rotating refresh token
    created_at    TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
    updated_at    TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

KNOWLEDGE_ITEMS

-- type:   YOUTUBE | BLOG | NEWS | TEXT | URL
-- status: PENDING | EXTRACTING | CHUNKING | EMBEDDING | READY | FAILED
CREATE TABLE knowledge_items (
    id             RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    user_id        RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    type           VARCHAR2(20)  NOT NULL,
    title          VARCHAR2(1000),
    raw_content    CLOB,                          -- original URL or raw text input
    extracted_text CLOB,                          -- cleaned text after extraction
    source_url     VARCHAR2(2000),
    status         VARCHAR2(20)  DEFAULT 'PENDING',
    metadata       CLOB CHECK (metadata IS JSON), -- e.g. {"duration":3600,"author":"..."}
    created_at     TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
    updated_at     TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
CREATE INDEX ki_user_idx   ON knowledge_items(user_id);
CREATE INDEX ki_status_idx ON knowledge_items(status);
CREATE INDEX ki_type_idx   ON knowledge_items(user_id, type);

CHUNKS

-- embedding dimensions match OCI GenAI embed-multilingual-v3 (1024)
CREATE TABLE chunks (
    id                  RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    knowledge_item_id   RAW(16) NOT NULL REFERENCES knowledge_items(id) ON DELETE CASCADE,
    chunk_index         NUMBER  NOT NULL,
    content             CLOB    NOT NULL,
    embedding           VECTOR(1024, FLOAT32),
    token_count         NUMBER,
    created_at          TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
CREATE INDEX chunk_ki_idx ON chunks(knowledge_item_id);

-- HNSW approximate nearest-neighbour vector index
CREATE VECTOR INDEX chunk_vec_idx ON chunks(embedding)
    ORGANIZATION INMEMORY NEIGHBOR GRAPH
    DISTANCE COSINE
    WITH TARGET ACCURACY 95;

STUDY_CARDS

CREATE TABLE study_cards (
    id                  RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    knowledge_item_id   RAW(16) NOT NULL REFERENCES knowledge_items(id) ON DELETE CASCADE,
    user_id             RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    question            VARCHAR2(2000) NOT NULL,
    answer              CLOB          NOT NULL,
    ease_factor         NUMBER(4,2) DEFAULT 2.5,  -- SM-2 algorithm
    interval_days       NUMBER DEFAULT 1,
    next_review_at      TIMESTAMP DEFAULT SYSTIMESTAMP,
    review_count        NUMBER DEFAULT 0,
    created_at          TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
CREATE INDEX sc_user_review_idx ON study_cards(user_id, next_review_at);

CHAT_SESSIONS & MESSAGES

CREATE TABLE chat_sessions (
    id              RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    user_id         RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title           VARCHAR2(500),
    created_at      TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
    last_message_at TIMESTAMP DEFAULT SYSTIMESTAMP
);

-- role: USER | ASSISTANT | SYSTEM
CREATE TABLE messages (
    id            RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    session_id    RAW(16) NOT NULL REFERENCES chat_sessions(id) ON DELETE CASCADE,
    role          VARCHAR2(20) NOT NULL,
    content       CLOB        NOT NULL,
    source_chunks CLOB CHECK (source_chunks IS JSON), -- [{chunk_id, score, snippet}]
    tokens_used   NUMBER,
    created_at    TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
CREATE INDEX msg_session_idx ON messages(session_id, created_at);

TODOS

-- status:   TODO | IN_PROGRESS | DONE | CANCELLED
-- priority: LOW | MEDIUM | HIGH | URGENT
CREATE TABLE todos (
    id           RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    user_id      RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    parent_id    RAW(16) REFERENCES todos(id) ON DELETE CASCADE, -- null = root task
    title        VARCHAR2(1000) NOT NULL,
    description  CLOB,
    status       VARCHAR2(20) DEFAULT 'TODO',
    priority     VARCHAR2(10) DEFAULT 'MEDIUM',
    due_at       TIMESTAMP,
    completed_at TIMESTAMP,
    created_at   TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);
CREATE INDEX todo_user_status_idx ON todos(user_id, status);
CREATE INDEX todo_parent_idx      ON todos(parent_id);

HABITS & HABIT_LOGS

-- habit_type:  BUILD | STOP
-- frequency:   DAILY | WEEKLY
-- target_days: "DAILY" or comma-separated "MON,WED,FRI"
CREATE TABLE habits (
    id             RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    user_id        RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title          VARCHAR2(500) NOT NULL,
    habit_type     VARCHAR2(10) DEFAULT 'BUILD',
    frequency      VARCHAR2(10) DEFAULT 'DAILY',
    target_days    VARCHAR2(100) DEFAULT 'DAILY',
    streak_current NUMBER DEFAULT 0,
    streak_best    NUMBER DEFAULT 0,
    is_active      NUMBER(1) DEFAULT 1,
    created_at     TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL
);

CREATE TABLE habit_logs (
    id         RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    habit_id   RAW(16) NOT NULL REFERENCES habits(id) ON DELETE CASCADE,
    log_date   DATE NOT NULL,
    checked_in NUMBER(1) DEFAULT 1,
    note       VARCHAR2(1000),
    created_at TIMESTAMP DEFAULT SYSTIMESTAMP NOT NULL,
    CONSTRAINT habit_log_unique UNIQUE (habit_id, log_date)
);
CREATE INDEX hl_habit_date_idx ON habit_logs(habit_id, log_date);

TAGS & KNOWLEDGE_TAGS

CREATE TABLE tags (
    id      RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
    user_id RAW(16) NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    name    VARCHAR2(100) NOT NULL,
    color   VARCHAR2(7) DEFAULT '#6366F1',
    CONSTRAINT tag_user_name_unique UNIQUE (user_id, name)
);

CREATE TABLE knowledge_tags (
    knowledge_item_id RAW(16) NOT NULL REFERENCES knowledge_items(id) ON DELETE CASCADE,
    tag_id            RAW(16) NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
    PRIMARY KEY (knowledge_item_id, tag_id)
);

4. Backend — Spring Boot 3

Tech Stack

| Dependency | Purpose |
|---|---|
| spring-boot-starter-web | REST controllers |
| spring-boot-starter-security | JWT filter chain |
| spring-boot-starter-data-jpa | JPA / Hibernate ORM |
| ojdbc11 + ucp | Oracle JDBC driver + connection pool |
| flyway-core | Schema migrations |
| jjwt-api + jjwt-impl | JWT issue & validation |
| google-api-client | Google ID token verification |
| jsoup | HTML web crawler / text extraction |
| oci-java-sdk-generativeaiinference | OCI GenAI embeddings + chat |
| spring-boot-starter-data-redis | Session cache, rate limiting |
| spring-boot-starter-validation | Request validation |

Project Structure

sundol-backend/
├── src/main/java/com/sundol/
│   ├── config/
│   │   ├── SecurityConfig.java           # Spring Security + JWT filter chain
│   │   ├── OciGenAiConfig.java           # OCI SDK client bean
│   │   └── AsyncConfig.java              # @Async thread pool config
│   ├── auth/
│   │   ├── AuthController.java           # POST /api/auth/google, /refresh, /logout
│   │   ├── AuthService.java
│   │   ├── JwtService.java               # issue / validate JWT
│   │   ├── JwtAuthFilter.java            # OncePerRequestFilter
│   │   └── GoogleTokenVerifier.java      # verify Google ID token via Google API
│   ├── knowledge/
│   │   ├── KnowledgeController.java      # CRUD /api/knowledge
│   │   ├── KnowledgeService.java
│   │   ├── IngestService.java            # async orchestrator
│   │   ├── extractor/
│   │   │   ├── YoutubeExtractor.java     # fetch transcript via YouTube Data API v3
│   │   │   └── WebCrawler.java           # Jsoup: URL → clean text
│   │   └── ChunkingService.java          # sliding window text chunker
│   ├── embedding/
│   │   └── OciEmbeddingService.java      # call OCI GenAI embed endpoint
│   ├── search/
│   │   └── SearchService.java            # VECTOR_DISTANCE query + keyword fallback
│   ├── chat/
│   │   ├── ChatController.java           # /api/chat/sessions, /messages
│   │   ├── ChatService.java              # RAG orchestration
│   │   └── OciChatService.java           # OCI GenAI chat completion
│   ├── studycard/
│   │   ├── StudyCardController.java      # /api/study-cards
│   │   └── StudyCardService.java         # SM-2 scheduler + AI generation
│   ├── todo/
│   │   ├── TodoController.java           # /api/todos
│   │   └── TodoService.java
│   ├── habit/
│   │   ├── HabitController.java          # /api/habits
│   │   ├── HabitService.java
│   │   └── HabitStreakScheduler.java     # @Scheduled nightly streak job
│   └── tag/
│       ├── TagController.java            # /api/tags
│       └── TagService.java
└── src/main/resources/
    ├── application.yml
    └── db/migration/
        ├── V1__create_users.sql
        ├── V2__create_knowledge.sql
        ├── V3__create_chunks_vector.sql
        ├── V4__create_chat.sql
        ├── V5__create_todos.sql
        ├── V6__create_habits.sql
        └── V7__create_tags.sql

application.yml

spring:
  datasource:
    url: jdbc:oracle:thin:@${DB_HOST}:1521/${DB_SERVICE}
    username: ${DB_USER}
    password: ${DB_PASSWORD}
    driver-class-name: oracle.jdbc.OracleDriver
  jpa:
    hibernate:
      ddl-auto: validate      # Flyway owns the schema, Hibernate only validates
    show-sql: false
    properties:
      hibernate.dialect: org.hibernate.dialect.OracleDialect
  flyway:
    enabled: true
    locations: classpath:db/migration
  data:
    redis:
      host: ${REDIS_HOST:localhost}
      port: ${REDIS_PORT:6379}

app:
  jwt:
    secret: ${JWT_SECRET}              # min 256-bit hex string
    access-expiry-minutes: 15
    refresh-expiry-days: 30
  google:
    client-id: ${GOOGLE_CLIENT_ID}
  oci:
    compartment-id: ${OCI_COMPARTMENT_ID}
    region: ${OCI_REGION}              # e.g. us-chicago-1
    embedding-model-id: cohere.embed-multilingual-v3   # 1024 dims
    chat-model-id: cohere.command-r-plus               # or meta.llama-3-70b-instruct
  chunking:
    chunk-size-tokens: 512
    chunk-overlap-tokens: 64
  rag:
    top-k: 5                           # chunks to retrieve per query
    min-score: 0.70                    # cosine similarity threshold

SecurityConfig — JWT Filter Chain

@Configuration
@EnableWebSecurity
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http, JwtAuthFilter jwtAuthFilter) throws Exception {
        http
            .csrf(AbstractHttpConfigurer::disable)
            .sessionManagement(s -> s.sessionCreationPolicy(SessionCreationPolicy.STATELESS))
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/api/auth/**", "/actuator/health").permitAll()
                .anyRequest().authenticated()
            )
            .addFilterBefore(jwtAuthFilter, UsernamePasswordAuthenticationFilter.class);
        return http.build();
    }
}

JwtAuthFilter skeleton

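@Component
public class JwtAuthFilter extends OncePerRequestFilter {

    private final JwtService jwtService;

    public JwtAuthFilter(JwtService jwtService) {
        this.jwtService = jwtService;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest req, HttpServletResponse res,
                                    FilterChain chain) throws ServletException, IOException {
        String header = req.getHeader("Authorization");
        if (header != null && header.startsWith("Bearer ")) {
            String token = header.substring(7);
            if (jwtService.isValid(token)) {
                String userId = jwtService.extractUserId(token);
                // Minimal authentication: user ID as principal, no authorities.
                // Load UserDetails here instead if roles are needed.
                var auth = new UsernamePasswordAuthenticationToken(userId, null, List.of());
                SecurityContextHolder.getContext().setAuthentication(auth);
            }
        }
        // Always continue the chain; requests without a valid token are rejected
        // later by the authorization rules in SecurityConfig.
        chain.doFilter(req, res);
    }
}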

IngestService — Async Pipeline

@Service
public class IngestService {

    @Async
    public void ingest(KnowledgeItem item) {
        try {
            // 1. Mark EXTRACTING
            updateStatus(item, "EXTRACTING");
            String text = switch (item.getType()) {
                case "YOUTUBE" -> youtubeExtractor.extract(item.getSourceUrl());
                case "BLOG", "NEWS", "URL" -> webCrawler.crawl(item.getSourceUrl());
                case "TEXT" -> item.getRawContent();
                default -> throw new IllegalArgumentException("Unknown type: " + item.getType());
            };
            item.setExtractedText(text);

            // 2. Mark CHUNKING
            updateStatus(item, "CHUNKING");
            List<String> chunks = chunkingService.chunk(text);

            // 3. Mark EMBEDDING
            updateStatus(item, "EMBEDDING");
            for (int i = 0; i < chunks.size(); i++) {
                float[] embedding = ociEmbeddingService.embed(chunks.get(i));
                chunkRepository.save(new Chunk(item.getId(), i, chunks.get(i), embedding));
            }

            // 4. Optionally auto-generate study cards via OCI GenAI
            studyCardService.generateFromItem(item);

            updateStatus(item, "READY");
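        } catch (Exception e) {
            // Record the cause; otherwise a FAILED item cannot be diagnosed.
            // "log" is assumed to be an SLF4J Logger field on this class.
            log.error("Ingest failed for item {}", item.getId(), e);
            updateStatus(item, "FAILED");
        }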
    }
}
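
The status strings passed to updateStatus above follow a fixed order, and a typo in any of them would only surface at runtime. One possible way to make the order explicit is an enum with allowed transitions. This is a sketch only: IngestStatus, allowedNext, and canTransitionTo are hypothetical names, and the rule that every in-flight state may move to FAILED is an assumption drawn from the catch block.

```java
import java.util.EnumSet;
import java.util.Set;

enum IngestStatus {
    PENDING, EXTRACTING, CHUNKING, EMBEDDING, READY, FAILED;

    /** States that may directly follow this one in the ingest pipeline. */
    Set<IngestStatus> allowedNext() {
        return switch (this) {
            case PENDING -> EnumSet.of(EXTRACTING, FAILED);
            case EXTRACTING -> EnumSet.of(CHUNKING, FAILED);
            case CHUNKING -> EnumSet.of(EMBEDDING, FAILED);
            case EMBEDDING -> EnumSet.of(READY, FAILED);
            // READY and FAILED are terminal in this sketch
            case READY, FAILED -> EnumSet.noneOf(IngestStatus.class);
        };
    }

    boolean canTransitionTo(IngestStatus next) {
        return allowedNext().contains(next);
    }
}
```

Since the status column is a VARCHAR2, such an enum could be persisted by name, so the stored values would stay the same as the strings used above.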

SearchService — Vector Query

@Repository
public interface ChunkRepository extends JpaRepository<Chunk, UUID> {

    // Native Oracle VECTOR_DISTANCE query.
    // COSINE distance = 1 - cosine similarity, so pass maxDistance = 1 - app.rag.min-score
    // (0.30 for the default min-score of 0.70), not the similarity score itself.
    @Query(value = """
        SELECT c.* FROM chunks c
        JOIN knowledge_items ki ON c.knowledge_item_id = ki.id
        WHERE ki.user_id = :userId
          AND VECTOR_DISTANCE(c.embedding, :queryVec, COSINE) < :maxDistance
        ORDER BY VECTOR_DISTANCE(c.embedding, :queryVec, COSINE)
        FETCH FIRST :topK ROWS ONLY
        """, nativeQuery = true)
    List<Chunk> findSimilar(
        @Param("userId") byte[] userId,
        @Param("queryVec") float[] queryVec,
        @Param("maxDistance") double maxDistance,
        @Param("topK") int topK
    );
}

HabitStreakScheduler

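@Component
public class HabitStreakScheduler {

    // Runs every night at 00:05 (server time zone)
    @Scheduled(cron = "0 5 0 * * *")
    public void recalculateStreaks() {
        List<Habit> activeHabits = habitRepository.findByIsActive(true);
        LocalDate yesterday = LocalDate.now().minusDays(1);
        String day = yesterday.getDayOfWeek().name().substring(0, 3); // e.g. "MON"
        for (Habit habit : activeHabits) {
            // getTargetDays() is assumed to map the target_days column
            String targetDays = habit.getTargetDays();
            boolean scheduled = "DAILY".equals(targetDays)
                || Arrays.asList(targetDays.split(",")).contains(day);
            if (!scheduled) {
                continue; // not a target day: streak is preserved
            }
            // Assumed derived query: a log row must exist AND have checked_in = true
            boolean checkedIn = habitLogRepository
                .existsByHabitIdAndLogDateAndCheckedIn(habit.getId(), yesterday, true);
            if (checkedIn) {
                habit.setStreakCurrent(habit.getStreakCurrent() + 1);
                habit.setStreakBest(Math.max(habit.getStreakBest(), habit.getStreakCurrent()));
            } else {
                habit.setStreakCurrent(0); // streak broken
            }
            habitRepository.save(habit);
        }
    }
}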

5. AI Layer — OCI Generative AI

OCI GenAI Services Used

| Purpose | Model | Notes |
|---|---|---|
| Text embeddings | cohere.embed-multilingual-v3 | 1024-dim, multilingual, batch up to 96 texts |
| Chat completion | cohere.command-r-plus | Strong RAG / grounding support |
| Chat completion (alt) | meta.llama-3-70b-instruct | Alternative if Cohere unavailable |

OciEmbeddingService

@Service
public class OciEmbeddingService {

    private final GenerativeAiInferenceClient client;

    public float[] embed(String text) {
        EmbedTextDetails details = EmbedTextDetails.builder()
            .inputs(List.of(text))
            .servingMode(OnDemandServingMode.builder()
                .modelId(embeddingModelId)
                .build())
            .compartmentId(compartmentId)
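            // SearchDocument suits text being indexed; a user query embedded for
            // retrieval would normally use InputType.SearchQuery instead.
            .inputType(EmbedTextDetails.InputType.SearchDocument)
            .build();

        EmbedTextResponse response = client.embedText(EmbedTextRequest.builder()
            .embedTextDetails(details)
            .build());

        // One vector per input; elements are read as Number so this code does not
        // depend on the exact boxed type the SDK returns.
        List<? extends Number> vec = response.getEmbedTextResult().getEmbeddings().get(0);
        float[] result = new float[vec.size()];
        for (int i = 0; i < vec.size(); i++) result[i] = vec.get(i).floatValue();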
        return result;
    }
}

RAG Prompt Template

You are Sundol, your personal secretary dog. Answer the user's question using ONLY
the context provided below. If the context does not contain enough information,
say so clearly. Always cite the source by referencing [Source N].

CONTEXT:
[Source 1] (from: {title_1})
{chunk_content_1}

[Source 2] (from: {title_2})
{chunk_content_2}

... (up to top-K chunks)

CHAT HISTORY (last 5 turns):
{chat_history}

USER QUESTION:
{user_query}

ANSWER:
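
A minimal sketch of how the template above could be filled in, assuming the retrieved chunks arrive as title/content pairs and the chat history is already rendered as one string. RagPromptSketch, Source, and build are hypothetical names; the persona and section labels are copied from the template.

```java
import java.util.List;

final class RagPromptSketch {

    /** One retrieved chunk: the title of its knowledge item and the chunk text. */
    record Source(String title, String content) {}

    private static final String SYSTEM =
        "You are Sundol, your personal secretary dog. Answer the user's question using ONLY\n"
        + "the context provided below. If the context does not contain enough information,\n"
        + "say so clearly. Always cite the source by referencing [Source N].\n\n";

    static String build(List<Source> sources, String chatHistory, String userQuery) {
        StringBuilder sb = new StringBuilder(SYSTEM);
        sb.append("CONTEXT:\n");
        for (int i = 0; i < sources.size(); i++) {
            Source s = sources.get(i);
            // Sources are numbered from 1 so the model can cite them as [Source N]
            sb.append("[Source ").append(i + 1).append("] (from: ").append(s.title()).append(")\n")
              .append(s.content()).append("\n\n");
        }
        sb.append("CHAT HISTORY (last 5 turns):\n").append(chatHistory).append("\n\n");
        sb.append("USER QUESTION:\n").append(userQuery).append("\n\n");
        sb.append("ANSWER:\n");
        return sb.toString();
    }
}
```

Numbering the sources in retrieval order keeps the [Source N] labels aligned with the list index of the chunk they refer to.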

YouTube Transcript Extraction

Use the YouTube Data API v3 captions endpoint where the caller is authorized for the video; otherwise fall back to a youtube-transcript-api style approach that reads the public timedtext track:

// Option A: youtube-transcript-api (Python microservice or direct HTTP)
// GET https://www.youtube.com/watch?v={videoId}
// Parse timedtext XML from: https://www.youtube.com/api/timedtext?lang=en&v={videoId}

// Option B: YouTube Data API v3 captions.list
// Returns caption track metadata only, not the transcript text; fetching the text
// (captions.download) requires OAuth and permission on the video
String transcriptUrl = "https://www.googleapis.com/youtube/v3/captions"
    + "?part=snippet&videoId=" + videoId
    + "&key=" + youtubeApiKey;
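
Both options above start from a videoId, which has to be pulled out of whatever URL the user pasted. The sketch below shows one way to do that with a regular expression. YoutubeIdSketch and videoId are hypothetical names, and the pattern assumes the common watch, youtu.be, embed, and shorts URL shapes with an 11-character ID; other URL forms would need extra cases.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YoutubeIdSketch {

    // Assumption: IDs are 11 characters from [A-Za-z0-9_-]
    private static final Pattern VIDEO_ID = Pattern.compile(
        "(?:youtube\\.com/(?:watch\\?(?:.*&)?v=|embed/|shorts/)|youtu\\.be/)([A-Za-z0-9_-]{11})");

    /** Returns the video ID if the URL matches one of the assumed shapes. */
    static Optional<String> videoId(String url) {
        Matcher m = VIDEO_ID.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

An empty result can be treated as "not a YouTube URL", in which case the item would fall through to the generic web crawler path.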

Web Crawler (Jsoup)

@Service
public class WebCrawler {
    public String crawl(String url) throws IOException {
        Document doc = Jsoup.connect(url)
            .userAgent("Mozilla/5.0 (compatible; SUNDOL-bot/1.0)")
            .timeout(10_000)
            .get();
        // Remove nav, footer, ads
        doc.select("nav, footer, header, script, style, .ad, #cookie-banner").remove();
        // Prefer article body
        Element article = doc.selectFirst("article, main, .post-content, .article-body");
        return (article != null ? article : doc.body()).text();
    }
}

Text Chunker — Sliding Window

@Service
public class ChunkingService {

    // Simple word-count approximation (1 token ≈ 0.75 words)
    public List<String> chunk(String text) {
        String[] words = text.split("\\s+");
        int chunkWords = (int)(chunkSizeTokens * 0.75);
        int overlapWords = (int)(chunkOverlapTokens * 0.75);
        List<String> chunks = new ArrayList<>();
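        int step = chunkWords - overlapWords;   // must be > 0, otherwise the loop never advances
        int i = 0;
        while (i < words.length) {
            int end = Math.min(i + chunkWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
            if (end == words.length) break;     // last window reached; avoid a redundant tail chunk
            i += step;
        }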
        return chunks;
    }
}
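
To make the window arithmetic concrete: with the configured 512-token chunks and 64-token overlap, the 0.75 factor gives 384 words per chunk, 48 words of overlap, and a step of 336 words. The sketch below computes only the [start, end) word ranges, using a hypothetical ChunkMathSketch helper, and stops once a window reaches the end of the text.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkMathSketch {

    /** Returns [start, end) word ranges for a sliding window over totalWords words. */
    static List<int[]> windows(int totalWords, int chunkSizeTokens, int overlapTokens) {
        int chunkWords = (int) (chunkSizeTokens * 0.75);
        int overlapWords = (int) (overlapTokens * 0.75);
        int step = chunkWords - overlapWords;
        if (step <= 0) {
            throw new IllegalArgumentException("overlap must be smaller than chunk size");
        }
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < totalWords; start += step) {
            int end = Math.min(start + chunkWords, totalWords);
            out.add(new int[] {start, end});
            if (end == totalWords) {
                break; // the text is fully covered
            }
        }
        return out;
    }
}
```

For a 1000-word text this yields three windows: [0, 384), [336, 720), and [672, 1000).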

6. Frontend — Next.js 14

Tech Stack

| Package | Purpose |
|---|---|
| next 14 (App Router) | Framework, SSR/CSR |
| typescript | Type safety |
| tailwindcss | Styling |
| next-auth v5 | Google OAuth + session management |
| @tanstack/react-query | Server state, caching, mutations |
| axios | HTTP client with JWT interceptor |
| zustand | Lightweight client state |
| react-markdown | Render AI responses with markdown |
| @radix-ui/react-* | Accessible UI primitives |

Project Structure

sundol-frontend/
├── app/
│   ├── (auth)/
│   │   └── login/page.tsx             # Google sign-in button
│   ├── (app)/
│   │   ├── layout.tsx                 # Sidebar + auth guard
│   │   ├── dashboard/page.tsx         # Overview: recent items, due cards, todos
│   │   ├── knowledge/
│   │   │   ├── page.tsx               # Knowledge list with search
│   │   │   ├── [id]/page.tsx          # Item detail + chunks viewer
│   │   │   └── add/page.tsx           # Add form: URL / text / YouTube
│   │   ├── chat/
│   │   │   ├── page.tsx               # Session list
│   │   │   └── [sessionId]/page.tsx   # Chat interface with citations
│   │   ├── study/page.tsx             # SRS flashcard review
│   │   ├── todos/page.tsx             # Todo list with subtasks
│   │   ├── habits/page.tsx            # Habit grid + check-in
│   │   └── settings/page.tsx
│   ├── api/auth/[...nextauth]/route.ts # next-auth handler
│   └── layout.tsx
├── components/
│   ├── knowledge/
│   │   ├── AddKnowledgeForm.tsx       # Detect URL type, show appropriate fields
│   │   ├── KnowledgeCard.tsx
│   │   └── StatusBadge.tsx            # PENDING → READY pipeline status
│   ├── chat/
│   │   ├── ChatWindow.tsx
│   │   ├── MessageBubble.tsx          # Renders markdown, shows citations
│   │   └── SourceCitation.tsx
│   ├── habits/
│   │   ├── HabitCard.tsx
│   │   └── StreakBadge.tsx
│   ├── study/
│   │   └── FlashCard.tsx              # Flip animation, SM-2 rating buttons
│   └── ui/                            # Shared: Button, Input, Modal, etc.
├── lib/
│   ├── api.ts                         # Axios instance with JWT interceptor
│   ├── auth.ts                        # next-auth config
│   └── utils.ts
└── types/
    └── index.ts                       # Shared TypeScript interfaces

Axios JWT Interceptor

// lib/api.ts
import axios from 'axios';
import { getSession } from 'next-auth/react';

const api = axios.create({ baseURL: process.env.NEXT_PUBLIC_API_URL });

api.interceptors.request.use(async (config) => {
  const session = await getSession();
  if (session?.accessToken) {
    config.headers.Authorization = `Bearer ${session.accessToken}`;
  }
  return config;
});

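// On 401, send the user back to sign-in (this sketch does not attempt a silent token refresh)
api.interceptors.response.use(
  (res) => res,
  async (error) => {
    // Guard on window so the redirect only runs in the browser
    if (error.response?.status === 401 && typeof window !== 'undefined') {
      window.location.href = '/api/auth/signin';
    }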
    return Promise.reject(error);
  }
);

export default api;

Knowledge Ingest Flow (Frontend)

  1. User pastes URL or text in AddKnowledgeForm
  2. Frontend detects type (YouTube URL regex, plain URL, raw text)
  3. POST /api/knowledge/ingest → receives 202 Accepted with item.id
  4. Frontend polls GET /api/knowledge/{id} every 3s, shows status badge: PENDING → EXTRACTING → CHUNKING → EMBEDDING → READY
  5. On READY, navigates to item detail page

7. Security — Google SSO + JWT

Flow

1. User clicks "Sign in with Google"
2. Next.js (next-auth) redirects → Google OAuth consent
3. Google returns authorization code → next-auth exchanges for ID token
4. next-auth calls POST /api/auth/google with { idToken }
5. Spring:
   a. Verifies ID token via Google API (audience = GOOGLE_CLIENT_ID)
   b. Extracts { sub, email, name, picture }
   c. Upserts user in DB (insert if new, update avatar if existing)
   d. Issues: accessToken (JWT, 15min) + refreshToken (JWT, 30d)
   e. Stores hashed refreshToken in users.refresh_token
6. next-auth stores tokens in encrypted session cookie (httpOnly)
7. All subsequent API calls send Authorization: Bearer <accessToken>
8. On expiry, next-auth calls POST /api/auth/refresh with { refreshToken }
9. Spring validates refresh token hash, issues new token pair (rotation)

JWT Claims

{
  "sub": "user-uuid",
  "email": "user@example.com",
  "type": "ACCESS",
  "iat": 1711800000,
  "exp": 1711800900
}
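
To show how claims like these end up inside a signed token, here is a JDK-only sketch of the header.payload.signature shape with HMAC-SHA256. The spec lists jjwt for the real implementation, so this is an illustration and not a replacement. JwtShapeSketch, sign, and verify are hypothetical names, HS256 is an assumption, and the sketch checks only the signature, not exp or type.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class JwtShapeSketch {

    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    /** Builds base64url(header) + "." + base64url(payload) + "." + base64url(HMAC). */
    static String sign(String payloadJson, byte[] secret) {
        String header = B64.encodeToString(
            "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + B64.encodeToString(hmac(signingInput, secret));
    }

    /** Recomputes the signature over header.payload and compares it in constant time. */
    static boolean verify(String token, byte[] secret) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot < 0) {
            return false;
        }
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(token.substring(lastDot + 1));
        } catch (IllegalArgumentException e) {
            return false; // signature part is not valid base64url
        }
        return MessageDigest.isEqual(hmac(token.substring(0, lastDot), secret), actual);
    }

    private static byte[] hmac(String data, byte[] secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 not available", e);
        }
    }
}
```

A production validator would additionally reject expired tokens and tokens whose type claim is not ACCESS.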

next-auth Config Sketch

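// lib/auth.ts
// Note: this sketch uses the NextAuthOptions config shape from next-auth v4;
// v5 exposes a different entry point, so adapt it to the installed version.
import type { NextAuthOptions } from 'next-auth';
import GoogleProvider from 'next-auth/providers/google';

export const authOptions: NextAuthOptions = {
  providers: [
    GoogleProvider({
      clientId: process.env.GOOGLE_CLIENT_ID!,
      clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
    }),
  ],
  callbacks: {
    async signIn({ account }) {
      if (!account?.id_token) return false;
      // Exchange Google ID token for our own JWT pair
      const res = await fetch(`${process.env.API_URL}/api/auth/google`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ idToken: account.id_token }),
      });
      if (!res.ok) return false; // reject sign-in if the backend refuses the token
      const data = await res.json();
      account.accessToken = data.accessToken;
      account.refreshToken = data.refreshToken;
      return true;
    },
    async jwt({ token, account }) {
      if (account) {
        token.accessToken = account.accessToken;
        token.refreshToken = account.refreshToken;
      }
      return token;
    },
    async session({ session, token }) {
      // session.accessToken needs a module augmentation of the Session type
      session.accessToken = token.accessToken as string;
      return session;
    },
  },
};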

8. REST API Reference

All endpoints require Authorization: Bearer <token> unless marked public.

Auth

| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /api/auth/google | public | Exchange Google ID token → JWT pair |
| POST | /api/auth/refresh | public | Refresh access token |
| POST | /api/auth/logout | yes | Invalidate refresh token |

Knowledge

| Method | Path | Description |
|---|---|---|
| GET | /api/knowledge | List items (filter: type, status, tag, search) |
| POST | /api/knowledge/ingest | Submit new item for ingestion → 202 |
| GET | /api/knowledge/{id} | Get item detail + status |
| PATCH | /api/knowledge/{id} | Update title / tags |
| DELETE | /api/knowledge/{id} | Delete item + all chunks |
| GET | /api/knowledge/{id}/chunks | List chunks for an item |

Search

| Method | Path | Description |
|---|---|---|
| GET | /api/search?q={query}&topK=5 | Semantic search across user's chunks |

Chat

| Method | Path | Description |
|---|---|---|
| GET | /api/chat/sessions | List chat sessions |
| POST | /api/chat/sessions | Create new session |
| GET | /api/chat/sessions/{id}/messages | Get message history |
| POST | /api/chat/sessions/{id}/messages | Send message → RAG response |
| DELETE | /api/chat/sessions/{id} | Delete session |

Study Cards

| Method | Path | Description |
|---|---|---|
| GET | /api/study-cards/due | Cards due for review today |
| GET | /api/study-cards?knowledgeItemId={id} | Cards for a specific item |
| POST | /api/study-cards/generate/{knowledgeItemId} | Trigger AI generation |
| POST | /api/study-cards/{id}/review | Submit review result {rating: 1-5} |

Todos

| Method | Path | Description |
|---|---|---|
| GET | /api/todos | List (filter: status, priority, dueDate) |
| POST | /api/todos | Create todo |
| PATCH | /api/todos/{id} | Update fields |
| DELETE | /api/todos/{id} | Delete (cascades subtasks) |
| GET | /api/todos/{id}/subtasks | List subtasks |

Habits

| Method | Path | Description |
|---|---|---|
| GET | /api/habits | List habits |
| POST | /api/habits | Create habit |
| PATCH | /api/habits/{id} | Update habit |
| DELETE | /api/habits/{id} | Delete habit |
| POST | /api/habits/{id}/checkin | Check in for today {note?: string} |
| GET | /api/habits/{id}/logs | Log history (filter: from, to dates) |

Tags

| Method | Path | Description |
|---|---|---|
| GET | /api/tags | List user's tags |
| POST | /api/tags | Create tag |
| PATCH | /api/tags/{id} | Rename / recolor |
| DELETE | /api/tags/{id} | Delete tag (removes from items) |

9. Key Algorithms

SM-2 Spaced Repetition (Study Cards)

Input: ease_factor (EF), interval, rating (1-5 from user)

if rating >= 3 (pass):
    if interval == 0:   new_interval = 1
    elif interval == 1: new_interval = 6
    else:               new_interval = round(interval * EF)
    new_EF = EF + (0.1 - (5 - rating) * (0.08 + (5 - rating) * 0.02))
    new_EF = max(1.3, new_EF)
else (fail, rating < 3):
    new_interval = 1    # reset
    new_EF = max(1.3, EF - 0.2)

next_review_at = now + new_interval days
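
The pseudocode above translates almost line for line into Java. The sketch below is one such translation; Sm2Sketch, Result, and review are hypothetical names, and the caller is assumed to add intervalDays to the current time to obtain next_review_at.

```java
final class Sm2Sketch {

    /** New scheduling state after one review. */
    record Result(int intervalDays, double easeFactor) {}

    static Result review(int intervalDays, double easeFactor, int rating) {
        if (rating < 1 || rating > 5) {
            throw new IllegalArgumentException("rating must be 1-5");
        }
        if (rating >= 3) { // pass
            int next;
            if (intervalDays == 0) {
                next = 1;
            } else if (intervalDays == 1) {
                next = 6;
            } else {
                next = (int) Math.round(intervalDays * easeFactor);
            }
            double d = 5 - rating;
            double ef = easeFactor + (0.1 - d * (0.08 + d * 0.02));
            return new Result(next, Math.max(1.3, ef));
        }
        // fail: reset the interval and lower the ease factor, floored at 1.3
        return new Result(1, Math.max(1.3, easeFactor - 0.2));
    }
}
```

For example, a card at interval 6 and EF 2.5 rated 4 moves to a 15-day interval with EF unchanged, while a rating of 2 resets it to 1 day with EF 2.3.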

RAG Pipeline (ChatService)

1. embed(user_query) → queryVector [1024 floats]
2. searchService.findSimilar(userId, queryVector, topK=5, minScore=0.70)
3. Build prompt:
   - System: SUNDOL persona + citation instructions
   - Context: top-K chunks with source titles
   - History: last 5 messages from session
   - User: current query
4. ociChatService.complete(prompt) → response text
5. Parse citations from response
6. Save message to DB with source_chunks JSON
7. Return { content, sourceCitations[], tokensUsed }
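
Step 5 can be sketched as a small regular-expression pass over the model's answer. This assumes the model follows the [Source N] convention from the prompt template. CitationParserSketch and citedSources are hypothetical names, and out-of-range numbers are dropped so a hallucinated citation cannot point at a chunk that was never supplied.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CitationParserSketch {

    // Up to three digits keeps Integer.parseInt safe from overflow
    private static final Pattern CITATION = Pattern.compile("\\[Source (\\d{1,3})\\]");

    /** Returns distinct cited source numbers in order of first appearance. */
    static Set<Integer> citedSources(String answer, int sourceCount) {
        Set<Integer> cited = new LinkedHashSet<>();
        Matcher m = CITATION.matcher(answer);
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            if (n >= 1 && n <= sourceCount) {
                cited.add(n);
            }
        }
        return cited;
    }
}
```

Each returned number N maps to element N - 1 of the top-K list from step 2, which is what would be written into source_chunks in step 6.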

Habit Streak Logic (Nightly Job)

for each active habit:
    yesterday = today - 1 day
    if habit.target_days == "DAILY" or yesterday.dayOfWeek in habit.target_days:
        if habit_log exists for yesterday AND checked_in = true:
            streak_current += 1
            streak_best = max(streak_best, streak_current)
        else:
            streak_current = 0   // broken
    // days not in target_days are ignored (streak preserved)
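
The same rule can be written as a pure function, which makes it easy to unit-test separately from the scheduler and the database. In the sketch below, StreakSketch, Streak, isTargetDay, and advance are hypothetical names, and target_days is assumed to use three-letter uppercase day abbreviations as in the schema comment ("MON,WED,FRI").

```java
import java.time.DayOfWeek;
import java.util.Arrays;

final class StreakSketch {

    /** Current and best streak for one habit. */
    record Streak(int current, int best) {}

    static boolean isTargetDay(String targetDays, DayOfWeek day) {
        if ("DAILY".equalsIgnoreCase(targetDays.trim())) {
            return true;
        }
        String abbrev = day.name().substring(0, 3); // MONDAY -> MON
        return Arrays.stream(targetDays.split(","))
            .map(String::trim)
            .anyMatch(abbrev::equalsIgnoreCase);
    }

    /** Applies one nightly update for "yesterday". */
    static Streak advance(Streak s, String targetDays, DayOfWeek yesterday, boolean checkedIn) {
        if (!isTargetDay(targetDays, yesterday)) {
            return s; // not a target day: streak preserved
        }
        if (!checkedIn) {
            return new Streak(0, s.best()); // broken
        }
        int current = s.current() + 1;
        return new Streak(current, Math.max(s.best(), current));
    }
}
```

A habit with target days "MON,WED,FRI" therefore keeps its streak through a Tuesday without a check-in, but loses it on a missed Wednesday.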

10. Environment Variables

Backend (.env / application.yml)

# Database
DB_HOST=localhost
DB_SERVICE=FREEPDB1
DB_USER=sundol
DB_PASSWORD=yourpassword

# JWT
JWT_SECRET=your-256-bit-secret-as-64-hex-chars

# Google OAuth
GOOGLE_CLIENT_ID=xxx.apps.googleusercontent.com

# OCI GenAI
OCI_COMPARTMENT_ID=ocid1.compartment.oc1..xxx
OCI_REGION=us-chicago-1
# OCI credentials via ~/.oci/config or Instance Principal

# YouTube (optional, for private caption access)
YOUTUBE_API_KEY=AIza...

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379

Frontend (.env.local)

NEXT_PUBLIC_API_URL=http://localhost:8080
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=your-nextauth-secret

GOOGLE_CLIENT_ID=xxx.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-google-secret

API_URL=http://localhost:8080   # server-side fetch

11. Project Structure (Monorepo)

sundol/
├── sundol-backend/          # Spring Boot 3 (Java 21)
│   ├── pom.xml
│   └── src/
├── sundol-frontend/         # Next.js 14 (TypeScript)
│   ├── package.json
│   └── app/
├── db/
│   └── migration/           # Flyway SQL scripts (V1-V7)
├── docs/
│   └── SUNDOL_SPEC.md       # ← this file
├── docker-compose.yml       # Oracle 23ai + Redis + backend + frontend
└── README.md

docker-compose.yml (development)

version: '3.9'
services:
  oracle:
    image: gvenzl/oracle-free:23-slim
    environment:
      ORACLE_PASSWORD: sundolpass
      APP_USER: sundol
      APP_USER_PASSWORD: sundol123
    ports:
      - "1521:1521"
    volumes:
      - oracle_data:/opt/oracle/oradata

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  backend:
    build: ./sundol-backend
    ports:
      - "8080:8080"
    environment:
      DB_HOST: oracle
      DB_SERVICE: FREEPDB1
      DB_USER: sundol
      DB_PASSWORD: sundol123
      JWT_SECRET: ${JWT_SECRET}
      GOOGLE_CLIENT_ID: ${GOOGLE_CLIENT_ID}
      OCI_COMPARTMENT_ID: ${OCI_COMPARTMENT_ID}
      OCI_REGION: ${OCI_REGION}
      REDIS_HOST: redis
    volumes:
      - ~/.oci:/root/.oci:ro   # OCI credentials
    depends_on:
      - oracle
      - redis

  frontend:
    build: ./sundol-frontend
    ports:
      - "3000:3000"
    environment:
      NEXT_PUBLIC_API_URL: http://localhost:8080
      API_URL: http://backend:8080
      NEXTAUTH_URL: http://localhost:3000
      NEXTAUTH_SECRET: ${NEXTAUTH_SECRET}
      GOOGLE_CLIENT_ID: ${GOOGLE_CLIENT_ID}
      GOOGLE_CLIENT_SECRET: ${GOOGLE_CLIENT_SECRET}
    depends_on:
      - backend

volumes:
  oracle_data:

Implementation Order

  1. DB migrations — run all V1-V7 Flyway scripts, verify Oracle VECTOR index created
  2. Backend: Auth — Google SSO flow, JWT issue/validate, filter chain
  3. Backend: Knowledge CRUD — ingest endpoint (202), status polling
  4. Backend: Ingest pipeline — YoutubeExtractor, WebCrawler, ChunkingService
  5. Backend: OCI Embedding — embed chunks, store VECTOR in Oracle
  6. Backend: Search — VECTOR_DISTANCE query, SearchService
  7. Backend: Chat / RAG — ChatService, OciChatService, prompt builder
  8. Backend: Todo + Habit + Tag — straightforward CRUD + streak scheduler
  9. Backend: Study Cards — AI generation + SM-2 review endpoint
  10. Frontend: Auth — next-auth Google provider, session, axios interceptor
  11. Frontend: Knowledge — add form, status polling, list view
  12. Frontend: Chat — session list, chat window, citation rendering
  13. Frontend: Todos + Habits — list views, check-in, streak display
  14. Frontend: Study Cards — flip card UI, SM-2 rating buttons
  15. Docker Compose — wire all services, test end-to-end