diff --git a/CHANGELOG.md b/CHANGELOG.md
index 9a7db09..c6572ec 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,15 @@
 
 ## 2026-06-15
 
+### 🔧 #316: backend resource request right-sizing + RollingUpdate policy restored
+- **Before**: cpu 500m/1, mem 768Mi/1536Mi, strategy maxSurge=0/maxUnavailable=1 (temporary patch)
+- **After**: cpu 300m/800m, mem 512Mi/1024Mi, strategy 25%/25% (back to default)
+- **Rationale**: measured idle 0.7% CPU, RSS ~305 MB. Safe within the estimated 30-40% peak.
+- **Verification**: 330m left on the node after rollout → two Pods can coexist at the next deploy; zero-downtime RollingUpdate restored.
+- **Downtime**: ~25 s, this one time only (the old Pod was force-terminated to release its 500m). 0 s from the next deploy on.
+- **Design doc**: `docs/design/316-backend-resource-rightsize/README.md` (Approved).
+- Refs: #316 (close)
+
 ### 🏗 OKE infrastructure: node downsizing + LB cleanup
 - **Orphan Classic LB deleted**: 132.226.175.247 (100Mbps shape; only the OKEclusterName tag remained, with no DNS/Service references) → cost savings
 - **Node pool replaced (blue-green)**: `pool1` (2 nodes × 2 OCPU / 8 GB) → `pool2` (2 nodes × 1 OCPU / 6 GB)
diff --git a/docs/design/316-backend-resource-rightsize/README.md b/docs/design/316-backend-resource-rightsize/README.md
new file mode 100644
index 0000000..7112914
--- /dev/null
+++ b/docs/design/316-backend-resource-rightsize/README.md
@@ -0,0 +1,127 @@
+# Design doc: backend resource request right-sizing (#316)
+
+> **Status**: Approved
+> **Author**: [AI] Architect · **Last modified**: 2026-06-15
+> **Traceability**: Redmine: #316 · Related ADR: none · Parent issue: #267 (current-state update/deployment context)
+> · Implementation file: `k8s/backend-deployment.yaml`
+> · Tests: zero-downtime kubectl rollout (manual verification) · no automated tests
+
+## 1. Purpose (Why)
+
+After the node downsizing (2 × 2 OCPU/8 GB → 2 × 1 OCPU/6 GB), the `backend` Deployment's CPU request of 500m takes up about half of a node's allocatable resources, so the old and new Pods cannot coexist during a RollingUpdate. The Deployment currently runs with a temporary `maxSurge=0, maxUnavailable=1` patch (~30 s of downtime on every deploy). This change right-sizes the requests so the strategy can return to 25%/25% and deploys become zero-downtime again.
+
+## 2. Scope
+
+- **Included**
+  - Right-size `resources.requests`/`limits` in `k8s/backend-deployment.yaml`.
+  - Restore `spec.strategy` in the same file, or the strategy of the live deploy, to 25%/25% (revert to the default if it has already been patched).
+- **Excluded (out of scope)**
+  - Other Deployments such as frontend/redis.
+  - JVM heap/-XX options (separate tuning issue).
+  - Introducing HPA or VPA.
+  - Further changes to node count/shape.
+
+## 3. Acceptance Criteria
+
+- [ ] backend CPU request ≤ 300m, memory request ≤ 512Mi (includes headroom over the measured ~305 MB usage).
+- [ ] Limits stay within the node capacity (1 OCPU / 6 GB): cpu ≤ 800m, mem ≤ 1Gi.
+- [ ] Deployment strategy: `maxSurge: 25%`, `maxUnavailable: 25%` (or the default).
+- [ ] Right after `kubectl apply`, the new Pod reaches Running without going Pending.
+- [ ] While the change is applied, `https://www.tasteby.net/api/health` keeps returning 200 without interruption.
+- [ ] The changes are committed and pushed to git.
+
+## 4. Context & Constraints
+
+- Dependencies: OKE 1.34.2, 2 × ARM64 nodes at 1 OCPU/6 GB.
+- Node allocatable CPU (after system daemon reservations): about 940-960m per node.
+- Node allocatable memory: about 5.0-5.3 GiB per node.
+- Co-located on the same node: frontend (200m/256Mi), kube-system DaemonSets (about 200m/300Mi in total), and occasionally redis/cert-manager Pods.
+- The two backend Pods do not need to share a node (replicas=1, but there are temporarily two during a RollingUpdate).
+- Constraints: no cost (no code change); operational impact is small, but one rollout occurs.
+- Assumptions: measured idle CPU in production is 0.7%; estimated peak is 30-40% (during video extraction and vector search).
+
+## 5. Architecture Overview
+
+```
+git: k8s/backend-deployment.yaml
+        │
+        ▼ (edit)
+spec.replicas: 1 (unchanged)
+spec.strategy:
+  type: RollingUpdate
+  rollingUpdate:
+    maxSurge: 25%        (=0 → 25%, temporary patch reverted)
+    maxUnavailable: 25%  (=1 → 25%, temporary patch reverted)
+spec.template.spec.containers[0].resources:
+  requests: { cpu: 300m, memory: 512Mi }   (500m/768Mi → lowered)
+  limits:   { cpu: 800m, memory: 1024Mi }  (1/1536 → lowered)
+        │
+        ▼ (kubectl apply)
+OKE rolling update → 1 new Pod surges → Ready → old Pod terminated
+        │
+        ▼
+www.tasteby.net traffic uninterrupted (Ingress → Service → Pod, 100 ms responses maintained)
+```
+
+I/O boundary: the manifest (declared state) and the cluster (runtime state) are synced one-way via `kubectl apply`. Verification is `rollout status` plus an external curl.
+
+## 6. Data Model
+
+| Field | Before | After | Rationale |
+|------|---------|---------|------|
+| `resources.requests.cpu` | `500m` | `300m` | idle 0.7%, estimated peak 30%; leaves headroom on a 1 OCPU node after frontend (200m) + remaining system Pods (~200m) |
+| `resources.requests.memory` | `768Mi` | `512Mi` | measured ~305 MB, with headroom for JVM heap, code cache, and metaspace |
+| `resources.limits.cpu` | `1` | `800m` | avoids throttling on a 1 OCPU node + leaves room for other Pods |
+| `resources.limits.memory` | `1536Mi` | `1024Mi` | reduces OOM risk and stays stable within ~5 GiB allocatable per node |
+| `strategy.rollingUpdate.maxSurge` | `0` (temporary) | `25%` | restores zero-downtime RollingUpdate |
+| `strategy.rollingUpdate.maxUnavailable` | `1` (temporary) | `25%` | same as above |
+
+## 7. Function Specification
+
+This is a manifest-only change, so there are no code functions. **Change units**:
+
+| Change unit | Location (pre-change line) | Responsibility | Verification |
+|-----------|------|------|------|
+| Lower `requests.cpu` | `k8s/backend-deployment.yaml:37` | new Pod can be scheduled | no FailedScheduling in `kubectl describe pod` Events |
+| Lower `requests.memory` | `k8s/backend-deployment.yaml:38` | same + OOM safety | Pod RSS / requests ≤ 70% |
+| Lower `limits.cpu` | `k8s/backend-deployment.yaml:40` | throttle control | `throttled_usec` in `cpu.stat` does not grow |
+| Lower `limits.memory` | `k8s/backend-deployment.yaml:41` | OOM protection | no OOMKilled |
+| Add `strategy` | `k8s/backend-deployment.yaml` spec | 25%/25% stated explicitly | live patch matches GitOps |
+
+> All of these are simple declarative changes. No separate fn-*.md for complex functions is needed.
+
+## 8. Flow / Algorithm
+
+1. Edit `k8s/backend-deployment.yaml` (resources + strategy).
+2. `kubectl apply -f k8s/backend-deployment.yaml`.
+3. One new Pod surges → wait for Ready (~30-60 s, JVM startup).
+4. Once it is Ready, the old Pod is terminated (graceful; terminationGracePeriodSeconds defaults to 30 s).
+5. `kubectl rollout status deploy/backend -n tasteby` PASS.
+6. Confirm consecutive 200 responses from an external `curl https://www.tasteby.net/api/health`.
+
+## 9. Edge Cases & Error Handling
+
+- **JVM startup takes longer than the readinessProbe initialDelaySeconds (30s)**: the new Pod does not become Ready → the old Pod is kept → the rollout does not progress. The backend currently tends to be Ready in 20-25 s.
+- **Insufficient node allocatable memory**: if both backend Pods land on one node they take ~1 GiB. Combined with frontend + DaemonSets this can cause pressure. The scheduler is expected to spread them appropriately (during a replicas=1 surge).
+- **OCIR pull failure**: the rollout stalls on ImagePullBackOff. The image is already pulled, so the impact is low.
+- **rollback**: if something goes wrong, run `kubectl rollout undo deploy/backend -n tasteby`, or git revert + apply.
+
+## 10. Test Plan
+
+- Manual: right after applying, confirm the new Pod is Running and that external health checks return 200 continuously (polling every 5 s for about 2 minutes).
+- Load-measurement follow-up: observe production load for 24 hours → confirm there is no CPU throttling or OOM (separate follow-up).
+- Automated tests: not applicable (infrastructure manifest).
+
+## 11. Risks & Alternatives Considered
+
+- **Chosen**: cpu 300m / mem 512Mi.
+- **Alternative A**: cpu 250m / mem 384Mi: leaves more node headroom, but throttling is possible at peak.
+- **Alternative B**: cpu 400m / mem 640Mi: larger safety margin, but even with 25%/25% restored the two Pods may not be able to coexist (insufficient remaining node CPU).
+- **Alternative C**: replicas=2 + topologySpreadConstraints: higher availability, but higher cost and resource usage; unsuitable within the current node capacity.
+- **Trade-off**: the chosen option is slightly tight at peak, but the 800m limit allows bursting. Re-tune after 24 hours of production observation.
+
+## 12. Open Questions
+
+- Actual CPU usage at peak (many concurrent video extractions): cannot be measured accurately because metrics-server is not installed. Re-size after installing it.
+- Whether to adopt HPA: of little value on 1 OCPU nodes. Revisit after adding nodes.
+- replicas=2 for higher availability: memory pressure is a concern on the new node shape; needs a separate decision.
diff --git a/k8s/backend-deployment.yaml b/k8s/backend-deployment.yaml
index b88c6c0..9a4c290 100644
--- a/k8s/backend-deployment.yaml
+++ b/k8s/backend-deployment.yaml
@@ -5,6 +5,11 @@ metadata:
   namespace: tasteby
 spec:
   replicas: 1
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 25%
+      maxUnavailable: 25%
   selector:
     matchLabels:
       app: backend
@@ -34,11 +39,11 @@ spec:
               readOnly: true
           resources:
             requests:
-              cpu: 500m
-              memory: 768Mi
+              cpu: 300m
+              memory: 512Mi
             limits:
-              cpu: "1"
-              memory: 1536Mi
+              cpu: 800m
+              memory: 1024Mi
           readinessProbe:
             tcpSocket:
               port: 8000
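
The strategy and request values in this change interact through simple integer arithmetic, which the following Python sketch illustrates. It is not part of the patch. The function names are hypothetical, the 330m free-CPU figure is taken from the CHANGELOG entry, and the fit check assumes every other Pod on the node stays unchanged. It relies on Kubernetes rounding a percentage `maxSurge` up and a percentage `maxUnavailable` down.

```python
def resolve_rolling_update(replicas, max_surge_pct, max_unavailable_pct):
    """Turn percentage strategy values into Pod counts.

    Kubernetes rounds maxSurge up and maxUnavailable down.
    """
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return surge, unavailable


def surge_pod_fits(free_now_m, current_request_m, candidate_request_m):
    """Would a surge Pod fit if backend requested candidate_request_m?

    free_now_m is the free CPU (millicores) observed while one backend Pod
    requesting current_request_m is running. Assumption: all other Pods on
    the node keep their current requests.
    """
    # Free CPU the node would have if the running Pod used the candidate request.
    free_m = free_now_m + current_request_m - candidate_request_m
    return free_m >= candidate_request_m


if __name__ == "__main__":
    # replicas=1 with 25%/25%, as in k8s/backend-deployment.yaml
    print(resolve_rolling_update(1, 25, 25))
    # 330m free after rollout with a 300m request (CHANGELOG figure)
    for candidate in (250, 300, 400, 500):
        print(candidate, surge_pod_fits(330, 300, candidate))
```

With replicas=1, 25%/25% resolves to one surge Pod and zero unavailable Pods, so the rollout stays zero-downtime only if the surge Pod can actually be scheduled. Under the stated assumptions, 250m and 300m fit while 400m and 500m do not, which is consistent with the concern raised for Alternative B.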