# Implementation Plan: Unified Workflow Engine — Production Hardening & Integrated Context

**Branch**: `003-unified-workflow-engine` | **Date**: 2026-05-02 | **Spec**: [spec.md](./spec.md)
**Input**: Feature specification from `specs/003-unified-workflow-engine/spec.md`

---

## Summary

The Workflow Engine backend infrastructure is substantially implemented (service, entities, guards, DSL, Redlock, Prometheus metrics). This plan closes the remaining production-hardening gaps from ADR-001 v1.1 (optimistic lock, user UUID in history, CASL-mapped DSL roles, per-transition metrics, DSL Redis cache, DLQ + n8n webhook). It also completes ADR-021 (step-specific attachment data-wiring in all 4 modules, file preview modal, Admin DSL editor UI).

Clarification decisions from `spec.md`:

- **Q1**: DSL `require.role` → CASL ability check (FR-002a)
- **Q2**: Observability = structured log + counter + histogram (FR-022, FR-023)
- **Q3**: File rollback on DB failure = move back to temp, 24h TTL (FR-019)
- **Q4**: Admin DSL editor UI is in scope (FR-024, FR-025)
- **Q5**: All 4 modules need banner gap-filling (FR-011)

---

## Technical Context

**Language/Version**: TypeScript 5.4, Node.js 20 LTS
**Primary Dependencies**: NestJS 10, TypeORM 0.3, BullMQ 5, `@willsoto/nestjs-prometheus`, `json-logic-js`, `redlock`, `ioredis`
**Frontend**: Next.js 14 (App Router), TanStack Query v5, React Hook Form + Zod, shadcn/ui
**Storage**: MariaDB 10.11, Redis 7, StorageService (Two-Phase Upload per ADR-016)
**Testing**: Jest + `@nestjs/testing` (backend), Vitest (frontend)
**Target Platform**: QNAP NAS Docker Compose (backend), Next.js SSR (frontend)
**Performance Goals**: Transition P95 < 1s (no upload); upload+transition P95 < 5s; cache invalidation < 1s across all instances
**Constraints**: ADR-009 (no TypeORM migrations), ADR-019 (UUID strings, no parseInt), ADR-016 (Two-Phase Upload), ADR-008 (BullMQ async)
**Scale/Scope**: 4 document modules × ~50 active workflows concurrently; up to 20 history records per instance

---

## Constitution Check

_GATE: Must pass before Phase 0. Re-checked after Phase 1 design._

| Gate | Rule | Status | Notes |
|------|------|--------|-------|
| ADR-019 UUID | No `parseInt` on UUIDs; expose `publicId` strings only | ✅ PASS | `WorkflowInstance.id` and `WorkflowHistory.id` are UUID PKs (native CHAR(36)); the `action_by_user_uuid` addition follows the same pattern |
| ADR-009 Schema | No TypeORM migrations; edit SQL directly | ✅ PASS | Two new delta files planned (delta-09, delta-10) |
| ADR-016 Security | Two-Phase upload; ClamAV; whitelist | ✅ PASS | Already implemented in `processTransition()`; file preview uses the existing attachment endpoint |
| ADR-008 BullMQ | Async notifications; no inline dispatch | ✅ PASS | `WorkflowEventService` dispatches to the `workflow-events` queue; the DLQ is the gap |
| ADR-007 Errors | Layered exception hierarchy | ✅ PASS | `WorkflowException`, `ConflictException`, `ServiceUnavailableException` already in use |
| ADR-002 Numbering | Redlock for document numbering | ✅ N/A | Workflow engine does not generate document numbers |
| ADR-018/020 AI | No AI direct DB access | ✅ N/A | No AI integration in this feature |
| FR-002 Optimistic Lock | `version_no` column on `workflow_instances` | ⚠️ GAP | Column missing; delta-09 required |
| FR-003 User UUID | `action_by_user_uuid` on `workflow_histories` | ⚠️ GAP | Column missing; delta-10 required |

**Post-gate verdict**: PASS with two schema deltas required before implementation begins.
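
The FR-002 gap in the table above reduces to a compare-and-swap on `version_no`. The sketch below models that check in memory, so the conflict semantics can be unit-tested without MariaDB. `InMemoryInstanceTable`, `transition()`, and `VersionConflictError` are hypothetical names. In the real service, `compareAndSwap()` corresponds to the `UPDATE workflow_instances SET version_no = version_no + 1 WHERE id = :id AND version_no = :expected` statement, and zero affected rows would map to `ConflictException` (HTTP 409).

```typescript
// Hypothetical in-memory model of the FR-002 optimistic lock (not the service code).
// compareAndSwap() stands in for:
//   UPDATE workflow_instances SET version_no = version_no + 1
//   WHERE id = :id AND version_no = :expected
// where "0 affected rows" signals a concurrent modification.

interface InstanceRow {
  id: string;        // UUID string; never parseInt (ADR-019)
  state: string;
  versionNo: number; // mirrors version_no, which starts at DEFAULT 1 (delta-09)
}

class VersionConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'VersionConflictError';
  }
}

class InMemoryInstanceTable {
  private readonly rows = new Map<string, InstanceRow>();

  insert(id: string, state: string): void {
    this.rows.set(id, { id, state, versionNo: 1 });
  }

  get(id: string): InstanceRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined; // copy, so callers cannot mutate the "table"
  }

  // Returns the number of affected rows, like the SQL UPDATE would.
  compareAndSwap(id: string, expected: number, nextState: string): number {
    const row = this.rows.get(id);
    if (!row || row.versionNo !== expected) return 0;
    row.state = nextState;
    row.versionNo += 1;
    return 1;
  }
}

// Applies a transition only if the client's version is still current.
function transition(
  table: InMemoryInstanceTable,
  id: string,
  clientVersionNo: number,
  nextState: string,
): InstanceRow {
  if (table.compareAndSwap(id, clientVersionNo, nextState) === 0) {
    const current = table.get(id)?.versionNo;
    // The service would throw ConflictException (HTTP 409) here.
    throw new VersionConflictError(
      `instance ${id}: expected version_no ${clientVersionNo}, found ${current ?? 'none'}`,
    );
  }
  return table.get(id)!;
}

// Usage: two clients both loaded the instance at version_no = 1.
const table = new InMemoryInstanceTable();
const instanceId = '3f1c2a9e-0b7d-4c55-9a1e-2d4b6f8e1a00';
table.insert(instanceId, 'DRAFT');

const first = transition(table, instanceId, 1, 'SUBMITTED'); // succeeds; version_no becomes 2
try {
  transition(table, instanceId, 1, 'APPROVED'); // stale version: rejected
} catch (err) {
  console.log((err as Error).message);
}
console.log(first.state, first.versionNo);
```

The second client's write is rejected rather than silently overwriting the first. That is the guarantee the Redlock-only design lacks once a lock expires mid-transition.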

---

## Project Structure

### Documentation (this feature)

```text
specs/003-unified-workflow-engine/
├── plan.md              ← This file
├── research.md          ← Phase 0 output
├── data-model.md        ← Phase 1 output
├── quickstart.md        ← Phase 1 output
└── contracts/           ← Phase 1 output
    ├── workflow-transition.yaml
    └── workflow-definitions.yaml
```

### Source Code Layout

```text
backend/src/modules/workflow-engine/
├── entities/
│   ├── workflow-instance.entity.ts      ← ADD versionNo column
│   └── workflow-history.entity.ts       ← ADD actionByUserUuid column
├── guards/
│   └── workflow-transition.guard.ts     ← ADD DSL require.role → CASL mapping (FR-002a)
├── dto/
│   └── workflow-history-item.dto.ts     ← ADD actorUuid field
├── workflow-engine.service.ts           ← ADD version_no check, structured log, metrics, cache invalidation
├── workflow-event.service.ts            ← ADD DLQ processor + n8n webhook (FR-005/006)
└── workflow-engine.module.ts            ← Register new metrics providers

specs/03-Data-and-Storage/deltas/
├── 09-add-version-no-to-workflow-instances.sql              ← NEW
└── 10-add-action-by-user-uuid-to-workflow-histories.sql     ← NEW

frontend/components/workflow/
├── integrated-banner.tsx       ← GAP-FILL: step-attachment upload zone
├── workflow-lifecycle.tsx      ← GAP-FILL: history items with attachment list
└── file-preview-modal.tsx      ← NEW component

frontend/app/(admin)/admin/workflows/
└── definitions/
    ├── page.tsx                ← NEW: DSL list + activate/deactivate
    └── [id]/
        └── page.tsx            ← NEW: DSL JSON editor + inline validation

frontend/app/(admin)/admin/doc-control/
├── rfa/[uuid]/page.tsx               ← GAP-FILL: availableActions, step-attach
├── transmittals/[uuid]/page.tsx      ← GAP-FILL: step-attach upload zone
├── circulation/[uuid]/page.tsx       ← GAP-FILL: step-attach upload zone
└── correspondence/[uuid]/page.tsx    ← GAP-FILL + new IntegratedBanner wiring
```

---

## Implementation Phases

### Phase B1: Schema Deltas (prerequisite)

Apply before any code changes.

| Delta | File | Change |
|-------|------|--------|
| 09 | `09-add-version-no-to-workflow-instances.sql` | `ALTER TABLE workflow_instances ADD COLUMN version_no INT NOT NULL DEFAULT 1` |
| 10 | `10-add-action-by-user-uuid-to-workflow-histories.sql` | `ALTER TABLE workflow_histories ADD COLUMN action_by_user_uuid VARCHAR(36) NULL` |

### Phase B2: Entity & DTO Updates

| Task | File | Change |
|------|------|--------|
| B2-1 | `workflow-instance.entity.ts` | Add `versionNo: number` mapped with `@VersionColumn({ name: 'version_no' })` (TypeORM's version decorator; there is no `@Version()`, and `@VersionColumn()` replaces `@Column()`) |
| B2-2 | `workflow-history.entity.ts` | Add `actionByUserUuid?: string` with `@Column({ name: 'action_by_user_uuid', type: 'varchar', length: 36, nullable: true })` to match the nullable delta-10 column |
| B2-3 | `workflow-history-item.dto.ts` | Add an `actorUuid` string field, nullable for history rows written before delta-10 (exposed in API per ADR-019) |

`@VersionColumn()` also increments `version_no` on every `save()` of the entity. Do not combine a `save()` of the instance with the explicit increment in Phase B3, or the version will advance by 2 per transition.

### Phase B3: Optimistic Lock in `processTransition()` (FR-002)

In `workflow-engine.service.ts`:

1. Accept an optional `clientVersionNo?: number` parameter in `processTransition()`.
2. If provided, compare it against `instance.versionNo` BEFORE Redlock acquisition, and throw `ConflictException` (HTTP 409) on mismatch. This is only a fast-fail pre-check; step 3 is the authoritative guard.
3. Inside the DB transaction, before commit, increment the version with `UPDATE workflow_instances SET version_no = version_no + 1 WHERE id = :id AND version_no = :expected`. If no row is affected, roll back and throw `ConflictException` (HTTP 409). Running this after commit would leave the state change committed even when the version check fails.
4. No separate pessimistic lock change is needed. Keep Redlock and the version check together as defense-in-depth.

### Phase B4: CASL Role Mapping in Guard (FR-002a)

In `workflow-transition.guard.ts`:

1. After the Level 1 (Superadmin) check, extract DSL `require.role` from the current step config.
2. Map each DSL role string to a CASL ability string via the `DSL_ROLE_TO_CASL` config map.
3. Check `userPermissions.includes(mappedAbility)`; any match passes.
4. Fall through to the existing Level 3 (`assignedUserId`) check for the `"AssignedHandler"` role.

```typescript
// Maps DSL `require.role` values to CASL ability strings.
const DSL_ROLE_TO_CASL: Record<string, string> = {
  'Superadmin': 'system.manage_all',
  'OrgAdmin': 'organization.manage_users',
  'ContractMember': 'contract.view',
  'AssignedHandler': '__assigned__', // sentinel: resolved by the existing Level 3 (assignedUserId) check
};
```

### Phase B5: Structured Observability (FR-022, FR-023)

In `workflow-engine.service.ts`:

1. Inject two new metrics via `@InjectMetric()`:
   - `workflow_transitions_total` (Counter; labels `workflow_code`, `action`, `outcome`)
   - `workflow_transition_duration_ms` (Histogram; label `workflow_code`)
2. Time every `processTransition()` call and record it with `observe()` in a `finally` block, so failed transitions are measured too. prom-client's `startTimer()` records seconds, so a `_ms` histogram needs an explicit millisecond measurement (e.g. a `performance.now()` delta) passed to `observe()`.
3. Emit a structured log on every outcome:

   ```typescript
   this.logger.log(JSON.stringify({
     instanceId, action, fromState, toState, userUuid, durationMs, outcome, workflowCode,
   }));
   ```

4. Register the metric providers (`makeCounterProvider()`, `makeHistogramProvider()`) in `workflow-engine.module.ts`.

### Phase B6: DSL Redis Cache Invalidation (FR-007)

In `workflow-engine.service.ts`:

1. In `createDefinition()`: after save, call `` cacheManager.set(`wf:def:${code}:${version}`, entity, 3_600_000) ``. The key must be a template literal, because a single-quoted string does not interpolate `${…}`. With cache-manager v5 the TTL is in milliseconds, so `3600` would expire after 3.6 s; `3_600_000` gives 1 h.
2. In `update()`: call `` cacheManager.del(`wf:def:${code}:${oldVersion}`) `` after the save commits. This prevents a concurrent read from repopulating the stale entry between the delete and the write.
3. In `getDefinitionById()` / cached lookup: read through with `cacheManager.get()` and fall back to the DB on a miss.
4. On `is_active` toggle: invalidate ALL `wf:def:{code}:*` keys by iterating with `SCAN` (e.g. ioredis `scanStream({ match })`) and deleting the matches. Avoid `redis.keys()`, because `KEYS` blocks Redis while it walks the whole keyspace.

### Phase B7: BullMQ DLQ + n8n Webhook (FR-005, FR-006)

In `workflow-event.service.ts`:

1. Add the `workflow-events-failed` queue registration.
2. Add an `@OnWorkerEvent('failed')` handler in the processor class.
3. When the final attempt fails (`job.attemptsMade >= (job.opts.attempts ?? 1)`), POST the job payload to `process.env.N8N_WEBHOOK_URL` (never hardcoded).
4. Verify that the existing `workflow-events` worker runs with `concurrency: 5`. Also verify that its jobs use `attempts: 3, backoff: { type: 'exponential', delay: 500 }`; `concurrency` is a worker option, while `attempts` and `backoff` are job options.

### Phase B8: File Rollback on Transaction Failure (FR-019)

In `workflow-engine.service.ts` `processTransition()`:

1. If the transaction fails after the file-linkage step (including when `queryRunner.commitTransaction()` throws), do the following in the `catch` block:
   - Call `queryRunner.rollbackTransaction()`.
   - Call `storageService.moveToTemp(attachmentPublicIds)` to return the files to temp storage.
   - Log the rollback with the attachment IDs for audit.
2. The existing `FileCleanupService` cron handles the 24h TTL on temp files.

### Phase F1: File Preview Modal (FR-020)

New component: `frontend/components/workflow/file-preview-modal.tsx`

- Props: `attachment: WorkflowAttachmentSummary | null`, `onClose: () => void`
- Renders PDF via `