Files
lcbp3/specs/05-decisions/ADR-006-redis-caching-strategy.md
admin 047e1b88ce
Some checks failed
Spec Validation / validate-markdown (push) Has been cancelled
Spec Validation / validate-diagrams (push) Has been cancelled
Spec Validation / check-todos (push) Has been cancelled
Main: revise specs to 1.5.0 (completed)
2025-12-01 01:28:32 +07:00

9.3 KiB

ADR-006: Redis Usage and Caching Strategy

Status: Accepted Date: 2025-11-30 Decision Makers: Development Team, System Architect Related Documents:


Context and Problem Statement

LCBP3-DMS ต้องการ High Performance ในการ:

  • Check Permissions (ทุก Request)
  • Document Numbering (Concurrent Safe)
  • Master Data Access (ถูกเรียกบ่อยมาก)
  • Session Management
  • Background Job Queue

Challenges:

  • Database queries ช้า (แม้มี indexing)
  • Concurrent access ต้องมี Locking mechanism
  • Permission checking ต้องเร็ว (< 10ms)
  • Master data แทบไม่เปลี่ยน แต่ถูก query บ่อย

Decision Drivers

  • Performance: Response time < 200ms (p90)
  • Scalability: รองรับ 100+ concurrent users
  • Consistency: Data consistency with database
  • Reliability: Cache must not cause data loss
  • Cost-Effectiveness: ใช้ Resource น้อยที่สุด

Considered Options

Option 1: No Caching (Database Only)

Pros:

  • Simple, no cache invalidation
  • Always consistent

Cons:

  • Slow permission checks (JOIN tables)
  • High DB load
  • No distributed locking

Option 2: Application-Level In-Memory Cache

Pros:

  • Very fast (local memory)
  • No external dependency

Cons:

  • Not shared across instances
  • No distributed locking
  • Cache invalidation issues

Option 3: Redis as Distributed Cache + Lock (Selected)

Pros:

  • Fast: In-memory, < 1ms access
  • Distributed: Shared across instances
  • Locking: Redis locks for concurrency
  • Pub/Sub: Cache invalidation broadcasting
  • Queue: BullMQ for background jobs

Cons:

  • External dependency
  • Requires Redis cluster for HA

Decision Outcome

Chosen Option: Redis as Distributed Cache + Lock Provider


Redis Usage Patterns

1. Distributed Locking (Redlock)

Use Cases:

  • Document Number Generation
  • Critical Sections

Implementation:

const lock = await redlock.acquire([lockKey], 3000); // 3sec TTL
try {
  // Critical section
} finally {
  await lock.release();
}

Configuration:

  • TTL: 2-5 seconds
  • Retry: Exponential backoff, max 3 retries

2. Permission Caching

Cache Structure:

// Key: user:{user_id}:permissions
// Value: JSON array of CASL rules
// TTL: 30 minutes
await redis.set(
  `user:${userId}:permissions`,
  JSON.stringify(abilityRules),
  'EX',
  1800
);

Invalidation Strategy:

  • Role changed → Invalidate all users with that role
  • User assignment changed → Invalidate that user
  • Permission modified → Invalidate all affected roles

3. Master Data Caching

Cached Data:

  • Organizations (TTL: 1 hour)
  • Projects (TTL: 1 hour)
  • Correspondence Types (TTL: 24 hours)
  • RFA Status Codes (TTL: 24 hours)
  • Roles & Permissions (TTL: 30 minutes)

Cache Pattern:

async getOrganizations(): Promise<Organization[]> {
  const cacheKey = 'master:organizations';
  let cached = await redis.get(cacheKey);

  if (!cached) {
    const organizations = await this.orgRepo.find({ where: { is_active: true } });
    await redis.set(cacheKey, JSON.stringify(organizations), 'EX', 3600);
    return organizations;
  }

  return JSON.parse(cached);
}

Invalidation:

  • On CREATE/UPDATE/DELETE → Invalidate immediately
  • Publish event to Redis Pub/Sub for multi-instance sync

4. Session Management

Structure:

// Key: session:{session_id}
// Value: User session data
// TTL: 8 hours
interface SessionData {
  user_id: number;
  username: string;
  organization_id: number;
  last_activity: Date;
}

Refresh Strategy:

  • Update last_activity on every request
  • Extend TTL if activity within last 1 hour

5. Rate Limiting

Implementation:

const key = `rate_limit:${userId}:${endpoint}`;
const current = await redis.incr(key);
if (current === 1) {
  await redis.expire(key, 3600); // 1 hour window
}
if (current > limit) {
  throw new TooManyRequestsException();
}

Limits:

  • File Upload: 50 req/hour per user
  • Search: 500 req/hour per user
  • Anonymous: 100 req/hour per IP

6. Background Job Queue (BullMQ)

Queues:

  1. Email Queue: Send email notifications
  2. Notification Queue: LINE Notify
  3. Indexing Queue: Elasticsearch indexing
  4. Cleanup Queue: Delete temp files
  5. Report Queue: Generate PDF reports

Configuration:

const emailQueue = new Queue('email', {
  connection: redisConnection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000,
    },
    removeOnComplete: 100, // Keep last 100
    removeOnFail: 500,
  },
});

Cache Invalidation Strategy

1. Time-Based Expiration (TTL)

Data Type TTL Rationale
Permissions 30 minutes Balance freshness/performance
Master Data 1 hour Rarely changes
Session 8 hours Match JWT expiration
Search Results 15 minutes Data changes frequently

2. Event-Based Invalidation

Pattern:

@Injectable()
export class CacheInvalidationService {
  async invalidateUserPermissions(userId: number): Promise<void> {
    await this.redis.del(`user:${userId}:permissions`);

    // Broadcast to other instances
    await this.redis.publish(
      'cache:invalidate',
      JSON.stringify({
        pattern: 'user:permissions',
        userId,
      })
    );
  }

  async invalidateMasterData(entity: string): Promise<void> {
    await this.redis.del(`master:${entity}`);
    await this.redis.publish(
      'cache:invalidate',
      JSON.stringify({
        pattern: 'master',
        entity,
      })
    );
  }
}

3. Write-Through Cache

For Master Data:

async updateOrganization(id: number, dto: UpdateOrgDto): Promise<Organization> {
  const org = await this.orgRepo.save({ id, ...dto });

  // Invalidate cache immediately
  await this.cache.invalidateMasterData('organizations');

  return org;
}

Redis Configuration

Production Setup

# docker-compose.yml
redis:
  image: redis:7-alpine
  command: >
    redis-server
    --appendonly yes
    --appendfsync everysec
    --maxmemory 2gb
    --maxmemory-policy allkeys-lru
  volumes:
    - redis-data:/data
  ports:
    - '6379:6379'
  healthcheck:
    test: ['CMD', 'redis-cli', 'ping']
    interval: 10s
    timeout: 3s
    retries: 3

Key Settings:

  • appendonly yes: AOF persistence
  • appendfsync everysec: Write every second (balance performance/durability)
  • maxmemory 2gb: Limit memory usage
  • maxmemory-policy allkeys-lru: Evict least recently used keys

High Availability Considerations

Future Improvements:

  1. Redis Sentinel: Auto-failover
  2. Redis Cluster: Horizontal scaling
  3. Read Replicas: Offload read queries

Current: Single Redis instance (sufficient for MVP)


Monitoring

Key Metrics

@Injectable()
export class RedisMonitoringService {
  @Cron('*/5 * * * *') // Every 5 minutes
  async captureMetrics(): Promise<void> {
    const info = await this.redis.info();

    // Parse and log metrics
    metrics.record({
      'redis.memory.used': parseMemoryUsed(info),
      'redis.memory.peak': parseMemoryPeak(info),
      'redis.keyspace.hits': parseHits(info),
      'redis.keyspace.misses': parseMisses(info),
      'redis.connections.active': parseConnections(info),
    });
  }
}

Alert Thresholds:

  • Memory usage > 80%
  • Hit rate < 70%
  • Connections > 90% of max

Consequences

Positive

  1. Fast Permission Check: < 5ms (vs 50ms from DB)
  2. Reduced DB Load: 70% reduction in queries
  3. Distributed Locking: No race conditions
  4. Queue Management: Background jobs reliable
  5. Scalability: รองรับ Multi-instance deployment

Negative

  1. Dependency: Redis ต้อง Available เสมอ
  2. Memory Limit: ต้อง Monitor และ Evict
  3. Complexity: Cache invalidation ซับซ้อน
  4. Data Loss Risk: ถ้า Redis crash (with AOF mitigates this)

Mit Strategies

  • Dependency: Health checks + Fallback to DB
  • Memory: Set max memory + LRU eviction policy
  • Complexity: Centralize invalidation logic
  • Data Loss: Enable AOF persistence

Compliance

เป็นไปตาม:



References