Real-Time Search Ranking System
Clickstream-Driven Personalization with Distributed Event Pipeline and Caching Layer
PROJECT OVERVIEW
A real-time search ranking system that tracks user click behavior, streams events through Kafka, and re-ranks search results dynamically using a scoring formula that combines click count with recency decay. Results are cached in Redis for low-latency delivery and persisted in PostgreSQL for ranking history.
PROBLEM
Search ranking based purely on click count favors popular but stale content indefinitely. Updating rankings synchronously on every click also puts a database write on the request path of each click, which becomes a bottleneck at scale. This system decouples click ingestion from ranking computation using an async event pipeline, and applies recency decay to keep results fresh.
CORE GOALS
- Decouple click tracking from ranking computation using Kafka
- Apply a recency-weighted scoring formula to prevent stale content domination
- Cache hot query results in Redis for low-latency delivery
- Persist ranking history in PostgreSQL with atomic upsert operations
- Containerize the full infrastructure stack for consistent deployment
EVENT PIPELINE
- Next.js frontend captures click events and sends them to a REST API
- API route produces click events to a Kafka topic (click-events) asynchronously
- Kafka consumer reads events and updates PostgreSQL rankings without blocking the request cycle
- Redis cache is updated after every ranking change with the latest scored result set
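The producing and consuming steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the ClickEvent fields, the EventProducer interface, and the publishClick and parseClick helpers are hypothetical names. EventProducer is modeled on the send({ topic, messages }) call of a KafkaJS producer so that a real producer could plausibly be passed in; only the topic name click-events comes from the description above.

```typescript
// Hypothetical shape of a click event; the field names are assumptions.
interface ClickEvent {
  query: string;
  resultId: string;
  clickedAt: string; // ISO-8601 timestamp
}

// Minimal producer contract, modeled on KafkaJS's producer.send({ topic, messages }).
interface EventProducer {
  send(record: {
    topic: string;
    messages: { key: string; value: string }[];
  }): Promise<unknown>;
}

const CLICK_TOPIC = "click-events";

// Validates one click and publishes it as a JSON message.
// Keying by query is an assumption: with Kafka's default partitioning it
// keeps all clicks for the same query on one partition, in order.
async function publishClick(producer: EventProducer, event: ClickEvent): Promise<void> {
  if (event.query.trim() === "" || event.resultId.trim() === "") {
    throw new Error("click event needs a query and a resultId");
  }
  await producer.send({
    topic: CLICK_TOPIC,
    messages: [{ key: event.query, value: JSON.stringify(event) }],
  });
}

// Consumer side: turns a raw message value back into a ClickEvent,
// rejecting payloads that do not have the expected string fields.
function parseClick(value: string): ClickEvent {
  const parsed = JSON.parse(value) as Partial<ClickEvent>;
  if (
    typeof parsed.query !== "string" ||
    typeof parsed.resultId !== "string" ||
    typeof parsed.clickedAt !== "string"
  ) {
    throw new Error("malformed click event");
  }
  return { query: parsed.query, resultId: parsed.resultId, clickedAt: parsed.clickedAt };
}
```

Depending on the producer behind an interface like this keeps the API route testable with an in-memory fake, and the same JSON payload format is shared by both ends of the topic.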
RANKING LOGIC
- Score combines click count and recency: score = click_count * (1 / (days_since_last_click + 1))
- Of two results with equal click counts, the one clicked more recently scores higher
- Rankings stored using PostgreSQL upsert with conflict resolution on (result_id, query)
- Score recalculated atomically on every click event
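The scoring formula and the upsert above might look roughly like this. It is a sketch under assumptions rather than the project's implementation: elapsed time is truncated to whole days, and the rankings table, its column names, and the rankingScore and UPSERT_RANKING_SQL identifiers are hypothetical. The description does not say how the score itself is persisted alongside the counters, so the statement below only shows the conflict handling on (result_id, query).

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// score = click_count * (1 / (days_since_last_click + 1))
// Assumptions: elapsed time is truncated to whole days, and a last-click
// timestamp in the future (clock skew) is treated as zero days ago.
function rankingScore(clickCount: number, lastClickAt: Date, now: Date): number {
  const elapsedMs = Math.max(0, now.getTime() - lastClickAt.getTime());
  const daysSinceLastClick = Math.floor(elapsedMs / MS_PER_DAY);
  return clickCount * (1 / (daysSinceLastClick + 1));
}

// Hypothetical table and column names. The first click for a
// (result_id, query) pair inserts a row; later clicks hit the conflict
// branch and increment the counter in the same single statement.
const UPSERT_RANKING_SQL = `
  INSERT INTO rankings (result_id, query, click_count, last_clicked_at)
  VALUES ($1, $2, 1, $3)
  ON CONFLICT (result_id, query) DO UPDATE
  SET click_count = rankings.click_count + 1,
      last_clicked_at = EXCLUDED.last_clicked_at
  RETURNING click_count, last_clicked_at
`;
```

Under these assumptions, a result with 10 clicks whose last click was today scores 10, and a result with 40 clicks whose last click was three days ago also scores 40 / 4 = 10, which shows how the decay lets fresh results catch up with older, more heavily clicked ones.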
CACHING LAYER
- Redis sits in front of PostgreSQL for all search result reads
- The query string is the cache key; the value is the full ranked result list serialized as JSON
- Cache updated automatically by the consumer after every ranking change
- On a cache miss, the read falls back to PostgreSQL and returns the result from there
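The read path and the consumer-side cache refresh described above can be sketched as follows. The names here are assumptions for illustration: RankedResult, CacheClient, RankingLoader, readRankedResults, and refreshCache are hypothetical. CacheClient is modeled on the get(key) and set(key, value) calls of an ioredis client, and RankingLoader stands in for whatever function runs the PostgreSQL query.

```typescript
// Hypothetical shape of one cached entry.
interface RankedResult {
  resultId: string;
  score: number;
}

// Subset of a Redis client, modeled on ioredis's get(key) and set(key, value).
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

// Stand-in for the function that reads ranked results from PostgreSQL.
type RankingLoader = (query: string) => Promise<RankedResult[]>;

// Read path: serve from the cache when the query string is present,
// otherwise fall back to the database and return its result.
async function readRankedResults(
  cache: CacheClient,
  loadFromDb: RankingLoader,
  query: string,
): Promise<RankedResult[]> {
  const cached = await cache.get(query);
  if (cached !== null) {
    return JSON.parse(cached) as RankedResult[];
  }
  return loadFromDb(query);
}

// Consumer side: after a ranking change, store the latest ranked list
// as JSON under the query string.
async function refreshCache(
  cache: CacheClient,
  loadFromDb: RankingLoader,
  query: string,
): Promise<void> {
  const ranked = await loadFromDb(query);
  await cache.set(query, JSON.stringify(ranked));
}
```

In this sketch only the consumer writes to the cache, matching the list above; whether the read path should also repopulate the cache on a miss is a separate design choice that the description leaves open.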
TECHNICAL ARCHITECTURE
- Next.js 15, TypeScript
- Apache Kafka (KafkaJS)
- Redis (ioredis)
- PostgreSQL 15
- Docker Compose
- Node.js background worker (tsx)