Problem
Ingesting high-volume event streams into a relational database is fragile when a single bad record can fail an entire batch. Naive retry logic without backoff amplifies load and causes thundering-herd problems.
Solution
Built a Kafka consumer pipeline with configurable batch sizes, transactional batch inserts, dead-letter queue routing for failed records, and retries with exponential backoff.
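The backoff strategy can be sketched as follows. This is an illustrative helper, not the actual Ingesta code; the function name and defaults are assumptions. Full jitter is used so that many consumers retrying at once do not synchronize into a thundering herd.

```python
import random
import time


def retry_with_backoff(fn, max_retries=5, base_delay=0.1, max_delay=30.0):
    """Retry fn with exponential backoff plus full jitter.

    Hypothetical helper illustrating the retry strategy; names and
    defaults are assumptions, not taken from the Ingesta codebase.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: let the caller route to the DLQ
            # Exponential delay capped at max_delay, with full jitter
            # so concurrent consumers spread their retries out.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

On the final failed attempt the exception propagates, which is the point where a record would be handed to the dead-letter path.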
Design
Ingesta is a Python service that consumes from Kafka topics and writes to PostgreSQL in batches.
Batch Strategy
Each consumer thread accumulates records until a configurable BATCH_SIZE is reached or BATCH_TIMEOUT_MS elapses, whichever comes first. Each batch is then written with a single multi-row INSERT statement.
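The accumulation logic and the multi-row statement can be sketched like this. The class and function names are illustrative assumptions (only BATCH_SIZE and BATCH_TIMEOUT_MS come from this document), and the `%s` placeholders assume a psycopg2-style driver with parameters passed separately.

```python
import time


class Batcher:
    """Accumulate records until batch_size is hit or batch_timeout_ms
    elapses since the first record. Illustrative sketch, not the
    actual Ingesta implementation."""

    def __init__(self, batch_size=500, batch_timeout_ms=1000):
        self.batch_size = batch_size
        self.batch_timeout_ms = batch_timeout_ms
        self._records = []
        self._started = None  # monotonic time of first record in batch

    def add(self, record):
        if self._started is None:
            self._started = time.monotonic()
        self._records.append(record)

    def ready(self):
        """True when either the size or the timeout threshold is met."""
        if not self._records:
            return False
        elapsed_ms = (time.monotonic() - self._started) * 1000
        return (len(self._records) >= self.batch_size
                or elapsed_ms >= self.batch_timeout_ms)

    def drain(self):
        """Return the pending records and reset the batch."""
        records, self._records, self._started = self._records, [], None
        return records


def build_multirow_insert(table, columns, rows):
    """Render one multi-row INSERT with %s placeholders; the caller
    passes the flattened params to the DB driver."""
    placeholders = ", ".join(
        "(" + ", ".join(["%s"] * len(columns)) + ")" for _ in rows
    )
    params = [value for row in rows for value in row]
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {placeholders}", params
```

A single multi-row INSERT amortizes per-statement overhead and lets the whole batch commit or roll back atomically.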
Dead-Letter Queue
Records that fail schema validation, or that fail DB insertion after MAX_RETRIES attempts, are routed to a DLQ topic with an error envelope containing the original payload, error type, and timestamp.
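An envelope along those lines might be built as below. The field names are illustrative assumptions; the document only specifies that the envelope carries the original payload, the error type, and a timestamp.

```python
import json
from datetime import datetime, timezone


def build_dlq_envelope(payload: bytes, error: Exception) -> bytes:
    """Wrap a failed record in an error envelope for the DLQ topic.

    Field names are assumptions; the actual envelope schema may differ.
    """
    envelope = {
        # Keep the original bytes recoverable even if they are not valid UTF-8.
        "original_payload": payload.decode("utf-8", errors="replace"),
        "error_type": type(error).__name__,
        "error_message": str(error),
        "failed_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(envelope).encode("utf-8")
```

Keeping the original payload in the envelope means a DLQ consumer can replay records once the underlying schema or DB issue is fixed.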
Offset Management
Kafka offsets are committed manually only after a batch has been successfully inserted, giving at-least-once delivery; a crash between insert and commit can redeliver already-inserted records, so inserts should tolerate duplicates.
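The ordering guarantee reduces to a small invariant: commit only runs if the insert succeeded. A minimal sketch, assuming a consumer object exposing `commit()` (as `confluent_kafka.Consumer` does); the function and parameter names are illustrative, not the actual Ingesta code.

```python
def insert_then_commit(batch, insert_fn, consumer):
    """Insert a batch, then commit Kafka offsets.

    `insert_fn` stands in for the transactional multi-row INSERT and
    `consumer` for a Kafka consumer with a commit() method. If the
    insert raises, offsets stay uncommitted and the records will be
    redelivered, which is what makes delivery at-least-once.
    """
    insert_fn(batch)   # must complete (and commit its DB transaction) first
    consumer.commit()  # only then advance the consumer's position
```

Auto-commit must stay disabled for this to hold; otherwise the client library could advance offsets before the batch is durable.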
Deployment
Packaged as a Docker container configured via environment variables. A docker-compose.yml is provided for local development with Kafka and PostgreSQL.
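A local setup along these lines might look as follows. This is a sketch under assumed image names and environment-variable names (only BATCH_SIZE, BATCH_TIMEOUT_MS, and MAX_RETRIES appear in this document); the repository's actual docker-compose.yml may differ.

```yaml
# Illustrative sketch, not the repository's docker-compose.yml.
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: ingesta        # assumed credentials
      POSTGRES_PASSWORD: ingesta
      POSTGRES_DB: events
  kafka:
    image: apache/kafka:3.7.0       # single-node KRaft defaults
  ingesta:
    build: .
    depends_on: [postgres, kafka]
    environment:
      KAFKA_BOOTSTRAP_SERVERS: kafka:9092   # assumed variable name
      DATABASE_URL: postgresql://ingesta:ingesta@postgres:5432/events  # assumed
      BATCH_SIZE: "500"             # knobs named in this document
      BATCH_TIMEOUT_MS: "1000"
      MAX_RETRIES: "5"
```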