# Async Job Processing System
A production-grade, DB-backed asynchronous job processing engine built with Spring Boot 3, PostgreSQL, and ThreadPoolExecutor. Designed for reliability, horizontal scalability, and clean extensibility — without the operational overhead of Kafka or Redis.
## Architecture Overview

```
┌─────────────────┐     ┌──────────────┐     ┌──────────────────┐
│ REST API Layer  │────▶│  JobService  │────▶│  JobRepository   │
│  (Controller)   │     │  (Business)  │     │  (JPA/Postgres)  │
└─────────────────┘     └──────┬───────┘     └──────────────────┘
                               │                      ▲
                               │                      │
┌──────────────────┐     ┌─────┴────────┐     ┌───────┴──────────┐
│ HandlerRegistry  │◀────│  JobPoller   │────▶│   ThreadPool     │
│   (Strategy)     │     │ (Scheduled)  │     │    Executor      │
└──────┬───────────┘     └──────────────┘     └──────────────────┘
       │
       ├── EmailJobHandler
       └── ReportJobHandler
```
**Flow:**

- Client submits a job via `POST /jobs` → persisted as `PENDING`
- Scheduled poller fetches a batch of `PENDING` jobs every N seconds
- Each job is atomically claimed (`PENDING → PROCESSING`) using optimistic locking
- Claimed jobs are submitted to a `ThreadPoolExecutor` for async execution
- Handler executes → job transitions to `SUCCESS` or retries until `DEAD`
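The dispatch step in the flow above uses the Strategy pattern: the poller resolves a handler by job type. A minimal sketch of that shape, where the method and field names (`type()`, `handle()`, `resolve()`) are illustrative assumptions rather than the project's actual signatures:

```java
import java.util.Map;

// Illustrative Strategy-pattern dispatch; names are assumptions, not the real API.
interface JobHandler {
    String type();                // e.g. "EMAIL", "REPORT"
    void handle(String payload);  // executes the job; throwing triggers a retry
}

final class HandlerRegistry {
    private final Map<String, JobHandler> handlers;

    HandlerRegistry(Map<String, JobHandler> handlers) {
        this.handlers = handlers;
    }

    JobHandler resolve(String type) {
        JobHandler h = handlers.get(type);
        if (h == null) {
            throw new IllegalArgumentException("No handler for type: " + type);
        }
        return h;
    }
}
```

Adding a new job type then means registering one more `JobHandler` implementation; the poller and executor stay untouched.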
## Retry Strategy

Jobs use a bounded retry policy (a fixed attempt count, with no backoff delay in this version):
| Attempt | retryCount | Result on Failure |
|---|---|---|
| 1st | 0 → 1 | Status → PENDING (re-queued) |
| 2nd | 1 → 2 | Status → PENDING (re-queued) |
| 3rd | 2 → 3 | Status → DEAD (stopped) |
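The table's transitions reduce to a tiny pure function. A sketch, where `RetryPolicy` and its names are illustrative and `maxRetry` corresponds to the `job.max-retry=3` default mentioned below:

```java
// Illustrative sketch of the retry transition from the table above.
enum JobStatus { PENDING, PROCESSING, SUCCESS, DEAD }

final class RetryPolicy {
    private final int maxRetry;  // e.g. job.max-retry=3

    RetryPolicy(int maxRetry) { this.maxRetry = maxRetry; }

    /** Status after a failed attempt, given the already-incremented retryCount. */
    JobStatus onFailure(int retryCount) {
        return retryCount >= maxRetry ? JobStatus.DEAD : JobStatus.PENDING;
    }
}
```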
**Why this approach:**

- Bounded retries prevent poison-pill jobs from consuming resources indefinitely
- `DEAD` status creates a clear audit trail — operators can inspect and manually retry via `POST /jobs/{id}/retry`
- No exponential backoff in this version to keep complexity honest. In production, you'd add a `nextRetryAt` timestamp and filter `WHERE next_retry_at <= NOW()` in the poller query
- Max retry count is externalized (`job.max-retry=3`) — tunable per environment without code changes
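The production backoff mentioned above could be computed like this. This is a sketch under assumed names (`Backoff`, a hypothetical `nextRetryAt` column); the 30-second base delay in the comment is an arbitrary example, not a project default:

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: exponential next-retry time for a hypothetical nextRetryAt column.
final class Backoff {
    /** Returns now + base * 2^retryCount, i.e. 30s, 60s, 120s, ... for base = 30s. */
    static Instant nextRetryAt(Instant now, int retryCount, Duration base) {
        return now.plus(base.multipliedBy(1L << retryCount));
    }
}
```

The poller query would then add `WHERE next_retry_at <= NOW()` so re-queued jobs stay invisible until their delay elapses.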
## How Optimistic Locking Prevents Double-Processing

The `Job` entity has a `@Version` field that JPA auto-increments on every update.
**Race condition scenario:**

- Instance A and Instance B both poll and fetch `Job X` (status=`PENDING`, version=1)
- Instance A calls `markProcessing(jobX)` → `UPDATE ... SET status='PROCESSING', version=2 WHERE id=X AND version=1` → succeeds
- Instance B calls `markProcessing(jobX)` → `UPDATE ... WHERE id=X AND version=1` → 0 rows affected → `OptimisticLockException`
- Instance B catches the exception and silently skips the job
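The version check can be modeled in-memory as a compare-and-set. This illustrates the semantics of the conditional `UPDATE`; it is not the project's JPA code:

```java
import java.util.concurrent.atomic.AtomicLong;

// In-memory model of the optimistic claim: the conditional UPDATE succeeds
// only when the stored version still equals the version the instance read.
final class JobRow {
    final AtomicLong version = new AtomicLong(1);
    volatile String status = "PENDING";

    /** Mirrors: UPDATE ... SET status='PROCESSING', version=v+1 WHERE id=? AND version=v */
    boolean claim(long expectedVersion) {
        if (version.compareAndSet(expectedVersion, expectedVersion + 1)) {
            status = "PROCESSING";
            return true;   // 1 row affected: this instance owns the job
        }
        return false;      // 0 rows affected: another instance already claimed it
    }
}
```

Exactly one caller can win for a given version, which is the whole guarantee the poller relies on.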
**Why optimistic over pessimistic locking (`SELECT FOR UPDATE`):**
- No row-level DB locks held during job execution — much better throughput
- No risk of deadlocks from long-held row locks
- Works cleanly across multiple app instances sharing the same database
- The conflict window is tiny (only the status transition moment), so collisions are rare
- `SELECT FOR UPDATE` would hold row locks for the entire polling transaction, blocking other instances' claiming queries on those rows
## Horizontal Scaling

This system is designed to scale horizontally out of the box:

```
┌──────────┐   ┌──────────┐   ┌──────────┐
│ Instance1│   │ Instance2│   │ Instance3│
│  Poller  │   │  Poller  │   │  Poller  │
│  Workers │   │  Workers │   │  Workers │
└────┬─────┘   └────┬─────┘   └────┬─────┘
     │              │              │
     └──────────────┼──────────────┘
                    │
            ┌───────┴───────┐
            │  PostgreSQL   │
            │  (shared DB)  │
            └───────────────┘
```
Deploy N instances pointing at the same database. Each instance runs its own poller and executor. Optimistic locking guarantees that:
- No two instances process the same job
- No distributed coordination (ZooKeeper, etc.) is needed
- Adding instances increases throughput roughly linearly, until the shared database becomes the bottleneck
**Tuning per instance:**

- Adjust `job.executor.pool-size` based on instance CPU/memory
- Stagger `job.poller.interval` across instances to reduce DB contention (optional)
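Assuming Spring-style property binding, the tunables above might live in `application.yml` roughly like this. The three keys come from this README; the nesting and values are illustrative, not the project's actual defaults:

```yaml
# Illustrative application.yml fragment; values are examples only.
job:
  max-retry: 3
  poller:
    interval: 5000      # ms between polls; stagger across instances if desired
  executor:
    pool-size: 5        # size to instance CPU/memory
```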
## Design Tradeoffs: DB Polling vs Message Queue
| Aspect | DB Polling (this project) | Message Queue (Kafka/RabbitMQ) |
|---|---|---|
| Latency | Seconds (poll interval) | Milliseconds (push-based) |
| Throughput | Hundreds/sec | Tens of thousands/sec |
| Infra cost | Zero (reuse existing DB) | Additional broker to operate |
| Delivery guarantees | At-least-once via optimistic lock | Built-in (offsets, acks) |
| Ordering | FIFO via `ORDER BY created_at` | Partition-level ordering |
| Operational complexity | Low — just Postgres | High — broker tuning, monitoring |
| Best for | Low-to-medium volume, internal jobs | High-volume event streaming |
This project chose DB polling because:
- For most internal job systems (emails, reports, notifications), sub-second latency isn't needed
- Eliminates an entire infrastructure component and its operational burden
- The DB you already have is your queue — simpler deployment, fewer failure modes
## Replacing DB Polling with Kafka (Future Migration Path)
If throughput requirements grow beyond what DB polling can handle:

- Keep the `JobHandler` interface and `HandlerRegistry` unchanged — they're decoupled from the delivery mechanism
- Replace `JobPoller` with a `@KafkaListener` that consumes from a topic per job type
- Change the `POST /jobs` endpoint to publish to Kafka instead of (or in addition to) writing to the DB
- Keep the `jobs` table as a persistent audit log — write after successful processing
- Remove `@Version` optimistic locking — Kafka consumer groups handle partition assignment and prevent double-processing
The Strategy Pattern in the handler layer means zero changes to business logic — only the delivery mechanism changes.
## API Reference

### Submit a Job

```bash
curl -X POST http://localhost:8080/jobs \
  -H "Content-Type: application/json" \
  -d '{"type": "EMAIL", "payload": {"to": "user@example.com", "subject": "Welcome"}}'
```

Response: `201 Created`

```json
{ "jobId": "550e8400-e29b-41d4-a716-446655440000" }
```

### Check Job Status

```bash
curl http://localhost:8080/jobs/{jobId}
```

Response: `200 OK`

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "EMAIL",
  "status": "SUCCESS",
  "retryCount": 0,
  "createdAt": "2026-03-04T10:15:30Z",
  "updatedAt": "2026-03-04T10:15:35Z"
}
```

### List Dead Jobs (Paginated)

```bash
curl "http://localhost:8080/jobs/dead?page=0&size=20"
```

### Retry a Dead Job

```bash
curl -X POST http://localhost:8080/jobs/{jobId}/retry
```

### Health Check

```bash
curl http://localhost:8080/actuator/health
```

```json
{
  "status": "UP",
  "components": {
    "job": {
      "status": "UP",
      "details": {
        "pendingJobs": 3,
        "deadJobs": 0,
        "activeThreads": 2,
        "poolSize": 5
      }
    }
  }
}
```

## Running Locally
### Prerequisites
- Java 17+
- Maven 3.8+
- Docker & Docker Compose
### With Docker Compose (recommended)

```bash
# Build the JAR
./mvnw clean package -DskipTests

# Start PostgreSQL + App
docker-compose up --build
```

### Without Docker

```bash
# Start PostgreSQL separately, then:
./mvnw spring-boot:run
```

### Run Tests

```bash
./mvnw test
```

Tests use an H2 in-memory database via the `test` profile.
## Project Structure

```
com.example.asyncjob
├── controller/   # REST API layer
├── service/      # Business logic, transaction boundaries
├── worker/       # Scheduled poller — the async engine
├── handler/      # Strategy pattern — one handler per job type
├── repository/   # Spring Data JPA interfaces
├── model/        # JPA entities and enums
├── config/       # ThreadPoolExecutor, @ConfigurationProperties
├── exception/    # Global exception handling, API error format
├── health/       # Custom Actuator health indicator
└── dto/          # Request/response records
```
## Tech Stack
| Component | Technology |
|---|---|
| Framework | Spring Boot 3.2 |
| Language | Java 17 |
| Database | PostgreSQL 16 |
| ORM | Hibernate 6 / JPA |
| Concurrency | ThreadPoolExecutor |
| Monitoring | Spring Boot Actuator |
| Logging | SLF4J + Logback |
| Build | Maven |
| Container | Docker + Docker Compose |
| Test DB | H2 (in-memory) |