
avibhawnani/asyncJobEngine

Async Job Processing System

A production-grade, DB-backed asynchronous job processing engine built with Spring Boot 3, PostgreSQL, and ThreadPoolExecutor. Designed for reliability, horizontal scalability, and clean extensibility — without the operational overhead of Kafka or Redis.


Architecture Overview

┌─────────────────┐     ┌──────────────┐     ┌──────────────────┐
│  REST API Layer │────▶│  JobService  │────▶│  JobRepository   │
│  (Controller)   │     │  (Business)  │     │  (JPA/Postgres)  │
└─────────────────┘     └──────┬───────┘     └──────────────────┘
                               │                       ▲
                               │                       │
┌──────────────────┐     ┌─────┴────────┐     ┌───────┴──────────┐
│  HandlerRegistry │◀────│  JobPoller   │────▶│  ThreadPool      │
│  (Strategy)      │     │  (Scheduled) │     │  Executor        │
└──────┬───────────┘     └──────────────┘     └──────────────────┘
       │
       ├── EmailJobHandler
       └── ReportJobHandler

Flow:

  1. Client submits a job via POST /jobs → persisted as PENDING
  2. Scheduled poller fetches a batch of PENDING jobs every N seconds
  3. Each job is atomically claimed (PENDING → PROCESSING) using optimistic locking
  4. Claimed jobs are submitted to a ThreadPoolExecutor for async execution
  5. Handler executes → job transitions to SUCCESS or retries until DEAD
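The flow above can be sketched with plain JDK concurrency primitives. This is an illustrative in-memory stand-in, not the project's actual classes: the `Job` class, `claim()`, and the executor wiring here only mimic steps 3–5, with `compareAndSet` playing the role of the optimistic-lock UPDATE.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class FlowSketch {
    enum Status { PENDING, PROCESSING, SUCCESS, DEAD }

    static class Job {
        final String id;
        final AtomicReference<Status> status = new AtomicReference<>(Status.PENDING);
        Job(String id) { this.id = id; }
        // Stand-in for the version-guarded UPDATE: succeeds for exactly one caller.
        boolean claim() { return status.compareAndSet(Status.PENDING, Status.PROCESSING); }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Job> batch = List.of(new Job("a"), new Job("b")); // step 2: fetched batch
        for (Job job : batch) {
            if (job.claim()) {                               // step 3: atomic claim
                pool.submit(() -> job.status.set(Status.SUCCESS)); // steps 4-5
            }
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        batch.forEach(j -> System.out.println(j.id + " -> " + j.status.get()));
    }
}
```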

Retry Strategy

Jobs use a bounded retry policy (exponential back-off is left as an extension point):

Attempt   retryCount   Result on failure
1st       0 → 1        Status → PENDING (re-queued)
2nd       1 → 2        Status → PENDING (re-queued)
3rd       2 → 3        Status → DEAD (stopped)

Why this approach:

  • Bounded retries prevent poison-pill jobs from consuming resources indefinitely
  • DEAD status creates a clear audit trail — operators can inspect and manually retry via POST /jobs/{id}/retry
  • No exponential backoff in this version to keep complexity honest. In production, you'd add a nextRetryAt timestamp and filter WHERE next_retry_at <= NOW() in the poller query
  • Max retry count is externalized (job.max-retry=3) — tunable per environment without code changes
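The bounded-retry transition above fits in a few lines. A minimal sketch — `nextStatus` and `backoffSeconds` are illustrative names, with `maxRetry` mirroring the `job.max-retry=3` property; the back-off helper shows the extension noted above, not something this version implements:

```java
class RetrySketch {
    // Decide the job's next status after a failed attempt.
    static String nextStatus(int retryCount, int maxRetry) {
        return retryCount + 1 >= maxRetry ? "DEAD" : "PENDING";
    }

    // Possible future extension (per the notes above): exponential back-off,
    // base * 2^retryCount seconds until nextRetryAt.
    static long backoffSeconds(int retryCount, long baseSeconds) {
        return baseSeconds << retryCount;
    }

    public static void main(String[] args) {
        int maxRetry = 3; // mirrors job.max-retry=3
        for (int rc = 0; rc < maxRetry; rc++) {
            System.out.println("attempt " + (rc + 1) + " -> retryCount "
                    + (rc + 1) + " -> " + nextStatus(rc, maxRetry));
        }
        System.out.println("backoff after 2nd failure: " + backoffSeconds(1, 5) + "s");
    }
}
```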

How Optimistic Locking Prevents Double-Processing

The Job entity has a @Version field that JPA auto-increments on every update.

Race condition scenario:

  1. Instance A and Instance B both poll and fetch Job X (status=PENDING, version=1)
  2. Instance A calls markProcessing(jobX) → UPDATE ... SET status='PROCESSING', version=2 WHERE id=X AND version=1 → succeeds
  3. Instance B calls markProcessing(jobX) → UPDATE ... WHERE id=X AND version=1 → 0 rows affected → OptimisticLockException
  4. Instance B catches the exception and silently skips the job
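The race above reduces to a compare-and-set on the version column. In this sketch, `compareAndSet(1, 2)` stands in for `UPDATE ... SET version = 2 WHERE id = X AND version = 1` — exactly one of the two "instances" wins:

```java
import java.util.concurrent.atomic.AtomicInteger;

class ClaimSketch {
    public static void main(String[] args) {
        AtomicInteger version = new AtomicInteger(1); // Job X at version 1
        boolean instanceA = version.compareAndSet(1, 2); // 1 row affected
        boolean instanceB = version.compareAndSet(1, 2); // 0 rows -> OptimisticLockException
        System.out.println("A claimed: " + instanceA + ", B claimed: " + instanceB);
    }
}
```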

Why optimistic over pessimistic locking (SELECT FOR UPDATE):

  • No row-level DB locks held during job execution — much better throughput
  • No risk of deadlocks from long-held row locks
  • Works cleanly across multiple app instances sharing the same database
  • The conflict window is tiny (only the status transition moment), so collisions are rare
  • SELECT FOR UPDATE would hold locks for the entire polling transaction, blocking other instances from even reading those rows

Horizontal Scaling

This system is designed to scale horizontally out of the box:

┌──────────┐     ┌──────────┐     ┌──────────┐
│ Instance1│     │ Instance2│     │ Instance3│
│  Poller  │     │  Poller  │     │  Poller  │
│  Workers │     │  Workers │     │  Workers │
└────┬─────┘     └────┬─────┘     └────┬─────┘
     │                │                │
     └────────────────┼────────────────┘
                      │
              ┌───────┴───────┐
              │  PostgreSQL   │
              │  (shared DB)  │
              └───────────────┘

Deploy N instances pointing at the same database. Each instance runs its own poller and executor. Optimistic locking guarantees that:

  • No two instances process the same job
  • No distributed coordination (ZooKeeper, etc.) is needed
  • Adding instances increases throughput roughly linearly, until the shared database becomes the bottleneck

Tuning per instance:

  • Adjust job.executor.pool-size based on instance CPU/memory
  • Stagger job.poller.interval across instances to reduce DB contention (optional)
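The per-instance executor can be wired as a fixed-size pool with a bounded queue for back-pressure. An illustrative sketch — the `job.executor.pool-size` property is from this README, but the concrete values, the queue capacity, and the rejection policy here are placeholder assumptions:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecutorConfigSketch {
    static ThreadPoolExecutor jobExecutor(int poolSize, int queueCapacity) {
        return new ThreadPoolExecutor(
                poolSize, poolSize,                       // fixed-size pool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),  // bounded queue: back-pressure
                new ThreadPoolExecutor.CallerRunsPolicy() // overflow runs on the poller thread
        );
    }

    public static void main(String[] args) {
        ThreadPoolExecutor ex = jobExecutor(5, 100); // e.g. job.executor.pool-size=5
        System.out.println("core=" + ex.getCorePoolSize() + " max=" + ex.getMaximumPoolSize());
        ex.shutdown();
    }
}
```

CallerRunsPolicy is one reasonable choice here: when the queue fills, the poller thread executes the job itself, which naturally slows down polling instead of dropping work.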

Design Tradeoffs: DB Polling vs Message Queue

Aspect                  DB Polling (this project)            Message Queue (Kafka/RabbitMQ)
Latency                 Seconds (poll interval)              Milliseconds (push-based)
Throughput              Hundreds/sec                         Tens of thousands/sec
Infra cost              Zero (reuse existing DB)             Additional broker to operate
Delivery guarantees     At-least-once via optimistic lock    Built-in (offsets, acks)
Ordering                FIFO via ORDER BY created_at         Partition-level ordering
Operational complexity  Low — just Postgres                  High — broker tuning, monitoring
Best for                Low-to-medium volume, internal jobs  High-volume event streaming

This project chose DB polling because:

  • For most internal job systems (emails, reports, notifications), sub-second latency isn't needed
  • Eliminates an entire infrastructure component and its operational burden
  • The DB you already have is your queue — simpler deployment, fewer failure modes

Replacing DB Polling with Kafka (Future Migration Path)

If throughput requirements grow beyond what DB polling can handle:

  1. Keep the JobHandler interface and HandlerRegistry unchanged — they're decoupled from the delivery mechanism
  2. Replace JobPoller with a @KafkaListener that consumes from a topic per job type
  3. Replace the POST /jobs endpoint to publish to Kafka instead of (or in addition to) writing to the DB
  4. Keep the jobs table as a persistent audit log — write after successful processing
  5. Remove @Version optimistic locking — Kafka consumer groups handle partition assignment and prevent double-processing

The Strategy Pattern in the handler layer means zero changes to business logic — only the delivery mechanism changes.
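The decoupling claimed above comes from the handler-layer Strategy pattern: a registry maps job type to handler, so the delivery mechanism (poller or Kafka listener) just looks up and invokes. A minimal plain-Java sketch — `JobHandler`, `EmailJobHandler`, and `ReportJobHandler` are named in this README, but the method signature and registry wiring here are illustrative assumptions:

```java
import java.util.Map;

class HandlerSketch {
    interface JobHandler { String handle(String payload); }

    static class EmailJobHandler implements JobHandler {
        public String handle(String payload) { return "email sent: " + payload; }
    }
    static class ReportJobHandler implements JobHandler {
        public String handle(String payload) { return "report built: " + payload; }
    }

    public static void main(String[] args) {
        // Stand-in for HandlerRegistry: type -> strategy.
        Map<String, JobHandler> registry = Map.of(
                "EMAIL", new EmailJobHandler(),
                "REPORT", new ReportJobHandler());
        // The caller (poller today, Kafka listener tomorrow) only does this:
        System.out.println(registry.get("EMAIL").handle("welcome"));
    }
}
```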


API Reference

Submit a Job

curl -X POST http://localhost:8080/jobs \
  -H "Content-Type: application/json" \
  -d '{"type": "EMAIL", "payload": {"to": "user@example.com", "subject": "Welcome"}}'

Response: 201 Created

{ "jobId": "550e8400-e29b-41d4-a716-446655440000" }

Check Job Status

curl http://localhost:8080/jobs/{jobId}

Response: 200 OK

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "EMAIL",
  "status": "SUCCESS",
  "retryCount": 0,
  "createdAt": "2026-03-04T10:15:30Z",
  "updatedAt": "2026-03-04T10:15:35Z"
}

List Dead Jobs (Paginated)

curl "http://localhost:8080/jobs/dead?page=0&size=20"

Retry a Dead Job

curl -X POST http://localhost:8080/jobs/{jobId}/retry

Health Check

curl http://localhost:8080/actuator/health

Response: 200 OK

{
  "status": "UP",
  "components": {
    "job": {
      "status": "UP",
      "details": {
        "pendingJobs": 3,
        "deadJobs": 0,
        "activeThreads": 2,
        "poolSize": 5
      }
    }
  }
}

Running Locally

Prerequisites

  • Java 17+
  • Maven 3.8+
  • Docker & Docker Compose

With Docker

# Build the JAR
./mvnw clean package -DskipTests

# Start PostgreSQL + App
docker-compose up --build

Without Docker

# Start PostgreSQL separately, then:
./mvnw spring-boot:run

Run Tests

./mvnw test

Tests use H2 in-memory database via the test profile.


Project Structure

com.example.asyncjob
├── controller/       # REST API layer
├── service/          # Business logic, transaction boundaries
├── worker/           # Scheduled poller — the async engine
├── handler/          # Strategy pattern — one handler per job type
├── repository/       # Spring Data JPA interfaces
├── model/            # JPA entities and enums
├── config/           # ThreadPoolExecutor, @ConfigurationProperties
├── exception/        # Global exception handling, API error format
├── health/           # Custom Actuator health indicator
└── dto/              # Request/response records

Tech Stack

Component    Technology
Framework    Spring Boot 3.2
Language     Java 17
Database     PostgreSQL 16
ORM          Hibernate 6 / JPA
Concurrency  ThreadPoolExecutor
Monitoring   Spring Boot Actuator
Logging      SLF4J + Logback
Build        Maven
Container    Docker + Docker Compose
Test DB      H2 (in-memory)