Webhooks seem simple — receive an event, POST it somewhere, done. Then you hit reality: the endpoint is down, the network flakes, the consumer processes the same event twice, and suddenly “just send an HTTP request” becomes a distributed systems problem.
Here are four patterns I rely on every time I build a webhook delivery system.
1. Idempotency Keys for Safe Retries
Every webhook delivery gets a unique idempotency key — usually a UUID generated at dispatch time and sent in a header like X-Webhook-Id. Consumers store the keys they’ve processed and skip duplicates.
// Dispatcher side
delivery.idempotency_key = uuid()
headers["X-Webhook-Id"] = delivery.idempotency_key
// Consumer side
if already_processed(request.header("X-Webhook-Id")):
return 200 // Acknowledge but skip
process(request.body)
mark_processed(request.header("X-Webhook-Id"))
This is non-negotiable. Without it, retries become a source of bugs instead of a reliability mechanism.
2. Exponential Backoff with Jitter
When a delivery fails, retry — but not immediately. Hammering a struggling endpoint makes things worse. Exponential backoff spaces retries out: 30s, 2m, 8m, 30m, 2h. Adding random jitter prevents thundering herd problems when multiple deliveries fail at the same time.
delay = base_delay * (2 ^ attempt_number)
jitter = random(0, delay * 0.3)
next_retry_at = now() + delay + jitter
Cap the maximum delay and the total number of attempts. Infinite retries sound resilient but they just fill your queue with zombie jobs.
3. Payload Signing and Verification
Sign every outbound payload with HMAC-SHA256 using a per-consumer secret. Send the signature in a header so consumers can verify the request actually came from you.
signature = hmac_sha256(consumer.secret, request.body)
headers["X-Webhook-Signature"] = "sha256=" + signature
// Consumer verification
expected = hmac_sha256(my_secret, request.raw_body)
if not constant_time_equal(expected, request.header("X-Webhook-Signature")):
return 401
Use constant-time comparison to prevent timing attacks. Rotate secrets periodically and support a grace period where both old and new secrets are valid.
4. Dead Letter Queues for Failed Deliveries
After exhausting retries, don’t silently drop the delivery. Move it to a dead-letter queue — a separate store of deliveries that permanently failed. Alert on it. Build a UI to inspect and replay them.
Dead letters turn invisible data loss into an actionable operational event. Most of the time, the fix is simple: the consumer updated their URL, rotated credentials, or had extended downtime. A replay button gets everything back in sync without re-triggering the original events.
The Bigger Picture
None of these patterns are novel. But in every webhook system I’ve seen fail, it was because one or more of these was skipped in the name of simplicity. The irony is that adding them upfront is simpler than debugging the production incidents that happen without them.