How to Handle API Rate Limits in Python Data Automation

Data pipelines fail quietly. A script that pulled 50,000 records yesterday returns 200 records today because the API throttled every request after the first few hundred. The pipeline logs show success -- the HTTP response was 200, just with empty pages -- and no one notices until someone checks the data.

API rate limits are one of the most reliable sources of brittle automation. They are not edge cases: nearly every production API enforces them, the limits can change without notice, and handling them at scale requires strategy, not just try-except blocks.

This guide covers the patterns that make API-dependent Python automation genuinely resilient: reading rate limit signals correctly, exponential backoff, production-grade retry logic with the tenacity library, and token bucket rate limiting for per-endpoint control.

What Rate Limits Are and Why They Fail Pipelines

A rate limit is a constraint on how frequently a client can make API requests, typically expressed as requests per second (RPS), requests per minute (RPM), or requests per day. When you exceed the limit, the API typically returns HTTP 429 Too Many Requests, often with headers telling you how long to wait.

The failure mode in automation is not usually a single 429. It is how the pipeline responds to a 429. Naive pipelines do one of three things: they crash, they retry immediately (making the situation worse by hammering a throttled endpoint), or they silently skip the failed request and continue -- which is the worst outcome because the data loss is invisible.

Correct rate limit handling requires:

  1. Detecting 429 responses and the headers that accompany them
  2. Waiting the right amount of time before retrying
  3. Distinguishing between transient rate limits (wait and retry) and permanent errors (stop and alert)
  4. Maintaining throughput as close to the API limit as possible without exceeding it
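
Points 1 and 3 above can be sketched as a small decision function that runs before any retry logic. The names `Action` and `classify_status` are illustrative and not part of any library, and the set of retryable codes is an assumption that should be checked against each API's documentation.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    RETRY = "retry"  # transient: wait, then try again
    FAIL = "fail"    # permanent: stop and alert


def classify_status(status_code):
    """Map an HTTP status code to a retry decision (illustrative policy)."""
    if status_code < 400:
        return Action.OK
    if status_code == 429 or status_code >= 500:
        return Action.RETRY
    # Other 4xx codes (400, 401, 403, 404, ...) will not succeed on retry
    return Action.FAIL
```

Making the decision explicit keeps the "silently skip and continue" outcome from happening by accident: every response ends up in exactly one of the three branches.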

Reading Rate Limit Headers

Most APIs return rate limit state in response headers. The requests library exposes these through response.headers.get(). The most common header patterns are:

import requests

response = requests.get("https://api.example.com/data", headers={"Authorization": "Bearer TOKEN"})

# Common rate limit headers (names and formats vary by API; check the provider's docs)
retry_after = response.headers.get("Retry-After")          # seconds to wait, or an HTTP date
x_ratelimit_remaining = response.headers.get("X-RateLimit-Remaining")
x_ratelimit_reset = response.headers.get("X-RateLimit-Reset")  # often a Unix timestamp
x_ratelimit_limit = response.headers.get("X-RateLimit-Limit")  # total limit

The Retry-After header is the most directly useful. When present on a 429 response, it tells you exactly how long to wait. Some APIs use an integer (seconds), some use an HTTP date string. Parse both:

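from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def get_retry_after(response):
    """Return the Retry-After wait in seconds, or None if absent or unparseable."""
    retry_after = response.headers.get("Retry-After")
    if not retry_after:
        return None
    try:
        # Numeric form: seconds to wait (clamp negatives to zero)
        return max(0.0, float(retry_after))
    except ValueError:
        pass
    try:
        # HTTP date format: "Wed, 21 Oct 2015 07:28:00 GMT"
        reset_dt = parsedate_to_datetime(retry_after)
    except (TypeError, ValueError):
        return None  # malformed header: let the caller fall back to backoff
    if reset_dt.tzinfo is None:
        # Treat a date without timezone information as UTC
        reset_dt = reset_dt.replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    return max(0.0, (reset_dt - now).total_seconds())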

Not all APIs provide Retry-After. When it is absent, you need to derive a wait time from other headers or fall back to exponential backoff.
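
As a rough sketch of deriving a wait time from other headers, the function below reads X-RateLimit-Reset. It assumes the header holds a Unix timestamp, which is common but not universal (some APIs send seconds until reset instead). The name `wait_from_reset_header` and the `max_wait` cap are hypothetical choices for illustration.

```python
import time


def wait_from_reset_header(headers, now=None, max_wait=3600.0):
    """Estimate seconds to wait from X-RateLimit-Reset, or None if unusable.

    Assumes the header value is a Unix timestamp (an assumption to verify
    per API). The result is clamped to the range [0, max_wait].
    """
    raw = headers.get("X-RateLimit-Reset")
    if raw is None:
        return None
    try:
        reset_at = float(raw)
    except ValueError:
        return None  # not numeric: caller should fall back to backoff
    current = time.time() if now is None else now
    return min(max(0.0, reset_at - current), max_wait)
```

The cap guards against a misread header (for example a timestamp in milliseconds) turning into a sleep of several hours.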

The Problem with Fixed Retry Delays

A fixed retry delay -- sleep for 5 seconds, then retry -- is better than no delay, but it fails in two ways.

First, it is often the wrong duration. A 5-second sleep when the API resets every 60 seconds means you will hit the limit again immediately after retrying. A 60-second sleep when you only needed to wait 3 seconds wastes throughput.

Second, fixed delays do not degrade gracefully under sustained load. If your pipeline generates 100 requests per minute against an API that allows 10 requests per minute, a fixed retry delay just creates a queue of retries, each of which will also be throttled. The pipeline becomes a burst-and-stall pattern that is both slow and hard on the API server.

Exponential backoff solves both problems by increasing wait time with each successive failure, reducing retry frequency under sustained throttling while recovering quickly when the throttle is temporary.

Implementing Exponential Backoff

A minimal exponential backoff implementation:

import time
import random
import requests

def fetch_with_backoff(url, headers, max_retries=5, base_delay=1.0, max_delay=60.0):
    for attempt in range(max_retries):
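        response = requests.get(url, headers=headers, timeout=30)

        # Any status below 400 is a success; checking only for 200 would
        # re-send requests that returned 201 or 204
        if response.ok: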
            return response

        if response.status_code == 429:
            retry_after = get_retry_after(response)
            if retry_after:
                wait = retry_after
            else:
                # Exponential backoff with jitter
                wait = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)

            print(f"Rate limited. Waiting {wait:.1f}s before retry {attempt + 1}/{max_retries}")
            time.sleep(wait)
            continue

        if response.status_code >= 500:
            # Server error: retry with backoff
            wait = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(wait)
            continue

        # 4xx other than 429: do not retry
        response.raise_for_status()

    raise RuntimeError(f"Max retries exceeded for {url}")

The jitter term (random.uniform(0, 1)) is important. Without it, multiple parallel workers that all hit the rate limit at the same time will all retry at the same moment, creating a thundering herd that immediately triggers the limit again.
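
To make the effect of jitter concrete, here is a small sketch that computes delays without sending any requests. `additive_jitter_delay` mirrors the formula used in fetch_with_backoff above. `full_jitter_delay` is a common variant, often called "full jitter", that draws the whole delay at random. Both function names and the seeded generators are illustrative.

```python
import random


def additive_jitter_delay(attempt, base_delay=1.0, max_delay=60.0, rng=random):
    """Exponential delay plus up to one second of jitter, capped at max_delay."""
    return min(base_delay * (2 ** attempt) + rng.uniform(0, 1), max_delay)


def full_jitter_delay(attempt, base_delay=1.0, max_delay=60.0, rng=random):
    """Pick uniformly between zero and the capped exponential delay."""
    return rng.uniform(0, min(base_delay * (2 ** attempt), max_delay))


# Two workers that fail on the same attempt end up with different delays
worker_a = random.Random(1)
worker_b = random.Random(2)
for attempt in range(5):
    a = additive_jitter_delay(attempt, rng=worker_a)
    b = additive_jitter_delay(attempt, rng=worker_b)
    print(f"attempt {attempt}: worker A waits {a:.2f}s, worker B waits {b:.2f}s")
```

With additive jitter the workers stay within one second of each other, which may be enough for a handful of workers. Full jitter spreads retries across the whole interval at the cost of some shorter waits.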

Production-Grade Retry Logic with Tenacity

For production pipelines, the tenacity library provides a cleaner, more configurable retry decorator than hand-rolled backoff:

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential_jitter,
    retry_if_exception_type,
    before_sleep_log,
)
import logging
import requests

logger = logging.getLogger(__name__)

class RateLimitError(Exception):
    pass

class ServerError(Exception):
    pass

def check_response(response):
    if response.status_code == 429:
        raise RateLimitError(f"Rate limited: {response.headers.get('Retry-After', 'unknown wait')}")
    if response.status_code >= 500:
        raise ServerError(f"Server error {response.status_code}")
    response.raise_for_status()
    return response

@retry(
    retry=retry_if_exception_type((RateLimitError, ServerError)),
    wait=wait_exponential_jitter(initial=1, max=60),
    stop=stop_after_attempt(6),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
def fetch_api_data(url, headers):
    response = requests.get(url, headers=headers, timeout=30)
    return check_response(response)

Tenacity handles the backoff math, logs each retry attempt, and raises a RetryError if all attempts are exhausted. The before_sleep_log callback, passed as before_sleep, writes a warning to your logger before each sleep, which makes retry patterns easy to monitor in production.

For APIs that return Retry-After, you can implement a custom wait strategy:

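from tenacity.wait import wait_base

class WaitRetryAfter(wait_base):
    """Wait for the server-suggested time when the exception carries one."""
    def __call__(self, retry_state):
        exc = retry_state.outcome.exception()
        # Expects the exception to expose a retry_after attribute in seconds;
        # the RateLimitError defined above would need to be extended to set it
        retry_after = getattr(exc, "retry_after", None)
        if retry_after:
            return float(retry_after)
        # Fall back to exponential
        return min(2 ** retry_state.attempt_number, 60)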
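
The wait strategy above reads a retry_after attribute from the exception, which the RateLimitError defined earlier does not carry. One possible way to supply it is sketched below. The helper name `parse_retry_after_seconds` is hypothetical, and for brevity it handles only the numeric form of the header; a date-form value simply yields None and falls through to the exponential fallback.

```python
class RateLimitError(Exception):
    """Raised on HTTP 429; retry_after is the suggested wait in seconds, if known."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after


class ServerError(Exception):
    pass


def parse_retry_after_seconds(value):
    """Parse the numeric Retry-After form; return None for anything else."""
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return None


def check_response(response):
    """Raise a retryable exception for 429 and 5xx; return the response otherwise."""
    if response.status_code == 429:
        wait = parse_retry_after_seconds(response.headers.get("Retry-After"))
        raise RateLimitError("Rate limited", retry_after=wait)
    if response.status_code >= 500:
        raise ServerError(f"Server error {response.status_code}")
    response.raise_for_status()
    return response
```

With this version of check_response in place, passing wait=WaitRetryAfter() to the @retry decorator in place of wait_exponential_jitter(...) lets tenacity honor the server's suggested wait.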

Token Bucket for Proactive Rate Control

Exponential backoff is reactive: it responds after hitting the limit. A token bucket implementation is proactive: it throttles your own request rate to stay below the limit, reducing the number of 429 responses you encounter in the first place.

import threading
import time
import requests

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens per second
        self.capacity = capacity  # maximum tokens
        self.tokens = capacity
        self.last_refill = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self, tokens=1):
        with self._lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now

            if self.tokens >= tokens:
                self.tokens -= tokens
                return 0.0  # no wait needed
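            else:
                # Go into debt so the next caller queues behind this one;
                # resetting to 0 would let two requests share one refill
                wait = (tokens - self.tokens) / self.rate
                self.tokens -= tokens
                return wait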

def throttled_fetch(bucket, url, headers):
    wait = bucket.acquire()
    if wait > 0:
        time.sleep(wait)
    return requests.get(url, headers=headers, timeout=30)

For an API that allows 10 requests per second, you would initialize:
bucket = TokenBucket(rate=10, capacity=10)

A per-endpoint bucket lets you independently control rate for different API sections that have different limits, which is common in APIs like Salesforce, HubSpot, or Stripe where bulk endpoints have different limits than record-level endpoints.
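
A per-endpoint setup could look roughly like the sketch below. The endpoint paths and limits are hypothetical placeholders, not real values for any of the APIs named above, and `SimpleBucket` is a compact stand-in for the TokenBucket class so that the example runs on its own.

```python
import threading
import time


class SimpleBucket:
    """Compact token bucket for illustration only."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last_refill = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self):
        """Reserve one token and return how long the caller should sleep."""
        with self._lock:
            now = time.monotonic()
            refill = (now - self.last_refill) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
            self.last_refill = now
            self.tokens -= 1  # may go negative: callers queue behind each other
            return max(0.0, -self.tokens / self.rate)


# Hypothetical limits; replace with the documented values for your API
ENDPOINT_LIMITS = {
    "/bulk/export": {"rate": 1, "capacity": 2},
    "/records": {"rate": 10, "capacity": 10},
}
DEFAULT_LIMIT = {"rate": 2, "capacity": 2}

_buckets = {path: SimpleBucket(**limit) for path, limit in ENDPOINT_LIMITS.items()}
_default_bucket = SimpleBucket(**DEFAULT_LIMIT)


def bucket_for(path):
    """Return the bucket for an endpoint, or a conservative shared default."""
    return _buckets.get(path, _default_bucket)
```

A caller would then use bucket_for("/records").acquire() before each request to that endpoint. Routing unknown paths to one shared default bucket is a deliberately cautious choice for this sketch.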

Combining the Patterns

In a production data pipeline, the patterns work together:

import time
import requests
from tenacity import retry, stop_after_attempt, wait_exponential_jitter, retry_if_exception_type

bucket = TokenBucket(rate=5, capacity=10)  # Stay well under limit

@retry(
    retry=retry_if_exception_type((RateLimitError, ServerError)),
    wait=wait_exponential_jitter(initial=2, max=120),
    stop=stop_after_attempt(5),
)
def fetch_page(session, url, params=None):
    wait = bucket.acquire()
    if wait > 0:
        time.sleep(wait)
    response = session.get(url, params=params, timeout=30)
    return check_response(response)

def paginate_api(base_url, headers, params):
    results = []
    session = requests.Session()
    session.headers.update(headers)
    page = 1

    while True:
        # Merge the caller's query parameters with the current page number
        query = {**(params or {}), "page": page}
        response = fetch_page(session, base_url, query)
        data = response.json()
        if not data.get("items"):
            break
        results.extend(data["items"])
        page += 1

    return results

The token bucket prevents most 429 responses proactively. The tenacity decorator handles the ones that slip through. The session reuses the TCP connection across requests, reducing overhead.

What Changes in Production

Dennis Traina, founder of 137Foundry, notes: "Most rate limit issues we see in client pipelines are not about backoff logic -- it is that no one has documented which endpoints have which limits, so when a limit changes silently, no one knows what broke or why. The fix starts with treating rate limit headers as first-class instrumentation, not an edge case."

In practice, that means:

  • Logging the current X-RateLimit-Remaining on every response, not just on 429s
  • Setting alerts when remaining drops below 20% of the limit
  • Storing rate limit header values alongside your data for audit purposes
  • Testing your retry logic against a mock API that returns 429 responses deliberately
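
The first two practices can be sketched as one helper that is called on every response. It assumes the API sends X-RateLimit-Remaining and X-RateLimit-Limit as plain integers. The function name, logger name, and 20% default threshold are illustrative.

```python
import logging

logger = logging.getLogger("rate_limits")


def record_rate_limit(endpoint, headers, warn_fraction=0.2):
    """Log remaining quota for one response; return the fraction left, or None."""
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        limit = int(headers["X-RateLimit-Limit"])
    except (KeyError, ValueError):
        return None  # headers absent or not numeric
    if limit <= 0:
        return None
    fraction = remaining / limit
    logger.info("endpoint=%s remaining=%d limit=%d", endpoint, remaining, limit)
    if fraction < warn_fraction:
        logger.warning(
            "endpoint=%s quota low: %.0f%% remaining", endpoint, fraction * 100
        )
    return fraction
```

Calling record_rate_limit("/records", response.headers) after each request gives a continuous signal, and the warning line is something an alerting rule can match on.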

One underused practice is version-pinning your rate limit assumptions. Document the limits observed for each endpoint -- in a config file or code comment -- along with the date. When a pipeline starts failing unexpectedly, the first diagnostic question is whether the API silently changed its limits. Having a documented baseline makes that diagnosis fast rather than speculative, and it creates a natural record of how your API usage has evolved over time.
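
One way to keep such a baseline is a small mapping in code or config, compared against the limit the API currently advertises. The endpoints, limits, and dates below are made-up placeholders, and `limit_drift` is a hypothetical helper that assumes the API reports its limit in X-RateLimit-Limit.

```python
# Hypothetical baseline: limits as observed on the given dates
DOCUMENTED_LIMITS = {
    "/records": {"limit": 600, "observed": "2024-01-15"},
    "/bulk/export": {"limit": 60, "observed": "2024-01-15"},
}


def limit_drift(endpoint, headers, baseline=DOCUMENTED_LIMITS):
    """Return a message if the advertised limit differs from the baseline, else None."""
    documented = baseline.get(endpoint)
    raw = headers.get("X-RateLimit-Limit")
    if documented is None or raw is None:
        return None  # nothing to compare against
    try:
        current = int(raw)
    except ValueError:
        return None
    if current != documented["limit"]:
        return (
            f"{endpoint}: limit is now {current}, documented as "
            f"{documented['limit']} on {documented['observed']}"
        )
    return None
```

Running this check on a schedule, or on the first response of each pipeline run, turns a silent limit change into an explicit message.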

Rate limit handling is not a feature you add once and forget. API limits change, new endpoints get different limits, and your data volume grows in ways that change your request patterns. Treating rate limit resilience as an operational concern rather than a coding exercise is what separates brittle pipelines from ones that run unattended for months. Schedule time to review your rate limit configuration alongside regular infrastructure reviews as data volume and API dependencies evolve.

To learn more about how 137Foundry builds rate-limit-resilient data automation into production pipelines, visit our data automation services page.

For more technical guides on data automation and pipeline reliability, browse the 137Foundry articles archive.
