Screenshot API · Rate Limiting · API Design

Screenshot API Rate Limits and Quotas: What Actually Breaks at Scale

A practical breakdown of screenshot API rate limits and quotas — how they're enforced, common failure modes, and patterns to stay under them.

2026-06-05 · 6 min read

Every screenshot API looks fast and friendly in the docs. Then you ship to production, traffic spikes, and suddenly half your OG image requests are returning 429s during a launch. Rate limits and quotas are the boring infrastructure detail that quietly decides whether your feature works.

This post walks through how screenshot API rate limits and quotas actually behave, what trips developers up, and the patterns that keep your throughput predictable.

What rate limits and quotas mean for screenshot APIs

Rate limits and quotas often get used interchangeably, but they're not the same thing, and screenshot APIs add a third limit on top:

  • Rate limit: how many requests you can send per unit of time (e.g. 10 req/sec, 60 req/min).
  • Quota: a hard ceiling on total usage over a billing window (e.g. 10,000 screenshots/month).
  • Concurrency limit: how many simultaneous in-flight requests are allowed. This one matters more for screenshots than for typical REST APIs because each render holds a browser worker for 1–10 seconds.

A screenshot API like PxShot enforces all three, because rendering pages in Chromium is expensive. A simple JSON API can serve on the order of 1,000 req/sec on a single core; a headless browser that spends seconds on each render cannot come close.

Why concurrency is the limit that bites first

If a provider advertises "100 requests per minute" but allows only 5 concurrent renders, and each screenshot takes 4 seconds, your real ceiling is roughly 5 / 4 = 1.25 req/sec, or 75 per minute. You'll hit the concurrency wall well before you ever reach the advertised 100 per minute.
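The same arithmetic works as a tiny helper you can run against any plan's published numbers. This is a sketch with an illustrative function name, and it assumes renders run back to back at the average render time; by Little's law, sustained throughput is at most concurrency divided by time per request.

```javascript
// Your real ceiling is the lower of two limits: the advertised rate limit, and
// what the concurrency cap allows (Little's law: throughput = concurrency / render time).
function effectiveThroughputPerMinute(rateLimitPerMin, maxConcurrent, avgRenderSec) {
  const concurrencyCeilingPerMin = (maxConcurrent / avgRenderSec) * 60;
  return Math.min(rateLimitPerMin, concurrencyCeilingPerMin);
}

console.log(effectiveThroughputPerMinute(100, 5, 4));  // 75: concurrency is the bottleneck
console.log(effectiveThroughputPerMinute(100, 10, 4)); // 100: now the rate limit binds
```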

Always check the docs for:

  1. Max concurrent requests on your plan
  2. Average render time for your typical pages
  3. Whether requests over the concurrency limit are queued or rejected outright, and whether queued requests count against the rate limit

How limits are typically enforced

There are three common enforcement patterns. Knowing which one your provider uses changes how you handle retries.

Token bucket

You get a bucket of N tokens that refills at a fixed rate, and each request spends one token. Bursts up to the bucket size are allowed; sustained traffic is capped at the refill rate. Many modern APIs use this. Friendly because short spikes don't fail.
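Here's a minimal token bucket in JavaScript to make the refill behavior concrete. It's a sketch of the general technique, not PxShot's or any other provider's implementation; the class name is illustrative, and timestamps are passed in explicitly so the behavior is easy to test.

```javascript
// Minimal token bucket: holds up to `capacity` tokens, refilled continuously
// at `refillPerSec`. Each request spends one token; no token means reject (or wait).
class TokenBucket {
  constructor(capacity, refillPerSec, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full, so an initial burst is allowed
    this.lastMs = nowMs;
  }

  tryRemove(nowMs = Date.now()) {
    // Credit tokens for the time elapsed since the last call, capped at capacity.
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With a capacity of 10 and a refill rate of 2 tokens per second, a client can fire 10 requests at once, then roughly 2 per second after that.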

Fixed window

X requests per calendar minute or hour, with the counter reset at each boundary. Hostile because a burst just before a boundary and another just after it can push up to 2× the limit through in a few seconds, and then a genuine burst later in the window gets rejected.
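The boundary problem is easy to reproduce with a plain fixed-window counter. In this sketch (illustrative class name, timestamps in milliseconds), a 60-requests-per-minute limit accepts two full bursts fired one second apart across the minute mark:

```javascript
// Fixed-window counter: at most `limit` requests per window; the counter
// resets whenever a request lands in a new window (e.g. a new calendar minute).
class FixedWindow {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = -Infinity;
    this.count = 0;
  }

  tryRequest(nowMs) {
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      // Crossed a boundary: the whole allowance comes back at once.
      this.windowStart = start;
      this.count = 0;
    }
    if (this.count < this.limit) {
      this.count++;
      return true;
    }
    return false;
  }
}

// 60 requests at 0:59.5 and 60 more at 1:00.5 all succeed.
const fw = new FixedWindow(60, 60_000);
let acceptedFixed = 0;
for (let i = 0; i < 60; i++) if (fw.tryRequest(59_500)) acceptedFixed++;
for (let i = 0; i < 60; i++) if (fw.tryRequest(60_500)) acceptedFixed++;
console.log(acceptedFixed); // 120
```

That's 120 requests in about one second under a "60 per minute" limit, after which the rest of the 1:00 window rejects everything.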

Sliding window

Counts requests in a rolling time period. More accurate, fewer edge cases. Usually what you want as a consumer.
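For comparison, here's a sliding-window log fed two 60-request bursts one second apart across the minute mark. Because it counts everything in the trailing 60 seconds instead of resetting at a boundary, the second burst is rejected. Again, this is a sketch of the technique with hypothetical names, not any specific provider's implementation, and it expects calls in time order.

```javascript
// Sliding-window log: remember the timestamp of each accepted request and
// only count those inside the last `windowMs`. There is no boundary to game.
class SlidingWindowLog {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = []; // accepted request times, oldest first
  }

  tryRequest(nowMs) {
    // Drop timestamps that have aged out of the rolling window.
    while (this.timestamps.length > 0 && this.timestamps[0] <= nowMs - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(nowMs);
      return true;
    }
    return false;
  }
}

const sw = new SlidingWindowLog(60, 60_000);
let acceptedSliding = 0;
for (let i = 0; i < 60; i++) if (sw.tryRequest(59_500)) acceptedSliding++;
for (let i = 0; i < 60; i++) if (sw.tryRequest(60_500)) acceptedSliding++;
console.log(acceptedSliding); // 60
```

This log-based variant is the simplest to read; it stores one timestamp per accepted request in the window.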

Reading the response headers

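Any decent screenshot API returns rate-limit headers like these; parse them instead of ignoring them. Exact names vary by provider, and Retry-After usually shows up only on 429 (or 503) responses: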

```http
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 412
X-RateLimit-Reset: 1730390400
Retry-After: 12
```

Concrete things to do with them:

  • Log X-RateLimit-Remaining on every response and alert when it drops below 10% of your limit.
  • On a 429, respect Retry-After exactly — don't retry immediately, don't use your own backoff if the server gave you a number.
  • If you batch jobs, pause the queue when remaining hits zero rather than firing requests just to collect 429s.
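One way to turn those headers into decisions is a small parser plus a pause calculation. This is a sketch with hypothetical function names: it assumes X-RateLimit-Reset is a Unix timestamp in seconds, as in the example above (some providers send seconds-until-reset instead), and it accepts Retry-After in both forms HTTP allows, delay-seconds or an HTTP date.

```javascript
// Hypothetical helpers for the headers above. Assumes X-RateLimit-Reset is a
// Unix timestamp in seconds; some providers send "seconds until reset" instead.
function readRateLimit(headers, nowMs = Date.now()) {
  const num = (name) => {
    const v = headers.get(name);
    return v === null || v.trim() === '' ? null : Number(v);
  };
  const reset = num('x-ratelimit-reset');

  // Retry-After is either delay-seconds or an HTTP date.
  let retryAfterMs = null;
  const ra = headers.get('retry-after');
  if (ra !== null && ra.trim() !== '') {
    const secs = Number(ra);
    const dateMs = Date.parse(ra);
    if (Number.isFinite(secs)) retryAfterMs = Math.max(0, secs * 1000);
    else if (!Number.isNaN(dateMs)) retryAfterMs = Math.max(0, dateMs - nowMs);
  }

  return {
    limit: num('x-ratelimit-limit'),
    remaining: num('x-ratelimit-remaining'),
    resetAtMs: reset === null ? null : reset * 1000,
    retryAfterMs,
  };
}

// How long a batch queue should pause before its next request.
function pauseMs(info, nowMs = Date.now()) {
  if (info.retryAfterMs !== null) return info.retryAfterMs; // the server's number wins
  if (info.remaining === 0 && info.resetAtMs !== null) {
    return Math.max(0, info.resetAtMs - nowMs); // out of budget: wait for the reset
  }
  return 0;
}

// Alert when fewer than 10% of the window's requests remain.
function isRunningLow(info) {
  return info.limit !== null && info.remaining !== null && info.remaining < info.limit * 0.1;
}
```

In a batch worker, you'd call `pauseMs` after each response and sleep that long before dequeuing the next job, logging `remaining` and alerting whenever `isRunningLow` returns true.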

Common ways teams blow through quotas

1. Generating OG images on every page load

If your site renders an OG image URL that hits the screenshot API on every social crawler visit, you'll burn quota fast. Cache the result in S3 or a CDN keyed on the page's content hash, and only regenerate when the hash changes.

2. No deduplication on visual monitoring

Monitoring 200 URLs every 5 minutes is 57,600 requests/day. If half of those pages haven't changed, you're paying for noise. Hash the DOM or use an ETag check before triggering a screenshot.

3. Retry storms

A flaky upstream page returns 500, your worker retries 5 times with no backoff, and one failed job now costs six requests instead of one. Always cap retries at 2–3 and use exponential backoff.
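A sketch of capped retries with exponential backoff. It adds "full jitter" (a random delay between zero and the exponential ceiling), a common refinement that keeps many workers from retrying in lockstep; the option names and defaults are illustrative, and `sleep` is injectable so tests don't actually wait.

```javascript
// Run `task`, retrying at most `maxRetries` times. Delay before retry n is a
// random value in [0, min(maxDelayMs, baseMs * 2^n)).
async function retryWithBackoff(task, {
  maxRetries = 3,
  baseMs = 500,
  maxDelayMs = 8000,
  isRetryable = () => true,
  sleep = (ms) => new Promise((r) => setTimeout(r, ms)),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      // Give up once the retry budget is spent or the error isn't worth retrying.
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      const ceiling = Math.min(maxDelayMs, baseMs * 2 ** attempt);
      await sleep(Math.random() * ceiling);
    }
  }
}
```

Pass an `isRetryable` predicate that returns false for 4xx errors other than 429, since a bad request won't succeed on retry. For a 429 that carries Retry-After, use the server's value instead of the computed delay, as described earlier.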

4. Synchronous requests in user-facing flows

If a user clicks "Export to PDF" and you call the screenshot API synchronously, every refresh or impatient double-click fires another render and multiplies your usage. Queue the job, return a job ID, and deliver the result by polling or a webhook.
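A minimal in-memory sketch of the queue-and-poll pattern. `render` is a hypothetical function wrapping your screenshot API call, and a real deployment would keep jobs in Redis, a database, or a managed queue rather than a Map. Identical requests that are already queued or running share one job ID, so repeated clicks don't trigger extra renders.

```javascript
function createExportQueue(render) {
  const jobs = new Map();        // jobId -> job record
  const activeByUrl = new Map(); // url -> jobId while that job is queued or running
  let nextId = 1;

  function enqueue(url) {
    // Repeated clicks while a job is pending reuse the same job.
    const existing = activeByUrl.get(url);
    if (existing) return existing;

    const id = `job_${nextId++}`;
    const job = { status: 'queued', result: null, error: null };
    jobs.set(id, job);
    activeByUrl.set(url, id);

    // Run in the background; the request handler returns `id` immediately.
    job.done = Promise.resolve()
      .then(() => {
        job.status = 'running';
        return render(url);
      })
      .then(
        (result) => { job.status = 'done'; job.result = result; },
        (err) => { job.status = 'failed'; job.error = String(err); },
      )
      .finally(() => activeByUrl.delete(url));
    return id;
  }

  // What a hypothetical polling endpoint (e.g. GET /jobs/:id) would return.
  function status(id) {
    const job = jobs.get(id);
    return job ? { status: job.status, result: job.result, error: job.error } : null;
  }

  // Resolves when the job finishes either way; a webhook sender could hang off this.
  function waitFor(id) {
    return jobs.get(id)?.done;
  }

  return { enqueue, status, waitFor };
}
```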

Designing your client for predictable usage

Here's a pattern that holds up under load:

  1. Central client wrapper: route all screenshot calls through one module. This is where you enforce concurrency locally with a semaphore.
  2. Local concurrency cap: set it slightly below the provider's limit. If PxShot allows 10 concurrent, cap your client at 8 to leave headroom for retries.
  3. Queue with priorities: user-triggered exports get priority over background OG image generation.
  4. Idempotency keys: if your provider supports them, use them so retries don't double-charge.
  5. Usage telemetry: emit a metric per request with status, latency, and remaining quota. Dashboards beat surprises.
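Steps 2 and 3 (a local concurrency cap plus a priority queue) fit in one small class. This is a hypothetical sketch with illustrative names and priority values: at most `concurrency` tasks run at once, and when a slot frees up, the highest-priority waiting task goes next, first-in-first-out within the same priority.

```javascript
class PriorityLimiter {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.active = 0;
    this.waiting = []; // { task, priority, seq, resolve, reject }
    this.seq = 0;
  }

  // Schedule `task` (an async function); resolves with its result.
  run(task, priority = 0) {
    return new Promise((resolve, reject) => {
      this.waiting.push({ task, priority, seq: this.seq++, resolve, reject });
      this.#next();
    });
  }

  #next() {
    while (this.active < this.concurrency && this.waiting.length > 0) {
      // Highest priority first; FIFO among equal priorities.
      this.waiting.sort((a, b) => b.priority - a.priority || a.seq - b.seq);
      const job = this.waiting.shift();
      this.active++;
      Promise.resolve()
        .then(job.task)
        .then(job.resolve, job.reject)
        .finally(() => {
          this.active--;
          this.#next();
        });
    }
  }
}

// Illustrative levels: user-triggered exports jump ahead of background OG images.
const PRIORITY = { userExport: 10, background: 0 };
```

Sorting on every dispatch is fine for a modest backlog; a binary heap is the usual upgrade for long queues. Under sustained high-priority load, background work can starve, which is usually acceptable for OG images.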

Example: simple semaphore-gated client in Node

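```javascript
import pLimit from 'p-limit';

// At most 8 renders in flight: headroom below a 10-concurrent plan.
const limit = pLimit(8);
const MAX_ATTEMPTS = 3; // cap retries so one bad page can't burn quota

async function screenshot(url) {
  // Retry inside a single limited task. Recursing into screenshot() from inside
  // limit() can deadlock once every slot is held by a task waiting on its retry.
  return limit(async () => {
    for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      const res = await fetch(`https://api.pxshot.dev/v1/screenshot?url=${encodeURIComponent(url)}`, {
        headers: { Authorization: `Bearer ${process.env.PXSHOT_KEY}` }
      });
      if (res.ok) return res.arrayBuffer();
      if (res.status !== 429 || attempt === MAX_ATTEMPTS) {
        throw new Error(`Screenshot of ${url} failed: HTTP ${res.status}`);
      }
      await res.body?.cancel(); // release the unread 429 body
      // Retry-After may be missing or an HTTP date; fall back to 5 seconds.
      const retryAfter = Number(res.headers.get('retry-after') ?? 5);
      const waitSec = Number.isFinite(retryAfter) ? retryAfter : 5;
      await new Promise(r => setTimeout(r, waitSec * 1000));
    }
  });
}
```

This handles concurrency, respects Retry-After, caps retries, and throws on other errors instead of returning an error page as image bytes, all in under 30 lines. It assumes Node 18+ for the built-in fetch and an ES module context for the import. You don't need a framework for this.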

Picking a plan that matches your traffic shape

Quotas are advertised as monthly totals, but traffic is rarely flat. Before committing to a tier, sketch out:

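  • Peak hour: the busiest 60 minutes of your typical week. Multiply it by 24 × 30 for a worst-case monthly total; if that exceeds the monthly quota, a sustained run of peak-level traffic could exhaust your quota before the month is out. Peaks themselves get throttled by the per-second and concurrency limits, so check peak-hour traffic against those too.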
  • Burst tolerance: launches, viral posts, scheduled cron jobs. A plan with high quota but low concurrency won't survive a Product Hunt launch.
  • Overage behavior: does the API hard-stop, soft-throttle, or charge per overage unit? Each has a different blast radius.

PxShot publishes concurrency and per-second limits alongside monthly quotas so you can model this before you sign up, rather than discovering it during an incident.
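Those checks can run as a quick script before you pick a tier. This is a sketch with hypothetical plan fields (`monthlyQuota`, `perSecond`, `maxConcurrent`) and illustrative numbers, not real plan limits. Peak-hour traffic is averaged over the hour, so real bursts inside that hour will run hotter than these figures.

```javascript
function checkPlanFit({ peakHourRequests, monthlyRequests, avgRenderSec }, plan) {
  const peakPerSec = peakHourRequests / 3600;
  // Little's law: renders in flight ≈ arrival rate × time per render.
  const concurrencyNeeded = peakPerSec * avgRenderSec;
  // Upper bound if every hour of a 30-day month looked like the peak hour.
  const worstCaseMonthly = peakHourRequests * 24 * 30;
  return {
    peakPerSec,
    concurrencyNeeded,
    fitsRateLimit: peakPerSec <= plan.perSecond,
    fitsConcurrency: concurrencyNeeded <= plan.maxConcurrent,
    fitsMonthlyQuota: monthlyRequests <= plan.monthlyQuota,
    fitsWorstCase: worstCaseMonthly <= plan.monthlyQuota,
  };
}

console.log(checkPlanFit(
  { peakHourRequests: 7200, monthlyRequests: 300_000, avgRenderSec: 4 },
  { monthlyQuota: 500_000, perSecond: 5, maxConcurrent: 10 },
));
```

With these inputs the peak works out to 2 req/sec and 8 concurrent renders, which fits the plan, while the every-hour-is-peak total (5,184,000) does not fit the quota.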

When to negotiate or self-host

If you're consistently hitting 70%+ of a high-tier quota, talk to the provider. Custom limits are usually cheaper than the next tier up, and most screenshot APIs would rather keep you than lose you to a self-hosted Playwright cluster. Self-hosting only makes sense once your API bill exceeds the cost of at least two engineer-weeks of maintenance per year, because browser automation rot is real.

Want to test how rate limits feel in practice before committing? PxShot has a free tier at pxshot.dev with public limits documented up front — spin up an API key and run your actual workload against it.