Every HTTP request a browser sends to your server costs time and bandwidth. HTTP caching exists to eliminate the redundant ones. When it works correctly, repeat visitors load your pages faster, your server handles less traffic, and your CDN delivers more content from edge nodes closer to users.
When caching is misconfigured, the opposite happens. Users receive stale content after a deployment. Dynamic data gets cached for hours. Static assets bypass the cache entirely because the headers are missing or wrong. These are not edge cases. They happen constantly across production web applications, and they are almost entirely preventable.
This guide covers how HTTP caching works, which headers matter, how to set strategies by asset type, and the mistakes that undo most caching efforts.

How the HTTP Caching Model Works
HTTP caching operates at multiple layers: the browser cache, any intermediate proxies, and CDN edge nodes. All three layers consult the same HTTP headers to decide whether to serve a stored response or forward the request upstream.
The browser maintains its own local cache, held in memory and on disk. When a user requests a resource, the browser checks whether it has a cached copy and whether that copy is still fresh. If it is fresh, the request never leaves the machine. If it is stale, the browser either re-fetches the resource or sends a conditional request to check whether the cached copy is still valid.
CDNs add a shared cache layer between users and your origin server. CDN nodes across geographic regions store copies of your cacheable responses. When a user in Berlin requests an asset, the CDN in Frankfurt may serve it directly, with no request reaching your US-based origin. This reduces latency and origin load simultaneously.
The key to managing all of this is HTTP headers. They are how you communicate caching instructions to browsers, proxies, and CDNs.
Cache-Control: The Header That Matters Most
Cache-Control is the authoritative header for caching behavior. It carries directives that can be combined, and its max-age directive takes precedence over the older Expires header when both are present.
The most important directives:
max-age=N tells caches to consider the response fresh for N seconds. A browser receiving Cache-Control: max-age=86400 will reuse the cached response for 24 hours without making another request. This is the right directive for most static assets.
no-cache does not mean "do not cache." The response may still be stored, but the cache must revalidate it with the server before every reuse, even when it would otherwise count as fresh. Use this for resources that change often and where even a slightly stale version is unacceptable.
no-store means the response must never be cached. Use this only for genuinely sensitive data like authenticated API responses or pages containing private user information.
public signals that any cache, including shared CDN caches, may store the response. This is appropriate for assets that are identical for all users.
private restricts caching to the browser. CDN caches must not store the response. Use this for personalized content or authenticated pages.
s-maxage=N overrides max-age for shared caches only, letting you set different TTLs for browser and CDN layers independently.
A typical configuration for a JavaScript bundle might look like:
Cache-Control: public, max-age=31536000, immutable
The immutable directive tells the browser that the content at this URL will never change, so there is no need to revalidate during the max-age window. This is safe only when you use content-addressed URLs that change whenever the file changes, which is standard in modern build tools like Webpack, Vite, and Next.js.
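To make the interaction between these directives concrete, here is a minimal sketch of how a cache might read a Cache-Control value and pick a freshness lifetime. The function names `parse_cache_control` and `freshness_lifetime` are illustrative, not part of any library, and the parser is deliberately simplified: it does not handle commas inside quoted arguments, and real caches apply further rules (Expires, heuristics, request directives) that are left out here.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into a {directive: argument} mapping.

    Simplification: splits on every comma, so quoted arguments that
    contain commas are not supported.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(header: str, shared_cache: bool) -> Optional[int]:
    """Return the seconds a stored copy may be reused without contacting the origin.

    Returns 0 when the copy can never be reused unvalidated, and None
    when the header sets no explicit lifetime.
    """
    d = parse_cache_control(header)
    if "no-store" in d or "no-cache" in d:
        return 0
    if shared_cache and "private" in d:
        return 0  # shared caches such as CDNs must not store private responses
    if shared_cache and d.get("s-maxage") is not None:
        return int(d["s-maxage"])  # s-maxage overrides max-age for shared caches
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return None


header = "public, max-age=300, s-maxage=3600"
print(freshness_lifetime(header, shared_cache=False))  # browser lifetime
print(freshness_lifetime(header, shared_cache=True))   # CDN lifetime
```

With that header, the browser and the CDN end up with different lifetimes from a single response, which is the point of s-maxage.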
Caching Strategies by Asset Type
Different resources warrant different caching strategies. Applying the same TTL to everything is a common mistake that either over-caches dynamic content or under-caches static assets.
Static Assets With Content-Addressed URLs
CSS bundles, JavaScript files, fonts, and images generated by build tools with content hashes in the filename should use long cache TTLs. The URL itself changes when the content changes, so there is no risk of serving stale content.
Cache-Control: public, max-age=31536000, immutable
One year is the practical maximum useful TTL. Files cached this aggressively never get re-fetched unless the URL changes.
HTML Pages
HTML pages are the entry points to your application. They reference versioned static assets, so they need to be fresh for cache invalidation to work. A typical pattern:
Cache-Control: no-cache
Or for pages that change infrequently:
Cache-Control: public, max-age=300
Five minutes is a reasonable TTL for pages where slightly stale content is acceptable. For e-commerce product pages or frequently updated content, use shorter TTLs or no-cache.
API Responses
Most API responses should not be cached at all if they return personalized or session-specific data. Use:
Cache-Control: private, no-store
For public API responses that return slowly changing data, short TTLs are appropriate:
Cache-Control: public, max-age=60
One minute prevents a thundering-herd scenario when many users request the same data simultaneously, without making the data meaningfully stale.
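The per-type rules in this section can be collected into one small policy function. The sketch below is only an illustration under stated assumptions: the `/api/` prefix, the hashed-filename pattern (a hex digest of at least eight characters before the extension), and the `personalized` flag are hypothetical conventions, not something any framework mandates.

```python
import re

# Assumed build convention: app.3f9a1c2b.js, i.e. a hex content hash of
# 8+ characters between the base name and the extension.
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(?:js|css|woff2?|png|jpe?g|svg|webp)$")


def cache_control_for(path: str, personalized: bool = False) -> str:
    """Choose a Cache-Control value for a response based on its URL path."""
    if personalized:
        # Per-user data must never land in a cache.
        return "private, no-store"
    if HASHED_ASSET.search(path):
        # The URL changes whenever the content changes, so cache for a year.
        return "public, max-age=31536000, immutable"
    if path.startswith("/api/"):
        # Public, slowly changing data: a short TTL absorbs request bursts.
        return "public, max-age=60"
    # HTML and unhashed files: always revalidate before reuse.
    return "no-cache"


for p in ("/assets/app.3f9a1c2b.js", "/index.html", "/api/prices", "/assets/main.js"):
    print(p, "->", cache_control_for(p))
```

Note that an unhashed file such as /assets/main.js falls through to no-cache rather than receiving a long TTL, because nothing in its URL would change on the next deployment.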

ETag and Last-Modified: Validation Caching
When a cached resource becomes stale, the browser does not always need to download the entire resource again. Validation headers let it ask the server: "Has this changed?"
ETag is a fingerprint of the response content. The server generates it and includes it in the initial response. When the browser makes a conditional request with If-None-Match: "etag-value", the server compares the ETag to the current version. If it matches, the server returns 304 Not Modified with no body, and the browser extends the cached version's freshness. If it has changed, the server returns the new content.
Last-Modified works similarly but uses a timestamp. The browser sends If-Modified-Since with the stored timestamp, and the server returns 304 if the content has not changed since then.
ETags are more reliable because they detect content changes that happen within the same second, which timestamps cannot distinguish. Most web frameworks and servers generate ETags automatically. Nginx generates ETags by default for static files.
Validation caching does not eliminate a round trip to the server, but it does eliminate re-downloading unchanged content. For large assets that change occasionally, this is a meaningful performance improvement.
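The conditional-request exchange can be sketched from the server's side in a few lines. This is a simplified illustration, not how any particular server implements it: `make_etag` and `respond` are hypothetical names, and deriving the ETag from a truncated SHA-256 of the body is one assumed choice among many (servers often use file size and modification time instead).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (assumed scheme: SHA-256 prefix)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # unchanged: headers only, no body
    return 200, headers, body


status, headers, payload = respond(b"hello", None)                 # first visit
status_again, _, payload_again = respond(b"hello", headers["ETag"])  # revalidation
print(status, len(payload), status_again, len(payload_again))
```

The second call still costs a round trip, but the payload is empty, which is the saving validation caching provides.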
Stale-While-Revalidate: Serving Fast Without Going Stale
The stale-while-revalidate directive is one of the more useful additions to the caching toolbox. It allows a cache to serve a stale response immediately while fetching a fresh copy in the background.
Cache-Control: public, max-age=60, stale-while-revalidate=300
This tells caches to treat the response as fresh for 60 seconds. For the 300 seconds after that, a cache may serve the stale response immediately while it revalidates in the background. Once the response is more than 360 seconds old, the cache must revalidate or re-fetch before responding. Tolerating a failed origin fetch is the job of a separate directive, stale-if-error.
The user gets instant delivery. The cache gets updated. This is the right pattern for resources that should be reasonably fresh but where the occasional slightly stale response is acceptable: news feeds, recommendation engines, frequently accessed public APIs.
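As a worked example of the timing, the sketch below classifies a cached response by its age, reading the stale-while-revalidate value as a window that begins when freshness ends, which is how RFC 5861 defines it. The function name `swr_state` and the three state labels are illustrative only.

```python
def swr_state(age: int, max_age: int, stale_while_revalidate: int) -> str:
    """Classify a cached response by its age in seconds.

    "fresh":                  serve from cache, no origin contact
    "stale-while-revalidate": serve from cache now, refresh in the background
    "expired":                revalidate or re-fetch before responding
    """
    if age < max_age:
        return "fresh"
    if age < max_age + stale_while_revalidate:
        return "stale-while-revalidate"
    return "expired"


# Cache-Control: public, max-age=60, stale-while-revalidate=300
for age in (30, 60, 359, 360):
    print(age, swr_state(age, max_age=60, stale_while_revalidate=300))
```

The window in which a stale copy may be served therefore runs from an age of 60 seconds up to, but not including, 360 seconds.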
CDN Caching Behavior
CDNs cache independently of browser caches. A CDN node caches your response based on Cache-Control directives, particularly s-maxage. When you deploy a new version of an asset, the CDN edge nodes may still serve the old version until their TTL expires or until you purge them explicitly.
Most CDNs offer a cache purge API. Cloudflare provides URL-based and tag-based purging. Tag-based purging is more powerful: you tag responses at cache time with a content identifier, then purge all responses with that tag at once.
If you cannot purge immediately after deployment, keep CDN TTLs shorter for HTML pages and rely on content-addressed URLs for static assets. This approach gives you aggressive CDN caching for assets with deterministic URLs while keeping HTML fresh enough that users see updated asset references after deployments.
Cache Invalidation: The Hard Part
Cache invalidation is genuinely difficult because a cached response exists independently of the origin. Once a browser or CDN stores a response with max-age=86400, that response will be served for 24 hours regardless of what happens on the server.
The only reliable invalidation strategies:
Change the URL. Content-addressed filenames are the most robust approach. When the content changes, the URL changes, so the old cached response is never requested again.
Use short TTLs for things that change. If you cannot use content-addressed URLs, keep the TTL short enough that stale content does not cause significant problems.
Purge the CDN cache programmatically. In a deployment pipeline, trigger a cache purge as part of the deploy step. Combine this with short browser TTLs to limit the stale window.
Do not rely on purging browser caches. You cannot clear a user's browser cache from your server. If a user has a stale resource cached in their browser, your only tools are a URL change or TTL expiry.
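The first strategy, changing the URL, is what build tools do when they emit hashed filenames. A minimal sketch of the idea follows; the helper name `hashed_name`, the eight-character digest length, and the use of SHA-256 are assumptions for illustration, and real bundlers choose their own hash and naming scheme.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the extension: main.js -> main.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old = hashed_name("assets/main.js", b"console.log('v1');")
new = hashed_name("assets/main.js", b"console.log('v2');")
print(old)
print(new)
```

Because the two versions get different URLs, a copy of the old file cached for a year is harmless: once the HTML references the new name, the old URL is simply never requested again.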
The 137Foundry team at our web development services practice builds deployment pipelines that include cache purge steps as a standard part of the release process, which eliminates the gap between deployment and CDN propagation.

Five Caching Mistakes That Hurt Performance
1. No Cache-Control header on static assets. Without a directive, browsers fall back to heuristic caching based on the Last-Modified date, commonly treating the response as fresh for about 10 percent of the time elapsed since that date. This is unpredictable and conservative. Set explicit headers.
2. Using a long max-age on resources without content-addressed URLs. If you cache main.js for a year without a hash in the filename, you have no way to invalidate the copies already sitting in browser caches when you deploy updates. Users may receive the old version for up to a year.
3. Caching API responses that contain personalized data. If an endpoint returns different data per user but you cache it as public, users will receive each other's data. Always use private or no-store for personalized responses.
4. Setting identical cache directives for HTML and assets. HTML documents reference versioned assets. If the HTML is cached as aggressively as the assets it references, users will not pick up updated asset URLs after a deployment. Cache HTML separately with shorter TTLs.
5. Ignoring Vary headers on content-negotiated responses. If your server returns different content based on Accept-Encoding or Accept-Language, include Vary: Accept-Encoding or the appropriate header. Without it, a cache may serve a gzip-compressed response to a client that requested uncompressed content.
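Mistake 5 is easier to see once you picture how a cache builds its lookup key. The sketch below is a simplified illustration, assuming a cache that keys on method, URL, and the request headers named in Vary; `cache_key` is a hypothetical helper, and real caches also normalize header values and treat `Vary: *` specially, which is not handled here.

```python
from typing import Dict, Tuple


def cache_key(method: str, url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Build a lookup key that includes every request header listed in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    selected = tuple((name, lowered.get(name, "")) for name in names)
    return (method.upper(), url, selected)


gzip_client = {"Accept-Encoding": "gzip"}
plain_client = {"Accept-Encoding": "identity"}

# With Vary: Accept-Encoding the two clients get separate cache entries.
with_vary = cache_key("GET", "/app.css", gzip_client, "Accept-Encoding") != \
    cache_key("GET", "/app.css", plain_client, "Accept-Encoding")

# Without Vary they collide, so one client can receive the other's variant.
without_vary = cache_key("GET", "/app.css", gzip_client, "") == \
    cache_key("GET", "/app.css", plain_client, "")

print(with_vary, without_vary)
```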
Technical SEO and Caching
HTTP caching also affects technical SEO. Largest Contentful Paint is a Core Web Vital, and Time to First Byte, while not itself a Core Web Vital, feeds directly into it. Properly cached resources improve both for returning visitors, and a CDN cache hit improves them for first-time visitors as well. Google assesses Core Web Vitals from real-user field data rather than from its crawler, and uses them as one page experience signal among many in ranking.
For a deeper look at integrating caching with your broader web development workflow, the MDN Web Docs on HTTP caching and Google's web.dev caching guide are the most comprehensive and up-to-date references available.
Summary
HTTP caching is a high-leverage performance improvement that requires careful configuration rather than clever code. The principles are stable: use long TTLs with content-addressed URLs for immutable assets, short TTLs or no-cache for HTML and dynamic content, and private or no-store for anything personalized.
Get the headers right, pair them with a deployment process that handles cache invalidation, and your application will deliver faster for both first-time and returning visitors.
