Too Many Requests? Here’s How to Fix That!

 



All About Rate Limiting and How to Implement It

In today’s digital landscape, APIs, web services, and applications are under constant demand from users, clients, and sometimes even malicious actors. That’s where rate limiting becomes a critical tool — protecting your system from abuse, ensuring fair usage, and enhancing security.

Let’s dive deep into what rate limiting is, why it matters, and how you can implement it effectively.


💡 What is Rate Limiting?

Rate limiting is the process of controlling the number of requests a user or client can make to a system in a given time frame. It helps manage load, prevent abuse, and protect backend resources.

Example:

A user can make 100 requests per minute; any additional requests are rejected or delayed until the next window.


🔒 Why is Rate Limiting Important?

  1. Prevents API Abuse: Protects services from being overwhelmed by too many requests (intentional or accidental).
  2. Mitigates DoS/DDoS Attacks: Helps absorb Denial-of-Service and Distributed Denial-of-Service attacks by capping how fast any one source can send requests.
  3. Ensures Fair Usage: Prevents a single user or client from consuming disproportionate resources.
  4. Improves Performance: Reduces server load and increases overall stability.
  5. Cost Control: Helps cloud-based services manage usage billing by throttling excess usage.

📊 Common Rate Limiting Strategies

  • Fixed Window: Limits requests per fixed time window (e.g., 100 requests per minute).
  • Sliding Window: Like fixed window, but with a rolling time frame, providing smoother limits.
  • Token Bucket: Tokens are added at a fixed rate, and each request consumes one token. Bursts are allowed while tokens remain.
  • Leaky Bucket: Requests enter a queue and are processed at a fixed rate, smoothing out traffic spikes.
  • Concurrent Limiting: Restricts the number of simultaneous in-flight requests (useful for real-time systems).
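To make one of these strategies concrete, here is a minimal in-memory token bucket sketch. The `TokenBucket` class and its method names are our own, purely for illustration — production systems would use a battle-tested library and a shared store.

```javascript
// Minimal token bucket sketch: tokens refill at a fixed rate,
// each request consumes one token, bursts are allowed while
// tokens remain in the bucket.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;         // maximum burst size
    this.tokens = capacity;           // start with a full bucket
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  // Add the tokens earned since the last check, capped at capacity.
  refill() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
  }

  // Returns true if a request may proceed (one token consumed).
  tryRemoveToken() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// A bucket holding 3 tokens, refilling 1 token per second:
const bucket = new TokenBucket(3, 1);
console.log(bucket.tryRemoveToken()); // true
console.log(bucket.tryRemoveToken()); // true
console.log(bucket.tryRemoveToken()); // true
console.log(bucket.tryRemoveToken()); // false (empty until refill)
```

The leaky bucket is the mirror image of this: instead of letting bursts drain stored tokens, it queues requests and releases them at a constant rate.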

🛠 How to Implement Rate Limiting

1. Using Middleware (for Web Apps/APIs)

Example (Node.js + Express + express-rate-limit):

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // limit each IP to 100 requests per window
  message: 'Too many requests, please try again later.',
});

app.use('/api/', limiter);

 Explanation:

  • require('express-rate-limit'): This imports the express-rate-limit middleware library.
  • windowMs: 60 * 1000: This sets a time window of 1 minute (60,000 milliseconds). All counting happens within this window.
  • max: 100: This allows a maximum of 100 requests per IP address within that 1-minute window.
  • message: The response sent when the limit is crossed. You can also send JSON or a custom status code.
  • app.use('/api/', limiter): This applies the rate limiting only to routes starting with /api/. So for example, /api/users or /api/products will be rate limited, but /home will not be.
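To see what a middleware like this does under the hood, here is a dependency-free fixed-window sketch. The `simpleRateLimit` function is our own illustrative stand-in, not the library's implementation — express-rate-limit handles many more edge cases (proxies, stores, headers).

```javascript
// Dependency-free fixed-window limiter sketch, in the shape of an
// Express middleware: one counter per IP, reset every windowMs.
function simpleRateLimit({ windowMs, max }) {
  const hits = new Map(); // ip -> { count, windowStart }

  return function (req, res, next) {
    const now = Date.now();
    const entry = hits.get(req.ip);

    if (!entry || now - entry.windowStart >= windowMs) {
      // First request from this IP, or the old window expired:
      // start a fresh window.
      hits.set(req.ip, { count: 1, windowStart: now });
      return next();
    }

    entry.count += 1;
    if (entry.count > max) {
      // Over the limit: reject with 429 instead of calling next().
      res.statusCode = 429;
      return res.end('Too many requests, please try again later.');
    }
    next();
  };
}
```

Note the trade-off this exposes: counters live in this process's memory, which is exactly why the next section reaches for Redis in multi-server setups.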

2. Using Redis (Distributed Systems)

Example (Node.js + Redis + rate-limit-redis):

const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
const Redis = require('ioredis');

const redisClient = new Redis(); // connects to localhost:6379 by default

const limiter = rateLimit({
  store: new RedisStore({
    // Forward raw commands to the shared Redis instance
    sendCommand: (...args) => redisClient.call(...args),
  }),
  windowMs: 60 * 1000,
  max: 100,
});

 Explanation:

  • require('rate-limit-redis'): This lets us use Redis as the backend store for tracking request counts across multiple servers.
  • store: new RedisStore({...}): Instead of in-memory storage (which is default), we store rate-limiting counters in Redis, so it works across a cluster or load-balanced setup.
  • sendCommand: This connects your Redis client with the rate limiter so that it can store/retrieve counters.
  • windowMs and max: Same logic as above — max 100 requests per minute per IP.

Why Redis?: In-memory limits only work on one server. In production with multiple instances, Redis ensures all servers share the same rate limit state.


3. NGINX Rate Limiting (Server Level Control)

NGINX example:

limit_req_zone $binary_remote_addr zone=mylimit:10m rate=10r/s;

server {
    location /api/ {
        limit_req zone=mylimit burst=20 nodelay;
    }
}

 Explanation:

  • limit_req_zone: Defines a rate-limiting zone.

    • $binary_remote_addr: Tracks rate limits per client IP.
    • zone=mylimit:10m: Names the zone mylimit and allocates 10 MB of memory for storing IP counters.
    • rate=10r/s: Allows 10 requests per second per IP.
  • limit_req: This enforces the rate limit inside your server block or location block.

    • burst=20: Allows up to 20 extra requests to queue beyond the configured rate before NGINX starts rejecting them.
    • nodelay: Processes queued burst requests immediately rather than spacing them out at the configured rate.

 This is a very efficient and high-performance way to rate limit before traffic even hits your backend code!
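One refinement worth knowing: by default NGINX answers rejected requests with 503 rather than 429. The limit_req_status directive (available since NGINX 1.3.15) lets you align the response with the HTTP status recommended for rate limiting:

```nginx
server {
    location /api/ {
        limit_req zone=mylimit burst=20 nodelay;
        # Return 429 Too Many Requests instead of the default 503
        limit_req_status 429;
    }
}
```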


 Best Practices

  • Custom Rate Limits per User Role: e.g., Free users: 50 req/min, Premium: 500 req/min.
  • Return Proper HTTP Status Codes: Use 429 Too Many Requests for rate limit violations.
  • Inform Clients: Include headers like:
    X-RateLimit-Limit: 100
    X-RateLimit-Remaining: 10
    X-RateLimit-Reset: 60
    
  • Graceful Degradation: Let users retry after a short delay or offer a backoff strategy.
  • Monitor & Alert: Use dashboards/logs to track rate limit hits or abuse attempts.
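The header and status-code practices above can be sketched in a small Express-style helper. The function name and the header values here are illustrative only; the X-RateLimit-* names follow a widely used convention rather than a formal standard.

```javascript
// Attach rate-limit headers to a response, and flip to 429 with a
// Retry-After hint once the client has no remaining requests.
function sendRateLimitHeaders(res, { limit, remaining, resetSeconds }) {
  res.setHeader('X-RateLimit-Limit', limit);         // total allowed per window
  res.setHeader('X-RateLimit-Remaining', remaining); // requests left
  res.setHeader('X-RateLimit-Reset', resetSeconds);  // seconds until reset
  if (remaining <= 0) {
    res.statusCode = 429; // Too Many Requests
    res.setHeader('Retry-After', resetSeconds); // backoff hint for clients
  }
}
```

A well-behaved client can read these headers and back off before ever hitting the limit, which is exactly the graceful degradation described above.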

📈 Real-World Use Cases

  • Twitter API: Limits tweets/reads per user per time period.
  • GitHub API: Restricts API calls to prevent spamming.
  • Payment Gateways: Rate limits on transactions to prevent fraud or misuse.

🔚 Conclusion

Rate limiting is a simple yet powerful technique that protects systems, improves performance, and ensures fairness. Whether you’re building a public API, a private service, or a scalable SaaS product, rate limiting should be part of your security and infrastructure strategy.
