Rate Limiting¶
Rate limiting controls how quickly requests are sent, to avoid overwhelming servers or exceeding API rate limits. The fastreq library uses the token bucket algorithm for efficient and flexible rate limiting.
Token Bucket Algorithm¶
The token bucket algorithm is a fundamental rate limiting technique that allows for both controlled request rates and burst capability.
How It Works¶
Imagine a bucket that holds tokens:

```
┌─────────────────────────────┐
│        Token Bucket         │
│                             │
│  Token  Token  Token  Token │  ← burst size (max tokens)
│                             │
│  Refill: 10 tokens/second   │
└─────────────────────────────┘
              │
              ▼
        Make request?
       Consume 1 token
```
Key Concepts¶
Bucket Size (Burst): Maximum number of tokens the bucket can hold.
Refill Rate: How quickly tokens are added to the bucket (tokens per second).
Token Consumption: Each request consumes one token.
Algorithm Details¶
```python
import time

class TokenBucket:
    def __init__(self, requests_per_second: float, burst: int):
        self.requests_per_second = requests_per_second
        self.burst = burst
        self._tokens = float(burst)  # Start full
        self._last_update = time.monotonic()

    def _refill_tokens(self):
        now = time.monotonic()
        elapsed = now - self._last_update
        # Add tokens based on elapsed time, capped at the burst size
        self._tokens = min(
            self.burst,
            self._tokens + elapsed * self.requests_per_second,
        )
        self._last_update = now
```
Token Refill¶
Tokens are refilled based on elapsed time since the last request:
If requests_per_second = 10 and burst = 5, with requests arriving at 20/second (2 per 0.1 s, i.e. twice the refill rate of 1 token per 0.1 s):

```
Time 0.0s: [█████] 5 tokens (full bucket)
Time 0.1s: [████▒] 4 tokens (2 consumed, 1 refilled: 0.1 × 10 = 1)
Time 0.2s: [███▒▒] 3 tokens
Time 0.3s: [██▒▒▒] 2 tokens
Time 0.4s: [█▒▒▒▒] 1 token
Time 0.5s: [▒▒▒▒▒] 0 tokens (bucket empty; new requests must wait)
Time 0.6s: [█▒▒▒▒] 1 token  (refilled: 0.1 × 10 = 1, if no new requests arrive)
```

Once the bucket is empty, throughput can no longer exceed the refill rate.
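As a quick check of the arithmetic, this sketch drains the TokenBucket from Algorithm Details and pokes at its private state purely for illustration (real callers should only use acquire()):

```python
import time

bucket = TokenBucket(requests_per_second=10, burst=5)
bucket._tokens = 0.0  # pretend the bucket was just drained

time.sleep(0.1)       # wait one refill interval
bucket._refill_tokens()
print(round(bucket._tokens, 1))  # ~1.0 token: 0.1 s × 10 tokens/s
```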
Acquiring Tokens¶
When a request needs to be made:
```python
# Method of TokenBucket (requires: import asyncio)
async def acquire(self, tokens: int = 1):
    while True:
        self._refill_tokens()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return  # Proceed with request
        # Not enough tokens, wait for refill
        wait_time = (tokens - self._tokens) / self.requests_per_second
        await asyncio.sleep(wait_time)
```
If tokens are available, they're consumed immediately. Otherwise, the request waits until enough tokens are refilled.
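A quick way to see this behavior is to drive acquire() in a loop (a sketch using the TokenBucket above): with a full bucket of 5 and a rate of 10/second, the first 5 calls return immediately and the rest arrive roughly 0.1 s apart.

```python
import asyncio
import time

async def main():
    bucket = TokenBucket(requests_per_second=10, burst=5)
    start = time.monotonic()
    for i in range(8):
        await bucket.acquire()
        print(f"request {i + 1} at {time.monotonic() - start:.2f}s")

asyncio.run(main())
```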
Burst Handling¶
Burst capability is a key advantage of the token bucket algorithm. It allows temporary spikes in request rate as long as the average rate stays within limits.
Burst Example¶
With rate_limit=10 and rate_limit_burst=5:
```
Rate: 10 requests/second
Burst: 5 tokens

Time 0.00s: Request 1 → 4 tokens left (████▒)
Time 0.01s: Request 2 → 3 tokens left (███▒▒)
Time 0.02s: Request 3 → 2 tokens left (██▒▒▒)
Time 0.03s: Request 4 → 1 token left  (█▒▒▒▒)
Time 0.04s: Request 5 → 0 tokens left (▒▒▒▒▒) ← Burst exhausted
Time 0.10s: Request 6 → 0 tokens left (the 1 token refilled since 0.00s is consumed at once)
Time 0.20s: Request 7 → 0 tokens left
Time 0.30s: Request 8 → 0 tokens left
```
After the burst is exhausted, requests are throttled to the refill rate (10 requests/second).
Why Burst Matters¶
- Without burst (fixed rate): maximum 10 requests/second, even if the server can handle more
- With burst: send 5 requests instantly, then 10 requests/second thereafter
Burst is useful for:

- Initial data fetching: get multiple pages quickly
- Recovering from downtime: catch up on queued work
- Flexible rate limits: some APIs allow short bursts
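The effect is easy to measure with the TokenBucket sketch above (timings are approximate):

```python
import asyncio
import time

async def time_first_requests(burst: int, n: int = 5) -> float:
    bucket = TokenBucket(requests_per_second=10, burst=burst)
    start = time.monotonic()
    for _ in range(n):
        await bucket.acquire()
    return time.monotonic() - start

async def main():
    print(await time_first_requests(burst=5))  # ~0.0s: all 5 covered by the burst
    print(await time_first_requests(burst=1))  # ~0.4s: 1 free, then 4 paced at 10/s

asyncio.run(main())
```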
Concurrency Control¶
The library uses a semaphore to control the maximum number of concurrent requests, separate from rate limiting.
Semaphore Operation¶
```python
import asyncio
from contextlib import asynccontextmanager


class AsyncRateLimiter:
    def __init__(self, config: RateLimitConfig):
        self._bucket = TokenBucket(
            config.requests_per_second,
            config.burst,
        )
        self._semaphore = asyncio.Semaphore(
            config.max_concurrency
        )

    @asynccontextmanager
    async def acquire(self):
        async with self._semaphore:       # Limit concurrency
            await self._bucket.acquire()  # Limit rate
            yield                         # Make request
```
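Used as an async context manager, a request is admitted only once both a semaphore slot and a token are held. A minimal usage sketch (send_request and config are placeholders):

```python
limiter = AsyncRateLimiter(config)  # config: a RateLimitConfig as above

async def fetch(url: str):
    async with limiter.acquire():       # waits for a slot and a token
        return await send_request(url)  # placeholder request coroutine
```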
Combined Controls¶
The rate limiter uses both mechanisms:
```
┌─────────────────────────────────────────────────────┐
│              Rate Limiting Controls                 │
│                                                     │
│  ┌──────────────┐         ┌──────────────────┐      │
│  │  Semaphore   │         │   Token Bucket   │      │
│  │              │         │                  │      │
│  │  Max: 20     │────────▶│   Rate: 10/s     │      │
│  │  Concurrent  │         │   Burst: 5       │      │
│  │  Requests    │         │                  │      │
│  └──────────────┘         └──────────────────┘      │
│         │                          │                │
│         ▼                          ▼                │
│  Limits simultaneous      Controls request rate     │
│  connections              (with burst capability)   │
└─────────────────────────────────────────────────────┘
```
Example: Rate Limit vs Concurrency¶
With rate_limit=10, rate_limit_burst=5, concurrency=20:
```
Scenario: 100 requests to make
Concurrency limit: 20 simultaneous connections
Rate limit: 10 requests/second average
Burst: 5 requests immediately, then 10/second

Timeline:
0.0s:        5 requests sent at once (burst)
0.1s–9.5s:   remaining 95 requests paced at 10/second
             (up to 20 connections in flight at a time,
              when responses are slow enough to overlap)
9.5s:        final request sent

Total time: ~10 seconds
```
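The ~10-second total falls out of a simple formula: every request past the initial burst is paced at the refill rate. A back-of-the-envelope helper (hypothetical, ignores network latency):

```python
def estimated_duration(total_requests: int, rate_limit: float, burst: int) -> float:
    """Lower bound on completion time: the burst is free, the rest is paced."""
    paced = max(0, total_requests - burst)
    return paced / rate_limit

print(estimated_duration(100, rate_limit=10, burst=5))  # 9.5 (seconds)
```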
Why Token Bucket vs Other Algorithms?¶
Comparison with Other Algorithms¶
| Algorithm | Burst Support | Complexity | Use Case |
|---|---|---|---|
| Token Bucket | ✓ | Low | General purpose, flexible |
| Leaky Bucket | ✗ | Medium | Network traffic shaping |
| Fixed Window | ✗ | Very Low | Simple rate limits |
| Sliding Window | Limited | High | Precise rate limiting |
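For contrast, here is a minimal fixed-window counter (not part of fastreq; a sketch shown only to illustrate the trade-off: simpler state, but all of a window's requests can land at once right after a boundary, so there is no burst smoothing):

```python
import time

class FixedWindowCounter:
    """Illustrative only: allows `limit` requests per window, no smoothing."""

    def __init__(self, limit: int, window_seconds: float = 1.0):
        self.limit = limit
        self.window = window_seconds
        self.count = 0
        self.window_start = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.window_start = now  # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # caller must wait for the next window
```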
Advantages of Token Bucket¶
1. Burst Capability
   - Allows temporary spikes
   - Suitable for real-world usage patterns
   - More flexible than fixed limits

2. Smooth Request Distribution
   - Unlike the leaky bucket (which regulates outflow), the token bucket regulates requests directly

3. Simplicity
   - Easy to understand and implement
   - Minimal state tracking (tokens, last refill time)
   - Low computational overhead

4. Predictable Behavior
   - Maximum burst size is known
   - Average rate is guaranteed
   - Easy to tune for specific requirements
When Token Bucket Is Not Ideal¶
- Precise per-second limits: Sliding window is more accurate
- Network traffic shaping: Leaky bucket is designed for this
- Distributed systems: Requires distributed coordination
Rate Limiting Use Cases¶
API Rate Limits¶
Many APIs enforce rate limits (e.g., GitHub: 5000 requests/hour):
```python
# GitHub API: 5,000 requests/hour ≈ 1.4 requests/second
client = FastRequests(
    rate_limit=1.4,
    rate_limit_burst=5,  # Allow bursts
    concurrency=10,
)
```
Preventing Server Overload¶
Protect your own servers from excessive requests:
```python
# Self-imposed limit: 100 requests/second
client = FastRequests(
    rate_limit=100,
    rate_limit_burst=20,
    concurrency=50,
)
```
Avoiding IP Blocking¶
When scraping, stay under the radar:
```python
# Conservative scraping: 1 request/second
client = FastRequests(
    rate_limit=1,
    rate_limit_burst=3,  # Small burst
    concurrency=5,
)
```
Cost Management¶
Some APIs charge per request:
```python
# Stay within budget: 10,000 requests/day ≈ 0.12 requests/second
client = FastRequests(
    rate_limit=0.12,
    rate_limit_burst=10,
)
```
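When an API publishes a quota rather than a rate, the conversion is just quota divided by period. A small hypothetical helper:

```python
def quota_to_rps(quota: int, period_seconds: float) -> float:
    """Convert an API quota (e.g., 5,000/hour) into requests per second."""
    return quota / period_seconds

print(quota_to_rps(5_000, 3_600))    # ≈ 1.39, the GitHub-style hourly limit
print(quota_to_rps(10_000, 86_400))  # ≈ 0.12, the daily budget above
```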
Practical Examples¶
Example 1: Basic Rate Limiting¶
```python
from fastreq import fastreq

# Limit to 5 requests/second, burst of 2
results = fastreq(
    urls=[url] * 50,
    rate_limit=5,
    rate_limit_burst=2,
    concurrency=10,
)
```
Behavior:

- First 2 requests execute immediately (burst)
- Remaining requests at ~5/second
- Up to 10 concurrent connections
Example 2: High-Burst Scenario¶
```python
# Large burst for initial fetch
results = fastreq(
    urls=[url] * 100,
    rate_limit=10,
    rate_limit_burst=50,  # Can send 50 immediately
    concurrency=50,
)
```
Behavior:

- First 50 requests execute immediately
- Remaining 50 at ~10/second
- All 100 complete in ~5 seconds
Example 3: Tight Rate Limiting¶
```python
# Strict API limit: 1 request/second
results = fastreq(
    urls=[url] * 10,
    rate_limit=1,
    rate_limit_burst=1,  # No real burst
    concurrency=3,
)
```
Behavior:

- 1 request per second
- Up to 3 requests queued/concurrent
- All 10 complete in ~10 seconds
Performance Considerations¶
Overhead¶
Rate limiting adds minimal overhead:
```
# Token bucket operations
_refill_tokens():  O(1)  # Simple arithmetic
available():       O(1)  # Simple arithmetic
acquire():         O(1)  # May wait, but the token check itself is constant time
```
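A quick sanity check of the overhead claim is to time the refill path directly (a sketch; absolute numbers vary by machine):

```python
import time

bucket = TokenBucket(requests_per_second=1_000_000, burst=1_000_000)

start = time.perf_counter()
for _ in range(100_000):
    bucket._refill_tokens()
elapsed = time.perf_counter() - start
print(f"{elapsed / 100_000 * 1e9:.0f} ns per refill")  # typically well under a microsecond
```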
Wait Time Calculation¶
When tokens are insufficient, wait time is calculated:
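```python
# Mirrors the calculation in acquire() above; the names are spelled out
# for readability (tokens_needed is `tokens`, tokens_available is
# `self._tokens` in the method itself).
wait_time = (tokens_needed - tokens_available) / requests_per_second
```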
For example, with 0 tokens available and a refill rate of 10 tokens/second:

- Need 1 token: wait 0.1 seconds
- Need 5 tokens: wait 0.5 seconds
Memory Usage¶
Rate limiting state is minimal:
- _tokens: one float
- _last_update: one float (timestamp)
- Core state per limiter: ~16 bytes (two floats, excluding Python object overhead)
Troubleshooting¶
Requests Slower Than Expected¶
Problem: Requests taking longer than rate_limit suggests
Possible Causes:

1. Network latency: rate limiting doesn't account for network time
2. Backend limitations: some backends have inherent overhead
3. Server processing time: the server may take time to process requests
Solution: Measure actual throughput and adjust accordingly.
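For example, a rough throughput measurement (a sketch; url and the request count are placeholders, and it assumes fastreq returns one result per URL):

```python
import time
from fastreq import fastreq

start = time.monotonic()
results = fastreq(urls=[url] * 50, rate_limit=10, rate_limit_burst=5)
elapsed = time.monotonic() - start
print(f"achieved ≈ {len(results) / elapsed:.1f} requests/second")
```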
Burst Not Working¶
Problem: Requests not being sent in bursts
Cause: rate_limit_burst too low or already exhausted
Solution: Increase rate_limit_burst or wait for refill:
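```python
# A sketch: raise the burst allowance so the initial batch is not throttled
# (urls is a placeholder; 20 is sized to cover the batch you expect to burst)
results = fastreq(
    urls=urls,
    rate_limit=10,
    rate_limit_burst=20,
)
```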
Concurrency Exceeded¶
Problem: More concurrent requests than concurrency setting
Cause: Rate limiting and concurrency are separate controls
Solution: Remember that both limits apply:
- concurrency: Max simultaneous connections
- rate_limit: Max requests per second
Related Documentation¶
- Architecture - How rate limiting integrates with other components
- Retry Strategy - How retries interact with rate limiting
- How-to: Limit Request Rate - Practical usage guide