Retry Strategy¶

Retry strategies determine how the library handles failed requests. The fastreq library implements exponential backoff with jitter to balance resilience with efficiency.

Exponential Backoff Algorithm¶

Exponential backoff is a standard technique for handling transient failures by increasing the wait time between retries.

The Algorithm¶

The retry strategy calculates delay using this formula:

base_delay = backoff_multiplier × (2^attempt)
delay = base_delay ± (jitter × base_delay)

Where:
- attempt: Retry attempt number (0-indexed)
- backoff_multiplier: Base delay in seconds (default: 1.0)
- jitter: Random variation as fraction of base delay (default: 10%)

Delay Calculation¶

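import random

def _calculate_delay(self, attempt: int) -> float:
    # Exponential growth: 1x, 2x, 4x, ... of the base delay
    base_delay = self.config.backoff_multiplier * (2 ** attempt)
    # Jitter is a fraction of the base delay, applied in both directions
    jitter_amount = self.config.jitter * base_delay
    jittered_delay = base_delay + random.uniform(-jitter_amount, jitter_amount)
    # Clamp so the delay is never negative
    return float(max(0, jittered_delay))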

Example Calculations¶

With backoff_multiplier=1.0 and jitter=0.1 (10%):

Attempt	Base Delay	Jitter Range	Possible Delay
0	1.0s	±0.10s	0.90s - 1.10s
1	2.0s	±0.20s	1.80s - 2.20s
2	4.0s	±0.40s	3.60s - 4.40s
3	8.0s	±0.80s	7.20s - 8.80s
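The rows of this table can be reproduced with a small standalone helper. This is a minimal sketch rather than the library's code: `delay_bounds` is a hypothetical name, and it assumes the same formula as `_calculate_delay` shown earlier.

```python
def delay_bounds(
    attempt: int,
    backoff_multiplier: float = 1.0,
    jitter: float = 0.1,
) -> tuple[float, float, float]:
    """Return (base, lowest, highest) delay in seconds for a 0-indexed attempt."""
    base = backoff_multiplier * (2 ** attempt)
    spread = jitter * base  # jitter is a fraction of the base delay
    return base, max(0.0, base - spread), base + spread


for attempt in range(4):
    base, low, high = delay_bounds(attempt)
    print(f"attempt {attempt}: base {base:.1f}s, range {low:.2f}s - {high:.2f}s")
```

Changing `backoff_multiplier` scales every row by the same factor, while `jitter` only widens or narrows the range around each base delay.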

Why Exponential Backoff?¶

Benefits¶

1. Reduces Server Load
   - Failed requests are retried with increasing delays
   - Gives failing service time to recover
   - Prevents immediate retry storms

2. Transient Failure Handling
   - Network glitches often resolve quickly
   - Temporary overload clears with time
   - Database locks release

3. Resource Efficiency
   - Fewer retries on persistent failures
   - Faster failure detection
   - Better use of limited resources

Alternative Strategies¶

Strategy	Advantages	Disadvantages
Exponential Backoff	Balances speed and load	Can be slow for many retries
Linear Backoff	Predictable delay	Doesn't adapt quickly
Fixed Delay	Simple	Inefficient for many failures
No Backoff	Fastest	Overwhelms failing services

Comparison Example¶

Request that fails 3 times:

Fixed 1-second delay:

Attempt 0: Fail → Wait 1s → Retry
Attempt 1: Fail → Wait 1s → Retry
Attempt 2: Fail → Wait 1s → Retry
Total wait: 3 seconds

Exponential backoff (backoff_multiplier=1.0, 10% jitter):

Attempt 0: Fail → Wait ~1s → Retry
Attempt 1: Fail → Wait ~2s → Retry
Attempt 2: Fail → Wait ~4s → Retry
Total wait: ~7 seconds

Exponential backoff waits longer in total (about 7 seconds versus 3) but spaces later retries further apart, which gives a failing service more time to recover.
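The two totals in this comparison follow directly from summing the per-attempt waits. The sketch below ignores jitter, and both function names are hypothetical, chosen only for illustration.

```python
def total_wait_fixed(failures: int, delay: float = 1.0) -> float:
    """Total wait when every failure is followed by the same delay."""
    return failures * delay


def total_wait_exponential(failures: int, backoff_multiplier: float = 1.0) -> float:
    """Total wait when the delay doubles after each failure (jitter ignored)."""
    return sum(backoff_multiplier * (2 ** attempt) for attempt in range(failures))


print(total_wait_fixed(3))        # 3.0
print(total_wait_exponential(3))  # 7.0
```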

Jitter (Random Variation)¶

Jitter adds randomness to retry delays to prevent synchronization issues.

Why Jitter Is Needed¶

Without jitter, multiple clients might retry at the same time:

Without Jitter:
Client A: [Retry]────────────→[Retry]────────────→[Retry]
Client B: [Retry]────────────→[Retry]────────────→[Retry]
Client C: [Retry]────────────→[Retry]────────────→[Retry]
           ↑ All retry at same time

With Jitter:
Client A: [Retry]──────→[Retry]─────────→[Retry]
Client B: [Retry]────────→[Retry]───────→[Retry]
Client C: [Retry]─────→[Retry]──────────→[Retry]
           ↑ Distributed over time

Jitter Formula¶

jitter_amount = base_delay * jitter_fraction
random_delay = base_delay + random.uniform(-jitter_amount, +jitter_amount)

With jitter=0.1 (10%), delays vary by ±10% around the base.
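To see the ±10% band in practice, the formula can be sampled many times. This is a sketch under stated assumptions: `jittered_delay` is a hypothetical standalone version of the formula above, and the random generator is seeded only so that the run is repeatable.

```python
import random


def jittered_delay(base_delay: float, jitter_fraction: float, rng: random.Random) -> float:
    """Apply symmetric jitter to a base delay, never returning a negative value."""
    jitter_amount = base_delay * jitter_fraction
    return max(0.0, base_delay + rng.uniform(-jitter_amount, jitter_amount))


rng = random.Random(42)  # fixed seed for a repeatable demonstration
samples = [jittered_delay(4.0, 0.1, rng) for _ in range(1000)]
print(f"min {min(samples):.2f}s, max {max(samples):.2f}s")
```

Every sample stays between 3.6s and 4.4s, which matches the attempt 2 row of the example calculations.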

Thundering Herd Problem¶

The thundering herd occurs when many clients retry a failing service simultaneously:

┌─────────────────────────────────────────────────┐
│              Failing Service                    │
│                                                │
│  Time 0s:  [████████████] 100 requests        │
│           Service crashes!                      │
│                                                │
│  Time 1s:  [████████████] 100 clients retry   │
│           Service still overwhelmed            │
│                                                │
│  Time 2s:  [████████████] 100 clients retry   │
│           Service never recovers               │
└─────────────────────────────────────────────────┘

With jitter, retries are distributed:

Time 1s:  [██] 10 clients retry
Time 1.1s:[██] 10 clients retry
Time 1.2s:[██] 10 clients retry
...
Time 2s:  [██] Remaining clients retry

Service has time to recover between retry waves
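A short simulation makes the difference concrete. This is an illustrative sketch, not a model of any real service: it assumes 100 clients whose first retry has a 1s base delay, groups retry times into 50ms buckets, and reports the busiest bucket. `peak_retries` is a hypothetical helper.

```python
import random
from collections import Counter


def peak_retries(delays: list[float], bucket_seconds: float = 0.05) -> int:
    """Largest number of retries that land in the same time bucket."""
    buckets = Counter(int(delay / bucket_seconds) for delay in delays)
    return max(buckets.values())


rng = random.Random(0)  # fixed seed for a repeatable demonstration
clients = 100
base_delay = 1.0

no_jitter = [base_delay] * clients
with_jitter = [base_delay + rng.uniform(-0.1, 0.1) for _ in range(clients)]

print("peak without jitter:", peak_retries(no_jitter))  # all 100 clients share one bucket
print("peak with jitter:   ", peak_retries(with_jitter))
```

Without jitter, every client lands in the same bucket. With 10% jitter, the same 100 retries are spread over a 200ms window, so the busiest bucket holds only a fraction of them.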

Retry Logic Flow¶

Decision Process¶

Request Failed
      │
      ▼
┌─────────────────────────────┐
│  Should retry?              │
│  - Check dont_retry_on      │
│  - Check retry_on           │
└─────────────────────────────┘
      │
      ├──► No → Raise error
      │
      ▼
┌─────────────────────────────┐
│  Attempts < max_retries?    │
└─────────────────────────────┘
      │
      ├──► No → Raise RetryExhaustedError
      │
      ▼
┌─────────────────────────────┐
│  Calculate delay             │
│  - Exponential backoff      │
│  - Add jitter                │
└─────────────────────────────┘
      │
      ▼
┌─────────────────────────────┐
│  Wait for delay              │
└─────────────────────────────┘
      │
      ▼
┌─────────────────────────────┐
│  Retry request               │
└─────────────────────────────┘
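The flow above can be sketched as a single loop. This is a minimal, synchronous illustration under stated assumptions, not the library's implementation: `run_with_retries` is a hypothetical name, `RetryExhaustedError` is defined locally as a stand-in, the first request is assumed not to count against `max_retries`, and the `sleep` parameter exists only so the example can run without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryExhaustedError(Exception):
    """Local stand-in for the library's error of the same name."""


def run_with_retries(
    request: Callable[[], T],
    max_retries: int = 3,
    backoff_multiplier: float = 1.0,
    jitter: float = 0.1,
    should_retry: Callable[[Exception], bool] = lambda error: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    attempt = 0
    while True:
        try:
            return request()
        except Exception as error:
            if not should_retry(error):
                raise  # non-retryable: surface the original error
            if attempt >= max_retries:
                raise RetryExhaustedError(f"gave up after {attempt} retries") from error
            base = backoff_multiplier * (2 ** attempt)
            delay = max(0.0, base + random.uniform(-jitter * base, jitter * base))
            sleep(delay)  # wait, then loop around to retry the request
            attempt += 1


attempts = {"count": 0}


def flaky_request() -> str:
    """Fail twice with a transient error, then succeed."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("transient failure")
    return "ok"


waits: list[float] = []
print(run_with_retries(flaky_request, sleep=waits.append))  # ok
print(len(waits))  # 2
```

Recording the delays in a list instead of sleeping keeps the example fast while still exercising the backoff calculation.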

Selective Retry Logic¶

The library supports selective retrying:

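from dataclasses import dataclass

@dataclass
class RetryConfig:
    max_retries: int = 3              # Retries allowed after a failure
    backoff_multiplier: float = 1.0   # Base delay in seconds
    jitter: float = 0.1               # Fraction of the base delay
    retry_on: set[type[Exception]] | None = None       # If set, retry only these
    dont_retry_on: set[type[Exception]] | None = None  # Never retry these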

Retry on specific errors:

config = RetryConfig(
    retry_on={TimeoutError, ConnectionError},
    # Only retry timeout and connection errors
)

Don't retry specific errors:

config = RetryConfig(
    dont_retry_on={AuthenticationError, PermissionError},
    # Don't retry auth/permission errors
)

Retry Decision Logic¶

def _should_retry(self, error: Exception) -> bool:
    # Never retry if error is in dont_retry_on
    if self.config.dont_retry_on and isinstance(
        error, tuple(self.config.dont_retry_on)
    ):
        return False

    # Only retry if error is in retry_on (if specified)
    if self.config.retry_on:
        return isinstance(error, tuple(self.config.retry_on))

    # Default: retry all errors
    return True
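Two details of this logic are easy to miss: `dont_retry_on` takes precedence over `retry_on`, and `isinstance` also matches subclasses. The sketch below mirrors the method as a standalone function so that it can be run directly. The trimmed `RetryConfig` and the `AuthenticationError` class are local stand-ins, not the library's definitions.

```python
from dataclasses import dataclass
from typing import Optional, Set, Type


class AuthenticationError(Exception):
    """Hypothetical stand-in for the library's authentication error."""


@dataclass
class RetryConfig:
    retry_on: Optional[Set[Type[Exception]]] = None
    dont_retry_on: Optional[Set[Type[Exception]]] = None


def should_retry(config: RetryConfig, error: Exception) -> bool:
    # The exclusion list is checked first, so it always wins
    if config.dont_retry_on and isinstance(error, tuple(config.dont_retry_on)):
        return False
    # With an allow list, anything outside it is not retried
    if config.retry_on:
        return isinstance(error, tuple(config.retry_on))
    # No lists configured: retry everything
    return True


config = RetryConfig(retry_on={OSError}, dont_retry_on={PermissionError})
print(should_retry(config, TimeoutError()))         # True: TimeoutError subclasses OSError
print(should_retry(config, PermissionError()))      # False: excluded despite being an OSError
print(should_retry(config, AuthenticationError()))  # False: not in retry_on
```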

Configuration Examples¶

Default Configuration¶

# Default: 3 retries, 1s base delay, 10% jitter
client = FastRequests(max_retries=3)

Retry delays (with 10% jitter):
- Attempt 0: ~1s wait
- Attempt 1: ~2s wait
- Attempt 2: ~4s wait
- After 3 failures: raise error

Aggressive Retries¶

# More retries, same 1s base delay
client = FastRequests(
    max_retries=5,
    # Default backoff_multiplier=1.0
)

Retry delays:
- Attempt 0: ~1s
- Attempt 1: ~2s
- Attempt 2: ~4s
- Attempt 3: ~8s
- Attempt 4: ~16s

Conservative Retries¶

# Fewer retries, slower backoff
client = FastRequests(
    max_retries=2,
    backoff_multiplier=2.0,  # Slower backoff
)

Note: Jitter is not configurable in the current API (fixed at 10%).

Retry delays (with 10% jitter):
- Attempt 0: ~2s
- Attempt 1: ~4s

No Jitter (Not Recommended)¶

Jitter is currently fixed at 10% in the implementation. If you need to disable jitter, you would need to modify the RetryStrategy class:

# In RetryStrategy._calculate_delay():
# Original:
jittered_delay = base_delay + random.uniform(-jitter_amount, jitter_amount)

# Without jitter:
jittered_delay = base_delay

Warning: Disabling jitter can cause thundering herd problems.

Integration with Other Features¶

Retry and Rate Limiting¶

Retries respect rate limiting:

client = FastRequests(
    max_retries=3,
    rate_limit=10,
    rate_limit_burst=5,
)

Flow:
1. Request fails
2. Calculate retry delay (e.g., 1s)
3. Wait 1s
4. Acquire rate limit token
5. Retry request

Retry and Concurrency¶

Retries don't increase concurrency:

client = FastRequests(
    max_retries=3,
    concurrency=10,
)

If a request fails, the retry does not take an additional concurrency slot. The original slot is held for the whole retry sequence, including the backoff wait.
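One way to get this behavior is to acquire a semaphore once and keep the whole retry loop inside it. The sketch below uses `asyncio` and is an assumption about the general pattern, not the library's internals: `fetch_with_retries` and `demo` are hypothetical names, and the delays are shortened so the example finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable


async def fetch_with_retries(
    semaphore: asyncio.Semaphore,
    request: Callable[[], Awaitable[str]],
    max_retries: int = 3,
    base_delay: float = 0.01,  # shortened so the sketch runs quickly
) -> str:
    async with semaphore:  # one slot covers the first attempt and every retry
        for attempt in range(max_retries + 1):
            try:
                return await request()
            except ConnectionError:
                if attempt == max_retries:
                    raise
                # The slot is still held while this task sleeps
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable when max_retries >= 0")


async def demo() -> tuple[list[str], int]:
    semaphore = asyncio.Semaphore(2)
    active = 0  # requests currently in flight
    peak = 0    # highest value of `active` observed

    def flaky_request() -> Callable[[], Awaitable[str]]:
        calls = 0

        async def request() -> str:
            nonlocal calls, active, peak
            calls += 1
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.001)  # simulated network time
            active -= 1
            if calls == 1:
                raise ConnectionError("first attempt fails")
            return "ok"

        return request

    tasks = [fetch_with_retries(semaphore, flaky_request()) for _ in range(6)]
    results = await asyncio.gather(*tasks)
    return list(results), peak


results, peak = asyncio.run(demo())
print(results, "peak in flight:", peak)
```

Every request fails once and is retried, yet the number of requests in flight never exceeds the semaphore size. The trade-off is that a task waiting out its backoff blocks other queued requests from starting.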

Retry and Timeouts¶

Retries are separate from timeouts:

client = FastRequests(
    max_retries=3,
    timeout=5,  # Per-request timeout
)

Each retry attempt has a 5-second timeout
Total max time: 3 retries × (5s timeout + backoff delay)

Best Practices¶

Choosing Retry Settings¶

For API calls:

# API calls: moderate retries
FastRequests(
    max_retries=3,
    backoff_multiplier=1.0,
)

For long-running downloads:

# Downloads: fewer retries, longer timeout
FastRequests(
    max_retries=2,
    backoff_multiplier=2.0,
    timeout=30,
)

For unreliable networks:

# Unstable network: more retries
FastRequests(
    max_retries=5,
    backoff_multiplier=1.0,
)

Handling Specific Errors¶

Never retry authorization errors:

# Auth errors won't be fixed by retrying
# The library handles common non-retryable errors automatically

Always retry timeout errors:

# Timeouts are often transient
# Retried by default unless configured otherwise

Monitoring Retries¶

Enable debug logging to see retry behavior:

client = FastRequests(
    max_retries=3,
    debug=True,
)

Output:

Retry attempt 1/3, waiting 1.05s
Retry attempt 2/3, waiting 2.12s
Retry attempt 3/3, waiting 4.08s

Performance Considerations¶

Total Wait Time¶

With max_retries=3 and backoff_multiplier=1.0:

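Total wait time if every attempt fails (before jitter):
= 1s + 2s + 4s = 7 seconds (up to 7.7 seconds with +10% jitter)

With max_retries=5:

Total wait time if every attempt fails (before jitter):
= 1s + 2s + 4s + 8s + 16s = 31 seconds (up to 34.1 seconds with +10% jitter)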
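Because the delays double each time, the sum has a closed form: backoff_multiplier × (2^max_retries − 1). The sketch below computes that nominal total, plus the worst case when every delay receives the full positive jitter. `total_backoff` is a hypothetical helper, not part of the library.

```python
def total_backoff(
    max_retries: int,
    backoff_multiplier: float = 1.0,
    jitter: float = 0.1,
) -> tuple[float, float]:
    """Return (nominal, worst_case) total backoff in seconds."""
    # 2^0 + 2^1 + ... + 2^(n-1) equals 2^n - 1
    nominal = backoff_multiplier * (2 ** max_retries - 1)
    worst_case = nominal * (1 + jitter)  # every delay at the top of its jitter range
    return nominal, worst_case


for retries in (3, 5):
    nominal, worst = total_backoff(retries)
    print(f"max_retries={retries}: {nominal:.1f}s nominal, {worst:.1f}s worst case")
```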

Resource Usage¶

Retries hold resources during backoff:

Semaphore slot: Held during retry wait
Memory: Retry state is minimal (~32 bytes)
CPU: Minimal during sleep

Timeout vs Retry Timeout¶

Per-request timeout: How long to wait for response
Retry delay: How long to wait between retries

Example with timeout=5, max_retries=3:
┌──────┐ 5s   ┌──────┐ 1s   ┌──────┐ 2s   ┌──────┐ 4s
│ Req  │──────▶│Wait  │──────▶│Retry │──────▶│Wait  │─────▶...
└──────┘       └──────┘       └──────┘       └──────┘

Max time per URL: 3 × (5s timeout + avg 2.33s delay) = ~22s
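The estimate above treats the run as three cycles of one request plus one wait. If the first request is separate from the three retries, as the diagram suggests, there is one more timeout to account for. The sketch below computes both readings. `worst_case_elapsed` and its `count_initial_request` flag are hypothetical, and which reading applies depends on how the library counts attempts.

```python
def worst_case_elapsed(
    timeout: float,
    max_retries: int,
    backoff_multiplier: float = 1.0,
    jitter: float = 0.0,
    count_initial_request: bool = True,
) -> float:
    """Upper bound in seconds when every request times out."""
    requests = max_retries + (1 if count_initial_request else 0)
    backoff = backoff_multiplier * (2 ** max_retries - 1) * (1 + jitter)
    return requests * timeout + backoff


print(worst_case_elapsed(5, 3, count_initial_request=False))  # 22.0
print(worst_case_elapsed(5, 3))                               # 27.0
```

Passing `jitter=0.1` adds the extra 0.7s that the backoff can gain when every delay falls at the top of its jitter range.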

Troubleshooting¶

Too Many Retries¶

Problem: Application hanging due to excessive retries

Solution: Reduce max_retries:

FastRequests(max_retries=1)  # Only retry once

Retries Too Slow¶

Problem: Recoverable errors taking too long to retry

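Solution: Reduce backoff_multiplier. The configuration examples in this document pass it to the client constructor; if your version does not accept it there, a custom RetryConfig would be required:

FastRequests(max_retries=3, backoff_multiplier=0.5)  # Delays of ~0.5s, ~1s, ~2s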

Retries Not Working¶

Problem: Errors not being retried

Cause: Error might be non-retryable (e.g., authentication)

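Solution: Check the error type. The retry_on field of RetryConfig controls which errors are retried, but the client API does not currently expose it: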

# Current API doesn't expose retry_on configuration
# Future enhancement may allow this

Architecture - How retry integrates with other components
Rate Limiting - How retries interact with rate limiting
How-to: Handle Retries - Practical usage guide