Skip to content

Dependency Injection Guide for Duckalog Configuration Architecture

This guide provides a comprehensive overview of the dependency injection patterns in the Duckalog configuration architecture. The modular structure enables better testability, extensibility, and customization of configuration loading behavior.

Implementation Status

Some interfaces described in this guide (notably ConfigLoader and SQLFileLoader from duckalog.config.loading.base) represent the intended architecture but are not yet fully implemented in the current codebase. The EnvProcessor, ImportResolver, PathValidator, and PathResolver protocols are available in duckalog.config.resolution.base and duckalog.config.security.base. For SQL file loading, use duckalog.sql_file_loader.SQLFileLoader directly.

Table of Contents

  1. New Architecture Overview
  2. Dependency Injection Interfaces
  3. Usage Patterns
  4. Migration Examples
  5. Benefits of Dependency Injection

New Architecture Overview

The refactored configuration architecture is organized into modular components that follow dependency injection principles:

src/duckalog/config/
├── api.py                    # Public API orchestration
├── loading/                  # Configuration loading components
│   ├── __init__.py
│   └── sql.py               # SQL file processing
├── resolution/              # Configuration resolution components
│   ├── base.py              # Abstract base classes and protocols
│   ├── env.py               # Environment variable processing
│   └── imports.py           # Import resolution logic
├── security/                # Path security and validation
│   ├── base.py              # Abstract base classes
│   └── path.py              # Path resolution and validation
└── models.py                # Configuration data models

Key Design Principles

  • Separation of Concerns: Each module handles a specific aspect of configuration processing
  • Dependency Injection: Components depend on abstractions, not concrete implementations
  • Backward Compatibility: Existing code continues to work without changes
  • Extensibility: Easy to customize behavior through implementation injection

Dependency Injection Interfaces

Loading Interfaces (config.loading.base)

ConfigLoader

Abstract base class for loading configuration data from various sources:

from abc import ABC, abstractmethod
from typing import Any, Optional, Union
from pathlib import Path

class ConfigLoader(ABC):
    """Abstract base class for configuration loaders."""

    @abstractmethod
    def load(
        self, path: Union[str, Path], filesystem: Optional[Any] = None
    ) -> dict[str, Any]:
        """Load configuration from a source."""
        pass

SQLFileLoader

Abstract base class for loading SQL content from files:

class SQLFileLoader(ABC):
    """Abstract base class for SQL file loaders."""

    @abstractmethod
    def load_sql(self, path: Union[str, Path], filesystem: Optional[Any] = None) -> str:
        """Load SQL content from a file."""
        pass

Resolution Interfaces (config.resolution.base)

EnvProcessor (Protocol)

Interface for processing environment variables and .env files:

from typing import Protocol, Any

@runtime_checkable
class EnvProcessor(Protocol):
    """Interface for environment variable processors."""

    def process(
        self, config_data: dict[str, Any], load_dotenv: bool = True
    ) -> dict[str, Any]:
        """Process environment variables and .env files."""
        ...

ImportResolver (Protocol)

Interface for resolving configuration imports:

@runtime_checkable
class ImportResolver(Protocol):
    """Interface for configuration import resolvers."""

    def resolve(
        self, config_data: dict[str, Any], context: ImportContext
    ) -> dict[str, Any]:
        """Resolve imports within a configuration dictionary."""
        ...

ImportContext

Dataclass that tracks import state during configuration loading:

@dataclass
class ImportContext:
    """Tracks import state during loading."""

    visited_files: set[str] = field(default_factory=set)
    import_stack: list[str] = field(default_factory=list)
    config_cache: dict[str, Any] = field(default_factory=dict)
    import_chain: list[str] = field(default_factory=list)

Security Interfaces (config.security.base)

PathValidator

Abstract base class for validating path security:

class PathValidator(ABC):
    """Abstract base class for path security validation."""

    @abstractmethod
    def validate(self, path: Union[str, Path]) -> None:
        """Validate that a path is secure and accessible."""
        pass

PathResolver

Abstract base class for resolving paths with security checks:

class PathResolver(ABC):
    """Abstract base class for path resolution."""

    @abstractmethod
    def resolve(self, path: str, base_path: Optional[Union[str, Path]] = None) -> str:
        """Resolve a path to an absolute path with security checks."""
        pass

Usage Patterns

Using Default Implementations (Backward Compatibility)

The existing API continues to work without any changes:

from duckalog.config import load_config

# Standard usage - uses default implementations
config = load_config("catalog.yaml")
print(f"Loaded {len(config.views)} views")

Creating Custom Implementations

Custom Environment Processor

Create a custom environment processor that loads variables from a database:

from typing import Any, Dict
from duckalog.config.resolution.base import EnvProcessor

class DatabaseEnvProcessor(EnvProcessor):
    """Custom environment processor that loads from a database."""

    def __init__(self, db_connection):
        self.db = db_connection

    def process(self, config_data: dict[str, Any], load_dotenv: bool = True) -> dict[str, Any]:
        if not load_dotenv:
            return config_data

        # Load environment variables from database
        env_vars = self._load_env_from_db()

        # Apply to config data
        return self._apply_env_vars(config_data, env_vars)

    def _load_env_from_db(self) -> dict[str, str]:
        # Custom logic to load from database
        with self.db.cursor() as cursor:
            cursor.execute("SELECT key, value FROM environment_variables")
            return {row[0]: row[1] for row in cursor.fetchall()}

    def _apply_env_vars(self, config_data: dict[str, Any], env_vars: dict[str, str]) -> dict[str, Any]:
        # Apply environment variables to config
        import os
        os.environ.update(env_vars)
        return config_data

Custom Path Resolver

Create a custom path resolver with enhanced security:

from duckalog.config.security.base import PathResolver, PathValidator
from duckalog.config.security.path import DefaultPathResolver, validate_path_security
from pathlib import Path
from typing import Union, Optional

class EnhancedSecurityPathResolver(PathResolver):
    """Path resolver with enhanced security validation."""

    def __init__(self, allowed_patterns: list[str] = None):
        self.allowed_patterns = allowed_patterns or []
        self.delegate = DefaultPathResolver()
        self.validator = DefaultPathValidator()

    def resolve(self, path: str, base_path: Optional[Union[str, Path]] = None) -> str:
        # Validate against allowed patterns first
        if not self._is_pattern_allowed(path):
            raise ValueError(f"Path pattern not allowed: {path}")

        # Use delegate for standard resolution
        resolved = self.delegate.resolve(path, base_path)

        # Additional security validation
        self.validator.validate(resolved)

        return resolved

    def _is_pattern_allowed(self, path: str) -> bool:
        import re
        for pattern in self.allowed_patterns:
            if re.match(pattern, path):
                return True
        return False

Custom Config Loader

Create a custom config loader that supports additional file formats:

import toml  # Requires tomli or similar library
from duckalog.config.loading.base import ConfigLoader
from typing import Any, Optional, Union
from pathlib import Path

class TomlConfigLoader(ConfigLoader):
    """Config loader that supports TOML format."""

    def load(self, path: Union[str, Path], filesystem: Optional[Any] = None) -> dict[str, Any]:
        if isinstance(path, str):
            path = Path(path)

        if not path.exists():
            raise FileNotFoundError(f"Config file not found: {path}")

        if filesystem is not None:
            content = filesystem.open(str(path), "r").read()
        else:
            content = path.read_text()

        # Parse TOML content
        try:
            return toml.loads(content)
        except Exception as e:
            raise ValueError(f"Failed to parse TOML config: {e}") from e

Injecting Custom Dependencies

Using the API with Custom Implementations

The new API supports dependency injection for advanced use cases:

from duckalog.config.api import load_config
from duckalog.config.resolution.imports import DefaultImportResolver, RequestContext
from duckalog.config.resolution.env import DefaultEnvProcessor, env_cache_scope

# Create custom context with enhanced capabilities
with env_cache_scope() as env_cache:
    request_context = RequestContext(env_cache=env_cache)

    # Use default resolver with custom context
    resolver = DefaultImportResolver(context=request_context)

    # Load config with custom resolver
    config_data = {
        "file_path": "catalog.yaml",
        "filesystem": None,
        "resolve_paths": True,
        "load_sql_files": True,
        "sql_file_loader": None,
        "load_dotenv": True
    }

    resolved_config = resolver.resolve(config_data, request_context.import_context)

Testing with Mock Implementations

Dependency injection makes testing much easier by allowing you to inject mock implementations:

import pytest
from unittest.mock import Mock, MagicMock
from duckalog.config.resolution.base import EnvProcessor
from duckalog.config.resolution.imports import DefaultImportResolver, RequestContext

class MockEnvProcessor(EnvProcessor):
    def process(self, config_data: dict[str, Any], load_dotenv: bool = True) -> dict[str, Any]:
        # Mock environment processing
        config_data["test_env"] = "mock_value"
        return config_data

def test_config_loading_with_mock_env():
    # Create mock context
    mock_context = Mock()
    mock_context.env_cache = Mock()
    mock_context.import_context = Mock()

    # Create resolver with mock context
    resolver = DefaultImportResolver(context=mock_context)

    # Test data
    config_data = {
        "file_path": "test.yaml",
        "content": "views: []",
        "load_dotenv": True
    }

    # Test with mock
    result = resolver.resolve(config_data, mock_context.import_context)

    # Verify mock was called
    assert mock_context.import_context.visited_files.add.assert_called()

Migration Examples

Before: Custom Filesystem Implementation

# Old approach - required modifying core logic
class CustomFileSystem:
    def open(self, path, mode='r'):
        # Custom file system logic
        pass

    def exists(self, path):
        # Custom existence check
        pass

# Had to pass through custom parameters
config = load_config("catalog.yaml", filesystem=CustomFileSystem())

After: Dependency Injection with Custom Config Loader

# New approach - inject custom implementation
from duckalog.config.loading.base import ConfigLoader

class CustomFileSystemLoader(ConfigLoader):
    def __init__(self, custom_fs):
        self.fs = custom_fs

    def load(self, path, filesystem=None):
        # Use custom filesystem directly
        if self.fs.exists(path):
            content = self.fs.open(path).read()
            import yaml
            return yaml.safe_load(content)
        return {}

# Use in testing or custom environments
custom_loader = CustomFileSystemLoader(CustomFileSystem())
# The system can now be extended without modifying core logic

Before: Alternative Environment Processing

# Old approach - had to modify environment variables globally
import os
os.environ.update(custom_env_vars)  # Affects entire application
config = load_config("catalog.yaml")

After: Custom Environment Processor

# New approach - scoped environment processing
from duckalog.config.resolution.base import EnvProcessor

class ScopedEnvProcessor(EnvProcessor):
    def __init__(self, scoped_env_vars):
        self.scoped_env = scoped_env_vars

    def process(self, config_data, load_dotenv=True):
        # Apply scoped environment variables without affecting global state
        original_env = os.environ.copy()
        try:
            os.environ.update(self.scoped_env)
            return config_data  # Process with scoped environment
        finally:
            os.environ.clear()
            os.environ.update(original_env)

# Use without affecting global state
scoped_processor = ScopedEnvProcessor({"DATABASE_URL": "test://localhost"})

Before: Custom Path Resolution Strategies

# Old approach - limited customization options
config = load_config("catalog.yaml", resolve_paths=True)  # Only boolean option

After: Custom Path Resolution

# New approach - inject custom path resolver
from duckalog.config.security.base import PathResolver

class CloudPathResolver(PathResolver):
    def resolve(self, path, base_path=None):
        if path.startswith("cloud://"):
            # Custom cloud path resolution
            return self._resolve_cloud_path(path)
        else:
            # Standard resolution
            return self._resolve_standard_path(path, base_path)

    def _resolve_cloud_path(self, path):
        # Custom logic for cloud paths
        pass

    def _resolve_standard_path(self, path, base_path):
        # Fallback to standard resolution
        from duckalog.config.security.path import DefaultPathResolver
        resolver = DefaultPathResolver()
        return resolver.resolve(path, base_path)

# Can be integrated with the loading system
cloud_resolver = CloudPathResolver()

Benefits of Dependency Injection

1. Enhanced Testability

Dependency injection makes unit testing significantly easier:

  • Isolate Components: Test individual components in isolation
  • Mock Dependencies: Easily mock external dependencies
  • Control State: Test with predictable, controlled state
  • No Side Effects: Avoid modifying global state during tests
# Easy testing with mocked dependencies
def test_config_loading():
    mock_loader = Mock(spec=ConfigLoader)
    mock_loader.load.return_value = {"views": []}

    # Test configuration loading logic with mock loader
    # No actual file I/O required

2. Improved Extensibility

The architecture allows for seamless extension of functionality:

  • Plugin Architecture: Add new loaders, resolvers, and processors as plugins
  • Custom Implementations: Replace any component with custom logic
  • Backward Compatibility: Extend without breaking existing code
  • Feature Flags: Enable/disable features through dependency injection
# Easy to add new file format support
class XmlConfigLoader(ConfigLoader):
    def load(self, path, filesystem=None):
        # XML parsing logic
        pass

# Register and use without touching core code

3. Better Separation of Concerns

Each component has a single, well-defined responsibility:

  • Loading: Responsible only for reading data from sources
  • Resolution: Handles only import resolution logic
  • Security: Focuses solely on path validation and security
  • Environment: Manages environment variable processing exclusively

4. Configuration Flexibility

Different environments can use different implementations:

# Development: Use local file system
dev_config = load_config("dev.yaml", loader=LocalFileLoader())

# Production: Use cloud storage
prod_config = load_config("prod.yaml", loader=CloudStorageLoader())

# Testing: Use in-memory loader
test_config = load_config("test.yaml", loader=MemoryLoader({"views": []}))

5. Performance Optimization

Dependency injection enables performance optimizations:

  • Lazy Loading: Load dependencies only when needed
  • Caching: Inject cached implementations
  • Pooling: Use connection pools for database-backed loaders
  • Async Support: Inject async implementations where needed

6. Maintenance and Debugging

The modular architecture makes maintenance easier:

  • Isolated Bugs: Issues are contained to specific components
  • Clear Dependencies: Easy to understand what each component needs
  • Gradual Migration: Can update components incrementally
  • Better Logging: Each component can provide detailed logging

Advanced Usage Patterns

Composition Root Pattern

Use a composition root to configure dependencies for your application:

class ConfigurationFactory:
    """Factory for creating configured configuration loaders."""

    @staticmethod
    def create_for_environment(env: str):
        if env == "development":
            env_processor = DefaultEnvProcessor()
            path_resolver = DefaultPathResolver()
        elif env == "production":
            env_processor = ProductionEnvProcessor()
            path_resolver = SecurePathResolver(allowed_roots=["/app/config"])
        else:
            env_processor = MemoryEnvProcessor()
            path_resolver = InMemoryPathResolver()

        return DefaultImportResolver(
            context=RequestContext(
                env_cache=EnvCache(),
                import_context=ImportContext()
            )
        )

# Usage
factory = ConfigurationFactory()
resolver = factory.create_for_environment("production")

Middleware Pattern

Create middleware-like processors for configuration:

class ConfigProcessor(ABC):
    @abstractmethod
    def process(self, config_data: dict[str, Any]) -> dict[str, Any]:
        pass

class ValidationProcessor(ConfigProcessor):
    def process(self, config_data):
        # Validation logic
        return config_data

class LoggingProcessor(ConfigProcessor):
    def process(self, config_data):
        # Logging logic
        return config_data

class PipelineConfigLoader(ConfigLoader):
    def __init__(self, base_loader: ConfigLoader, processors: list[ConfigProcessor]):
        self.base_loader = base_loader
        self.processors = processors

    def load(self, path, filesystem=None):
        config = self.base_loader.load(path, filesystem)
        for processor in self.processors:
            config = processor.process(config)
        return config

This dependency injection architecture provides a solid foundation for building flexible, testable, and maintainable configuration loading systems while preserving backward compatibility and enabling future enhancements.