Dependency Injection Guide for Duckalog Configuration Architecture¶
This guide provides a comprehensive overview of the dependency injection patterns in the Duckalog configuration architecture. The modular structure enables better testability, extensibility, and customization of configuration loading behavior.
Implementation Status
Some interfaces described in this guide (notably ConfigLoader and SQLFileLoader from duckalog.config.loading.base) represent the intended architecture but are not yet fully implemented in the current codebase. The EnvProcessor, ImportResolver, PathValidator, and PathResolver protocols are available in duckalog.config.resolution.base and duckalog.config.security.base. For SQL file loading, use duckalog.sql_file_loader.SQLFileLoader directly.
Table of Contents¶
- New Architecture Overview
- Dependency Injection Interfaces
- Usage Patterns
- Migration Examples
- Benefits of Dependency Injection
New Architecture Overview¶
The refactored configuration architecture is organized into modular components that follow dependency injection principles:
src/duckalog/config/
├── api.py # Public API orchestration
├── loading/ # Configuration loading components
│ ├── __init__.py
│ └── sql.py # SQL file processing
├── resolution/ # Configuration resolution components
│ ├── base.py # Abstract base classes and protocols
│ ├── env.py # Environment variable processing
│ └── imports.py # Import resolution logic
├── security/ # Path security and validation
│ ├── base.py # Abstract base classes
│ └── path.py # Path resolution and validation
└── models.py # Configuration data models
Key Design Principles¶
- Separation of Concerns: Each module handles a specific aspect of configuration processing
- Dependency Injection: Components depend on abstractions, not concrete implementations
- Backward Compatibility: Existing code continues to work without changes
- Extensibility: Easy to customize behavior through implementation injection
Dependency Injection Interfaces¶
Loading Interfaces (config.loading.base)¶
ConfigLoader¶
Abstract base class for loading configuration data from various sources:
from abc import ABC, abstractmethod
from typing import Any, Optional, Union
from pathlib import Path
class ConfigLoader(ABC):
"""Abstract base class for configuration loaders."""
@abstractmethod
def load(
self, path: Union[str, Path], filesystem: Optional[Any] = None
) -> dict[str, Any]:
"""Load configuration from a source."""
pass
SQLFileLoader¶
Abstract base class for loading SQL content from files:
class SQLFileLoader(ABC):
"""Abstract base class for SQL file loaders."""
@abstractmethod
def load_sql(self, path: Union[str, Path], filesystem: Optional[Any] = None) -> str:
"""Load SQL content from a file."""
pass
Resolution Interfaces (config.resolution.base)¶
EnvProcessor (Protocol)¶
Interface for processing environment variables and .env files:
from typing import Protocol, Any
@runtime_checkable
class EnvProcessor(Protocol):
"""Interface for environment variable processors."""
def process(
self, config_data: dict[str, Any], load_dotenv: bool = True
) -> dict[str, Any]:
"""Process environment variables and .env files."""
...
ImportResolver (Protocol)¶
Interface for resolving configuration imports:
@runtime_checkable
class ImportResolver(Protocol):
"""Interface for configuration import resolvers."""
def resolve(
self, config_data: dict[str, Any], context: ImportContext
) -> dict[str, Any]:
"""Resolve imports within a configuration dictionary."""
...
ImportContext¶
Dataclass that tracks import state during configuration loading:
@dataclass
class ImportContext:
"""Tracks import state during loading."""
visited_files: set[str] = field(default_factory=set)
import_stack: list[str] = field(default_factory=list)
config_cache: dict[str, Any] = field(default_factory=dict)
import_chain: list[str] = field(default_factory=list)
Security Interfaces (config.security.base)¶
PathValidator¶
Abstract base class for validating path security:
class PathValidator(ABC):
"""Abstract base class for path security validation."""
@abstractmethod
def validate(self, path: Union[str, Path]) -> None:
"""Validate that a path is secure and accessible."""
pass
PathResolver¶
Abstract base class for resolving paths with security checks:
class PathResolver(ABC):
"""Abstract base class for path resolution."""
@abstractmethod
def resolve(self, path: str, base_path: Optional[Union[str, Path]] = None) -> str:
"""Resolve a path to an absolute path with security checks."""
pass
Usage Patterns¶
Using Default Implementations (Backward Compatibility)¶
The existing API continues to work without any changes:
from duckalog.config import load_config
# Standard usage - uses default implementations
config = load_config("catalog.yaml")
print(f"Loaded {len(config.views)} views")
Creating Custom Implementations¶
Custom Environment Processor¶
Create a custom environment processor that loads variables from a database:
from typing import Any, Dict
from duckalog.config.resolution.base import EnvProcessor
class DatabaseEnvProcessor(EnvProcessor):
"""Custom environment processor that loads from a database."""
def __init__(self, db_connection):
self.db = db_connection
def process(self, config_data: dict[str, Any], load_dotenv: bool = True) -> dict[str, Any]:
if not load_dotenv:
return config_data
# Load environment variables from database
env_vars = self._load_env_from_db()
# Apply to config data
return self._apply_env_vars(config_data, env_vars)
def _load_env_from_db(self) -> dict[str, str]:
# Custom logic to load from database
with self.db.cursor() as cursor:
cursor.execute("SELECT key, value FROM environment_variables")
return {row[0]: row[1] for row in cursor.fetchall()}
def _apply_env_vars(self, config_data: dict[str, Any], env_vars: dict[str, str]) -> dict[str, Any]:
# Apply environment variables to config
import os
os.environ.update(env_vars)
return config_data
Custom Path Resolver¶
Create a custom path resolver with enhanced security:
from duckalog.config.security.base import PathResolver, PathValidator
from duckalog.config.security.path import DefaultPathResolver, validate_path_security
from pathlib import Path
from typing import Union, Optional
class EnhancedSecurityPathResolver(PathResolver):
"""Path resolver with enhanced security validation."""
def __init__(self, allowed_patterns: list[str] = None):
self.allowed_patterns = allowed_patterns or []
self.delegate = DefaultPathResolver()
self.validator = DefaultPathValidator()
def resolve(self, path: str, base_path: Optional[Union[str, Path]] = None) -> str:
# Validate against allowed patterns first
if not self._is_pattern_allowed(path):
raise ValueError(f"Path pattern not allowed: {path}")
# Use delegate for standard resolution
resolved = self.delegate.resolve(path, base_path)
# Additional security validation
self.validator.validate(resolved)
return resolved
def _is_pattern_allowed(self, path: str) -> bool:
import re
for pattern in self.allowed_patterns:
if re.match(pattern, path):
return True
return False
Custom Config Loader¶
Create a custom config loader that supports additional file formats:
import toml # Requires tomli or similar library
from duckalog.config.loading.base import ConfigLoader
from typing import Any, Optional, Union
from pathlib import Path
class TomlConfigLoader(ConfigLoader):
"""Config loader that supports TOML format."""
def load(self, path: Union[str, Path], filesystem: Optional[Any] = None) -> dict[str, Any]:
if isinstance(path, str):
path = Path(path)
if not path.exists():
raise FileNotFoundError(f"Config file not found: {path}")
if filesystem is not None:
content = filesystem.open(str(path), "r").read()
else:
content = path.read_text()
# Parse TOML content
try:
return toml.loads(content)
except Exception as e:
raise ValueError(f"Failed to parse TOML config: {e}") from e
Injecting Custom Dependencies¶
Using the API with Custom Implementations¶
The new API supports dependency injection for advanced use cases:
from duckalog.config.api import load_config
from duckalog.config.resolution.imports import DefaultImportResolver, RequestContext
from duckalog.config.resolution.env import DefaultEnvProcessor, env_cache_scope
# Create custom context with enhanced capabilities
with env_cache_scope() as env_cache:
request_context = RequestContext(env_cache=env_cache)
# Use default resolver with custom context
resolver = DefaultImportResolver(context=request_context)
# Load config with custom resolver
config_data = {
"file_path": "catalog.yaml",
"filesystem": None,
"resolve_paths": True,
"load_sql_files": True,
"sql_file_loader": None,
"load_dotenv": True
}
resolved_config = resolver.resolve(config_data, request_context.import_context)
Testing with Mock Implementations¶
Dependency injection makes testing much easier by allowing you to inject mock implementations:
import pytest
from unittest.mock import Mock, MagicMock
from duckalog.config.resolution.base import EnvProcessor
from duckalog.config.resolution.imports import DefaultImportResolver, RequestContext
class MockEnvProcessor(EnvProcessor):
def process(self, config_data: dict[str, Any], load_dotenv: bool = True) -> dict[str, Any]:
# Mock environment processing
config_data["test_env"] = "mock_value"
return config_data
def test_config_loading_with_mock_env():
# Create mock context
mock_context = Mock()
mock_context.env_cache = Mock()
mock_context.import_context = Mock()
# Create resolver with mock context
resolver = DefaultImportResolver(context=mock_context)
# Test data
config_data = {
"file_path": "test.yaml",
"content": "views: []",
"load_dotenv": True
}
# Test with mock
result = resolver.resolve(config_data, mock_context.import_context)
# Verify mock was called
assert mock_context.import_context.visited_files.add.assert_called()
Migration Examples¶
Before: Custom Filesystem Implementation¶
# Old approach - required modifying core logic
class CustomFileSystem:
def open(self, path, mode='r'):
# Custom file system logic
pass
def exists(self, path):
# Custom existence check
pass
# Had to pass through custom parameters
config = load_config("catalog.yaml", filesystem=CustomFileSystem())
After: Dependency Injection with Custom Config Loader¶
# New approach - inject custom implementation
from duckalog.config.loading.base import ConfigLoader
class CustomFileSystemLoader(ConfigLoader):
def __init__(self, custom_fs):
self.fs = custom_fs
def load(self, path, filesystem=None):
# Use custom filesystem directly
if self.fs.exists(path):
content = self.fs.open(path).read()
import yaml
return yaml.safe_load(content)
return {}
# Use in testing or custom environments
custom_loader = CustomFileSystemLoader(CustomFileSystem())
# The system can now be extended without modifying core logic
Before: Alternative Environment Processing¶
# Old approach - had to modify environment variables globally
import os
os.environ.update(custom_env_vars) # Affects entire application
config = load_config("catalog.yaml")
After: Custom Environment Processor¶
# New approach - scoped environment processing
from duckalog.config.resolution.base import EnvProcessor
class ScopedEnvProcessor(EnvProcessor):
def __init__(self, scoped_env_vars):
self.scoped_env = scoped_env_vars
def process(self, config_data, load_dotenv=True):
# Apply scoped environment variables without affecting global state
original_env = os.environ.copy()
try:
os.environ.update(self.scoped_env)
return config_data # Process with scoped environment
finally:
os.environ.clear()
os.environ.update(original_env)
# Use without affecting global state
scoped_processor = ScopedEnvProcessor({"DATABASE_URL": "test://localhost"})
Before: Custom Path Resolution Strategies¶
# Old approach - limited customization options
config = load_config("catalog.yaml", resolve_paths=True) # Only boolean option
After: Custom Path Resolution¶
# New approach - inject custom path resolver
from duckalog.config.security.base import PathResolver
class CloudPathResolver(PathResolver):
def resolve(self, path, base_path=None):
if path.startswith("cloud://"):
# Custom cloud path resolution
return self._resolve_cloud_path(path)
else:
# Standard resolution
return self._resolve_standard_path(path, base_path)
def _resolve_cloud_path(self, path):
# Custom logic for cloud paths
pass
def _resolve_standard_path(self, path, base_path):
# Fallback to standard resolution
from duckalog.config.security.path import DefaultPathResolver
resolver = DefaultPathResolver()
return resolver.resolve(path, base_path)
# Can be integrated with the loading system
cloud_resolver = CloudPathResolver()
Benefits of Dependency Injection¶
1. Enhanced Testability¶
Dependency injection makes unit testing significantly easier:
- Isolate Components: Test individual components in isolation
- Mock Dependencies: Easily mock external dependencies
- Control State: Test with predictable, controlled state
- No Side Effects: Avoid modifying global state during tests
# Easy testing with mocked dependencies
def test_config_loading():
mock_loader = Mock(spec=ConfigLoader)
mock_loader.load.return_value = {"views": []}
# Test configuration loading logic with mock loader
# No actual file I/O required
2. Improved Extensibility¶
The architecture allows for seamless extension of functionality:
- Plugin Architecture: Add new loaders, resolvers, and processors as plugins
- Custom Implementations: Replace any component with custom logic
- Backward Compatibility: Extend without breaking existing code
- Feature Flags: Enable/disable features through dependency injection
# Easy to add new file format support
class XmlConfigLoader(ConfigLoader):
def load(self, path, filesystem=None):
# XML parsing logic
pass
# Register and use without touching core code
3. Better Separation of Concerns¶
Each component has a single, well-defined responsibility:
- Loading: Responsible only for reading data from sources
- Resolution: Handles only import resolution logic
- Security: Focuses solely on path validation and security
- Environment: Manages environment variable processing exclusively
4. Configuration Flexibility¶
Different environments can use different implementations:
# Development: Use local file system
dev_config = load_config("dev.yaml", loader=LocalFileLoader())
# Production: Use cloud storage
prod_config = load_config("prod.yaml", loader=CloudStorageLoader())
# Testing: Use in-memory loader
test_config = load_config("test.yaml", loader=MemoryLoader({"views": []}))
5. Performance Optimization¶
Dependency injection enables performance optimizations:
- Lazy Loading: Load dependencies only when needed
- Caching: Inject cached implementations
- Pooling: Use connection pools for database-backed loaders
- Async Support: Inject async implementations where needed
6. Maintenance and Debugging¶
The modular architecture makes maintenance easier:
- Isolated Bugs: Issues are contained to specific components
- Clear Dependencies: Easy to understand what each component needs
- Gradual Migration: Can update components incrementally
- Better Logging: Each component can provide detailed logging
Advanced Usage Patterns¶
Composition Root Pattern¶
Use a composition root to configure dependencies for your application:
class ConfigurationFactory:
"""Factory for creating configured configuration loaders."""
@staticmethod
def create_for_environment(env: str):
if env == "development":
env_processor = DefaultEnvProcessor()
path_resolver = DefaultPathResolver()
elif env == "production":
env_processor = ProductionEnvProcessor()
path_resolver = SecurePathResolver(allowed_roots=["/app/config"])
else:
env_processor = MemoryEnvProcessor()
path_resolver = InMemoryPathResolver()
return DefaultImportResolver(
context=RequestContext(
env_cache=EnvCache(),
import_context=ImportContext()
)
)
# Usage
factory = ConfigurationFactory()
resolver = factory.create_for_environment("production")
Middleware Pattern¶
Create middleware-like processors for configuration:
class ConfigProcessor(ABC):
@abstractmethod
def process(self, config_data: dict[str, Any]) -> dict[str, Any]:
pass
class ValidationProcessor(ConfigProcessor):
def process(self, config_data):
# Validation logic
return config_data
class LoggingProcessor(ConfigProcessor):
def process(self, config_data):
# Logging logic
return config_data
class PipelineConfigLoader(ConfigLoader):
def __init__(self, base_loader: ConfigLoader, processors: list[ConfigProcessor]):
self.base_loader = base_loader
self.processors = processors
def load(self, path, filesystem=None):
config = self.base_loader.load(path, filesystem)
for processor in self.processors:
config = processor.process(config)
return config
This dependency injection architecture provides a solid foundation for building flexible, testable, and maintainable configuration loading systems while preserving backward compatibility and enabling future enhancements.