Migration Guide: Configuration Architecture Refactor¶
This guide helps developers understand the modular configuration architecture refactor. The refactoring improves maintainability, eliminates circular dependencies, and introduces dependency injection patterns while maintaining backward compatibility.
Implementation Status
Some interfaces described in this guide (notably ConfigLoader and SQLFileLoader from duckalog.config.loading.base) represent the intended architecture but are not yet fully implemented in the current codebase. The EnvProcessor, ImportResolver, PathValidator, and PathResolver protocols are available in duckalog.config.resolution.base and duckalog.config.security.base. For SQL file loading, use duckalog.sql_file_loader.SQLFileLoader directly.
Table of Contents¶
- What Changed
- Migration Paths
- Before/After Examples
- Step-by-Step Migration
- Testing Migration
- Rollback Strategy
- FAQ
What Changed¶
Overview of Architectural Refactoring¶
The configuration system has been restructured from a monolithic 1670-line loader.py module into focused, single-responsibility modules:
Old Architecture¶
config/
├── loader.py # Monolithic configuration loading (removed)
├── __init__.py # Re-exports and utilities
├── models.py # Pydantic models
├── interpolation.py # Environment variable processing (removed)
└── validators.py # Path validation and utilities
New Architecture¶
config/
├── api.py # Public API orchestration layer
├── loading/ # SQL file loading
│ ├── __init__.py
│ └── sql.py # SQL file loading
├── resolution/ # Path and import resolution
│ ├── __init__.py
│ ├── base.py # Abstract base classes and protocols
│ ├── env.py # Environment variable processing
│ └── imports.py # Import resolution logic
├── security/ # Path validation and security
│ ├── __init__.py
│ ├── base.py # Abstract security classes
│ └── path.py # Path security implementation
├── __init__.py # Backward-compatible re-exports
├── models.py # Pydantic models (unchanged)
└── validators.py # Path validation utilities
Why the Changes Were Made¶
- Eliminated Circular Dependencies: Resolved circular imports by restructuring module boundaries between configuration loading and resolution components
- Single Responsibility Principle: Each module now has a focused, well-defined responsibility
- Improved Testability: Dependency injection enables better mocking and testing
- Enhanced Extensibility: Abstract base classes allow custom implementations
- Performance Optimizations: Request-scoped caching reduces redundant operations
- Maintainability: Clearer boundaries make the code easier to understand and modify
Timeline and Compatibility¶
- Phase 1 (Complete): New modular structure implemented alongside legacy code
- Phase 2 (Complete): Legacy modules (
loader.py,interpolation.py) removed; full backward compatibility maintained through re-exports inconfig/__init__.py - Phase 3 (Current): Stable modular architecture with dependency injection and caching
- Migration Status: The refactor is complete. Existing code continues to work unchanged.
Migration Paths¶
No Changes Needed - What Continues Unchanged¶
The following patterns continue to work without any modifications:
1. Standard Configuration Loading¶
# Continues to work exactly as before
from duckalog.config import load_config, Config
config = load_config("catalog.yaml")
print(config.version)
print(len(config.views))
2. Path Resolution Utilities¶
# All existing imports and functions work unchanged
from duckalog.config import (
is_relative_path,
resolve_relative_path,
validate_path_security,
)
if is_relative_path("./data/file.parquet"):
abs_path = resolve_relative_path("./data/file.parquet")
validate_path_security(abs_path)
3. Basic Usage in Python API¶
# High-level API calls work unchanged
from duckalog import generate_sql, validate_config
sql = generate_sql("catalog.yaml")
validate_config("catalog.yaml")
Recommended Updates - What to Update for Better Performance¶
1. Use New Dependency Injection Patterns¶
# Old approach (still works)
from duckalog.config import load_config
config = load_config("catalog.yaml")
# New approach (recommended for custom implementations)
from duckalog.config.api import load_config as api_load_config
from duckalog.config.loading import ConfigLoader
from duckalog.config.resolution import ImportResolver
class MyCustomLoader(ConfigLoader):
def load(self, path, filesystem=None):
# Custom loading logic
return custom_config_data
config = api_load_config(
"catalog.yaml",
filesystem=custom_filesystem,
import_resolver=custom_resolver
)
2. Leverage Request-Scoped Caching¶
# Old approach (no explicit caching control)
from duckalog.config import load_config
config1 = load_config("catalog.yaml")
config2 = load_config("catalog.yaml") # Loads again
# New approach (explicit caching scope)
from duckalog.config.resolution.imports import request_cache_scope
with request_cache_scope():
config1 = load_config("catalog.yaml")
config2 = load_config("catalog.yaml") # Uses cache within scope
3. Use Abstract Base Classes for Extensions¶
# New: Implement custom filesystem with proper interface
from duckalog.config.loading.base import SQLFileLoader
from duckalog.config.security.base import PathValidator
class SecureSQLLoader(SQLFileLoader):
def load_sql(self, path, filesystem=None):
# Custom secure SQL loading
return secure_sql_content
class StrictPathValidator(PathValidator):
def validate(self, path):
# Custom strict validation
if not self.is_allowed_path(path):
raise SecurityError(f"Path not allowed: {path}")
Breaking Changes - Actual Breaking Changes¶
There are currently no breaking changes. All existing code continues to work unchanged through backward-compatible re-exports in duckalog.config.__init__. The monolithic loader.py and interpolation.py modules have been removed; their functionality is now provided by the modular structure under duckalog.config.api, duckalog.config.resolution, and duckalog.config.loading.
Before/After Examples¶
Import Patterns¶
Old Import Patterns (Still Work)¶
# Public API imports (unchanged)
from duckalog.config import load_config, Config, ConfigError
# Note: The monolithic loader.py and interpolation.py modules were removed.
# Their functionality is now provided by the modular structure under
# duckalog.config.api, duckalog.config.resolution, and duckalog.config.loading.
New Import Patterns (Recommended)¶
# Use the new modular structure
from duckalog.config.api import load_config
from duckalog.config.resolution.env import DefaultEnvProcessor
from duckalog.config.loading.base import ConfigLoader, SQLFileLoader
from duckalog.config.security.base import PathValidator
# Abstract base classes for extensions
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.loading.base import ConfigLoader, SQLFileLoader
Custom Implementation Examples¶
Old Custom Filesystem Implementation¶
# Old approach: Pass custom filesystem directly to load_config
from duckalog.config import load_config
import fsspec
class CustomFilesystem:
def open(self, path, mode='r'):
# Custom filesystem logic
pass
# Use via load_config filesystem parameter
config = load_config(
"catalog.yaml",
filesystem=CustomFilesystem()
)
New Custom Filesystem Implementation¶
# New approach: Clean dependency injection
from duckalog.config.api import load_config
from duckalog.config.loading.base import ConfigLoader
from duckalog.config.resolution.base import ImportResolver
class S3ConfigLoader(ConfigLoader):
def __init__(self, s3_client):
self.s3_client = s3_client
def load(self, path, filesystem=None):
# Clean S3 loading implementation
bucket, key = self._parse_s3_path(path)
content = self.s3_client.get_object(Bucket=bucket, Key=key)['Body'].read()
return yaml.safe_load(content)
class CustomImportResolver(ImportResolver):
def __init__(self, config_loader):
self.config_loader = config_loader
def resolve(self, config_data, context):
# Custom import resolution logic
return resolved_config
# Clean usage
s3_loader = S3ConfigLoader(s3_client)
custom_resolver = CustomImportResolver(s3_loader)
config = load_config(
"s3://my-bucket/config.yaml",
filesystem=s3_loader,
import_resolver=custom_resolver
)
Testing Examples¶
Old Testing Pattern¶
# Old approach: Mock internal functions (no longer available)
from unittest.mock import patch
@patch('duckalog.config.api._resolve_paths_in_config')
def test_config_loading(mock_resolve):
mock_resolve.return_value = resolved_config
config = load_config("test.yaml")
assert config is not None
New Testing Pattern¶
# New approach: Use dependency injection for clean testing
from unittest.mock import Mock
from duckalog.config.api import load_config
from duckalog.config.loading.base import ConfigLoader
class MockConfigLoader(ConfigLoader):
def __init__(self, mock_data):
self.mock_data = mock_data
def load(self, path, filesystem=None):
return self.mock_data
def test_config_loading_with_di():
mock_loader = MockConfigLoader({"version": 1, "views": []})
config = load_config("any-path.yaml", filesystem=mock_loader)
assert config.version == 1
assert len(config.views) == 0
Step-by-Step Migration¶
Step 1: Adopt New Import Patterns (Optional)¶
Gradually update imports to use the new modular structure:
# Step 1a: Update new code to use new imports
# Instead of:
# from duckalog.config.loader import _load_config_from_local_file
# Use:
from duckalog.config.api import load_config
# Step 1b: For extensions, use abstract base classes
from duckalog.config.loading.base import ConfigLoader, SQLFileLoader
from duckalog.config.resolution.base import ImportResolver
Step 2: Implement Custom Filesystems with New Patterns¶
If you have custom filesystem implementations:
# Step 2a: Create a class that implements ConfigLoader
from duckalog.config.loading.base import ConfigLoader
class MyCustomFilesystem(ConfigLoader):
def load(self, path, filesystem=None):
# Your custom loading logic
return config_data
def open(self, path, mode='r'):
# File-like interface implementation
pass
# Step 2b: Use it with the new API
from duckalog.config.api import load_config
my_fs = MyCustomFilesystem()
config = load_config("my-path.yaml", filesystem=my_fs)
Step 3: Use Request-Scoped Caching for Performance¶
For applications that load multiple configs:
# Step 3a: Wrap related operations in cache scope
from duckalog.config.resolution.imports import request_cache_scope
with request_cache_scope():
config1 = load_config("base.yaml")
config2 = load_config("views.yaml") # Shares cache with config1
config3 = load_config("analytics.yaml") # Shares cache
# Step 3b: For custom resolvers, respect the cache context
class CachedImportResolver(ImportResolver):
def resolve(self, config_data, context):
# Use context.import_context.config_cache for caching
cache_key = self._get_cache_key(config_data)
if cache_key in context.import_context.config_cache:
return context.import_context.config_cache[cache_key]
resolved = self._do_resolve(config_data)
context.import_context.config_cache[cache_key] = resolved
return resolved
Step 4: Implement Custom Security Validation¶
For enhanced security requirements:
# Step 4a: Implement custom path validator
from duckalog.config.security.base import PathValidator
class StrictSecurityValidator(PathValidator):
def __init__(self, allowed_paths, blocked_patterns):
self.allowed_paths = allowed_paths
self.blocked_patterns = blocked_patterns
def validate(self, path):
# Custom security logic
path_str = str(path)
# Check blocked patterns
for pattern in self.blocked_patterns:
if pattern in path_str:
raise SecurityError(f"Path blocked by pattern: {pattern}")
# Check allowed paths
if not any(path_str.startswith(allowed) for allowed in self.allowed_paths):
raise SecurityError(f"Path not in allowed list: {path_str}")
# Step 4b: Use with config loading
from duckalog.config.api import load_config
validator = StrictSecurityValidator(
allowed_paths=["/safe/data/", "/config/"],
blocked_patterns=["..", "/etc/", "/var/"]
)
config = load_config(
"config.yaml",
path_validator=validator
)
Step 5: Update Custom Environment Processing¶
For specialized environment variable handling:
# Step 5a: Implement custom environment processor
from duckalog.config.resolution.base import EnvProcessor
from duckalog.config.resolution.env import EnvCache
class CustomEnvProcessor(EnvProcessor):
def __init__(self, custom_mappings=None):
self.custom_mappings = custom_mappings or {}
self.cache = EnvCache()
def process(self, config_data, load_dotenv=True):
# Custom environment processing logic
processed = config_data.copy()
# Apply custom mappings
for key, value in self.custom_mappings.items():
if key in processed:
processed[key] = self._substitute_vars(value)
return processed
# Step 5b: Use with config loading
from duckalog.config.api import load_config
env_processor = CustomEnvProcessor({
"database": "${DB_NAME}_${ENVIRONMENT}",
"cache_dir": "/tmp/${PROJECT_NAME}/cache"
})
config = load_config(
"config.yaml",
env_processor=env_processor
)
Testing Migration¶
Update Tests for New Architecture¶
1. Mock Dependency Injection¶
# Old testing approach
from unittest.mock import patch
@patch('duckalog.config.loader._load_yaml_file')
def test_old_style(mock_load):
mock_load.return_value = {"version": 1}
# Test logic...
# New testing approach
from unittest.mock import Mock
from duckalog.config.api import load_config
from duckalog.config.loading.base import ConfigLoader
def test_new_style():
mock_loader = Mock(spec=ConfigLoader)
mock_loader.load.return_value = {"version": 1}
config = load_config("any-path.yaml", filesystem=mock_loader)
assert config.version == 1
2. Test Custom Implementations¶
# Test custom filesystem implementation
class TestCustomFilesystem:
def test_custom_loader_integration(self):
from duckalog.config.loading.base import ConfigLoader
from duckalog.config.api import load_config
class TestLoader(ConfigLoader):
def load(self, path, filesystem=None):
return {"version": 1, "views": []}
loader = TestLoader()
config = load_config("test.yaml", filesystem=loader)
assert config.version == 1
assert len(config.views) == 0
3. Test Caching Behavior¶
# Test request-scoped caching
def test_caching_behavior():
from duckalog.config.resolution.imports import request_cache_scope
from unittest.mock import Mock, call
mock_loader = Mock()
mock_loader.load.return_value = {"version": 1}
with request_cache_scope():
load_config("test.yaml", filesystem=mock_loader)
load_config("test.yaml", filesystem=mock_loader) # Should use cache
# Should only be called once due to caching
assert mock_loader.load.call_count == 1
4. Test Security Features¶
# Test custom security validation
def test_security_validation():
from duckalog.config.security.base import PathValidator
from duckalog.config.api import load_config
class TestValidator(PathValidator):
def validate(self, path):
if "../../../etc/passwd" in str(path):
raise SecurityError("Path traversal detected")
validator = TestValidator()
with pytest.raises(SecurityError):
load_config("../../../etc/passwd", path_validator=validator)
Performance Testing with Caching¶
# Benchmark caching performance
import time
from duckalog.config.resolution.imports import request_cache_scope
def benchmark_cached_vs_uncached():
# Test without caching
start = time.time()
for i in range(100):
config = load_config("complex-config.yaml")
uncached_time = time.time() - start
# Test with caching
start = time.time()
with request_cache_scope():
for i in range(100):
config = load_config("complex-config.yaml")
cached_time = time.time() - start
print(f"Uncached: {uncached_time:.2f}s")
print(f"Cached: {cached_time:.2f}s")
print(f"Speedup: {uncached_time/cached_time:.1f}x")
Rollback Strategy¶
Temporary Revert Options¶
If you encounter issues with the new architecture, you have several rollback options:
1. Use Public API¶
# The public API continues to work unchanged
from duckalog.config import load_config
from duckalog.config.api import load_config
config = load_config("catalog.yaml")
2. Environment Variable for Legacy Mode¶
# Set environment variable to use legacy implementation (if available)
import os
os.environ['DUCKALOG_USE_LEGACY_CONFIG'] = 'true'
from duckalog.config import load_config
config = load_config("catalog.yaml") # Will use legacy implementation
3. Direct Module Import¶
# Import directly from the new API module
from duckalog.config.api import load_config
config = load_config("catalog.yaml")
Feature Flags for Gradual Migration¶
# Use feature flags to control migration
class ConfigMigrationManager:
def __init__(self):
self.use_new_architecture = os.getenv('USE_NEW_CONFIG_ARCH', 'false').lower() == 'true'
self.enable_caching = os.getenv('ENABLE_CONFIG_CACHE', 'false').lower() == 'true'
def load_config(self, path, **kwargs):
if self.use_new_architecture:
from duckalog.config.api import load_config as new_load
if self.enable_caching:
from duckalog.config.resolution.imports import request_cache_scope
with request_cache_scope():
return new_load(path, **kwargs)
else:
return new_load(path, **kwargs)
else:
from duckalog.config import load_config as legacy_load
return legacy_load(path, **kwargs)
# Usage
migration_manager = ConfigMigrationManager()
config = migration_manager.load_config("catalog.yaml")
Compatibility Layers¶
# Create compatibility layer for smooth transition
class CompatibilityLayer:
@staticmethod
def load_config_with_fallback(path, **kwargs):
try:
# Try new architecture first
from duckalog.config.api import load_config
return load_config(path, **kwargs)
except Exception as e:
# Fall back to legacy if new fails
import warnings
warnings.warn(f"New config architecture failed: {e}. Falling back to legacy.",
DeprecationWarning)
from duckalog.config import load_config as legacy_load
return legacy_load(path, **kwargs)
# Usage
config = CompatibilityLayer.load_config_with_fallback("catalog.yaml")
FAQ¶
Common Questions About the Migration¶
Q: Do I need to update my existing code?¶
A: No. All existing code continues to work unchanged. The migration is optional for now, and you can gradually adopt new patterns when you're ready.
Q: Will there be performance improvements?¶
A: Yes, the new architecture includes request-scoped caching that can significantly improve performance for applications that load multiple configurations or handle complex import chains.
Q: What are the benefits of dependency injection?¶
A: Dependency injection makes your code more testable, enables better mocking, allows custom implementations, and reduces coupling between components.
Q: Can I still use internal functions like _load_config_from_local_file?¶
A: No. The monolithic loader.py module was removed as part of the refactor. Use duckalog.config.api.load_config() or the re-exported duckalog.config.load_config() instead.
Q: How do I implement custom filesystems now?¶
A: Implement the ConfigLoader abstract base class from duckalog.config.loading.base and pass it to load_config() via the filesystem parameter.
Q: What happened to the circular dependency issues?¶
A: The circular dependency between configuration loading and resolution modules has been eliminated by restructuring module boundaries and introducing clean abstraction layers.
Q: Can I still use environment variable interpolation?¶
A: Yes, environment variable interpolation continues to work exactly as before. The new architecture provides the EnvProcessor protocol for custom environment processing if needed.
Q: How does request-scoped caching work?¶
A: The cache is active only within a specific load operation context and is cleared afterward, preventing memory leaks while improving performance for related operations.
Q: Are there any breaking changes in this release?¶
A: No, there are no breaking changes in this release. All changes are additive and maintain full backward compatibility.
Q: When will the legacy code be removed?¶
A: The monolithic loader.py and interpolation.py modules were already removed. Their functionality is provided by the modular structure. The public API (duckalog.config.load_config, duckalog.config.Config, etc.) continues to work unchanged through re-exports.
Troubleshooting Migration Issues¶
Issue: Import Error for New Modules¶
Problem: ImportError: No module named 'duckalog.config.loading'
Solution: Ensure you're using the latest version of Duckalog. The new module structure is available in version X.X.X and later.
Issue: Custom Filesystem Not Working¶
Problem: Custom filesystem implementation not being called
Solution: Make sure your custom filesystem implements the ConfigLoader interface:
from duckalog.config.loading.base import ConfigLoader
class MyFilesystem(ConfigLoader):
def load(self, path, filesystem=None):
# Your implementation
pass
Issue: Caching Not Working¶
Problem: Performance not improving with caching
Solution: Ensure you're using the request_cache_scope context manager:
from duckalog.config.resolution.imports import request_cache_scope
with request_cache_scope():
config1 = load_config("file1.yaml")
config2 = load_config("file2.yaml") # Will share cache
Issue: Security Validation Errors¶
Problem: Getting security validation errors for valid paths
Solution: Check if you're using a custom path validator that might be too restrictive. The default security validation should work for most use cases.
Issue: Performance Regression¶
Problem: New architecture is slower than expected
Solution: Enable request-scoped caching and ensure you're not unnecessarily re-instantiating custom loaders or validators.
Issue: Circular Dependency Errors¶
Problem: Still seeing circular dependency warnings
Solution: This should not happen with the new architecture. If you see this, ensure you're not mixing old and new import patterns in a way that recreates the circular dependency.
Getting Help¶
If you encounter issues during migration:
- Check the documentation: Review the latest documentation for updated examples
- Search existing issues: Check if someone else has encountered similar problems
- Create a minimal example: Create a simple test case that demonstrates the issue
- Include version information: Specify which version of Duckalog you're using
- Provide error traces: Include full stack traces and error messages
Next Steps¶
After completing your migration:
- Monitor performance: Use the new caching features to optimize your application
- Clean up legacy imports: Gradually remove deprecated import patterns
- Leverage new features: Take advantage of dependency injection for better testability
- Update documentation: Document any custom implementations using the new patterns
- Plan for future releases: Prepare for the eventual removal of legacy code
The new architecture provides a solid foundation for future enhancements while maintaining the stability and compatibility that existing users depend on.