Path Resolution and Security¶
Duckalog resolves relative file paths to absolute paths anchored at the configuration file's directory, so a catalog behaves the same regardless of the current working directory. Every resolved path is validated against security boundaries, and resolution works the same way on Windows, macOS, and Linux.
Overview¶
The path resolution feature addresses common challenges when working with file-based data sources and database attachments:
- Portability: Configurations can be moved between environments without breaking file references
- Consistency: Paths are resolved consistently regardless of the current working directory
- Security: Built-in validation prevents directory traversal attacks through comprehensive boundary checking
- Flexibility: Works with relative paths, absolute paths, and remote URIs with appropriate security handling
How It Works¶
Implementation Details¶
Path resolution is implemented in the duckalog.config package through the validators submodule:
- Detection: is_relative_path() identifies relative vs absolute paths
- Resolution: resolve_relative_path() resolves paths relative to the config directory
- Security: validate_path_security() enforces boundary validation
- Normalization: normalize_path_for_sql() ensures SQL-safe paths
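To make the normalization step concrete, here is a minimal sketch of turning a path into a SQL string literal. The helper name to_sql_literal is hypothetical and is not Duckalog's normalize_path_for_sql(); it assumes that forward slashes and doubled single quotes are enough to make a path safe inside a quoted SQL string.

```python
def to_sql_literal(path: str) -> str:
    """Render a filesystem path as a single-quoted SQL string literal.

    Hypothetical helper for illustration; not Duckalog's normalize_path_for_sql().
    """
    # Use forward slashes so the same literal reads identically on Windows and Unix.
    forward = path.replace("\\", "/")
    # Double any single quote, the standard SQL escape inside string literals.
    escaped = forward.replace("'", "''")
    return f"'{escaped}'"


print(to_sql_literal("/project/config/data/file.parquet"))
# '/project/config/data/file.parquet'
print(to_sql_literal(r"C:\data\o'brien.parquet"))
# 'C:/data/o''brien.parquet'
```

Escaping the quote matters because a path is interpolated into generated SQL; an unescaped quote would end the literal early.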
Automatic Detection and Resolution¶
When path resolution is enabled, Duckalog automatically:
- Detects whether a path is relative or absolute using platform-aware logic
- Resolves relative paths against the configuration file's directory
- Validates the resolved path against security boundaries
- Normalizes paths for SQL usage and cross-platform compatibility
- Checks file accessibility before operations
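The first two steps above can be sketched with the standard library alone. This is an illustration under assumptions, not Duckalog's implementation: the function name resolve_against_config is hypothetical, POSIX path rules are used throughout, and anything containing "://" is treated as a remote URI.

```python
import posixpath


def resolve_against_config(path: str, config_file: str) -> str:
    """Resolve `path` against the directory containing `config_file` (POSIX rules).

    Hypothetical helper; Duckalog's own resolve_relative_path() may differ.
    """
    if "://" in path or posixpath.isabs(path):
        # Remote URIs and absolute paths are returned unchanged.
        return path
    config_dir = posixpath.dirname(config_file)
    # normpath collapses "." and ".." segments without touching the filesystem.
    return posixpath.normpath(posixpath.join(config_dir, path))


print(resolve_against_config("data/events.parquet", "/analytics/catalog.yaml"))
# /analytics/data/events.parquet
print(resolve_against_config("s3://bucket/file.parquet", "/analytics/catalog.yaml"))
# s3://bucket/file.parquet
```

Anchoring at the config file's directory, rather than the process working directory, is what keeps the result stable when the same catalog is built from different locations.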
Security Boundary Validation¶
Security Principles:
- Config Directory Anchoring: All relative paths resolve relative to config file location
- Controlled Parent Access: Limited parent directory navigation allowed for shared resources
- Traversal Protection: Blocks dangerous patterns like ../../../etc/passwd
- System Directory Protection: Prevents access to sensitive system directories
- Cross-Platform Security: Consistent validation across Windows, macOS, and Linux
Validation Process:
flowchart TD
A[Input Path] --> B{Relative?}
B -->|No| C[Security Check Absolute]
B -->|Yes| D[Resolve Relative to Config Dir]
D --> E[Security Boundary Validation]
C --> F{Within Allowed Roots?}
E --> F
F -->|Allowed| G[Normalized Path]
F -->|Blocked| H[Security Error]
G --> I[Accessibility Check]
I -->|Accessible| G
I -->|Inaccessible| J[Warning/Error]
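The "Within Allowed Roots?" decision in the flowchart can be illustrated as a containment test. The sketch below is an assumption about how such a check might look: is_within_roots and the example roots are hypothetical, and the check is purely lexical (it does not follow symlinks, which a production check would likely need to do).

```python
import posixpath
from pathlib import PurePosixPath


def is_within_roots(path: str, allowed_roots: list) -> bool:
    """Return True if `path` is one of `allowed_roots` or lies beneath one.

    Hypothetical helper; lexical check only, symlinks are not resolved.
    """
    # Collapse ".." first; otherwise "/a/b/../../etc" would appear to sit under "/a/b".
    candidate = PurePosixPath(posixpath.normpath(path))
    for root in allowed_roots:
        root_path = PurePosixPath(posixpath.normpath(root))
        if candidate == root_path or root_path in candidate.parents:
            return True
    return False


roots = ["/company/analytics", "/company/shared"]
print(is_within_roots("/company/shared/reference_data/customers.parquet", roots))  # True
print(is_within_roots("/company/analytics/../../etc/passwd", roots))  # False
```

Comparing whole path components, rather than string prefixes, avoids accepting a sibling such as /company/shared_private when only /company/shared is allowed.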
Supported Path Types¶
| Path Type | Example | Resolution Behavior | Security Validation |
|---|---|---|---|
| Relative | data/file.parquet | /config/dir/data/file.parquet | ✅ Full validation |
| Parent Directory | ../shared/data.parquet | /parent/dir/shared/data.parquet | ✅ Boundary check |
| Absolute Unix | /absolute/path/file.parquet | Unchanged | ✅ Boundary check |
| Absolute Windows | C:\data\file.parquet | Unchanged | ✅ Boundary check |
| Remote URI | s3://bucket/file.parquet | Unchanged | ✅ No path resolution |
| Dangerous Pattern | ../../../etc/passwd | ❌ Blocked | ❌ Security error |
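The path types in the table can be told apart with a few standard-library checks. The following is a rough sketch only: classify_path and its return labels are hypothetical, and Duckalog's own detection logic may use different rules.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# A URI scheme followed by "://", e.g. "s3://" or "https://".
_URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def classify_path(path: str) -> str:
    """Label a path as remote_uri, absolute_windows, absolute_unix, or relative.

    Hypothetical helper for illustration.
    """
    if _URI_SCHEME.match(path):
        return "remote_uri"
    windows = PureWindowsPath(path)
    # Require a drive (or UNC share) so "/unix/path" is not mistaken for Windows.
    if windows.drive and windows.is_absolute():
        return "absolute_windows"
    if PurePosixPath(path).is_absolute():
        return "absolute_unix"
    return "relative"


for example in [
    "data/file.parquet",
    "../shared/data.parquet",
    "/absolute/path/file.parquet",
    r"C:\data\file.parquet",
    "s3://bucket/file.parquet",
]:
    print(f"{example} -> {classify_path(example)}")
```

Using the pure path classes keeps the check independent of the operating system running the code, so a Windows-style path is recognized even on Linux.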
Configuration¶
Enabling Path Resolution¶
Path resolution is controlled by the resolve_paths parameter when loading configuration:
from duckalog import load_config
# Enable path resolution (default)
config = load_config("catalog.yaml", resolve_paths=True)
# Disable path resolution
config = load_config("catalog.yaml", resolve_paths=False)
Command Line Interface¶
The CLI automatically enables path resolution:
# Path resolution enabled by default
duckalog build catalog.yaml
# Generate SQL with path resolution
duckalog generate-sql catalog.yaml
Usage Examples¶
Basic Relative Path Resolution¶
Project Structure:
analytics/
├── catalog.yaml
├── data/
│   ├── events.parquet
│   └── users.parquet
└── databases/
    └── reference.duckdb
Configuration (catalog.yaml):
version: 1
duckdb:
  database: catalog.duckdb
attachments:
  duckdb:
    - alias: refdata
      path: ./databases/reference.duckdb
      read_only: true
views:
  - name: events
    source: parquet
    uri: data/events.parquet
  - name: users
    source: parquet
    uri: ./data/users.parquet
  - name: user_events
    sql: |
      SELECT
        u.user_id,
        u.name,
        e.event_type,
        e.timestamp
      FROM users u
      JOIN events e ON u.user_id = e.user_id
Result: All relative paths are resolved to absolute paths relative to catalog.yaml.
Parent Directory Access¶
Project Structure:
company/
├── shared/
│   └── reference_data/
│       └── customers.parquet
└── analytics/
    ├── catalog.yaml
    └── data/
        └── events.parquet
Configuration (company/analytics/catalog.yaml):
version: 1
duckdb:
  database: catalog.duckdb
views:
  - name: events
    source: parquet
    uri: ./data/events.parquet
  - name: customers
    source: parquet
    uri: ../shared/reference_data/customers.parquet
  - name: customer_events
    sql: |
      SELECT
        c.customer_id,
        c.name,
        e.event_type,
        e.timestamp
      FROM customers c
      JOIN events e ON c.customer_id = e.customer_id
Result:
- ./data/events.parquet → /company/analytics/data/events.parquet
- ../shared/reference_data/customers.parquet → /company/shared/reference_data/customers.parquet
Mixed Path Types¶
version: 1
duckdb:
  database: catalog.duckdb
views:
  # Local relative path - will be resolved
  - name: local_data
    source: parquet
    uri: ./data/local.parquet
  # Absolute path - unchanged
  - name: absolute_data
    source: parquet
    uri: /absolute/path/data.parquet
  # Remote URI - unchanged
  - name: remote_data
    source: parquet
    uri: s3://my-bucket/data/remote.parquet
  # Windows path - unchanged if on Windows
  - name: windows_data
    source: parquet
    uri: D:\data\windows.parquet
Attachment Path Resolution¶
version: 1
duckdb:
  database: catalog.duckdb
attachments:
  duckdb:
    - alias: ref_db
      path: ./reference.databases/analytics.duckdb
      read_only: true
  sqlite:
    - alias: legacy_db
      path: ../legacy/production.db
views:
  - name: ref_data
    source: duckdb
    database: ref_db
    table: customers
  - name: legacy_data
    source: sqlite
    database: legacy_db
    table: users
Security Features¶
Directory Traversal Protection¶
Path resolution automatically blocks dangerous path patterns:
# ❌ BLOCKED - Excessive parent directory traversal
views:
  - name: malicious
    source: parquet
    uri: ../../../../etc/passwd
Error: Path resolution violates security rules
Dangerous Location Detection¶
Access to system directories is automatically blocked:
# ❌ BLOCKED - Attempts to access system directories
views:
  - name: system_config
    source: parquet
    uri: ../etc/config.parquet # Resolves to /etc/config.parquet when the config directory sits directly under /
Reasonable Traversal Allowance¶
Limited parent directory access is permitted for legitimate use cases:
# ✅ ALLOWED - Reasonable parent directory access
views:
  - name: shared_data
    source: parquet
    uri: ../shared/data.parquet # Allowed: 1 level up
  - name: project_root_data
    source: parquet
    uri: ../../project_data/common.parquet # Allowed: 2 levels up
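One way to picture a traversal allowance is to count the leading ../ segments of a path and compare against a limit. The sketch below is hypothetical: the helper names and the default limit of 2 are assumptions inferred from the examples in this section, not a documented Duckalog setting, and only leading segments are counted.

```python
from pathlib import PurePosixPath


def leading_parent_levels(path: str) -> int:
    """Count how many '..' segments open the path."""
    levels = 0
    for part in PurePosixPath(path).parts:
        if part != "..":
            break
        levels += 1
    return levels


def within_traversal_limit(path: str, max_levels: int = 2) -> bool:
    """Hypothetical check: allow at most `max_levels` leading '..' segments."""
    return leading_parent_levels(path) <= max_levels


print(within_traversal_limit("../shared/data.parquet"))             # True
print(within_traversal_limit("../../project_data/common.parquet"))  # True
print(within_traversal_limit("../../../../etc/passwd"))             # False
```

A depth count alone is not a complete defense, which is why the resolved location is also checked against system directories.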
Migration Guide¶
From Absolute Paths¶
Before (manual path management):
version: 1
views:
  - name: data
    source: parquet
    uri: /absolute/path/to/data.parquet # Hard-coded absolute path
After (with path resolution):
version: 1
views:
  - name: data
    source: parquet
    uri: ./data.parquet # Relative path - automatically resolved
Migration Steps¶
- Update Configuration Files: Replace absolute paths with relative paths where appropriate
- Test Resolution: Use duckalog validate catalog.yaml to ensure paths resolve correctly
- Enable Resolution: Ensure resolve_paths=True (default setting)
- Verify Results: Use duckalog generate-sql catalog.yaml to inspect resolved paths
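For the first migration step, the relative form of an existing absolute path can be computed rather than worked out by hand. This is a small sketch using only the standard library; to_config_relative is a hypothetical helper, and the example paths are made up for illustration.

```python
import posixpath


def to_config_relative(absolute_path: str, config_file: str) -> str:
    """Express `absolute_path` relative to the directory of `config_file`.

    Hypothetical migration helper; POSIX path rules.
    """
    config_dir = posixpath.dirname(config_file)
    relative = posixpath.relpath(absolute_path, start=config_dir)
    # Prefix "./" for paths at or below the config directory, matching this guide's style.
    if relative.split("/")[0] == "..":
        return relative
    return f"./{relative}"


print(to_config_relative("/project/config/data.parquet", "/project/config/catalog.yaml"))
# ./data.parquet
print(to_config_relative("/project/data/raw/events.parquet", "/project/config/catalog.yaml"))
# ../data/raw/events.parquet
```

If the computed path climbs through many parent directories, it may be rejected by the security checks, in which case moving the data closer to the configuration file is the safer fix.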
API Usage¶
Programmatic Path Resolution¶
from duckalog.path_resolution import (
resolve_relative_path,
is_relative_path,
validate_path_security
)
from pathlib import Path
# Check if a path is relative
is_rel = is_relative_path("data/file.parquet") # True
is_rel = is_relative_path("/absolute/file.parquet") # False
# Resolve a relative path
config_dir = Path("/project/config")
resolved = resolve_relative_path("data/file.parquet", config_dir)
# Returns: "/project/config/data/file.parquet"
# Validate path security
is_safe = validate_path_security("data/file.parquet", config_dir)
# Returns: True
Loading Configuration with Path Resolution¶
from duckalog import load_config, ConfigError
try:
    # Load with path resolution enabled (default)
    config = load_config("catalog.yaml")
    # Load with path resolution explicitly disabled
    config = load_config("catalog.yaml", resolve_paths=False)
except ConfigError as e:
    print(f"Configuration error: {e}")
Troubleshooting¶
Common Issues¶
Path Resolution Failed¶
Error: ConfigError: Path resolution failed
Solutions:
1. Check if the path exists and is accessible
2. Verify the path doesn't violate security rules
3. Ensure the configuration file path is correct
4. Use absolute paths for system-level files
Security Violation¶
Error: ValueError: Path resolution violates security rules
Solutions:
1. Reduce the number of parent directory traversals (../)
2. Avoid system directories (/etc/, /usr/, etc.)
3. Use relative paths within reasonable bounds
4. Consider moving data files to a safer location
File Not Found After Resolution¶
Error: DuckDB Error: Failed to open file
Solutions:
1. Verify the resolved path points to an existing file
2. Check file permissions
3. Ensure the file is not a directory
4. Use validate_file_accessibility() to check before loading
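The first three checks in this list can also be run by hand with the standard library when tracking down a failing path. The helper below, describe_accessibility, is a hypothetical stand-in written for illustration; it is not Duckalog's validate_file_accessibility(), whose signature is not shown here.

```python
import os
import tempfile
from pathlib import Path


def describe_accessibility(path: str) -> str:
    """Return a short diagnosis for a resolved local path (hypothetical helper)."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.is_dir():
        return "is a directory"
    if not os.access(p, os.R_OK):
        return "not readable"
    return "ok"


# Demonstrate against a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "events.parquet"
    data_file.write_bytes(b"")
    print(describe_accessibility(str(data_file)))                   # ok
    print(describe_accessibility(tmp))                              # is a directory
    print(describe_accessibility(str(Path(tmp) / "nope.parquet")))  # missing
```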
Debugging Path Resolution¶
from duckalog.path_resolution import (
is_relative_path,
resolve_relative_path,
detect_path_type
)
from pathlib import Path
# Debug path detection
path = "data/file.parquet"
print(f"Path type: {detect_path_type(path)}") # "relative"
print(f"Is relative: {is_relative_path(path)}") # True
# Debug resolution
config_dir = Path("/project/config")
try:
    resolved = resolve_relative_path(path, config_dir)
    print(f"Resolved path: {resolved}")
except ValueError as e:
    print(f"Resolution failed: {e}")
Best Practices¶
Configuration Structure¶
project/
├── config/
│   ├── catalog.yaml
│   └── catalog-dev.yaml
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── databases/
│   ├── reference.duckdb
│   └── cache.duckdb
└── sql/
    ├── views/
    └── procedures/
Recommended Path Patterns:
- Use ./data/ for project-specific data
- Use ../shared/ for shared project resources
- Use ./databases/ for local database files
- Avoid deep parent directory navigation
Environment-Specific Configurations¶
# config/catalog.yaml
version: 1
duckdb:
  database: catalog.duckdb
views:
  - name: local_data
    source: parquet
    uri: ./data/local.parquet # Resolved relative to this config
Testing Path Resolution¶
# Test resolution before building
from duckalog import load_config, generate_sql
try:
    # Load and validate with resolution
    config = load_config("catalog.yaml")
    # Generate SQL to see resolved paths
    sql = generate_sql("catalog.yaml")
    print("Generated SQL with resolved paths:")
    print(sql)
except Exception as e:
    print(f"Path resolution issue: {e}")
Performance Considerations¶
Path resolution adds minimal overhead to configuration loading:
- CPU Impact: Negligible for typical configurations
- Memory Impact: Minimal additional memory usage
- I/O Impact: File system access only for path resolution
- Caching: Resolved paths are cached in configuration objects
Cross-Platform Compatibility¶
Windows vs Unix Paths¶
Path resolution handles platform differences automatically:
# Works on both Windows and Unix
views:
  - name: cross_platform_data
    source: parquet
    uri: ./data/file.parquet
Best Practices for Cross-Platform¶
- Use forward slashes in configuration files (works everywhere)
- Avoid platform-specific paths when possible
- Test on target platforms before deployment
- Use environment variables for platform-specific configurations
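When existing configurations contain Windows-style backslashes, the forward-slash form recommended above can be produced with pathlib. This is a minimal sketch; to_forward_slashes is a hypothetical helper and is not part of Duckalog.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Rewrite a Windows-style path with forward slashes (hypothetical helper)."""
    # PureWindowsPath parses both separators and works on any operating system.
    return PureWindowsPath(path).as_posix()


print(to_forward_slashes(r"..\shared\data.parquet"))   # ../shared/data.parquet
print(to_forward_slashes(r"D:\data\windows.parquet"))  # D:/data/windows.parquet
print(to_forward_slashes("data/file.parquet"))         # data/file.parquet
```

Note that a leading ".\" is dropped by pathlib during parsing, so ".\data\file.parquet" becomes "data/file.parquet", which resolves to the same location.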
Last updated: Duckalog v0.2.0