Migration Guide: Path Resolution¶
This guide helps existing Duckalog users migrate from absolute paths to relative paths to take advantage of the new automatic path resolution feature.
What's Changed¶
Before (Legacy Behavior)¶
- Relative paths like
"data/file.parquet"were passed directly to DuckDB - Users had to ensure they ran commands from the correct working directory
- Paths would fail if the working directory changed
- Absolute paths were required for reliable operation
After (New Behavior)¶
- Relative paths are automatically resolved relative to the configuration file's directory
- Commands work consistently regardless of working directory
- Paths are resolved securely with validation
- Both relative and absolute paths continue to work
Migration Benefits¶
✅ Portability¶
# Old way: Required specific working directory
views:
- name: users
source: parquet
uri: "/home/project/data/users.parquet" # Only works from specific location
# New way: Works from anywhere
views:
- name: users
source: parquet
uri: "data/users.parquet" # Works from any working directory
✅ Simplified Project Structure¶
# Recommended structure:
# my-project/
# ├── catalog.yaml # Configuration
# └── data/ # Data files
# ├── users.parquet
# └── events.parquet
# In catalog.yaml:
views:
- name: users
source: parquet
uri: "data/users.parquet" # Simple and clear
✅ Better Collaboration¶
- Team members can work from different locations
- No need to coordinate working directories
- Version control friendly paths
Migration Process¶
Step 1: Assess Current Configuration¶
First, identify all absolute paths in your configuration:
# Search for absolute paths (Unix/Linux/macOS)
grep -r "\"uri.*:/" your-config.yaml
# Search for absolute paths (Windows)
grep -r "\"uri.*[A-Z]:" your-config.yaml
Step 2: Plan Path Changes¶
Create a mapping of old to new paths:
| Old Path | New Path | Notes |
|---|---|---|
/home/user/data/users.parquet |
data/users.parquet |
Move data relative to config |
C:\project\data\users.parquet |
data/users.parquet |
Same structure, different separator |
/absolute/reference.db |
../shared/reference.db |
If shared with other projects |
Step 3: Reorganize Data Files¶
Move your data files to be relative to your configuration:
# Create data directory structure
mkdir -p data processed reference
# Move files to appropriate locations
mv /path/to/users.parquet data/
mv /path/to/events.parquet processed/
mv /path/to/reference.db reference/
Step 4: Update Configuration¶
Update your configuration file with relative paths:
Before:
version: 1
views:
- name: users
source: parquet
uri: "/home/user/project/data/users.parquet"
- name: events
source: parquet
uri: "/home/user/project/data/events.parquet"
attachments:
duckdb:
- alias: reference
path: "/home/user/project/reference/reference.db"
After:
version: 1
views:
- name: users
source: parquet
uri: "data/users.parquet"
- name: events
source: parquet
uri: "processed/events.parquet"
attachments:
duckdb:
- alias: reference
path: "reference/reference.db"
Step 5: Test Migration¶
Validate that your updated configuration works:
# Test from the configuration directory
cd /path/to/config
duckalog validate catalog.yaml
# Test from a different working directory
cd /tmp
duckalog validate /path/to/config/catalog.yaml
Both should work identically.
Advanced Migration Scenarios¶
Scenario 1: Shared Data Between Projects¶
Challenge: You have data shared between multiple projects
Solution: Use relative paths with reasonable parent directory traversal
# Project A
views:
- name: shared_lookup
source: parquet
uri: "../shared-data/lookup.parquet" # Relative to project A
# Project B
views:
- name: shared_lookup
source: parquet
uri: "../shared-data/lookup.parquet" # Same path, works from both
Scenario 2: Environment-Specific Paths¶
Challenge: Different paths for dev/staging/production
Solution: Use environment variables with relative path defaults
# For development (default to relative paths)
attachments:
duckdb:
- alias: reference
path: "${env:REFERENCE_DB_PATH:./reference.duckdb}"
read_only: true
# For production (can override with absolute path if needed)
# export REFERENCE_DB_PATH=/var/data/reference.duckdb
Scenario 3: Windows to Cross-Platform¶
Challenge: Moving from Windows absolute paths to cross-platform relative paths
Before (Windows):
After (Cross-platform):
Rollback Plan¶
If you encounter issues during migration:
- Keep backups: Save your original configuration before making changes
- Test incrementally: Migrate one path at a time and test each change
- Verify functionality: Run your full workflow after each change
To rollback:
# Restore original configuration
cp catalog.yaml.backup catalog.yaml
# Restore original data organization
mv data/* /original/data/location/
Verification Checklist¶
After migration, verify:
- [ ]
duckalog validateworks from any working directory - [ ]
duckalog buildcreates the expected views - [ ]
duckalog generate-sqlproduces correct SQL with absolute paths - [ ] Web UI (if used) loads and functions correctly
- [ ] All data sources are accessible
- [ ] No security violations are reported
Troubleshooting¶
Issue: "File does not exist"¶
Cause: Data files not moved to correct relative location
Solution:
# Check where paths resolve to
python3 -c "
from duckalog.config import load_config
config = load_config('catalog.yaml')
for view in config.views:
if view.uri:
print(f'{view.name}: {view.uri}')
"
Issue: "Path resolution violates security rules"¶
Cause: Path tries to escape reasonable boundaries
Solution: Avoid excessive parent directory traversal:
# Bad
uri: "../../../etc/passwd" # Blocked for security
# Good
uri: "../shared/data.parquet" # Reasonable parent traversal
Issue: Cross-platform compatibility¶
Cause: Using platform-specific path separators
Solution: Use forward slashes for cross-platform compatibility:
# Good (cross-platform)
uri: "data/users.parquet"
# Platform-specific
uri: "data\\users.parquet" # Windows only
Getting Help¶
If you encounter issues during migration:
- Check logs: Look for path resolution debug messages
- Validate step by step: Test each change individually
- Use absolute paths temporarily: You can always fall back to absolute paths
- Consult examples: See
examples/directory for reference patterns
Examples for Reference¶
See these working examples of relative path usage:
- Simple Parquet:
examples/simple_parquet/catalog.yml - Multi-Source Analytics:
examples/data-integration/multi-source-analytics/catalog.yaml - DuckDB Performance:
examples/production-operations/duckdb-performance-settings/catalog-*.yaml
Each example demonstrates different patterns for organizing and referencing data files with relative paths.