Duckalog Documentation¶

Welcome to the Duckalog documentation! This comprehensive guide will help you master Duckalog's features and patterns.

Getting Started¶

The documentation is organized for different learning styles and needs:

Tutorials - Step-by-step hands-on learning for beginners
How-to Guides - Practical solutions for specific problems
Reference - Technical API documentation and configuration schema
Understanding - Background context and architectural concepts
Examples - Real-world configuration examples and patterns

Key Features Overview¶

✅ Multi-Source Data Integration¶

S3 Parquet/Delta/Iceberg: Direct querying of cloud data lakes
Database Attachments: Connect DuckDB, SQLite, PostgreSQL databases
Semantic Layer: Business-friendly dimensions and measures
Path Resolution: Automatic path handling with security validation

✅ Developer Experience¶

Config-Driven: Declarative YAML/JSON configurations
Idempotent: Same config always produces the same catalog
CLI + Python API: Use from command line or in code
Remote Configs: Load configurations from S3, GCS, Azure, GitHub

✅ Production Ready¶

Security: Environment variable credentials, no secrets in configs
Performance: Optimized for large-scale analytics workloads
Monitoring: Comprehensive logging and error handling
Web UI: Interactive dashboard for catalog management

✅ Enterprise Features¶

Semantic Models: Business-friendly metadata layer
Secret Management: Canonical credential configuration
Audit Trail: Config-driven change tracking
Multi-Cloud: Support for AWS, GCP, Azure storage systems

Popular Examples¶

📊 Multi-Source Analytics: Combine Parquet, DuckDB, and PostgreSQL data
🔒 Environment Variables Security: Secure credential management patterns
⚡ DuckDB Performance Settings: Optimize memory, threads, and storage
🏷️ Semantic Layer v2: Business-friendly semantic models with dimensions and measures

Quick Reference¶

# Installation
pip install duckalog           # Core package
pip install duckalog[ui]       # With web dashboard
pip install duckalog[remote]   # With remote configuration support

# Core CLI commands
duckalog init                  # Create starter configuration
duckalog run catalog.yaml    # Build DuckDB catalog
duckalog validate catalog.yaml # Check configuration syntax
duckalog ui catalog.yaml       # Launch web dashboard

# Remote configuration examples
duckalog run s3://bucket/config.yaml          # S3 configuration
duckalog run github://user/repo/config.yaml   # GitHub repository
duckalog run gs://bucket/config.yaml          # Google Cloud Storage

# Python API basics
from duckalog import build_catalog, generate_sql, validate_config

# Start with a template
from duckalog.config_init import create_config_template
config = create_config_template(format="yaml", output_path="my_config.yaml")

# Build and validate
build_catalog("my_config.yaml")
validate_config("my_config.yaml")
sql = generate_sql("my_config.yaml")

Next Steps¶

🆕 New to Duckalog? Start with the Getting Started Tutorial
🎯 Have a specific problem? Browse the How-to Guides
📚 Need technical details? Check the Reference documentation
🏗️ Want to understand the design? Read the Architecture overview
💡 Need ideas? Explore the Examples collection