Duckalog Examples¶
Welcome to the Duckalog examples collection! These practical examples demonstrate real-world usage patterns and help you get started with different Duckalog configurations.
Learning Path¶
Examples are organized by difficulty to help you build Duckalog expertise progressively:
🟢 Beginner (Getting Started)¶
Perfect for users new to Duckalog or data cataloging concepts.
🟡 Intermediate (Building Skills)¶
For users comfortable with basic concepts and ready for more complex scenarios.
🔴 Advanced (Mastery)¶
For experienced users tackling enterprise-scale data challenges.
Choosing an Example¶
Use this guide to find the right example for your use case:
🟢 I'm getting started with Duckalog¶
→ Start with: Simple Parquet Example
- What you'll learn: Basic configuration, Parquet views, S3 setup
- Prerequisites: Basic Python, familiarity with data files
- Time to complete: 15-30 minutes
- Perfect for simple analytics without complex joins
🟡 I need to combine multiple local databases¶
→ Use: Local Attachments Example
- What you'll learn: Database attachments, cross-database joins, read-only patterns
- Prerequisites: Familiar with basic SQL, local database concepts
- Time to complete: 30-45 minutes
- Great for consolidating existing local data sources
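The idea behind attachments and cross-database joins can be previewed without Duckalog. The sketch below uses Python's standard-library `sqlite3` module rather than Duckalog's own configuration, and every file, table, and column name in it is hypothetical. It only illustrates the general pattern of attaching a second database under an alias and joining across the two.

```python
import os
import sqlite3
import tempfile

# Hypothetical file, table, and column names, used only for this illustration.
workdir = tempfile.mkdtemp()
sales_path = os.path.join(workdir, "sales.db")
crm_path = os.path.join(workdir, "crm.db")

# Create two independent database files, each holding one small table.
sales = sqlite3.connect(sales_path)
sales.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
sales.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 12.5)]
)
sales.commit()
sales.close()

crm = sqlite3.connect(crm_path)
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
crm.commit()
crm.close()

# Open one database, attach the other under the alias "crm", and join across both.
conn = sqlite3.connect(sales_path)
conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
conn.close()

print(rows)
```

The Local Attachments example covers how Duckalog itself declares attachments; the schema-qualified names (`main.orders`, `crm.customers`) are the part of this sketch that carries over conceptually.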
🔴 I have data in multiple sources (Parquet + databases + cloud storage)¶
→ Follow: Multi-Source Analytics Example
- What you'll learn: Enterprise data integration, complex joins, business logic patterns
- Prerequisites: Comfortable with SQL, familiar with cloud storage concepts
- Time to complete: 60-90 minutes
- Real-world analytics workflow with production-ready patterns
🟢 I need to organize my configuration across multiple files¶
→ Use: Config Imports Example
- What you'll learn: Modular configuration, team ownership patterns, environment management
- Prerequisites: Basic YAML knowledge, file system concepts
- Time to complete: 20-30 minutes
- Essential for team collaboration and project maintainability
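As a rough mental model for modular configuration, the sketch below merges several configuration fragments into one dictionary. It is not Duckalog's import mechanism: `merge_configs`, the merge rules (nested dicts merge, lists concatenate, later scalars win), and the keys `duckdb`, `views`, `database`, and `threads` are all assumptions made for illustration.

```python
from typing import Any


def merge_configs(base: dict[str, Any], imported: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, lists concatenate,
    and any other value from `imported` replaces the one in `base`."""
    merged = dict(base)
    for key, value in imported.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_configs(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            merged[key] = current + value
        else:
            merged[key] = value
    return merged


# Hypothetical fragments, as separate teams might own them in separate files.
settings = {"duckdb": {"database": "catalog.duckdb"}, "views": []}
sales_views = {"views": [{"name": "orders"}]}
marketing_views = {"views": [{"name": "campaigns"}], "duckdb": {"threads": 4}}

config = settings
for fragment in (sales_views, marketing_views):
    config = merge_configs(config, fragment)

print([view["name"] for view in config["views"]])
print(config["duckdb"])
```

The Config Imports example documents the rules Duckalog actually applies; the point here is only that each team can own a small fragment while the catalog is built from the combined result.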
🟡 I need to deploy configs across different environments¶
→ Read: Environment Variables Example
- What you'll learn: Secure credential management, deployment patterns, environment separation
- Prerequisites: Familiar with environment variables, basic deployment concepts
- Time to complete: 30-45 minutes
- Critical for production deployments and team workflows
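The general technique of keeping environment-specific values out of a config file can be sketched in plain Python. The placeholder syntax below is that of the standard library's `string.Template`, which may differ from the syntax Duckalog uses, and `render_config`, the variable names, and the config keys are hypothetical.

```python
import os
from string import Template

# Hypothetical config text. The ${NAME} placeholders follow string.Template,
# not necessarily Duckalog's own interpolation syntax.
config_text = "s3_region: ${AWS_REGION}\ns3_bucket: ${DATA_BUCKET}\n"


def render_config(text, env):
    """Substitute ${NAME} placeholders from `env`, failing loudly if one is missing."""
    try:
        return Template(text).substitute(env)
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from None


# Values are passed explicitly so the sketch runs anywhere;
# in a real deployment you would pass os.environ instead.
demo_env = {"AWS_REGION": "eu-west-1", "DATA_BUCKET": "analytics-dev"}
rendered = render_config(config_text, demo_env)
print(rendered)
```

Failing on a missing variable, rather than silently substituting an empty string, tends to surface deployment mistakes before any query runs.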
🟡 I want to fine-tune DuckDB performance and behavior¶
→ Explore: DuckDB Settings Example
- What you'll learn: Performance optimization, memory management, threading configuration
- Prerequisites: DuckDB basics, performance concepts
- Time to complete: 20-30 minutes
- Essential for production performance tuning
🟡 I need to manage credentials for cloud services and databases¶
→ Use: DuckDB Secrets Example
- What you'll learn: Secure credential management, cloud service integration, secret scoping
- Prerequisites: Cloud service accounts, security concepts
- Time to complete: 25-35 minutes
- Essential for secure cloud data access
Learning Progression¶
Step 1: Start with Basics (🟢 Beginner)¶
Complete these examples to build foundational skills:
- Simple Parquet - Learn core concepts
- Config Imports - Organize your configuration
Outcome: You'll understand Duckalog fundamentals and be ready for intermediate scenarios.
Step 2: Build Intermediate Skills (🟡 Intermediate)¶
Tackle more complex data management challenges:
- Local Attachments - Work with multiple databases
- Environment Variables - Master deployment patterns
- DuckDB Settings - Optimize performance
- DuckDB Secrets - Secure cloud access
Outcome: You'll handle real-world data integration and deployment scenarios.
Step 3: Advanced Mastery (🔴 Advanced)¶
Solve enterprise-scale challenges:
- Multi-Source Analytics - Complete production workflow
Outcome: You'll be ready for complex enterprise data projects.
Quick Reference¶
| Example | Difficulty | Time | Prerequisites | Key Skills |
|---|---|---|---|---|
| Simple Parquet | 🟢 Beginner | 15-30 min | Basic Python | Core configuration |
| Config Imports | 🟢 Beginner | 20-30 min | YAML knowledge | Modular organization |
| Local Attachments | 🟡 Intermediate | 30-45 min | Basic SQL | Database attachments |
| Environment Variables | 🟡 Intermediate | 30-45 min | Deployment concepts | Secure deployment |
| DuckDB Settings | 🟡 Intermediate | 20-30 min | Performance concepts | Optimization |
| DuckDB Secrets | 🟡 Intermediate | 25-35 min | Cloud accounts | Credential management |
| Multi-Source Analytics | 🔴 Advanced | 60-90 min | Complex SQL | Enterprise integration |
Prerequisites for All Examples¶
Required Software¶
- Python 3.12+ with Duckalog installed
Optional but Recommended¶
- DuckDB CLI for interactive querying
- AWS CLI (for S3 examples)
Example Categories¶
By Data Source¶
Parquet Files Only
- Simple Parquet - Perfect starting point
- Multi-Source Analytics - Includes Parquet with other sources
Local Databases
- Local Attachments - DuckDB and SQLite focus
- Multi-Source Analytics - Local databases with cloud sources
Cloud Storage & Data Lakes
- Simple Parquet - S3 configuration
- Multi-Source Analytics - S3 + Iceberg catalogs
- Environment Variables - Cloud credential management
Enterprise/Production
- Multi-Source Analytics - Production-ready patterns
- Environment Variables - Deployment and security
By Use Case¶
Data Analytics
- Simple Parquet - Basic analytics
- Local Attachments - Cross-database analytics
- Multi-Source Analytics - Enterprise analytics
Data Integration
- Local Attachments - Local data unification
- Multi-Source Analytics - Cloud + local integration
Configuration Management
- Config Imports - Modular configuration patterns
- Environment Variables - Environment-specific settings
Development & Deployment
- Config Imports - Configuration organization and modularity
- Environment Variables - Environment-specific configs
- Multi-Source Analytics - Production deployment patterns
Common Patterns Across Examples¶
All examples demonstrate these important Duckalog concepts:
- Configuration Structure - Consistent YAML patterns
- View Composition - Building complex analytics from simple views
- Performance Optimization - Memory limits, threading, pragmas
- Error Handling - Validation and troubleshooting
- Best Practices - Security, maintainability, scalability
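View composition, one of the concepts listed above, means defining a simple view first and then building further views on top of it. The sketch below shows the idea with Python's standard-library `sqlite3` module and an in-memory database instead of a Duckalog catalog; the table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id INTEGER, kind TEXT, amount REAL);
    INSERT INTO events VALUES
        (1, 'purchase', 30.0), (1, 'refund', -10.0),
        (2, 'purchase', 15.0), (2, 'view', 0.0);

    -- Base view: keep only the rows that affect revenue.
    CREATE VIEW revenue_events AS
        SELECT user_id, amount
        FROM events
        WHERE kind IN ('purchase', 'refund');

    -- Composed view: aggregate on top of the base view.
    CREATE VIEW revenue_per_user AS
        SELECT user_id, SUM(amount) AS revenue
        FROM revenue_events
        GROUP BY user_id;
    """
)
totals = conn.execute(
    "SELECT user_id, revenue FROM revenue_per_user ORDER BY user_id"
).fetchall()
conn.close()

print(totals)
```

Keeping the filter in one view and the aggregation in another means a change to the filtering rule is made in a single place.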
Getting Started¶
For Complete Beginners¶
- Start with Simple Parquet to learn core concepts
- Add Config Imports to organize your configuration
- Practice with your own data files
For Experienced Users¶
- Assess your needs using the difficulty guide above
- Jump to appropriate examples based on your current skills
- Combine patterns from multiple examples as needed
Learning Tips¶
- Follow the sequence for progressive skill building
- Complete each example fully before moving to the next
- Experiment with variations to solidify understanding
- Apply to your data to make learning practical
Next Steps¶
After working through examples:
- Read the User Guide in ../guides/index.md for comprehensive documentation
- Explore the API Reference in ../reference/index.md for detailed function documentation
- Review the Architecture in ../explanation/architecture.md for high-level design details
- Join the community for questions and discussions
Contributing Examples¶
Have a great Duckalog pattern to share? Consider contributing:
- Create a new example following the patterns shown here
- Include clear explanations and real-world scenarios
- Add troubleshooting sections for common issues
- Ensure examples work with minimal setup
- Link from this index page
Need Help?¶
- Configuration Issues: Check troubleshooting sections in examples
- API Questions: See API Reference
- General Usage: Review User Guide
- Technical Details: Read Architecture
Choose an example above to get started, or explore them in order to build your Duckalog expertise progressively!