How to Manage Different Environments¶

Set up and manage different environments (development, staging, production) with proper separation of concerns and secure credential management.

Problem¶

You need to deploy Duckalog configurations across multiple environments while maintaining: - Separate database connections and credentials - Environment-specific settings and paths - Consistent configuration structure - Secure deployment practices

Prerequisites¶

Basic Duckalog configuration knowledge
Understanding of environment variables
Familiarity with deployment concepts
Access to different environment credentials

Solution¶

1. Base Configuration Structure¶

Create a base configuration with common settings:

# config/base.yaml
version: 1

duckdb:
  # Common pragmas for all environments
  pragmas:
    - "SET memory_limit='2GB'"
    - "SET threads=4"
    - "SET enable_progress_bar=false"

# Common views that exist in all environments
views: []

2. Environment-Specific Configurations¶

Create separate configurations for each environment:

Development Environment¶

# config/dev.yaml
version: 1
imports:
  - ./base.yaml

duckdb:
  database: dev_analytics.duckdb
  # Development-specific settings
  pragmas:
    - "SET memory_limit='1GB'"     # Lower memory for dev
    - "SET threads=2"              # Fewer threads for dev

views:
  - name: users_dev
    source: parquet
    uri: "./data/dev/users.parquet"
    sql: |
      SELECT * FROM users 
      WHERE created_at >= '2023-01-01'  # Recent data only

Staging Environment¶

# config/staging.yaml
version: 1
imports:
  - ./base.yaml

duckdb:
  database: staging_analytics.duckdb
  pragmas:
    - "SET memory_limit='4GB'"     # More memory for staging
    - "SET threads=6"              # More threads for staging

views:
  - name: users_staging
    source: parquet
    uri: "./data/staging/users.parquet"
    sql: |
      SELECT * FROM users 
      WHERE created_at >= '2022-01-01'  # Full historical data

Production Environment¶

# config/prod.yaml
version: 1
imports:
  - ./base.yaml

duckdb:
  database: prod_analytics.duckdb
  pragmas:
    - "SET memory_limit='8GB'"     # Maximum memory for production
    - "SET threads=8"              # Maximum threads for production
    - "SET enable_progress_bar=false"  # No progress bars in production

views:
  - name: users_prod
    source: parquet
    uri: "s3://prod-data-bucket/analytics/users.parquet"
    sql: |
      SELECT * FROM users 
      -- No WHERE clause - full production data

3. Environment Variables for Credentials¶

Use environment variables for sensitive information:

# config/prod.yaml
version: 1
imports:
  - ./base.yaml

duckdb:
  database: prod_analytics.duckdb

attachments:
  postgres:
    - alias: warehouse
      host: "${env:PG_HOST}"           # Production database host
      port: 5432
      database: "${env:PG_DATABASE}"     # Production database name
      user: "${env:PG_USER}"           # Production database user
      password: "${env:PG_PASSWORD}"     # Production database password

views:
  - name: users_prod
    source: parquet
    uri: "s3://prod-data-bucket/analytics/"
    # S3 credentials from environment
    sql: |
      SELECT * FROM users 
      WHERE created_at >= '${env:DATA_START_DATE}'

4. Environment Selection Script¶

Create a script to select the right configuration:

#!/bin/bash
# scripts/deploy.sh

set -e

# Environment selection
ENVIRONMENT=${1:-dev}
echo "Deploying to environment: $ENVIRONMENT"

# Validate environment
case $ENVIRONMENT in
  dev|staging|prod)
    ;;
  *)
    echo "Error: Environment must be dev, staging, or prod"
    exit 1
    ;;
esac

# Set environment-specific variables
export DUCKALOG_ENV=$ENVIRONMENT

# Load environment variables
case $ENVIRONMENT in
  dev)
    export PG_HOST=localhost
    export PG_DATABASE=analytics_dev
    export PG_USER=dev_user
    export PG_PASSWORD=dev_password
    export DATA_START_DATE="2023-01-01"
    ;;
  staging)
    export PG_HOST=staging-db.company.com
    export PG_DATABASE=analytics_staging
    export PG_USER=staging_user
    export PG_PASSWORD=${STAGING_PG_PASSWORD}
    export DATA_START_DATE="2022-01-01"
    ;;
  prod)
    export PG_HOST=prod-db.company.com
    export PG_DATABASE=analytics_prod
    export PG_USER=prod_user
    export PG_PASSWORD=${PROD_PG_PASSWORD}
    export DATA_START_DATE="2020-01-01"
    ;;
esac

# Build catalog
CONFIG_FILE="config/${ENVIRONMENT}.yaml"
echo "Building catalog from: $CONFIG_FILE"

duckalog run "$CONFIG_FILE"

echo "Deployment to $ENVIRONMENT completed successfully"

5. Docker Environment Configuration¶

For containerized deployments:

# Dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install Duckalog
RUN pip install duckalog[remote]

# Copy configuration files
COPY config/ ./config/
COPY scripts/ ./scripts/
COPY data/ ./data/

# Set default environment
ENV DUCKALOG_ENV=dev

# Run deployment script
CMD ["./scripts/deploy.sh"]

# docker-compose.yml
version: '3.8'

services:
  duckalog-dev:
    build: .
    environment:
      - DUCKALOG_ENV=dev
      - PG_HOST=postgres-dev
      - PG_DATABASE=analytics_dev
      - PG_USER=dev_user
      - PG_PASSWORD=dev_password
    volumes:
      - ./data:/app/data
      - ./logs:/app/logs

  duckalog-staging:
    build: .
    environment:
      - DUCKALOG_ENV=staging
      - PG_HOST=postgres-staging
      - PG_DATABASE=analytics_staging
      - PG_USER=staging_user
      - PG_PASSWORD=${STAGING_PG_PASSWORD}
    volumes:
      - ./data:/app/data
      - ./logs:/app/logs

  duckalog-prod:
    build: .
    environment:
      - DUCKALOG_ENV=prod
      - PG_HOST=postgres-prod
      - PG_DATABASE=analytics_prod
      - PG_USER=prod_user
      - PG_PASSWORD=${PROD_PG_PASSWORD}
    volumes:
      - ./data:/app/data
      - ./logs:/app/logs

6. Kubernetes Environment Configuration¶

For Kubernetes deployments:

# k8s/duckalog-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: duckalog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: duckalog
  template:
    metadata:
      labels:
        app: duckalog
    spec:
      containers:
      - name: duckalog
        image: duckalog:latest
        env:
        - name: DUCKALOG_ENV
          value: "prod"
        - name: PG_HOST
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: host
        - name: PG_DATABASE
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: database
        - name: PG_USER
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: username
        - name: PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: database-credentials
              key: password
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        - name: data-volume
          mountPath: /app/data
      volumes:
      - name: config-volume
        configMap:
          name: duckalog-config
      - name: data-volume
        persistentVolumeClaim:
          claimName: duckalog-data
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: duckalog-config
data:
  prod.yaml: |
    version: 1
    imports:
      - ./base.yaml
    duckdb:
      database: /app/data/prod_analytics.duckdb
      pragmas:
        - "SET memory_limit='8GB'"
        - "SET threads=8"
    views:
      - name: users_prod
        source: parquet
        uri: "/app/data/production/users.parquet"

Verification¶

1. Test Environment Loading¶

# Test each environment
for env in dev staging prod; do
  echo "Testing $env environment..."
  duckalog validate "config/${env}.yaml"
  if [ $? -eq 0 ]; then
    echo "✅ $env configuration is valid"
  else
    echo "❌ $env configuration has errors"
  fi
done

2. Verify Environment Isolation¶

# Ensure environments don't interfere
echo "Checking environment isolation..."

# Check that dev doesn't access prod data
if duckalog run config/dev.yaml --dry-run | grep -q "prod-data-bucket"; then
  echo "❌ Dev configuration accessing production data"
  exit 1
fi

# Check that prod doesn't use dev settings
if duckalog run config/prod.yaml --dry-run | grep -q "memory_limit='1GB'"; then
  echo "❌ Production configuration using development memory"
  exit 1
fi

echo "✅ Environment isolation verified"

3. Test Deployment Script¶

# Test deployment script
./scripts/deploy.sh dev
echo "Dev deployment exit code: $?"

./scripts/deploy.sh staging
echo "Staging deployment exit code: $?"

./scripts/deploy.sh prod
echo "Production deployment exit code: $?"

Common Variations¶

1. Configuration Imports for Environments¶

Use config imports to share common settings:

# config/prod.yaml
version: 1
imports:
  - ./base.yaml          # Common settings
  - ./shared-views.yaml   # Shared view definitions
  - ./prod-secrets.yaml   # Production-only secrets

duckdb:
  database: prod_analytics.duckdb
  # Production-specific overrides
  pragmas:
    - "SET memory_limit='16GB'"  # Override base.yaml

2. Environment Detection¶

Automatically detect environment:

# config/auto.yaml
version: 1
imports:
  - ./base.yaml
  - "./${env:ENVIRONMENT}.yaml"  # Dynamic import based on environment

duckdb:
  database: "${env:ENVIRONMENT}_analytics.duckdb"

# Use environment detection
export ENVIRONMENT=prod
duckalog run config/auto.yaml

3. Configuration Templates¶

Use template files with environment substitution:

# config/template.yaml
version: 1
imports:
  - ./base.yaml

duckdb:
  database: "{{ENVIRONMENT}}_analytics.duckdb"
  threads: "{{THREADS}}"
  memory_limit: "{{MEMORY_LIMIT}}"

views:
  - name: "{{ENVIRONMENT}}_users"
    uri: "{{DATA_PATH}}/users.parquet"

# Generate configuration from template
envsubst < config/template.yaml > config/generated.yaml

Troubleshooting¶

Environment Variable Not Found¶

Error: ConfigError: Environment variable not found: PG_HOST

Solution:

# Check if variable is set
echo $PG_HOST

# Set variable
export PG_HOST=your_host

# Use .env file
echo "PG_HOST=your_host" >> .env
source .env

Configuration File Not Found¶

Error: ConfigError: Config file not found: config/prod.yaml

Solution:

# Check file exists
ls -la config/prod.yaml

# Check working directory
pwd

# Use absolute path
duckalog run /full/path/to/config/prod.yaml

Permission Issues¶

Error: PermissionError: Cannot read configuration file

Solution:

# Check file permissions
ls -la config/prod.yaml

# Fix permissions
chmod 600 config/prod.yaml  # Read/write for owner only
chmod 644 config/prod.yaml  # Read for everyone, write for owner

Database Connection Failures¶

Error: EngineError: Failed to connect to database

Solution:

# Test database connection manually
duckdb config/prod_analytics.duckdb
# Then test: .tables

# Check database file permissions
ls -la prod_analytics.duckdb

# Verify database path in config
grep "database:" config/prod.yaml

Best Practices¶

1. Security¶

Never commit credentials to version control
Use environment variables for all sensitive data
Rotate credentials regularly
Use read-only database connections where possible

2. Consistency¶

Maintain consistent structure across environments
Use config imports to share common settings
Document environment-specific differences
Use naming conventions for environments

3. Deployment¶

Test configurations in staging before production
Use deployment scripts for consistency
Implement rollback procedures
Monitor deployments for issues

4. Maintenance¶

Regularly update base configurations
Review environment-specific settings
Document configuration changes
Use version control for configuration files

Next Steps¶

After setting up environment management:

Automate deployments with CI/CD pipelines
Implement monitoring for different environments
Create backup procedures for configuration and data
Document team workflows for environment changes
Explore advanced patterns in Performance Tuning

You now have a robust environment management system that supports secure, scalable deployments across development, staging, and production environments.