Skip to content

Configuration Schema Reference

Complete reference for Duckalog configuration schema, including all options, types, defaults, and examples for both basic and advanced usage patterns.

Overview

Duckalog configurations use YAML or JSON format with a hierarchical structure. The new modular architecture supports enhanced patterns including configuration imports, dependency injection hooks, and performance optimizations while maintaining full backward compatibility.

Architecture Features

The configuration schema now supports:

  • Configuration Imports: Modular configuration structure with file imports
  • Enhanced Environment Processing: Advanced variable resolution and validation
  • Import Resolution Performance: Caching and optimization for complex configurations
  • Dependency Injection: Customizable loading and resolution components
  • Backward Compatibility: All existing configurations continue to work

Version Requirements

  • Version 1: Basic configuration structure (legacy compatible)
  • Version 1+: Enhanced features with imports and modular patterns
  • Future versions: Will maintain backward compatibility where possible

Root Configuration

Config

The top-level configuration object.

Fields

Field Type Required Default Description
version integer - Configuration schema version (must be positive integer)
duckdb DuckDBConfig - DuckDB database configuration
views array[ViewConfig] - List of view definitions
attachments AttachmentsConfig {} Database attachment configurations
iceberg_catalogs array[IcebergCatalogConfig] [] Iceberg catalog configurations
semantic_models array[SemanticModelConfig] [] Semantic model definitions
imports ImportConfig [] Configuration imports and modular structure
load_sql_files boolean true Enable/disable SQL file loading for performance
resolve_paths boolean true Enable/disable path resolution (for testing)
load_dotenv boolean true Enable/disable .env file loading

DuckDB Configuration

DuckDBConfig

Configuration for DuckDB database connection and behavior.

Fields

Field Type Required Default Description
database string :memory: Path to DuckDB database file
install_extensions array[string] [] Extensions to install before creating views
load_extensions array[string] [] Extensions to load after connecting
pragmas array[string] [] DuckDB pragmas to set
settings string or array[string] - DuckDB SET statements executed after pragmas

Examples

# Basic configuration
duckdb:
  database: catalog.duckdb

# With extensions
duckdb:
  database: catalog.duckdb
  install_extensions:
    - httpfs
    - json
    - parquet

# With pragmas
duckdb:
  database: catalog.duckdb
  pragmas:
    - "SET memory_limit='4GB'"
    - "SET threads=4"
    - "SET enable_progress_bar=false"

# With settings (single SET statement)
duckdb:
  database: catalog.duckdb
  settings: "SET memory_limit='4GB'"

# With settings (multiple SET statements)
duckdb:
  database: catalog.duckdb
  settings:
    - "SET memory_limit='4GB'"
    - "SET threads=4"
    - "SET enable_progress_bar=false"

# With secrets
duckdb:
  database: catalog.duckdb
  secrets:
    - type: s3
      name: production_s3
      key_id: "${env:AWS_ACCESS_KEY_ID}"
      secret: "${env:AWS_SECRET_ACCESS_KEY}"
      region: us-west-2

Secrets Configuration

SecretConfig

Configuration for DuckDB secrets used to authenticate with external services and databases. Secrets are defined within the duckdb configuration section.

Fields

Field Type Required Default Description
type string - Secret type (s3, azure, gcs, http, postgres, mysql)
name string type value Optional name for the secret (defaults to type if not provided)
provider string config Secret provider (config or credential_chain)
persistent boolean false Whether to create a persistent secret (⚠️ Security implications apply)
scope string - Optional scope prefix for the secret
key_id string - Access key ID or username for authentication
secret string - Secret key or password for authentication
region string - Geographic region for cloud services
endpoint string - Custom endpoint URL for cloud services
connection_string string - Full connection string for databases
tenant_id string - Azure tenant ID for authentication
account_name string - Azure storage account name
client_id string - Azure client ID for authentication
client_secret string - Azure client secret for authentication
service_account_key string - GCS service account key
json_key string - GCS JSON key
bearer_token string - HTTP bearer token for authentication
header string - HTTP header for authentication
database string - Database name for database secrets
host string - Database host for database secrets
port integer - Database port for database secrets
user string - Database username (alternative to key_id for database types)
password string - Database password (alternative to secret for database types)
options object {} Additional key-value options for the secret

Secret Types

Type Description Common Fields
s3 Amazon S3 or S3-compatible storage key_id, secret, region, endpoint
azure Azure Blob Storage connection_string, tenant_id, account_name
gcs Google Cloud Storage service_account_key, json_key, key_id, secret
http HTTP basic authentication key_id, secret, bearer_token, header
postgres PostgreSQL database connections connection_string, host, port, database, user, password
mysql MySQL database connections (uses postgres type) connection_string, host, port, database, user, password

Examples

# S3 secret with static credentials
duckdb:
  database: catalog.duckdb
  secrets:
    - type: s3
      name: production_s3
      key_id: "${env:AWS_ACCESS_KEY_ID}"
      secret: "${env:AWS_SECRET_ACCESS_KEY}"
      region: us-west-2

# Azure storage secret
duckdb:
  database: catalog.duckdb
  secrets:
    - type: azure
      name: azure_prod
      connection_string: "${env:AZURE_STORAGE_CONNECTION_STRING}"
      account_name: "${env:AZURE_STORAGE_ACCOUNT}"

# GCS secret with service account
duckdb:
  database: catalog.duckdb
  secrets:
    - type: gcs
      name: gcs_service_account
      service_account_key: "${env:GCS_SERVICE_ACCOUNT_JSON}"

# PostgreSQL database secret
duckdb:
  database: catalog.duckdb
  secrets:
    - type: postgres
      name: analytics_db
      connection_string: "${env:DATABASE_URL}"

# HTTP secret for API access
duckdb:
  database: catalog.duckdb
  secrets:
    - type: http
      name: api_auth
      key_id: "${env:API_USERNAME}"
      secret: "${env:API_PASSWORD}"

# Secret with additional options
duckdb:
  database: catalog.duckdb
  secrets:
    - type: s3
      name: minio_storage
      key_id: "${env:MINIO_ACCESS_KEY}"
      secret: "${env:MINIO_SECRET_KEY}"
      endpoint: http://minio-server:9000
      persistent: false  # Temporary secret (default)
      options:
        use_ssl: false
        url_style: path

# Persistent secret (use with caution)
duckdb:
  database: catalog.duckdb
  secrets:
    - type: s3
      name: production_s3
      key_id: "${env:AWS_ACCESS_KEY_ID}"
      secret: "${env:AWS_SECRET_ACCESS_KEY}"
      persistent: true  # Persists to database file
      region: us-west-2

Security Note: For detailed guidance on secrets persistence and security implications, see Secrets Persistence Guide.

Views Configuration

ViewConfig

Definition of a database view in Duckalog.

Fields

Field Type Required Default Description
name string - View name (must be unique within schema)
db_schema string - Database schema for view
description string - Human-readable description of view
source string - Data source type (see Source Types)
uri string - Source URI or file path
sql string - SQL statement for view
sql_file SQLFileReference - External SQL file configuration
sql_template SQLFileReference - External SQL template with variable substitution
database string - Attachment alias for attached-database sources
table string - Table name (optionally schema-qualified) for attached sources
catalog string - Iceberg catalog name for catalog-based Iceberg views
options object {} Source-specific options passed to scan functions
tags array[string] [] Tags for categorization

Source Types

Type Description URI Examples
parquet Apache Parquet files data/file.parquet, s3://bucket/data/*.parquet
delta Delta Lake tables s3://bucket/delta-table/, data/delta-table/
iceberg Apache Iceberg tables s3://bucket/iceberg-table/, gs://bucket/table/
duckdb DuckDB database files ./reference.duckdb, s3://bucket/catalog.duckdb
sqlite SQLite database files ./legacy.db, /absolute/path/data.db
postgres PostgreSQL databases See PostgreSQL configuration below

Examples

# Parquet source
views:
  - name: users
    source: parquet
    uri: "data/users.parquet"
    description: "User data from Parquet files"

# Delta Lake source
views:
  - name: orders
    source: delta
    uri: "s3://analytics-bucket/orders/"
    description: "Order data from Delta Lake"

# SQL view
views:
  - name: active_users
    sql: |
      SELECT * FROM users 
      WHERE status = 'active'
    description: "Active users only"

# External SQL file
views:
  - name: complex_analytics
    sql_file:
      path: "./analytics/complex_query.sql"
      variables:
        start_date: "2023-01-01"
        min_amount: 100
    description: "Complex analytics from external SQL"

# View using attached database
views:
  - name: user_metrics
    source: duckdb
    database: analytics
    table: user_metrics
    sql: |
      SELECT * FROM user_metrics
    description: "User metrics from attached database"

SQL File Configuration

SQLFileConfig

Configuration for loading SQL from external files with optional template processing.

Fields

Field Type Required Default Description
path string - Path to SQL file (relative or absolute)
variables object {} Template variables for substitution
as_template boolean false Whether to process file as template

Template Processing

When as_template: true, the SQL file can contain template variables using {variable} syntax:

-- analytics_query.sql
SELECT 
    event_type,
    COUNT(*) as event_count,
    AVG(event_value) as avg_value
FROM events 
WHERE event_date >= '{start_date}'
  AND event_type = '{event_type}'
GROUP BY event_type
# With template variables
views:
  - name: event_analytics
    sql_file:
      path: "./analytics/analytics_query.sql"
      variables:
        start_date: "2023-01-01"
        event_type: "purchase"
      as_template: true

Attachments Configuration

AttachmentsConfig

Container for different types of database attachments.

Fields

Field Type Required Default Description
duckdb array[DuckDBAttachment] [] DuckDB database attachments
sqlite array[SQLiteAttachment] [] SQLite database attachments
postgres array[PostgresAttachment] [] PostgreSQL database attachments

DuckDBAttachment

Attach another DuckDB database file.

Fields

Field Type Required Default Description
alias string - Alias for attached database
path string - Path to DuckDB database file
read_only boolean true Whether attachment is read-only

Example

attachments:
  duckdb:
    - alias: reference_data
      path: "./reference.duckdb"
      read_only: true
    - alias: analytics_archive
      path: "/archive/analytics_2023.duckdb"
      read_only: true

SQLiteAttachment

Attach a SQLite database file.

Fields

Field Type Required Default Description
alias string - Alias for attached database
path string - Path to SQLite database file
read_only boolean true Whether attachment is read-only

Example

attachments:
  sqlite:
    - alias: legacy_system
      path: "./legacy_data.db"
      read_only: true

PostgresAttachment

Attach a PostgreSQL database.

Fields

Field Type Required Default Description
alias string - Alias for attached database
host string - PostgreSQL server hostname
port integer 5432 PostgreSQL server port
database string - PostgreSQL database name
user string - PostgreSQL username
password string - PostgreSQL password
sslmode string prefer SSL connection mode
schema string public Database schema

Example

attachments:
  postgres:
    - alias: data_warehouse
      host: "warehouse.company.com"
      port: 5432
      database: "analytics"
      user: "readonly_user"
      password: "${env:PG_PASSWORD}"
      sslmode: "require"
      schema: "public"

Iceberg Catalogs Configuration

IcebergCatalogConfig

Configuration for Apache Iceberg catalog integration.

Fields

Field Type Required Default Description
name string - Catalog name (must be unique)
catalog_type string - Catalog type (rest, jdbc, glue)
uri string - Catalog URI
warehouse string - Warehouse location
options object {} Additional catalog options

Catalog Types

Type Description Example URI
rest REST catalog https://iceberg-catalog.company.com/
jdbc JDBC catalog jdbc:postgresql://host:5432/db
glue AWS Glue catalog glue://account-id:region

Example

iceberg_catalogs:
  - name: production_iceberg
    catalog_type: rest
    uri: "https://iceberg-catalog.company.com/"
    warehouse: "s3://production-data-lake/"
    options:
      token: "${env:ICEBERG_TOKEN}"

Semantic Models Configuration

SemanticModelConfig

Business-friendly metadata layer for analytics and BI tools.

Fields

Field Type Required Default Description
name string - Model name (must be unique)
base_view string - Base view name for model
label string - Display label for model
description string - Model description
tags array[string] [] Categorization tags
dimensions array[SemanticDimensionConfig] [] Dimension definitions
measures array[SemanticMeasureConfig] [] Measure definitions
joins array[SemanticJoinConfig] [] Join configurations (v2+)
defaults SemanticDefaultsConfig - Default configuration (v2+)

SemanticDimensionConfig

Dimension definition for semantic models.

Fields

Field Type Required Default Description
name string - Dimension name
expression string - SQL expression for dimension
label string - Display label
description string - Dimension description
type string - Data type (string, number, date, etc.)
time_grains array[string] [] Time grains for date dimensions

Example

semantic_models:
  - name: sales_analytics
    base_view: sales_data
    dimensions:
      - name: order_date
        expression: "created_at::date"
        label: "Order Date"
        type: "date"
        time_grains: ["year", "quarter", "month", "day"]
      - name: customer_region
        expression: "UPPER(region)"
        label: "Customer Region"
        type: "string"

SemanticMeasureConfig

Measure definition for semantic models.

Fields

Field Type Required Default Description
name string - Measure name
expression string - SQL expression for measure
label string - Display label
description string - Measure description
type string - Data type (number, currency, etc.)

Example

semantic_models:
  - name: sales_analytics
    base_view: sales_data
    measures:
      - name: total_revenue
        expression: "SUM(amount)"
        label: "Total Revenue"
        type: "currency"
        aggregation: "sum"
      - name: average_order_value
        expression: "AVG(amount)"
        label: "Average Order Value"
        type: "number"
        aggregation: "avg"

SemanticJoinConfig

Join configuration for semantic models (v2+).

Fields

Field Type Required Default Description
to_view string - Target view to join
type string left Join type (left, right, inner, full)
on_condition string - Join condition SQL expression

Example

semantic_models:
  - name: customer_analytics
    base_view: customers
    joins:
      - to_view: orders
        type: left
        on_condition: "customers.id = orders.customer_id"
      - to_view: products
        type: left
        on_condition: "orders.product_id = products.id"

SemanticDefaultsConfig

Default configuration for semantic models (v2+).

Fields

Field Type Required Default Description
time_dimension string - Default time dimension name
primary_measure string - Default primary measure name
default_filters array[SemanticFilterConfig] [] Default filters

Example

semantic_models:
  - name: sales_analytics
    base_view: sales_data
    defaults:
      time_dimension: order_date
      primary_measure: total_revenue
      default_filters:
        - dimension: order_date
          operator: ">="
          value: "2023-01-01"

Import Configuration

ImportConfig

Configuration for importing other configuration files with enhanced performance and caching from the new modular architecture.

Types

# Simple list of paths (basic pattern)
imports:
  - ./base.yaml
  - ./views.yaml
  - ./analytics.yaml

# Selective imports (advanced pattern)
imports:
  duckdb:
    - path: ./database.yaml
      override: true
  views:
    - path: ./core_views.yaml
      override: true
    - path: ./analytics_views.yaml
      override: false

# Remote imports (new capability)
imports:
  - s3://company-configs/base.yaml
  - ./local-overrides.yaml
  - https://config.company.com/production.yaml

Enhanced Features

Request-Scoped Caching: - Import resolution is cached within request scope - Repeated imports across multiple configurations are optimized - Import chain analysis for performance monitoring

Remote Import Support: - Import from remote sources (S3, GCS, HTTPS, etc.) - Mixed local and remote import combinations - Authentication through environment variables

Import Resolution Diagnostics: - Track import chain depth and complexity - Performance metrics for import resolution - Cache hit/miss statistics

ImportEntry

Individual import entry with optional override control and enhanced resolution features.

Fields

Field Type Required Default Description
path string - Path to configuration file (local or remote URI)
override boolean true Whether this import can override existing values
optional boolean false Whether import failure should be tolerated
namespace string - Optional namespace for import isolation

Remote Import Examples

# Remote import with authentication
imports:
  - path: s3://company-configs/base.yaml
    override: true
    optional: false
  - path: ./local-overrides.yaml
    override: true
  - path: https://config.company.com/production.yaml
    namespace: company
    optional: true

SelectiveImports

Section-specific imports for targeted configuration merging with enhanced conflict resolution.

Fields

Field Type Required Default Description
duckdb array[ImportEntry] [] DuckDB configuration imports
views array[ImportEntry] [] View definition imports
attachments array[ImportEntry] [] Attachment configuration imports
secrets array[ImportEntry] [] Secret configuration imports

Advanced Import Patterns

# Namespace isolation
imports:
  secrets:
    - path: ./dev-secrets.yaml
      namespace: dev
      override: false
    - path: ./prod-secrets.yaml
      namespace: prod
      override: false

# Optional imports for different environments
imports:
  views:
    - path: ./core-views.yaml
      override: true
    - path: ./dev-views.yaml
      optional: true
    - path: ./analytics-views.yaml
      optional: true
      namespace: analytics

# Remote fallback pattern
imports:
  - path: s3://company-configs/production-base.yaml
    override: true
    optional: true  # Fallback to local if remote fails
  - path: ./local-base.yaml
    override: true

Import Resolution Algorithm

The new architecture enhances import resolution with:

  1. Dependency Analysis: Analyze import graph for circular dependencies
  2. Performance Optimization: Cache resolution results across multiple loads
  3. Error Context: Provide detailed error information with import chain
  4. Remote Fallback: Graceful handling of remote import failures

Import Chain Example

# main.yaml
version: 1
imports:
  - ./base.yaml
  - ./views/users.yaml
  - ./analytics.yaml

# base.yaml
version: 1
duckdb:
  database: analytics.duckdb
imports:
  - ./shared/secrets.yaml

# analytics.yaml
version: 1
imports:
  - ./base.yaml  # Already loaded - cached result
  - ./views/reports.yaml

Resolution with Caching: 1. Load main.yaml → start request cache 2. Load base.yaml → load and cache shared/secrets.yaml 3. Load views/users.yaml → use cached base 4. Load analytics.yaml → reuse cached base.yaml 5. Load views/reports.yaml → complete resolution

Performance Benefits: - base.yaml loaded once, reused twice - shared/secrets.yaml resolved once - Import chain depth: 3 levels - Cache hit ratio: 40% for repeated imports | attachments | array[ImportEntry] | ❌ | [] | Attachment configuration imports | | iceberg_catalogs | array[ImportEntry] | ❌ | [] | Iceberg catalog imports | | semantic_models | array[ImportEntry] | ❌ | [] | Semantic model imports |

Environment Variable Interpolation

Syntax

Use ${env:VARIABLE_NAME} syntax for environment variable substitution.

Examples

# Basic substitution
duckdb:
  database: "${env:DB_PATH}"

# With default values
views:
  - name: users
    uri: "${env:DATA_PATH}/users.parquet"
    sql: |
      SELECT * FROM users 
      WHERE created_at >= '${env:START_DATE}'

Supported Variables

Any environment variable can be used. Common patterns:

# Database credentials
export DB_HOST=localhost
export DB_USER=analytics_user
export DB_PASSWORD=${env:DB_PASSWORD}

# Cloud storage
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-west-2

# File paths
export DATA_DIR=/data/analytics
export CONFIG_DIR=/etc/duckalog

Validation Rules

Required Fields

All configurations must include: - ✅ version: Positive integer - ✅ duckdb: Database configuration - ✅ views: Array of views (can be empty)

Uniqueness Constraints

  • View names must be unique within each schema
  • Attachment aliases must be unique
  • Iceberg catalog names must be unique
  • Semantic model names must be unique

Type Validation

  • All string fields support environment variable interpolation
  • File paths are resolved relative to configuration file
  • Remote URIs are validated for supported schemes
  • SQL expressions are validated for basic syntax

Common Patterns

Multi-Environment Configuration

version: 1

duckdb:
  database: "${env:DB_PATH}"

# Environment-specific imports
imports:
  - "./base-${env:ENVIRONMENT}.yaml"

Team Collaboration Structure

# Shared base configuration
version: 1
imports:
  - ./infrastructure/database.yaml    # Team: Infrastructure
  - ./data/views/users.yaml           # Team: Data
  - ./analytics/models.yaml           # Team: Analytics
  - ./business/reports.yaml           # Team: Business

Production Deployment

version: 1
duckdb:
  database: /data/prod/analytics.duckdb
  pragmas:
    - "SET memory_limit='8GB'"
    - "SET threads=8"
    - "SET enable_progress_bar=false"

# Secure credential management
attachments:
  postgres:
    - alias: warehouse
      host: "${env:PG_HOST}"
      user: "${env:PG_USER}"
      password: "${env:PG_PASSWORD}"

This comprehensive schema reference covers all Duckalog configuration options for building sophisticated data catalogs.