Skip to content

API Reference

This reference provides comprehensive documentation for the Duckalog Python API, including the new modular architecture patterns, dependency injection interfaces, and advanced usage patterns.

Core API Overview

The main API provides both simple convenience functions and advanced patterns for custom implementations. With the new config-driven connection management system, the primary API now returns a CatalogConnection object for intelligent connection management.

Core Functions

New: connect_to_catalog()

The primary function for connecting to catalogs with intelligent connection management:

from duckalog import connect_to_catalog

# Basic usage - returns CatalogConnection object
conn = connect_to_catalog("catalog.yaml")

# Use as context manager for automatic cleanup
with connect_to_catalog("catalog.yaml") as conn:
    # conn is a CatalogConnection object
    result = conn.get_connection().execute("SELECT COUNT(*) FROM users").fetchall()

# Force rebuild of all views
conn = connect_to_catalog("catalog.yaml", force_rebuild=True)

# Custom database path
conn = connect_to_catalog("catalog.yaml", database_path="custom.duckdb")

Key Features of CatalogConnection

  • Automatic Connection Management: Handles connection pooling and reuse
  • Session State Restoration: Automatically restores pragmas, settings, and attachments
  • Incremental Updates: Only creates missing views for faster builds
  • Lazy Connection: Database connection established only when needed
  • Context Manager Support: Automatic cleanup when used with with statement

Legacy: connect_to_catalog_cm()

For backward compatibility, the context manager function still works:

from duckalog import connect_to_catalog_cm

# Old style (still supported for compatibility)
with connect_to_catalog_cm("catalog.yaml") as conn:
    # conn is raw DuckDB connection (legacy behavior)
    result = conn.execute("SELECT * FROM users").fetchall()

Legacy Functions

All existing functions continue to work but are deprecated in favor of the new connection management approach:

from duckalog import (
    generate_sql,           # Generate SQL from config
    validate_config,        # Validate configuration
    connect_to_catalog,     # Build and connect in one operation
)

duckalog

Duckalog public API.

AttachmentsConfig

Bases: BaseModel

Collection of attachment configurations.

Attributes:

Name Type Description
duckdb list[DuckDBAttachment]

DuckDB attachment entries.

sqlite list[SQLiteAttachment]

SQLite attachment entries.

postgres list[PostgresAttachment]

Postgres attachment entries.

duckalog list[DuckalogAttachment]

Duckalog config attachment entries.

CatalogConnection

Manages DuckDB connections with session state restoration and lazy initialization.

This class ensures that every connection obtained through it has the correct catalog state (pragmas, settings, extensions, attachments, and secrets) applied, regardless of whether it's a new or existing connection.

Attributes:

Name Type Description
config_path

Path to the Duckalog configuration file.

database_path

Optional override for the DuckDB database path.

read_only

Whether to open the connection in read-only mode.

force_rebuild

If True, all views will be recreated even if they exist.

config Optional[Config]

The loaded and validated configuration.

conn Optional[DuckDBPyConnection]

The active DuckDB connection, or None if not initialized.

__enter__()

Context manager support: returns the CatalogConnection instance.

__exit__(exc_type, exc_val, exc_tb)

Context manager support: ensures connection cleanup.

__init__(config_path, database_path=None, read_only=False, force_rebuild=False, filesystem=None, load_dotenv=True)

Initialize the catalog connection manager.

Parameters:

Name Type Description Default
config_path str

Path to the Duckalog configuration file.

required
database_path Optional[str]

Optional override for the DuckDB database path.

None
read_only bool

Whether to open the connection in read-only mode.

False
force_rebuild bool

If True, all views will be recreated even if they exist.

False
filesystem Optional[Any]

Optional fsspec filesystem object for remote file access.

None
load_dotenv bool

If True, automatically load and process .env files.

True

close()

Clean up the DuckDB connection and resources.

get_connection()

Get the DuckDB connection, initializing it if necessary.

This method establishes the connection lazily on the first call, restores the session state, and performs incremental updates. Subsequent calls return the same connection instance.

Returns:

Type Description
DuckDBPyConnection

An active DuckDB connection with catalog state restored.

Raises:

Type Description
ConfigError

If loading the configuration fails.

FileNotFoundError

If the database path is invalid or missing (for existing catalogs).

EngineError

If connecting to DuckDB or restoring state fails.

Config

Bases: BaseModel

Top-level Duckalog configuration.

Attributes:

Name Type Description
version int

Positive integer describing the config schema version.

duckdb DuckDBConfig

DuckDB session and connection settings.

views list[ViewConfig]

List of view definitions to create in the catalog.

attachments AttachmentsConfig

Optional attachments to external databases.

iceberg_catalogs list[IcebergCatalogConfig]

Optional Iceberg catalog definitions.

semantic_models list[SemanticModelConfig]

Optional semantic model definitions for business metadata.

imports Union[list[Union[str, ImportEntry]], SelectiveImports]

Optional list of additional config files to import and merge. Can be a simple list of paths (backward compatible) or a SelectiveImports object for advanced options like section-specific imports, override behavior, and glob patterns.

env_files list[str]

Optional list of custom .env file patterns to load. Supports patterns like ['.env', '.env.local', '.env.production']. Files are loaded in order with later files overriding earlier ones. Defaults to ['.env'] for backward compatibility.

loader_settings LoaderSettings

Optional settings for the configuration loader.

ConfigError

Bases: DuckalogError

Configuration-related errors.

This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.

Typical error conditions include:

  • The config file does not exist or cannot be read.
  • The file is not valid YAML/JSON.
  • Required fields are missing or invalid.
  • An environment variable placeholder cannot be resolved.

DuckDBAttachment

Bases: BaseModel

Configuration for attaching another DuckDB database.

Attributes:

Name Type Description
alias str

Alias under which the database will be attached.

path str

Filesystem path to the DuckDB database file.

read_only bool

Whether the attachment should be opened in read-only mode. Defaults to True for safety.

DuckDBConfig

Bases: BaseModel

DuckDB connection and session settings.

Attributes:

Name Type Description
database str

Path to the DuckDB database file. Defaults to ":memory:".

install_extensions list[str]

Names of extensions to install before use.

load_extensions list[str]

Names of extensions to load in the session.

pragmas list[str]

SQL statements (typically SET pragmas) executed after connecting and loading extensions.

settings Optional[Union[str, list[str]]]

DuckDB SET statements executed after pragmas. Can be a single string or list of strings.

secrets list[SecretConfig]

List of secret definitions for external services and databases.

EngineError

Bases: DuckalogError

Engine-level error raised during catalog builds.

This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.

IcebergCatalogConfig

Bases: BaseModel

Configuration for an Iceberg catalog.

Attributes:

Name Type Description
name str

Catalog name referenced by Iceberg views.

catalog_type str

Backend type (for example, rest, hive, glue).

uri Optional[str]

Optional URI used by certain catalog types.

warehouse Optional[str]

Optional warehouse location for catalog data.

options dict[str, Any]

Additional catalog-specific options.

PostgresAttachment

Bases: BaseModel

Configuration for attaching a Postgres database.

Attributes:

Name Type Description
alias str

Alias used inside DuckDB to reference the Postgres database.

host str

Hostname or IP address of the Postgres server.

port int

TCP port of the Postgres server.

database str

Database name to connect to.

user str

Username for authentication.

password str

Password for authentication.

sslmode Optional[str]

Optional SSL mode (for example, require).

options dict[str, Any]

Extra key/value options passed to the attachment clause.

SQLFileEncodingError

Bases: SQLFileError

Raised when a SQL file has invalid encoding.

SQLFileError

Bases: ConfigError

Base exception for SQL file-related errors.

This exception is raised when SQL file operations fail, such as when a referenced SQL file cannot be found, read, or processed.

SQLFileLoader

Loads SQL content from external files and processes templates.

__init__()

Initialize the SQL file loader.

load_sql_file(file_path, config_file_path, variables=None, as_template=False, filesystem=None)

Load SQL content from a file and optionally process as a template.

Parameters:

Name Type Description Default
file_path str

Path to the SQL file (can be relative or absolute)

required
config_file_path str

Path to the config file for resolving relative paths

required
variables Optional[dict[str, Any]]

Dictionary of variables for template substitution

None
as_template bool

Whether to process the file content as a template

False
filesystem Optional[Any]

Optional filesystem object for file I/O operations

None

Returns:

Type Description
str

The loaded SQL content (processed if template, raw otherwise)

Raises:

Type Description
SQLFileError

If the file cannot be loaded or processed

SQLFileNotFoundError

Bases: SQLFileError

Raised when a referenced SQL file does not exist.

SQLFilePermissionError

Bases: SQLFileError

Raised when a SQL file cannot be read due to permissions.

SQLFileReference

Bases: BaseModel

Reference to SQL content in an external file.

Attributes:

Name Type Description
path str

Path to the SQL file (relative or absolute).

variables Optional[dict[str, Any]]

Dictionary of variables for template substitution.

as_template bool

Whether to process the file content as a template.

SQLFileSizeError

Bases: SQLFileError

Raised when a SQL file exceeds size limits.

SQLGroup

Unified access to all SQL-related functionality.

SQLTemplateError

Bases: SQLFileError

Raised when template processing fails.

SQLiteAttachment

Bases: BaseModel

Configuration for attaching a SQLite database.

Attributes:

Name Type Description
alias str

Alias under which the SQLite database will be attached.

path str

Filesystem path to the SQLite .db file.

SecretConfig

Bases: BaseModel

Configuration for a DuckDB secret.

Attributes:

Name Type Description
type SecretType

Secret type (s3, azure, gcs, http, postgres, mysql).

name Optional[str]

Optional name for the secret (defaults to type if not provided).

provider SecretProvider

Secret provider (config or credential_chain).

persistent bool

Whether to create a persistent secret. Defaults to False.

scope Optional[str]

Optional scope prefix for the secret.

key_id Optional[str]

Access key ID or username for authentication.

secret Optional[str]

Secret key or password for authentication.

region Optional[str]

Geographic region for cloud services.

endpoint Optional[str]

Custom endpoint URL for cloud services.

connection_string Optional[str]

Full connection string for databases.

tenant_id Optional[str]

Azure tenant ID for authentication.

account_name Optional[str]

Azure storage account name.

client_id Optional[str]

Azure client ID for authentication.

client_secret Optional[str]

Azure client secret for authentication.

service_account_key Optional[str]

GCS service account key.

json_key Optional[str]

GCS JSON key.

bearer_token Optional[str]

HTTP bearer token for authentication.

header Optional[str]

HTTP header for authentication.

database Optional[str]

Database name for database secrets.

host Optional[str]

Database host for database secrets.

port Optional[int]

Database port for database secrets.

user Optional[str]

Database username (alternative to key_id for database types).

password Optional[str]

Database password (alternative to secret for database types).

options dict[str, Any]

Additional key-value options for the secret.

SemanticDefaultsConfig

Bases: BaseModel

Default configuration for a semantic model.

Provides default settings for query builders and dashboards, such as the primary time dimension and default measures.

Attributes:

Name Type Description
time_dimension Optional[str]

Default time dimension name.

primary_measure Optional[str]

Default primary measure name.

default_filters list[dict[str, Any]]

Optional list of default filters.

SemanticDimensionConfig

Bases: BaseModel

Definition of a semantic dimension.

A dimension represents a business attribute that maps to an expression over the base view of a semantic model.

Attributes:

Name Type Description
name str

Unique dimension name within the semantic model.

expression str

SQL expression referencing columns from the base view.

label Optional[str]

Human-readable display name.

description Optional[str]

Optional detailed description.

type Optional[str]

Optional data type hint (time, number, string, boolean, date).

time_grains list[str]

Optional list of time grains for time dimensions.

SemanticJoinConfig

Bases: BaseModel

Definition of a semantic join.

A join defines a relationship to another view for enriching the semantic model with additional data, typically dimension tables.

Attributes:

Name Type Description
to_view str

Name of an existing view in the views section to join to.

type str

Join type (inner, left, right, full).

on_condition str

SQL join condition expression.

SemanticMeasureConfig

Bases: BaseModel

Definition of a semantic measure.

A measure represents a business metric that typically involves aggregation or calculation over the base view of a semantic model.

Attributes:

Name Type Description
name str

Unique measure name within the semantic model.

expression str

SQL expression (often aggregated) over the base view.

label Optional[str]

Human-readable display name.

description Optional[str]

Optional detailed description.

type Optional[str]

Optional data type hint.

SemanticModelConfig

Bases: BaseModel

Definition of a semantic model.

A semantic model provides business-friendly metadata on top of an existing Duckalog view, defining dimensions and measures for analytics and BI use cases.

Attributes:

Name Type Description
name str

Unique semantic model name within the config.

base_view str

Name of an existing view in the views section.

dimensions list[SemanticDimensionConfig]

Optional list of dimension definitions.

measures list[SemanticMeasureConfig]

Optional list of measure definitions.

joins list[SemanticJoinConfig]

Optional list of join definitions to other views.

defaults Optional[SemanticDefaultsConfig]

Optional default configuration for query builders.

label Optional[str]

Human-readable display name.

description Optional[str]

Optional detailed description.

tags list[str]

Optional list of classification tags.

ViewConfig

Bases: BaseModel

Definition of a single catalog view.

A view can be defined in several ways: 1. Inline SQL: Using the sql field with raw SQL text 2. SQL File: Using sql_file to reference external SQL files 3. SQL Template: Using sql_template for parameterized SQL files 4. Data Source: Using source + required fields for direct data access 5. Source + SQL: Using source for data access plus sql for transformations

For data sources, the required fields depend on the source type: - Parquet/Delta: uri field is required - Iceberg: Either uri OR both catalog and table - DuckDB/SQLite/Postgres: Both database and table are required

When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.

Additional metadata fields such as description and tags do not affect SQL generation but are preserved for documentation and tooling.

Attributes:

Name Type Description
name str

Unique view name within the config.

schema str

Optional schema name for organizing views in DuckDB schemas.

sql Optional[str]

Raw SQL text defining the view body.

sql_file Optional[SQLFileReference]

Direct reference to a SQL file.

sql_template Optional[SQLFileReference]

Reference to a SQL template file with variable substitution.

source Optional[EnvSource]

Source type (e.g. "parquet", "iceberg", "duckdb").

uri Optional[str]

URI for file- or table-based sources (Parquet/Delta/Iceberg).

database Optional[str]

Attachment alias for attached-database sources.

table Optional[str]

Table name (optionally schema-qualified) for attached sources.

catalog Optional[str]

Iceberg catalog name for catalog-based Iceberg views.

options dict[str, Any]

Source-specific options passed to scan functions.

description Optional[str]

Optional human-readable description of the view.

tags list[str]

Optional list of tags for classification.

connect_to_catalog(config_path, database_path=None, read_only=False, force_rebuild=False)

Create a CatalogConnection instance that manages DuckDB connections with state restoration.

This is the primary entry point for working with Duckalog catalogs in Python. It returns a :class:CatalogConnection instance which lazily establishes a DuckDB connection and automatically restores session state (pragmas, attachments, etc.) and performs incremental view updates.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required
database_path str | None

Optional database path override.

None
read_only bool

Open the connection in read-only mode for safety.

False
force_rebuild bool

If True, all views will be recreated even if they exist.

False

Returns:

Name Type Description
A CatalogConnection

class:CatalogConnection instance.

Example

Using as a context manager::

from duckalog import connect_to_catalog
with connect_to_catalog("catalog.yaml") as catalog:
    conn = catalog.get_connection()
    result = conn.execute("SELECT * FROM my_view").fetchall()

Using for persistent state management::

catalog = connect_to_catalog("catalog.yaml")
conn1 = catalog.get_connection()
# ... later ...
conn2 = catalog.get_connection()  # Returns the same connection
catalog.close()

connect_to_catalog_cm(config_path, database_path=None, read_only=False, force_rebuild=False)

Context manager that yields an active, state-restored DuckDB connection.

This provides the same state restoration and incremental update benefits as :class:CatalogConnection, but yields the raw DuckDB connection object for convenience in simple scripts.

Usage::

from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
    data = conn.execute("SELECT * FROM users").fetchall()
    print(f"Found {len(data)} records")
# Connection automatically closed here

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required
database_path str | None

Optional database path override.

None
read_only bool

Open the connection in read-only mode for safety.

False
force_rebuild bool

If True, all views will be recreated.

False

Yields:

Type Description
Generator[DuckDBPyConnection]

An active DuckDB connection with catalog state restored.

create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')

Generate a basic, valid Duckalog configuration template.

This function creates a configuration template with sensible defaults and educational example content that demonstrates key Duckalog features.

Parameters:

Name Type Description Default
format ConfigFormat

Output format for the configuration ('yaml' or 'json'). Defaults to 'yaml'.

'yaml'
output_path str | None

Optional path to write the configuration file. If provided, the template is written to this path and the content is also returned as a string.

None
database_name str

Name for the DuckDB database file.

'analytics_catalog.duckdb'
project_name str

Name used in comments to personalize the template.

'my_analytics_project'

Returns:

Type Description
str

The generated configuration as a string.

Raises:

Type Description
ValueError

If format is not 'yaml' or 'json'.

ConfigError

If the generated template fails validation.

OSError

If writing to output_path fails.

Example

Generate a YAML template::

template = create_config_template(format='yaml')
print(template)

Generate and save a JSON template::

template = create_config_template(
    format='json',
    output_path='my_config.json'
)

generate_all_views_sql(config, include_secrets=False)

Generate SQL for all views in a configuration.

The output includes a descriptive header with the config version followed by a CREATE OR REPLACE VIEW statement for each view in the order they appear in the configuration.

Parameters:

Name Type Description Default
config Config

The validated :class:Config instance to render.

required
include_secrets bool

Whether to include CREATE SECRET statements for secrets.

False

Returns:

Type Description
str

A multi-statement SQL script suitable for use as a catalog definition.

generate_secret_sql(secret)

Generate CREATE SECRET statement for a DuckDB secret.

Parameters:

Name Type Description Default
secret SecretConfig

Secret configuration object.

required

Returns:

Type Description
str

SQL CREATE SECRET statement.

generate_sql(config_path)

Generate a full SQL script from a config file.

This is a convenience wrapper around :func:load_config and :func:generate_all_views_sql that does not connect to DuckDB.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required

Returns:

Type Description
str

A multi-statement SQL script containing CREATE OR REPLACE VIEW

str

statements for all configured views.

Raises:

Type Description
ConfigError

If the configuration file is invalid.

Example

from duckalog import generate_sql sql = generate_sql("catalog.yaml") print("CREATE VIEW" in sql) True

generate_view_sql(view)

Generate a CREATE OR REPLACE VIEW statement for a single view.

Parameters:

Name Type Description Default
view ViewConfig

The :class:ViewConfig to generate SQL for.

required

Returns:

Type Description
str

A single SQL statement that creates or replaces the view.

load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None, load_dotenv=True)

Load, interpolate, and validate a Duckalog configuration file.

quote_ident(value)

Quote a SQL identifier using double quotes.

This helper wraps a string in double quotes and escapes any embedded double quotes according to SQL rules.

Parameters:

Name Type Description Default
value str

Identifier to quote (for example, a view or column name).

required

Returns:

Type Description
str

The identifier wrapped in double quotes.

Example

quote_ident("events") '"events"'

quote_literal(value)

Quote a SQL string literal using single quotes.

This helper wraps a string in single quotes and escapes any embedded single quotes according to SQL rules.

Parameters:

Name Type Description Default
value str

String literal to quote (for example, a file path, secret, or connection string).

required

Returns:

Type Description
str

The string wrapped in single quotes with proper escaping.

Example

quote_literal("path/to/file.parquet") "'path/to/file.parquet'" quote_literal("user's data") "'user''s data'"

render_options(options)

Render a mapping of options into scan-function arguments.

The resulting string is suitable for appending to a *_scan function call. Keys are sorted alphabetically to keep output deterministic.

Parameters:

Name Type Description Default
options dict[str, Any]

Mapping of option name to value (str, bool, int, or float).

required

Returns:

Type Description
str

A string that starts with , when options are present (for example,

str

", hive_partitioning=TRUE") or an empty string when no options

str

are provided.

Raises:

Type Description
TypeError

If a value has a type that cannot be rendered safely.

validate_config(config_path)

Validate a configuration file without touching DuckDB.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required

Raises:

Type Description
ConfigError

If the configuration file is missing, malformed, or does not satisfy the schema and interpolation rules.

Example

from duckalog import validate_config validate_config("catalog.yaml") # raises on invalid config

validate_generated_config(content, format='yaml')

Validate that generated configuration content can be loaded successfully.

Parameters:

Name Type Description Default
content str

Configuration content as string.

required
format ConfigFormat

Format of the content ('yaml' or 'json').

'yaml'

Raises:

Type Description
ConfigError

If the configuration cannot be loaded or is invalid.

New Connection Management Architecture

CatalogConnection Class

The new CatalogConnection class is the primary interface for working with Duckalog catalogs. It provides intelligent connection management and session state restoration.

from duckalog import connect_to_catalog

# Create connection
conn = connect_to_catalog("catalog.yaml")

# The connection manages:
# - Automatic catalog building
# - Connection pooling and reuse
# - Session state restoration
# - Incremental view updates

Key Methods

# Get raw DuckDB connection when needed
duckdb_conn = conn.get_connection()

# Clean up the connection and resources
conn.close()

Context Manager Usage

# Recommended usage pattern
with connect_to_catalog("catalog.yaml") as conn:
    # Connection automatically managed
    # Pragmas, settings, and attachments restored
    # Cleanup performed on exit

    result = conn.get_connection().execute("SELECT COUNT(*) FROM users").fetchone()
    print(f"Total users: {result[0]}")

# Connection automatically cleaned up

Session State Features

with connect_to_catalog("catalog.yaml") as conn:
    # Session state is automatically restored:
    # - DuckDB pragmas
    # - Session settings  
    # - Database attachments
    # - Persistent secrets
    # - Custom functions

    # All views are available
    for table in conn.get_connection().execute("SHOW TABLES").fetchall():
        print(f"Available view: {table[0]}")

New Architecture Patterns

Configuration Loading with Dependency Injection

The new load_config() function from duckalog.config.api provides enhanced configuration loading with dependency injection support:

from duckalog.config.api import load_config

# Basic usage (same as before)
config = load_config("catalog.yaml")

# Advanced usage with custom dependencies
config = load_config(
    "catalog.yaml",
    sql_file_loader=custom_loader,
    filesystem=custom_fs,
    load_dotenv=False
)

Request-Scoped Caching Context

The new architecture provides request-scoped caching for performance optimization:

from duckalog.config.resolution.imports import request_cache_scope

# Use request context for caching
with request_cache_scope() as context:
    # All config loading within this scope shares caches
    config1 = load_config("config1.yaml")
    config2 = load_config("config2.yaml")  # Reuses resolved imports

Import Patterns

# New modular imports (recommended)
from duckalog.config.api import load_config
from duckalog.config.resolution.imports import request_cache_scope
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.resolution.env import EnvProcessor, DefaultEnvProcessor

# High-level convenience functions (unchanged)
from duckalog import (
    generate_sql,
    validate_config,
    connect_to_catalog,
    connect_to_catalog_cm,
    connect_to_catalog,
)

# Configuration models
from duckalog.config import (
    Config,
    DuckDBConfig,
    AttachmentsConfig,
    ViewConfig,
    SecretConfig,
)

Backward Compatibility Imports

All existing import patterns continue to work:

# Legacy imports (still supported)
from duckalog import load_config  # Re-exports from config.api
from duckalog.config import load_config  # Re-exports from config.api

# Direct module imports
from duckalog.config.api import load_config  # New location

Advanced Usage Examples

Custom Dependency Injection

from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.resolution.imports import DefaultImportResolver

# Create custom resolver
class CustomImportResolver(ImportResolver):
    def __init__(self, custom_config):
        self.custom_config = custom_config

    def resolve(self, config_data: dict, context: ImportContext) -> dict:
        # Custom import resolution logic
        return resolved_config

# Use with load_config
with request_cache_scope() as request_context:
    resolver = CustomImportResolver(custom_config)
    # Integrate with the loading process

Connection Management Patterns

from duckalog import connect_to_catalog

# Pattern 1: Single connection with multiple queries
conn = connect_to_catalog("catalog.yaml")
users = conn.get_connection().execute("SELECT * FROM users").fetchall()
orders = conn.get_connection().execute("SELECT * FROM orders").fetchall()

# Pattern 2: Context manager for automatic cleanup
with connect_to_catalog("catalog.yaml") as conn:
    # Process data with automatic connection management
    results = conn.get_connection().execute("""
        SELECT u.username, COUNT(o.order_id) as order_count
        FROM users u
        LEFT JOIN orders o ON u.user_id = o.user_id
        GROUP BY u.username
    """).fetchall()

# Pattern 3: Multiple connections with connection pooling
with connect_to_catalog("analytics.yaml") as analytics_conn:
    with connect_to_catalog("warehouse.yaml") as warehouse_conn:
        # Both connections managed independently
        analytics_data = analytics_conn.get_connection().execute("SELECT * FROM metrics").fetchall()
        warehouse_data = warehouse_conn.get_connection().execute("SELECT * FROM inventory").fetchall()

Performance Optimization

# Incremental builds - only missing views created
conn = connect_to_catalog("catalog.yaml")  # Fast for subsequent runs

# Force rebuild when needed
conn = connect_to_catalog("catalog.yaml", force_rebuild=True)

# Connection reuse across operations
with connect_to_catalog("catalog.yaml") as conn:
    # Reuse same connection for multiple queries
    results = conn.get_connection().execute("COMPLEX_QUERY").fetchall()
    # Connection reused, no reconnection overhead

Custom Filesystem Implementations

import fsspec
from duckalog import load_config

# Create custom filesystem
custom_fs = fsspec.filesystem(
    "myprotocol",
    key="access-key",
    secret="secret-key"
)

# Load config with custom filesystem
config = load_config(
    "myprotocol://bucket/config.yaml",
    filesystem=custom_fs
)

# Use with new connection management
from duckalog import connect_to_catalog
conn = connect_to_catalog(
    "myprotocol://bucket/config.yaml",
    filesystem=custom_fs
)

Performance Optimization with Caching

from duckalog.config.resolution.imports import request_cache_scope
from duckalog.config.api import load_config

# Batch load configurations efficiently
configs = []
with request_cache_scope() as context:
    for config_file in config_files:
        config = load_config(config_file)
        configs.append(config)
    # All configs share resolved imports and environment variables

Custom Environment Processing

from duckalog.config.resolution.env import EnvProcessor, DefaultEnvProcessor

class CustomEnvProcessor(EnvProcessor):
    def process(self, config_data: dict, load_dotenv: bool = True) -> dict:
        # Custom environment variable processing
        # e.g., add custom prefixes, special handling
        return processed_config

# Use with custom environment processor
env_processor = CustomEnvProcessor()

Migration from Old Patterns

From Legacy load_config

# Old pattern (still works)
from duckalog import load_config
config = load_config("config.yaml")

# New pattern with same interface
from duckalog.config.api import load_config
config = load_config("config.yaml")

# New pattern with additional options
config = load_config(
    "config.yaml",
    resolve_paths=True,
    filesystem=custom_fs,
    load_dotenv=False
)

Custom Implementation Patterns

# For extending the configuration system
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.api import _load_config_from_local_file

class MyCustomResolver(ImportResolver):
    def resolve(self, config_data: dict, context: ImportContext) -> dict:
        # Your custom logic here
        return config_data

# Use in custom loading scenarios
def my_custom_loader(path: str, **kwargs):
    return _load_config_from_local_file(
        path=path,
        sql_file_loader=kwargs.get("sql_file_loader"),
        resolve_paths=kwargs.get("resolve_paths", True),
        filesystem=kwargs.get("filesystem"),
        load_dotenv=kwargs.get("load_dotenv", True),
    )

Configuration Models

Core Models

The configuration models are now organized in the duckalog.config package:

duckalog.config.models

Configuration models and schema definitions for Duckalog catalogs.

This module contains all Pydantic models used for configuration validation and typing. These models form the foundation of the configuration system and must not import from other config modules to avoid circular dependencies.

AttachmentsConfig

Bases: BaseModel

Collection of attachment configurations.

Attributes:

Name Type Description
duckdb list[DuckDBAttachment]

DuckDB attachment entries.

sqlite list[SQLiteAttachment]

SQLite attachment entries.

postgres list[PostgresAttachment]

Postgres attachment entries.

duckalog list[DuckalogAttachment]

Duckalog config attachment entries.

Config

Bases: BaseModel

Top-level Duckalog configuration.

Attributes:

Name Type Description
version int

Positive integer describing the config schema version.

duckdb DuckDBConfig

DuckDB session and connection settings.

views list[ViewConfig]

List of view definitions to create in the catalog.

attachments AttachmentsConfig

Optional attachments to external databases.

iceberg_catalogs list[IcebergCatalogConfig]

Optional Iceberg catalog definitions.

semantic_models list[SemanticModelConfig]

Optional semantic model definitions for business metadata.

imports Union[list[Union[str, ImportEntry]], SelectiveImports]

Optional list of additional config files to import and merge. Can be a simple list of paths (backward compatible) or a SelectiveImports object for advanced options like section-specific imports, override behavior, and glob patterns.

env_files list[str]

Optional list of custom .env file patterns to load. Supports patterns like ['.env', '.env.local', '.env.production']. Files are loaded in order with later files overriding earlier ones. Defaults to ['.env'] for backward compatibility.

loader_settings LoaderSettings

Optional settings for the configuration loader.

DuckDBAttachment

Bases: BaseModel

Configuration for attaching another DuckDB database.

Attributes:

Name Type Description
alias str

Alias under which the database will be attached.

path str

Filesystem path to the DuckDB database file.

read_only bool

Whether the attachment should be opened in read-only mode. Defaults to True for safety.

DuckDBConfig

Bases: BaseModel

DuckDB connection and session settings.

Attributes:

Name Type Description
database str

Path to the DuckDB database file. Defaults to ":memory:".

install_extensions list[str]

Names of extensions to install before use.

load_extensions list[str]

Names of extensions to load in the session.

pragmas list[str]

SQL statements (typically SET pragmas) executed after connecting and loading extensions.

settings Optional[Union[str, list[str]]]

DuckDB SET statements executed after pragmas. Can be a single string or list of strings.

secrets list[SecretConfig]

List of secret definitions for external services and databases.

DuckalogAttachment

Bases: BaseModel

Configuration for attaching another Duckalog catalog config.

Attributes:

Name Type Description
alias str

Alias under which the child catalog will be attached.

config_path str

Path to the child Duckalog config file.

database Optional[str]

Optional override for the child's database file path.

read_only bool

Whether the attachment should be opened in read-only mode. Defaults to True for safety.

IcebergCatalogConfig

Bases: BaseModel

Configuration for an Iceberg catalog.

Attributes:

Name Type Description
name str

Catalog name referenced by Iceberg views.

catalog_type str

Backend type (for example, rest, hive, glue).

uri Optional[str]

Optional URI used by certain catalog types.

warehouse Optional[str]

Optional warehouse location for catalog data.

options dict[str, Any]

Additional catalog-specific options.

ImportEntry

Bases: BaseModel

A single import entry with optional override behavior.

Attributes:

Name Type Description
path str

Path to the configuration file to import.

override bool

Whether this import can override values from earlier imports or the main config. If False, only fills in missing fields without overwriting existing values. Defaults to True.

LoaderSettings

Bases: BaseModel

Settings for the configuration loader.

Attributes:

Name Type Description
concurrency_enabled bool

Whether to enable parallel I/O for imports and SQL files.

max_threads Optional[int]

Maximum number of threads for parallel I/O. Defaults to None (uses default ThreadPoolExecutor behavior).

PostgresAttachment

Bases: BaseModel

Configuration for attaching a Postgres database.

Attributes:

Name Type Description
alias str

Alias used inside DuckDB to reference the Postgres database.

host str

Hostname or IP address of the Postgres server.

port int

TCP port of the Postgres server.

database str

Database name to connect to.

user str

Username for authentication.

password str

Password for authentication.

sslmode Optional[str]

Optional SSL mode (for example, require).

options dict[str, Any]

Extra key/value options passed to the attachment clause.

SQLFileReference

Bases: BaseModel

Reference to SQL content in an external file.

Attributes:

Name Type Description
path str

Path to the SQL file (relative or absolute).

variables Optional[dict[str, Any]]

Dictionary of variables for template substitution.

as_template bool

Whether to process the file content as a template.

SQLiteAttachment

Bases: BaseModel

Configuration for attaching a SQLite database.

Attributes:

Name Type Description
alias str

Alias under which the SQLite database will be attached.

path str

Filesystem path to the SQLite .db file.

SecretConfig

Bases: BaseModel

Configuration for a DuckDB secret.

Attributes:

Name Type Description
type SecretType

Secret type (s3, azure, gcs, http, postgres, mysql).

name Optional[str]

Optional name for the secret (defaults to type if not provided).

provider SecretProvider

Secret provider (config or credential_chain).

persistent bool

Whether to create a persistent secret. Defaults to False.

scope Optional[str]

Optional scope prefix for the secret.

key_id Optional[str]

Access key ID or username for authentication.

secret Optional[str]

Secret key or password for authentication.

region Optional[str]

Geographic region for cloud services.

endpoint Optional[str]

Custom endpoint URL for cloud services.

connection_string Optional[str]

Full connection string for databases.

tenant_id Optional[str]

Azure tenant ID for authentication.

account_name Optional[str]

Azure storage account name.

client_id Optional[str]

Azure client ID for authentication.

client_secret Optional[str]

Azure client secret for authentication.

service_account_key Optional[str]

GCS service account key.

json_key Optional[str]

GCS JSON key.

bearer_token Optional[str]

HTTP bearer token for authentication.

header Optional[str]

HTTP header for authentication.

database Optional[str]

Database name for database secrets.

host Optional[str]

Database host for database secrets.

port Optional[int]

Database port for database secrets.

user Optional[str]

Database username (alternative to key_id for database types).

password Optional[str]

Database password (alternative to secret for database types).

options dict[str, Any]

Additional key-value options for the secret.

SelectiveImports

Bases: BaseModel

Section-specific imports for targeted configuration merging.

Each field represents a section of the Config that can have targeted imports. Values can be either simple paths (strings) or ImportEntry objects with options.

SemanticDefaultsConfig

Bases: BaseModel

Default configuration for a semantic model.

Provides default settings for query builders and dashboards, such as the primary time dimension and default measures.

Attributes:

Name Type Description
time_dimension Optional[str]

Default time dimension name.

primary_measure Optional[str]

Default primary measure name.

default_filters list[dict[str, Any]]

Optional list of default filters.

SemanticDimensionConfig

Bases: BaseModel

Definition of a semantic dimension.

A dimension represents a business attribute that maps to an expression over the base view of a semantic model.

Attributes:

Name Type Description
name str

Unique dimension name within the semantic model.

expression str

SQL expression referencing columns from the base view.

label Optional[str]

Human-readable display name.

description Optional[str]

Optional detailed description.

type Optional[str]

Optional data type hint (time, number, string, boolean, date).

time_grains list[str]

Optional list of time grains for time dimensions.

SemanticJoinConfig

Bases: BaseModel

Definition of a semantic join.

A join defines a relationship to another view for enriching the semantic model with additional data, typically dimension tables.

Attributes:

Name Type Description
to_view str

Name of an existing view in the views section to join to.

type str

Join type (inner, left, right, full).

on_condition str

SQL join condition expression.

SemanticMeasureConfig

Bases: BaseModel

Definition of a semantic measure.

A measure represents a business metric that typically involves aggregation or calculation over the base view of a semantic model.

Attributes:

Name Type Description
name str

Unique measure name within the semantic model.

expression str

SQL expression (often aggregated) over the base view.

label Optional[str]

Human-readable display name.

description Optional[str]

Optional detailed description.

type Optional[str]

Optional data type hint.

SemanticModelConfig

Bases: BaseModel

Definition of a semantic model.

A semantic model provides business-friendly metadata on top of an existing Duckalog view, defining dimensions and measures for analytics and BI use cases.

Attributes:

Name Type Description
name str

Unique semantic model name within the config.

base_view str

Name of an existing view in the views section.

dimensions list[SemanticDimensionConfig]

Optional list of dimension definitions.

measures list[SemanticMeasureConfig]

Optional list of measure definitions.

joins list[SemanticJoinConfig]

Optional list of join definitions to other views.

defaults Optional[SemanticDefaultsConfig]

Optional default configuration for query builders.

label Optional[str]

Human-readable display name.

description Optional[str]

Optional detailed description.

tags list[str]

Optional list of classification tags.

ViewConfig

Bases: BaseModel

Definition of a single catalog view.

A view can be defined in several ways: 1. Inline SQL: Using the sql field with raw SQL text 2. SQL File: Using sql_file to reference external SQL files 3. SQL Template: Using sql_template for parameterized SQL files 4. Data Source: Using source + required fields for direct data access 5. Source + SQL: Using source for data access plus sql for transformations

For data sources, the required fields depend on the source type: - Parquet/Delta: uri field is required - Iceberg: Either uri OR both catalog and table - DuckDB/SQLite/Postgres: Both database and table are required

When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.

Additional metadata fields such as description and tags do not affect SQL generation but are preserved for documentation and tooling.

Attributes:

Name Type Description
name str

Unique view name within the config.

schema str

Optional schema name for organizing views in DuckDB schemas.

sql Optional[str]

Raw SQL text defining the view body.

sql_file Optional[SQLFileReference]

Direct reference to a SQL file.

sql_template Optional[SQLFileReference]

Reference to a SQL template file with variable substitution.

source Optional[EnvSource]

Source type (e.g. "parquet", "iceberg", "duckdb").

uri Optional[str]

URI for file- or table-based sources (Parquet/Delta/Iceberg).

database Optional[str]

Attachment alias for attached-database sources.

table Optional[str]

Table name (optionally schema-qualified) for attached sources.

catalog Optional[str]

Iceberg catalog name for catalog-based Iceberg views.

options dict[str, Any]

Source-specific options passed to scan functions.

description Optional[str]

Optional human-readable description of the view.

tags list[str]

Optional list of tags for classification.

Import Resolution Models

duckalog.config.resolution.base

EnvProcessor

Bases: Protocol

Interface for environment variable processors.

process(config_data, load_dotenv=True)

Process environment variables and .env files.

ImportContext dataclass

Tracks import state during loading.

ImportResolver

Bases: Protocol

Interface for configuration import resolvers.

resolve(config_data, context)

Resolve imports within a configuration dictionary.

Environment Processing Models

duckalog.config.resolution.env

Environment processing helpers with request-scoped caching.

This module extracts .env discovery/merging from the legacy loader and exposes it behind the EnvProcessor protocol for dependency injection.

DefaultEnvProcessor

Bases: EnvProcessor

EnvProcessor that mirrors legacy loader behavior.

EnvCache dataclass

Cache for loaded .env files to avoid duplicate filesystem access.

env_cache_scope(cache=None)

Context manager to provide request-scoped env cache.

Error Handling

The new architecture provides enhanced error handling with specific exception types:

Configuration Errors

duckalog.errors

Unified exception hierarchy for Duckalog.

This module provides a consistent exception hierarchy for all Duckalog errors, ensuring that all library exceptions derive from a common base class and can be caught uniformly by users.

CircularImportError

Bases: ImportError

Circular import detected.

ConfigError

Bases: DuckalogError

Configuration-related errors.

This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.

Typical error conditions include:

  • The config file does not exist or cannot be read.
  • The file is not valid YAML/JSON.
  • Required fields are missing or invalid.
  • An environment variable placeholder cannot be resolved.

DuckalogError

Bases: Exception

Base exception for all Duckalog errors.

This is the root of the exception hierarchy. All Duckalog-specific exceptions should inherit from this class to provide a unified error handling interface for users of the library.

Users can catch this class to handle any Duckalog-specific error, or catch more specific subclasses for targeted error handling.

__init__(message, cause=None)

Initialize DuckalogError with optional cause chaining.

Parameters:

Name Type Description Default
message str

Human-readable error message.

required
cause Exception | None

Optional underlying exception that caused this error.

None

__str__()

Return the error message.

DuplicateNameError

Bases: ImportError

Duplicate name across imported configs.

EngineError

Bases: DuckalogError

Engine-level error raised during catalog builds.

This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.

ImportError

Bases: ConfigError

Base class for import-related errors.

ImportFileNotFoundError

Bases: ImportError

Imported file not found.

ImportValidationError

Bases: ImportError

Imported file validation failed.

PathResolutionError

Bases: ConfigError

Raised when path resolution fails due to security or access issues.

This exception is a subclass of ConfigError since path resolution issues are typically configuration-related problems.

RemoteConfigError

Bases: ConfigError

Error raised when remote configuration loading fails.

This exception wraps lower-level errors from remote storage systems when attempting to load configuration files from locations like S3, GCS, Azure Blob Storage, or HTTP endpoints.

SQLFileEncodingError

Bases: SQLFileError

Raised when a SQL file has invalid encoding.

SQLFileError

Bases: ConfigError

Base exception for SQL file-related errors.

This exception is raised when SQL file operations fail, such as when a referenced SQL file cannot be found, read, or processed.

SQLFileNotFoundError

Bases: SQLFileError

Raised when a referenced SQL file does not exist.

SQLFilePermissionError

Bases: SQLFileError

Raised when a SQL file cannot be read due to permissions.

SQLFileSizeError

Bases: SQLFileError

Raised when a SQL file exceeds size limits.

SQLTemplateError

Bases: SQLFileError

Raised when template processing fails.

Import Resolution Errors

duckalog.config.resolution.imports

Import resolution extracted from the legacy loader with DI hooks.

RequestContext dataclass

Context for request-scoped caching and state management.

Performance Considerations

Caching Benefits

  • Request-Scoped Caching: Shared resolution across multiple config loads
  • Import Resolution Caching: Avoid re-processing the same imports
  • Environment Variable Caching: Reuse resolved environment values
  • Path Resolution Caching: Cache normalized paths for reuse

Memory Management

  • Automatic Cache Cleanup: Request scopes automatically clear caches
  • Import Chain Tracking: Prevent circular dependencies with minimal overhead
  • Efficient Context Management: Lightweight context objects for tracking

Security Features

Enhanced Path Security

The new architecture maintains and enhances the existing security features:

  • Relative Path Resolution: All paths resolved relative to config location
  • Security Boundary Validation: Prevent directory traversal attacks
  • Cross-Platform Support: Consistent behavior across operating systems
  • Remote URI Support: Secure handling of remote configuration sources

Dependency Injection Security

  • Controlled Interface: Safe injection points for custom implementations
  • Validation Integration: Custom resolvers participate in validation
  • Context Isolation: Each request maintains isolated state
  • Audit Trail: Track configuration resolution through contexts

Testing Support

Mockable Interfaces

All new interfaces are designed for easy testing:

from unittest.mock import Mock
from duckalog.config.resolution.base import ImportResolver

# Create mock resolver for testing
mock_resolver = Mock(spec=ImportResolver)
mock_resolver.resolve.return_value = test_config

# Test with mocked dependencies
with request_cache_scope() as context:
    # Use mock resolver in tests
    pass

Test Utilities

# Test with temporary configurations
from tempfile import NamedTemporaryFile
from duckalog.config.api import load_config

with NamedTemporaryFile(mode='w', suffix='.yaml') as f:
    f.write(test_config_content)
    f.flush()
    config = load_config(f.name)

Extensibility Points

Custom Import Resolvers

Extend the import resolution system:

class DatabaseImportResolver(ImportResolver):
    """Load imports from database instead of files."""

    def resolve(self, config_data: dict, context: ImportContext) -> dict:
        # Implement database-based import loading
        pass

Custom Environment Processors

Add custom environment handling:

class VaultEnvProcessor(EnvProcessor):
    """Load secrets from vault systems."""

    def process(self, config_data: dict, load_dotenv: bool = True) -> dict:
        # Implement vault integration
        pass

Custom Filesystem Integration

Add support for new protocols:

class CustomFilesystem:
    """Custom filesystem implementation."""

    def open(self, path, mode='r'):
        # Implement custom file access
        pass