API Reference¶

This reference provides comprehensive documentation for the Duckalog Python API, including the new modular architecture patterns, dependency injection interfaces, and advanced usage patterns.

Core API Overview¶

The main API provides both simple convenience functions and advanced patterns for custom implementations. With the new config-driven connection management system, the primary API now returns a CatalogConnection object for intelligent connection management.

Core Functions¶

New: connect_to_catalog()¶

The primary function for connecting to catalogs with intelligent connection management:

from duckalog import connect_to_catalog

# Basic usage - returns CatalogConnection object
conn = connect_to_catalog("catalog.yaml")

# Use as context manager for automatic cleanup
with connect_to_catalog("catalog.yaml") as conn:
    # conn is a CatalogConnection object
    result = conn.get_connection().execute("SELECT COUNT(*) FROM users").fetchall()

# Force rebuild of all views
conn = connect_to_catalog("catalog.yaml", force_rebuild=True)

# Custom database path
conn = connect_to_catalog("catalog.yaml", database_path="custom.duckdb")

Key Features of CatalogConnection¶

Automatic Connection Management: Handles connection pooling and reuse
Session State Restoration: Automatically restores pragmas, settings, and attachments
Incremental Updates: Only creates missing views for faster builds
Lazy Connection: Database connection established only when needed
Context Manager Support: Automatic cleanup when used with with statement

Legacy: connect_to_catalog_cm()¶

For backward compatibility, the context manager function still works:

from duckalog import connect_to_catalog_cm

# Old style (still supported for compatibility)
with connect_to_catalog_cm("catalog.yaml") as conn:
    # conn is raw DuckDB connection (legacy behavior)
    result = conn.execute("SELECT * FROM users").fetchall()

Legacy Functions¶

All existing functions continue to work but are deprecated in favor of the new connection management approach:

from duckalog import (
    generate_sql,           # Generate SQL from config
    validate_config,        # Validate configuration
    connect_to_catalog,     # Build and connect in one operation
)

`duckalog` ¶

Duckalog public API.

`AttachmentsConfig` ¶

Bases: BaseModel

Collection of attachment configurations.

Attributes:

Name	Type	Description
`duckdb`	`list[DuckDBAttachment]`	DuckDB attachment entries.
`sqlite`	`list[SQLiteAttachment]`	SQLite attachment entries.
`postgres`	`list[PostgresAttachment]`	Postgres attachment entries.
`duckalog`	`list[DuckalogAttachment]`	Duckalog config attachment entries.

`CatalogConnection` ¶

Manages DuckDB connections with session state restoration and lazy initialization.

This class ensures that every connection obtained through it has the correct catalog state (pragmas, settings, extensions, attachments, and secrets) applied, regardless of whether it's a new or existing connection.

Attributes:

Name	Type	Description
`config_path`		Path to the Duckalog configuration file.
`database_path`		Optional override for the DuckDB database path.
`read_only`		Whether to open the connection in read-only mode.
`force_rebuild`		If True, all views will be recreated even if they exist.
`config`	`Optional[Config]`	The loaded and validated configuration.
`conn`	`Optional[DuckDBPyConnection]`	The active DuckDB connection, or None if not initialized.

`enter()` ¶

Context manager support: returns the CatalogConnection instance.

`exit(exc_type, exc_val, exc_tb)` ¶

Context manager support: ensures connection cleanup.

`init(config_path, database_path=None, read_only=False, force_rebuild=False, filesystem=None, load_dotenv=True)` ¶

Initialize the catalog connection manager.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the Duckalog configuration file.	required
`database_path`	`Optional[str]`	Optional override for the DuckDB database path.	`None`
`read_only`	`bool`	Whether to open the connection in read-only mode.	`False`
`force_rebuild`	`bool`	If True, all views will be recreated even if they exist.	`False`
`filesystem`	`Optional[Any]`	Optional fsspec filesystem object for remote file access.	`None`
`load_dotenv`	`bool`	If True, automatically load and process .env files.	`True`

`close()` ¶

Clean up the DuckDB connection and resources.

`get_connection()` ¶

Get the DuckDB connection, initializing it if necessary.

This method establishes the connection lazily on the first call, restores the session state, and performs incremental updates. Subsequent calls return the same connection instance.

Returns:

Type	Description
`DuckDBPyConnection`	An active DuckDB connection with catalog state restored.

Raises:

Type	Description
`ConfigError`	If loading the configuration fails.
`FileNotFoundError`	If the database path is invalid or missing (for existing catalogs).
`EngineError`	If connecting to DuckDB or restoring state fails.

`Config` ¶

Bases: BaseModel

Top-level Duckalog configuration.

Attributes:

Name	Type	Description
`version`	`int`	Positive integer describing the config schema version.
`duckdb`	`DuckDBConfig`	DuckDB session and connection settings.
`views`	`list[ViewConfig]`	List of view definitions to create in the catalog.
`attachments`	`AttachmentsConfig`	Optional attachments to external databases.
`iceberg_catalogs`	`list[IcebergCatalogConfig]`	Optional Iceberg catalog definitions.
`semantic_models`	`list[SemanticModelConfig]`	Optional semantic model definitions for business metadata.
`imports`	`Union[list[Union[str, ImportEntry]], SelectiveImports]`	Optional list of additional config files to import and merge. Can be a simple list of paths (backward compatible) or a SelectiveImports object for advanced options like section-specific imports, override behavior, and glob patterns.
`env_files`	`list[str]`	Optional list of custom .env file patterns to load. Supports patterns like ['.env', '.env.local', '.env.production']. Files are loaded in order with later files overriding earlier ones. Defaults to ['.env'] for backward compatibility.
`loader_settings`	`LoaderSettings`	Optional settings for the configuration loader.

`ConfigError` ¶

Bases: DuckalogError

Configuration-related errors.

This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.

Typical error conditions include:

The config file does not exist or cannot be read.
The file is not valid YAML/JSON.
Required fields are missing or invalid.
An environment variable placeholder cannot be resolved.

`DuckDBAttachment` ¶

Bases: BaseModel

Configuration for attaching another DuckDB database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias under which the database will be attached.
`path`	`str`	Filesystem path to the DuckDB database file.
`read_only`	`bool`	Whether the attachment should be opened in read-only mode. Defaults to `True` for safety.

`DuckDBConfig` ¶

Bases: BaseModel

DuckDB connection and session settings.

Attributes:

Name	Type	Description
`database`	`str`	Path to the DuckDB database file. Defaults to `":memory:"`.
`install_extensions`	`list[str]`	Names of extensions to install before use.
`load_extensions`	`list[str]`	Names of extensions to load in the session.
`pragmas`	`list[str]`	SQL statements (typically `SET` pragmas) executed after connecting and loading extensions.
`settings`	`Optional[Union[str, list[str]]]`	DuckDB SET statements executed after pragmas. Can be a single string or list of strings.
`secrets`	`list[SecretConfig]`	List of secret definitions for external services and databases.

`EngineError` ¶

Bases: DuckalogError

Engine-level error raised during catalog builds.

This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.

`IcebergCatalogConfig` ¶

Bases: BaseModel

Configuration for an Iceberg catalog.

Attributes:

Name	Type	Description
`name`	`str`	Catalog name referenced by Iceberg views.
`catalog_type`	`str`	Backend type (for example, `rest`, `hive`, `glue`).
`uri`	`Optional[str]`	Optional URI used by certain catalog types.
`warehouse`	`Optional[str]`	Optional warehouse location for catalog data.
`options`	`dict[str, Any]`	Additional catalog-specific options.

`PostgresAttachment` ¶

Bases: BaseModel

Configuration for attaching a Postgres database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias used inside DuckDB to reference the Postgres database.
`host`	`str`	Hostname or IP address of the Postgres server.
`port`	`int`	TCP port of the Postgres server.
`database`	`str`	Database name to connect to.
`user`	`str`	Username for authentication.
`password`	`str`	Password for authentication.
`sslmode`	`Optional[str]`	Optional SSL mode (for example, `require`).
`options`	`dict[str, Any]`	Extra key/value options passed to the attachment clause.

`SQLFileEncodingError` ¶

Bases: SQLFileError

Raised when a SQL file has invalid encoding.

`SQLFileError` ¶

Bases: ConfigError

Base exception for SQL file-related errors.

This exception is raised when SQL file operations fail, such as when a referenced SQL file cannot be found, read, or processed.

`SQLFileLoader` ¶

Loads SQL content from external files and processes templates.

`init()` ¶

Initialize the SQL file loader.

`load_sql_file(file_path, config_file_path, variables=None, as_template=False, filesystem=None)` ¶

Load SQL content from a file and optionally process as a template.

Parameters:

Name	Type	Description	Default
`file_path`	`str`	Path to the SQL file (can be relative or absolute)	required
`config_file_path`	`str`	Path to the config file for resolving relative paths	required
`variables`	`Optional[dict[str, Any]]`	Dictionary of variables for template substitution	`None`
`as_template`	`bool`	Whether to process the file content as a template	`False`
`filesystem`	`Optional[Any]`	Optional filesystem object for file I/O operations	`None`

Returns:

Type	Description
`str`	The loaded SQL content (processed if template, raw otherwise)

Raises:

Type	Description
`SQLFileError`	If the file cannot be loaded or processed

`SQLFileNotFoundError` ¶

Bases: SQLFileError

Raised when a referenced SQL file does not exist.

`SQLFilePermissionError` ¶

Bases: SQLFileError

Raised when a SQL file cannot be read due to permissions.

`SQLFileReference` ¶

Bases: BaseModel

Reference to SQL content in an external file.

Attributes:

Name	Type	Description
`path`	`str`	Path to the SQL file (relative or absolute).
`variables`	`Optional[dict[str, Any]]`	Dictionary of variables for template substitution.
`as_template`	`bool`	Whether to process the file content as a template.

`SQLFileSizeError` ¶

Bases: SQLFileError

Raised when a SQL file exceeds size limits.

`SQLGroup` ¶

Unified access to all SQL-related functionality.

`SQLTemplateError` ¶

Bases: SQLFileError

Raised when template processing fails.

`SQLiteAttachment` ¶

Bases: BaseModel

Configuration for attaching a SQLite database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias under which the SQLite database will be attached.
`path`	`str`	Filesystem path to the SQLite `.db` file.

`SecretConfig` ¶

Bases: BaseModel

Configuration for a DuckDB secret.

Attributes:

Name	Type	Description
`type`	`SecretType`	Secret type (s3, azure, gcs, http, postgres, mysql).
`name`	`Optional[str]`	Optional name for the secret (defaults to type if not provided).
`provider`	`SecretProvider`	Secret provider (config or credential_chain).
`persistent`	`bool`	Whether to create a persistent secret. Defaults to False.
`scope`	`Optional[str]`	Optional scope prefix for the secret.
`key_id`	`Optional[str]`	Access key ID or username for authentication.
`secret`	`Optional[str]`	Secret key or password for authentication.
`region`	`Optional[str]`	Geographic region for cloud services.
`endpoint`	`Optional[str]`	Custom endpoint URL for cloud services.
`connection_string`	`Optional[str]`	Full connection string for databases.
`tenant_id`	`Optional[str]`	Azure tenant ID for authentication.
`account_name`	`Optional[str]`	Azure storage account name.
`client_id`	`Optional[str]`	Azure client ID for authentication.
`client_secret`	`Optional[str]`	Azure client secret for authentication.
`service_account_key`	`Optional[str]`	GCS service account key.
`json_key`	`Optional[str]`	GCS JSON key.
`bearer_token`	`Optional[str]`	HTTP bearer token for authentication.
`header`	`Optional[str]`	HTTP header for authentication.
`database`	`Optional[str]`	Database name for database secrets.
`host`	`Optional[str]`	Database host for database secrets.
`port`	`Optional[int]`	Database port for database secrets.
`user`	`Optional[str]`	Database username (alternative to key_id for database types).
`password`	`Optional[str]`	Database password (alternative to secret for database types).
`options`	`dict[str, Any]`	Additional key-value options for the secret.

`SemanticDefaultsConfig` ¶

Bases: BaseModel

Default configuration for a semantic model.

Provides default settings for query builders and dashboards, such as the primary time dimension and default measures.

Attributes:

Name	Type	Description
`time_dimension`	`Optional[str]`	Default time dimension name.
`primary_measure`	`Optional[str]`	Default primary measure name.
`default_filters`	`list[dict[str, Any]]`	Optional list of default filters.

`SemanticDimensionConfig` ¶

Bases: BaseModel

Definition of a semantic dimension.

A dimension represents a business attribute that maps to an expression over the base view of a semantic model.

Attributes:

Name	Type	Description
`name`	`str`	Unique dimension name within the semantic model.
`expression`	`str`	SQL expression referencing columns from the base view.
`label`	`Optional[str]`	Human-readable display name.
`description`	`Optional[str]`	Optional detailed description.
`type`	`Optional[str]`	Optional data type hint (time, number, string, boolean, date).
`time_grains`	`list[str]`	Optional list of time grains for time dimensions.

`SemanticJoinConfig` ¶

Bases: BaseModel

Definition of a semantic join.

A join defines a relationship to another view for enriching the semantic model with additional data, typically dimension tables.

Attributes:

Name	Type	Description
`to_view`	`str`	Name of an existing view in the views section to join to.
`type`	`str`	Join type (inner, left, right, full).
`on_condition`	`str`	SQL join condition expression.

`SemanticMeasureConfig` ¶

Bases: BaseModel

Definition of a semantic measure.

A measure represents a business metric that typically involves aggregation or calculation over the base view of a semantic model.

Attributes:

Name	Type	Description
`name`	`str`	Unique measure name within the semantic model.
`expression`	`str`	SQL expression (often aggregated) over the base view.
`label`	`Optional[str]`	Human-readable display name.
`description`	`Optional[str]`	Optional detailed description.
`type`	`Optional[str]`	Optional data type hint.

`SemanticModelConfig` ¶

Bases: BaseModel

Definition of a semantic model.

A semantic model provides business-friendly metadata on top of an existing Duckalog view, defining dimensions and measures for analytics and BI use cases.

Attributes:

Name	Type	Description
`name`	`str`	Unique semantic model name within the config.
`base_view`	`str`	Name of an existing view in the views section.
`dimensions`	`list[SemanticDimensionConfig]`	Optional list of dimension definitions.
`measures`	`list[SemanticMeasureConfig]`	Optional list of measure definitions.
`joins`	`list[SemanticJoinConfig]`	Optional list of join definitions to other views.
`defaults`	`Optional[SemanticDefaultsConfig]`	Optional default configuration for query builders.
`label`	`Optional[str]`	Human-readable display name.
`description`	`Optional[str]`	Optional detailed description.
`tags`	`list[str]`	Optional list of classification tags.

`ViewConfig` ¶

Bases: BaseModel

Definition of a single catalog view.

A view can be defined in several ways: 1. Inline SQL: Using the sql field with raw SQL text 2. SQL File: Using sql_file to reference external SQL files 3. SQL Template: Using sql_template for parameterized SQL files 4. Data Source: Using source + required fields for direct data access 5. Source + SQL: Using source for data access plus sql for transformations

For data sources, the required fields depend on the source type: - Parquet/Delta: uri field is required - Iceberg: Either uri OR both catalog and table - DuckDB/SQLite/Postgres: Both database and table are required

When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.

Additional metadata fields such as description and tags do not affect SQL generation but are preserved for documentation and tooling.

Attributes:

Name	Type	Description
`name`	`str`	Unique view name within the config.
`schema`	`str`	Optional schema name for organizing views in DuckDB schemas.
`sql`	`Optional[str]`	Raw SQL text defining the view body.
`sql_file`	`Optional[SQLFileReference]`	Direct reference to a SQL file.
`sql_template`	`Optional[SQLFileReference]`	Reference to a SQL template file with variable substitution.
`source`	`Optional[EnvSource]`	Source type (e.g. `"parquet"`, `"iceberg"`, `"duckdb"`).
`uri`	`Optional[str]`	URI for file- or table-based sources (Parquet/Delta/Iceberg).
`database`	`Optional[str]`	Attachment alias for attached-database sources.
`table`	`Optional[str]`	Table name (optionally schema-qualified) for attached sources.
`catalog`	`Optional[str]`	Iceberg catalog name for catalog-based Iceberg views.
`options`	`dict[str, Any]`	Source-specific options passed to scan functions.
`description`	`Optional[str]`	Optional human-readable description of the view.
`tags`	`list[str]`	Optional list of tags for classification.

`connect_to_catalog(config_path, database_path=None, read_only=False, force_rebuild=False)` ¶

Create a CatalogConnection instance that manages DuckDB connections with state restoration.

This is the primary entry point for working with Duckalog catalogs in Python. It returns a :class:CatalogConnection instance which lazily establishes a DuckDB connection and automatically restores session state (pragmas, attachments, etc.) and performs incremental view updates.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required
`database_path`	`str \| None`	Optional database path override.	`None`
`read_only`	`bool`	Open the connection in read-only mode for safety.	`False`
`force_rebuild`	`bool`	If True, all views will be recreated even if they exist.	`False`

Returns:

Name	Type	Description
`A`	`CatalogConnection`	class:`CatalogConnection` instance.

Example

Using as a context manager::

from duckalog import connect_to_catalog
with connect_to_catalog("catalog.yaml") as catalog:
    conn = catalog.get_connection()
    result = conn.execute("SELECT * FROM my_view").fetchall()

Using for persistent state management::

catalog = connect_to_catalog("catalog.yaml")
conn1 = catalog.get_connection()
# ... later ...
conn2 = catalog.get_connection()  # Returns the same connection
catalog.close()

`connect_to_catalog_cm(config_path, database_path=None, read_only=False, force_rebuild=False)` ¶

Context manager that yields an active, state-restored DuckDB connection.

This provides the same state restoration and incremental update benefits as :class:CatalogConnection, but yields the raw DuckDB connection object for convenience in simple scripts.

Usage::

from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
    data = conn.execute("SELECT * FROM users").fetchall()
    print(f"Found {len(data)} records")
# Connection automatically closed here

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required
`database_path`	`str \| None`	Optional database path override.	`None`
`read_only`	`bool`	Open the connection in read-only mode for safety.	`False`
`force_rebuild`	`bool`	If True, all views will be recreated.	`False`

Yields:

Type	Description
`Generator[DuckDBPyConnection]`	An active DuckDB connection with catalog state restored.

`create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')` ¶

Generate a basic, valid Duckalog configuration template.

This function creates a configuration template with sensible defaults and educational example content that demonstrates key Duckalog features.

Parameters:

Name	Type	Description	Default
`format`	`ConfigFormat`	Output format for the configuration ('yaml' or 'json'). Defaults to 'yaml'.	`'yaml'`
`output_path`	`str \| None`	Optional path to write the configuration file. If provided, the template is written to this path and the content is also returned as a string.	`None`
`database_name`	`str`	Name for the DuckDB database file.	`'analytics_catalog.duckdb'`
`project_name`	`str`	Name used in comments to personalize the template.	`'my_analytics_project'`

Returns:

Type	Description
`str`	The generated configuration as a string.

Raises:

Type	Description
`ValueError`	If format is not 'yaml' or 'json'.
`ConfigError`	If the generated template fails validation.
`OSError`	If writing to output_path fails.

Example

Generate a YAML template::

template = create_config_template(format='yaml')
print(template)

Generate and save a JSON template::

template = create_config_template(
    format='json',
    output_path='my_config.json'
)

`generate_all_views_sql(config, include_secrets=False)` ¶

Generate SQL for all views in a configuration.

The output includes a descriptive header with the config version followed by a CREATE OR REPLACE VIEW statement for each view in the order they appear in the configuration.

Parameters:

Name	Type	Description	Default
`config`	`Config`	The validated :class:`Config` instance to render.	required
`include_secrets`	`bool`	Whether to include CREATE SECRET statements for secrets.	`False`

Returns:

Type	Description
`str`	A multi-statement SQL script suitable for use as a catalog definition.

`generate_secret_sql(secret)` ¶

Generate CREATE SECRET statement for a DuckDB secret.

Parameters:

Name	Type	Description	Default
`secret`	`SecretConfig`	Secret configuration object.	required

Returns:

Type	Description
`str`	SQL CREATE SECRET statement.

`generate_sql(config_path)` ¶

Generate a full SQL script from a config file.

This is a convenience wrapper around :func:load_config and :func:generate_all_views_sql that does not connect to DuckDB.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required

Returns:

Type	Description
`str`	A multi-statement SQL script containing `CREATE OR REPLACE VIEW`
`str`	statements for all configured views.

Raises:

Type	Description
`ConfigError`	If the configuration file is invalid.

Example

from duckalog import generate_sql sql = generate_sql("catalog.yaml") print("CREATE VIEW" in sql) True

`generate_view_sql(view)` ¶

Generate a CREATE OR REPLACE VIEW statement for a single view.

Parameters:

Name	Type	Description	Default
`view`	`ViewConfig`	The :class:`ViewConfig` to generate SQL for.	required

Returns:

Type	Description
`str`	A single SQL statement that creates or replaces the view.

`load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None, load_dotenv=True)` ¶

Load, interpolate, and validate a Duckalog configuration file.

`quote_ident(value)` ¶

Quote a SQL identifier using double quotes.

This helper wraps a string in double quotes and escapes any embedded double quotes according to SQL rules.

Parameters:

Name	Type	Description	Default
`value`	`str`	Identifier to quote (for example, a view or column name).	required

Returns:

Type	Description
`str`	The identifier wrapped in double quotes.

Example

quote_ident("events") '"events"'

`quote_literal(value)` ¶

Quote a SQL string literal using single quotes.

This helper wraps a string in single quotes and escapes any embedded single quotes according to SQL rules.

Parameters:

Name	Type	Description	Default
`value`	`str`	String literal to quote (for example, a file path, secret, or connection string).	required

Returns:

Type	Description
`str`	The string wrapped in single quotes with proper escaping.

Example

quote_literal("path/to/file.parquet") "'path/to/file.parquet'" quote_literal("user's data") "'user''s data'"

`render_options(options)` ¶

Render a mapping of options into scan-function arguments.

The resulting string is suitable for appending to a *_scan function call. Keys are sorted alphabetically to keep output deterministic.

Parameters:

Name	Type	Description	Default
`options`	`dict[str, Any]`	Mapping of option name to value (str, bool, int, or float).	required

Returns:

Type	Description
`str`	A string that starts with `,` when options are present (for example,
`str`	`", hive_partitioning=TRUE"`) or an empty string when no options
`str`	are provided.

Raises:

Type	Description
`TypeError`	If a value has a type that cannot be rendered safely.

`validate_config(config_path)` ¶

Validate a configuration file without touching DuckDB.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required

Raises:

Type	Description
`ConfigError`	If the configuration file is missing, malformed, or does not satisfy the schema and interpolation rules.

Example

from duckalog import validate_config validate_config("catalog.yaml") # raises on invalid config

`validate_generated_config(content, format='yaml')` ¶

Validate that generated configuration content can be loaded successfully.

Parameters:

Name	Type	Description	Default
`content`	`str`	Configuration content as string.	required
`format`	`ConfigFormat`	Format of the content ('yaml' or 'json').	`'yaml'`

Raises:

Type	Description
`ConfigError`	If the configuration cannot be loaded or is invalid.

New Connection Management Architecture¶

CatalogConnection Class¶

The new CatalogConnection class is the primary interface for working with Duckalog catalogs. It provides intelligent connection management and session state restoration.

from duckalog import connect_to_catalog

# Create connection
conn = connect_to_catalog("catalog.yaml")

# The connection manages:
# - Automatic catalog building
# - Connection pooling and reuse
# - Session state restoration
# - Incremental view updates

Key Methods¶

# Get raw DuckDB connection when needed
duckdb_conn = conn.get_connection()

# Clean up the connection and resources
conn.close()

Context Manager Usage¶

# Recommended usage pattern
with connect_to_catalog("catalog.yaml") as conn:
    # Connection automatically managed
    # Pragmas, settings, and attachments restored
    # Cleanup performed on exit

    result = conn.get_connection().execute("SELECT COUNT(*) FROM users").fetchone()
    print(f"Total users: {result[0]}")

# Connection automatically cleaned up

Session State Features¶

with connect_to_catalog("catalog.yaml") as conn:
    # Session state is automatically restored:
    # - DuckDB pragmas
    # - Session settings  
    # - Database attachments
    # - Persistent secrets
    # - Custom functions

    # All views are available
    for table in conn.get_connection().execute("SHOW TABLES").fetchall():
        print(f"Available view: {table[0]}")

New Architecture Patterns¶

Configuration Loading with Dependency Injection¶

The new load_config() function from duckalog.config.api provides enhanced configuration loading with dependency injection support:

from duckalog.config.api import load_config

# Basic usage (same as before)
config = load_config("catalog.yaml")

# Advanced usage with custom dependencies
config = load_config(
    "catalog.yaml",
    sql_file_loader=custom_loader,
    filesystem=custom_fs,
    load_dotenv=False
)

Request-Scoped Caching Context¶

The new architecture provides request-scoped caching for performance optimization:

from duckalog.config.resolution.imports import request_cache_scope

# Use request context for caching
with request_cache_scope() as context:
    # All config loading within this scope shares caches
    config1 = load_config("config1.yaml")
    config2 = load_config("config2.yaml")  # Reuses resolved imports

Import Patterns¶

Recommended Import Patterns¶

# New modular imports (recommended)
from duckalog.config.api import load_config
from duckalog.config.resolution.imports import request_cache_scope
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.resolution.env import EnvProcessor, DefaultEnvProcessor

# High-level convenience functions (unchanged)
from duckalog import (
    generate_sql,
    validate_config,
    connect_to_catalog,
    connect_to_catalog_cm,
    connect_to_catalog,
)

# Configuration models
from duckalog.config import (
    Config,
    DuckDBConfig,
    AttachmentsConfig,
    ViewConfig,
    SecretConfig,
)

Backward Compatibility Imports¶

All existing import patterns continue to work:

# Legacy imports (still supported)
from duckalog import load_config  # Re-exports from config.api
from duckalog.config import load_config  # Re-exports from config.api

# Direct module imports
from duckalog.config.api import load_config  # New location

Advanced Usage Examples¶

Custom Dependency Injection¶

from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.resolution.imports import DefaultImportResolver

# Create custom resolver
class CustomImportResolver(ImportResolver):
    def __init__(self, custom_config):
        self.custom_config = custom_config

    def resolve(self, config_data: dict, context: ImportContext) -> dict:
        # Custom import resolution logic
        return resolved_config

# Use with load_config
with request_cache_scope() as request_context:
    resolver = CustomImportResolver(custom_config)
    # Integrate with the loading process

Connection Management Patterns¶

from duckalog import connect_to_catalog

# Pattern 1: Single connection with multiple queries
conn = connect_to_catalog("catalog.yaml")
users = conn.get_connection().execute("SELECT * FROM users").fetchall()
orders = conn.get_connection().execute("SELECT * FROM orders").fetchall()

# Pattern 2: Context manager for automatic cleanup
with connect_to_catalog("catalog.yaml") as conn:
    # Process data with automatic connection management
    results = conn.get_connection().execute("""
        SELECT u.username, COUNT(o.order_id) as order_count
        FROM users u
        LEFT JOIN orders o ON u.user_id = o.user_id
        GROUP BY u.username
    """).fetchall()

# Pattern 3: Multiple connections with connection pooling
with connect_to_catalog("analytics.yaml") as analytics_conn:
    with connect_to_catalog("warehouse.yaml") as warehouse_conn:
        # Both connections managed independently
        analytics_data = analytics_conn.get_connection().execute("SELECT * FROM metrics").fetchall()
        warehouse_data = warehouse_conn.get_connection().execute("SELECT * FROM inventory").fetchall()

Performance Optimization¶

# Incremental builds - only missing views created
conn = connect_to_catalog("catalog.yaml")  # Fast for subsequent runs

# Force rebuild when needed
conn = connect_to_catalog("catalog.yaml", force_rebuild=True)

# Connection reuse across operations
with connect_to_catalog("catalog.yaml") as conn:
    # Reuse same connection for multiple queries
    results = conn.get_connection().execute("COMPLEX_QUERY").fetchall()
    # Connection reused, no reconnection overhead

Custom Filesystem Implementations¶

import fsspec
from duckalog import load_config

# Create custom filesystem
custom_fs = fsspec.filesystem(
    "myprotocol",
    key="access-key",
    secret="secret-key"
)

# Load config with custom filesystem
config = load_config(
    "myprotocol://bucket/config.yaml",
    filesystem=custom_fs
)

# Use with new connection management
from duckalog import connect_to_catalog
conn = connect_to_catalog(
    "myprotocol://bucket/config.yaml",
    filesystem=custom_fs
)

Performance Optimization with Caching¶

from duckalog.config.resolution.imports import request_cache_scope
from duckalog.config.api import load_config

# Batch load configurations efficiently
configs = []
with request_cache_scope() as context:
    for config_file in config_files:
        config = load_config(config_file)
        configs.append(config)
    # All configs share resolved imports and environment variables

Custom Environment Processing¶

from duckalog.config.resolution.env import EnvProcessor, DefaultEnvProcessor

class CustomEnvProcessor(EnvProcessor):
    def process(self, config_data: dict, load_dotenv: bool = True) -> dict:
        # Custom environment variable processing
        # e.g., add custom prefixes, special handling
        return processed_config

# Use with custom environment processor
env_processor = CustomEnvProcessor()

Migration from Old Patterns¶

From Legacy load_config¶

# Old pattern (still works)
from duckalog import load_config
config = load_config("config.yaml")

# New pattern with same interface
from duckalog.config.api import load_config
config = load_config("config.yaml")

# New pattern with additional options
config = load_config(
    "config.yaml",
    resolve_paths=True,
    filesystem=custom_fs,
    load_dotenv=False
)

Custom Implementation Patterns¶

# For extending the configuration system
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.api import _load_config_from_local_file

class MyCustomResolver(ImportResolver):
    def resolve(self, config_data: dict, context: ImportContext) -> dict:
        # Your custom logic here
        return config_data

# Use in custom loading scenarios
def my_custom_loader(path: str, **kwargs):
    return _load_config_from_local_file(
        path=path,
        sql_file_loader=kwargs.get("sql_file_loader"),
        resolve_paths=kwargs.get("resolve_paths", True),
        filesystem=kwargs.get("filesystem"),
        load_dotenv=kwargs.get("load_dotenv", True),
    )

Configuration Models¶

Core Models¶

The configuration models are now organized in the duckalog.config package:

`duckalog.config.models` ¶

Configuration models and schema definitions for Duckalog catalogs.

This module contains all Pydantic models used for configuration validation and typing. These models form the foundation of the configuration system and must not import from other config modules to avoid circular dependencies.

`AttachmentsConfig` ¶

Bases: BaseModel

Collection of attachment configurations.

Attributes:

Name	Type	Description
`duckdb`	`list[DuckDBAttachment]`	DuckDB attachment entries.
`sqlite`	`list[SQLiteAttachment]`	SQLite attachment entries.
`postgres`	`list[PostgresAttachment]`	Postgres attachment entries.
`duckalog`	`list[DuckalogAttachment]`	Duckalog config attachment entries.

`Config` ¶

Bases: BaseModel

Top-level Duckalog configuration.

Attributes:

Name	Type	Description
`version`	`int`	Positive integer describing the config schema version.
`duckdb`	`DuckDBConfig`	DuckDB session and connection settings.
`views`	`list[ViewConfig]`	List of view definitions to create in the catalog.
`attachments`	`AttachmentsConfig`	Optional attachments to external databases.
`iceberg_catalogs`	`list[IcebergCatalogConfig]`	Optional Iceberg catalog definitions.
`semantic_models`	`list[SemanticModelConfig]`	Optional semantic model definitions for business metadata.
`imports`	`Union[list[Union[str, ImportEntry]], SelectiveImports]`	Optional list of additional config files to import and merge. Can be a simple list of paths (backward compatible) or a SelectiveImports object for advanced options like section-specific imports, override behavior, and glob patterns.
`env_files`	`list[str]`	Optional list of custom .env file patterns to load. Supports patterns like ['.env', '.env.local', '.env.production']. Files are loaded in order with later files overriding earlier ones. Defaults to ['.env'] for backward compatibility.
`loader_settings`	`LoaderSettings`	Optional settings for the configuration loader.

`DuckDBAttachment` ¶

Bases: BaseModel

Configuration for attaching another DuckDB database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias under which the database will be attached.
`path`	`str`	Filesystem path to the DuckDB database file.
`read_only`	`bool`	Whether the attachment should be opened in read-only mode. Defaults to `True` for safety.

`DuckDBConfig` ¶

Bases: BaseModel

DuckDB connection and session settings.

Attributes:

Name	Type	Description
`database`	`str`	Path to the DuckDB database file. Defaults to `":memory:"`.
`install_extensions`	`list[str]`	Names of extensions to install before use.
`load_extensions`	`list[str]`	Names of extensions to load in the session.
`pragmas`	`list[str]`	SQL statements (typically `SET` pragmas) executed after connecting and loading extensions.
`settings`	`Optional[Union[str, list[str]]]`	DuckDB SET statements executed after pragmas. Can be a single string or list of strings.
`secrets`	`list[SecretConfig]`	List of secret definitions for external services and databases.

`DuckalogAttachment` ¶

Bases: BaseModel

Configuration for attaching another Duckalog catalog config.

Attributes:

Name	Type	Description
`alias`	`str`	Alias under which the child catalog will be attached.
`config_path`	`str`	Path to the child Duckalog config file.
`database`	`Optional[str]`	Optional override for the child's database file path.
`read_only`	`bool`	Whether the attachment should be opened in read-only mode. Defaults to `True` for safety.

`IcebergCatalogConfig` ¶

Bases: BaseModel

Configuration for an Iceberg catalog.

Attributes:

Name	Type	Description
`name`	`str`	Catalog name referenced by Iceberg views.
`catalog_type`	`str`	Backend type (for example, `rest`, `hive`, `glue`).
`uri`	`Optional[str]`	Optional URI used by certain catalog types.
`warehouse`	`Optional[str]`	Optional warehouse location for catalog data.
`options`	`dict[str, Any]`	Additional catalog-specific options.

`ImportEntry` ¶

Bases: BaseModel

A single import entry with optional override behavior.

Attributes:

Name	Type	Description
`path`	`str`	Path to the configuration file to import.
`override`	`bool`	Whether this import can override values from earlier imports or the main config. If False, only fills in missing fields without overwriting existing values. Defaults to True.

`LoaderSettings` ¶

Bases: BaseModel

Settings for the configuration loader.

Attributes:

Name	Type	Description
`concurrency_enabled`	`bool`	Whether to enable parallel I/O for imports and SQL files.
`max_threads`	`Optional[int]`	Maximum number of threads for parallel I/O. Defaults to None (uses default ThreadPoolExecutor behavior).

`PostgresAttachment` ¶

Bases: BaseModel

Configuration for attaching a Postgres database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias used inside DuckDB to reference the Postgres database.
`host`	`str`	Hostname or IP address of the Postgres server.
`port`	`int`	TCP port of the Postgres server.
`database`	`str`	Database name to connect to.
`user`	`str`	Username for authentication.
`password`	`str`	Password for authentication.
`sslmode`	`Optional[str]`	Optional SSL mode (for example, `require`).
`options`	`dict[str, Any]`	Extra key/value options passed to the attachment clause.

`SQLFileReference` ¶

Bases: BaseModel

Reference to SQL content in an external file.

Attributes:

Name	Type	Description
`path`	`str`	Path to the SQL file (relative or absolute).
`variables`	`Optional[dict[str, Any]]`	Dictionary of variables for template substitution.
`as_template`	`bool`	Whether to process the file content as a template.

`SQLiteAttachment` ¶

Bases: BaseModel

Configuration for attaching a SQLite database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias under which the SQLite database will be attached.
`path`	`str`	Filesystem path to the SQLite `.db` file.

`SecretConfig` ¶

Bases: BaseModel

Configuration for a DuckDB secret.

Attributes:

Name	Type	Description
`type`	`SecretType`	Secret type (s3, azure, gcs, http, postgres, mysql).
`name`	`Optional[str]`	Optional name for the secret (defaults to type if not provided).
`provider`	`SecretProvider`	Secret provider (config or credential_chain).
`persistent`	`bool`	Whether to create a persistent secret. Defaults to False.
`scope`	`Optional[str]`	Optional scope prefix for the secret.
`key_id`	`Optional[str]`	Access key ID or username for authentication.
`secret`	`Optional[str]`	Secret key or password for authentication.
`region`	`Optional[str]`	Geographic region for cloud services.
`endpoint`	`Optional[str]`	Custom endpoint URL for cloud services.
`connection_string`	`Optional[str]`	Full connection string for databases.
`tenant_id`	`Optional[str]`	Azure tenant ID for authentication.
`account_name`	`Optional[str]`	Azure storage account name.
`client_id`	`Optional[str]`	Azure client ID for authentication.
`client_secret`	`Optional[str]`	Azure client secret for authentication.
`service_account_key`	`Optional[str]`	GCS service account key.
`json_key`	`Optional[str]`	GCS JSON key.
`bearer_token`	`Optional[str]`	HTTP bearer token for authentication.
`header`	`Optional[str]`	HTTP header for authentication.
`database`	`Optional[str]`	Database name for database secrets.
`host`	`Optional[str]`	Database host for database secrets.
`port`	`Optional[int]`	Database port for database secrets.
`user`	`Optional[str]`	Database username (alternative to key_id for database types).
`password`	`Optional[str]`	Database password (alternative to secret for database types).
`options`	`dict[str, Any]`	Additional key-value options for the secret.

`SelectiveImports` ¶

Bases: BaseModel

Section-specific imports for targeted configuration merging.

Each field represents a section of the Config that can have targeted imports. Values can be either simple paths (strings) or ImportEntry objects with options.

`SemanticDefaultsConfig` ¶

Bases: BaseModel

Default configuration for a semantic model.

Provides default settings for query builders and dashboards, such as the primary time dimension and default measures.

Attributes:

Name	Type	Description
`time_dimension`	`Optional[str]`	Default time dimension name.
`primary_measure`	`Optional[str]`	Default primary measure name.
`default_filters`	`list[dict[str, Any]]`	Optional list of default filters.

`SemanticDimensionConfig` ¶

Bases: BaseModel

Definition of a semantic dimension.

A dimension represents a business attribute that maps to an expression over the base view of a semantic model.

Attributes:

Name	Type	Description
`name`	`str`	Unique dimension name within the semantic model.
`expression`	`str`	SQL expression referencing columns from the base view.
`label`	`Optional[str]`	Human-readable display name.
`description`	`Optional[str]`	Optional detailed description.
`type`	`Optional[str]`	Optional data type hint (time, number, string, boolean, date).
`time_grains`	`list[str]`	Optional list of time grains for time dimensions.

`SemanticJoinConfig` ¶

Bases: BaseModel

Definition of a semantic join.

A join defines a relationship to another view for enriching the semantic model with additional data, typically dimension tables.

Attributes:

Name	Type	Description
`to_view`	`str`	Name of an existing view in the views section to join to.
`type`	`str`	Join type (inner, left, right, full).
`on_condition`	`str`	SQL join condition expression.

`SemanticMeasureConfig` ¶

Bases: BaseModel

Definition of a semantic measure.

A measure represents a business metric that typically involves aggregation or calculation over the base view of a semantic model.

Attributes:

Name	Type	Description
`name`	`str`	Unique measure name within the semantic model.
`expression`	`str`	SQL expression (often aggregated) over the base view.
`label`	`Optional[str]`	Human-readable display name.
`description`	`Optional[str]`	Optional detailed description.
`type`	`Optional[str]`	Optional data type hint.

`SemanticModelConfig` ¶

Bases: BaseModel

Definition of a semantic model.

A semantic model provides business-friendly metadata on top of an existing Duckalog view, defining dimensions and measures for analytics and BI use cases.

Attributes:

Name	Type	Description
`name`	`str`	Unique semantic model name within the config.
`base_view`	`str`	Name of an existing view in the views section.
`dimensions`	`list[SemanticDimensionConfig]`	Optional list of dimension definitions.
`measures`	`list[SemanticMeasureConfig]`	Optional list of measure definitions.
`joins`	`list[SemanticJoinConfig]`	Optional list of join definitions to other views.
`defaults`	`Optional[SemanticDefaultsConfig]`	Optional default configuration for query builders.
`label`	`Optional[str]`	Human-readable display name.
`description`	`Optional[str]`	Optional detailed description.
`tags`	`list[str]`	Optional list of classification tags.

`ViewConfig` ¶

Bases: BaseModel

Definition of a single catalog view.

A view can be defined in several ways: 1. Inline SQL: Using the sql field with raw SQL text 2. SQL File: Using sql_file to reference external SQL files 3. SQL Template: Using sql_template for parameterized SQL files 4. Data Source: Using source + required fields for direct data access 5. Source + SQL: Using source for data access plus sql for transformations

For data sources, the required fields depend on the source type: - Parquet/Delta: uri field is required - Iceberg: Either uri OR both catalog and table - DuckDB/SQLite/Postgres: Both database and table are required

When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.

Additional metadata fields such as description and tags do not affect SQL generation but are preserved for documentation and tooling.

Attributes:

Name	Type	Description
`name`	`str`	Unique view name within the config.
`schema`	`str`	Optional schema name for organizing views in DuckDB schemas.
`sql`	`Optional[str]`	Raw SQL text defining the view body.
`sql_file`	`Optional[SQLFileReference]`	Direct reference to a SQL file.
`sql_template`	`Optional[SQLFileReference]`	Reference to a SQL template file with variable substitution.
`source`	`Optional[EnvSource]`	Source type (e.g. `"parquet"`, `"iceberg"`, `"duckdb"`).
`uri`	`Optional[str]`	URI for file- or table-based sources (Parquet/Delta/Iceberg).
`database`	`Optional[str]`	Attachment alias for attached-database sources.
`table`	`Optional[str]`	Table name (optionally schema-qualified) for attached sources.
`catalog`	`Optional[str]`	Iceberg catalog name for catalog-based Iceberg views.
`options`	`dict[str, Any]`	Source-specific options passed to scan functions.
`description`	`Optional[str]`	Optional human-readable description of the view.
`tags`	`list[str]`	Optional list of tags for classification.

Import Resolution Models¶

`duckalog.config.resolution.base` ¶

`EnvProcessor` ¶

Bases: Protocol

Interface for environment variable processors.

`process(config_data, load_dotenv=True)` ¶

Process environment variables and .env files.

`ImportContext` `dataclass` ¶

Tracks import state during loading.

`ImportResolver` ¶

Bases: Protocol

Interface for configuration import resolvers.

`resolve(config_data, context)` ¶

Resolve imports within a configuration dictionary.

Environment Processing Models¶

`duckalog.config.resolution.env` ¶

Environment processing helpers with request-scoped caching.

This module extracts .env discovery/merging from the legacy loader and exposes it behind the EnvProcessor protocol for dependency injection.

`DefaultEnvProcessor` ¶

Bases: EnvProcessor

EnvProcessor that mirrors legacy loader behavior.

`EnvCache` `dataclass` ¶

Cache for loaded .env files to avoid duplicate filesystem access.

`env_cache_scope(cache=None)` ¶

Context manager to provide request-scoped env cache.

Error Handling¶

The new architecture provides enhanced error handling with specific exception types:

Configuration Errors¶

`duckalog.errors` ¶

Unified exception hierarchy for Duckalog.

This module provides a consistent exception hierarchy for all Duckalog errors, ensuring that all library exceptions derive from a common base class and can be caught uniformly by users.

`CircularImportError` ¶

Bases: ImportError

Circular import detected.

`ConfigError` ¶

Bases: DuckalogError

Configuration-related errors.

This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.

Typical error conditions include:

The config file does not exist or cannot be read.
The file is not valid YAML/JSON.
Required fields are missing or invalid.
An environment variable placeholder cannot be resolved.

`DuckalogError` ¶

Bases: Exception

Base exception for all Duckalog errors.

This is the root of the exception hierarchy. All Duckalog-specific exceptions should inherit from this class to provide a unified error handling interface for users of the library.

Users can catch this class to handle any Duckalog-specific error, or catch more specific subclasses for targeted error handling.

`init(message, cause=None)` ¶

Initialize DuckalogError with optional cause chaining.

Parameters:

Name	Type	Description	Default
`message`	`str`	Human-readable error message.	required
`cause`	`Exception \| None`	Optional underlying exception that caused this error.	`None`

`str()` ¶

Return the error message.

`DuplicateNameError` ¶

Bases: ImportError

Duplicate name across imported configs.

`EngineError` ¶

Bases: DuckalogError

Engine-level error raised during catalog builds.

This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.

`ImportError` ¶

Bases: ConfigError

Base class for import-related errors.

`ImportFileNotFoundError` ¶

Bases: ImportError

Imported file not found.

`ImportValidationError` ¶

Bases: ImportError

Imported file validation failed.

`PathResolutionError` ¶

Bases: ConfigError

Raised when path resolution fails due to security or access issues.

This exception is a subclass of ConfigError since path resolution issues are typically configuration-related problems.

`RemoteConfigError` ¶

Bases: ConfigError

Error raised when remote configuration loading fails.

This exception wraps lower-level errors from remote storage systems when attempting to load configuration files from locations like S3, GCS, Azure Blob Storage, or HTTP endpoints.

`SQLFileEncodingError` ¶

Bases: SQLFileError

Raised when a SQL file has invalid encoding.

`SQLFileError` ¶

Bases: ConfigError

Base exception for SQL file-related errors.

This exception is raised when SQL file operations fail, such as when a referenced SQL file cannot be found, read, or processed.

`SQLFileNotFoundError` ¶

Bases: SQLFileError

Raised when a referenced SQL file does not exist.

`SQLFilePermissionError` ¶

Bases: SQLFileError

Raised when a SQL file cannot be read due to permissions.

`SQLFileSizeError` ¶

Bases: SQLFileError

Raised when a SQL file exceeds size limits.

`SQLTemplateError` ¶

Bases: SQLFileError

Raised when template processing fails.

Import Resolution Errors¶

`duckalog.config.resolution.imports` ¶

Import resolution extracted from the legacy loader with DI hooks.

`RequestContext` `dataclass` ¶

Context for request-scoped caching and state management.

Performance Considerations¶

Caching Benefits¶

Request-Scoped Caching: Shared resolution across multiple config loads
Import Resolution Caching: Avoid re-processing the same imports
Environment Variable Caching: Reuse resolved environment values
Path Resolution Caching: Cache normalized paths for reuse

Memory Management¶

Automatic Cache Cleanup: Request scopes automatically clear caches
Import Chain Tracking: Prevent circular dependencies with minimal overhead
Efficient Context Management: Lightweight context objects for tracking

Security Features¶

Enhanced Path Security¶

The new architecture maintains and enhances the existing security features:

Relative Path Resolution: All paths resolved relative to config location
Security Boundary Validation: Prevent directory traversal attacks
Cross-Platform Support: Consistent behavior across operating systems
Remote URI Support: Secure handling of remote configuration sources

Dependency Injection Security¶

Controlled Interface: Safe injection points for custom implementations
Validation Integration: Custom resolvers participate in validation
Context Isolation: Each request maintains isolated state
Audit Trail: Track configuration resolution through contexts

Testing Support¶

Mockable Interfaces¶

All new interfaces are designed for easy testing:

from unittest.mock import Mock
from duckalog.config.resolution.base import ImportResolver

# Create mock resolver for testing
mock_resolver = Mock(spec=ImportResolver)
mock_resolver.resolve.return_value = test_config

# Test with mocked dependencies
with request_cache_scope() as context:
    # Use mock resolver in tests
    pass

Test Utilities¶

# Test with temporary configurations
from tempfile import NamedTemporaryFile
from duckalog.config.api import load_config

with NamedTemporaryFile(mode='w', suffix='.yaml') as f:
    f.write(test_config_content)
    f.flush()
    config = load_config(f.name)

Extensibility Points¶

Custom Import Resolvers¶

Extend the import resolution system:

class DatabaseImportResolver(ImportResolver):
    """Load imports from database instead of files."""

    def resolve(self, config_data: dict, context: ImportContext) -> dict:
        # Implement database-based import loading
        pass

Custom Environment Processors¶

Add custom environment handling:

class VaultEnvProcessor(EnvProcessor):
    """Load secrets from vault systems."""

    def process(self, config_data: dict, load_dotenv: bool = True) -> dict:
        # Implement vault integration
        pass

Custom Filesystem Integration¶

Add support for new protocols:

class CustomFilesystem:
    """Custom filesystem implementation."""

    def open(self, path, mode='r'):
        # Implement custom file access
        pass

API Reference¶

Core API Overview¶

Core Functions¶

New: connect_to_catalog()¶

Key Features of CatalogConnection¶

Legacy: connect_to_catalog_cm()¶

Legacy Functions¶

duckalog ¶

AttachmentsConfig ¶

CatalogConnection ¶

__enter__() ¶

__exit__(exc_type, exc_val, exc_tb) ¶

__init__(config_path, database_path=None, read_only=False, force_rebuild=False, filesystem=None, load_dotenv=True) ¶

close() ¶

get_connection() ¶

Config ¶

ConfigError ¶

DuckDBAttachment ¶

DuckDBConfig ¶

EngineError ¶

IcebergCatalogConfig ¶

PostgresAttachment ¶

SQLFileEncodingError ¶

SQLFileError ¶

SQLFileLoader ¶

__init__() ¶

load_sql_file(file_path, config_file_path, variables=None, as_template=False, filesystem=None) ¶

SQLFileNotFoundError ¶

SQLFilePermissionError ¶

SQLFileReference ¶

SQLFileSizeError ¶

SQLGroup ¶

SQLTemplateError ¶

SQLiteAttachment ¶

SecretConfig ¶

SemanticDefaultsConfig ¶

SemanticDimensionConfig ¶

SemanticJoinConfig ¶

SemanticMeasureConfig ¶

SemanticModelConfig ¶

ViewConfig ¶

connect_to_catalog(config_path, database_path=None, read_only=False, force_rebuild=False) ¶

connect_to_catalog_cm(config_path, database_path=None, read_only=False, force_rebuild=False) ¶

create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project') ¶

generate_all_views_sql(config, include_secrets=False) ¶

generate_secret_sql(secret) ¶

generate_sql(config_path) ¶

generate_view_sql(view) ¶

load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None, load_dotenv=True) ¶

quote_ident(value) ¶

quote_literal(value) ¶

render_options(options) ¶

validate_config(config_path) ¶

validate_generated_config(content, format='yaml') ¶

New Connection Management Architecture¶

CatalogConnection Class¶

Key Methods¶

Context Manager Usage¶

Session State Features¶

New Architecture Patterns¶

Configuration Loading with Dependency Injection¶

Request-Scoped Caching Context¶

Import Patterns¶

Recommended Import Patterns¶

Backward Compatibility Imports¶

Advanced Usage Examples¶

Custom Dependency Injection¶

Connection Management Patterns¶

Performance Optimization¶

Custom Filesystem Implementations¶

Performance Optimization with Caching¶

Custom Environment Processing¶

Migration from Old Patterns¶

From Legacy load_config¶

Custom Implementation Patterns¶

Configuration Models¶

Core Models¶

duckalog.config.models ¶

AttachmentsConfig ¶

Config ¶

`duckalog` ¶

`AttachmentsConfig` ¶

`CatalogConnection` ¶

`enter()` ¶

`exit(exc_type, exc_val, exc_tb)` ¶

`init(config_path, database_path=None, read_only=False, force_rebuild=False, filesystem=None, load_dotenv=True)` ¶

`close()` ¶

`get_connection()` ¶

`Config` ¶

`ConfigError` ¶

`DuckDBAttachment` ¶

`DuckDBConfig` ¶

`EngineError` ¶

`IcebergCatalogConfig` ¶

`PostgresAttachment` ¶

`SQLFileEncodingError` ¶

`SQLFileError` ¶

`SQLFileLoader` ¶

`init()` ¶

`load_sql_file(file_path, config_file_path, variables=None, as_template=False, filesystem=None)` ¶

`SQLFileNotFoundError` ¶

`SQLFilePermissionError` ¶

`SQLFileReference` ¶

`SQLFileSizeError` ¶

`SQLGroup` ¶

`SQLTemplateError` ¶

`SQLiteAttachment` ¶

`SecretConfig` ¶

`SemanticDefaultsConfig` ¶

`SemanticDimensionConfig` ¶

`SemanticJoinConfig` ¶

`SemanticMeasureConfig` ¶

`SemanticModelConfig` ¶

`ViewConfig` ¶

`connect_to_catalog(config_path, database_path=None, read_only=False, force_rebuild=False)` ¶

`connect_to_catalog_cm(config_path, database_path=None, read_only=False, force_rebuild=False)` ¶

`create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')` ¶

`generate_all_views_sql(config, include_secrets=False)` ¶

`generate_secret_sql(secret)` ¶

`generate_sql(config_path)` ¶

`generate_view_sql(view)` ¶

`load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None, load_dotenv=True)` ¶

`quote_ident(value)` ¶

`quote_literal(value)` ¶

`render_options(options)` ¶

`validate_config(config_path)` ¶

`validate_generated_config(content, format='yaml')` ¶

`duckalog.config.models` ¶

`AttachmentsConfig` ¶

`Config` ¶

`DuckDBAttachment` ¶

`DuckDBConfig` ¶

`DuckalogAttachment` ¶

`IcebergCatalogConfig` ¶

`ImportEntry` ¶

`LoaderSettings` ¶

`PostgresAttachment` ¶

`SQLFileReference` ¶

`SQLiteAttachment` ¶

`SecretConfig` ¶

`SelectiveImports` ¶

`SemanticDefaultsConfig` ¶

`SemanticDimensionConfig` ¶

`SemanticJoinConfig` ¶

`SemanticMeasureConfig` ¶

`SemanticModelConfig` ¶

`ViewConfig` ¶

`duckalog.config.resolution.base` ¶

`EnvProcessor` ¶

`process(config_data, load_dotenv=True)` ¶

`ImportContext` `dataclass` ¶

`ImportResolver` ¶

`resolve(config_data, context)` ¶

`duckalog.config.resolution.env` ¶

`DefaultEnvProcessor` ¶

`EnvCache` `dataclass` ¶

`env_cache_scope(cache=None)` ¶

`duckalog.errors` ¶

`CircularImportError` ¶

`ConfigError` ¶