API Reference¶

This section provides comprehensive API documentation for the Duckalog library, generated from the source code using mkdocstrings.

Core API¶

`duckalog` ¶

Duckalog public API.

`AttachmentsConfig` ¶

Bases: BaseModel

Collection of attachment configurations.

Attributes:

Name	Type	Description
`duckdb`	`list[DuckDBAttachment]`	DuckDB attachment entries.
`sqlite`	`list[SQLiteAttachment]`	SQLite attachment entries.
`postgres`	`list[PostgresAttachment]`	Postgres attachment entries.
`duckalog`	`list[DuckalogAttachment]`	Duckalog config attachment entries.

`Config` ¶

Bases: BaseModel

Top-level Duckalog configuration.

Attributes:

Name	Type	Description
`version`	`int`	Positive integer describing the config schema version.
`duckdb`	`DuckDBConfig`	DuckDB session and connection settings.
`views`	`list[ViewConfig]`	List of view definitions to create in the catalog.
`attachments`	`AttachmentsConfig`	Optional attachments to external databases.
`iceberg_catalogs`	`list[IcebergCatalogConfig]`	Optional Iceberg catalog definitions.
`semantic_models`	`list[SemanticModelConfig]`	Optional semantic model definitions for business metadata.

`ConfigError` ¶

Bases: DuckalogError

Configuration-related errors.

This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.

Typical error conditions include:

The config file does not exist or cannot be read.
The file is not valid YAML/JSON.
Required fields are missing or invalid.
An environment variable placeholder cannot be resolved.

`DuckDBAttachment` ¶

Bases: BaseModel

Configuration for attaching another DuckDB database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias under which the database will be attached.
`path`	`str`	Filesystem path to the DuckDB database file.
`read_only`	`bool`	Whether the attachment should be opened in read-only mode. Defaults to `True` for safety.

`DuckDBConfig` ¶

Bases: BaseModel

DuckDB connection and session settings.

Attributes:

Name	Type	Description
`database`	`str`	Path to the DuckDB database file. Defaults to `":memory:"`.
`install_extensions`	`list[str]`	Names of extensions to install before use.
`load_extensions`	`list[str]`	Names of extensions to load in the session.
`pragmas`	`list[str]`	SQL statements (typically `SET` pragmas) executed after connecting and loading extensions.
`settings`	`Optional[Literal[str, list[str]]]`	DuckDB SET statements executed after pragmas. Can be a single string or list of strings.
`secrets`	`list[SecretConfig]`	List of secret definitions for external services and databases.

`EngineError` ¶

Bases: DuckalogError

Engine-level error raised during catalog builds.

This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.

`IcebergCatalogConfig` ¶

Bases: BaseModel

Configuration for an Iceberg catalog.

Attributes:

Name	Type	Description
`name`	`str`	Catalog name referenced by Iceberg views.
`catalog_type`	`str`	Backend type (for example, `rest`, `hive`, `glue`).
`uri`	`Optional[str]`	Optional URI used by certain catalog types.
`warehouse`	`Optional[str]`	Optional warehouse location for catalog data.
`options`	`dict[str, Any]`	Additional catalog-specific options.

`PostgresAttachment` ¶

Bases: BaseModel

Configuration for attaching a Postgres database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias used inside DuckDB to reference the Postgres database.
`host`	`str`	Hostname or IP address of the Postgres server.
`port`	`int`	TCP port of the Postgres server.
`database`	`str`	Database name to connect to.
`user`	`str`	Username for authentication.
`password`	`str`	Password for authentication.
`sslmode`	`Optional[str]`	Optional SSL mode (for example, `require`).
`options`	`dict[str, Any]`	Extra key/value options passed to the attachment clause.

`SQLiteAttachment` ¶

Bases: BaseModel

Configuration for attaching a SQLite database.

Attributes:

Name	Type	Description
`alias`	`str`	Alias under which the SQLite database will be attached.
`path`	`str`	Filesystem path to the SQLite `.db` file.

`SecretConfig` ¶

Bases: BaseModel

Configuration for a DuckDB secret.

Attributes:

Name	Type	Description
`type`	`SecretType`	Secret type (s3, azure, gcs, http, postgres, mysql).
`name`	`Optional[str]`	Optional name for the secret (defaults to type if not provided).
`provider`	`SecretProvider`	Secret provider (config or credential_chain).
`persistent`	`bool`	Whether to create a persistent secret. Defaults to False.
`scope`	`Optional[str]`	Optional scope prefix for the secret.
`key_id`	`Optional[str]`	Access key ID or username for authentication.
`secret`	`Optional[str]`	Secret key or password for authentication.
`region`	`Optional[str]`	Geographic region for cloud services.
`endpoint`	`Optional[str]`	Custom endpoint URL for cloud services.
`connection_string`	`Optional[str]`	Full connection string for databases.
`tenant_id`	`Optional[str]`	Azure tenant ID for authentication.
`account_name`	`Optional[str]`	Azure storage account name.
`client_id`	`Optional[str]`	Azure client ID for authentication.
`client_secret`	`Optional[str]`	Azure client secret for authentication.
`service_account_key`	`Optional[str]`	GCS service account key.
`json_key`	`Optional[str]`	GCS JSON key.
`bearer_token`	`Optional[str]`	HTTP bearer token for authentication.
`header`	`Optional[str]`	HTTP header for authentication.
`database`	`Optional[str]`	Database name for database secrets.
`host`	`Optional[str]`	Database host for database secrets.
`port`	`Optional[int]`	Database port for database secrets.
`user`	`Optional[str]`	Database username (alternative to key_id for database types).
`password`	`Optional[str]`	Database password (alternative to secret for database types).
`options`	`dict[str, Any]`	Additional key-value options for the secret.

`ViewConfig` ¶

Bases: BaseModel

Definition of a single catalog view.

A view can be defined in several ways: 1. Inline SQL: Using the sql field with raw SQL text 2. SQL File: Using sql_file to reference external SQL files 3. SQL Template: Using sql_template for parameterized SQL files 4. Data Source: Using source + required fields for direct data access 5. Source + SQL: Using source for data access plus sql for transformations

For data sources, the required fields depend on the source type: - Parquet/Delta: uri field is required - Iceberg: Either uri OR both catalog and table - DuckDB/SQLite/Postgres: Both database and table are required

When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.

Additional metadata fields such as description and tags do not affect SQL generation but are preserved for documentation and tooling.

Attributes:

Name	Type	Description
`name`	`str`	Unique view name within the config.
`sql`	`Optional[str]`	Raw SQL text defining the view body.
`sql_file`	`Optional[SQLFileReference]`	Direct reference to a SQL file.
`sql_template`	`Optional[SQLFileReference]`	Reference to a SQL template file with variable substitution.
`source`	`Optional[EnvSource]`	Source type (e.g. `"parquet"`, `"iceberg"`, `"duckdb"`).
`uri`	`Optional[str]`	URI for file- or table-based sources (Parquet/Delta/Iceberg).
`database`	`Optional[str]`	Attachment alias for attached-database sources.
`table`	`Optional[str]`	Table name (optionally schema-qualified) for attached sources.
`catalog`	`Optional[str]`	Iceberg catalog name for catalog-based Iceberg views.
`options`	`dict[str, Any]`	Source-specific options passed to scan functions.
`description`	`Optional[str]`	Optional human-readable description of the view.
`tags`	`list[str]`	Optional list of tags for classification.

`build_catalog(config_path, db_path=None, dry_run=False, verbose=False, filesystem=None, include_secrets=True)` ¶

Build or update a DuckDB catalog from a configuration file.

This function is the high-level entry point used by both the CLI and Python API. It loads the config, optionally performs a dry-run SQL generation, or otherwise connects to DuckDB, sets up attachments and Iceberg catalogs, and creates or replaces configured views.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required
`db_path`	`str \| None`	Optional override for `duckdb.database` in the config. Can be a local path or remote URI (s3://, gs://, gcs://, abfs://, adl://, sftp://).	`None`
`dry_run`	`bool`	If `True`, do not connect to DuckDB; instead generate and return the full SQL script for all views.	`False`
`verbose`	`bool`	If `True`, enable more verbose logging via the standard logging module.	`False`
`filesystem`	`Any \| None`	Optional pre-configured fsspec filesystem object for remote export authentication. If not provided, default authentication will be used.	`None`

Returns:

Type	Description
`str \| None`	The generated SQL script as a string when `dry_run` is `True`,
`str \| None`	otherwise `None` when the catalog is applied to DuckDB.

Raises:

Type	Description
`ConfigError`	If the configuration file is invalid.
`EngineError`	If connecting to DuckDB or executing SQL fails, or if remote export fails.

Example

Build a catalog in-place::

from duckalog import build_catalog

build_catalog("catalog.yaml")

Build and export to remote storage::

build_catalog("catalog.yaml", db_path="s3://my-bucket/catalog.duckdb")

Generate SQL without modifying the database::

sql = build_catalog("catalog.yaml", dry_run=True)
print(sql)

`connect_and_build_catalog(config_path, database_path=None, dry_run=False, verbose=False, read_only=False, **kwargs)` ¶

Build a catalog and create a DuckDB connection in one operation.

This function combines catalog building with connection creation, providing a streamlined workflow for users who want to start working with their catalog immediately after creating it.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required
`database_path`	`str \| None`	Optional database path override.	`None`
`dry_run`	`bool`	If True, only validates configuration and returns SQL. If False, builds the catalog and creates a connection.	`False`
`verbose`	`bool`	Enable verbose logging during build process.	`False`
`read_only`	`bool`	Open the resulting connection in read-only mode.	`False`
`**kwargs`	`Any`	Additional keyword arguments.	`{}`

Returns:

Type	Description
`DuckDBPyConnection \| str \| None`	A DuckDB connection object for immediate use, or SQL string when dry_run=True.

Raises:

Type	Description
`ConfigError`	If the configuration file is invalid.
`EngineError`	If catalog building or connection fails.
`FileNotFoundError`	If the resulting database file doesn't exist (after build).

Example

Build catalog and start querying immediately::

conn = connect_and_build_catalog("catalog.yaml")
data = conn.execute("SELECT * FROM important_table").fetchall()
print(f"Found {len(data)} records")
conn.close()

Validate only (dry run)::

sql = connect_and_build_catalog("catalog.yaml", dry_run=True) print("SQL generation completed, no database created")

Custom database path::

conn = connect_and_build_catalog("catalog.yaml", database_path="analytics.db")
print("Connected to custom database: analytics.db")

If dry_run=True, the function only validates the configuration and returns the SQL script without creating any database files or connections.

`connect_to_catalog(config_path, database_path=None, read_only=False)` ¶

Connect to an existing DuckDB database created by Duckalog.

This function provides a direct connection to a DuckDB database that was previously created using Duckalog. It simplifies the workflow for users who just want to start querying their catalog database.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file for an existing catalog.	required
`database_path`	`str \| None`	Optional database path override. If not provided, uses the path from the configuration.	`None`
`read_only`	`bool`	Open the connection in read-only mode for safety.	`False`

Returns:

Type	Description
`DuckDBPyConnection`	An active DuckDB connection object ready for query execution.

Raises:

Type	Description
`ConfigError`	If the configuration file is invalid.
`FileNotFoundError`	If the specified database file doesn't exist.
`Error`	If connection or queries fail.

Example

Connect to an existing catalog::

from duckalog import connect_to_catalog
conn = connect_to_catalog("catalog.yaml")

# Use the connection for queries
result = conn.execute("SELECT * FROM some_table").fetchall()
conn.close()

With path override::

conn = connect_to_catalog("catalog.yaml", database_path="analytics.db")

With read-only mode::

conn = connect_to_catalog("catalog.yaml", read_only=True)

With context manager::

from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
    data = conn.execute("SELECT * FROM users").fetchall()
    print(f"Found {len(data)} records")
# Connection automatically closed here

`connect_to_catalog_cm(config_path, database_path=None, read_only=False)` ¶

Context manager version of connect_to_catalog for automatic connection cleanup.

Usage::

from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
    data = conn.execute("SELECT * FROM users").fetchall()
    print(f"Found {len(data)} records")
# Connection is automatically closed here

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file for an existing catalog.	required
`database_path`	`str \| None`	Optional database path override.	`None`
`read_only`	`bool`	Open the connection in read-only mode for safety.	`False`

Yields:

Type	Description
`Generator[DuckDBPyConnection]`	An active DuckDB connection that will be closed automatically.

Raises:

Type	Description
`ConfigError`	If the configuration file is invalid.
`FileNotFoundError`	If the specified database file doesn't exist.
`Error`	If connection or queries fail.

`create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')` ¶

Generate a basic, valid Duckalog configuration template.

This function creates a configuration template with sensible defaults and educational example content that demonstrates key Duckalog features.

Parameters:

Name	Type	Description	Default
`format`	`ConfigFormat`	Output format for the configuration ('yaml' or 'json'). Defaults to 'yaml'.	`'yaml'`
`output_path`	`str \| None`	Optional path to write the configuration file. If provided, the template is written to this path and the content is also returned as a string.	`None`
`database_name`	`str`	Name for the DuckDB database file.	`'analytics_catalog.duckdb'`
`project_name`	`str`	Name used in comments to personalize the template.	`'my_analytics_project'`

Returns:

Type	Description
`str`	The generated configuration as a string.

Raises:

Type	Description
`ValueError`	If format is not 'yaml' or 'json'.
`ConfigError`	If the generated template fails validation.
`OSError`	If writing to output_path fails.

Example

Generate a YAML template::

template = create_config_template(format='yaml')
print(template)

Generate and save a JSON template::

template = create_config_template(
    format='json',
    output_path='my_config.json'
)

`generate_all_views_sql(config, include_secrets=False)` ¶

Generate SQL for all views in a configuration.

The output includes a descriptive header with the config version followed by a CREATE OR REPLACE VIEW statement for each view in the order they appear in the configuration.

Parameters:

Name	Type	Description	Default
`config`	`Config`	The validated :class:`Config` instance to render.	required
`include_secrets`	`bool`	Whether to include CREATE SECRET statements for secrets.	`False`

Returns:

Type	Description
`str`	A multi-statement SQL script suitable for use as a catalog definition.

`generate_secret_sql(secret)` ¶

Generate CREATE SECRET statement for a DuckDB secret.

Parameters:

Name	Type	Description	Default
`secret`	`SecretConfig`	Secret configuration object.	required

Returns:

Type	Description
`str`	SQL CREATE SECRET statement.

`generate_sql(config_path)` ¶

Generate a full SQL script from a config file.

This is a convenience wrapper around :func:load_config and :func:generate_all_views_sql that does not connect to DuckDB.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required

Returns:

Type	Description
`str`	A multi-statement SQL script containing `CREATE OR REPLACE VIEW`
`str`	statements for all configured views.

Raises:

Type	Description
`ConfigError`	If the configuration file is invalid.

Example

from duckalog import generate_sql sql = generate_sql("catalog.yaml") print("CREATE VIEW" in sql) True

`generate_view_sql(view)` ¶

Generate a CREATE OR REPLACE VIEW statement for a single view.

Parameters:

Name	Type	Description	Default
`view`	`ViewConfig`	The :class:`ViewConfig` to generate SQL for.	required

Returns:

Type	Description
`str`	A single SQL statement that creates or replaces the view.

`load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None)` ¶

Load, interpolate, and validate a Duckalog configuration file.

This helper is the main entry point for turning a YAML or JSON file into a validated :class:Config instance. It applies environment-variable interpolation and enforces the configuration schema.

Parameters:

Name	Type	Description	Default
`path`	`str`	Path to a YAML or JSON config file, or a remote URI.	required
`load_sql_files`	`bool`	Whether to load and process SQL from external files. If False, SQL file references are left as-is for later processing.	`True`
`sql_file_loader`	`Optional[Any]`	Optional SQLFileLoader instance for loading SQL files. If None, a default loader will be created.	`None`
`resolve_paths`	`bool`	Whether to resolve relative paths to absolute paths. If True, relative paths in view URIs and attachment paths will be resolved relative to the config file's directory. For remote configs, this defaults to False.	`True`
`filesystem`	`Optional[Any]`	Optional fsspec filesystem object to use for remote operations. If provided, this filesystem will be used instead of creating a new one based on URI scheme. Useful for custom authentication or advanced use cases.	`None`

Returns:

Type	Description
`Any`	A validated :class:`Config` object.

Raises:

Type	Description
`ConfigError`	If the file cannot be read, is not valid YAML/JSON, fails schema validation, contains unresolved `${env:VAR_NAME}` placeholders, or if SQL file loading fails.

Example

Load a catalog from catalog.yaml::

from duckalog import load_config

config = load_config("catalog.yaml")
print(len(config.views))

Load a catalog from S3::

config = load_config("s3://my-bucket/configs/catalog.yaml")
print(len(config.views))

Load a catalog with custom filesystem::

import fsspec
fs = fsspec.filesystem("s3", key="key", secret="secret", anon=False)
config = load_config("s3://my-bucket/configs/catalog.yaml", filesystem=fs)
print(len(config.views))

`quote_ident(value)` ¶

Quote a SQL identifier using double quotes.

This helper wraps a string in double quotes and escapes any embedded double quotes according to SQL rules.

Parameters:

Name	Type	Description	Default
`value`	`str`	Identifier to quote (for example, a view or column name).	required

Returns:

Type	Description
`str`	The identifier wrapped in double quotes.

Example

quote_ident("events") '"events"'

`quote_literal(value)` ¶

Quote a SQL string literal using single quotes.

This helper wraps a string in single quotes and escapes any embedded single quotes according to SQL rules.

Parameters:

Name	Type	Description	Default
`value`	`str`	String literal to quote (for example, a file path, secret, or connection string).	required

Returns:

Type	Description
`str`	The string wrapped in single quotes with proper escaping.

Example

quote_literal("path/to/file.parquet") "'path/to/file.parquet'" quote_literal("user's data") "'user''s data'"

`render_options(options)` ¶

Render a mapping of options into scan-function arguments.

The resulting string is suitable for appending to a *_scan function call. Keys are sorted alphabetically to keep output deterministic.

Parameters:

Name	Type	Description	Default
`options`	`dict[str, Any]`	Mapping of option name to value (str, bool, int, or float).	required

Returns:

Type	Description
`str`	A string that starts with `,` when options are present (for example,
`str`	`", hive_partitioning=TRUE"`) or an empty string when no options
`str`	are provided.

Raises:

Type	Description
`TypeError`	If a value has a type that cannot be rendered safely.

`validate_config(config_path)` ¶

Validate a configuration file without touching DuckDB.

Parameters:

Name	Type	Description	Default
`config_path`	`str`	Path to the YAML/JSON configuration file.	required

Raises:

Type	Description
`ConfigError`	If the configuration file is missing, malformed, or does not satisfy the schema and interpolation rules.

Example

from duckalog import validate_config validate_config("catalog.yaml") # raises on invalid config

`validate_generated_config(content, format='yaml')` ¶

Validate that generated configuration content can be loaded successfully.

Parameters:

Name	Type	Description	Default
`content`	`str`	Configuration content as string.	required
`format`	`ConfigFormat`	Format of the content ('yaml' or 'json').	`'yaml'`

Raises:

Type	Description
`ConfigError`	If the configuration cannot be loaded or is invalid.

Configuration Models¶

Config¶

The main configuration class that represents a Duckalog configuration file.

View¶

Represents a view definition in the catalog.

DuckDBConfig¶

Configuration for the DuckDB database instance.

Attachment¶

Configuration for database attachments.

IcebergCatalog¶

Configuration for Iceberg catalog connections.

Utility Functions¶

load_config¶

Load a Duckalog configuration from a file path.

build_catalog¶

Build a DuckDB catalog from a configuration file.

validate_config¶

Validate a Duckalog configuration without building.

generate_sql¶

Generate SQL statements from a configuration file.

Command Line Interface¶

duckalog build¶

Build a DuckDB catalog from a configuration file.

duckalog validate¶

Validate a configuration file.

duckalog generate-sql¶

Generate SQL statements from a configuration file.

Error Handling¶

ConfigError¶

Raised when there's an error in the configuration file.

CatalogError¶

Raised when there's an error building the catalog.

Advanced Usage¶

For detailed examples and advanced usage patterns, see the User Guide and Examples.

API Reference¶

Core API¶

duckalog ¶

AttachmentsConfig ¶

Config ¶

ConfigError ¶

DuckDBAttachment ¶

DuckDBConfig ¶

EngineError ¶

IcebergCatalogConfig ¶

PostgresAttachment ¶

SQLiteAttachment ¶

SecretConfig ¶

ViewConfig ¶

build_catalog(config_path, db_path=None, dry_run=False, verbose=False, filesystem=None, include_secrets=True) ¶

connect_and_build_catalog(config_path, database_path=None, dry_run=False, verbose=False, read_only=False, **kwargs) ¶

connect_to_catalog(config_path, database_path=None, read_only=False) ¶

connect_to_catalog_cm(config_path, database_path=None, read_only=False) ¶

create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project') ¶

generate_all_views_sql(config, include_secrets=False) ¶

generate_secret_sql(secret) ¶

generate_sql(config_path) ¶

generate_view_sql(view) ¶

load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None) ¶

quote_ident(value) ¶

quote_literal(value) ¶

render_options(options) ¶

validate_config(config_path) ¶

validate_generated_config(content, format='yaml') ¶

Configuration Models¶

Config¶

View¶

DuckDBConfig¶

Attachment¶

IcebergCatalog¶

Utility Functions¶

load_config¶

build_catalog¶

validate_config¶

generate_sql¶

Command Line Interface¶

duckalog build¶

duckalog validate¶

duckalog generate-sql¶

Error Handling¶

ConfigError¶

CatalogError¶

Advanced Usage¶

`duckalog` ¶

`AttachmentsConfig` ¶

`Config` ¶

`ConfigError` ¶

`DuckDBAttachment` ¶

`DuckDBConfig` ¶

`EngineError` ¶

`IcebergCatalogConfig` ¶

`PostgresAttachment` ¶

`SQLiteAttachment` ¶

`SecretConfig` ¶

`ViewConfig` ¶

`build_catalog(config_path, db_path=None, dry_run=False, verbose=False, filesystem=None, include_secrets=True)` ¶

`connect_and_build_catalog(config_path, database_path=None, dry_run=False, verbose=False, read_only=False, **kwargs)` ¶

`connect_to_catalog(config_path, database_path=None, read_only=False)` ¶

`connect_to_catalog_cm(config_path, database_path=None, read_only=False)` ¶

`create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')` ¶

`generate_all_views_sql(config, include_secrets=False)` ¶

`generate_secret_sql(secret)` ¶

`generate_sql(config_path)` ¶

`generate_view_sql(view)` ¶

`load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None)` ¶

`quote_ident(value)` ¶

`quote_literal(value)` ¶

`render_options(options)` ¶

`validate_config(config_path)` ¶

`validate_generated_config(content, format='yaml')` ¶