Skip to content

API Reference

This section provides comprehensive API documentation for the Duckalog library, generated from the source code using mkdocstrings.

Core API

duckalog

Duckalog public API.

AttachmentsConfig

Bases: BaseModel

Collection of attachment configurations.

Attributes:

Name Type Description
duckdb list[DuckDBAttachment]

DuckDB attachment entries.

sqlite list[SQLiteAttachment]

SQLite attachment entries.

postgres list[PostgresAttachment]

Postgres attachment entries.

duckalog list[DuckalogAttachment]

Duckalog config attachment entries.

Config

Bases: BaseModel

Top-level Duckalog configuration.

Attributes:

Name Type Description
version int

Positive integer describing the config schema version.

duckdb DuckDBConfig

DuckDB session and connection settings.

views list[ViewConfig]

List of view definitions to create in the catalog.

attachments AttachmentsConfig

Optional attachments to external databases.

iceberg_catalogs list[IcebergCatalogConfig]

Optional Iceberg catalog definitions.

semantic_models list[SemanticModelConfig]

Optional semantic model definitions for business metadata.

ConfigError

Bases: DuckalogError

Configuration-related errors.

This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.

Typical error conditions include:

  • The config file does not exist or cannot be read.
  • The file is not valid YAML/JSON.
  • Required fields are missing or invalid.
  • An environment variable placeholder cannot be resolved.

DuckDBAttachment

Bases: BaseModel

Configuration for attaching another DuckDB database.

Attributes:

Name Type Description
alias str

Alias under which the database will be attached.

path str

Filesystem path to the DuckDB database file.

read_only bool

Whether the attachment should be opened in read-only mode. Defaults to True for safety.

DuckDBConfig

Bases: BaseModel

DuckDB connection and session settings.

Attributes:

Name Type Description
database str

Path to the DuckDB database file. Defaults to ":memory:".

install_extensions list[str]

Names of extensions to install before use.

load_extensions list[str]

Names of extensions to load in the session.

pragmas list[str]

SQL statements (typically SET pragmas) executed after connecting and loading extensions.

settings Optional[Literal[str, list[str]]]

DuckDB SET statements executed after pragmas. Can be a single string or list of strings.

secrets list[SecretConfig]

List of secret definitions for external services and databases.

EngineError

Bases: DuckalogError

Engine-level error raised during catalog builds.

This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.

IcebergCatalogConfig

Bases: BaseModel

Configuration for an Iceberg catalog.

Attributes:

Name Type Description
name str

Catalog name referenced by Iceberg views.

catalog_type str

Backend type (for example, rest, hive, glue).

uri Optional[str]

Optional URI used by certain catalog types.

warehouse Optional[str]

Optional warehouse location for catalog data.

options dict[str, Any]

Additional catalog-specific options.

PostgresAttachment

Bases: BaseModel

Configuration for attaching a Postgres database.

Attributes:

Name Type Description
alias str

Alias used inside DuckDB to reference the Postgres database.

host str

Hostname or IP address of the Postgres server.

port int

TCP port of the Postgres server.

database str

Database name to connect to.

user str

Username for authentication.

password str

Password for authentication.

sslmode Optional[str]

Optional SSL mode (for example, require).

options dict[str, Any]

Extra key/value options passed to the attachment clause.

SQLiteAttachment

Bases: BaseModel

Configuration for attaching a SQLite database.

Attributes:

Name Type Description
alias str

Alias under which the SQLite database will be attached.

path str

Filesystem path to the SQLite .db file.

SecretConfig

Bases: BaseModel

Configuration for a DuckDB secret.

Attributes:

Name Type Description
type SecretType

Secret type (s3, azure, gcs, http, postgres, mysql).

name Optional[str]

Optional name for the secret (defaults to type if not provided).

provider SecretProvider

Secret provider (config or credential_chain).

persistent bool

Whether to create a persistent secret. Defaults to False.

scope Optional[str]

Optional scope prefix for the secret.

key_id Optional[str]

Access key ID or username for authentication.

secret Optional[str]

Secret key or password for authentication.

region Optional[str]

Geographic region for cloud services.

endpoint Optional[str]

Custom endpoint URL for cloud services.

connection_string Optional[str]

Full connection string for databases.

tenant_id Optional[str]

Azure tenant ID for authentication.

account_name Optional[str]

Azure storage account name.

client_id Optional[str]

Azure client ID for authentication.

client_secret Optional[str]

Azure client secret for authentication.

service_account_key Optional[str]

GCS service account key.

json_key Optional[str]

GCS JSON key.

bearer_token Optional[str]

HTTP bearer token for authentication.

header Optional[str]

HTTP header for authentication.

database Optional[str]

Database name for database secrets.

host Optional[str]

Database host for database secrets.

port Optional[int]

Database port for database secrets.

user Optional[str]

Database username (alternative to key_id for database types).

password Optional[str]

Database password (alternative to secret for database types).

options dict[str, Any]

Additional key-value options for the secret.

ViewConfig

Bases: BaseModel

Definition of a single catalog view.

A view can be defined in several ways: 1. Inline SQL: Using the sql field with raw SQL text 2. SQL File: Using sql_file to reference external SQL files 3. SQL Template: Using sql_template for parameterized SQL files 4. Data Source: Using source + required fields for direct data access 5. Source + SQL: Using source for data access plus sql for transformations

For data sources, the required fields depend on the source type: - Parquet/Delta: uri field is required - Iceberg: Either uri OR both catalog and table - DuckDB/SQLite/Postgres: Both database and table are required

When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.

Additional metadata fields such as description and tags do not affect SQL generation but are preserved for documentation and tooling.

Attributes:

Name Type Description
name str

Unique view name within the config.

sql Optional[str]

Raw SQL text defining the view body.

sql_file Optional[SQLFileReference]

Direct reference to a SQL file.

sql_template Optional[SQLFileReference]

Reference to a SQL template file with variable substitution.

source Optional[EnvSource]

Source type (e.g. "parquet", "iceberg", "duckdb").

uri Optional[str]

URI for file- or table-based sources (Parquet/Delta/Iceberg).

database Optional[str]

Attachment alias for attached-database sources.

table Optional[str]

Table name (optionally schema-qualified) for attached sources.

catalog Optional[str]

Iceberg catalog name for catalog-based Iceberg views.

options dict[str, Any]

Source-specific options passed to scan functions.

description Optional[str]

Optional human-readable description of the view.

tags list[str]

Optional list of tags for classification.

build_catalog(config_path, db_path=None, dry_run=False, verbose=False, filesystem=None, include_secrets=True)

Build or update a DuckDB catalog from a configuration file.

This function is the high-level entry point used by both the CLI and Python API. It loads the config, optionally performs a dry-run SQL generation, or otherwise connects to DuckDB, sets up attachments and Iceberg catalogs, and creates or replaces configured views.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required
db_path str | None

Optional override for duckdb.database in the config. Can be a local path or remote URI (s3://, gs://, gcs://, abfs://, adl://, sftp://).

None
dry_run bool

If True, do not connect to DuckDB; instead generate and return the full SQL script for all views.

False
verbose bool

If True, enable more verbose logging via the standard logging module.

False
filesystem Any | None

Optional pre-configured fsspec filesystem object for remote export authentication. If not provided, default authentication will be used.

None

Returns:

Type Description
str | None

The generated SQL script as a string when dry_run is True,

str | None

otherwise None when the catalog is applied to DuckDB.

Raises:

Type Description
ConfigError

If the configuration file is invalid.

EngineError

If connecting to DuckDB or executing SQL fails, or if remote export fails.

Example

Build a catalog in-place::

from duckalog import build_catalog

build_catalog("catalog.yaml")

Build and export to remote storage::

build_catalog("catalog.yaml", db_path="s3://my-bucket/catalog.duckdb")

Generate SQL without modifying the database::

sql = build_catalog("catalog.yaml", dry_run=True)
print(sql)

connect_and_build_catalog(config_path, database_path=None, dry_run=False, verbose=False, read_only=False, **kwargs)

Build a catalog and create a DuckDB connection in one operation.

This function combines catalog building with connection creation, providing a streamlined workflow for users who want to start working with their catalog immediately after creating it.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required
database_path str | None

Optional database path override.

None
dry_run bool

If True, only validates configuration and returns SQL. If False, builds the catalog and creates a connection.

False
verbose bool

Enable verbose logging during build process.

False
read_only bool

Open the resulting connection in read-only mode.

False
**kwargs Any

Additional keyword arguments.

{}

Returns:

Type Description
DuckDBPyConnection | str | None

A DuckDB connection object for immediate use, or SQL string when dry_run=True.

Raises:

Type Description
ConfigError

If the configuration file is invalid.

EngineError

If catalog building or connection fails.

FileNotFoundError

If the resulting database file doesn't exist (after build).

Example

Build catalog and start querying immediately::

conn = connect_and_build_catalog("catalog.yaml")
data = conn.execute("SELECT * FROM important_table").fetchall()
print(f"Found {len(data)} records")
conn.close()

Validate only (dry run)::

sql = connect_and_build_catalog("catalog.yaml", dry_run=True) print("SQL generation completed, no database created")

Custom database path::

conn = connect_and_build_catalog("catalog.yaml", database_path="analytics.db")
print("Connected to custom database: analytics.db")

If dry_run=True, the function only validates the configuration and returns the SQL script without creating any database files or connections.

connect_to_catalog(config_path, database_path=None, read_only=False)

Connect to an existing DuckDB database created by Duckalog.

This function provides a direct connection to a DuckDB database that was previously created using Duckalog. It simplifies the workflow for users who just want to start querying their catalog database.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file for an existing catalog.

required
database_path str | None

Optional database path override. If not provided, uses the path from the configuration.

None
read_only bool

Open the connection in read-only mode for safety.

False

Returns:

Type Description
DuckDBPyConnection

An active DuckDB connection object ready for query execution.

Raises:

Type Description
ConfigError

If the configuration file is invalid.

FileNotFoundError

If the specified database file doesn't exist.

Error

If connection or queries fail.

Example

Connect to an existing catalog::

from duckalog import connect_to_catalog
conn = connect_to_catalog("catalog.yaml")

# Use the connection for queries
result = conn.execute("SELECT * FROM some_table").fetchall()
conn.close()

With path override::

conn = connect_to_catalog("catalog.yaml", database_path="analytics.db")

With read-only mode::

conn = connect_to_catalog("catalog.yaml", read_only=True)

With context manager::

from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
    data = conn.execute("SELECT * FROM users").fetchall()
    print(f"Found {len(data)} records")
# Connection automatically closed here

connect_to_catalog_cm(config_path, database_path=None, read_only=False)

Context manager version of connect_to_catalog for automatic connection cleanup.

Usage::

from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
    data = conn.execute("SELECT * FROM users").fetchall()
    print(f"Found {len(data)} records")
# Connection is automatically closed here

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file for an existing catalog.

required
database_path str | None

Optional database path override.

None
read_only bool

Open the connection in read-only mode for safety.

False

Yields:

Type Description
Generator[DuckDBPyConnection]

An active DuckDB connection that will be closed automatically.

Raises:

Type Description
ConfigError

If the configuration file is invalid.

FileNotFoundError

If the specified database file doesn't exist.

Error

If connection or queries fail.

create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')

Generate a basic, valid Duckalog configuration template.

This function creates a configuration template with sensible defaults and educational example content that demonstrates key Duckalog features.

Parameters:

Name Type Description Default
format ConfigFormat

Output format for the configuration ('yaml' or 'json'). Defaults to 'yaml'.

'yaml'
output_path str | None

Optional path to write the configuration file. If provided, the template is written to this path and the content is also returned as a string.

None
database_name str

Name for the DuckDB database file.

'analytics_catalog.duckdb'
project_name str

Name used in comments to personalize the template.

'my_analytics_project'

Returns:

Type Description
str

The generated configuration as a string.

Raises:

Type Description
ValueError

If format is not 'yaml' or 'json'.

ConfigError

If the generated template fails validation.

OSError

If writing to output_path fails.

Example

Generate a YAML template::

template = create_config_template(format='yaml')
print(template)

Generate and save a JSON template::

template = create_config_template(
    format='json',
    output_path='my_config.json'
)

generate_all_views_sql(config, include_secrets=False)

Generate SQL for all views in a configuration.

The output includes a descriptive header with the config version followed by a CREATE OR REPLACE VIEW statement for each view in the order they appear in the configuration.

Parameters:

Name Type Description Default
config Config

The validated :class:Config instance to render.

required
include_secrets bool

Whether to include CREATE SECRET statements for secrets.

False

Returns:

Type Description
str

A multi-statement SQL script suitable for use as a catalog definition.

generate_secret_sql(secret)

Generate CREATE SECRET statement for a DuckDB secret.

Parameters:

Name Type Description Default
secret SecretConfig

Secret configuration object.

required

Returns:

Type Description
str

SQL CREATE SECRET statement.

generate_sql(config_path)

Generate a full SQL script from a config file.

This is a convenience wrapper around :func:load_config and :func:generate_all_views_sql that does not connect to DuckDB.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required

Returns:

Type Description
str

A multi-statement SQL script containing CREATE OR REPLACE VIEW

str

statements for all configured views.

Raises:

Type Description
ConfigError

If the configuration file is invalid.

Example

from duckalog import generate_sql sql = generate_sql("catalog.yaml") print("CREATE VIEW" in sql) True

generate_view_sql(view)

Generate a CREATE OR REPLACE VIEW statement for a single view.

Parameters:

Name Type Description Default
view ViewConfig

The :class:ViewConfig to generate SQL for.

required

Returns:

Type Description
str

A single SQL statement that creates or replaces the view.

load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None)

Load, interpolate, and validate a Duckalog configuration file.

This helper is the main entry point for turning a YAML or JSON file into a validated :class:Config instance. It applies environment-variable interpolation and enforces the configuration schema.

Parameters:

Name Type Description Default
path str

Path to a YAML or JSON config file, or a remote URI.

required
load_sql_files bool

Whether to load and process SQL from external files. If False, SQL file references are left as-is for later processing.

True
sql_file_loader Optional[Any]

Optional SQLFileLoader instance for loading SQL files. If None, a default loader will be created.

None
resolve_paths bool

Whether to resolve relative paths to absolute paths. If True, relative paths in view URIs and attachment paths will be resolved relative to the config file's directory. For remote configs, this defaults to False.

True
filesystem Optional[Any]

Optional fsspec filesystem object to use for remote operations. If provided, this filesystem will be used instead of creating a new one based on URI scheme. Useful for custom authentication or advanced use cases.

None

Returns:

Type Description
Any

A validated :class:Config object.

Raises:

Type Description
ConfigError

If the file cannot be read, is not valid YAML/JSON, fails schema validation, contains unresolved ${env:VAR_NAME} placeholders, or if SQL file loading fails.

Example

Load a catalog from catalog.yaml::

from duckalog import load_config

config = load_config("catalog.yaml")
print(len(config.views))

Load a catalog from S3::

config = load_config("s3://my-bucket/configs/catalog.yaml")
print(len(config.views))

Load a catalog with custom filesystem::

import fsspec
fs = fsspec.filesystem("s3", key="key", secret="secret", anon=False)
config = load_config("s3://my-bucket/configs/catalog.yaml", filesystem=fs)
print(len(config.views))

quote_ident(value)

Quote a SQL identifier using double quotes.

This helper wraps a string in double quotes and escapes any embedded double quotes according to SQL rules.

Parameters:

Name Type Description Default
value str

Identifier to quote (for example, a view or column name).

required

Returns:

Type Description
str

The identifier wrapped in double quotes.

Example

quote_ident("events") '"events"'

quote_literal(value)

Quote a SQL string literal using single quotes.

This helper wraps a string in single quotes and escapes any embedded single quotes according to SQL rules.

Parameters:

Name Type Description Default
value str

String literal to quote (for example, a file path, secret, or connection string).

required

Returns:

Type Description
str

The string wrapped in single quotes with proper escaping.

Example

quote_literal("path/to/file.parquet") "'path/to/file.parquet'" quote_literal("user's data") "'user''s data'"

render_options(options)

Render a mapping of options into scan-function arguments.

The resulting string is suitable for appending to a *_scan function call. Keys are sorted alphabetically to keep output deterministic.

Parameters:

Name Type Description Default
options dict[str, Any]

Mapping of option name to value (str, bool, int, or float).

required

Returns:

Type Description
str

A string that starts with , when options are present (for example,

str

", hive_partitioning=TRUE") or an empty string when no options

str

are provided.

Raises:

Type Description
TypeError

If a value has a type that cannot be rendered safely.

validate_config(config_path)

Validate a configuration file without touching DuckDB.

Parameters:

Name Type Description Default
config_path str

Path to the YAML/JSON configuration file.

required

Raises:

Type Description
ConfigError

If the configuration file is missing, malformed, or does not satisfy the schema and interpolation rules.

Example

from duckalog import validate_config validate_config("catalog.yaml") # raises on invalid config

validate_generated_config(content, format='yaml')

Validate that generated configuration content can be loaded successfully.

Parameters:

Name Type Description Default
content str

Configuration content as string.

required
format ConfigFormat

Format of the content ('yaml' or 'json').

'yaml'

Raises:

Type Description
ConfigError

If the configuration cannot be loaded or is invalid.

Configuration Models

Config

The main configuration class that represents a Duckalog configuration file.

View

Represents a view definition in the catalog.

DuckDBConfig

Configuration for the DuckDB database instance.

Attachment

Configuration for database attachments.

IcebergCatalog

Configuration for Iceberg catalog connections.

Utility Functions

load_config

Load a Duckalog configuration from a file path.

build_catalog

Build a DuckDB catalog from a configuration file.

validate_config

Validate a Duckalog configuration without building.

generate_sql

Generate SQL statements from a configuration file.

Command Line Interface

duckalog build

Build a DuckDB catalog from a configuration file.

duckalog validate

Validate a configuration file.

duckalog generate-sql

Generate SQL statements from a configuration file.

Error Handling

ConfigError

Raised when there's an error in the configuration file.

CatalogError

Raised when there's an error building the catalog.

Advanced Usage

For detailed examples and advanced usage patterns, see the User Guide and Examples.