API Reference¶
This reference provides comprehensive documentation for the Duckalog Python API, including the new modular architecture patterns, dependency injection interfaces, and advanced usage patterns.
Core API Overview¶
The main API provides both simple convenience functions and advanced patterns for custom implementations. With the new config-driven connection management system, the primary API now returns a CatalogConnection object for intelligent connection management.
Core Functions¶
New: connect_to_catalog()¶
The primary function for connecting to catalogs with intelligent connection management:
from duckalog import connect_to_catalog
# Basic usage - returns CatalogConnection object
conn = connect_to_catalog("catalog.yaml")
# Use as context manager for automatic cleanup
with connect_to_catalog("catalog.yaml") as conn:
# conn is a CatalogConnection object
result = conn.get_connection().execute("SELECT COUNT(*) FROM users").fetchall()
# Force rebuild of all views
conn = connect_to_catalog("catalog.yaml", force_rebuild=True)
# Custom database path
conn = connect_to_catalog("catalog.yaml", database_path="custom.duckdb")
Key Features of CatalogConnection¶
- Automatic Connection Management: Handles connection pooling and reuse
- Session State Restoration: Automatically restores pragmas, settings, and attachments
- Incremental Updates: Only creates missing views for faster builds
- Lazy Connection: Database connection established only when needed
- Context Manager Support: Automatic cleanup when used with
withstatement
Legacy: connect_to_catalog_cm()¶
For backward compatibility, the context manager function still works:
from duckalog import connect_to_catalog_cm
# Old style (still supported for compatibility)
with connect_to_catalog_cm("catalog.yaml") as conn:
# conn is raw DuckDB connection (legacy behavior)
result = conn.execute("SELECT * FROM users").fetchall()
Legacy Functions¶
All existing functions continue to work but are deprecated in favor of the new connection management approach:
from duckalog import (
generate_sql, # Generate SQL from config
validate_config, # Validate configuration
connect_to_catalog, # Build and connect in one operation
)
duckalog
¶
Duckalog public API.
AttachmentsConfig
¶
Bases: BaseModel
Collection of attachment configurations.
Attributes:
| Name | Type | Description |
|---|---|---|
duckdb |
list[DuckDBAttachment]
|
DuckDB attachment entries. |
sqlite |
list[SQLiteAttachment]
|
SQLite attachment entries. |
postgres |
list[PostgresAttachment]
|
Postgres attachment entries. |
duckalog |
list[DuckalogAttachment]
|
Duckalog config attachment entries. |
CatalogConnection
¶
Manages DuckDB connections with session state restoration and lazy initialization.
This class ensures that every connection obtained through it has the correct catalog state (pragmas, settings, extensions, attachments, and secrets) applied, regardless of whether it's a new or existing connection.
Attributes:
| Name | Type | Description |
|---|---|---|
config_path |
Path to the Duckalog configuration file. |
|
database_path |
Optional override for the DuckDB database path. |
|
read_only |
Whether to open the connection in read-only mode. |
|
force_rebuild |
If True, all views will be recreated even if they exist. |
|
config |
Optional[Config]
|
The loaded and validated configuration. |
conn |
Optional[DuckDBPyConnection]
|
The active DuckDB connection, or None if not initialized. |
__enter__()
¶
Context manager support: returns the CatalogConnection instance.
__exit__(exc_type, exc_val, exc_tb)
¶
Context manager support: ensures connection cleanup.
__init__(config_path, database_path=None, read_only=False, force_rebuild=False, filesystem=None, load_dotenv=True)
¶
Initialize the catalog connection manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str
|
Path to the Duckalog configuration file. |
required |
database_path
|
Optional[str]
|
Optional override for the DuckDB database path. |
None
|
read_only
|
bool
|
Whether to open the connection in read-only mode. |
False
|
force_rebuild
|
bool
|
If True, all views will be recreated even if they exist. |
False
|
filesystem
|
Optional[Any]
|
Optional fsspec filesystem object for remote file access. |
None
|
load_dotenv
|
bool
|
If True, automatically load and process .env files. |
True
|
close()
¶
Clean up the DuckDB connection and resources.
get_connection()
¶
Get the DuckDB connection, initializing it if necessary.
This method establishes the connection lazily on the first call, restores the session state, and performs incremental updates. Subsequent calls return the same connection instance.
Returns:
| Type | Description |
|---|---|
DuckDBPyConnection
|
An active DuckDB connection with catalog state restored. |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If loading the configuration fails. |
FileNotFoundError
|
If the database path is invalid or missing (for existing catalogs). |
EngineError
|
If connecting to DuckDB or restoring state fails. |
Config
¶
Bases: BaseModel
Top-level Duckalog configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
version |
int
|
Positive integer describing the config schema version. |
duckdb |
DuckDBConfig
|
DuckDB session and connection settings. |
views |
list[ViewConfig]
|
List of view definitions to create in the catalog. |
attachments |
AttachmentsConfig
|
Optional attachments to external databases. |
iceberg_catalogs |
list[IcebergCatalogConfig]
|
Optional Iceberg catalog definitions. |
semantic_models |
list[SemanticModelConfig]
|
Optional semantic model definitions for business metadata. |
imports |
Union[list[Union[str, ImportEntry]], SelectiveImports]
|
Optional list of additional config files to import and merge. Can be a simple list of paths (backward compatible) or a SelectiveImports object for advanced options like section-specific imports, override behavior, and glob patterns. |
env_files |
list[str]
|
Optional list of custom .env file patterns to load. Supports patterns like ['.env', '.env.local', '.env.production']. Files are loaded in order with later files overriding earlier ones. Defaults to ['.env'] for backward compatibility. |
loader_settings |
LoaderSettings
|
Optional settings for the configuration loader. |
ConfigError
¶
Bases: DuckalogError
Configuration-related errors.
This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.
Typical error conditions include:
- The config file does not exist or cannot be read.
- The file is not valid YAML/JSON.
- Required fields are missing or invalid.
- An environment variable placeholder cannot be resolved.
DuckDBAttachment
¶
Bases: BaseModel
Configuration for attaching another DuckDB database.
Attributes:
| Name | Type | Description |
|---|---|---|
alias |
str
|
Alias under which the database will be attached. |
path |
str
|
Filesystem path to the DuckDB database file. |
read_only |
bool
|
Whether the attachment should be opened in read-only mode.
Defaults to |
DuckDBConfig
¶
Bases: BaseModel
DuckDB connection and session settings.
Attributes:
| Name | Type | Description |
|---|---|---|
database |
str
|
Path to the DuckDB database file. Defaults to |
install_extensions |
list[str]
|
Names of extensions to install before use. |
load_extensions |
list[str]
|
Names of extensions to load in the session. |
pragmas |
list[str]
|
SQL statements (typically |
settings |
Optional[Union[str, list[str]]]
|
DuckDB SET statements executed after pragmas. Can be a single string or list of strings. |
secrets |
list[SecretConfig]
|
List of secret definitions for external services and databases. |
EngineError
¶
Bases: DuckalogError
Engine-level error raised during catalog builds.
This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.
IcebergCatalogConfig
¶
Bases: BaseModel
Configuration for an Iceberg catalog.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Catalog name referenced by Iceberg views. |
catalog_type |
str
|
Backend type (for example, |
uri |
Optional[str]
|
Optional URI used by certain catalog types. |
warehouse |
Optional[str]
|
Optional warehouse location for catalog data. |
options |
dict[str, Any]
|
Additional catalog-specific options. |
PostgresAttachment
¶
Bases: BaseModel
Configuration for attaching a Postgres database.
Attributes:
| Name | Type | Description |
|---|---|---|
alias |
str
|
Alias used inside DuckDB to reference the Postgres database. |
host |
str
|
Hostname or IP address of the Postgres server. |
port |
int
|
TCP port of the Postgres server. |
database |
str
|
Database name to connect to. |
user |
str
|
Username for authentication. |
password |
str
|
Password for authentication. |
sslmode |
Optional[str]
|
Optional SSL mode (for example, |
options |
dict[str, Any]
|
Extra key/value options passed to the attachment clause. |
SQLFileEncodingError
¶
SQLFileError
¶
Bases: ConfigError
Base exception for SQL file-related errors.
This exception is raised when SQL file operations fail, such as when a referenced SQL file cannot be found, read, or processed.
SQLFileLoader
¶
Loads SQL content from external files and processes templates.
__init__()
¶
Initialize the SQL file loader.
load_sql_file(file_path, config_file_path, variables=None, as_template=False, filesystem=None)
¶
Load SQL content from a file and optionally process as a template.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
file_path
|
str
|
Path to the SQL file (can be relative or absolute) |
required |
config_file_path
|
str
|
Path to the config file for resolving relative paths |
required |
variables
|
Optional[dict[str, Any]]
|
Dictionary of variables for template substitution |
None
|
as_template
|
bool
|
Whether to process the file content as a template |
False
|
filesystem
|
Optional[Any]
|
Optional filesystem object for file I/O operations |
None
|
Returns:
| Type | Description |
|---|---|
str
|
The loaded SQL content (processed if template, raw otherwise) |
Raises:
| Type | Description |
|---|---|
SQLFileError
|
If the file cannot be loaded or processed |
SQLFileNotFoundError
¶
SQLFilePermissionError
¶
SQLFileReference
¶
Bases: BaseModel
Reference to SQL content in an external file.
Attributes:
| Name | Type | Description |
|---|---|---|
path |
str
|
Path to the SQL file (relative or absolute). |
variables |
Optional[dict[str, Any]]
|
Dictionary of variables for template substitution. |
as_template |
bool
|
Whether to process the file content as a template. |
SQLFileSizeError
¶
SQLGroup
¶
Unified access to all SQL-related functionality.
SQLTemplateError
¶
SQLiteAttachment
¶
Bases: BaseModel
Configuration for attaching a SQLite database.
Attributes:
| Name | Type | Description |
|---|---|---|
alias |
str
|
Alias under which the SQLite database will be attached. |
path |
str
|
Filesystem path to the SQLite |
SecretConfig
¶
Bases: BaseModel
Configuration for a DuckDB secret.
Attributes:
| Name | Type | Description |
|---|---|---|
type |
SecretType
|
Secret type (s3, azure, gcs, http, postgres, mysql). |
name |
Optional[str]
|
Optional name for the secret (defaults to type if not provided). |
provider |
SecretProvider
|
Secret provider (config or credential_chain). |
persistent |
bool
|
Whether to create a persistent secret. Defaults to False. |
scope |
Optional[str]
|
Optional scope prefix for the secret. |
key_id |
Optional[str]
|
Access key ID or username for authentication. |
secret |
Optional[str]
|
Secret key or password for authentication. |
region |
Optional[str]
|
Geographic region for cloud services. |
endpoint |
Optional[str]
|
Custom endpoint URL for cloud services. |
connection_string |
Optional[str]
|
Full connection string for databases. |
tenant_id |
Optional[str]
|
Azure tenant ID for authentication. |
account_name |
Optional[str]
|
Azure storage account name. |
client_id |
Optional[str]
|
Azure client ID for authentication. |
client_secret |
Optional[str]
|
Azure client secret for authentication. |
service_account_key |
Optional[str]
|
GCS service account key. |
json_key |
Optional[str]
|
GCS JSON key. |
bearer_token |
Optional[str]
|
HTTP bearer token for authentication. |
header |
Optional[str]
|
HTTP header for authentication. |
database |
Optional[str]
|
Database name for database secrets. |
host |
Optional[str]
|
Database host for database secrets. |
port |
Optional[int]
|
Database port for database secrets. |
user |
Optional[str]
|
Database username (alternative to key_id for database types). |
password |
Optional[str]
|
Database password (alternative to secret for database types). |
options |
dict[str, Any]
|
Additional key-value options for the secret. |
SemanticDefaultsConfig
¶
Bases: BaseModel
Default configuration for a semantic model.
Provides default settings for query builders and dashboards, such as the primary time dimension and default measures.
Attributes:
| Name | Type | Description |
|---|---|---|
time_dimension |
Optional[str]
|
Default time dimension name. |
primary_measure |
Optional[str]
|
Default primary measure name. |
default_filters |
list[dict[str, Any]]
|
Optional list of default filters. |
SemanticDimensionConfig
¶
Bases: BaseModel
Definition of a semantic dimension.
A dimension represents a business attribute that maps to an expression over the base view of a semantic model.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique dimension name within the semantic model. |
expression |
str
|
SQL expression referencing columns from the base view. |
label |
Optional[str]
|
Human-readable display name. |
description |
Optional[str]
|
Optional detailed description. |
type |
Optional[str]
|
Optional data type hint (time, number, string, boolean, date). |
time_grains |
list[str]
|
Optional list of time grains for time dimensions. |
SemanticJoinConfig
¶
Bases: BaseModel
Definition of a semantic join.
A join defines a relationship to another view for enriching the semantic model with additional data, typically dimension tables.
Attributes:
| Name | Type | Description |
|---|---|---|
to_view |
str
|
Name of an existing view in the views section to join to. |
type |
str
|
Join type (inner, left, right, full). |
on_condition |
str
|
SQL join condition expression. |
SemanticMeasureConfig
¶
Bases: BaseModel
Definition of a semantic measure.
A measure represents a business metric that typically involves aggregation or calculation over the base view of a semantic model.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique measure name within the semantic model. |
expression |
str
|
SQL expression (often aggregated) over the base view. |
label |
Optional[str]
|
Human-readable display name. |
description |
Optional[str]
|
Optional detailed description. |
type |
Optional[str]
|
Optional data type hint. |
SemanticModelConfig
¶
Bases: BaseModel
Definition of a semantic model.
A semantic model provides business-friendly metadata on top of an existing Duckalog view, defining dimensions and measures for analytics and BI use cases.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique semantic model name within the config. |
base_view |
str
|
Name of an existing view in the views section. |
dimensions |
list[SemanticDimensionConfig]
|
Optional list of dimension definitions. |
measures |
list[SemanticMeasureConfig]
|
Optional list of measure definitions. |
joins |
list[SemanticJoinConfig]
|
Optional list of join definitions to other views. |
defaults |
Optional[SemanticDefaultsConfig]
|
Optional default configuration for query builders. |
label |
Optional[str]
|
Human-readable display name. |
description |
Optional[str]
|
Optional detailed description. |
tags |
list[str]
|
Optional list of classification tags. |
ViewConfig
¶
Bases: BaseModel
Definition of a single catalog view.
A view can be defined in several ways:
1. Inline SQL: Using the sql field with raw SQL text
2. SQL File: Using sql_file to reference external SQL files
3. SQL Template: Using sql_template for parameterized SQL files
4. Data Source: Using source + required fields for direct data access
5. Source + SQL: Using source for data access plus sql for transformations
For data sources, the required fields depend on the source type:
- Parquet/Delta: uri field is required
- Iceberg: Either uri OR both catalog and table
- DuckDB/SQLite/Postgres: Both database and table are required
When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.
Additional metadata fields such as description and tags do not affect
SQL generation but are preserved for documentation and tooling.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique view name within the config. |
schema |
str
|
Optional schema name for organizing views in DuckDB schemas. |
sql |
Optional[str]
|
Raw SQL text defining the view body. |
sql_file |
Optional[SQLFileReference]
|
Direct reference to a SQL file. |
sql_template |
Optional[SQLFileReference]
|
Reference to a SQL template file with variable substitution. |
source |
Optional[EnvSource]
|
Source type (e.g. |
uri |
Optional[str]
|
URI for file- or table-based sources (Parquet/Delta/Iceberg). |
database |
Optional[str]
|
Attachment alias for attached-database sources. |
table |
Optional[str]
|
Table name (optionally schema-qualified) for attached sources. |
catalog |
Optional[str]
|
Iceberg catalog name for catalog-based Iceberg views. |
options |
dict[str, Any]
|
Source-specific options passed to scan functions. |
description |
Optional[str]
|
Optional human-readable description of the view. |
tags |
list[str]
|
Optional list of tags for classification. |
connect_to_catalog(config_path, database_path=None, read_only=False, force_rebuild=False)
¶
Create a CatalogConnection instance that manages DuckDB connections with state restoration.
This is the primary entry point for working with Duckalog catalogs in Python.
It returns a :class:CatalogConnection instance which lazily establishes
a DuckDB connection and automatically restores session state (pragmas,
attachments, etc.) and performs incremental view updates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str
|
Path to the YAML/JSON configuration file. |
required |
database_path
|
str | None
|
Optional database path override. |
None
|
read_only
|
bool
|
Open the connection in read-only mode for safety. |
False
|
force_rebuild
|
bool
|
If True, all views will be recreated even if they exist. |
False
|
Returns:
| Name | Type | Description |
|---|---|---|
A |
CatalogConnection
|
class: |
Example
Using as a context manager::
from duckalog import connect_to_catalog
with connect_to_catalog("catalog.yaml") as catalog:
conn = catalog.get_connection()
result = conn.execute("SELECT * FROM my_view").fetchall()
Using for persistent state management::
catalog = connect_to_catalog("catalog.yaml")
conn1 = catalog.get_connection()
# ... later ...
conn2 = catalog.get_connection() # Returns the same connection
catalog.close()
connect_to_catalog_cm(config_path, database_path=None, read_only=False, force_rebuild=False)
¶
Context manager that yields an active, state-restored DuckDB connection.
This provides the same state restoration and incremental update benefits
as :class:CatalogConnection, but yields the raw DuckDB connection object
for convenience in simple scripts.
Usage::
from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
data = conn.execute("SELECT * FROM users").fetchall()
print(f"Found {len(data)} records")
# Connection automatically closed here
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str
|
Path to the YAML/JSON configuration file. |
required |
database_path
|
str | None
|
Optional database path override. |
None
|
read_only
|
bool
|
Open the connection in read-only mode for safety. |
False
|
force_rebuild
|
bool
|
If True, all views will be recreated. |
False
|
Yields:
| Type | Description |
|---|---|
Generator[DuckDBPyConnection]
|
An active DuckDB connection with catalog state restored. |
create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')
¶
Generate a basic, valid Duckalog configuration template.
This function creates a configuration template with sensible defaults and educational example content that demonstrates key Duckalog features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
format
|
ConfigFormat
|
Output format for the configuration ('yaml' or 'json'). Defaults to 'yaml'. |
'yaml'
|
output_path
|
str | None
|
Optional path to write the configuration file. If provided, the template is written to this path and the content is also returned as a string. |
None
|
database_name
|
str
|
Name for the DuckDB database file. |
'analytics_catalog.duckdb'
|
project_name
|
str
|
Name used in comments to personalize the template. |
'my_analytics_project'
|
Returns:
| Type | Description |
|---|---|
str
|
The generated configuration as a string. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If format is not 'yaml' or 'json'. |
ConfigError
|
If the generated template fails validation. |
OSError
|
If writing to output_path fails. |
Example
Generate a YAML template::
template = create_config_template(format='yaml')
print(template)
Generate and save a JSON template::
template = create_config_template(
format='json',
output_path='my_config.json'
)
generate_all_views_sql(config, include_secrets=False)
¶
Generate SQL for all views in a configuration.
The output includes a descriptive header with the config version followed
by a CREATE OR REPLACE VIEW statement for each view in the order they
appear in the configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config
|
Config
|
The validated :class: |
required |
include_secrets
|
bool
|
Whether to include CREATE SECRET statements for secrets. |
False
|
Returns:
| Type | Description |
|---|---|
str
|
A multi-statement SQL script suitable for use as a catalog definition. |
generate_secret_sql(secret)
¶
Generate CREATE SECRET statement for a DuckDB secret.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
secret
|
SecretConfig
|
Secret configuration object. |
required |
Returns:
| Type | Description |
|---|---|
str
|
SQL CREATE SECRET statement. |
generate_sql(config_path)
¶
Generate a full SQL script from a config file.
This is a convenience wrapper around :func:load_config and
:func:generate_all_views_sql that does not connect to DuckDB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str
|
Path to the YAML/JSON configuration file. |
required |
Returns:
| Type | Description |
|---|---|
str
|
A multi-statement SQL script containing |
str
|
statements for all configured views. |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If the configuration file is invalid. |
Example
from duckalog import generate_sql sql = generate_sql("catalog.yaml") print("CREATE VIEW" in sql) True
generate_view_sql(view)
¶
Generate a CREATE OR REPLACE VIEW statement for a single view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
view
|
ViewConfig
|
The :class: |
required |
Returns:
| Type | Description |
|---|---|
str
|
A single SQL statement that creates or replaces the view. |
load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None, load_dotenv=True)
¶
Load, interpolate, and validate a Duckalog configuration file.
quote_ident(value)
¶
Quote a SQL identifier using double quotes.
This helper wraps a string in double quotes and escapes any embedded double quotes according to SQL rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
value
|
str
|
Identifier to quote (for example, a view or column name). |
required |
Returns:
| Type | Description |
|---|---|
str
|
The identifier wrapped in double quotes. |
Example
quote_ident("events") '"events"'
quote_literal(value)
¶
Quote a SQL string literal using single quotes.
This helper wraps a string in single quotes and escapes any embedded single quotes according to SQL rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
value
|
str
|
String literal to quote (for example, a file path, secret, or connection string). |
required |
Returns:
| Type | Description |
|---|---|
str
|
The string wrapped in single quotes with proper escaping. |
Example
quote_literal("path/to/file.parquet") "'path/to/file.parquet'" quote_literal("user's data") "'user''s data'"
render_options(options)
¶
Render a mapping of options into scan-function arguments.
The resulting string is suitable for appending to a *_scan function
call. Keys are sorted alphabetically to keep output deterministic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
options
|
dict[str, Any]
|
Mapping of option name to value (str, bool, int, or float). |
required |
Returns:
| Type | Description |
|---|---|
str
|
A string that starts with |
str
|
|
str
|
are provided. |
Raises:
| Type | Description |
|---|---|
TypeError
|
If a value has a type that cannot be rendered safely. |
validate_config(config_path)
¶
Validate a configuration file without touching DuckDB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
config_path
|
str
|
Path to the YAML/JSON configuration file. |
required |
Raises:
| Type | Description |
|---|---|
ConfigError
|
If the configuration file is missing, malformed, or does not satisfy the schema and interpolation rules. |
Example
from duckalog import validate_config validate_config("catalog.yaml") # raises on invalid config
validate_generated_config(content, format='yaml')
¶
Validate that generated configuration content can be loaded successfully.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
content
|
str
|
Configuration content as string. |
required |
format
|
ConfigFormat
|
Format of the content ('yaml' or 'json'). |
'yaml'
|
Raises:
| Type | Description |
|---|---|
ConfigError
|
If the configuration cannot be loaded or is invalid. |
New Connection Management Architecture¶
CatalogConnection Class¶
The new CatalogConnection class is the primary interface for working with Duckalog catalogs. It provides intelligent connection management and session state restoration.
from duckalog import connect_to_catalog
# Create connection
conn = connect_to_catalog("catalog.yaml")
# The connection manages:
# - Automatic catalog building
# - Connection pooling and reuse
# - Session state restoration
# - Incremental view updates
Key Methods¶
# Get raw DuckDB connection when needed
duckdb_conn = conn.get_connection()
# Clean up the connection and resources
conn.close()
Context Manager Usage¶
# Recommended usage pattern
with connect_to_catalog("catalog.yaml") as conn:
# Connection automatically managed
# Pragmas, settings, and attachments restored
# Cleanup performed on exit
result = conn.get_connection().execute("SELECT COUNT(*) FROM users").fetchone()
print(f"Total users: {result[0]}")
# Connection automatically cleaned up
Session State Features¶
with connect_to_catalog("catalog.yaml") as conn:
# Session state is automatically restored:
# - DuckDB pragmas
# - Session settings
# - Database attachments
# - Persistent secrets
# - Custom functions
# All views are available
for table in conn.get_connection().execute("SHOW TABLES").fetchall():
print(f"Available view: {table[0]}")
New Architecture Patterns¶
Configuration Loading with Dependency Injection¶
The new load_config() function from duckalog.config.api provides enhanced configuration loading with dependency injection support:
from duckalog.config.api import load_config
# Basic usage (same as before)
config = load_config("catalog.yaml")
# Advanced usage with custom dependencies
config = load_config(
"catalog.yaml",
sql_file_loader=custom_loader,
filesystem=custom_fs,
load_dotenv=False
)
Request-Scoped Caching Context¶
The new architecture provides request-scoped caching for performance optimization:
from duckalog.config.resolution.imports import request_cache_scope
# Use request context for caching
with request_cache_scope() as context:
# All config loading within this scope shares caches
config1 = load_config("config1.yaml")
config2 = load_config("config2.yaml") # Reuses resolved imports
Import Patterns¶
Recommended Import Patterns¶
# New modular imports (recommended)
from duckalog.config.api import load_config
from duckalog.config.resolution.imports import request_cache_scope
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.resolution.env import EnvProcessor, DefaultEnvProcessor
# High-level convenience functions (unchanged)
from duckalog import (
generate_sql,
validate_config,
connect_to_catalog,
connect_to_catalog_cm,
connect_to_catalog,
)
# Configuration models
from duckalog.config import (
Config,
DuckDBConfig,
AttachmentsConfig,
ViewConfig,
SecretConfig,
)
Backward Compatibility Imports¶
All existing import patterns continue to work:
# Legacy imports (still supported)
from duckalog import load_config # Re-exports from config.api
from duckalog.config import load_config # Re-exports from config.api
# Direct module imports
from duckalog.config.api import load_config # New location
Advanced Usage Examples¶
Custom Dependency Injection¶
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.resolution.imports import DefaultImportResolver
# Create custom resolver
class CustomImportResolver(ImportResolver):
def __init__(self, custom_config):
self.custom_config = custom_config
def resolve(self, config_data: dict, context: ImportContext) -> dict:
# Custom import resolution logic
return resolved_config
# Use with load_config
with request_cache_scope() as request_context:
resolver = CustomImportResolver(custom_config)
# Integrate with the loading process
Connection Management Patterns¶
from duckalog import connect_to_catalog
# Pattern 1: Single connection with multiple queries
conn = connect_to_catalog("catalog.yaml")
users = conn.get_connection().execute("SELECT * FROM users").fetchall()
orders = conn.get_connection().execute("SELECT * FROM orders").fetchall()
# Pattern 2: Context manager for automatic cleanup
with connect_to_catalog("catalog.yaml") as conn:
# Process data with automatic connection management
results = conn.get_connection().execute("""
SELECT u.username, COUNT(o.order_id) as order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.username
""").fetchall()
# Pattern 3: Multiple connections with connection pooling
with connect_to_catalog("analytics.yaml") as analytics_conn:
with connect_to_catalog("warehouse.yaml") as warehouse_conn:
# Both connections managed independently
analytics_data = analytics_conn.get_connection().execute("SELECT * FROM metrics").fetchall()
warehouse_data = warehouse_conn.get_connection().execute("SELECT * FROM inventory").fetchall()
Performance Optimization¶
# Incremental builds - only missing views created
conn = connect_to_catalog("catalog.yaml") # Fast for subsequent runs
# Force rebuild when needed
conn = connect_to_catalog("catalog.yaml", force_rebuild=True)
# Connection reuse across operations
with connect_to_catalog("catalog.yaml") as conn:
# Reuse same connection for multiple queries
results = conn.get_connection().execute("COMPLEX_QUERY").fetchall()
# Connection reused, no reconnection overhead
Custom Filesystem Implementations¶
import fsspec
from duckalog import load_config
# Create custom filesystem
custom_fs = fsspec.filesystem(
"myprotocol",
key="access-key",
secret="secret-key"
)
# Load config with custom filesystem
config = load_config(
"myprotocol://bucket/config.yaml",
filesystem=custom_fs
)
# Use with new connection management
from duckalog import connect_to_catalog
conn = connect_to_catalog(
"myprotocol://bucket/config.yaml",
filesystem=custom_fs
)
Performance Optimization with Caching¶
from duckalog.config.resolution.imports import request_cache_scope
from duckalog.config.api import load_config
# Batch load configurations efficiently
configs = []
with request_cache_scope() as context:
for config_file in config_files:
config = load_config(config_file)
configs.append(config)
# All configs share resolved imports and environment variables
Custom Environment Processing¶
from duckalog.config.resolution.env import EnvProcessor, DefaultEnvProcessor
class CustomEnvProcessor(EnvProcessor):
def process(self, config_data: dict, load_dotenv: bool = True) -> dict:
# Custom environment variable processing
# e.g., add custom prefixes, special handling
return processed_config
# Use with custom environment processor
env_processor = CustomEnvProcessor()
Migration from Old Patterns¶
From Legacy load_config¶
# Old pattern (still works)
from duckalog import load_config
config = load_config("config.yaml")
# New pattern with same interface
from duckalog.config.api import load_config
config = load_config("config.yaml")
# New pattern with additional options
config = load_config(
"config.yaml",
resolve_paths=True,
filesystem=custom_fs,
load_dotenv=False
)
Custom Implementation Patterns¶
# For extending the configuration system
from duckalog.config.resolution.base import ImportResolver, ImportContext
from duckalog.config.api import _load_config_from_local_file
class MyCustomResolver(ImportResolver):
def resolve(self, config_data: dict, context: ImportContext) -> dict:
# Your custom logic here
return config_data
# Use in custom loading scenarios
def my_custom_loader(path: str, **kwargs):
return _load_config_from_local_file(
path=path,
sql_file_loader=kwargs.get("sql_file_loader"),
resolve_paths=kwargs.get("resolve_paths", True),
filesystem=kwargs.get("filesystem"),
load_dotenv=kwargs.get("load_dotenv", True),
)
Configuration Models¶
Core Models¶
The configuration models are now organized in the duckalog.config package:
duckalog.config.models
¶
Configuration models and schema definitions for Duckalog catalogs.
This module contains all Pydantic models used for configuration validation and typing. These models form the foundation of the configuration system and must not import from other config modules to avoid circular dependencies.
AttachmentsConfig
¶
Bases: BaseModel
Collection of attachment configurations.
Attributes:
| Name | Type | Description |
|---|---|---|
duckdb |
list[DuckDBAttachment]
|
DuckDB attachment entries. |
sqlite |
list[SQLiteAttachment]
|
SQLite attachment entries. |
postgres |
list[PostgresAttachment]
|
Postgres attachment entries. |
duckalog |
list[DuckalogAttachment]
|
Duckalog config attachment entries. |
Config
¶
Bases: BaseModel
Top-level Duckalog configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
version |
int
|
Positive integer describing the config schema version. |
duckdb |
DuckDBConfig
|
DuckDB session and connection settings. |
views |
list[ViewConfig]
|
List of view definitions to create in the catalog. |
attachments |
AttachmentsConfig
|
Optional attachments to external databases. |
iceberg_catalogs |
list[IcebergCatalogConfig]
|
Optional Iceberg catalog definitions. |
semantic_models |
list[SemanticModelConfig]
|
Optional semantic model definitions for business metadata. |
imports |
Union[list[Union[str, ImportEntry]], SelectiveImports]
|
Optional list of additional config files to import and merge. Can be a simple list of paths (backward compatible) or a SelectiveImports object for advanced options like section-specific imports, override behavior, and glob patterns. |
env_files |
list[str]
|
Optional list of custom .env file patterns to load. Supports patterns like ['.env', '.env.local', '.env.production']. Files are loaded in order with later files overriding earlier ones. Defaults to ['.env'] for backward compatibility. |
loader_settings |
LoaderSettings
|
Optional settings for the configuration loader. |
DuckDBAttachment
¶
Bases: BaseModel
Configuration for attaching another DuckDB database.
Attributes:
| Name | Type | Description |
|---|---|---|
alias |
str
|
Alias under which the database will be attached. |
path |
str
|
Filesystem path to the DuckDB database file. |
read_only |
bool
|
Whether the attachment should be opened in read-only mode.
Defaults to |
DuckDBConfig
¶
Bases: BaseModel
DuckDB connection and session settings.
Attributes:
| Name | Type | Description |
|---|---|---|
database |
str
|
Path to the DuckDB database file. Defaults to |
install_extensions |
list[str]
|
Names of extensions to install before use. |
load_extensions |
list[str]
|
Names of extensions to load in the session. |
pragmas |
list[str]
|
SQL statements (typically |
settings |
Optional[Union[str, list[str]]]
|
DuckDB SET statements executed after pragmas. Can be a single string or list of strings. |
secrets |
list[SecretConfig]
|
List of secret definitions for external services and databases. |
DuckalogAttachment
¶
Bases: BaseModel
Configuration for attaching another Duckalog catalog config.
Attributes:
| Name | Type | Description |
|---|---|---|
alias |
str
|
Alias under which the child catalog will be attached. |
config_path |
str
|
Path to the child Duckalog config file. |
database |
Optional[str]
|
Optional override for the child's database file path. |
read_only |
bool
|
Whether the attachment should be opened in read-only mode.
Defaults to |
IcebergCatalogConfig
¶
Bases: BaseModel
Configuration for an Iceberg catalog.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Catalog name referenced by Iceberg views. |
catalog_type |
str
|
Backend type (for example, |
uri |
Optional[str]
|
Optional URI used by certain catalog types. |
warehouse |
Optional[str]
|
Optional warehouse location for catalog data. |
options |
dict[str, Any]
|
Additional catalog-specific options. |
ImportEntry
¶
Bases: BaseModel
A single import entry with optional override behavior.
Attributes:
| Name | Type | Description |
|---|---|---|
path |
str
|
Path to the configuration file to import. |
override |
bool
|
Whether this import can override values from earlier imports or the main config. If False, only fills in missing fields without overwriting existing values. Defaults to True. |
LoaderSettings
¶
Bases: BaseModel
Settings for the configuration loader.
Attributes:
| Name | Type | Description |
|---|---|---|
concurrency_enabled |
bool
|
Whether to enable parallel I/O for imports and SQL files. |
max_threads |
Optional[int]
|
Maximum number of threads for parallel I/O. Defaults to None (uses default ThreadPoolExecutor behavior). |
PostgresAttachment
¶
Bases: BaseModel
Configuration for attaching a Postgres database.
Attributes:
| Name | Type | Description |
|---|---|---|
alias |
str
|
Alias used inside DuckDB to reference the Postgres database. |
host |
str
|
Hostname or IP address of the Postgres server. |
port |
int
|
TCP port of the Postgres server. |
database |
str
|
Database name to connect to. |
user |
str
|
Username for authentication. |
password |
str
|
Password for authentication. |
sslmode |
Optional[str]
|
Optional SSL mode (for example, |
options |
dict[str, Any]
|
Extra key/value options passed to the attachment clause. |
SQLFileReference
¶
Bases: BaseModel
Reference to SQL content in an external file.
Attributes:
| Name | Type | Description |
|---|---|---|
path |
str
|
Path to the SQL file (relative or absolute). |
variables |
Optional[dict[str, Any]]
|
Dictionary of variables for template substitution. |
as_template |
bool
|
Whether to process the file content as a template. |
SQLiteAttachment
¶
Bases: BaseModel
Configuration for attaching a SQLite database.
Attributes:
| Name | Type | Description |
|---|---|---|
alias |
str
|
Alias under which the SQLite database will be attached. |
path |
str
|
Filesystem path to the SQLite |
SecretConfig
¶
Bases: BaseModel
Configuration for a DuckDB secret.
Attributes:
| Name | Type | Description |
|---|---|---|
type |
SecretType
|
Secret type (s3, azure, gcs, http, postgres, mysql). |
name |
Optional[str]
|
Optional name for the secret (defaults to type if not provided). |
provider |
SecretProvider
|
Secret provider (config or credential_chain). |
persistent |
bool
|
Whether to create a persistent secret. Defaults to False. |
scope |
Optional[str]
|
Optional scope prefix for the secret. |
key_id |
Optional[str]
|
Access key ID or username for authentication. |
secret |
Optional[str]
|
Secret key or password for authentication. |
region |
Optional[str]
|
Geographic region for cloud services. |
endpoint |
Optional[str]
|
Custom endpoint URL for cloud services. |
connection_string |
Optional[str]
|
Full connection string for databases. |
tenant_id |
Optional[str]
|
Azure tenant ID for authentication. |
account_name |
Optional[str]
|
Azure storage account name. |
client_id |
Optional[str]
|
Azure client ID for authentication. |
client_secret |
Optional[str]
|
Azure client secret for authentication. |
service_account_key |
Optional[str]
|
GCS service account key. |
json_key |
Optional[str]
|
GCS JSON key. |
bearer_token |
Optional[str]
|
HTTP bearer token for authentication. |
header |
Optional[str]
|
HTTP header for authentication. |
database |
Optional[str]
|
Database name for database secrets. |
host |
Optional[str]
|
Database host for database secrets. |
port |
Optional[int]
|
Database port for database secrets. |
user |
Optional[str]
|
Database username (alternative to key_id for database types). |
password |
Optional[str]
|
Database password (alternative to secret for database types). |
options |
dict[str, Any]
|
Additional key-value options for the secret. |
SelectiveImports
¶
Bases: BaseModel
Section-specific imports for targeted configuration merging.
Each field represents a section of the Config that can have targeted imports. Values can be either simple paths (strings) or ImportEntry objects with options.
SemanticDefaultsConfig
¶
Bases: BaseModel
Default configuration for a semantic model.
Provides default settings for query builders and dashboards, such as the primary time dimension and default measures.
Attributes:
| Name | Type | Description |
|---|---|---|
time_dimension |
Optional[str]
|
Default time dimension name. |
primary_measure |
Optional[str]
|
Default primary measure name. |
default_filters |
list[dict[str, Any]]
|
Optional list of default filters. |
SemanticDimensionConfig
¶
Bases: BaseModel
Definition of a semantic dimension.
A dimension represents a business attribute that maps to an expression over the base view of a semantic model.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique dimension name within the semantic model. |
expression |
str
|
SQL expression referencing columns from the base view. |
label |
Optional[str]
|
Human-readable display name. |
description |
Optional[str]
|
Optional detailed description. |
type |
Optional[str]
|
Optional data type hint (time, number, string, boolean, date). |
time_grains |
list[str]
|
Optional list of time grains for time dimensions. |
SemanticJoinConfig
¶
Bases: BaseModel
Definition of a semantic join.
A join defines a relationship to another view for enriching the semantic model with additional data, typically dimension tables.
Attributes:
| Name | Type | Description |
|---|---|---|
to_view |
str
|
Name of an existing view in the views section to join to. |
type |
str
|
Join type (inner, left, right, full). |
on_condition |
str
|
SQL join condition expression. |
SemanticMeasureConfig
¶
Bases: BaseModel
Definition of a semantic measure.
A measure represents a business metric that typically involves aggregation or calculation over the base view of a semantic model.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique measure name within the semantic model. |
expression |
str
|
SQL expression (often aggregated) over the base view. |
label |
Optional[str]
|
Human-readable display name. |
description |
Optional[str]
|
Optional detailed description. |
type |
Optional[str]
|
Optional data type hint. |
SemanticModelConfig
¶
Bases: BaseModel
Definition of a semantic model.
A semantic model provides business-friendly metadata on top of an existing Duckalog view, defining dimensions and measures for analytics and BI use cases.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique semantic model name within the config. |
base_view |
str
|
Name of an existing view in the views section. |
dimensions |
list[SemanticDimensionConfig]
|
Optional list of dimension definitions. |
measures |
list[SemanticMeasureConfig]
|
Optional list of measure definitions. |
joins |
list[SemanticJoinConfig]
|
Optional list of join definitions to other views. |
defaults |
Optional[SemanticDefaultsConfig]
|
Optional default configuration for query builders. |
label |
Optional[str]
|
Human-readable display name. |
description |
Optional[str]
|
Optional detailed description. |
tags |
list[str]
|
Optional list of classification tags. |
ViewConfig
¶
Bases: BaseModel
Definition of a single catalog view.
A view can be defined in several ways:
1. Inline SQL: Using the sql field with raw SQL text
2. SQL File: Using sql_file to reference external SQL files
3. SQL Template: Using sql_template for parameterized SQL files
4. Data Source: Using source + required fields for direct data access
5. Source + SQL: Using source for data access plus sql for transformations
For data sources, the required fields depend on the source type:
- Parquet/Delta: uri field is required
- Iceberg: Either uri OR both catalog and table
- DuckDB/SQLite/Postgres: Both database and table are required
When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.
Additional metadata fields such as description and tags do not affect
SQL generation but are preserved for documentation and tooling.
Attributes:
| Name | Type | Description |
|---|---|---|
name |
str
|
Unique view name within the config. |
schema |
str
|
Optional schema name for organizing views in DuckDB schemas. |
sql |
Optional[str]
|
Raw SQL text defining the view body. |
sql_file |
Optional[SQLFileReference]
|
Direct reference to a SQL file. |
sql_template |
Optional[SQLFileReference]
|
Reference to a SQL template file with variable substitution. |
source |
Optional[EnvSource]
|
Source type (e.g. |
uri |
Optional[str]
|
URI for file- or table-based sources (Parquet/Delta/Iceberg). |
database |
Optional[str]
|
Attachment alias for attached-database sources. |
table |
Optional[str]
|
Table name (optionally schema-qualified) for attached sources. |
catalog |
Optional[str]
|
Iceberg catalog name for catalog-based Iceberg views. |
options |
dict[str, Any]
|
Source-specific options passed to scan functions. |
description |
Optional[str]
|
Optional human-readable description of the view. |
tags |
list[str]
|
Optional list of tags for classification. |
Import Resolution Models¶
duckalog.config.resolution.base
¶
Environment Processing Models¶
duckalog.config.resolution.env
¶
Environment processing helpers with request-scoped caching.
This module extracts .env discovery/merging from the legacy loader and exposes it behind the EnvProcessor protocol for dependency injection.
Error Handling¶
The new architecture provides enhanced error handling with specific exception types:
Configuration Errors¶
duckalog.errors
¶
Unified exception hierarchy for Duckalog.
This module provides a consistent exception hierarchy for all Duckalog errors, ensuring that all library exceptions derive from a common base class and can be caught uniformly by users.
CircularImportError
¶
ConfigError
¶
Bases: DuckalogError
Configuration-related errors.
This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.
Typical error conditions include:
- The config file does not exist or cannot be read.
- The file is not valid YAML/JSON.
- Required fields are missing or invalid.
- An environment variable placeholder cannot be resolved.
DuckalogError
¶
Bases: Exception
Base exception for all Duckalog errors.
This is the root of the exception hierarchy. All Duckalog-specific exceptions should inherit from this class to provide a unified error handling interface for users of the library.
Users can catch this class to handle any Duckalog-specific error, or catch more specific subclasses for targeted error handling.
DuplicateNameError
¶
EngineError
¶
Bases: DuckalogError
Engine-level error raised during catalog builds.
This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.
ImportError
¶
ImportFileNotFoundError
¶
ImportValidationError
¶
PathResolutionError
¶
Bases: ConfigError
Raised when path resolution fails due to security or access issues.
This exception is a subclass of ConfigError since path resolution issues are typically configuration-related problems.
RemoteConfigError
¶
Bases: ConfigError
Error raised when remote configuration loading fails.
This exception wraps lower-level errors from remote storage systems when attempting to load configuration files from locations like S3, GCS, Azure Blob Storage, or HTTP endpoints.
SQLFileEncodingError
¶
SQLFileError
¶
Bases: ConfigError
Base exception for SQL file-related errors.
This exception is raised when SQL file operations fail, such as when a referenced SQL file cannot be found, read, or processed.
SQLFileNotFoundError
¶
SQLFilePermissionError
¶
SQLFileSizeError
¶
SQLTemplateError
¶
Import Resolution Errors¶
duckalog.config.resolution.imports
¶
Import resolution extracted from the legacy loader with DI hooks.
RequestContext
dataclass
¶
Context for request-scoped caching and state management.
Performance Considerations¶
Caching Benefits¶
- Request-Scoped Caching: Shared resolution across multiple config loads
- Import Resolution Caching: Avoid re-processing the same imports
- Environment Variable Caching: Reuse resolved environment values
- Path Resolution Caching: Cache normalized paths for reuse
Memory Management¶
- Automatic Cache Cleanup: Request scopes automatically clear caches
- Import Chain Tracking: Prevent circular dependencies with minimal overhead
- Efficient Context Management: Lightweight context objects for tracking
Security Features¶
Enhanced Path Security¶
The new architecture maintains and enhances the existing security features:
- Relative Path Resolution: All paths resolved relative to config location
- Security Boundary Validation: Prevent directory traversal attacks
- Cross-Platform Support: Consistent behavior across operating systems
- Remote URI Support: Secure handling of remote configuration sources
Dependency Injection Security¶
- Controlled Interface: Safe injection points for custom implementations
- Validation Integration: Custom resolvers participate in validation
- Context Isolation: Each request maintains isolated state
- Audit Trail: Track configuration resolution through contexts
Testing Support¶
Mockable Interfaces¶
All new interfaces are designed for easy testing:
from unittest.mock import Mock
from duckalog.config.resolution.base import ImportResolver
# Create mock resolver for testing
mock_resolver = Mock(spec=ImportResolver)
mock_resolver.resolve.return_value = test_config
# Test with mocked dependencies
with request_cache_scope() as context:
# Use mock resolver in tests
pass
Test Utilities¶
# Test with temporary configurations
from tempfile import NamedTemporaryFile
from duckalog.config.api import load_config
with NamedTemporaryFile(mode='w', suffix='.yaml') as f:
f.write(test_config_content)
f.flush()
config = load_config(f.name)
Extensibility Points¶
Custom Import Resolvers¶
Extend the import resolution system:
class DatabaseImportResolver(ImportResolver):
"""Load imports from database instead of files."""
def resolve(self, config_data: dict, context: ImportContext) -> dict:
# Implement database-based import loading
pass
Custom Environment Processors¶
Add custom environment handling:
class VaultEnvProcessor(EnvProcessor):
"""Load secrets from vault systems."""
def process(self, config_data: dict, load_dotenv: bool = True) -> dict:
# Implement vault integration
pass
Custom Filesystem Integration¶
Add support for new protocols: