API Reference¶
This section documents the core API functions and classes in Duckalog, with a focus on path resolution features.
Core Configuration API¶
load_config()¶
Load, interpolate, and validate a Duckalog configuration file with optional path resolution.
from duckalog import load_config
# Load with path resolution enabled (default)
config = load_config("catalog.yaml")
# Load with explicit path resolution control
config = load_config("catalog.yaml", resolve_paths=True)
config = load_config("catalog.yaml", resolve_paths=False)
# Load without processing SQL files (for validation)
config = load_config("catalog.yaml", load_sql_files=False)
Parameters:
- path (str): Path to YAML or JSON configuration file
- load_sql_files (bool, optional): Whether to load SQL from external files. Default: True
- sql_file_loader (SQLFileLoader, optional): Custom SQL file loader instance
- resolve_paths (bool, optional): Whether to resolve relative paths. Default: True
Returns:
- Config: Validated configuration object
Raises:
- ConfigError: Configuration parsing or validation errors (inherits from DuckalogError)
Path Resolution API¶
Path Detection Functions¶
is_relative_path()¶
Detect if a path is relative based on platform-specific rules.
from duckalog.path_resolution import is_relative_path
# Returns True for relative paths
assert is_relative_path("data/file.parquet")
assert is_relative_path("../shared/data.parquet")
# Returns False for absolute paths and URIs
assert not is_relative_path("/absolute/path/file.parquet")
assert not is_relative_path("s3://bucket/file.parquet")
Parameters:
- path (str): Path string to check
Returns:
- bool: True if path is relative, False otherwise
detect_path_type()¶
Categorize paths by type for processing.
from duckalog.path_resolution import detect_path_type
path_type = detect_path_type("../data/file.parquet") # Returns: "relative"
path_type = detect_path_type("/absolute/file.parquet") # Returns: "absolute"
path_type = detect_path_type("s3://bucket/file.parquet") # Returns: "remote"
Parameters:
- path (str): Path string to analyze
Returns:
- str: One of "relative", "absolute", "remote", "invalid"
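The categorization above can be sketched with standard-library tools alone. The helper below is a rough, POSIX-oriented stand-in (not Duckalog's actual implementation) that mirrors the documented return values:

```python
import os.path
from urllib.parse import urlparse

def classify_path(path: str) -> str:
    """Rough sketch of the documented path categories."""
    if not path:
        return "invalid"
    # Scheme-qualified URIs (s3://, http://, ...) count as remote.
    if urlparse(path).scheme not in ("", "file"):
        return "remote"
    return "absolute" if os.path.isabs(path) else "relative"
```

Note the scheme check runs first, so `s3://bucket/file.parquet` is never mistaken for a relative path.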
Path Resolution Functions¶
resolve_relative_path()¶
Resolve a relative path to an absolute path, interpreted relative to the configuration file's directory.
from duckalog.path_resolution import resolve_relative_path
from pathlib import Path
config_dir = Path("/project/config")
resolved = resolve_relative_path("data/file.parquet", config_dir)
# Returns: "/project/config/data/file.parquet"
Parameters:
- path (str): Path to resolve (relative or absolute)
- config_dir (Path): Directory containing the configuration file
Returns:
- str: Resolved absolute path
Raises:
- ValueError: If path resolution fails or violates security rules
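Setting the security checks aside, the resolution step behaves like joining against the config directory and normalizing. A minimal POSIX-style sketch of that behavior (not the library's code, which also enforces the security rules above):

```python
import os.path

def resolve_against(path: str, config_dir: str) -> str:
    # Absolute paths pass through unchanged; relative paths are
    # joined to the config directory and lexically normalized.
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(config_dir, path))
```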
Security Validation Functions¶
validate_path_security()¶
Validate that resolved paths don't violate security boundaries.
from duckalog.path_resolution import validate_path_security
from pathlib import Path
config_dir = Path("/project/config")
# Safe paths return True
assert validate_path_security("data/file.parquet", config_dir)
assert validate_path_security("../shared/data.parquet", config_dir)
# Dangerous paths return False
assert not validate_path_security("../../../etc/passwd", config_dir)
Parameters:
- path (str): Path to validate (will be resolved if relative)
- config_dir (Path): Directory containing the configuration file
Returns:
- bool: True if path is safe, False otherwise
validate_file_accessibility()¶
Validate that a file path is accessible and readable.
from duckalog.path_resolution import validate_file_accessibility
accessible, error = validate_file_accessibility("/path/to/file.parquet")
if accessible:
print("File is accessible")
else:
print(f"File access error: {error}")
Parameters:
- path (str): File path to validate
Returns:
- tuple[bool, str | None]: (is_accessible, error_message)
Path Normalization Functions¶
normalize_path_for_sql()¶
Normalize a path for use in SQL statements.
from duckalog.path_resolution import normalize_path_for_sql
sql_path = normalize_path_for_sql("/path/to/file.parquet")
# Returns: "'/path/to/file.parquet'"
sql_path = normalize_path_for_sql("/path/file's_data.parquet")
# Returns: "'/path/file''s_data.parquet'" (properly escaped)
Parameters:
- path (str): Absolute path to normalize
Returns:
- str: SQL-safe quoted path
Root-Based Path Security Functions¶
is_within_allowed_roots()¶
Check if a resolved path is within any of the allowed root directories using robust cross-platform validation.
from duckalog.path_resolution import is_within_allowed_roots
from pathlib import Path
config_dir = Path("/project/config")
allowed_roots = [config_dir]
# Check if a path is within allowed roots
is_safe = is_within_allowed_roots("/project/config/data/file.parquet", allowed_roots)
# Returns: True
is_unsafe = is_within_allowed_roots("/etc/passwd", allowed_roots)
# Returns: False
is_traversal_blocked = is_within_allowed_roots("../../../etc/passwd", allowed_roots)
# Returns: False (path traversal blocked)
Parameters:
- candidate_path (str): The path to check (will be resolved to absolute)
- allowed_roots (list[Path]): List of Path objects representing allowed root directories
Returns:
- bool: True if the candidate path is within at least one allowed root, False otherwise
Raises:
- ValueError: If the candidate path cannot be resolved (invalid path)
Security Features:
- Uses Path.resolve() to follow symlinks and get canonical paths
- Uses os.path.commonpath() for robust cross-platform path comparison
- Prevents all forms of path traversal attacks regardless of encoding or separators
- Handles Windows drive letters and UNC paths correctly
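The security features listed above combine canonicalization with a common-path containment test. A self-contained sketch of that pattern, using the same stdlib primitives the documentation names (illustrative only, not the library's source):

```python
import os.path

def within_roots(candidate: str, roots: list[str]) -> bool:
    # realpath canonicalizes and follows symlinks, like Path.resolve().
    resolved = os.path.realpath(candidate)
    for root in roots:
        canonical_root = os.path.realpath(root)
        try:
            # commonpath gives a robust containment test; a plain string
            # prefix check would wrongly accept /project/config2.
            if os.path.commonpath([resolved, canonical_root]) == canonical_root:
                return True
        except ValueError:
            # Raised for mixed drives on Windows; treat as outside the root.
            continue
    return False
```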
SQL Generation API¶
SQL Quoting Functions¶
quote_ident()¶
Quote a SQL identifier using double quotes with proper escaping.
from duckalog import quote_ident
# Quote simple identifiers
assert quote_ident("events") == '"events"'
# Quote identifiers with spaces
assert quote_ident("daily users") == '"daily users"'
# Quote identifiers with quotes (escapes embedded quotes)
assert quote_ident('user "events"') == '"user ""events"""'
Parameters:
- identifier (str): Identifier to quote
Returns:
- str: Identifier wrapped in double quotes with proper escaping
quote_literal()¶
Quote a SQL string literal using single quotes with proper escaping.
from duckalog import quote_literal
# Quote simple strings
assert quote_literal("user's data") == "'user''s data'"
# Quote file paths
assert quote_literal("path/to/file.parquet") == "'path/to/file.parquet'"
# Quote SQL statements
assert quote_literal("SELECT * FROM table") == "'SELECT * FROM table'"
# Quote empty strings
assert quote_literal("") == "''"
Parameters:
- value (str): String literal to quote
Returns:
- str: String wrapped in single quotes with proper escaping
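Both helpers follow the standard SQL escaping rule: double every embedded quote character, then wrap. A self-contained re-implementation of that rule (illustrative, not the library's source):

```python
def ident(name: str) -> str:
    # Identifiers: double any embedded double quote, wrap in double quotes.
    return '"' + name.replace('"', '""') + '"'

def literal(value: str) -> str:
    # String literals: double any embedded single quote, wrap in single quotes.
    return "'" + value.replace("'", "''") + "'"
```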
generate_view_sql()¶
Generate a CREATE OR REPLACE VIEW statement for a view configuration.
from duckalog import ViewConfig, generate_view_sql
view = ViewConfig(
name="test_view",
source="duckdb",
database="my_db",
table="users"
)
sql = generate_view_sql(view)
# Returns: CREATE OR REPLACE VIEW "test_view" AS SELECT * FROM "my_db"."users";
Parameters:
- view (ViewConfig): View configuration to generate SQL for
Returns:
- str: SQL statement creating the view with proper quoting
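For the attached-database case shown above, the generated statement is just the identifier-quoting rule applied to each name. A hypothetical sketch of that shape (the real function also handles file-based sources and inline SQL):

```python
def quote(name: str) -> str:
    # Same double-quote escaping rule as quote_ident().
    return '"' + name.replace('"', '""') + '"'

def attached_view_sql(view_name: str, database: str, table: str) -> str:
    # Shape: CREATE OR REPLACE VIEW "<name>" AS SELECT * FROM "<db>"."<table>";
    return (
        f"CREATE OR REPLACE VIEW {quote(view_name)} "
        f"AS SELECT * FROM {quote(database)}.{quote(table)};"
    )
```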
generate_secret_sql()¶
Generate CREATE SECRET statement for a secret configuration with proper escaping.
from duckalog import SecretConfig, generate_secret_sql
secret = SecretConfig(
type="s3",
name="prod_s3",
provider="config",
key_id="user's key", # Contains single quote
secret="secret'with'quotes"
)
sql = generate_secret_sql(secret)
# Returns: CREATE SECRET prod_s3 (TYPE S3, KEY_ID 'user''s key', SECRET 'secret''with''quotes')
Parameters:
- secret (SecretConfig): Secret configuration to generate SQL for
Returns:
- str: SQL statement creating the secret with proper escaping
Note: This function enforces strict type checking for secret options and will raise TypeError for unsupported types (non-bool, int, float, or str values).
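The strict type checking mentioned in the note can be pictured as a per-option renderer that accepts only bool, int, float, or str. This is an illustrative sketch; the rendered keyword style is an assumption, not the library's verified output:

```python
def render_secret_option(key: str, value: object) -> str:
    # bool must be checked before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return f"{key.upper()} {'true' if value else 'false'}"
    if isinstance(value, (int, float)):
        return f"{key.upper()} {value}"
    if isinstance(value, str):
        # Same single-quote escaping as quote_literal().
        return f"{key.upper()} '" + value.replace("'", "''") + "'"
    raise TypeError(f"Unsupported secret option type: {type(value).__name__}")
```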
Configuration Models¶
Config¶
Top-level configuration model with path resolution integration.
from duckalog import load_config
config = load_config("catalog.yaml")
# Access configuration properties
print(config.version)
print(len(config.views))
# Views have resolved paths when resolution is enabled
for view in config.views:
print(f"View: {view.name}, URI: {view.uri}")
ViewConfig¶
Model for individual view definitions that may contain resolved paths.
from duckalog.config import ViewConfig
# View with relative path (will be resolved during config loading)
view = ViewConfig(
name="events",
source="parquet",
uri="data/events.parquet" # Relative path
)
# View with absolute path (unchanged during resolution)
view = ViewConfig(
name="remote_data",
source="parquet",
uri="s3://bucket/events.parquet" # Remote URI - unchanged
)
AttachmentConfig¶
Models for database attachments with path resolution support.
from duckalog.config import DuckDBAttachment, SQLiteAttachment
# DuckDB attachment with relative path
duckdb_attach = DuckDBAttachment(
alias="reference_db",
path="./databases/reference.duckdb", # Will be resolved
read_only=True
)
# SQLite attachment with relative path
sqlite_attach = SQLiteAttachment(
alias="legacy_db",
path="../legacy/system.db" # Will be resolved
)
Exception Classes¶
Duckalog uses a consistent exception hierarchy based on DuckalogError as the base class for all library exceptions. This provides a unified error handling interface and makes it easy to catch all Duckalog-specific errors.
Base Exception¶
DuckalogError¶
Base exception for all Duckalog-specific errors. You can catch this class to handle any Duckalog-related error, or catch more specific subclasses for targeted error handling.
from duckalog import DuckalogError
try:
config = load_config("catalog.yaml")
build_catalog(config)
except DuckalogError as e:
# Handle any Duckalog-specific error
print(f"Duckalog error: {e}")
Configuration Errors¶
ConfigError¶
Raised for configuration-related errors including parsing, validation, and path resolution failures. Inherits from DuckalogError.
from duckalog import load_config, ConfigError
try:
config = load_config("catalog.yaml")
except ConfigError as e:
print(f"Configuration error: {e}")
Common scenarios:
- Missing required configuration fields
- Invalid YAML/JSON syntax
- Unresolved environment variable placeholders (${env:VAR})
- Invalid view definitions
PathResolutionError¶
Raised specifically when path resolution fails due to security or access issues. Inherits from ConfigError.
from duckalog import PathResolutionError, resolve_relative_path
from pathlib import Path
try:
resolved = resolve_relative_path("../../../etc/passwd", Path("/project/config"))
except PathResolutionError as e:
print(f"Path resolution failed: {e}")
print(f"Original path: {e.original_path}")
print(f"Resolved path: {e.resolved_path}")
RemoteConfigError¶
Raised when remote configuration loading fails. Inherits from ConfigError.
from duckalog import RemoteConfigError, load_config_from_uri
try:
config = load_config_from_uri("s3://bucket/config.yaml")
except RemoteConfigError as e:
print(f"Remote config error: {e}")
SQL File Errors¶
SQLFileError¶
Base exception for SQL file-related errors. Inherits from DuckalogError.
from duckalog import SQLFileError
try:
# Operations that load SQL from files
pass
except SQLFileError as e:
print(f"SQL file error: {e}")
Subclasses for specific SQL file errors:
- SQLFileNotFoundError: Referenced SQL file does not exist
- SQLFilePermissionError: SQL file cannot be read due to permissions
- SQLFileEncodingError: SQL file has invalid encoding
- SQLFileSizeError: SQL file exceeds size limits
- SQLTemplateError: Template processing fails
Engine Errors¶
EngineError¶
Raised during catalog builds when DuckDB operations fail. Inherits from DuckalogError.
from duckalog import EngineError, build_catalog
try:
build_catalog("catalog.yaml")
except EngineError as e:
print(f"Engine error: {e}")
Common scenarios:
- DuckDB connection failures
- SQL execution errors
- Attachment setup failures
- Extension installation/loading errors
- Secret creation failures
Exception Chaining¶
All Duckalog exceptions support proper exception chaining to preserve the original error context:
from duckalog import EngineError
try:
# Some operation that fails
raise ValueError("Original database connection failed")
except ValueError as exc:
raise EngineError("Failed to connect to DuckDB") from exc
This preserves the full traceback and makes debugging much easier.
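The same pattern can be exercised end to end with stand-in classes (hypothetical stand-ins here, mirroring the documented hierarchy):

```python
class DuckalogError(Exception):
    """Stand-in for the library's base exception."""

class EngineError(DuckalogError):
    """Stand-in for the engine-level error."""

def connect() -> None:
    try:
        raise ValueError("Original database connection failed")
    except ValueError as exc:
        # `raise ... from exc` links the original error as __cause__.
        raise EngineError("Failed to connect to DuckDB") from exc

try:
    connect()
except EngineError as err:
    assert isinstance(err.__cause__, ValueError)  # original context preserved
```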
Generated API Documentation¶
This section is generated from the Duckalog source code using mkdocstrings.
duckalog¶
Duckalog public API.
AttachmentsConfig¶
Bases: BaseModel
Collection of attachment configurations.
Attributes:
| Name | Type | Description |
|---|---|---|
| duckdb | list[DuckDBAttachment] | DuckDB attachment entries. |
| sqlite | list[SQLiteAttachment] | SQLite attachment entries. |
| postgres | list[PostgresAttachment] | Postgres attachment entries. |
| duckalog | list[DuckalogAttachment] | Duckalog config attachment entries. |
Config¶
Bases: BaseModel
Top-level Duckalog configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| version | int | Positive integer describing the config schema version. |
| duckdb | DuckDBConfig | DuckDB session and connection settings. |
| views | list[ViewConfig] | List of view definitions to create in the catalog. |
| attachments | AttachmentsConfig | Optional attachments to external databases. |
| iceberg_catalogs | list[IcebergCatalogConfig] | Optional Iceberg catalog definitions. |
| semantic_models | list[SemanticModelConfig] | Optional semantic model definitions for business metadata. |
ConfigError¶
Bases: DuckalogError
Configuration-related errors.
This exception is raised when a catalog configuration cannot be read, parsed, interpolated, or validated according to the Duckalog schema.
Typical error conditions include:
- The config file does not exist or cannot be read.
- The file is not valid YAML/JSON.
- Required fields are missing or invalid.
- An environment variable placeholder cannot be resolved.
DuckDBAttachment¶
Bases: BaseModel
Configuration for attaching another DuckDB database.
Attributes:
| Name | Type | Description |
|---|---|---|
| alias | str | Alias under which the database will be attached. |
| path | str | Filesystem path to the DuckDB database file. |
| read_only | bool | Whether the attachment should be opened in read-only mode. Defaults to … |
DuckDBConfig¶
Bases: BaseModel
DuckDB connection and session settings.
Attributes:
| Name | Type | Description |
|---|---|---|
| database | str | Path to the DuckDB database file. Defaults to … |
| install_extensions | list[str] | Names of extensions to install before use. |
| load_extensions | list[str] | Names of extensions to load in the session. |
| pragmas | list[str] | SQL statements (typically PRAGMA statements). |
| settings | Optional[Union[str, list[str]]] | DuckDB SET statements executed after pragmas. Can be a single string or list of strings. |
| secrets | list[SecretConfig] | List of secret definitions for external services and databases. |
EngineError¶
Bases: DuckalogError
Engine-level error raised during catalog builds.
This exception wraps lower-level DuckDB errors, such as failures to connect to the database, attach external systems, or execute generated SQL statements.
IcebergCatalogConfig¶
Bases: BaseModel
Configuration for an Iceberg catalog.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Catalog name referenced by Iceberg views. |
| catalog_type | str | Backend type (for example, …). |
| uri | Optional[str] | Optional URI used by certain catalog types. |
| warehouse | Optional[str] | Optional warehouse location for catalog data. |
| options | dict[str, Any] | Additional catalog-specific options. |
PostgresAttachment¶
Bases: BaseModel
Configuration for attaching a Postgres database.
Attributes:
| Name | Type | Description |
|---|---|---|
| alias | str | Alias used inside DuckDB to reference the Postgres database. |
| host | str | Hostname or IP address of the Postgres server. |
| port | int | TCP port of the Postgres server. |
| database | str | Database name to connect to. |
| user | str | Username for authentication. |
| password | str | Password for authentication. |
| sslmode | Optional[str] | Optional SSL mode (for example, …). |
| options | dict[str, Any] | Extra key/value options passed to the attachment clause. |
SQLiteAttachment¶
Bases: BaseModel
Configuration for attaching a SQLite database.
Attributes:
| Name | Type | Description |
|---|---|---|
| alias | str | Alias under which the SQLite database will be attached. |
| path | str | Filesystem path to the SQLite database file. |
SecretConfig¶
Bases: BaseModel
Configuration for a DuckDB secret.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | SecretType | Secret type (s3, azure, gcs, http, postgres, mysql). |
| name | Optional[str] | Optional name for the secret (defaults to type if not provided). |
| provider | SecretProvider | Secret provider (config or credential_chain). |
| persistent | bool | Whether to create a persistent secret. Defaults to False. |
| scope | Optional[str] | Optional scope prefix for the secret. |
| key_id | Optional[str] | Access key ID or username for authentication. |
| secret | Optional[str] | Secret key or password for authentication. |
| region | Optional[str] | Geographic region for cloud services. |
| endpoint | Optional[str] | Custom endpoint URL for cloud services. |
| connection_string | Optional[str] | Full connection string for databases. |
| tenant_id | Optional[str] | Azure tenant ID for authentication. |
| account_name | Optional[str] | Azure storage account name. |
| client_id | Optional[str] | Azure client ID for authentication. |
| client_secret | Optional[str] | Azure client secret for authentication. |
| service_account_key | Optional[str] | GCS service account key. |
| json_key | Optional[str] | GCS JSON key. |
| bearer_token | Optional[str] | HTTP bearer token for authentication. |
| header | Optional[str] | HTTP header for authentication. |
| database | Optional[str] | Database name for database secrets. |
| host | Optional[str] | Database host for database secrets. |
| port | Optional[int] | Database port for database secrets. |
| user | Optional[str] | Database username (alternative to key_id for database types). |
| password | Optional[str] | Database password (alternative to secret for database types). |
| options | dict[str, Any] | Additional key-value options for the secret. |
ViewConfig¶
Bases: BaseModel
Definition of a single catalog view.
A view can be defined in several ways:
1. Inline SQL: Using the sql field with raw SQL text
2. SQL File: Using sql_file to reference external SQL files
3. SQL Template: Using sql_template for parameterized SQL files
4. Data Source: Using source + required fields for direct data access
5. Source + SQL: Using source for data access plus sql for transformations
For data sources, the required fields depend on the source type:
- Parquet/Delta: uri field is required
- Iceberg: Either uri OR both catalog and table
- DuckDB/SQLite/Postgres: Both database and table are required
When using SQL with a data source, the SQL will be applied as a transformation over the data from the specified source.
Additional metadata fields such as description and tags do not affect
SQL generation but are preserved for documentation and tooling.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Unique view name within the config. |
| sql | Optional[str] | Raw SQL text defining the view body. |
| sql_file | Optional[SQLFileReference] | Direct reference to a SQL file. |
| sql_template | Optional[SQLFileReference] | Reference to a SQL template file with variable substitution. |
| source | Optional[EnvSource] | Source type (e.g. parquet, delta, iceberg, duckdb, sqlite, postgres). |
| uri | Optional[str] | URI for file- or table-based sources (Parquet/Delta/Iceberg). |
| database | Optional[str] | Attachment alias for attached-database sources. |
| table | Optional[str] | Table name (optionally schema-qualified) for attached sources. |
| catalog | Optional[str] | Iceberg catalog name for catalog-based Iceberg views. |
| options | dict[str, Any] | Source-specific options passed to scan functions. |
| description | Optional[str] | Optional human-readable description of the view. |
| tags | list[str] | Optional list of tags for classification. |
build_catalog(config_path, db_path=None, dry_run=False, verbose=False, filesystem=None, include_secrets=True)¶
Build or update a DuckDB catalog from a configuration file.
This function is the high-level entry point used by both the CLI and Python API. It loads the config, optionally performs a dry-run SQL generation, or otherwise connects to DuckDB, sets up attachments and Iceberg catalogs, and creates or replaces configured views.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | str | Path to the YAML/JSON configuration file. | required |
| db_path | str \| None | Optional override for the configured database path. | None |
| dry_run | bool | If True, return the generated SQL script instead of executing it. | False |
| verbose | bool | If True, enable verbose logging during the build. | False |
| filesystem | Any \| None | Optional pre-configured fsspec filesystem object for remote export authentication. If not provided, default authentication will be used. | None |
Returns:
| Type | Description |
|---|---|
| str \| None | The generated SQL script as a string when dry_run=True, otherwise None. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration file is invalid. |
| EngineError | If connecting to DuckDB or executing SQL fails, or if remote export fails. |
Example
Build a catalog in-place::
from duckalog import build_catalog
build_catalog("catalog.yaml")
Build and export to remote storage::
build_catalog("catalog.yaml", db_path="s3://my-bucket/catalog.duckdb")
Generate SQL without modifying the database::
sql = build_catalog("catalog.yaml", dry_run=True)
print(sql)
connect_and_build_catalog(config_path, database_path=None, dry_run=False, verbose=False, read_only=False, **kwargs)¶
Build a catalog and create a DuckDB connection in one operation.
This function combines catalog building with connection creation, providing a streamlined workflow for users who want to start working with their catalog immediately after creating it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | str | Path to the YAML/JSON configuration file. | required |
| database_path | str \| None | Optional database path override. | None |
| dry_run | bool | If True, only validates configuration and returns SQL. If False, builds the catalog and creates a connection. | False |
| verbose | bool | Enable verbose logging during build process. | False |
| read_only | bool | Open the resulting connection in read-only mode. | False |
| **kwargs | Any | Additional keyword arguments. | {} |
Returns:
| Type | Description |
|---|---|
| DuckDBPyConnection \| str \| None | A DuckDB connection object for immediate use, or SQL string when dry_run=True. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration file is invalid. |
| EngineError | If catalog building or connection fails. |
| FileNotFoundError | If the resulting database file doesn't exist (after build). |
Example
Build catalog and start querying immediately::
conn = connect_and_build_catalog("catalog.yaml")
data = conn.execute("SELECT * FROM important_table").fetchall()
print(f"Found {len(data)} records")
conn.close()
Validate only (dry run)::
sql = connect_and_build_catalog("catalog.yaml", dry_run=True)
print("SQL generation completed, no database created")
Custom database path::
conn = connect_and_build_catalog("catalog.yaml", database_path="analytics.db")
print("Connected to custom database: analytics.db")
If dry_run=True, the function only validates the configuration and returns the SQL script
without creating any database files or connections.
connect_to_catalog(config_path, database_path=None, read_only=False)¶
Connect to an existing DuckDB database created by Duckalog.
This function provides a direct connection to a DuckDB database that was previously created using Duckalog. It simplifies the workflow for users who just want to start querying their catalog database.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | str | Path to the YAML/JSON configuration file for an existing catalog. | required |
| database_path | str \| None | Optional database path override. If not provided, uses the path from the configuration. | None |
| read_only | bool | Open the connection in read-only mode for safety. | False |
Returns:
| Type | Description |
|---|---|
| DuckDBPyConnection | An active DuckDB connection object ready for query execution. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration file is invalid. |
| FileNotFoundError | If the specified database file doesn't exist. |
| Error | If connection or queries fail. |
Example
Connect to an existing catalog::
from duckalog import connect_to_catalog
conn = connect_to_catalog("catalog.yaml")
# Use the connection for queries
result = conn.execute("SELECT * FROM some_table").fetchall()
conn.close()
With path override::
conn = connect_to_catalog("catalog.yaml", database_path="analytics.db")
With read-only mode::
conn = connect_to_catalog("catalog.yaml", read_only=True)
With context manager::
from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
data = conn.execute("SELECT * FROM users").fetchall()
print(f"Found {len(data)} records")
# Connection automatically closed here
connect_to_catalog_cm(config_path, database_path=None, read_only=False)¶
Context manager version of connect_to_catalog for automatic connection cleanup.
Usage::
from duckalog import connect_to_catalog_cm
with connect_to_catalog_cm("catalog.yaml") as conn:
data = conn.execute("SELECT * FROM users").fetchall()
print(f"Found {len(data)} records")
# Connection is automatically closed here
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | str | Path to the YAML/JSON configuration file for an existing catalog. | required |
| database_path | str \| None | Optional database path override. | None |
| read_only | bool | Open the connection in read-only mode for safety. | False |
Yields:
| Type | Description |
|---|---|
| Generator[DuckDBPyConnection] | An active DuckDB connection that will be closed automatically. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration file is invalid. |
| FileNotFoundError | If the specified database file doesn't exist. |
| Error | If connection or queries fail. |
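A wrapper like this is typically a thin contextlib.contextmanager around the plain connect function. A generic sketch of the pattern, using a fake connection class invented purely for illustration:

```python
from contextlib import contextmanager
from typing import Callable, Iterator

class FakeConnection:
    """Minimal stand-in for a DuckDB connection."""
    def __init__(self) -> None:
        self.closed = False
    def close(self) -> None:
        self.closed = True

@contextmanager
def connection_cm(factory: Callable[[], FakeConnection]) -> Iterator[FakeConnection]:
    conn = factory()
    try:
        yield conn
    finally:
        # Guaranteed cleanup, even if the body raises.
        conn.close()

with connection_cm(FakeConnection) as conn:
    assert not conn.closed  # open inside the with-block
assert conn.closed  # closed automatically on exit
```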
create_config_template(format='yaml', output_path=None, database_name='analytics_catalog.duckdb', project_name='my_analytics_project')¶
Generate a basic, valid Duckalog configuration template.
This function creates a configuration template with sensible defaults and educational example content that demonstrates key Duckalog features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| format | ConfigFormat | Output format for the configuration ('yaml' or 'json'). Defaults to 'yaml'. | 'yaml' |
| output_path | str \| None | Optional path to write the configuration file. If provided, the template is written to this path and the content is also returned as a string. | None |
| database_name | str | Name for the DuckDB database file. | 'analytics_catalog.duckdb' |
| project_name | str | Name used in comments to personalize the template. | 'my_analytics_project' |
Returns:
| Type | Description |
|---|---|
| str | The generated configuration as a string. |
Raises:
| Type | Description |
|---|---|
| ValueError | If format is not 'yaml' or 'json'. |
| ConfigError | If the generated template fails validation. |
| OSError | If writing to output_path fails. |
Example
Generate a YAML template::
template = create_config_template(format='yaml')
print(template)
Generate and save a JSON template::
template = create_config_template(
format='json',
output_path='my_config.json'
)
generate_all_views_sql(config, include_secrets=False)¶
Generate SQL for all views in a configuration.
The output includes a descriptive header with the config version followed
by a CREATE OR REPLACE VIEW statement for each view in the order they
appear in the configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Config | The validated :class:Config instance. | required |
| include_secrets | bool | Whether to include CREATE SECRET statements for secrets. | False |
Returns:
| Type | Description |
|---|---|
| str | A multi-statement SQL script suitable for use as a catalog definition. |
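The described output shape — a header, then one statement per view in config order — can be pictured as simple concatenation. The header text below is invented for illustration; the real header wording is not documented here:

```python
def assemble_script(version: int, view_statements: list[str]) -> str:
    # Header first, then each CREATE OR REPLACE VIEW statement in config order.
    header = f"-- Duckalog catalog (config schema version {version})"
    return "\n\n".join([header, *view_statements])
```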
generate_secret_sql(secret)¶
Generate CREATE SECRET statement for a DuckDB secret.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| secret | SecretConfig | Secret configuration object. | required |
Returns:
| Type | Description |
|---|---|
| str | SQL CREATE SECRET statement. |
generate_sql(config_path)¶
Generate a full SQL script from a config file.
This is a convenience wrapper around :func:load_config and
:func:generate_all_views_sql that does not connect to DuckDB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | str | Path to the YAML/JSON configuration file. | required |
Returns:
| Type | Description |
|---|---|
| str | A multi-statement SQL script containing CREATE OR REPLACE VIEW statements for all configured views. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration file is invalid. |
Example
from duckalog import generate_sql
sql = generate_sql("catalog.yaml")
print("CREATE VIEW" in sql)  # True
generate_view_sql(view)¶
Generate a CREATE OR REPLACE VIEW statement for a single view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| view | ViewConfig | The :class:ViewConfig to generate SQL for. | required |
Returns:
| Type | Description |
|---|---|
| str | A single SQL statement that creates or replaces the view. |
load_config(path, load_sql_files=True, sql_file_loader=None, resolve_paths=True, filesystem=None)¶
Load, interpolate, and validate a Duckalog configuration file.
This helper is the main entry point for turning a YAML or JSON file into a
validated :class:Config instance. It applies environment-variable
interpolation and enforces the configuration schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to a YAML or JSON config file, or a remote URI. | required |
| load_sql_files | bool | Whether to load and process SQL from external files. If False, SQL file references are left as-is for later processing. | True |
| sql_file_loader | Optional[Any] | Optional SQLFileLoader instance for loading SQL files. If None, a default loader will be created. | None |
| resolve_paths | bool | Whether to resolve relative paths to absolute paths. If True, relative paths in view URIs and attachment paths will be resolved relative to the config file's directory. For remote configs, this defaults to False. | True |
| filesystem | Optional[Any] | Optional fsspec filesystem object to use for remote operations. If provided, this filesystem will be used instead of creating a new one based on URI scheme. Useful for custom authentication or advanced use cases. | None |
Returns:
| Type | Description |
|---|---|
| Any | A validated :class:Config instance. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the file cannot be read, is not valid YAML/JSON, fails schema validation, or contains unresolved ${env:VAR} placeholders. |
Example
Load a catalog from catalog.yaml::
from duckalog import load_config
config = load_config("catalog.yaml")
print(len(config.views))
Load a catalog from S3::
config = load_config("s3://my-bucket/configs/catalog.yaml")
print(len(config.views))
Load a catalog with custom filesystem::
import fsspec
fs = fsspec.filesystem("s3", key="key", secret="secret", anon=False)
config = load_config("s3://my-bucket/configs/catalog.yaml", filesystem=fs)
print(len(config.views))
quote_ident(value)¶
Quote a SQL identifier using double quotes.
This helper wraps a string in double quotes and escapes any embedded double quotes according to SQL rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | Identifier to quote (for example, a view or column name). | required |
Returns:
| Type | Description |
|---|---|
| str | The identifier wrapped in double quotes. |
Example
quote_ident("events")  # Returns: '"events"'
quote_literal(value)¶
Quote a SQL string literal using single quotes.
This helper wraps a string in single quotes and escapes any embedded single quotes according to SQL rules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | str | String literal to quote (for example, a file path, secret, or connection string). | required |
Returns:
| Type | Description |
|---|---|
| str | The string wrapped in single quotes with proper escaping. |
Example
quote_literal("path/to/file.parquet")  # Returns: "'path/to/file.parquet'"
quote_literal("user's data")  # Returns: "'user''s data'"
render_options(options)¶
Render a mapping of options into scan-function arguments.
The resulting string is suitable for appending to a *_scan function
call. Keys are sorted alphabetically to keep output deterministic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| options | dict[str, Any] | Mapping of option name to value (str, bool, int, or float). | required |
Returns:
| Type | Description |
|---|---|
| str | A string that starts with a comma separator followed by the rendered options, or an empty string if no options are provided. |
Raises:
| Type | Description |
|---|---|
| TypeError | If a value has a type that cannot be rendered safely. |
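Deterministic ordering plus per-type rendering can be sketched as follows. The exact quoting and separator conventions here are assumptions for illustration, not the library's verified output:

```python
from typing import Any

def render_scan_options(options: dict[str, Any]) -> str:
    parts = []
    for key in sorted(options):  # alphabetical keys keep output deterministic
        value = options[key]
        if isinstance(value, bool):  # check bool before int: bool subclasses int
            parts.append(f"{key}={'true' if value else 'false'}")
        elif isinstance(value, (int, float)):
            parts.append(f"{key}={value}")
        elif isinstance(value, str):
            parts.append(f"{key}='" + value.replace("'", "''") + "'")
        else:
            raise TypeError(f"Cannot render option {key!r} of type {type(value).__name__}")
    # Leading separator so the result can be appended directly to a *_scan call.
    return ", " + ", ".join(parts) if parts else ""
```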
validate_config(config_path)¶
Validate a configuration file without touching DuckDB.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config_path | str | Path to the YAML/JSON configuration file. | required |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration file is missing, malformed, or does not satisfy the schema and interpolation rules. |
Example
from duckalog import validate_config
validate_config("catalog.yaml")  # raises on invalid config
validate_generated_config(content, format='yaml')¶
Validate that generated configuration content can be loaded successfully.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| content | str | Configuration content as string. | required |
| format | ConfigFormat | Format of the content ('yaml' or 'json'). | 'yaml' |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration cannot be loaded or is invalid. |