Skip to content

How to Migrate from Manual SQL

Migrate existing manual SQL workflows to Duckalog configuration-driven approach while preserving business logic and ensuring consistency.

Problem

You have existing manual SQL workflows (CREATE VIEW statements, SQL files, database scripts) and want to migrate to Duckalog's declarative configuration approach.

Prerequisites

  • Existing SQL knowledge and workflows
  • Understanding of your current data architecture
  • Access to current SQL files and database schemas
  • Basic Duckalog configuration knowledge

Solution

1. Inventory Existing SQL Assets

Catalog your current SQL resources:

# Find all SQL files
find . -name "*.sql" -type f > sql_inventory.txt

# Analyze existing database
duckdb current_database.duckdb -c "
SELECT 
    table_name,
    sql as definition
FROM information_schema.views 
WHERE table_schema = 'main'
ORDER BY table_name;
" > existing_views.sql

# Check for stored procedures
duckdb current_database.duckdb -c "
SELECT 
    routine_name,
    routine_definition
FROM information_schema.routines 
WHERE routine_schema = 'main';
" > existing_routines.sql

2. Create Base Configuration

Start with a minimal Duckalog configuration:

# migration/base.yaml
version: 1

duckdb:
  database: migrated_analytics.duckdb
  # Preserve existing database settings
  pragmas:
    - "SET memory_limit='4GB'"
    - "SET threads=4"

# Start with empty views - will add gradually
views: []

3. Migrate Simple Views First

Convert straightforward views to Duckalog format:

Original SQL

-- existing_views.sql
CREATE VIEW active_users AS
SELECT 
    user_id,
    username,
    email,
    created_at
FROM users 
WHERE status = 'active';

Duckalog Configuration

# migration/views/users.yaml
version: 1
imports:
  - ../base.yaml

views:
  - name: active_users
    sql: |
      SELECT 
          user_id,
          username,
          email,
          created_at
      FROM users 
      WHERE status = 'active'

4. Migrate Complex Views

Handle views with joins, subqueries, and complex logic:

Original Complex SQL

-- existing_analytics.sql
CREATE VIEW user_analytics AS
SELECT 
    u.user_id,
    u.username,
    COUNT(o.order_id) as order_count,
    SUM(o.amount) as total_spent,
    u.created_at
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.created_at >= '2023-01-01'
GROUP BY u.user_id, u.username, u.created_at;

Duckalog Configuration

# migration/views/analytics.yaml
version: 1
imports:
  - ../base.yaml

views:
  - name: user_analytics
    sql: |
      SELECT 
          u.user_id,
          u.username,
          COUNT(o.order_id) as order_count,
          SUM(o.amount) as total_spent,
          u.created_at
      FROM users u
      LEFT JOIN orders o ON u.user_id = o.user_id
      WHERE u.created_at >= '2023-01-01'
      GROUP BY u.user_id, u.username, u.created_at

5. Migrate Views with External Data Sources

Convert views that reference external files and databases:

Original SQL with External Files

-- existing_external.sql
CREATE VIEW daily_events AS
SELECT * FROM read_parquet('/data/events/' || DATE_FORMAT(CURRENT_DATE, '%Y-%m-%d') || '.parquet');

Duckalog Configuration

# migration/views/external.yaml
version: 1
imports:
  - ../base.yaml

views:
  - name: daily_events
    source: parquet
    uri: "data/events/daily-*.parquet"
    sql: |
      SELECT * FROM '{{daily_events}}' 
      WHERE event_date = CURRENT_DATE

6. Migrate Database Attachments

Convert multi-database queries to Duckalog attachments:

Original Multi-Database SQL

-- existing_multi_db.sql
CREATE VIEW user_orders AS
SELECT 
    u.user_id,
    u.username,
    o.order_id,
    o.amount,
    o.order_date
FROM main.users u
JOIN legacy.orders o ON u.user_id = o.user_id;

Duckalog Configuration

# migration/attachments.yaml
version: 1
imports:
  - ../base.yaml

duckdb:
  database: migrated_analytics.duckdb

attachments:
  duckdb:
    - alias: legacy
      path: "./databases/legacy.duckdb"
      read_only: true

views:
  - name: user_orders
    sql: |
      SELECT 
          u.user_id,
          u.username,
          o.order_id,
          o.amount,
          o.order_date
      FROM main.users u
      JOIN legacy.orders o ON u.user_id = o.user_id

7. Organize Migrated Configuration

Structure your migrated configuration for maintainability:

migration/
├── base.yaml              # Common settings
├── attachments.yaml         # Database attachments
├── views/
│   ├── users.yaml          # User-related views
│   ├── analytics.yaml      # Analytics views
│   ├── external.yaml       # External data views
│   └── reports.yaml        # Reporting views
└── main.yaml              # Main configuration
# migration/main.yaml
version: 1
imports:
  - ./base.yaml
  - ./attachments.yaml
  - ./views/users.yaml
  - ./views/analytics.yaml
  - ./views/external.yaml
  - ./views/reports.yaml

duckdb:
  database: migrated_analytics.duckdb

8. Incremental Migration Strategy

Migrate gradually to reduce risk:

Phase 1: Foundation (Week 1)

# Phase 1: Basic structure
- Set up base configuration
- Migrate 2-3 simple views
- Test basic functionality

Phase 2: Core Views (Week 2-3)

# Phase 2: Essential views
- Migrate user and order views
- Set up database attachments
- Test data consistency

Phase 3: Advanced Analytics (Week 4-5)

# Phase 3: Complex analytics
- Migrate analytics and reporting views
- Optimize performance
- Full integration testing

Phase 4: External Data (Week 6)

# Phase 4: External integration
- Migrate external data source views
- Set up automated data refresh
- Production deployment preparation

Verification

1. Compare Results

Validate that migrated views produce same results:

# Build original database
duckdb original.duckdb < original_setup.sql

# Build migrated database
duckalog run migration/main.yaml

# Compare view definitions
duckdb original.duckdb -c "SHOW ALL views;" > original_views.txt
duckdb migrated_analytics.duckdb -c "SHOW ALL views;" > migrated_views.txt

# Compare row counts
diff original_views.txt migrated_views.txt

# Test sample queries
duckdb original.duckdb -c "SELECT COUNT(*) FROM user_analytics LIMIT 5;"
duckdb migrated_analytics.duckdb -c "SELECT COUNT(*) FROM user_analytics LIMIT 5;"

2. Data Consistency Checks

Ensure data integrity after migration:

-- Check row counts match
SELECT 
    'user_analytics' as view_name,
    COUNT(*) as row_count
FROM user_analytics
UNION ALL
SELECT 
    'legacy_user_analytics' as view_name,
    COUNT(*) as row_count
FROM legacy_user_analytics;

3. Performance Validation

Compare query performance:

# Time query execution
time duckdb original.duckdb -c "SELECT * FROM user_analytics WHERE user_id = 12345;"
time duckdb migrated_analytics.duckdb -c "SELECT * FROM user_analytics WHERE user_id = 12345;"

# Check query plans
EXPLAIN SELECT * FROM user_analytics WHERE user_id = 12345;

Common Variations

1. Migrate from Stored Procedures

Convert stored procedures to views:

-- Original stored procedure
CREATE PROCEDURE get_user_orders(user_id_param INTEGER)
BEGIN
    SELECT * FROM orders WHERE user_id = user_id_param;
END;
# Migrated as parameterized view
views:
  - name: user_orders
    sql: |
      SELECT * FROM orders 
      WHERE user_id = '{{user_id_param}}'

2. Migrate Dynamic SQL

Handle dynamic SQL generation:

-- Original dynamic SQL
DO $$
BEGIN
    EXECUTE 'CREATE VIEW temp_' || $1 || ' AS SELECT * FROM ' || $1;
END $$;
# Migrated as multiple views
views:
  - name: temp_users
    sql: "SELECT * FROM users"
  - name: temp_orders
    sql: "SELECT * FROM orders"
  - name: temp_products
    sql: "SELECT * FROM products"

3. Migrate ETL Workflows

Convert ETL processes to Duckalog views:

# Original ETL: Extract-Transform-Load
# Migrated: Direct views over transformed data
views:
  - name: cleaned_users
    source: parquet
    uri: "data/processed/cleaned_users.parquet"
  - name: user_metrics
    sql: |
      SELECT 
          user_segment,
          COUNT(*) as user_count,
          AVG(order_value) as avg_order_value
      FROM cleaned_users
      GROUP BY user_segment

Troubleshooting

SQL Syntax Differences

Issue: DuckDB SQL syntax differs from your database

Solution:

-- Check DuckDB syntax
duckdb -c "YOUR_SQL_HERE"

-- Common differences:
-- PostgreSQL: || → DuckDB: ||
-- MySQL: IFNULL → DuckDB: COALESCE
-- SQL Server: GETDATE → DuckDB: CURRENT_DATE

Data Type Mismatches

Issue: Data types not compatible with DuckDB

Solution:

-- Cast to compatible types
SELECT 
    CAST(text_column AS INTEGER) as numeric_value,
    COALESCE(nullable_column, 'default') as safe_column
FROM source_table;

Performance Issues

Issue: Migrated views are slower than original

Solution:

# Optimize DuckDB settings
duckdb:
  database: migrated_analytics.duckdb
  pragmas:
    - "SET memory_limit='8GB'"
    - "SET threads=8"
    - "SET enable_optimizer=true"

Missing Dependencies

Issue: Views depend on objects that don't exist

Solution:

# Ensure proper order with imports
# main.yaml
imports:
  - ./base.yaml
  - ./attachments.yaml    # Attachments first
  - ./views/tables.yaml   # Table views second
  - ./views/analytics.yaml # Analytics views last (depend on tables)

Best Practices

1. Migration Strategy

  • Start small with simple, non-critical views
  • Test thoroughly at each migration phase
  • Maintain parallel systems during transition
  • Document differences between old and new systems

2. Configuration Organization

  • Use imports to modularize configuration
  • Group related views in separate files
  • Maintain base configuration with common settings
  • Follow naming conventions for consistency

3. Validation Approach

  • Automate comparisons between old and new systems
  • Use same test data for both systems
  • Monitor performance throughout migration
  • Get stakeholder sign-off at each phase

4. Risk Mitigation

  • Backup original systems before migration
  • Create rollback procedures for each phase
  • Test with production data in staging environment
  • Plan for cutback if issues arise

Next Steps

After successful migration:

  • Decommission old SQL workflows
  • Train team on Duckalog configuration
  • Establish new deployment procedures
  • Monitor performance and optimize further
  • Document lessons learned for future migrations

You now have a systematic approach to migrate from manual SQL workflows to Duckalog's declarative configuration while preserving business logic and improving maintainability.