Concepts
Understand the core concepts and architecture of DuckDB WASM Adapter.
Overview
DuckDB WASM Adapter bridges the gap between DuckDB's powerful analytical capabilities and modern frontend frameworks. This guide explains the fundamental concepts that make this possible.
Architecture
High-Level Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Framework │ │ Adapter │ │ DuckDB WASM │
│ Components │◄──►│ Layer │◄──►│ Engine │
│ │ │ │ │ │
│ React/Vue/ │ │ • Hooks/ │ │ • SQL Engine │
│ Svelte Apps │ │ Composables │ │ • WASM Runtime │
│ │ │ • State Mgmt │ │ • Web Workers │
│ │ │ • Caching │ │ • Memory Mgmt │
└─────────────────┘ └──────────────────┘ └─────────────────┘Component Layers
- Framework Layer: React hooks, Vue composables, Svelte stores
- Adapter Layer: Shared logic, caching, state management
- Core Layer: Connection management, query execution
- DuckDB Layer: SQL engine, WASM runtime, data processing
Core Concepts
1. In-Browser Database
Unlike traditional web applications that query remote databases, DuckDB WASM Adapter runs a complete analytical database in your browser:
Traditional Architecture:
Browser ──HTTP──► Server ──SQL──► DatabaseDuckDB WASM Architecture:
Browser Components ──SQL──► DuckDB WASM EngineBenefits:
- No network latency
- Works offline
- Privacy (data never leaves browser)
- Scales with client resources
- No server infrastructure needed
2. Reactive Data Flow
The adapter provides reactive interfaces that automatically update when data changes:
// Data flows reactively from database to UI
const users = useQuery('SELECT * FROM users');
// users.data updates automatically when the query result changesReact Flow:
State Change → Hook Re-render → Component UpdateVue Flow:
Ref Change → Computed Re-evaluation → Template UpdateSvelte Flow:
Store Update → Reactive Statement → DOM Update3. Connection Management
Connections are managed automatically with sensible defaults:
// Automatic connection management
<DuckDBProvider autoConnect>
<App />
</DuckDBProvider>Connection Lifecycle:
- Initialization: WASM module loading
- Connection: Database instance creation
- Ready State: Available for queries
- Cleanup: Automatic resource management
4. Query Execution
Queries are executed asynchronously with automatic state management:
const { data, loading, error } = useQuery('SELECT * FROM users');Execution Flow:
- Query validation
- Cache check (if enabled)
- DuckDB execution
- Result processing
- State update
- Cache storage (if applicable)
5. Data Import/Export
Seamless data exchange with various formats:
// Import from files
await importCSV(file, 'table_name');
await importJSON(data, 'table_name');
// Export to formats
const csv = await exportCSV('SELECT * FROM table_name');
const json = await exportJSON('SELECT * FROM table_name');Supported Formats:
- CSV (Comma-separated values)
- JSON (JavaScript Object Notation)
- Parquet (Columnar storage)
- Arrow (In-memory columnar)
Advanced Concepts
1. Web Workers
DuckDB WASM runs in a Web Worker to avoid blocking the main thread:
// Worker usage (enabled by default)
const connection = await createConnection({
worker: true // Recommended for production
});Benefits:
- Non-blocking UI
- Better performance
- Parallel processing
- Memory isolation
Considerations:
- Data serialization overhead
- Debugging complexity
- Browser compatibility
2. Memory Management
DuckDB WASM manages memory efficiently within browser constraints:
// Memory is managed automatically
// But you can influence it through configuration
const connection = await createConnection({
cache: {
maxSize: 50 * 1024 * 1024 // 50MB cache limit
}
});Memory Usage:
- WASM heap for DuckDB
- JavaScript objects for results
- Cache storage
- Temporary data during operations
3. Caching Strategy
Intelligent caching improves performance for repeated queries:
const config = {
cache: {
enabled: true,
evictionStrategy: 'lru', // Least Recently Used
ttl: 300000, // 5 minutes
maxEntries: 100
}
};Cache Types:
- LRU: Least Recently Used
- LFU: Least Frequently Used
- FIFO: First In, First Out
- TTL: Time To Live
4. Query Builder
Type-safe query construction with fluent API:
import { select } from '@northprint/duckdb-wasm-adapter-core';
const query = select('name', 'email')
.from('users')
.where('active', '=', true)
.orderBy('name')
.limit(10)
.build();Benefits:
- SQL injection prevention
- Type safety
- IDE autocomplete
- Reusable patterns
5. Error Handling
Comprehensive error management with specific error types:
try {
await connection.execute('SELECT * FROM non_existent_table');
} catch (error) {
if (error instanceof DuckDBError) {
switch (error.code) {
case 'QUERY_FAILED':
// Handle query errors
break;
case 'CONNECTION_FAILED':
// Handle connection errors
break;
}
}
}Data Models
1. Tables and Schemas
DuckDB supports standard SQL table structures:
-- Create schema
CREATE SCHEMA analytics;
-- Create table with types
CREATE TABLE analytics.users (
id INTEGER PRIMARY KEY,
name VARCHAR NOT NULL,
email VARCHAR UNIQUE,
metadata JSON,
created_at TIMESTAMP DEFAULT NOW()
);2. Data Types
DuckDB supports rich data types:
- Numeric: INTEGER, DECIMAL, DOUBLE
- Text: VARCHAR, TEXT
- Temporal: DATE, TIME, TIMESTAMP
- Structured: JSON, ARRAY, MAP, STRUCT
- Binary: BLOB
- Boolean: BOOLEAN
3. Temporary Data
Perfect for analytics workflows:
-- Temporary tables for processing
CREATE TEMP TABLE processing_data AS
SELECT * FROM raw_data WHERE condition = true;
-- Common table expressions
WITH filtered_users AS (
SELECT * FROM users WHERE active = true
)
SELECT department, COUNT(*)
FROM filtered_users
GROUP BY department;Performance Concepts
1. Query Optimization
DuckDB includes a sophisticated query optimizer:
-- Automatically optimized
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5;Optimizations:
- Predicate pushdown
- Join reordering
- Index usage
- Parallel execution
2. Columnar Storage
Efficient for analytical queries:
// Columnar format is optimal for aggregations
const stats = await connection.execute(`
SELECT
department,
AVG(salary) as avg_salary,
COUNT(*) as employee_count
FROM employees
GROUP BY department
`);3. Vectorized Execution
SIMD operations for better performance:
// Vectorized operations on large datasets
const analysis = await connection.execute(`
SELECT
EXTRACT(month FROM date) as month,
SUM(revenue) as total_revenue
FROM sales
WHERE date >= '2024-01-01'
GROUP BY month
ORDER BY month
`);Framework Integration Patterns
React Patterns
// Context for global state
const DuckDBContext = React.createContext();
// Custom hooks for reusable logic
function useUsers() {
return useQuery('SELECT * FROM users');
}
// Higher-order components for data fetching
function withData(Component, query) {
return function WrappedComponent(props) {
const result = useQuery(query);
return <Component {...props} {...result} />;
};
}Vue Patterns
// Plugin for global registration
app.use(DuckDBPlugin);
// Composables for reusable logic
function useUsers() {
return useQuery('SELECT * FROM users');
}
// Provide/inject for dependency injection
provide('duckdb', connection);
const connection = inject('duckdb');Svelte Patterns
// Stores for global state
export const users = writable([]);
// Actions for reusable functionality
export function duckdbAction(node, options) {
// Setup DuckDB functionality
return {
update(newOptions) {},
destroy() {}
};
}
// Context for component communication
setContext('duckdb', connection);
const connection = getContext('duckdb');Security Considerations
1. Client-Side Only
Data processing happens entirely in the browser:
// Data never leaves the browser
const sensitiveData = [
{ ssn: '123-45-6789', name: 'Alice' }
];
await importJSON(sensitiveData, 'private_data');2. SQL Injection Prevention
Always use parameterized queries:
// ✅ Safe: Parameterized query
const users = await connection.execute(
'SELECT * FROM users WHERE id = ?',
[userId]
);
// ❌ Dangerous: String concatenation
const users = await connection.execute(
`SELECT * FROM users WHERE id = ${userId}`
);3. Memory Limits
Browser memory constraints provide natural security boundaries:
// Automatic memory management within browser limits
const config = {
cache: {
maxSize: 100 * 1024 * 1024 // 100MB limit
}
};Best Practices
1. Design for Analytics
DuckDB excels at analytical workloads:
-- ✅ Good: Analytical queries
SELECT
DATE_TRUNC('month', created_at) as month,
department,
COUNT(*) as hires,
AVG(salary) as avg_salary
FROM employees
GROUP BY month, department
ORDER BY month, department;
-- ❌ Less optimal: OLTP-style queries
SELECT * FROM employees WHERE id = 123;2. Batch Operations
Process data in batches for better performance:
// ✅ Good: Batch insert
const batchSize = 1000;
for (let i = 0; i < largeDataset.length; i += batchSize) {
const batch = largeDataset.slice(i, i + batchSize);
await importJSON(batch, 'large_table');
}
// ❌ Inefficient: Row-by-row
for (const row of largeDataset) {
await connection.execute('INSERT INTO table VALUES (?)', [row]);
}3. Use Appropriate Data Types
Choose efficient data types:
-- ✅ Good: Appropriate types
CREATE TABLE events (
id INTEGER,
timestamp TIMESTAMP,
user_id INTEGER,
event_type VARCHAR(50),
properties JSON
);
-- ❌ Inefficient: Everything as TEXT
CREATE TABLE events (
id TEXT,
timestamp TEXT,
user_id TEXT,
event_type TEXT,
properties TEXT
);Common Patterns
1. ETL Workflows
Extract, Transform, Load in the browser:
// Extract
const data = await fetch('/api/raw-data').then(r => r.json());
// Transform
await importJSON(data, 'raw_data');
await connection.execute(`
CREATE TABLE clean_data AS
SELECT
id,
UPPER(name) as name,
CAST(score AS DECIMAL) as score
FROM raw_data
WHERE score IS NOT NULL
`);
// Load (export or use)
const cleanData = await exportJSON('SELECT * FROM clean_data');2. Real-time Dashboards
Combine reactive queries with real-time data:
// Real-time metrics
const metrics = useQuery(
'SELECT COUNT(*) as total, AVG(value) as average FROM events',
undefined,
{ refetchInterval: 5000 } // Update every 5 seconds
);
// Historical trends
const trends = useQuery(`
SELECT
DATE_TRUNC('hour', timestamp) as hour,
COUNT(*) as event_count
FROM events
WHERE timestamp >= NOW() - INTERVAL 24 HOURS
GROUP BY hour
ORDER BY hour
`);3. Interactive Analytics
Build interactive data exploration tools:
function DataExplorer() {
const [filters, setFilters] = useState({});
const [groupBy, setGroupBy] = useState('department');
const query = useMemo(() => {
return buildDynamicQuery(filters, groupBy);
}, [filters, groupBy]);
const { data } = useQuery(query);
return (
<div>
<FilterPanel filters={filters} onChange={setFilters} />
<GroupBySelector value={groupBy} onChange={setGroupBy} />
<ResultsChart data={data} />
</div>
);
}Debugging and Development
1. Debug Mode
Enable detailed logging during development:
const connection = await createConnection({
debug: {
enabled: true,
logQueries: true,
logTiming: true,
slowQueryThreshold: 100
}
});2. Query Profiling
Analyze query performance:
// Profile a complex query
const result = await connection.execute(`
EXPLAIN ANALYZE
SELECT department, COUNT(*), AVG(salary)
FROM employees
GROUP BY department
`);3. Error Handling
Implement comprehensive error handling:
function QueryComponent() {
const { data, loading, error, refetch } = useQuery(complexQuery);
if (error) {
return (
<div>
<h3>Query Failed</h3>
<pre>{error.message}</pre>
<button onClick={refetch}>Retry</button>
</div>
);
}
return <div>{/* Render data */}</div>;
}Understanding these concepts will help you build efficient, maintainable applications with DuckDB WASM Adapter. The reactive patterns, combined with DuckDB's analytical power, enable sophisticated data applications entirely in the browser.