Comparing Databases: ClickHouse vs DuckDB


By building a simple Go application that runs SQL queries.

Introduction

ClickHouse is a beast 🔥; companies like OpenAI, Netflix, and Cloudflare rely on it, but when does it actually make sense to run ClickHouse?

When people talk about analytics databases, the discussion often jumps straight to scale: billions of rows, clusters, replicas, and complex operations. But many real-world systems start much simpler. A service receiving a query and returning results computed over a small to medium dataset is often good enough—at least in the beginning.

In this blog, we’ll take a pragmatic look at ClickHouse and DuckDB through that lens. Why? Because while building my latest application, I used both, and quickly realized that choosing between them requires a clear understanding of where each one truly shines. We’ll build the mental model around a simple Go application that executes SQL queries over a small to medium analytical dataset, and then reason our way up to larger-scale systems. Along the way, we’ll compare ClickHouse Cloud, self-hosted ClickHouse, and DuckDB, using benchmarks, napkin math, and—most importantly—operational reality.

The goal is not to crown a single “winner,” but to understand when each database makes sense and how to evolve your architecture without getting stuck in a corner.

The Baseline: A Simple Go SQL Service

At its core, our application (see the GitHub repo) is intentionally boring:

  • A Go service that exposes a gRPC API

  • Accepts a SQL-like query

  • Reads from a small startup funding dataset.

  • Returns an aggregated result.

    • Example - Top Startups by City and Industry
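As a rough illustration of the "accepts a SQL-like query, returns an aggregate" step, here is a minimal sketch of how such a request could be turned into SQL. The table name `startup_funding`, the columns `city`, `industry`, `stage`, and `amount_usd`, and the helper `buildTopQuery` are all hypothetical; the real repo may structure this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedGroupCols whitelists the columns a caller may group by.
// ASSUMPTION: these column names are hypothetical, not taken from the dataset.
var allowedGroupCols = map[string]bool{"city": true, "industry": true, "stage": true}

// buildTopQuery turns a small, validated request into an aggregate SQL query.
// Whitelisting columns keeps arbitrary user input out of the SQL text.
func buildTopQuery(groupBy []string, limit int) (string, error) {
	if len(groupBy) == 0 {
		return "", fmt.Errorf("at least one group-by column is required")
	}
	if limit <= 0 || limit > 1000 {
		return "", fmt.Errorf("limit %d out of range (1..1000)", limit)
	}
	for _, c := range groupBy {
		if !allowedGroupCols[c] {
			return "", fmt.Errorf("column %q is not allowed", c)
		}
	}
	cols := strings.Join(groupBy, ", ")
	return fmt.Sprintf(
		"SELECT %s, count(*) AS startups, sum(amount_usd) AS total_funding "+
			"FROM startup_funding GROUP BY %s ORDER BY total_funding DESC LIMIT %d",
		cols, cols, limit), nil
}

func main() {
	q, err := buildTopQuery([]string{"city", "industry"}, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

The resulting string is plain SQL that either engine could run, which is what makes it possible to swap ClickHouse and DuckDB behind the same API.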

Before comparing them, here’s what ClickHouse and DuckDB have in common:

  • Columnar storage for efficient I/O and high compression.

  • A preference for bulk writes over many small inserts.

  • Vectorized execution to maximize CPU efficiency.

  • Strong analytical performance for large scans, aggregations, and complex filters, as in the query below.

SELECT country, count(*) 
FROM events 
WHERE ts >= now() - INTERVAL 7 DAY 
GROUP BY country;
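To see why a columnar layout helps a query like this, here is a toy sketch in Go. It is not how either engine is implemented; the `eventColumns` type and its `payload` column are made up for illustration. The point is that the filter and the group-by touch only two columns, no matter how wide the table is.

```go
package main

import "fmt"

// eventColumns is a toy column-oriented table: one slice per column,
// with row i spread across ts[i], country[i], and payload[i].
type eventColumns struct {
	ts      []int64  // event time as Unix seconds
	country []string // country code
	payload []string // hypothetical wide column the query never touches
}

// countByCountrySince mirrors the query above: filter on ts, group by country.
// Only the ts and country slices are read; payload is never scanned.
func countByCountrySince(t eventColumns, since int64) map[string]int {
	counts := make(map[string]int)
	for i, ts := range t.ts {
		if ts >= since {
			counts[t.country[i]]++
		}
	}
	return counts
}

func main() {
	events := eventColumns{
		ts:      []int64{100, 200, 300, 400},
		country: []string{"IN", "US", "IN", "DE"},
		payload: []string{"...", "...", "...", "..."},
	}
	fmt.Println(countByCountrySince(events, 200)) // map[DE:1 IN:1 US:1]
}
```

In a row-oriented layout, the same scan would have to read every field of every row, including `payload`.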

Let’s Start with ClickHouse

ClickHouse is a leading analytical database powered by its specialized MergeTree Engine. This core technology is specifically optimized to deliver high-speed performance for OLAP workloads.

ClickHouse Offerings

Let’s break down the main ways people actually run ClickHouse.

1. ClickHouse Cloud

ClickHouse Cloud offers a managed experience. This is by far the fastest way to get started with ClickHouse without thinking about operations.

Let’s do some rough math to make the trade-offs concrete.

Napkin Math

  • Smallest production tier

  • Always-on cluster

  • Managed storage and compute

Cost: $655/month for a high-availability (HA) cluster with 3 replicas, including daily backups (24-hour cycle) and dedicated resources of 2 vCPUs and 8 GiB RAM per instance.

Pros:

  • Minimal setup

  • No cluster management

  • Production-grade from day one

Cons:

  • Less control over internals

  • Overkill for tiny datasets


2. ClickHouse Self-Hosted (Docker / Kubernetes)

Even a single-node setup carries significant operational weight. On Kubernetes, management is best handled via the Altinity ClickHouse Operator, which automates deployment and configuration.

Napkin Math

Assume:

  • 3-node cluster

  • Each node: 2 vCPU, 8 GB RAM

  • Persistent volumes

  • Monitoring + backups

Monthly cost: $500-600/month + hidden ops cost (https://calculator.aws/#/estimate), covering:

  • Compute + storage

  • Engineering time to operate

Note: These are rough estimates; actual costs may vary.

Hidden cost:

  • On-call

  • Upgrades

  • Performance tuning

This only makes sense when:

  • Data size is large

  • Query concurrency is high

  • Cost efficiency at scale matters

Pros:

  • Lower cost at scale

Cons:

  • Operational overhead: scaling requires manual shard rebalancing.

  • Requires a deep understanding of ClickHouse internals

  • Non-trivial upgrades and tuning.

At this point—especially for a small dataset—the ops cost can outweigh the performance benefit.
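To put a number on that hidden ops cost, here is a small napkin-math sketch. It uses the $655/month Cloud figure and the midpoint of the $500-600 self-hosted estimate from above; the $75/hour engineering rate is purely an assumption for illustration, so plug in your own.

```go
package main

import "fmt"

// breakEvenOpsHours returns how many engineering hours per month self-hosting
// can absorb before it costs more than the managed offering.
func breakEvenOpsHours(managedUSD, selfHostedInfraUSD, hourlyRateUSD float64) float64 {
	return (managedUSD - selfHostedInfraUSD) / hourlyRateUSD
}

func main() {
	const (
		cloudUSD      = 655.0 // ClickHouse Cloud figure quoted above
		selfHostedUSD = 550.0 // midpoint of the $500-600 estimate above
		hourlyRateUSD = 75.0  // ASSUMPTION: hypothetical loaded engineering rate
	)
	hours := breakEvenOpsHours(cloudUSD, selfHostedUSD, hourlyRateUSD)
	fmt.Printf("self-hosting breaks even at %.1f ops hours/month\n", hours)
}
```

Under these assumptions, the infrastructure saving is gone after roughly an hour and a half of on-call, upgrade, or tuning work per month.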


Benchmark Context: ClickBench

ClickBench is a well-known benchmark for analytical databases. Across multiple runs and environments, ClickHouse consistently ranks at or near the top in:

  • Query latency

  • Throughput

  • Aggregation-heavy workloads

DuckDB also performs surprisingly well, often:

  • Matching or exceeding ClickHouse on single-node workloads

  • Falling behind primarily when concurrency and scale increase

The complete live benchmark results are available on the ClickBench website.

While chDB (ClickHouse’s in-process engine) offers impressive analytical speed, DuckDB currently remains the more mature and seamless embedded choice, with highly competitive performance.


Option 3:

Welcome, DuckDB 🦆

DuckDB is an embedded analytical database. Instead of running as a separate service, it runs inside your application.

Key properties:

  • No server to manage.

  • Executes SQL directly on popular data formats like CSV and Parquet

  • Allocates memory dynamically via its buffer manager

  • Runs directly in your browser via shell.duckdb.org.

In our Go app, this means:

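// Needs a database/sql driver registered as "duckdb", e.g. github.com/marcboeker/go-duckdb.
db, err := sql.Open("duckdb", "") // empty DSN opens an in-memory database
if err != nil { log.Fatal(err) }
rows, err := db.Query("SELECT ... FROM 'startup_funding.csv'")
if err != nil { log.Fatal(err) }
defer rows.Close()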

That’s it.

No tables. No ingestion. No storage planning.


Why DuckDB Feels Operationally Magical:

Compared to ClickHouse, DuckDB offers:

  • Minimal deployment surface
    There is no separate database service to run — DuckDB lives inside the application process.
    That said, this shifts responsibility to the app layer: memory limits, CPU contention, and resource sizing become application concerns, which can surface as challenges at scale.

  • No explicit schema migration phase
    When querying columnar files (e.g., Parquet on S3), DuckDB can infer schemas at read time.
    This simplicity comes from treating the files as the source of truth, rather than maintaining a managed storage layer. ClickHouse also offers schema inference for external files, but it is primarily designed around its managed MergeTree storage. In ClickHouse, schema inference is typically a "first step" before data is committed to a structured internal table for performance. DuckDB, by contrast, maintains a "zero-copy" philosophy where the schema lives and evolves within the files themselves, removing the need for a separate migration phase entirely.

  • No background maintenance processes
    There are no merges, compactions, or cleanup jobs running behind the scenes, largely because DuckDB does not manage a continuously mutating dataset.

  • No cluster coordination
    DuckDB avoids coordination by remaining single-node and embedded.
    Similarly, ClickHouse can avoid coordination when used purely as a query engine over external columnar files — but coordination naturally appears once it is used as a distributed, stateful store.
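Since memory and CPU sizing become application concerns, one option is to derive DuckDB's limits from the container size at startup. The sketch below only builds the statements; `memory_limit` and `threads` are real DuckDB settings, while the helper `duckdbSettings` and the 75% memory share are assumptions chosen for illustration.

```go
package main

import "fmt"

// duckdbSettings derives DuckDB session settings from the container's limits.
// memFraction is the share of container memory DuckDB may use; the rest is
// left for the Go runtime and request handling.
func duckdbSettings(containerMemMB, cpus int, memFraction float64) []string {
	limitMB := int(float64(containerMemMB) * memFraction)
	if limitMB < 1 {
		limitMB = 1
	}
	if cpus < 1 {
		cpus = 1
	}
	return []string{
		fmt.Sprintf("SET memory_limit = '%dMB'", limitMB),
		fmt.Sprintf("SET threads = %d", cpus),
	}
}

func main() {
	// ASSUMPTION: a 4 GB, 2 vCPU task with 75% of memory given to DuckDB.
	for _, stmt := range duckdbSettings(4096, 2, 0.75) {
		fmt.Println(stmt) // in the service, each statement would go through db.Exec
	}
}
```

Capping DuckDB below the container limit leaves headroom so a large aggregation spills or fails inside DuckDB instead of getting the whole task killed.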

DuckDB in ECS / a Single Container

DuckDB:

  • Runs in-process

  • No server

  • No background services

  • Reads CSV / Parquet directly

You can deploy:

  • A single ECS task

  • With a few GBs of RAM

  • Scaled vertically as the dataset grows, and horizontally by spinning up identical stateless container replicas.

Cost: $100-200/month

Note: It can be much cheaper on AWS Lambda, because you only pay for the seconds the query is actually running.

This is extremely attractive for early-stage systems.

It works best for local development and for teams with moderate-sized analytical workloads, where this operational simplicity translates directly into faster iterations, fewer moving parts, and dramatically less cognitive burden.


DuckDB is not a silver bullet. ClickHouse becomes necessary in the following situations:

1. Scaling & Architecture

  • ClickHouse: Scales horizontally. Shards data across nodes, separating data capacity from app replicas for predictable costs.

  • DuckDB: Single-node only. Capacity is vertically capped by the machine it runs on.

2. High Concurrency

  • DuckDB Breakdown: Optimized for single-user exploration. High concurrency leads to CPU/memory contention, "noisy neighbor" interference, and exploding latency.

  • ClickHouse Advantage: Designed for high QPS and dashboard traffic. Uses query quotas and distributed execution to handle hundreds of concurrent queries smoothly.

3. Strict Latency SLAs

  • DuckDB: Excellent raw speed but poor tail latency (P99). Without admission control or resource isolation, one heavy query can break SLAs for everyone.

  • ClickHouse: Built for predictability. Load shedding, replica distribution, and workload prioritization ensure bounded tail latency for customer-facing products.

4. Multi-Tenant Analytics

  • DuckDB: No native concept of tenants or per-tenant resource caps. One user's workload can easily monopolize the entire system.

  • ClickHouse: Production-grade isolation. Supports logical isolation and per-tenant quotas to ensure fair performance in SaaS environments.

5. Continuous Ingestion

  • DuckDB: A batch-processing engine, not a streaming one. It struggles with concurrent writers and is generally not part of a real-time ingestion path.

  • ClickHouse: Built for real-time appends (millions of rows/sec). It can ingest from Kafka or CDC pipelines and serve queries on fresh data simultaneously.
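Both engines prefer bulk writes over many small inserts, so whichever one sits behind your ingestion path, it usually pays to batch. Below is a minimal, engine-agnostic sketch; the `batcher` type is hypothetical, and `flush` stands in for whatever bulk write you use (for example, one multi-row INSERT).

```go
package main

import "fmt"

// batcher buffers rows and hands them to flush in bulk.
type batcher struct {
	size  int
	buf   []string
	flush func(rows []string) error // stand-in for one bulk write per call
}

// add buffers one row and flushes when the batch is full.
func (b *batcher) add(row string) error {
	b.buf = append(b.buf, row)
	if len(b.buf) >= b.size {
		return b.flushNow()
	}
	return nil
}

// flushNow writes whatever is buffered; call it on shutdown or on a timer.
// This sketch drops the batch if flush fails; a real service would retry.
func (b *batcher) flushNow() error {
	if len(b.buf) == 0 {
		return nil
	}
	rows := b.buf
	b.buf = nil
	return b.flush(rows)
}

func main() {
	writes := 0
	b := &batcher{size: 3, flush: func(rows []string) error {
		writes++
		fmt.Printf("write %d: %d rows\n", writes, len(rows))
		return nil
	}}
	for i := 0; i < 7; i++ {
		if err := b.add(fmt.Sprintf("row-%d", i)); err != nil {
			panic(err)
		}
	}
	if err := b.flushNow(); err != nil {
		panic(err)
	}
}
```

Seven rows turn into three writes here instead of seven, and the same pattern scales to batches of thousands of rows.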

6. ClickHouse is like a whole car:

  • ClickHouse is a complete data warehouse that provides not just the engine, but the integrated "safety features" like fine-grained access control (RBAC), multi-tenant resource isolation, and centralized governance.

  • While DuckDB is perfect for a driver who needs a fast, portable motor to plug into their own custom rig, ClickHouse is the choice for an organization that needs a secure, multi-passenger vehicle ready to hit the highway.


The Flexible Middle Ground: Columnar Files on Object Storage

One of the most effective architectural decisions you can make is decoupling storage from the query engine. By storing your analytical data as columnar files on object storage (like S3), you create a flexible "Data Lake" that can be accessed by multiple tools simultaneously.

Parquet is a widely used example of this approach today. Columnar files also unlock two key performance optimizations that modern query engines love: predicate pushdown (skipping data that cannot match a filter) and column pruning (reading only the columns a query references). Other columnar formats, such as Vortex, are also emerging.
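To make predicate pushdown and column pruning less abstract, here is a toy model in Go. It is a simplified sketch, not a Parquet reader: the `rowGroup` type imitates a chunk of a columnar file that stores min/max statistics next to its data, which is the idea formats like Parquet rely on.

```go
package main

import "fmt"

// rowGroup mimics a chunk of a columnar file: per-column data plus the
// min/max statistics kept for the ts column of each chunk.
type rowGroup struct {
	minTS, maxTS int64
	ts           []int64
	country      []string
}

// countSince counts rows with ts >= since and reports how many row groups
// were actually scanned. Groups whose maxTS is below the cutoff are skipped
// using statistics alone (predicate pushdown); only the ts column is read
// (column pruning).
func countSince(groups []rowGroup, since int64) (rows, scanned int) {
	for _, g := range groups {
		if g.maxTS < since {
			continue // no row in this group can match
		}
		scanned++
		for _, ts := range g.ts {
			if ts >= since {
				rows++
			}
		}
	}
	return rows, scanned
}

func main() {
	groups := []rowGroup{
		{minTS: 1, maxTS: 3, ts: []int64{1, 2, 3}, country: []string{"IN", "US", "IN"}},
		{minTS: 4, maxTS: 6, ts: []int64{4, 5, 6}, country: []string{"DE", "IN", "US"}},
		{minTS: 7, maxTS: 9, ts: []int64{7, 8, 9}, country: []string{"US", "US", "IN"}},
	}
	rows, scanned := countSince(groups, 5)
	fmt.Printf("matched %d rows, scanned %d of %d row groups\n", rows, scanned, len(groups))
}
```

On object storage, every skipped row group and every unread column is data that never has to cross the network, which is where most of the savings come from.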

How DuckDB and ClickHouse fit into this model

Both DuckDB and ClickHouse can operate directly on columnar files stored in S3:

  • DuckDB

    • Reads columnar files (such as Parquet) directly from S3

    • Ideal for ad-hoc and application-embedded analytics

  • ClickHouse

    • Can query columnar files in S3 without loading them into native tables

    • Useful for exploration, backfills, and hybrid architectures
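As a rough sketch of what "querying the same files from both engines" looks like, the helpers below build one query per engine. `read_parquet` (DuckDB, with the httpfs extension for `s3://` paths) and the `s3` table function (ClickHouse) are real features; the helper names, bucket, region, and key are placeholders, and private buckets additionally need credentials configured on each side.

```go
package main

import "fmt"

// duckDBParquetQuery builds a DuckDB query over Parquet files on S3.
// DuckDB reads s3:// paths through its httpfs extension.
// The inputs are interpolated as-is, so they must not come from untrusted users.
func duckDBParquetQuery(bucket, key string) string {
	return fmt.Sprintf("SELECT count(*) FROM read_parquet('s3://%s/%s')", bucket, key)
}

// clickHouseParquetQuery builds the equivalent ClickHouse query using the
// s3 table function, which takes an HTTPS URL and a format name.
func clickHouseParquetQuery(bucket, region, key string) string {
	return fmt.Sprintf(
		"SELECT count(*) FROM s3('https://%s.s3.%s.amazonaws.com/%s', 'Parquet')",
		bucket, region, key)
}

func main() {
	// ASSUMPTION: bucket, region, and key are placeholders.
	fmt.Println(duckDBParquetQuery("my-bucket", "funding/*.parquet"))
	fmt.Println(clickHouseParquetQuery("my-bucket", "us-east-1", "funding/*.parquet"))
}
```

Because both queries point at the same files, you can start with DuckDB and later add ClickHouse without first migrating the data.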


Final Thoughts

ClickHouse and DuckDB are not traditional competitors; they are complementary tools. Start with DuckDB for its simple implementation, speed of development, and connectors to many different file formats, and graduate to ClickHouse when scale and concurrency demand it.

Personally, running DuckDB locally for our funding analysis use case made development much easier. But when the need evolves from an analytical database to a full-scale data warehousing problem with multi-tenancy, the architecture must shift from "the engine" to "the car" (Ref: here). While DuckDB lets you query files instantly with minimal overhead, ClickHouse provides the essential warehouse features, i.e., built-in security, role-based access control, and robust resource management.

DuckDB is the perfect starting point for rapid, local development, but ClickHouse is the destination when you need a governed, high-concurrency platform to serve many users under one roof.
