The storage engine is the most consequential decision in an observability platform. It determines which queries are fast, what retention costs, and at what scale the system breaks. We evaluated the usual candidates before committing to ClickHouse as the analytical store for logs, metrics, and traces. Here's the reasoning, including the trade-offs we knowingly accepted.

What we evaluated

Elasticsearch is superb at what it was built for: full-text search with relevance scoring and highlighting. But telemetry workloads are dominated by aggregations over time ranges, and inverted indexes pay a heavy price there — index-time overhead on every field, and significant storage amplification compared to columnar layouts.

Loki takes the opposite bet: index only labels, keep log bodies cheap. Storage costs are excellent, but the moment you need to aggregate over high-cardinality fields that aren't labels — "group error counts by customer ID" — you're brute-force scanning, and the label-cardinality ceiling shapes your whole schema.

Tempo is a focused trace store: fetch by trace ID, very cheap object storage. But search and analytics over span attributes were not its strength, and we wanted to run aggregations across traces, not just retrieve them.

Each of these is a good tool. None of them is one engine that handles logs, metrics, *and* traces with heavy analytical queries — and running three separate storage systems is its own tax.

Why columnar storage fits telemetry

Telemetry is wide events: a log line or span carries dozens of attributes, but any given query touches three or four columns. A columnar engine reads only those columns, which is why aggregation queries skip the vast majority of disk I/O that a row store or document store would pay.

Two more properties compound the win. Telemetry columns are extremely repetitive (service names, status codes, level strings), and columnar compression with codecs like ZSTD and delta encoding exploits that — order-of-magnitude compression is typical for this kind of data. And ClickHouse's materialized views let us maintain pre-aggregated 1-minute and 1-hour rollups at insert time, so dashboard queries hit small rollup tables instead of scanning raw events.
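As a rough sketch of the rollup idea, the pair below maintains per-minute event counts at insert time. It assumes a raw table named `logs` with `timestamp`, `tenant_id`, `service`, and `level` columns; the names `logs_rollup_1m` and `logs_rollup_1m_mv` are hypothetical, and the choice of SummingMergeTree is one common option rather than a description of our production schema.

```sql
-- Hypothetical rollup target: one row per (tenant, service, level, minute).
-- SummingMergeTree adds up `events` for rows that share the same sorting key
-- when parts are merged in the background.
CREATE TABLE logs_rollup_1m (
  minute     DateTime,
  tenant_id  LowCardinality(String),
  service    LowCardinality(String),
  level      LowCardinality(String),
  events     UInt64
)
ENGINE = SummingMergeTree
PARTITION BY toDate(minute)
ORDER BY (tenant_id, service, level, minute);

-- The view runs on every block inserted into `logs` and writes the
-- pre-aggregated rows into the rollup table.
CREATE MATERIALIZED VIEW logs_rollup_1m_mv TO logs_rollup_1m AS
SELECT
  toStartOfMinute(toDateTime(timestamp)) AS minute,
  tenant_id,
  service,
  level,
  count() AS events
FROM logs
GROUP BY minute, tenant_id, service, level;

-- Dashboard-style read. Merges are asynchronous, so the query still sums
-- at read time instead of assuming one row per key.
SELECT minute, sum(events) AS error_count
FROM logs_rollup_1m
WHERE tenant_id = 'acme'          -- hypothetical tenant
  AND service = 'checkout'        -- hypothetical service
  AND level = 'ERROR'
  AND minute >= now() - INTERVAL 6 HOUR
GROUP BY minute
ORDER BY minute;
```

A 1-hour rollup would follow the same pattern with `toStartOfHour`. The read query scans a table with at most one row per key per minute after merges, not one row per raw event.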

Our pipeline

Applications send OTLP to the ingest gateway, which validates, stamps the tenant, and publishes to Kafka. Kafka decouples ingest latency from storage: the gateway answers in milliseconds, consumers batch tens of thousands of rows per insert (ClickHouse strongly prefers large batches), and a backend hiccup means consumer lag instead of dropped data.

A simplified version of the logs table:

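CREATE TABLE logs (
  -- Delta stores differences between consecutive timestamps; ZSTD compresses the result
  timestamp   DateTime64(3) CODEC(Delta, ZSTD),
  -- LowCardinality dictionary-encodes columns with few distinct values
  tenant_id   LowCardinality(String),
  service     LowCardinality(String),
  level       LowCardinality(String),
  message     String CODEC(ZSTD),
  -- Free-form key/value attributes that have no dedicated column
  attributes  Map(String, String)
)
ENGINE = MergeTree
-- One partition per day
PARTITION BY toDate(timestamp)
-- Sorting key: rows are stored by tenant, then service, then time
ORDER BY (tenant_id, service, timestamp)
-- Rows older than 30 days are removed during background merges
TTL toDateTime(timestamp) + INTERVAL 30 DAY;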

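The ORDER BY key sorts every part by tenant, then service, then time, so a query that filters on tenant_id reads only that tenant's slice of each part instead of the whole part. The TTL expires rows older than 30 days during background merges, so retention needs no batch delete jobs.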
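To make the effect of the sorting key concrete, here is a sketch of the "group error counts by customer ID" query mentioned earlier, written against the logs table above. The tenant and service values and the `customer_id` attribute key are hypothetical placeholders.

```sql
-- Filters on tenant_id, service, and timestamp follow the ORDER BY key,
-- so ClickHouse can skip the ranges of each part that belong to other
-- tenants, other services, or older time windows.
SELECT
  attributes['customer_id'] AS customer_id,   -- hypothetical attribute key
  count() AS errors
FROM logs
WHERE tenant_id = 'acme'                      -- hypothetical tenant
  AND service = 'checkout'                    -- hypothetical service
  AND level = 'ERROR'
  AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY customer_id
ORDER BY errors DESC
LIMIT 10;
```

The query reads only the columns it names. One possible refinement, if a particular attribute key is queried constantly, is to promote it to its own column so that the whole map does not have to be read.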

One honest caveat: ClickHouse is not a full-text engine, and we didn't pretend otherwise. For BM25-ranked log search with highlighting, we dual-write logs to OpenSearch alongside ClickHouse. Columnar for analytics, inverted index for search — complementary engines, each doing the job it's built for.

The trade-offs we accepted

Mutations are expensive. ClickHouse is built for append-heavy workloads; updates and deletes are background rewrites. Fine for telemetry (immutable by nature), wrong for anything transactional — our control plane stays in PostgreSQL.
Joins need discipline. It's an OLAP engine, not a relational workhorse; we denormalize at write time instead of joining at read time.
Operational learning curve. Parts, merges, and insert batching behave differently from anything in the Postgres/Elastic world; the consumer layer had to be designed around large batches from day one.
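Much of that learning curve comes down to watching parts. As an illustrative sketch (not our actual runbook), the query below uses the built-in `system.parts` table to show how many active parts each daily partition of the logs table holds and how well it compresses.

```sql
-- Active parts per partition for the logs table, with compression ratio.
-- A persistently high part count usually means inserts are too small or
-- merges are falling behind.
SELECT
  partition,
  count() AS active_parts,
  sum(rows) AS total_rows,
  formatReadableSize(sum(data_compressed_bytes)) AS compressed,
  formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
  round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'logs'
  AND active
GROUP BY partition
ORDER BY partition DESC
LIMIT 7;
```

The same output doubles as a check on the compression claims: the `ratio` column reports what the codecs achieve on real data.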

Takeaways

Telemetry queries are aggregation-shaped, and columnar storage is built for exactly that shape
Compression on repetitive telemetry columns is where the retention budget is won
Put a queue in front of your storage engine: consumers control batch size, and a slow backend shows up as consumer lag instead of dropped data
Use the right engine per job: ClickHouse for analytics, OpenSearch for full-text, Postgres for the control plane
