PromQL looks intimidating, but real-world usage collapses into a small set of patterns. Learn these ten and you can read most dashboards and write most alerts. Each one below is copy-paste-adaptable: swap in your own metric names.

1. Request rate — `rate()` on a counter

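Counters only go up (resetting to zero when a process restarts), so their raw value says little on its own. rate() turns them into per-second throughput averaged over the window, and it compensates for those resets: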

rate(http_requests_total[5m])

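The [5m] window smooths the result. Shorter windows react faster but are noisier. Rule of thumb: make the window at least 4× your scrape interval (1m or more with a 15s scrape). That way it always holds several samples, even if a scrape is missed.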
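Here is a simplified Python sketch of what rate() computes from one series' samples inside the window. It makes the counter arithmetic concrete. It treats any drop in value as a counter reset, the way Prometheus does. It deliberately skips Prometheus's extrapolation to the window edges, so real results can differ slightly. The function names and sample data are hypothetical. The sketch also shows why the window must contain at least two samples.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Total increase across the samples, treating any drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the process restarted and the counter began again at 0,
        # so the whole new value counts as fresh increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: List[Sample]) -> float:
    """Per-second rate over the sampled span (no edge extrapolation)."""
    if len(samples) < 2:
        # Prometheus returns no result here; this sketch raises instead.
        raise ValueError("rate needs at least two samples in the window")
    span = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / span


# Four scrapes 15s apart; the process restarts between the 2nd and 3rd scrape.
window = [(0.0, 100.0), (15.0, 130.0), (30.0, 12.0), (45.0, 42.0)]
print(simple_rate(window))  # (30 + 12 + 30) / 45 = 1.6 requests/s
```

Without reset handling, the drop from 130 to 12 would count as a huge negative rate. That is why you always take rate() of a counter rather than differencing raw values yourself.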

2. Error ratio — dividing two rates

The single most useful alerting expression in PromQL:

sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))

A ratio is robust to traffic changes — 50 errors/sec means nothing without knowing whether that's out of 100 or 100,000.
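The division only works once both sides have been summed into a single series. The Python sketch below shows that arithmetic, assuming the per-series rates have already been computed; the label sets and numbers are made up. Two PromQL behaviours are mirrored here. The =~ matcher is anchored at both ends, like re.fullmatch. And if no 5xx series exists yet, sum() over the empty selection returns no data, so the whole ratio returns no data rather than 0.

```python
import math
import re
from typing import Dict, List, Optional, Tuple

Series = Tuple[Dict[str, str], float]  # (label set, per-second rate already computed)


def error_ratio(series: List[Series], status_regex: str = "5..") -> Optional[float]:
    """sum(rate{status=~"5.."}) / sum(rate), or None where PromQL would return no data."""
    pattern = re.compile(status_regex)
    # PromQL regex matchers are anchored at both ends, so "5.." matches "503" but not "5031".
    error_rates = [v for labels, v in series if pattern.fullmatch(labels.get("status", ""))]
    if not error_rates:
        # sum() over an empty selection yields no series, so the division yields no data
        # (not 0) until the first 5xx series exists.
        return None
    total = sum(v for _, v in series)
    # Zero traffic gives 0/0, which PromQL evaluates to NaN rather than raising an error.
    return sum(error_rates) / total if total else math.nan


traffic = [
    ({"status": "200"}, 950.0),
    ({"status": "503"}, 40.0),
    ({"status": "500"}, 10.0),
]
print(error_ratio(traffic))  # 50 / 1000 = 0.05
```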

3. Latency percentiles — `histogram_quantile`

histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

The le ("less than or equal") label holds each bucket's upper bound and must survive the sum by. Aggregate it away and histogram_quantile returns no result. Averages hide tail pain; percentiles are what users feel.
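histogram_quantile needs the full set of le buckets for a histogram, because it estimates the percentile by interpolating between bucket bounds. The Python sketch below is a simplified version of that interpolation. It assumes cumulative counts sorted by le, positive bucket bounds, and a final +Inf bucket; the latency buckets are hypothetical. The answer is only as precise as your bucket layout allows.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (le upper bound, cumulative count), sorted by le


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Simplified bucket interpolation for one histogram with positive bucket bounds."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least one finite bucket plus le=+Inf")
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    # First finite bucket whose cumulative count reaches the target rank.
    b = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        # The rank lands in +Inf: report the highest finite bound instead.
        return buckets[-2][0]
    lower, below = buckets[b - 1] if b > 0 else (0.0, 0.0)
    upper, count = buckets[b]
    # Assume observations are spread evenly inside the bucket and interpolate linearly.
    return lower + (upper - lower) * (rank - below) / (count - below)


latency = [(0.1, 50.0), (0.25, 80.0), (0.5, 95.0), (1.0, 100.0), (math.inf, 100.0)]
print(histogram_quantile(0.9, latency))  # rank 90 falls in (0.25, 0.5]: about 0.417 s
```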

4. Group by label — `sum by`

sum by (service) (rate(http_requests_total[5m]))

The aggregation workhorse. Use sum by (service, status) for a two-dimensional breakdown, or sum without (instance) to aggregate away one label while keeping all the others.
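The only difference between by and without is which labels survive into the group key. The Python sketch below shows that grouping over a few hypothetical per-series rates. Series whose surviving labels are identical are added together.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]  # (label set, value), e.g. a per-series rate
GroupKey = Tuple[Tuple[str, str], ...]


def sum_by(series: List[Series], by: List[str]) -> Dict[GroupKey, float]:
    """Like `sum by (...)`: keep only the listed labels and add up series that collide."""
    groups: Dict[GroupKey, float] = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, labels[k]) for k in by if k in labels))
        groups[key] += value
    return dict(groups)


def sum_without(series: List[Series], without: List[str]) -> Dict[GroupKey, float]:
    """Like `sum without (...)`: drop the listed labels and keep every other one."""
    groups: Dict[GroupKey, float] = defaultdict(float)
    for labels, value in series:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in without))
        groups[key] += value
    return dict(groups)


rates = [
    ({"service": "api", "instance": "a:8080", "status": "200"}, 120.0),
    ({"service": "api", "instance": "b:8080", "status": "200"}, 80.0),
    ({"service": "api", "instance": "a:8080", "status": "500"}, 3.0),
    ({"service": "web", "instance": "c:8080", "status": "200"}, 40.0),
]
print(sum_by(rates, ["service"]))
# {(('service', 'api'),): 203.0, (('service', 'web'),): 40.0}
print(sum_without(rates, ["instance"])[(("service", "api"), ("status", "200"))])
# 200.0
```

The without form is often safer for dashboards: a new label added upstream flows through automatically instead of being silently summed away.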

5. Top offenders — `topk`

topk(5, sum by (service) (rate(http_requests_total{status=~"5.."}[5m])))

Perfect for "which five services are throwing the most errors right now" panels — the first thing to glance at during an incident.
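At any single evaluation, topk just picks the k largest values from the aggregated vector. A minimal Python sketch of that, with made-up per-service error rates:

```python
import heapq
from typing import Dict, List, Tuple


def topk(k: int, values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Like `topk(k, ...)` at one instant: the k series with the largest values."""
    return heapq.nlargest(k, values.items(), key=lambda item: item[1])


errors_by_service = {"checkout": 4.2, "search": 0.3, "payments": 9.1, "auth": 1.7, "cart": 0.0, "email": 2.5}
print(topk(3, errors_by_service))
# [('payments', 9.1), ('checkout', 4.2), ('email', 2.5)]
```

One consequence worth knowing: topk is evaluated independently at every step of a range query. A graph panel can therefore show more than five services over time, even though each instant has at most five.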

6. Growth over a day — `increase`

increase(payment_failures_total[24h])

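increase is rate × window (in seconds): total events over the period instead of a per-second rate. It is the right tool for "how many failures in the last 24 hours" questions. Note that [24h] is a sliding window, not a calendar day. Expect non-integer results, because Prometheus extrapolates to the edges of the window.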
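The Python sketch below shows why a counter that rose by exactly 3 can report an increase of 4. It computes the per-second rate over the span the samples actually cover, then scales it to the full window. This is a simplification, assumed for illustration: Prometheus's real extrapolation has extra rules near the window edges, so its numbers can differ. The scrape data is hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_increase(samples: List[Sample], window_seconds: float) -> float:
    """increase() as rate x window: rate over the sampled span, scaled to the full window."""
    raw = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        raw += cur - prev if cur >= prev else cur  # a drop means a counter reset
    span = samples[-1][0] - samples[0][0]
    return raw * window_seconds / span


# Four scrapes inside a 60s window; the counter really went up by 3.
scrapes = [(5.0, 10.0), (20.0, 11.0), (35.0, 12.0), (50.0, 13.0)]
print(simple_increase(scrapes, 60.0))  # 4.0, not 3
```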

7. Is it even there? — `absent()`

absent(up{job="payment-service"})

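absent() returns a single series with value 1 only when no series matches the selector at all, so an alert on it fires exactly when the data has vanished. Without it, a dead service produces no data, no data matches no threshold, and your alerts stay green while production burns. Pair it with up == 0 to catch scrape failures.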
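The Python sketch below illustrates absent()'s behaviour for a selector built only from equality matchers. When nothing matches, it returns one series whose labels come from those matchers, so the alert carries job="payment-service". When anything matches, it returns nothing. The scraped series are hypothetical, and regex matchers are out of scope for this sketch.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def absent(existing: List[Labels], selector: Labels) -> List[Tuple[Labels, float]]:
    """Like absent(metric{...}) for a selector made only of equality matchers.

    No match: one output series with value 1, labelled from the selector's
    equality matchers (minus the metric name). Any match: empty result.
    """
    if any(all(s.get(k) == v for k, v in selector.items()) for s in existing):
        return []
    return [({k: v for k, v in selector.items() if k != "__name__"}, 1.0)]


scraped = [{"__name__": "up", "job": "checkout", "instance": "10.0.0.5:8080"}]
print(absent(scraped, {"__name__": "up", "job": "payment-service"}))
# [({'job': 'payment-service'}, 1.0)]
print(absent(scraped, {"__name__": "up", "job": "checkout"}))
# []
```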

8. Saturation — how close to the ceiling

1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

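The result is the fraction of memory in use, from 0 to 1, which tells you how much headroom is left before things break. MemAvailable is the right input because it counts reclaimable page cache, which MemFree does not. The same used-over-capacity shape works for disk, file descriptors, and connection pools.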
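The division in that expression relies on one-to-one vector matching. Each available-bytes series pairs with the total-bytes series that has an identical label set, metric name aside. Series without a partner drop out of the result silently. The Python sketch below mirrors that matching, with hypothetical node labels and sizes.

```python
from typing import Dict, FrozenSet, Tuple

LabelSet = FrozenSet[Tuple[str, str]]


def labels(**kv: str) -> LabelSet:
    """Hashable label set, with the metric name already stripped."""
    return frozenset(kv.items())


def used_fraction(available: Dict[LabelSet, float], total: Dict[LabelSet, float]) -> Dict[LabelSet, float]:
    """1 - (available / total), pairing series only when their label sets are identical."""
    return {ls: 1 - available[ls] / total[ls] for ls in available if ls in total}


gib = 1024 ** 3
mem_available = {labels(instance="node-a:9100"): 6 * gib, labels(instance="node-b:9100"): 1 * gib}
mem_total = {labels(instance="node-a:9100"): 16 * gib, labels(instance="node-b:9100"): 8 * gib}
for ls, frac in used_fraction(mem_available, mem_total).items():
    print(dict(ls)["instance"], f"{frac:.1%}")
# node-a:9100 62.5%
# node-b:9100 87.5%
```

If the two metrics carried different label sets (say, an extra label on one side), nothing would pair and the result would be empty. In PromQL you would then reach for on(...) or ignoring(...).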

9. Smoothing spikes — `avg_over_time`

avg_over_time(queue_depth[1h])

For gauges that bounce around, the time-smoothed value separates "momentary blip" from "sustained problem" — often the difference between a page and a ticket.
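avg_over_time is a plain mean of the raw samples inside the window. The Python sketch below compares a one-off spike with a sustained backlog, using one hypothetical scrape per minute for an hour. Exact inclusivity of the window's oldest edge is a detail it does not try to match.

```python
from statistics import fmean
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, gauge value)


def avg_over_time(samples: List[Sample], now: float, window_seconds: float) -> float:
    """Plain mean of the raw samples inside the window that ends at `now`."""
    in_window = [v for t, v in samples if now - window_seconds < t <= now]
    if not in_window:
        # Prometheus returns no result for an empty window; this sketch raises instead.
        raise ValueError("no samples in window")
    return fmean(in_window)


# One scrape per minute for an hour.
blip = [(m * 60.0, 500.0 if m == 30 else 10.0) for m in range(1, 61)]
sustained = [(m * 60.0, 120.0) for m in range(1, 61)]
for name, series in (("blip", blip), ("sustained", sustained)):
    print(name, round(avg_over_time(series, now=3600.0, window_seconds=3600.0), 1))
# blip 18.2
# sustained 120.0
```

Against a hypothetical threshold of 100, the single spike to 500 averages out to about 18, while the steady 120 stays above it and would page.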

10. Will we run out? — `predict_linear`

predict_linear(node_filesystem_avail_bytes[6h], 24 * 3600) < 0

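predict_linear fits a least-squares line through the last 6h of samples and extrapolates it 24 × 3600 seconds (24 hours) ahead. In effect it asks: "at the current trend, will the disk be full in 24 hours?" Alerting on predicted exhaustion beats alerting on 90%-full, because you get hours of runway instead of minutes.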
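Under the hood this is simple linear regression. The Python sketch below fits a least-squares line with time measured relative to the evaluation instant, then reads the line off 24 hours later. The disk data is hypothetical: free space falls 10 GB per hour from 100 GB.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def predict_linear(samples: List[Sample], now: float, seconds_ahead: float) -> float:
    """Least-squares line through the samples, evaluated `seconds_ahead` past `now`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    # Measure time relative to `now`, so the fitted intercept is the value at `now`.
    xs = [t - now for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * seconds_ahead


gb = 1e9
disk = [(h * 3600.0, (100 - 10 * h) * gb) for h in range(7)]  # hourly, 100 GB down to 40 GB
forecast = predict_linear(disk, now=6 * 3600.0, seconds_ahead=24 * 3600)
print(f"{forecast / gb:.1f}")  # -200.0: at this trend the last 40 GB is gone in ~4 hours
print(forecast < 0)            # True, so the alert condition holds
```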

Where to go next

These ten cover the bulk of daily PromQL. Combine them, for example topk over an error ratio, or avg_over_time over a saturation expression using a subquery such as [1h:5m]. That way you're writing expert-level queries from a beginner-sized toolbox. And when you'd rather not hand-write them, aiAxonIQ can help. Its metrics platform speaks Prometheus remote-write natively, and its natural-language query feature generates expressions like these from plain English. Its Prophet-based forecasting handles trend prediction with the seasonality that predict_linear's straight line can't capture.
