
PromQL for Beginners: 10 Queries You'll Actually Use

PromQL's power comes from a handful of patterns. Master these 10 and you'll handle 90% of real-world monitoring use cases.


aiAxonIQ Team

Engineering at aiAxonIQ

Feb 3, 2026 · 8 min read

PromQL looks intimidating, but real-world usage collapses into a small set of patterns. Learn these ten and you can read almost any dashboard and write almost any alert. Each one below is copy-paste-adaptable: swap in your own metric names.

1. Request rate: rate() on a counter

Counters only go up (and drop back to zero when the process restarts), so the raw value says little on its own. rate() turns a counter into per-second throughput and accounts for those resets:

rate(http_requests_total[5m])

The [5m] window smooths the result; shorter windows react faster but are noisier. Rule of thumb: at least 4× your scrape interval.
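As a rough sketch of that rule of thumb, assume a 15-second scrape interval (an assumption; check scrape_interval in your own Prometheus config). Four scrapes then span one minute, which is about the shortest window worth using. The second query shows the same rate scaled to requests per minute, which sometimes reads better on a dashboard. These are two separate queries, not one expression.

```promql
# Assumes a 15s scrape interval, so 4 x 15s = 1m is the shortest useful window.
rate(http_requests_total[1m])

# The 5m rate expressed as requests per minute instead of per second.
rate(http_requests_total[5m]) * 60
```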

2. Error ratio: dividing two rates

The single most useful alerting expression in PromQL:

sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))

A ratio is robust to traffic changes: 50 errors/sec means nothing without knowing whether that's out of 100 or 100,000.
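To turn the ratio into an alert condition, one possible sketch is to compute it per service and compare it against a threshold. The service label and the 1% threshold are assumptions for illustration; pick values that match your own labels and error budget.

```promql
# Error ratio per service; assumes http_requests_total carries a service label.
  sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
/
  sum by (service) (rate(http_requests_total[5m]))
> 0.01  # hypothetical threshold: more than 1% of requests failing
```

Division binds tighter than the comparison, so the threshold applies to the finished ratio. A service with no 5xx series at all simply produces no result, which is the behavior you want for an alert.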

3. Latency percentiles: histogram_quantile

histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

The le label is the histogram bucket boundary and must survive the sum by: forget it and you get nothing. Averages hide tail pain; percentiles are what users feel.
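A per-service variant might look like the sketch below: le stays in the grouping, alongside whatever label you want to break down by. The service label is an assumption about how the metric is instrumented. The second query computes the mean latency from the histogram's _sum and _count series, so you can plot it next to the p95 and see how much the average hides.

```promql
# p95 latency per service: keep le plus any label you want to break down by.
histogram_quantile(0.95,
  sum by (le, service) (rate(http_request_duration_seconds_bucket[5m])))

# Mean latency for comparison, from the histogram's _sum and _count series.
  sum by (service) (rate(http_request_duration_seconds_sum[5m]))
/
  sum by (service) (rate(http_request_duration_seconds_count[5m]))
```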

4. Group by label: sum by

sum by (service) (rate(http_requests_total[5m]))

The aggregation workhorse. Use sum by (service, status) for a two-dimensional breakdown, or sum without (instance) to keep every label except one.
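Written out in full, the two variants mentioned above would look roughly like this. The status and instance labels are assumed to exist on the metric.

```promql
# One series per (service, status) pair.
sum by (service, status) (rate(http_requests_total[5m]))

# Drop only the instance label and keep every other label on the series.
sum without (instance) (rate(http_requests_total[5m]))
```

The without form is handy when you do not know, or do not want to list, every label worth keeping.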

5. Top offenders: topk

topk(5, sum by (service) (rate(http_requests_total{status=~"5.."}[5m])))

Perfect for "which five services are throwing the most errors right now" panels: the first thing to glance at during an incident.
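topk wraps any expression that returns one value per series, so the same idea can rank services by latency instead of errors. The sketch below assumes the duration histogram from pattern 3 carries a service label.

```promql
# The five services with the highest p95 latency over the last 5 minutes.
topk(5,
  histogram_quantile(0.95,
    sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))))
```

On a graph panel, topk is evaluated at every step, so more than five distinct series can appear across the time range. For a strict top five, use an instant query or a table panel.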

6. Growth over a day: increase

increase(payment_failures_total[24h])

increase is rate × window: total events over the period instead of per-second. Right tool for "how many failures today" questions.
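To make the relationship concrete, the two queries below should return the same values, since 24 hours is 86,400 seconds:

```promql
# Total failures over the last 24 hours.
increase(payment_failures_total[24h])

# The same number, spelled out as per-second rate times seconds in the window.
rate(payment_failures_total[24h]) * 24 * 3600
```

Because both functions extrapolate to the edges of the window, the result can be fractional even though the counter only ever moves in whole steps.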

7. Is it even there? absent()

absent(up{job="payment-service"})

Fires when the series doesn't exist at all. Without it, a dead service produces no data, no data matches no threshold, and your alerts stay green while production burns. Pair with up == 0 for scrape failures.
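A sketch of the pairing: the first query catches targets that Prometheus still knows about but cannot scrape, and the second is a more patient variant of absent() that only fires when no sample has been seen during the whole window. The 10-minute window is an arbitrary choice for illustration.

```promql
# The target is still configured, but its most recent scrape failed.
up{job="payment-service"} == 0

# No up sample for the job at any point in the last 10 minutes.
absent_over_time(up{job="payment-service"}[10m])
```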

8. Saturation: how close to the ceiling

1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

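Utilization tells you what's used; saturation tells you what's left before things break. This expression returns the fraction of memory in use (0 to 1), so values near 1 mean the ceiling is close. The same shape works for disk, file descriptors, and connection pools.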
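For disk and file descriptors, the same shape might look like the sketch below. The metric names are the ones node_exporter and the standard Prometheus client libraries usually expose; the mountpoint filter is an assumption about which filesystem you care about.

```promql
# Fraction of the root filesystem in use.
1 - (node_filesystem_avail_bytes{mountpoint="/"}
     / node_filesystem_size_bytes{mountpoint="/"})

# Fraction of the file descriptor limit each process has used.
process_open_fds / process_max_fds
```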

9. Smoothing spikes: avg_over_time

avg_over_time(queue_depth[1h])

For gauges that bounce around, the time-smoothed value separates "momentary blip" from "sustained problem", which is often the difference between a page and a ticket.
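One way to put that distinction to work is to compare the smoothed value with the worst single sample in the same window. The threshold of 100 is hypothetical; queue_depth is the same placeholder gauge as above.

```promql
# Sustained problem: the hourly average itself is high.
avg_over_time(queue_depth[1h]) > 100

# Momentary blip: the worst single sample seen in the same hour.
max_over_time(queue_depth[1h])
```

If max_over_time is high but avg_over_time stays low, the queue spiked and recovered, which usually belongs in a ticket rather than a page.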

10. Will we run out? predict_linear

predict_linear(node_filesystem_avail_bytes[6h], 24 * 3600) < 0

Linear extrapolation: "at the current trend, will the disk be full in 24 hours?" Alerting on predicted exhaustion beats alerting on 90%-full: you get hours of runway instead of minutes.
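In practice the raw prediction can be noisy, so a common refinement is to filter out filesystems you do not care about and require that the disk is already partly full. The sketch below is one possible shape; the fstype filter and the 40%-free guard are assumptions to tune for your environment.

```promql
# Predicted to hit zero free bytes within 24 hours, based on the last 6 hours...
  predict_linear(node_filesystem_avail_bytes{fstype!="tmpfs"}[6h], 24 * 3600) < 0
and
# ...and already below 40% free, to ignore short bursts on mostly empty disks.
  node_filesystem_avail_bytes{fstype!="tmpfs"}
    / node_filesystem_size_bytes{fstype!="tmpfs"} < 0.4
```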

Where to go next

These ten cover the bulk of daily PromQL. Combine them (topk over an error ratio, avg_over_time over saturation) and you're writing expert-level queries from a beginner-sized toolbox. And when you'd rather not hand-write them: aiAxonIQ speaks Prometheus remote-write natively, its natural-language query feature generates expressions like these from plain English, and Prophet-based forecasting handles the trend-prediction use case with seasonality that predict_linear's straight line can't capture.
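The two combinations mentioned above could be sketched as follows. The service label is an assumption, and the second query relies on subquery syntax ([1h:1m] means "evaluate the inner expression every minute over the last hour") because avg_over_time needs a range of samples rather than a single instant value.

```promql
# topk over an error ratio: the five services with the highest share of 5xx.
topk(5,
    sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
  /
    sum by (service) (rate(http_requests_total[5m]))
)

# avg_over_time over the memory expression from pattern 8, via a subquery.
avg_over_time(
  (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes))[1h:1m]
)
```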

Thanks for reading!
