
Prometheus / PromQL / Alertmanager Cheat Sheet

A cheat sheet covering prometheus.yml, service discovery, relabeling, PromQL rate, aggregation, and functions, recording/alerting rules, and Alertmanager routing.

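Updated: 2026-05-29
Sources: official documentation / man pages / major vendor CLIs
Target implementations: common implementations of major Linux / BSD / network device CLIs
Disclaimer: Verify OS and version differences in your own environment.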

When to use this cheat sheet

This cheat sheet covers current, common implementations that can be checked against official documentation, man pages, and major vendor CLIs. Its 40 commands and values are grouped by use under five categories: prometheus.yml, PromQL basics, PromQL aggregation and functions, recording/alerting rules, and Alertmanager. Use it to find Prometheus, PromQL, Alertmanager, and general monitoring operations quickly during incident investigation, pre-change checks, and when writing work notes.


scrape_configs static

Scrape a static list of targets.

Example

scrape_configs:
- job_name: node
  static_configs:
  - targets: ["node1.example.com:9100", "node2.example.com:9100"]

relabel_configs keep

Keep only targets whose labels match a condition; all other targets are dropped.

Example

relabel_configs:
- source_labels: [__meta_kubernetes_namespace]
  regex: prod
  action: keep
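
The keep action above can be modeled with a short Python sketch. This is an illustration, not Prometheus code: the helper name keep_targets is hypothetical, and Python's re module stands in for the RE2 syntax Prometheus uses. It shows two details that are easy to miss: source label values are joined with ";" by default, and the regex must match the whole joined value.

```python
import re

def keep_targets(targets, source_labels, regex, separator=";"):
    """Return only the targets whose joined source label values fully match regex.

    Hypothetical helper that models action: keep; a missing label counts as "".
    """
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        value = separator.join(labels.get(name, "") for name in source_labels)
        # Relabel regexes are anchored at both ends, so use fullmatch.
        if pattern.fullmatch(value):
            kept.append(labels)
    return kept
```

With regex "prod", a target in namespace "production" is dropped, because the match is anchored rather than a substring search.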

relabel_configs replace

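Copy part of an internal label (here __address__) into a regular label. The action defaults to replace and the replacement defaults to $1, so both can be omitted.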

Example

relabel_configs:
- source_labels: [__address__]
  target_label: instance
  regex: "(.+):9100"
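
A minimal Python sketch of the replace action, under the same assumptions as any model of relabeling: relabel_replace is a hypothetical name, Python's re approximates RE2, and only numeric $1 / ${1} references are handled. It shows that the target label is written only when the anchored regex matches.

```python
import re

def relabel_replace(labels, source_labels, target_label, regex,
                    replacement="$1", separator=";"):
    """Model of action: replace. Returns a new label dict (hypothetical helper)."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate Prometheus-style $1 / ${1} references into Python's \g<1>.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result
```

Applied to __address__ = "node1.example.com:9100" with the regex from the example, this sets instance to "node1.example.com"; an address on another port leaves instance untouched.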

metric_relabel_configs drop

Drop unwanted metrics after the scrape, before they are stored.

Example

metric_relabel_configs:
- source_labels: [__name__]
  regex: go_memstats_.*
  action: drop

file_sd_configs

Use file-based service discovery.

Example

file_sd_configs:
- files: ["/etc/prometheus/targets/*.json"]
  refresh_interval: 1m
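
The files matched by the glob above hold a JSON list of target groups, each with a "targets" list and optional "labels". The following sketch generates such a file body; build_file_sd and the env label are illustrative assumptions, not part of the original example.

```python
import json

def build_file_sd(groups):
    """Render target groups as file_sd JSON text.

    groups: iterable of (targets, labels) pairs, e.g.
            (["node1.example.com:9100"], {"env": "prod"}).
    """
    entries = [
        {"targets": list(targets), "labels": dict(labels)}
        for targets, labels in groups
    ]
    return json.dumps(entries, indent=2)
```

Writing the result to a temporary file and renaming it into /etc/prometheus/targets/ avoids Prometheus reading a half-written file.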

kubernetes_sd_configs

Discover Kubernetes Pods.

Example

kubernetes_sd_configs:
- role: pod

ec2_sd_configs

Discover EC2 instances.

Example

ec2_sd_configs:
- region: ap-northeast-1
  port: 9100

honor_labels

Prefer labels from the scraped data over server-side labels when they conflict.

Example

honor_labels: true

up

Check whether scraping a target succeeded (1) or failed (0).

Example

up{job="node"} == 0

instant vector

Select the current value of each matching series.

Example

node_memory_MemAvailable_bytes{instance="node1:9100"}

range vector

Select a range of samples (here 5 minutes) to pass to a function.

Example

http_requests_total{job="api"}[5m]

scalar

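Convert a single-element instant vector to a scalar (returns NaN if the vector does not have exactly one element).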

Example

scalar(count(up{job="node"}))

rate

Compute the per-second rate of increase of a counter.

Example

rate(http_requests_total{job="api"}[5m])
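
The core of rate() can be sketched in Python as below. This is a simplified model, not the Prometheus implementation: simple_rate is a hypothetical name, and the real function also extrapolates to the window boundaries, which is omitted here. The sketch shows how a counter reset (a value lower than the previous one) is treated as a restart from zero.

```python
def simple_rate(samples):
    """Per-second increase over samples, handling counter resets.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            increase += value          # reset: the counter restarted from 0
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

Multiplying the same increase by nothing and skipping the division gives the idea behind increase(); using only the last two samples gives the idea behind irate().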

increase

Compute the total increase of a counter over the range.

Example

increase(http_requests_total{status=~"5.."}[1h])

irate

Compute the instantaneous per-second rate from the last two samples in the range.

Example

irate(node_network_receive_bytes_total{device="eth0"}[2m])

delta

Compute the change in a gauge over the range.

Example

delta(node_filesystem_avail_bytes{mountpoint="/"}[30m])

predict_linear

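Predict a future value by linear regression (here: free space expected to reach zero within 24 hours, based on the last 6 hours).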

Example

predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 24 * 3600) < 0
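
The idea behind predict_linear can be sketched as an ordinary least-squares fit. This is an illustrative model with a hypothetical function name; it assumes the prediction is measured from the evaluation time, and it ignores the numerical details of the real implementation.

```python
def predict_linear_sketch(samples, eval_time, horizon_seconds):
    """Fit value = intercept + slope * (t - eval_time) and extrapolate.

    samples: list of (timestamp_seconds, value).
    Returns the predicted value horizon_seconds after eval_time,
    or None if a line cannot be fitted.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return None  # all samples share one timestamp
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x  # fitted value at eval_time
    return intercept + slope * horizon_seconds
```

A gauge that loses 1 byte per second and currently sits at 800 is predicted to reach 0 after 800 seconds, so a "< 0" condition would match for any longer horizon.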

sum by

Aggregate per job.

Example

sum by (job) (rate(http_requests_total[5m]))

avg without

Average across instances, keeping every label except instance.

Example

avg without(instance) (node_load1)

histogram_quantile

Compute the 95th percentile from histogram buckets.

Example

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
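
For classic bucketed histograms, the quantile is estimated by linear interpolation inside the bucket that contains the target rank. The sketch below models that step only; the function name is hypothetical, it assumes 0 < q <= 1, cumulative counts, a +Inf bucket, and a lowest bucket starting at 0.

```python
import math

def histogram_quantile_sketch(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan
```

Because of the interpolation, the result can never be more precise than the bucket boundaries allow, which is why bucket layout matters for latency SLOs.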

topk

Select the N series with the highest values.

Example

topk(10, sum by (pod) (rate(container_cpu_usage_seconds_total[5m])))

bottomk

Select the N series with the lowest values.

Example

bottomk(5, node_filesystem_avail_bytes{mountpoint="/"})

absent

Detect that a series does not exist (returns 1 when the selector matches nothing).

Example

absent(up{job="blackbox", instance="https://example.com"})

clamp_min

Clamp values to a lower bound.

Example

clamp_min(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes, 0)

label_replace

Create a label from another label using a regular expression.

Example

label_replace(up, "host", "$1", "instance", "([^:]+):.*")

vector(0)

Fall back to 0 when the series is missing.

Example

sum(rate(optional_metric_total[5m])) or vector(0)

recording rule

Precompute a frequently used query.

Example

groups:
- name: api-recording
  rules:
  - record: job:http_requests:rate5m
    expr: sum by (job) (rate(http_requests_total[5m]))

alerting rule

Alert on the 5xx error ratio.

Example

groups:
- name: api-alerts
  rules:
  - alert: High5xxRate
    expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
    for: 10m

for clause

Specify how long the condition must hold before the alert fires.

Example

for: 15m
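
The effect of the for clause can be sketched as a small state machine evaluated once per rule evaluation. This is an illustrative model with hypothetical names; it assumes a fixed evaluation interval and ignores options such as keep_firing_for.

```python
def alert_states(condition_by_eval, eval_interval, for_seconds):
    """Return the alert state after each evaluation.

    condition_by_eval: list of booleans, one per evaluation (expr returned data).
    eval_interval:     seconds between evaluations.
    for_seconds:       the rule's for duration in seconds.
    """
    states = []
    active_since = None
    for index, condition in enumerate(condition_by_eval):
        now = index * eval_interval
        if not condition:
            active_since = None  # any gap resets the pending timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held = now - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states
```

A single evaluation where the condition is false sends the alert back to inactive, so the full for duration must elapse again before it fires.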

annotations

Embed labels and the current value in the notification text.

Example

annotations:
  summary: "High 5xx rate for {{ $labels.job }}"
  description: "Current value: {{ $value }}"

promtool check rules

Validate rule files.

Example

promtool check rules /etc/prometheus/rules/*.yml

alertmanager route

Configure notification routing.

Example

route:
  receiver: default
  group_by: [alertname, cluster]
  routes:
  - matchers: [severity="critical"]
    receiver: pager
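
How an alert walks a routing tree like the one above can be sketched as follows. This is a simplified model: pick_receivers is a hypothetical name, routes are plain dicts with equality-only matchers under a "match" key, and every node is assumed to name its own receiver (the real configuration lets children inherit it).

```python
def pick_receivers(route, labels):
    """Return the receivers an alert with the given labels is sent to.

    Child routes are tried in order; the first match wins unless that
    child sets "continue": True. With no matching child, the current
    node's receiver is used.
    """
    matched = []
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(name) == value for name, value in wanted.items()):
            matched.extend(pick_receivers(child, labels))
            if not child.get("continue", False):
                break
    return matched or [route["receiver"]]
```

With the example tree, an alert labeled severity="critical" goes to pager and anything else falls back to default.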

alertmanager receiver

Define a Slack receiver.

Example

receivers:
- name: default
  slack_configs:
  - channel: "#alerts"
    api_url: "https://hooks.slack.com/services/T000/B000/XXX"

inhibit_rules

Suppress lower-priority alerts while a critical alert is firing.

Example

inhibit_rules:
- source_matchers: [severity="critical"]
  target_matchers: [severity="warning"]
  equal: [alertname, instance]
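
The inhibition check can be sketched in Python as below. This is an illustrative model with hypothetical names and equality-only matchers; it treats a missing label as an empty value when comparing the equal labels, and it leaves out the special handling for alerts that match both sides of a rule.

```python
def is_inhibited(target, firing_alerts, rule):
    """Return True if target is muted by some firing source alert.

    rule: {"source": {...}, "target": {...}, "equal": [...]} with
          equality-only matchers (an assumption of this sketch).
    """
    def matches(alert, wanted):
        return all(alert.get(name) == value for name, value in wanted.items())

    if not matches(target, rule["target"]):
        return False
    for source in firing_alerts:
        if source is target or not matches(source, rule["source"]):
            continue
        # Missing labels compare as "" so absent-on-both-sides counts as equal.
        if all(source.get(l, "") == target.get(l, "") for l in rule["equal"]):
            return True
    return False
```

With the example rule, a warning is muted only when a critical alert with the same alertname and instance is firing.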

amtool silence add

Create a maintenance silence.

Example

amtool silence add alertname=High5xxRate instance=api-1 --duration=2h --comment="deploy"

amtool alert query

List currently firing alerts.

Example

amtool alert query --alertmanager.url=http://alertmanager:9093

scrape_duration_seconds

Find targets that are slow to scrape.

Example

topk(10, scrape_duration_seconds)

kube_pod_status_phase

Representative kube-state-metrics metric for inspecting Pod phase.

Example

sum by (namespace, phase) (kube_pod_status_phase{phase="Pending"})

reload config

Reload the configuration over HTTP (requires Prometheus to be started with --web.enable-lifecycle).

Example

curl -X POST http://prometheus.example.com:9090/-/reload

targets API

Check target status through the HTTP API.

Example

curl -s http://prometheus.example.com:9090/api/v1/targets | jq '.data.activeTargets[0]'
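
The same endpoint can be consumed from a script. The sketch below uses only the Python standard library; the function names are hypothetical and the base URL is whatever your Prometheus server listens on. It relies on the response fields data.activeTargets[].health, scrapeUrl, and lastError.

```python
import json
import urllib.request

def fetch_targets(base_url):
    """GET <base_url>/api/v1/targets and return the parsed JSON body."""
    url = f"{base_url}/api/v1/targets"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)

def unhealthy_targets(payload):
    """Return (scrapeUrl, lastError) for active targets whose health is not "up"."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [
        (target.get("scrapeUrl"), target.get("lastError", ""))
        for target in active
        if target.get("health") != "up"
    ]
```

For example, unhealthy_targets(fetch_targets("http://prometheus.example.com:9090")) lists the targets that are down or not yet scraped, together with the last scrape error.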
