kubernetes-guide/content/monitoring/victoriametrics/install-with-operator.md

8.8 KiB
Raw Blame History

使用 operator 部署 VictoriaMetrics

VictoriaMetrics 架构概览

以下是 VictoriaMetrics 的核心组件架构图:

  • vmstorage 负责存储数据,是有状态组件。
  • vmselect 负责查询数据Grafana 添加 Prometheus 数据源时使用 vmselect 地址,查询数据时,vmselect 会调用各个 vmstorage 的接口完成数据的查询。
  • vminsert 负责写入数据,采集器将采集到的数据 "吐到" vminsert,然后 vminsert 会调用各个 vmstorage 的接口完成数据的写入。
  • 各个组件都可以水平伸缩,但不支持自动伸缩,因为伸缩需要修改启动参数。

安装 operator

使用 helm 安装:

helm repo add vm https://victoriametrics.github.io/helm-charts
helm repo update
helm install victoria-operator vm/victoria-metrics-operator

检查 operator 是否成功启动:

$ kubectl -n monitoring get pod
NAME                                                           READY   STATUS    RESTARTS   AGE
victoria-operator-victoria-metrics-operator-7b886f85bb-jf6ng   1/1     Running   0          20s

安装 VMSorage, VMSelect 与 VMInsert

准备 vmcluster.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vmcluster
  namespace: monitoring
spec:
  retentionPeriod: "1" # 默认单位是月,参考 https://docs.victoriametrics.com/Single-server-VictoriaMetrics.html#retention
  vmstorage:
    replicaCount: 2
    storage:
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes: [ "ReadWriteOnce" ]
          storageClassName: cbs
          resources:
            requests:
              storage: 100Gi
  vmselect:
    replicaCount: 2
  vminsert:
    replicaCount: 2

安装:

$ kubectl apply -f vmcluster.yaml
vmcluster.operator.victoriametrics.com/vmcluster created

检查组件是否启动成功:

$ kubectl -n monitoring get pod | grep vmcluster
vminsert-vmcluster-77886b8dcb-jqpfw                            1/1     Running   0          20s
vminsert-vmcluster-77886b8dcb-l5wrg                            1/1     Running   0          20s
vmselect-vmcluster-0                                           1/1     Running   0          20s
vmselect-vmcluster-1                                           1/1     Running   0          20s
vmstorage-vmcluster-0                                          1/1     Running   0          20s
vmstorage-vmcluster-1                                          1/1     Running   0          20s

安装 VMAlertmanager 与 VMAlert

准备 vmalertmanager.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlertmanager
metadata:
  name: vmalertmanager
  namespace: monitoring
spec:
  replicaCount: 1
  selectAllByDefault: true

安装 VMAlertmanager:

$ kubectl apply -f vmalertmanager.yaml
vmalertmanager.operator.victoriametrics.com/vmalertmanager created

准备 vmalert.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAlert
metadata:
  name: vmalert
  namespace: monitoring
spec:
  replicaCount: 1
  selectAllByDefault: true
  notifier:
    url: http://vmalertmanager-vmalertmanager:9093
  resources:
    requests:
      cpu: 10m
      memory: 10Mi
  remoteWrite:
    url: http://vminsert-vmcluster:8480/insert/0/prometheus/
  remoteRead:
    url: http://vmselect-vmcluster:8481/select/0/prometheus/
  datasource:
    url: http://vmselect-vmcluster:8481/select/0/prometheus/

安装 VMAlert:

$ kubectl apply -f vmalert.yaml
vmalert.operator.victoriametrics.com/vmalert created

检查组件是否启动成功:

$ kubectl -n monitoring get pod | grep vmalert
vmalert-vmalert-5987fb9d5f-9wt6l                               2/2     Running   0          20s
vmalertmanager-vmalertmanager-0                                2/2     Running   0          40s

安装 VMAgent

vmagent 用于采集监控数据并发送给 VictoriaMetrics 进行存储,对于腾讯云容器服务上的容器监控数据采集,需要用自定义的 additionalScrapeConfigs 配置,准备自定义采集规则配置文件 scrape-config.yaml:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: additional-scrape-configs
  namespace: monitoring
stringData:
  additional-scrape-configs.yaml: |-
    - job_name: "tke-cadvisor"
      scheme: https
      metrics_path: /metrics/cadvisor
      tls_config:
        insecure_skip_verify: true
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_instance_type]
        regex: eklet
        action: drop
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: "tke-kubelet"
      scheme: https
      metrics_path: /metrics
      tls_config:
        insecure_skip_verify: true
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_instance_type]
        regex: eklet
        action: drop
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: "tke-probes"
      scheme: https
      metrics_path: /metrics/probes
      tls_config:
        insecure_skip_verify: true
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__meta_kubernetes_node_label_node_kubernetes_io_instance_type]
        regex: eklet
        action: drop
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: eks
      honor_timestamps: true
      metrics_path: '/metrics'
      params:
        collect[]: ['ipvs']
        # - 'cpu'
        # - 'meminfo'
        # - 'diskstats'
        # - 'filesystem'
        # - 'load0vg'
        # - 'netdev'
        # - 'filefd'
        # - 'pressure'
        # - 'vmstat'
      scheme: http
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_tke_cloud_tencent_com_pod_type]
        regex: eklet
        action: keep
      - source_labels: [__meta_kubernetes_pod_phase]
        regex: Running
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        separator: ;
        regex: (.*)
        target_label: __address__
        replacement: ${1}:9100
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: ${1}
        action: replace
      metric_relabel_configs:
      - source_labels: [__name__]
        separator: ;
        regex: (container_.*|pod_.*|kubelet_.*)
        replacement: $1
        action: keep    

再准备 vmagent.yaml:

apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: vmagent
  namespace: monitoring
spec:
  selectAllByDefault: true
  additionalScrapeConfigs:
    key: additional-scrape-configs.yaml
    name: additional-scrape-configs
  resources:
    requests:
      cpu: 10m
      memory: 10Mi
  replicaCount: 1
  remoteWrite:
  - url: "http://vminsert-vmcluster:8480/insert/0/prometheus/api/v1/write"

安装:

$ kubectl apply -f scrape-config.yaml
secret/additional-scrape-configs created
$ kubectl apply -f vmagent.yaml
vmagent.operator.victoriametrics.com/vmagent created

检查组件是否启动成功:

$ kubectl -n monitoring get pod | grep vmagent
vmagent-vmagent-cf9bbdbb4-tm4w9                                2/2     Running   0          20s
vmagent-vmagent-cf9bbdbb4-ija8r                                2/2     Running   0          20s

配置 Grafana

添加数据源

VictoriaMetrics 兼容 Prometheus在 Grafana 添加数据源时,使用 Prometheus 类型,如果 Grafana 跟 VictoriaMetrics 安装在同一集群中,可以使用 service 地址,如:

http://vmselect-vmcluster:8481/select/0/prometheus/

添加 Dashboard

VictoriaMetrics 官方提供了几个 Grafana Dashboardid 分别是:

  1. 11176
  2. 12683
  3. 14205

可以将其导入 Grafana: