update heapster v1.5.1

pull/131/head
jmgao 2018-03-07 14:03:49 +08:00
parent b19600963d
commit 83285d0af6
5 changed files with 292 additions and 166 deletions


@ -0,0 +1,92 @@
## heapster
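`Heapster` monitors cluster-wide resource usage as follows: first, the cAdvisor built into kubelet collects container resource usage on its own node; next, heapster pulls node and container resource usage from the API exposed by kubelet; finally, heapster persists the data to `influxdb` (other storage backends, such as Google Cloud Monitoring, can also be used).
`Grafana` then displays the monitoring data graphically by using the `influxdb` above as its configured data source.
### Deployment
Visit the [heapster release](https://github.com/kubernetes/heapster) page and download the latest release, 1.4.3; see the directory `heapster-1.3.0/deploy/kube-config/influxdb` for reference. Because this official release still has quite a few problems on k8s 1.8.4, use the yaml files provided by this project, with the official ones as a reference: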
1. [grafana](../../manifests/heapster/grafana.yaml)
1. [heapster](../../manifests/heapster/heapster.yaml)
1. [influxdb](../../manifests/heapster/influxdb.yaml)
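Installation is straightforward: `kubectl create -f /etc/ansible/manifests/heapster/`. The notes below cover the main points to watch.
#### grafana.yaml configuration
+ Change the `heapster-grafana-amd64` image from v4.2.0 to v4.4.3; otherwise the grafana pod fails to start and reports a `CrashLoopBackOff` error. See the [ISSUE](https://github.com/kubernetes/heapster/issues/1806) for details.
+ The value of `- name: GF_SERVER_ROOT_URL` depends on how grafana will be accessed later: for NodePort access it must be `value: /`; for apiserver proxy access it must be `value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy/`. Note that the default in the official file, `value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/`, no longer works with k8s 1.8.0.
+ `kubernetes.io/cluster-service: 'true'` and `type: NodePort` are set according to the access method chosen above. The apiserver method is recommended because it allows additional security controls.
#### heapster.yaml configuration
+ RBAC must be configured to bind the ServiceAccount `heapster` to the cluster's predefined ClusterRole `system:heapster`, so that the heapster pod has the permissions it needs to access the apiserver.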
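The binding described above could also be created with a single `kubectl` command. This is only a minimal sketch: the binding name `heapster` and the `kube-system` namespace are assumptions, and the project's heapster.yaml may already declare an equivalent ClusterRoleBinding. The command runs only when `APPLY=1` is set.

```bash
# Sketch: bind ServiceAccount "heapster" to the predefined ClusterRole "system:heapster".
NAMESPACE="kube-system"          # assumption: heapster runs in kube-system
SERVICE_ACCOUNT="heapster"
CLUSTER_ROLE="system:heapster"
BINDING_NAME="heapster"          # hypothetical binding name

bind_heapster_role() {
  kubectl create clusterrolebinding "$BINDING_NAME" \
    --clusterrole="$CLUSTER_ROLE" \
    --serviceaccount="$NAMESPACE:$SERVICE_ACCOUNT"
}

# Only touch the cluster when explicitly asked to.
if [ "${APPLY:-0}" = "1" ]; then
  bind_heapster_role
fi
```

Keeping the binding in the yaml manifest is usually preferable for repeatable deployments; the command form is mainly useful for quick experiments.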
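#### influxdb.yaml configuration
+ influxdb officially recommends querying the database through the command line or the HTTP API, and the admin UI has been disabled by default since v1.1.0. Following the method given by [opsnull](https://github.com/opsnull/follow-me-install-kubernetes-cluster/blob/master/10-%E9%83%A8%E7%BD%B2Heapster%E6%8F%92%E4%BB%B6.md), a ConfigMap is added and mounted into the container to override the default configuration.
+ Note: with this influxdb version, the admin UI can connect to the database correctly only when it is accessed through NodePort.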
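As an alternative to the admin UI, the HTTP API mentioned above can be queried with `curl`. The sketch below assumes the influxdb HTTP port (8086) is reachable at `INFLUX_URL`; that address is a placeholder, not something defined by this project. The query runs only when `RUN=1` is set.

```bash
# Sketch: query influxdb through its HTTP API instead of the admin UI.
INFLUX_URL="${INFLUX_URL:-http://127.0.0.1:8086}"   # assumption: adjust to your environment

influx_query() {
  # $1: InfluxQL statement, sent as the URL-encoded "q" parameter of /query
  curl -sS -G "$INFLUX_URL/query" --data-urlencode "q=$1"
}

if [ "${RUN:-0}" = "1" ]; then
  influx_query "SHOW DATABASES"
fi
```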
### Verification
``` bash
$ kubectl get pods -n kube-system | grep -E 'heapster|monitoring'
heapster-3273315324-tmxbg 1/1 Running 0 11m
monitoring-grafana-2255110352-94lpn 1/1 Running 0 11m
monitoring-influxdb-884893134-3vb6n 1/1 Running 0 11m
```
As a further check, inspect the pod logs:
``` bash
$ kubectl logs heapster-3273315324-tmxbg -n kube-system
$ kubectl logs monitoring-grafana-2255110352-94lpn -n kube-system
$ kubectl logs monitoring-influxdb-884893134-3vb6n -n kube-system
```
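After deploying heapster, open the kubernetes dashboard as described in the previous step; it should now show utilization graphs for CPU, memory, load and so on for each Node and Pod. If the dashboard still does not show the utilization graphs, restart the dashboard pod with the following commands:
+ First scale it down: `kubectl scale deploy kubernetes-dashboard --replicas=0 -n kube-system`
+ Then scale it back up: `kubectl scale deploy kubernetes-dashboard --replicas=1 -n kube-system`
### Accessing grafana
#### 1. Access through the apiserver (recommended)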
``` bash
kubectl cluster-info | grep grafana
monitoring-grafana is running at https://x.x.x.x:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
```
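Following the same approach as in the previous step, [accessing the dashboard](dashboard.md), authenticate with a certificate or password and open `https://x.x.x.x:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy`. As shown in the figure, you can click [Home] and choose the `Cluster` or `Pods` monitoring graphs.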
![grafana](../../pics/grafana.png)
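To make the shape of the apiserver proxy URL explicit, here is a small sketch that assembles it from its parts. The helper name `grafana_proxy_url` is hypothetical; the path pattern is the one shown in the `kubectl cluster-info` output above.

```bash
# Sketch: compose the apiserver proxy URL for a service.
grafana_proxy_url() {
  # $1: apiserver address (host:port)  $2: namespace  $3: service name
  printf 'https://%s/api/v1/namespaces/%s/services/%s/proxy\n' "$1" "$2" "$3"
}

grafana_proxy_url "x.x.x.x:6443" "kube-system" "monitoring-grafana"
# prints https://x.x.x.x:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
```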
#### 2. Access through NodePort
+ Modify the `Service` to enable `type: NodePort`
+ In the `Deployment`, change the parameter `- name: GF_SERVER_ROOT_URL` to `value: /`
+ If grafana is already running, restart the grafana add-on with `kubectl replace --force -f /etc/ansible/manifests/heapster/grafana.yaml`
``` bash
kubectl get svc -n kube-system|grep grafana
monitoring-grafana NodePort 10.68.135.50 <none> 80:5855/TCP 11m
```
Then open http://NodeIP:5855 in a browser.
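Rather than reading the port from the table by eye, the assigned node port can be extracted with a jsonpath query. This is a sketch that assumes the service exposes a single port, so index `0` is the grafana port; `NODE_IP` is a placeholder you must supply. It contacts the cluster only when `RUN=1` is set.

```bash
# Sketch: look up the NodePort assigned to the grafana service.
grafana_node_port() {
  kubectl get svc monitoring-grafana -n kube-system \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

if [ "${RUN:-0}" = "1" ]; then
  echo "http://${NODE_IP:-NodeIP}:$(grafana_node_port)"
fi
```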
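### Accessing influxdb
The official recommendation is to query the `influxdb` database through the command line or the HTTP API; skip this step unless you need it.
In testing so far on k8s v1.8.4, the admin UI can connect to the database properly only when it is accessed through NodePort.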
``` bash
kubectl get svc -n kube-system|grep influxdb
monitoring-influxdb NodePort 10.68.195.193 <none> 8086:3382/TCP,8083:7651/TCP 12h
```
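+ In the example above, 8083 is the admin page port, exposed externally as 7651
+ 8086 is the data connection port, exposed externally as 3382

Open http://NodeIP:7651 in a browser. As shown in the figure, under “Connection Settings” enter the node IP in Host and 3382 (the externally exposed port for 8086) in Port, then click “Save”.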
![influxdb](../../pics/influxdb.png)
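The port mapping above can also be read mechanically from the `PORT(S)` column. The sketch below parses that column as printed by `kubectl get svc`; the helper name `node_port_for` is hypothetical and the sample string is copied from the example output.

```bash
# Sketch: find the node port that a given service port is exposed on.
node_port_for() {
  # $1: PORT(S) column, e.g. "8086:3382/TCP,8083:7651/TCP"
  # $2: service port to look up
  echo "$1" | tr ',' '\n' | awk -F'[:/]' -v p="$2" '$1 == p { print $2 }'
}

PORTS="8086:3382/TCP,8083:7651/TCP"
echo "admin UI node port: $(node_port_for "$PORTS" 8083)"   # prints 7651
echo "data node port: $(node_port_for "$PORTS" 8086)"       # prints 3382
```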
[Previous](dashboard.md) -- [Contents](index.md) -- [Next](ingress.md)


@ -1,12 +1,14 @@
## heapster ## heapster
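This document is based on heapster 1.5.1 and k8s 1.9.x; for the older documentation see [heapster 1.4.3](heapster.1.4.3.md)
`Heapster` monitors cluster-wide resource usage as follows: first, the cAdvisor built into kubelet collects container resource usage on its own node; next, heapster pulls node and container resource usage from the API exposed by kubelet; finally, heapster persists the data to `influxdb` (other storage backends, such as Google Cloud Monitoring, can also be used). `Heapster` monitors cluster-wide resource usage as follows: first, the cAdvisor built into kubelet collects container resource usage on its own node; next, heapster pulls node and container resource usage from the API exposed by kubelet; finally, heapster persists the data to `influxdb` (other storage backends, such as Google Cloud Monitoring, can also be used).
`Grafana` then displays the monitoring data graphically by using the `influxdb` above as its configured data source. `Grafana` then displays the monitoring data graphically by using the `influxdb` above as its configured data source.
### Deployment ### Deployment
Visit the [heapster release](https://github.com/kubernetes/heapster) page and download the latest release, 1.4.3; see the directory `heapster-1.3.0/deploy/kube-config/influxdb` for reference. Because this official release still has quite a few problems on k8s 1.8.4, use the yaml files provided by this project, with the official ones as a reference: Visit the [heapster release](https://github.com/kubernetes/heapster) page and download the latest release, 1.5.1; see the directory `heapster-1.5.1/deploy/kube-config/influxdb` for reference. Use the yaml files provided by this project, with the official yaml files as a reference: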
1. [grafana](../../manifests/heapster/grafana.yaml) 1. [grafana](../../manifests/heapster/grafana.yaml)
1. [heapster](../../manifests/heapster/heapster.yaml) 1. [heapster](../../manifests/heapster/heapster.yaml)
@ -16,8 +18,7 @@
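#### grafana.yaml configuration #### grafana.yaml configuration
+ Change the `heapster-grafana-amd64` image from v4.2.0 to v4.4.3; otherwise the grafana pod fails to start and reports a `CrashLoopBackOff` error. See the [ISSUE](https://github.com/kubernetes/heapster/issues/1806) for details. + The value of `- name: GF_SERVER_ROOT_URL` depends on how grafana will be accessed later: for NodePort access it must be `value: /`; for apiserver proxy access it must be `value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy/`
+ The value of `- name: GF_SERVER_ROOT_URL` depends on how grafana will be accessed later: for NodePort access it must be `value: /`; for apiserver proxy access it must be `value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy/`. Note that the default in the official file, `value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/`, no longer works with k8s 1.8.0.
+ `kubernetes.io/cluster-service: 'true'` and `type: NodePort` are set according to the access method chosen above. The apiserver method is recommended because it allows additional security controls. + `kubernetes.io/cluster-service: 'true'` and `type: NodePort` are set according to the access method chosen above. The apiserver method is recommended because it allows additional security controls.
#### heapster.yaml configuration #### heapster.yaml configuration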
@ -26,8 +27,8 @@
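#### influxdb.yaml configuration #### influxdb.yaml configuration
+ influxdb officially recommends querying the database through the command line or the HTTP API, and the admin UI has been disabled by default since v1.1.0. Following the method given by [opsnull](https://github.com/opsnull/follow-me-install-kubernetes-cluster/blob/master/10-%E9%83%A8%E7%BD%B2Heapster%E6%8F%92%E4%BB%B6.md), a ConfigMap is added and mounted into the container to override the default configuration. + influxdb officially recommends querying the database through the command line or the HTTP API. The admin UI has been disabled by default since v1.1.0, and the admin UI plugin was removed starting with v1.3.3. If you need the admin UI for a special reason, use v1.1.1 and enable it through a configMap; see [heapster 1.4.3](heapster.1.4.3.md), and for the concrete yaml file see [influxdb v1.1.1](../../manifests/heapster/influxdb-v1.1.1/influxdb.yaml)
+ Note: with this influxdb version, the admin UI can connect to the database correctly only when it is accessed through NodePort.
### Verification ### Verification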
@ -71,22 +72,6 @@ monitoring-grafana NodePort 10.68.135.50 <none> 80:5855/TCP
``` ```
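Then open http://NodeIP:5855 in a browser. Then open http://NodeIP:5855 in a browser.
### Accessing influxdb
The official recommendation is to query the `influxdb` database through the command line or the HTTP API; skip this step unless you need it.
In testing so far on k8s v1.8.4, the admin UI can connect to the database properly only when it is accessed through NodePort.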
``` bash
kubectl get svc -n kube-system|grep influxdb
monitoring-influxdb NodePort 10.68.195.193 <none> 8086:3382/TCP,8083:7651/TCP 12h
```
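+ In the example above, 8083 is the admin page port, exposed externally as 7651
+ 8086 is the data connection port, exposed externally as 3382
Open http://NodeIP:7651 in a browser. As shown in the figure, under “Connection Settings” enter the node IP in Host and 3382 (the externally exposed port for 8086) in Port, then click “Save”.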
![influxdb](../../pics/influxdb.png)
[Previous](dashboard.md) -- [Contents](index.md) -- [Next](ingress.md) [Previous](dashboard.md) -- [Contents](index.md) -- [Next](ingress.md)


@ -39,8 +39,8 @@ spec:
serviceAccountName: heapster serviceAccountName: heapster
containers: containers:
- name: heapster - name: heapster
#image: gcr.io/google_containers/heapster-amd64:v1.3.0 #image: gcr.io/google_containers/heapster-amd64:v1.5.1
image: mirrorgooglecontainers/heapster-amd64:v1.3.0 image: mirrorgooglecontainers/heapster-amd64:v1.5.1
imagePullPolicy: IfNotPresent imagePullPolicy: IfNotPresent
command: command:
- /heapster - /heapster


@ -0,0 +1,190 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: monitoring-influxdb
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
k8s-app: influxdb
template:
metadata:
labels:
task: monitoring
k8s-app: influxdb
spec:
containers:
- name: influxdb
#image: gcr.io/google_containers/heapster-influxdb-amd64:v1.1.1
image: mirrorgooglecontainers/heapster-influxdb-amd64:v1.1.1
volumeMounts:
- mountPath: /data
name: influxdb-storage
- mountPath: /etc/
name: influxdb-config
volumes:
- name: influxdb-storage
emptyDir: {}
- name: influxdb-config
configMap:
name: influxdb-config
---
apiVersion: v1
kind: Service
metadata:
labels:
task: monitoring
# For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
# If you are NOT using this as an addon, you should comment out this line.
# kubernetes.io/cluster-service: 'true'
kubernetes.io/name: monitoring-influxdb
name: monitoring-influxdb
namespace: kube-system
spec:
type: NodePort
ports:
- port: 8086
targetPort: 8086
name: http
- port: 8083
targetPort: 8083
name: admin
selector:
k8s-app: influxdb
---
apiVersion: v1
kind: ConfigMap
metadata:
name: influxdb-config
namespace: kube-system
data:
config.toml: |
reporting-disabled = true
bind-address = ":8088"
[meta]
dir = "/data/meta"
retention-autocreate = true
logging-enabled = true
[data]
dir = "/data/data"
wal-dir = "/data/wal"
query-log-enabled = true
cache-max-memory-size = 1073741824
cache-snapshot-memory-size = 26214400
cache-snapshot-write-cold-duration = "10m0s"
compact-full-write-cold-duration = "4h0m0s"
max-series-per-database = 1000000
max-values-per-tag = 100000
trace-logging-enabled = false
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"
log-queries-after = "0s"
max-select-point = 0
max-select-series = 0
max-select-buckets = 0
[retention]
enabled = true
check-interval = "30m0s"
[admin]
enabled = true
bind-address = ":8083"
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
[shard-precreation]
enabled = true
check-interval = "10m0s"
advance-period = "30m0s"
[monitor]
store-enabled = true
store-database = "_internal"
store-interval = "10s"
[subscriber]
enabled = true
http-timeout = "30s"
insecure-skip-verify = false
ca-certs = ""
write-concurrency = 40
write-buffer-size = 1000
[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = true
write-tracing = false
pprof-enabled = false
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
https-private-key = ""
max-row-limit = 10000
max-connection-limit = 0
shared-secret = ""
realm = "InfluxDB"
unix-socket-enabled = false
bind-socket = "/var/run/influxdb.sock"
[[graphite]]
enabled = false
bind-address = ":2003"
database = "graphite"
retention-policy = ""
protocol = "tcp"
batch-size = 5000
batch-pending = 10
batch-timeout = "1s"
consistency-level = "one"
separator = "."
udp-read-buffer = 0
[[collectd]]
enabled = false
bind-address = ":25826"
database = "collectd"
retention-policy = ""
batch-size = 5000
batch-pending = 10
batch-timeout = "10s"
read-buffer = 0
typesdb = "/usr/share/collectd/types.db"
[[opentsdb]]
enabled = false
bind-address = ":4242"
database = "opentsdb"
retention-policy = ""
consistency-level = "one"
tls-enabled = false
certificate = "/etc/ssl/influxdb.pem"
batch-size = 1000
batch-pending = 5
batch-timeout = "1s"
log-point-errors = true
[[udp]]
enabled = false
bind-address = ":8089"
database = "udp"
retention-policy = ""
batch-size = 5000
batch-pending = 10
read-buffer = 0
batch-timeout = "1s"
precision = ""
[continuous_queries]
log-enabled = true
enabled = true
run-interval = "1s"


@ -17,19 +17,14 @@ spec:
spec: spec:
containers: containers:
- name: influxdb - name: influxdb
#image: gcr.io/google_containers/heapster-influxdb-amd64:v1.1.1 #image: gcr.io/google_containers/heapster-influxdb-amd64:v1.3.3
image: mirrorgooglecontainers/heapster-influxdb-amd64:v1.1.1 image: mirrorgooglecontainers/heapster-influxdb-amd64:v1.3.3
volumeMounts: volumeMounts:
- mountPath: /data - mountPath: /data
name: influxdb-storage name: influxdb-storage
- mountPath: /etc/
name: influxdb-config
volumes: volumes:
- name: influxdb-storage - name: influxdb-storage
emptyDir: {} emptyDir: {}
- name: influxdb-config
configMap:
name: influxdb-config
--- ---
apiVersion: v1 apiVersion: v1
kind: Service kind: Service
@ -48,143 +43,7 @@ spec:
- port: 8086 - port: 8086
targetPort: 8086 targetPort: 8086
name: http name: http
- port: 8083
targetPort: 8083
name: admin
selector: selector:
k8s-app: influxdb k8s-app: influxdb
--- ---
apiVersion: v1
kind: ConfigMap
metadata:
name: influxdb-config
namespace: kube-system
data:
config.toml: |
reporting-disabled = true
bind-address = ":8088"
[meta]
dir = "/data/meta"
retention-autocreate = true
logging-enabled = true
[data]
dir = "/data/data"
wal-dir = "/data/wal"
query-log-enabled = true
cache-max-memory-size = 1073741824
cache-snapshot-memory-size = 26214400
cache-snapshot-write-cold-duration = "10m0s"
compact-full-write-cold-duration = "4h0m0s"
max-series-per-database = 1000000
max-values-per-tag = 100000
trace-logging-enabled = false
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"
log-queries-after = "0s"
max-select-point = 0
max-select-series = 0
max-select-buckets = 0
[retention]
enabled = true
check-interval = "30m0s"
[admin]
enabled = true
bind-address = ":8083"
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
[shard-precreation]
enabled = true
check-interval = "10m0s"
advance-period = "30m0s"
[monitor]
store-enabled = true
store-database = "_internal"
store-interval = "10s"
[subscriber]
enabled = true
http-timeout = "30s"
insecure-skip-verify = false
ca-certs = ""
write-concurrency = 40
write-buffer-size = 1000
[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = true
write-tracing = false
pprof-enabled = false
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
https-private-key = ""
max-row-limit = 10000
max-connection-limit = 0
shared-secret = ""
realm = "InfluxDB"
unix-socket-enabled = false
bind-socket = "/var/run/influxdb.sock"
[[graphite]]
enabled = false
bind-address = ":2003"
database = "graphite"
retention-policy = ""
protocol = "tcp"
batch-size = 5000
batch-pending = 10
batch-timeout = "1s"
consistency-level = "one"
separator = "."
udp-read-buffer = 0
[[collectd]]
enabled = false
bind-address = ":25826"
database = "collectd"
retention-policy = ""
batch-size = 5000
batch-pending = 10
batch-timeout = "10s"
read-buffer = 0
typesdb = "/usr/share/collectd/types.db"
[[opentsdb]]
enabled = false
bind-address = ":4242"
database = "opentsdb"
retention-policy = ""
consistency-level = "one"
tls-enabled = false
certificate = "/etc/ssl/influxdb.pem"
batch-size = 1000
batch-pending = 5
batch-timeout = "1s"
log-point-errors = true
[[udp]]
enabled = false
bind-address = ":8089"
database = "udp"
retention-policy = ""
batch-size = 5000
batch-pending = 10
read-buffer = 0
batch-timeout = "1s"
precision = ""
[continuous_queries]
log-enabled = true
enabled = true
run-interval = "1s"