Add cluster and application monitoring chapter

2017-11-09 10:55:58 +08:00 · 2017-11-09 10:55:58 +08:00 · 7ab2cdf30d
parent 0b1fde236d
commit 7ab2cdf30d
6 changed files with 537 additions and 11 deletions
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -24,6 +24,7 @@
    - [2.2.13 CronJob](concepts/cronjob.md)
    - [2.2.14 Ingress](concepts/ingress.md)
    - [2.2.15 ConfigMap](concepts/configmap.md)
+      - [2.2.15.1 ConfigMap hot update](concepts/configmap-hot-update.md)
    - [2.2.16 Horizontal Pod Autoscaling](concepts/horizontal-pod-autoscaling.md)
    - [2.2.17 Label](concepts/label.md)
    - [2.2.18 Garbage collection](concepts/garbage-collection.md)
@@ -79,23 +80,26 @@
     - [4.3.4 Cluster and application monitoring](practice/monitor.md)
     - [4.3.6 Data persistence issues](practice/data-persistence-problem.md)
     - [4.3.7 Managing compute resources for containers](practice/manage-compute-resources-container.md)
-     - [4.3.8 Monitoring a kubernetes cluster with Prometheus](practice/using-prometheus-to-monitor-kuberentes-cluster.md)
-     - [4.3.9 Using Heapster to get metric data for the cluster and objects](practice/using-heapster-to-get-object-metrics.md)
  - [4.4 Storage management](practice/storage.md)
     - [4.4.1 GlusterFS](practice/glusterfs.md)
        - [4.4.1.1 Using GlusterFS for persistent storage](practice/using-glusterfs-for-persistent-storage.md)
        - [4.4.1.2 Using GlusterFS for persistent storage in OpenShift](practice/storage-for-containers-using-glusterfs-with-openshift.md)
     - [4.4.2 CephFS](practice/cephfs.md)
        - [4.4.2.1 Using Ceph for persistent storage](practice/using-ceph-for-persistent-storage.md)
-  - [4.5 Service orchestration management](practice/services-management-tool.md)
-     - [4.5.1 Managing kubernetes applications with Helm](practice/helm.md)
-     - [4.5.2 Building a private Chart repository](practice/create-private-charts-repo.md)
-  - [4.6 Continuous integration and delivery](practice/ci-cd.md)
-     - [4.6.1 Continuous integration and delivery with Jenkins](practice/jenkins-ci-cd.md)
-     - [4.6.2 Continuous integration and delivery with Drone](practice/drone-ci-cd.md)
-  - [4.7 Updates and upgrades](practice/update-and-upgrade.md)
-     - [4.7.1 Manually upgrading a kubernetes cluster](practice/manually-upgrade.md)
-     - [4.7.2 Upgrading the dashboard](practice/dashboard-upgrade.md)
+  - [4.5 Cluster and application monitoring](practice/monitoring.md)
+     - [4.5.1 Heapster](practice/heapster.md)
+        - [4.5.1.1 Using Heapster to get metric data for the cluster and objects](practice/using-heapster-to-get-object-metrics.md)
+     - [4.5.2 Prometheus](practice/prometheus.md)
+        - [4.5.2.1 Monitoring a kubernetes cluster with Prometheus](practice/using-prometheus-to-monitor-kuberentes-cluster.md)
+  - [4.6 Service orchestration management](practice/services-management-tool.md)
+     - [4.6.1 Managing kubernetes applications with Helm](practice/helm.md)
+     - [4.6.2 Building a private Chart repository](practice/create-private-charts-repo.md)
+  - [4.7 Continuous integration and delivery](practice/ci-cd.md)
+     - [4.7.1 Continuous integration and delivery with Jenkins](practice/jenkins-ci-cd.md)
+     - [4.7.2 Continuous integration and delivery with Drone](practice/drone-ci-cd.md)
+  - [4.8 Updates and upgrades](practice/update-and-upgrade.md)
+     - [4.8.1 Manually upgrading a kubernetes cluster](practice/manually-upgrade.md)
+     - [4.8.2 Upgrading the dashboard](practice/dashboard-upgrade.md)
 - [5. Domain applications](usecases/index.md)
  - [5.1 Microservice architecture](usecases/microservices.md)
    - [5.1.1 Service discovery in microservices](usecases/service-discovery-in-microservices.md)
--- a/concepts/configmap-hot-update.md
+++ b/concepts/configmap-hot-update.md
@@ -0,0 +1,430 @@
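+# ConfigMap Hot Update
+
+A ConfigMap is the kubernetes resource object used to store configuration files, and all of its content is stored in etcd. This article explores how ConfigMaps are created and updated. It also tests whether content mounted inside a container is updated after the ConfigMap changes.
+
+## Example
+
+Suppose we have a ConfigMap named `nginx-config` in the `default` namespace. We can fetch it with the `kubectl` command: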
+
+```bash
+$ kubectl get configmap nginx-config
+NAME           DATA      AGE
+nginx-config   1         99d
+```
+
+Get the content of the ConfigMap:
+
+```bash
+kubectl get configmap nginx-config -o yaml
+```
+
+```yaml
+apiVersion: v1
+data:
+  nginx.conf: |-
+    worker_processes 1;
+
+    events { worker_connections 1024; }
+
+    http {
+        sendfile on;
+
+        server {
+            listen 80;
+
+            # a test endpoint that returns http 200s
+            location / {
+                proxy_pass http://httpstat.us/200;
+                proxy_set_header  X-Real-IP  $remote_addr;
+            }
+        }
+
+        server {
+
+            listen 80;
+            server_name api.hello.world;
+
+            location / {
+                proxy_pass http://l5d.default.svc.cluster.local;
+                proxy_set_header Host $host;
+                proxy_set_header Connection "";
+                proxy_http_version 1.1;
+
+                more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
+            }
+        }
+
+        server {
+
+            listen 80;
+            server_name www.hello.world;
+
+            location / {
+
+
+                # allow 'employees' to perform dtab overrides
+                if ($cookie_special_employee_cookie != "letmein") {
+                  more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
+                }
+
+                # add a dtab override to get people to our beta, world-v2
+                set $xheader "";
+
+                if ($cookie_special_employee_cookie ~* "dogfood") {
+                  set $xheader "/host/world => /srv/world-v2;";
+                }
+
+                proxy_set_header 'l5d-dtab' $xheader;
+
+
+                proxy_pass http://l5d.default.svc.cluster.local;
+                proxy_set_header Host $host;
+                proxy_set_header Connection "";
+                proxy_http_version 1.1;
+            }
+        }
+    }
+kind: ConfigMap
+metadata:
+  creationTimestamp: 2017-08-01T06:53:17Z
+  name: nginx-config
+  namespace: default
+  resourceVersion: "14925806"
+  selfLink: /api/v1/namespaces/default/configmaps/nginx-config
+  uid: 18d70527-7686-11e7-bfbd-8af1e3a7c5bd
+```
+
+The ConfigMap content is stored in etcd, so we can query etcd directly:
+
+```bash
+$ ETCDCTL_API=3 etcdctl get /registry/configmaps/default/nginx-config
+/registry/configmaps/default/nginx-config
+```
+
+Note that the v3 etcdctl API is used. The output is:
+
+```text
+k8s
+
+v1	ConfigMap
+
+T
+
+nginx-configdefault"*$18d70527-7686-11e7-bfbd-8af1e3a7c5bd28B
+
+
+nginx.conf
+worker_processes 1;
+
+events { worker_connections 1024; }
+
+http {
+    sendfile on;
+
+    server {
+        listen 80;
+
+        # a test endpoint that returns http 200s
+        location / {
+            proxy_pass http://httpstat.us/200;
+            proxy_set_header  X-Real-IP  $remote_addr;
+        }
+    }
+
+    server {
+
+        listen 80;
+        server_name api.hello.world;
+
+        location / {
+            proxy_pass http://l5d.default.svc.cluster.local;
+            proxy_set_header Host $host;
+            proxy_set_header Connection "";
+            proxy_http_version 1.1;
+
+            more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
+        }
+    }
+
+    server {
+
+        listen 80;
+        server_name www.hello.world;
+
+        location / {
+
+
+            # allow 'employees' to perform dtab overrides
+            if ($cookie_special_employee_cookie != "letmein") {
+              more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
+            }
+
+            # add a dtab override to get people to our beta, world-v2
+            set $xheader "";
+
+            if ($cookie_special_employee_cookie ~* "dogfood") {
+              set $xheader "/host/world => /srv/world-v2;";
+            }
+
+            proxy_set_header 'l5d-dtab' $xheader;
+
+
+            proxy_pass http://l5d.default.svc.cluster.local;
+            proxy_set_header Host $host;
+            proxy_set_header Connection "";
+            proxy_http_version 1.1;
+        }
+    }
+}"
+```
+
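+Compared with the `nginx.conf` file itself, the stored value carries an extra header that kubernetes adds. Kubernetes stores objects in etcd in protobuf encoding, so the value starts with the `k8s` magic prefix and is followed by binary type and metadata fields. This is why parts of the output are unreadable.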
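The key path and the `k8s` header above can be illustrated with a small Go sketch. It assumes the kube-apiserver default etcd prefix `/registry`, which is configurable with `--etcd-prefix`. It also checks for the 4-byte magic `k8s\x00` that the Kubernetes protobuf serializer writes before each encoded object. `storageKey` and `isK8sProtobuf` are hypothetical helpers written for this illustration, not Kubernetes functions.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
)

// k8sProtobufPrefix is the 4-byte magic ("k8s" plus a zero byte) that the
// Kubernetes protobuf serializer writes in front of each encoded object.
var k8sProtobufPrefix = []byte{'k', '8', 's', 0x00}

// storageKey builds the etcd key of a namespaced object (hypothetical helper).
// prefix is normally "/registry" (kube-apiserver --etcd-prefix).
func storageKey(prefix, resource, namespace, name string) string {
	return path.Join(prefix, resource, namespace, name)
}

// isK8sProtobuf reports whether a raw etcd value starts with the magic prefix.
func isK8sProtobuf(value []byte) bool {
	return bytes.HasPrefix(value, k8sProtobufPrefix)
}

func main() {
	fmt.Println(storageKey("/registry", "configmaps", "default", "nginx-config")) // /registry/configmaps/default/nginx-config
	// Made-up values for illustration: a protobuf-style value and a JSON one.
	fmt.Println(isK8sProtobuf([]byte("k8s\x00\n\x0f")))              // true
	fmt.Println(isK8sProtobuf([]byte(`{"kind":"ConfigMap"}`)))       // false
}
```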
+
+## Code
+
+Definition of the ConfigMap struct:
+
+```go
+// ConfigMap holds configuration data for pods to consume.
+type ConfigMap struct {
+	metav1.TypeMeta `json:",inline"`
+	// Standard object's metadata.
+	// More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata
+	// +optional
+	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
+
+	// Data contains the configuration data.
+	// Each key must be a valid DNS_SUBDOMAIN with an optional leading dot.
+	// +optional
+	Data map[string]string `json:"data,omitempty" protobuf:"bytes,2,rep,name=data"`
+}
+```
+
+The ConfigMap interface is defined in `staging/src/k8s.io/client-go/kubernetes/typed/core/v1/configmap.go`:
+
+```go
+// ConfigMapInterface has methods to work with ConfigMap resources.
+type ConfigMapInterface interface {
+	Create(*v1.ConfigMap) (*v1.ConfigMap, error)
+	Update(*v1.ConfigMap) (*v1.ConfigMap, error)
+	Delete(name string, options *meta_v1.DeleteOptions) error
+	DeleteCollection(options *meta_v1.DeleteOptions, listOptions meta_v1.ListOptions) error
+	Get(name string, options meta_v1.GetOptions) (*v1.ConfigMap, error)
+	List(opts meta_v1.ListOptions) (*v1.ConfigMapList, error)
+	Watch(opts meta_v1.ListOptions) (watch.Interface, error)
+	Patch(name string, pt types.PatchType, data []byte, subresources ...string) (result *v1.ConfigMap, err error)
+	ConfigMapExpansion
+}
+```
+
+The same file, `staging/src/k8s.io/client-go/kubernetes/typed/core/v1/configmap.go`, defines the method that creates a ConfigMap:
+
+```go
+// Create takes the representation of a configMap and creates it.  Returns the server's representation of the configMap, and an error, if there is any.
+func (c *configMaps) Create(configMap *v1.ConfigMap) (result *v1.ConfigMap, err error) {
+	result = &v1.ConfigMap{}
+	err = c.client.Post().
+		Namespace(c.ns).
+		Resource("configmaps").
+		Body(configMap).
+		Do().
+		Into(result)
+	return
+}
+```
+
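+The method sends a RESTful POST request to the API server, which stores the ConfigMap in etcd. It sets the namespace and resource of the request and uses the ConfigMap as the HTTP request body. After the request runs, the response is decoded into `result` and returned to the caller.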
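To make the request concrete, here is a minimal Go sketch of what such a Create call sends: a POST to the namespaced `configmaps` collection, with the object serialized as the body. The `configMap` and `objectMeta` types are hypothetical, trimmed-down stand-ins for the real `v1.ConfigMap` and `ObjectMeta`. JSON is used for readability, although client-go can be configured to use protobuf instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configMap is a hypothetical, trimmed-down stand-in for v1.ConfigMap.
type configMap struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   objectMeta        `json:"metadata"`
	Data       map[string]string `json:"data,omitempty"`
}

// objectMeta keeps only the metadata fields needed here.
type objectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// createPath is the core v1 collection path that
// Post().Namespace(ns).Resource("configmaps") targets.
func createPath(namespace string) string {
	return "/api/v1/namespaces/" + namespace + "/configmaps"
}

func main() {
	cm := configMap{
		APIVersion: "v1",
		Kind:       "ConfigMap",
		Metadata:   objectMeta{Name: "env-config", Namespace: "default"},
		Data:       map[string]string{"log_level": "INFO"},
	}
	body, err := json.Marshal(cm)
	if err != nil {
		panic(err)
	}
	fmt.Println("POST", createPath(cm.Metadata.Namespace))
	fmt.Println(string(body))
}
```

The path matches the `selfLink` shown earlier minus the object name, because a Create targets the collection rather than an individual object.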
+
+**Note what Body accepts:**
+
+```go
+// Body makes the request use obj as the body. Optional.
+// If obj is a string, try to read a file of that name.
+// If obj is a []byte, send it directly.
+// If obj is an io.Reader, use it directly.
+// If obj is a runtime.Object, marshal it correctly, and set Content-Type header.
+// If obj is a runtime.Object and nil, do nothing.
+// Otherwise, set an error.
+```
+
+The Body of the RESTful request that creates a ConfigMap contains the `ObjectMeta` and the `namespace`.
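The dispatch described in the `Body` doc comment can be approximated with a Go type switch. `toBody` is a hypothetical helper written for this illustration. It covers the string (file name), `[]byte`, `io.Reader` and nil cases. It leaves out the `runtime.Object` branch, which in client-go also encodes the object and sets the Content-Type header.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// toBody imitates the documented Body dispatch (simplified, hypothetical).
func toBody(obj interface{}) (io.Reader, error) {
	switch v := obj.(type) {
	case string:
		// A string is treated as a file name; the file contents become the body.
		data, err := os.ReadFile(v)
		if err != nil {
			return nil, err
		}
		return bytes.NewReader(data), nil
	case []byte:
		// Raw bytes are sent as-is.
		return bytes.NewReader(v), nil
	case io.Reader:
		// A reader is used directly.
		return v, nil
	case nil:
		// Nothing to send.
		return nil, nil
	default:
		return nil, fmt.Errorf("unknown type used for body: %T", obj)
	}
}

func main() {
	r, _ := toBody([]byte("hello"))
	b, _ := io.ReadAll(r)
	fmt.Println(string(b)) // hello

	r, _ = toBody(strings.NewReader("from a reader"))
	b, _ = io.ReadAll(r)
	fmt.Println(string(b)) // from a reader

	if _, err := toBody(42); err != nil {
		fmt.Println("error:", err) // error: unknown type used for body: int
	}
}
```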
+
+The `Request` struct used to build the HTTP request:
+
+```go
+// Request allows for building up a request to a server in a chained fashion.
+// Any errors are stored until the end of your call, so you only have to
+// check once.
+type Request struct {
+	// required
+	client HTTPClient
+	verb   string
+
+	baseURL     *url.URL
+	content     ContentConfig
+	serializers Serializers
+
+	// generic components accessible via method setters
+	pathPrefix string
+	subpath    string
+	params     url.Values
+	headers    http.Header
+
+	// structural elements of the request that are part of the Kubernetes API conventions
+	namespace    string
+	namespaceSet bool
+	resource     string
+	resourceName string
+	subresource  string
+	timeout      time.Duration
+
+	// output
+	err  error
+	body io.Reader
+
+	// This is only used for per-request timeouts, deadlines, and cancellations.
+	ctx context.Context
+
+	backoffMgr BackoffManager
+	throttle   flowcontrol.RateLimiter
+}
+```
+
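+## Tests
+
+We test two cases separately: a ConfigMap injected as Env, and a ConfigMap mounted as a Volume.
+
+### Updating Env injected from a ConfigMap
+
+Create an nginx container with the configuration below to test whether the environment variables in the container are updated after the ConfigMap is updated.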
+
+```yaml
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: my-nginx
+spec:
+  replicas: 1
+  template:
+    metadata:
+      labels:
+        run: my-nginx
+    spec:
+      containers:
+      - name: my-nginx
+        image: sz-pg-oam-docker-hub-001.tendcloud.com/library/nginx:1.9
+        ports:
+        - containerPort: 80
+        envFrom:
+        - configMapRef:
+            name: env-config
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: env-config
+  namespace: default
+data:
+  log_level: INFO
+```
+
+Get the value of the environment variable:
+
+```bash
+$ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` env|grep log_level
+log_level=INFO
+```
+
+Edit the ConfigMap:
+
+```bash
+$ kubectl edit configmap env-config
+```
+
+Change the value of `log_level` to `DEBUG`.
+
+Check the value of the environment variable again:
+
+```bash
+$ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` env|grep log_level
+log_level=INFO
+```
+
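+This shows that modifying a ConfigMap does not update environment variables that were already injected into a container.
+
+### Updating a Volume mounted from a ConfigMap
+
+Create an nginx container with the configuration below to test whether files mounted in the container are updated after the ConfigMap is updated.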
+
+```yaml
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: my-nginx
+spec:
+  replicas: 1
+  template:
+    metadata:
+      labels:
+        run: my-nginx
+    spec:
+      containers:
+      - name: my-nginx
+        image: sz-pg-oam-docker-hub-001.tendcloud.com/library/nginx:1.9
+        ports:
+        - containerPort: 80
+        volumeMounts:
+        - name: config-volume
+          mountPath: /tmp
+      volumes:
+        - name: config-volume
+          configMap:
+            name: special-config
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: special-config
+  namespace: default
+data:
+  log_level: INFO
+```
+
+```bash
+$ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` cat /tmp/log_level
+INFO
+```
+
+Edit the ConfigMap:
+
+```bash
+$ kubectl edit configmap special-config
+```
+
+Change the value of `log_level` to `DEBUG`.
+
+Wait about 10 seconds, then check the content of the file again:
+
+```bash
+$ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` cat /tmp/log_level
+DEBUG
+```
+
+The file in the Volume mounted from the ConfigMap now contains `DEBUG`.
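Mounted files change in place after a delay, so an application that wants hot reload has to notice the change itself. The following Go sketch polls the file content. The path `/tmp/log_level`, the 2-second interval and the retry count are assumptions chosen to match the test above, and `waitForChange` is a hypothetical helper. Polling content instead of watching file events is a deliberate simplification. Kubelet updates ConfigMap volumes by atomically swapping a `..data` symlink, which simple file-event watchers may not report as expected.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"time"
)

// waitForChange calls read up to maxTries times, sleeping interval between
// attempts, and returns the new content as soon as it differs from old.
func waitForChange(read func() ([]byte, error), old []byte, interval time.Duration, maxTries int) ([]byte, error) {
	for i := 0; i < maxTries; i++ {
		cur, err := read()
		if err == nil && !bytes.Equal(cur, old) {
			return cur, nil
		}
		time.Sleep(interval)
	}
	return nil, errors.New("content did not change")
}

func main() {
	// Hypothetical path, matching mountPath /tmp in the Deployment above.
	const file = "/tmp/log_level"
	readFile := func() ([]byte, error) { return os.ReadFile(file) }

	old, err := readFile()
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	// Poll every 2 seconds for up to about 2 minutes.
	cur, err := waitForChange(readFile, old, 2*time.Second, 60)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("log_level changed: %q -> %q\n", old, cur)
}
```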
+
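+## Summary
+
+After a ConfigMap is updated:
+
+- Env injected from that ConfigMap is **not** updated.
+- Data in a Volume mounted from that ConfigMap is updated, but only after a delay (about 10 seconds in our test).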
+
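+Env is injected when the container starts. After startup, kubernetes does not change the environment variable values, and the environment variables of pods in the same namespace keep accumulating; see [Kubernetes service discovery and environment variable passing between docker containers: a source code exploration](https://jimmysong.io/posts/exploring-kubernetes-env-with-docker/). There are two ways to make containers pick up configuration updated in a ConfigMap. You can trigger a rolling update of the pods, which forces the ConfigMap to be mounted again. Alternatively, after updating the ConfigMap, scale the replicas down to 0 and then back up.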
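A common way to automate this rolling update is to put a hash of the ConfigMap data into an annotation on the pod template. When the data changes, the template changes, and the Deployment rolls out new pods that read the new values. The Go sketch below only computes such a hash. The annotation key `checksum/config` and the helper `configChecksum` are hypothetical. Writing the annotation back (for example with `kubectl patch` or a templating tool) is not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configChecksum returns a stable SHA-256 (hex) over a ConfigMap's data.
// Keys are sorted first because Go map iteration order is random.
func configChecksum(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		// Length-prefix key and value so that {"ab": "c"} and {"a": "bc"} differ.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(data[k]), data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := configChecksum(map[string]string{"log_level": "INFO"})
	after := configChecksum(map[string]string{"log_level": "DEBUG"})
	// Storing this value under a pod template annotation such as the
	// hypothetical "checksum/config" changes the template when data changes.
	fmt.Println(before != after) // true
	fmt.Println(len(before))     // 64
}
```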
+
+## References
+
+- [Kubernetes 1.7 security in practice](https://acotten.com/post/kube17-security)
+- [ConfigMap | kubernetes handbook - jimmysong.io](https://jimmysong.io/kubernetes-handbook/concepts/configmap.html)
+- [Creating a highly available etcd cluster | Kubernetes handbook - jimmysong.io](https://jimmysong.io/kubernetes-handbook/practice/etcd-cluster-installation.html)
+- [Kubernetes service discovery and environment variable passing between docker containers: a source code exploration](https://jimmysong.io/posts/exploring-kubernetes-env-with-docker/)
--- a/manifests/test/configmap-test.yaml
+++ b/manifests/test/configmap-test.yaml
@@ -0,0 +1,31 @@
+apiVersion: extensions/v1beta1
+kind: Deployment
+metadata:
+  name: my-nginx
+spec:
+  replicas: 1
+  template:
+    metadata:
+      labels:
+        run: my-nginx
+    spec:
+      containers:
+      - name: my-nginx
+        image: sz-pg-oam-docker-hub-001.tendcloud.com/library/nginx:1.9
+        ports:
+        - containerPort: 80
+        volumeMounts:
+        - name: config-volume
+          mountPath: /tmp
+      volumes:
+        - name: config-volume
+          configMap:
+            name: special-config
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: special-config
+  namespace: default
+data:
+  log_level: WARN
--- a/practice/heapster.md
+++ b/practice/heapster.md
@@ -0,0 +1,5 @@
+# Heapster
+
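+Heapster is one of the add-ons installed by default when installing kubernetes; see [Installing the heapster add-on](heapster-addon-installation.md). It is very useful for cluster monitoring, and [Horizontal Pod Autoscaling](../concepts/horizontal-pod-autoscaling.md) also relies on it. HPA uses Heapster as its `Resource Metrics API` and fetches metrics from it. One way to set this up is to configure `--api-server` in `kube-controller-manager` to point to [kube-aggregator](https://github.com/kubernetes/kube-aggregator). Alternatively, heapster can serve the API itself when it is started with `--api-server=true`.
+
+Heapster collects cAdvisor data from each Node. It can also aggregate resources by kubernetes resource type, such as Pod and Namespace, and report CPU, memory, network and disk metrics for each. The default metric aggregation interval is 1 minute.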
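As a rough illustration of aggregating by resource type, the Go sketch below sums per-container samples into per-pod and per-namespace totals for one aggregation window. The `sample` and `usage` types and the `aggregate` function are hypothetical. This is not Heapster's actual pipeline; it only shows container-level data being rolled up through the Pod and Namespace levels.

```go
package main

import "fmt"

// sample is a hypothetical per-container measurement for one window.
type sample struct {
	Namespace string
	Pod       string
	CPUMilli  int64 // CPU usage in millicores
	MemBytes  int64 // memory usage in bytes
}

type usage struct {
	CPUMilli int64
	MemBytes int64
}

// aggregate sums container samples into per-pod and per-namespace usage.
// Pods are keyed as "namespace/pod" because pod names are unique only
// within a namespace.
func aggregate(samples []sample) (pods, namespaces map[string]usage) {
	pods = map[string]usage{}
	namespaces = map[string]usage{}
	for _, s := range samples {
		key := s.Namespace + "/" + s.Pod
		p := pods[key]
		p.CPUMilli += s.CPUMilli
		p.MemBytes += s.MemBytes
		pods[key] = p

		n := namespaces[s.Namespace]
		n.CPUMilli += s.CPUMilli
		n.MemBytes += s.MemBytes
		namespaces[s.Namespace] = n
	}
	return pods, namespaces
}

func main() {
	pods, nss := aggregate([]sample{
		{"default", "my-nginx", 120, 64 << 20},
		{"default", "my-nginx", 30, 16 << 20}, // a second container in the same pod
		{"kube-system", "heapster", 50, 32 << 20},
	})
	fmt.Println(pods["default/my-nginx"].CPUMilli) // 150
	fmt.Println(nss["default"].MemBytes)           // 83886080
}
```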
--- a/practice/monitoring.md
+++ b/practice/monitoring.md
@@ -0,0 +1,11 @@
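+# Monitoring
+
+Kubernetes makes managing complex environments simpler. However, it is hard to gain good insight into kubernetes' own components and into the applications running on a kubernetes cluster. Kubernetes adds many abstractions over applications, so monitoring the health of these abstract components in production is an urgent need.
+
+When installing the kubernetes cluster, we installed the official [heapster](https://github.com/kubernetes/heapster) add-on by default. It provides simple monitoring of the applications on the cluster. It collects pod-level **memory**, **CPU** and **network** metrics, and it also exposes basic kubernetes resource metrics through an API.
+
+However, [Prometheus](https://prometheus.io) has been a refreshing arrival. Like kubernetes, it is a CNCF project.
+
+[Prometheus](https://prometheus.io) is an open source monitoring and alerting solution from SoundCloud. Its code has been written since 2012. Since it was open-sourced on GitHub in 2015, it has attracted 9k+ stars and adoption by many large companies. In 2016, Prometheus became the second CNCF ([Cloud Native Computing Foundation](https://cncf.io/)) member project, after k8s.
+
+As a new-generation open source solution, many of its ideas align closely with Google's SRE approach to operations.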
--- a/practice/prometheus.md
+++ b/practice/prometheus.md
@@ -0,0 +1,45 @@
+# Prometheus
+
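+[Prometheus](https://prometheus.io) is an open source monitoring and alerting solution from SoundCloud. Its code has been written since 2012. Since it was open-sourced on GitHub in 2015, it has attracted 9k+ stars and adoption by many large companies. In 2016, Prometheus became the second CNCF ([Cloud Native Computing Foundation](https://cncf.io/)) member project, after k8s.
+
+As a new-generation open source solution, many of its ideas align closely with Google's SRE approach to operations.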
+
+## Main features
+
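+- A multi-dimensional [data model](https://prometheus.io/docs/concepts/data_model/): a time series is identified by a metric name and key/value labels.
+- A flexible query language ([PromQL](https://prometheus.io/docs/querying/basics/)).
+- No dependency on external storage; both local and remote storage models are supported.
+- Collects data over HTTP using a pull model, which is simple and easy to understand.
+- Monitoring targets can be found through service discovery or static configuration.
+- Supports multiple statistical data models and works well with graphing.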
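The Go sketch below shows what "a time series is identified by its metric name and labels" means in practice. It builds a canonical series identifier with the label names sorted. The same label set always yields the same series, and a different label value yields a new series. `seriesID` is a hypothetical helper, and it uses Go's `%q` quoting as an approximation of Prometheus label-value escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesID renders a metric name plus labels in a canonical form, with
// label names sorted so that label order never matters.
func seriesID(metric string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, k := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return metric + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	a := seriesID("http_requests_total", map[string]string{"method": "GET", "code": "200"})
	b := seriesID("http_requests_total", map[string]string{"code": "200", "method": "GET"})
	c := seriesID("http_requests_total", map[string]string{"code": "500", "method": "GET"})
	fmt.Println(a)      // http_requests_total{code="200",method="GET"}
	fmt.Println(a == b) // true: same labels, same series
	fmt.Println(a == c) // false: a different label value is a different series
}
```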
+
+## Core components
+
+- [Prometheus Server](https://github.com/prometheus/prometheus): scrapes and stores time series data, and also provides querying and Alert Rule configuration management.
+- [client libraries](https://prometheus.io/docs/instrumenting/clientlibs/): instrument application code so that it can expose data to Prometheus Server.
+- [push gateway](https://github.com/prometheus/pushgateway): an aggregation point for monitoring data from batch and short-lived jobs, mainly used for reporting business data and similar metrics.
+- Various [exporters](https://prometheus.io/docs/instrumenting/exporters/) that report data, such as node\_exporter for machine data and [MongoDB exporter](https://github.com/dcu/mongodb_exporter) for MongoDB information.
+- [alertmanager](https://github.com/prometheus/alertmanager): manages alert notifications.
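In the pull model, a target only has to serve its current values over HTTP, and Prometheus Server fetches them on each scrape. Below is a minimal Go sketch of such a target. It uses only the standard library and the Prometheus text exposition format. The metric name `myapp_requests_total` and the handler are hypothetical. Real services would normally use the official Go client library (`prometheus/client_golang`) rather than hand-writing the format.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requests is a hypothetical counter that the application increments.
var requests int64

// renderMetrics formats the counter in the Prometheus text exposition format.
func renderMetrics(count int64) string {
	return "# HELP myapp_requests_total Total requests handled.\n" +
		"# TYPE myapp_requests_total counter\n" +
		fmt.Sprintf("myapp_requests_total %d\n", count)
}

// metricsHandler serves the current value; Prometheus pulls it on each scrape.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics(atomic.LoadInt64(&requests)))
}

func main() {
	atomic.AddInt64(&requests, 3) // pretend three requests were served
	// Simulate one scrape in-process instead of starting a real server.
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

In a long-running service, the handler would be registered with `http.HandleFunc("/metrics", metricsHandler)` and served with `http.ListenAndServe`, and the target would be listed in the Prometheus scrape configuration.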
+
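+## Architecture
+
+A picture is worth a thousand words, so here is the official architecture diagram:
+
+![Prometheus architecture diagram](https://prometheus.io/assets/architecture.svg)
+
+The diagram shows that the main modules of Prometheus include the Server, Exporters, Pushgateway, PromQL, Alertmanager and WebUI.
+
+Roughly, it works as follows: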
+
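+1. Prometheus server periodically pulls data from statically configured targets or from targets found through service discovery.
+2. When newly pulled data exceeds the configured in-memory buffer, Prometheus persists the data to disk (or to remote storage, if remote storage is configured).
+3. Prometheus can be configured with rules that query the data periodically. When a condition fires, it pushes an alert to the configured Alertmanager.
+4. When Alertmanager receives alerts, it aggregates, deduplicates and denoises them according to its configuration, and finally sends the notifications.
+5. Data can be queried and aggregated through the API, the Prometheus Console or Grafana.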
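Step 4 above can be illustrated with a Go sketch. It drops duplicate alerts (alerts with the same label set) and groups the remaining alerts by one label, so one notification can cover a whole group. The `alert` type, `fingerprint` and `groupAlerts` are hypothetical. Alertmanager's real behavior is driven by its configuration, for example `group_by`, inhibition rules and silences.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// alert is a hypothetical, minimal alert consisting only of a label set.
type alert struct {
	Labels map[string]string
}

// fingerprint is a canonical string of the label set; alerts with equal
// fingerprints are treated as duplicates.
func fingerprint(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%q=%q;", k, labels[k])
	}
	return b.String()
}

// groupAlerts drops duplicates and counts distinct alerts per value of groupBy.
func groupAlerts(alerts []alert, groupBy string) map[string]int {
	seen := map[string]bool{}
	groups := map[string]int{}
	for _, a := range alerts {
		fp := fingerprint(a.Labels)
		if seen[fp] {
			continue // duplicate of an alert already counted
		}
		seen[fp] = true
		groups[a.Labels[groupBy]]++
	}
	return groups
}

func main() {
	alerts := []alert{
		{map[string]string{"alertname": "HighCPU", "instance": "node1"}},
		{map[string]string{"alertname": "HighCPU", "instance": "node1"}}, // repeat
		{map[string]string{"alertname": "HighCPU", "instance": "node2"}},
		{map[string]string{"alertname": "DiskFull", "instance": "node1"}},
	}
	groups := groupAlerts(alerts, "alertname")
	// One notification per group instead of one per alert.
	fmt.Println(groups["HighCPU"], groups["DiskFull"]) // 2 1
}
```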
+
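+## Caveats
+
+- Prometheus stores values as float64 time series, so it cannot handle data values of other types.
+- Prometheus is not suitable for auditing or billing. It samples data at intervals and focuses on the system's momentary running state and trends, so it can tolerate a few missed samples. Auditing and billing, by contrast, must record every request and keep the data long-term. Prometheus cannot do this, so a dedicated auditing system may be needed.
+
+The introduction above is taken from https://github.com/songjiayang/prometheus_practice/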