Add cluster and application monitoring chapter

2017-11-09 10:55:58 +08:00 · 2017-11-09 10:55:58 +08:00 · 7ab2cdf30d
parent 0b1fde236d
commit 7ab2cdf30d
6 changed files with 537 additions and 11 deletions
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -24,6 +24,7 @@
    - [2.2.13 CronJob](concepts/cronjob.md)
    - [2.2.14 Ingress](concepts/ingress.md)
    - [2.2.15 ConfigMap](concepts/configmap.md)
+      - [2.2.15.1 Hot-updating ConfigMap](concepts/configmap-hot-update.md)
    - [2.2.16 Horizontal Pod Autoscaling](concepts/horizontal-pod-autoscaling.md)
    - [2.2.17 Label](concepts/label.md)
    - [2.2.18 Garbage Collection](concepts/garbage-collection.md)
@@ -79,23 +80,26 @@
     - [4.3.4 Cluster and Application Monitoring](practice/monitor.md)
     - [4.3.6 Data Persistence Issues](practice/data-persistence-problem.md)
     - [4.3.7 Managing Compute Resources for Containers](practice/manage-compute-resources-container.md)
-     - [4.3.8 Monitoring a Kubernetes Cluster with Prometheus](practice/using-prometheus-to-monitor-kuberentes-cluster.md)
-     - [4.3.9 Getting Cluster and Object Metrics with Heapster](practice/using-heapster-to-get-object-metrics.md)
  - [4.4 Storage Management](practice/storage.md)
     - [4.4.1 GlusterFS](practice/glusterfs.md)
        - [4.4.1.1 Using GlusterFS for Persistent Storage](practice/using-glusterfs-for-persistent-storage.md)
        - [4.4.1.2 Using GlusterFS for Persistent Storage in OpenShift](practice/storage-for-containers-using-glusterfs-with-openshift.md)
     - [4.4.2 CephFS](practice/cephfs.md)
        - [4.4.2.1 Using Ceph for Persistent Storage](practice/using-ceph-for-persistent-storage.md)
-  - [4.5 Service Orchestration Management](practice/services-management-tool.md)
+  - [4.5 Cluster and Application Monitoring](practice/monitoring.md)
-     - [4.5.1 Managing Kubernetes Applications with Helm](practice/helm.md)
+     - [4.5.1 Heapster](practice/heapster.md)
-     - [4.5.2 Building a Private Chart Repository](practice/create-private-charts-repo.md)
+        - [4.5.1.1 Getting Cluster and Object Metrics with Heapster](practice/using-heapster-to-get-object-metrics.md)
-  - [4.6 Continuous Integration and Release](practice/ci-cd.md)
+     - [4.5.2 Prometheus](practice/prometheus.md)
-     - [4.6.1 Continuous Integration and Release with Jenkins](practice/jenkins-ci-cd.md)
+        - [4.5.2.1 Monitoring a Kubernetes Cluster with Prometheus](practice/using-prometheus-to-monitor-kuberentes-cluster.md)
-     - [4.6.2 Continuous Integration and Release with Drone](practice/drone-ci-cd.md)
+  - [4.6 Service Orchestration Management](practice/services-management-tool.md)
-  - [4.7 Updates and Upgrades](practice/update-and-upgrade.md)
+     - [4.6.1 Managing Kubernetes Applications with Helm](practice/helm.md)
-     - [4.7.1 Manually Upgrading a Kubernetes Cluster](practice/manually-upgrade.md)
+     - [4.6.2 Building a Private Chart Repository](practice/create-private-charts-repo.md)
-     - [4.7.2 Upgrading the Dashboard](practice/dashboard-upgrade.md)
+  - [4.7 Continuous Integration and Release](practice/ci-cd.md)
+     - [4.7.1 Continuous Integration and Release with Jenkins](practice/jenkins-ci-cd.md)
+     - [4.7.2 Continuous Integration and Release with Drone](practice/drone-ci-cd.md)
+  - [4.8 Updates and Upgrades](practice/update-and-upgrade.md)
+     - [4.8.1 Manually Upgrading a Kubernetes Cluster](practice/manually-upgrade.md)
+     - [4.8.2 Upgrading the Dashboard](practice/dashboard-upgrade.md)
 - [5. Use Cases](usecases/index.md)
  - [5.1 Microservices Architecture](usecases/microservices.md)
    - [5.1.1 Service Discovery in Microservices](usecases/service-discovery-in-microservices.md)
--- a/concepts/configmap-hot-update.md
+++ b/concepts/configmap-hot-update.md
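@@ -0,0 +1,430 @@
 # Hot-updating ConfigMap
 ConfigMap is the Kubernetes resource object used to store configuration files, and all of its content is stored in etcd. This article explores how a ConfigMap is created and updated, and tests whether the content mounted inside a container is updated in sync after the ConfigMap changes.
 ## Test example
 Suppose there is a ConfigMap named `nginx-config` in the `default` namespace. It can be retrieved with `kubectl`: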
 ```bash
 $ kubectl get configmap nginx-config
 NAME           DATA      AGE
 nginx-config   1         99d
 ```
 Get the content of this ConfigMap:
 ```bash
 kubectl get configmap nginx-config -o yaml
 ```
 ```yaml
 apiVersion: v1
 data:
  nginx.conf: |-
    worker_processes 1;
    events { worker_connections 1024; }
    http {
        sendfile on;
        server {
            listen 80;
            # a test endpoint that returns http 200s
            location / {
                proxy_pass http://httpstat.us/200;
                proxy_set_header  X-Real-IP  $remote_addr;
            }
        }
        server {
            listen 80;
            server_name api.hello.world;
            location / {
                proxy_pass http://l5d.default.svc.cluster.local;
                proxy_set_header Host $host;
                proxy_set_header Connection "";
                proxy_http_version 1.1;
                more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
            }
        }
        server {
            listen 80;
            server_name www.hello.world;
            location / {
                # allow 'employees' to perform dtab overrides
                if ($cookie_special_employee_cookie != "letmein") {
                  more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
                }
                # add a dtab override to get people to our beta, world-v2
                set $xheader "";
                if ($cookie_special_employee_cookie ~* "dogfood") {
                  set $xheader "/host/world => /srv/world-v2;";
                }
                proxy_set_header 'l5d-dtab' $xheader;
                proxy_pass http://l5d.default.svc.cluster.local;
                proxy_set_header Host $host;
                proxy_set_header Connection "";
                proxy_http_version 1.1;
            }
        }
    }
 kind: ConfigMap
 metadata:
  creationTimestamp: 2017-08-01T06:53:17Z
  name: nginx-config
  namespace: default
  resourceVersion: "14925806"
  selfLink: /api/v1/namespaces/default/configmaps/nginx-config
  uid: 18d70527-7686-11e7-bfbd-8af1e3a7c5bd
 ```
 The content of a ConfigMap is stored in etcd, so we can query etcd directly:
 ```bash
 ETCDCTL_API=3 etcdctl get /registry/configmaps/default/nginx-config
 /registry/configmaps/default/nginx-config
 ```
 Note that the v3 etcdctl API must be used. The output is as follows:
 ```bash
 k8s
 v1	ConfigMap<61>
 T
 nginx-configdefault"*$18d70527-7686-11e7-bfbd-8af1e3a7c5bd28B
                                                            <20>ʀ<EFBFBD><CA80><EFBFBD><EFBFBD>xz<78>
 nginx.conf<6E>
           worker_processes 1;
 events { worker_connections 1024; }
 http {
    sendfile on;
    server {
        listen 80;
        # a test endpoint that returns http 200s
        location / {
            proxy_pass http://httpstat.us/200;
            proxy_set_header  X-Real-IP  $remote_addr;
        }
    }
    server {
        listen 80;
        server_name api.hello.world;
        location / {
            proxy_pass http://l5d.default.svc.cluster.local;
            proxy_set_header Host $host;
            proxy_set_header Connection "";
            proxy_http_version 1.1;
            more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
        }
    }
    server {
        listen 80;
        server_name www.hello.world;
        location / {
            # allow 'employees' to perform dtab overrides
            if ($cookie_special_employee_cookie != "letmein") {
              more_clear_input_headers 'l5d-ctx-*' 'l5d-dtab' 'l5d-sample';
            }
            # add a dtab override to get people to our beta, world-v2
            set $xheader "";
            if ($cookie_special_employee_cookie ~* "dogfood") {
              set $xheader "/host/world => /srv/world-v2;";
            }
            proxy_set_header 'l5d-dtab' $xheader;
            proxy_pass http://l5d.default.svc.cluster.local;
            proxy_set_header Host $host;
            proxy_set_header Connection "";
            proxy_http_version 1.1;
        }
    }
 }"
 ```
 In the output, header content appears before the `nginx.conf` configuration; it is added by Kubernetes.
 ## Code
 The definition of the ConfigMap struct:
 ```go
 // ConfigMap holds configuration data for pods to consume.
 type ConfigMap struct {
 	metav1.TypeMeta `json:",inline"`
 	// Standard object's metadata.
 	// More info: http://releases.k8s.io/HEAD/docs/devel/api-conventions.md#metadata
 	// +optional
 	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
 	// Data contains the configuration data.
 	// Each key must be a valid DNS_SUBDOMAIN with an optional leading dot.
 	// +optional
 	Data map[string]string `json:"data,omitempty" protobuf:"bytes,2,rep,name=data"`
 }
 ```
 The ConfigMap interface is defined in `staging/src/k8s.io/client-go/kubernetes/typed/core/v1/configmap.go`:
 ```go
 // ConfigMapInterface has methods to work with ConfigMap resources.
 type ConfigMapInterface interface {
 	Create(*v1.ConfigMap) (*v1.ConfigMap, error)
 	Update(*v1.ConfigMap) (*v1.ConfigMap, error)
 	Delete(name string, options *meta_v1.DeleteOptions) error
 	DeleteCollection(options *meta_v1.DeleteOptions, listOptions meta_v1.ListOptions) error
 	Get(name string, options meta_v1.GetOptions) (*v1.ConfigMap, error)
 	List(opts meta_v1.ListOptions) (*v1.ConfigMapList, error)
 	Watch(opts meta_v1.ListOptions) (watch.Interface, error)
 	Patch(name string, pt types.PatchType, data []byte, subresources ...string) (result *v1.ConfigMap, err error)
 	ConfigMapExpansion
 }
 ```
 The method that creates a ConfigMap, also in `staging/src/k8s.io/client-go/kubernetes/typed/core/v1/configmap.go`, is as follows:
 ```go
 // Create takes the representation of a configMap and creates it.  Returns the server's representation of the configMap, and an error, if there is any.
 func (c *configMaps) Create(configMap *v1.ConfigMap) (result *v1.ConfigMap, err error) {
 	result = &v1.ConfigMap{}
 	err = c.client.Post().
 		Namespace(c.ns).
 		Resource("configmaps").
 		Body(configMap).
 		Do().
 		Into(result)
 	return
 }
 ```
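 The ConfigMap is sent to the API server through a RESTful request and is then stored in etcd. The method sets the namespace of the resource object and the body of the HTTP request, executes the request, and returns the response to the caller in `result`.
 **Note the structure of Body**
 ```go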
 // Body makes the request use obj as the body. Optional.
 // If obj is a string, try to read a file of that name.
 // If obj is a []byte, send it directly.
 // If obj is an io.Reader, use it directly.
 // If obj is a runtime.Object, marshal it correctly, and set Content-Type header.
 // If obj is a runtime.Object and nil, do nothing.
 // Otherwise, set an error.
 ```
 The Body of the RESTful request that creates a ConfigMap contains `ObjectMeta` and `namespace`.
 The struct that represents the HTTP request:
 ```go
 // Request allows for building up a request to a server in a chained fashion.
 // Any errors are stored until the end of your call, so you only have to
 // check once.
 type Request struct {
 	// required
 	client HTTPClient
 	verb   string
 	baseURL     *url.URL
 	content     ContentConfig
 	serializers Serializers
 	// generic components accessible via method setters
 	pathPrefix string
 	subpath    string
 	params     url.Values
 	headers    http.Header
 	// structural elements of the request that are part of the Kubernetes API conventions
 	namespace    string
 	namespaceSet bool
 	resource     string
 	resourceName string
 	subresource  string
 	timeout      time.Duration
 	// output
 	err  error
 	body io.Reader
 	// This is only used for per-request timeouts, deadlines, and cancellations.
 	ctx context.Context
 	backoffMgr BackoffManager
 	throttle   flowcontrol.RateLimiter
 }
 ```
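The chained call in `Create` above boils down to a POST against the namespaced `configmaps` collection with the serialized object as the body. The following is a minimal sketch using only the standard library, assuming JSON serialization; the `configMap` and `metadata` types and the `createPath` and `createBody` helpers are hypothetical stand-ins for illustration and are not part of client-go. The path layout follows the `selfLink` shown in the earlier `kubectl get` output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// metadata is a hypothetical, trimmed-down stand-in for ObjectMeta.
type metadata struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// configMap is a hypothetical, trimmed-down stand-in for v1.ConfigMap;
// it carries only the fields needed for this illustration.
type configMap struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   metadata          `json:"metadata"`
	Data       map[string]string `json:"data,omitempty"`
}

// createPath returns the collection path that a POST for a namespaced
// ConfigMap targets, e.g. /api/v1/namespaces/default/configmaps.
func createPath(namespace string) string {
	return path.Join("/api/v1", "namespaces", namespace, "configmaps")
}

// createBody serializes the object as a JSON request body.
func createBody(cm configMap) ([]byte, error) {
	return json.Marshal(cm)
}

func main() {
	cm := configMap{
		APIVersion: "v1",
		Kind:       "ConfigMap",
		Metadata:   metadata{Name: "env-config", Namespace: "default"},
		Data:       map[string]string{"log_level": "INFO"},
	}
	body, err := createBody(cm)
	if err != nil {
		panic(err)
	}
	fmt.Println("POST", createPath(cm.Metadata.Namespace))
	fmt.Println(string(body))
}
```

The real client also handles content negotiation, throttling, and error decoding, which this sketch leaves out.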
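 ## Tests
 We test two cases: a ConfigMap consumed as Env and a ConfigMap mounted as a Volume.
 ### Updating Env populated from a ConfigMap
 Create an nginx container with the following configuration to test whether the environment variables inside the container are updated after the ConfigMap is updated.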
 ```yaml
 apiVersion: extensions/v1beta1
 kind: Deployment
 metadata:
  name: my-nginx
 spec:
  replicas: 1
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: sz-pg-oam-docker-hub-001.tendcloud.com/library/nginx:1.9
        ports:
        - containerPort: 80
        envFrom:
        - configMapRef:
            name: env-config
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: env-config
  namespace: default
 data:
  log_level: INFO
 ```
 Get the value of the environment variable:
 ```bash
 $ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` env|grep log_level
 log_level=INFO
 ```
 Modify the ConfigMap:
 ```bash
 $ kubectl edit configmap env-config
 ```
 Change the value of `log_level` to `DEBUG`.
 Check the value of the environment variable again:
 ```bash
 $ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` env|grep log_level
 log_level=INFO
 ```
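 This shows that modifying a ConfigMap does not update environment variables that have already been injected into the container.
 ### Updating a Volume mounted from a ConfigMap
 Create an nginx container with the following configuration to test whether the files mounted in the container are updated after the ConfigMap is updated.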
 ```yaml
 apiVersion: extensions/v1beta1
 kind: Deployment
 metadata:
  name: my-nginx
 spec:
  replicas: 1
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: sz-pg-oam-docker-hub-001.tendcloud.com/library/nginx:1.9
        ports:
        - containerPort: 80
        volumeMounts:
        - name: config-volume
          mountPath: /tmp
      volumes:
        - name: config-volume
          configMap:
            name: special-config
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: special-config
  namespace: default
 data:
  log_level: INFO
 ```
 ```bash
 $ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` cat /tmp/log_level
 INFO
 ```
 Modify the ConfigMap:
 ```bash
 $ kubectl edit configmap special-config
 ```
 Change the value of `log_level` to `DEBUG`.
 Wait about 10 seconds, then check the content of the mounted file again:
 ```bash
 $ kubectl exec `kubectl get pods -l run=my-nginx  -o=name|cut -d "/" -f2` cat /tmp/log_level
 DEBUG
 ```
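 We can see that the content of the file in the Volume mounted from the ConfigMap has changed to `DEBUG`.
 ## Summary
 After a ConfigMap is updated:
 - Env values populated from the ConfigMap are **not** updated
 - Data in a Volume mounted from the ConfigMap is updated only after a delay (about 10 seconds in our test)
 Environment variables are injected when the container starts, and Kubernetes does not change their values afterwards. In addition, the environment variables of pods in the same namespace keep accumulating; see [Exploring the source code of service discovery in Kubernetes and environment variable passing between docker containers](https://jimmysong.io/posts/exploring-kubernetes-env-with-docker/). To refresh configuration that a container consumes from a ConfigMap, you can trigger a rolling update of the pods to force the ConfigMap to be remounted, or, after updating the ConfigMap, scale the replicas down to 0 and then scale back up.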
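One common way to trigger such a rolling update automatically is to record a digest of the ConfigMap data in the pod template, so that any change to the data also changes the template. This technique is not part of the test above; the sketch below only illustrates the digest step, and the `configHash` helper and the `checksum/config` annotation key are hypothetical names chosen for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configHash returns a deterministic SHA-256 digest of ConfigMap data.
// Keys are sorted first because Go map iteration order is not stable.
func configHash(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// A NUL byte separates keys and values so that
		// {"ab": "c"} and {"a": "bc"} produce different digests.
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(data[k]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := configHash(map[string]string{"log_level": "INFO"})
	after := configHash(map[string]string{"log_level": "DEBUG"})

	// "checksum/config" is a hypothetical annotation key for the pod template.
	fmt.Printf("checksum/config: %s\n", after)
	fmt.Println("pod template changed:", before != after)
}
```

Writing the new digest into the Deployment's pod template changes the template, which is what causes the Deployment controller to roll the pods.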
 ## References
 - [Kubernetes 1.7 security in practice](https://acotten.com/post/kube17-security)
 - [ConfigMap | kubernetes handbook - jimmysong.io](https://jimmysong.io/kubernetes-handbook/concepts/configmap.html)
 - [Creating a highly available etcd cluster | Kubernetes handbook - jimmysong.io](https://jimmysong.io/kubernetes-handbook/practice/etcd-cluster-installation.html)
 - [Exploring the source code of service discovery in Kubernetes and environment variable passing between docker containers](https://jimmysong.io/posts/exploring-kubernetes-env-with-docker/)
--- a/manifests/test/configmap-test.yaml
+++ b/manifests/test/configmap-test.yaml
@@ -0,0 +1,31 @@
 apiVersion: extensions/v1beta1
 kind: Deployment
 metadata:
  name: my-nginx
 spec:
  replicas: 1
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: sz-pg-oam-docker-hub-001.tendcloud.com/library/nginx:1.9
        ports:
        - containerPort: 80
        volumeMounts:
        - name: config-volume
          mountPath: /tmp
      volumes:
        - name: config-volume
          configMap:
            name: special-config
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
  name: special-config
  namespace: default
 data:
  log_level: WARN
--- a/practice/heapster.md
+++ b/practice/heapster.md
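@@ -0,0 +1,5 @@
 # Heapster
 Heapster is installed by default as an add-on during the Kubernetes installation; see [Installing the heapster add-on](practice/heapster-addon-installation.md). It is very useful for cluster monitoring and is also used by [Horizontal Pod Autoscaling](../concepts/horizontal-pod-autoscaling.md): HPA treats Heapster as the `Resource Metrics API` and fetches metrics from it. This is done by configuring `--api-server` in `kube-controller-manager` to point to [kube-aggregator](https://github.com/kubernetes/kube-aggregator); alternatively, heapster itself can fill this role when it is started with `--api-server=true`.
 Heapster collects cAdvisor data from the nodes and can also aggregate resources by Kubernetes resource type, such as Pod and Namespace, providing CPU, memory, network, and disk metrics for each. The default metric aggregation interval is 1 minute.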
--- a/practice/monitoring.md
+++ b/practice/monitoring.md
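@@ -0,0 +1,11 @@
 # Monitoring
 Kubernetes makes managing complex environments simpler, but gaining good insight into the Kubernetes components themselves and into the applications running on the cluster is difficult. Kubernetes introduces many abstractions over applications, and in production, monitoring the health of these abstracted components becomes an urgent need.
 When we installed the Kubernetes cluster, the official [heapster](https://github.com/kubernetes/heapster) add-on was installed by default. It provides simple monitoring of the applications on the cluster, collecting pod-level **memory**, **CPU**, and **network** information, and it also exposes the basic resource metrics of Kubernetes through an API.
 However, the arrival of [Prometheus](https://prometheus.io) was eye-catching. Like Kubernetes, it is a CNCF project, and it was the second project to join the CNCF, after Kubernetes.
 [Prometheus](https://prometheus.io) is an open-source monitoring and alerting solution from SoundCloud. Development began in 2012, and since it was open-sourced on GitHub in 2015 it has attracted 9k+ followers and adoption by many large companies. In 2016 Prometheus became the second member of the CNCF\([Cloud Native Computing Foundation](https://cncf.io/)\), after k8s.
 As a new-generation open-source solution, many of its ideas coincide with Google's SRE practices.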
--- a/practice/prometheus.md
+++ b/practice/prometheus.md
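@@ -0,0 +1,45 @@
 # Prometheus
 [Prometheus](https://prometheus.io) is an open-source monitoring and alerting solution from SoundCloud. Development began in 2012, and since it was open-sourced on GitHub in 2015 it has attracted 9k+ followers and adoption by many large companies. In 2016 Prometheus became the second member of the CNCF\([Cloud Native Computing Foundation](https://cncf.io/)\), after k8s.
 As a new-generation open-source solution, many of its ideas coincide with Google's SRE practices.
 ## Main features
 - A multi-dimensional [data model](https://prometheus.io/docs/concepts/data_model/) (a time series is identified by a metric name and k/v labels).
 - A flexible query language ([PromQL](https://prometheus.io/docs/querying/basics/)).
 - No dependency on external storage; both local and remote storage models are supported.
 - Data is collected over HTTP with a pull model, which is simple to understand.
 - Monitoring targets can be found through service discovery or static configuration.
 - Supports multiple statistical data models and is friendly to graphing.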
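To make the data model concrete, the sketch below renders a single sample as a metric name, a sorted set of labels, and a float64 value, in the style of the Prometheus text exposition format. It is an illustration only: `formatSample` is a hypothetical helper, not part of any Prometheus client library, and the escaping via `strconv.Quote` is only assumed to be adequate for simple ASCII label values.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatSample renders one sample as `name{k="v",...} value`.
// Labels are sorted by key so the output is deterministic.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// strconv.Quote adds the surrounding quotes and escapes
		// backslashes, double quotes, and newlines.
		pairs = append(pairs, k+"="+strconv.Quote(labels[k]))
	}

	v := strconv.FormatFloat(value, 'g', -1, 64)
	if len(pairs) == 0 {
		return name + " " + v
	}
	return name + "{" + strings.Join(pairs, ",") + "} " + v
}

func main() {
	labels := map[string]string{"method": "GET", "code": "200"}
	fmt.Println(formatSample("http_requests_total", labels, 1027))
	fmt.Println(formatSample("up", nil, 1))
}
```

Each distinct combination of metric name and label values identifies its own time series, which is what makes the model multi-dimensional.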
 ## Core components
 - [Prometheus Server](https://github.com/prometheus/prometheus), which scrapes and stores time-series data and also provides querying and alert rule configuration management.
 - [client libraries](https://prometheus.io/docs/instrumenting/clientlibs/), which connect applications to the Prometheus Server for querying and reporting data.
 - [push gateway](https://github.com/prometheus/pushgateway), an aggregation point for batch and short-lived monitoring data, mainly used for reporting business data.
 - Various [exporters](https://prometheus.io/docs/instrumenting/exporters/) for reporting data, such as node\_exporter for machine data and [MongoDB exporter](https://github.com/dcu/mongodb_exporter) for MongoDB information.
 - [alertmanager](https://github.com/prometheus/alertmanager), which manages alert notifications.
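 ## Architecture
 A picture is worth a thousand words, so here is the official architecture diagram:
 ![Prometheus architecture diagram](https://prometheus.io/assets/architecture.svg)
 The diagram shows that the main modules of Prometheus include the Server, Exporters, Pushgateway, PromQL, Alertmanager, and WebUI.
 Its general workflow is as follows:
 1. The Prometheus server periodically pulls data from statically configured targets or from targets found through service discovery.
 2. When newly pulled data exceeds the configured in-memory buffer, Prometheus persists the data to disk (or to the cloud if remote storage is used).
 3. Prometheus can be configured with rules that query the data on a schedule; when a condition is met, it pushes an alert to the configured Alertmanager.
 4. When Alertmanager receives alerts, it aggregates, deduplicates, and silences them according to its configuration, and finally sends out notifications.
 5. Data can be queried and aggregated through the API, the Prometheus Console, or Grafana.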
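Step 3 above can be sketched as a small state machine: on every evaluation the rule checks its condition, and the alert only fires once the condition has held for several evaluations in a row. This is a simplified illustration and not Prometheus code; the `rule` type, its fields, and the `evaluate` function are hypothetical, and counting consecutive evaluations stands in for the time-based hold period a real alerting rule would use.

```go
package main

import "fmt"

// rule is a hypothetical, simplified alerting rule.
type rule struct {
	Threshold float64 // condition: sample value > Threshold
	For       int     // consecutive evaluations the condition must hold before firing
}

// evaluate walks the samples in order and returns the alert state
// ("inactive", "pending", or "firing") after each evaluation.
func evaluate(r rule, samples []float64) []string {
	states := make([]string, 0, len(samples))
	held := 0 // number of consecutive evaluations with the condition true
	for _, v := range samples {
		if v > r.Threshold {
			held++
		} else {
			held = 0
		}
		switch {
		case held == 0:
			states = append(states, "inactive")
		case held < r.For:
			states = append(states, "pending")
		default:
			states = append(states, "firing")
		}
	}
	return states
}

func main() {
	r := rule{Threshold: 0.9, For: 3}
	// prints [inactive pending pending firing inactive]
	fmt.Println(evaluate(r, []float64{0.5, 0.95, 0.97, 0.99, 0.4}))
}
```

Requiring the condition to hold across several evaluations keeps a single noisy sample from sending an alert to the Alertmanager.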
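 ## Caveats
 - Prometheus data consists of time-series float64 values; if your data needs other value types, Prometheus cannot accommodate it.
 - Prometheus is not suited to auditing or billing. Its data is sampled at intervals and it focuses on the instantaneous state and trends of a system, so losing a small amount of data is tolerable. Auditing and billing, by contrast, need every request recorded and the data stored long term, which Prometheus cannot provide; a dedicated audit system may be required.
 The introduction above is taken from https://github.com/songjiayang/prometheus_practice/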