Add three new posts
- add cloud native definition
- add kubernetes networking
- add cloud native local quick start
parent 3f48c6b10a
commit 2f7f167af8
@ -7,7 +7,9 @@
## Cloud Native

* [The Definition of Cloud Native](cloud-native/cloud-native-definition.md)
* [Play with Kubernetes](cloud-native/play-with-kubernetes.md)
* [Quickly Deploying a Local Cloud-Native Lab Environment](cloud-native/kubernetes-quick-start.md)
* [An Overview of Kubernetes and Cloud-Native Applications](cloud-native/kubernetes-and-cloud-native-app-overview.md)
* [The Road to Cloud-Native Applications: From Kubernetes to Cloud Native](cloud-native/from-kubernetes-to-cloud-native.md)
* [Cloud-Native Programming Languages](cloud-native/cloud-native-programming-languages.md)
@ -24,8 +26,8 @@
* [CRI - Container Runtime Interface](concepts/cri.md)
* [CNI - Container Network Interface](concepts/cni.md)
* [CSI - Container Storage Interface](concepts/csi.md)
* [Networking in Kubernetes](concepts/networking.md)
  * [Kubernetes Networking Explained: the flannel Example](concepts/flannel.md)
  * [Kubernetes Networking Explained: the calico Example](concepts/calico.md)
* [Objects and Basic Concepts](concepts/objects.md)
* [Pod State and Lifecycle Management](concepts/pod-state-and-lifecycle.md)
@ -0,0 +1,43 @@
# The Definition of Cloud Native

[Pivotal](https://pivotal.io/) coined the idea of cloud-native applications and launched the [Pivotal Cloud Foundry](https://pivotal.io/platform) cloud-native application platform and the [Spring](https://spring.io/) open source Java development framework, making it a pioneer and pathfinder of cloud-native application architecture.

## Pivotal's Original Definition

As early as 2015, Matt Stine of Pivotal wrote a booklet called [Migrating to Cloud-Native Application Architectures](https://jimmysong.io/migrating-to-cloud-native-application-architectures/), which discussed the main characteristics of cloud-native application architectures:

- conformance to the twelve-factor app methodology
- microservices-oriented architecture
- self-service agile infrastructure
- API-based collaboration
- antifragility

I translated the book into Chinese in 2017; see [Migrating to Cloud-Native Application Architectures](https://jimmysong.io/migrating-to-cloud-native-application-architectures/).

## CNCF's Original Definition

In 2015, Google led the creation of the Cloud Native Computing Foundation (CNCF). At first, CNCF's definition of Cloud Native covered three aspects:

- containerized applications
- microservices-oriented architecture
- applications that support orchestration and scheduling of containers

## The Redefinition

By 2018, with the cloud-native ecosystem growing steadily over the preceding few years, all the major cloud vendors had joined the foundation, and the [Cloud Native Landscape](https://i.cncf.io) shows cloud native deliberately absorbing ground that used to belong to non-cloud-native applications. As CNCF took on more and more members and hosted projects, the original definition had begun to constrain the growth of the ecosystem, so CNCF repositioned cloud native with a new definition:

> Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

> These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

> The Cloud Native Computing Foundation seeks to drive adoption of this paradigm by fostering and sustaining an ecosystem of open source, vendor-neutral projects. We democratize state-of-the-art patterns to make these innovations accessible for everyone.

**Note**: the Chinese translation of this definition has not been finalized yet; see [Cloud Native Definition in Chinese](https://docs.google.com/document/d/1-QhD9UeOqEBxaEiXrFP89EhG5K-MFJ4vsfNZOg6E9co/).
@ -0,0 +1,307 @@
# Quickly Deploying a Local Cloud-Native Lab Environment

This article helps you quickly deploy a local cloud-native lab environment, so you can explore a cloud-native setup with essentially no prior knowledge of Kubernetes or cloud-native technology.

This environment can also serve as a test and demo environment for Kubernetes and other cloud-native applications.

## Prerequisites

You will need the following software and resources:

- 8 GB or more of RAM
- [Vagrant 2.0+](https://www.vagrantup.com/)
- [VirtualBox 5.0+](https://www.virtualbox.org/wiki/Downloads)
- a Kubernetes release tarball, version 1.9.1 or later, downloaded in advance ([Baidu Netdisk download](https://pan.baidu.com/s/1zkg2xEAedvZHObmTHDFChg))
- Mac/Linux; **Windows is not supported**
## The Cluster

We use Vagrant and VirtualBox to install a three-node Kubernetes cluster, where the master node also acts as a worker node.

| IP | Hostname | Components |
| ------------ | ------ | ------------------------------------------------------------ |
| 172.17.8.101 | node1 | kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kubelet, docker, flannel, dashboard |
| 172.17.8.102 | node2 | kubelet, docker, flannel, traefik |
| 172.17.8.103 | node3 | kubelet, docker, flannel |

**Note**: the IPs, hostnames, and components above are pinned to these nodes; even after you destroy the VMs and rebuild them with Vagrant, they stay the same.

Container IP range: 172.33.0.0/16

Kubernetes service IP range: 10.254.0.0/16
## Installed Components

After installation, the cluster includes the following components:

- flannel (`host-gw` mode)
- Kubernetes dashboard 1.8.2
- etcd (single node)
- kubectl
- CoreDNS
- Kubernetes (the version depends on the release tarball you downloaded)

**Optional add-ons**

- Heapster + InfluxDB + Grafana
- ElasticSearch + Fluentd + Kibana
- Istio service mesh
## Usage

With the prerequisites in place, run the following commands to bring up the Kubernetes cluster:

```bash
git clone https://github.com/rootsongjc/kubernetes-vagrant-centos-cluster.git
cd kubernetes-vagrant-centos-cluster
vagrant up
```

**Note**: after cloning the Git repository, you need to place the Kubernetes release tarballs into the `kubernetes-vagrant-centos-cluster` directory beforehand, namely these two files:

- kubernetes-client-linux-amd64.tar.gz
- kubernetes-server-linux-amd64.tar.gz

On first deployment, the `centos/7` box is downloaded automatically, which takes a while; each node also downloads and installs a series of packages, so the whole process takes roughly ten-odd minutes.
If `vagrant up` cannot download the `centos/7` box, you can download it manually and add it to Vagrant.

**Manually adding the centos/7 box**

```bash
wget -c http://cloud.centos.org/centos/7/vagrant/x86_64/images/CentOS-7-x86_64-Vagrant-1801_02.VirtualBox.box
vagrant box add CentOS-7-x86_64-Vagrant-1801_02.VirtualBox.box --name centos/7
```

The next time you run `vagrant up`, the local `centos/7` box is used instead of being downloaded again.
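To confirm the box was registered, you can list the locally installed boxes (a quick sanity check, not part of the original steps):

```bash
# List locally installed Vagrant boxes; once the manual add succeeded,
# the output should contain a line like "centos/7 (virtualbox, 0)".
vagrant box list
```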
### Accessing the Kubernetes Cluster

There are three ways to access the Kubernetes cluster:

- local access
- access from inside the VMs
- the Kubernetes dashboard

**Local access**

You can operate the Kubernetes cluster directly from your own local environment, without logging into a VM:

Copy the `conf/admin.kubeconfig` file to `~/.kube/config` and you can use `kubectl` locally to operate the cluster.

```bash
mkdir -p ~/.kube
cp conf/admin.kubeconfig ~/.kube/config
```

This is the recommended way.
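As a quick check (not part of the original steps), you can verify the copied kubeconfig works before going further:

```bash
# Ask the API server on the master (172.17.8.101) for the node list;
# three Ready nodes means local access is working.
kubectl get nodes
kubectl cluster-info
```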
**Access from inside the VMs**

If anything goes wrong, you can log into a VM to debug:

```bash
vagrant ssh node1
sudo -i
kubectl get nodes
```

**Kubernetes dashboard**

You can also access the cluster directly through the dashboard UI: https://172.17.8.101:8443

Run the following command locally to obtain the token value (kubectl must be installed first):

```bash
kubectl -n kube-system describe secret `kubectl -n kube-system get secret|grep admin-token|cut -d " " -f1`|grep "token:"|tr -s " "|cut -d " " -f2
```

**Note**: the token value is also printed at the end of the `vagrant up` log.
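The one-liner above is dense; here is the same lookup unrolled step by step, which can be easier to debug (a functionally equivalent sketch):

```bash
# 1. Find the name of the admin token secret in kube-system.
SECRET=$(kubectl -n kube-system get secret | grep admin-token | cut -d " " -f1)

# 2. Print only the token field from that secret's description.
kubectl -n kube-system describe secret "$SECRET" | grep "token:" | tr -s " " | cut -d " " -f2
```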
**Heapster monitoring**

Deploy Heapster monitoring:

```bash
kubectl apply -f addon/heapster/
```

**Accessing Grafana**

The service is exposed via Ingress; add an entry to your local `/etc/hosts`:

```ini
172.17.8.102 grafana.jimmysong.io
```

Then open Grafana at <http://grafana.jimmysong.io>.
**Traefik**

Deploy the Traefik ingress controller and its ingress rules:

```bash
kubectl apply -f addon/traefik-ingress
```

Add an entry to your local `/etc/hosts`:

```ini
172.17.8.102 traefik.jimmysong.io
```

Then open the Traefik UI at <http://traefik.jimmysong.io>.
**EFK**

Use EFK for log collection.

```bash
kubectl apply -f addon/efk/
```

**Note**: every node running EFK consumes a lot of CPU and memory; make sure each VM is allocated at least 4 GB of RAM.

**Helm**

Deploy Helm:

```bash
hack/deploy-helm.sh
```
### Service Mesh

We use [Istio](https://istio.io) as the service mesh.

**Installation**

```bash
kubectl apply -f addon/istio/
```

**Running the sample**

```bash
kubectl apply -n default -f <(istioctl kube-inject -f yaml/istio-bookinfo/bookinfo.yaml)
istioctl create -f yaml/istio-bookinfo/bookinfo-gateway.yaml
```
Add the following entries to the `/etc/hosts` file on your own local host.

```ini
172.17.8.102 grafana.istio.jimmysong.io
172.17.8.102 servicegraph.istio.jimmysong.io
```

The services above are then reachable at the following URLs.

| Service | URL |
| ------------ | ------------------------------------------------------------ |
| grafana | http://grafana.istio.jimmysong.io |
| servicegraph | http://servicegraph.istio.jimmysong.io/dotviz, http://servicegraph.istio.jimmysong.io/graph, http://servicegraph.istio.jimmysong.io/force/forcegraph.html |
| tracing | http://172.17.8.101:$JAEGER_PORT |
| productpage | http://172.17.8.101:$GATEWAY_PORT/productpage |

**Note**: `JAEGER_PORT` can be obtained with `kubectl -n istio-system get svc tracing -o jsonpath='{.spec.ports[0].nodePort}'`, and `GATEWAY_PORT` with `kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.spec.ports[0].nodePort}'`.
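For convenience, the two lookups from the note can be captured into shell variables and turned into ready-to-open URLs (a small convenience script, not part of the project itself):

```bash
# Read the NodePorts assigned to the tracing and ingress gateway services.
JAEGER_PORT=$(kubectl -n istio-system get svc tracing -o jsonpath='{.spec.ports[0].nodePort}')
GATEWAY_PORT=$(kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.spec.ports[0].nodePort}')

echo "tracing:     http://172.17.8.101:${JAEGER_PORT}"
echo "productpage: http://172.17.8.101:${GATEWAY_PORT}/productpage"
```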
See https://istio.io/docs/guides/bookinfo.html for details.
### Vistio

[Vizceral](https://github.com/Netflix/vizceral) is an open source project from Netflix for monitoring network traffic between applications and clusters in near real time. Vistio is an adaptation of Vizceral for Istio and mesh monitoring: it takes the metrics generated by Istio Mixer and feeds them into Prometheus; Vistio then queries Prometheus and stores the data locally so that traffic can be replayed.

```bash
# Deploy vistio via kubectl
kubectl apply -f addon/vistio/

# Expose vistio-api
kubectl -n default port-forward $(kubectl -n default get pod -l app=vistio-api -o jsonpath='{.items[0].metadata.name}') 9091:9091 &

# Expose vistio in another terminal window
kubectl -n default port-forward $(kubectl -n default get pod -l app=vistio-web -o jsonpath='{.items[0].metadata.name}') 8080:8080 &
```

Once everything is up and ready, open the Vistio UI at [http://localhost:8080](http://localhost:8080/) and start exploring the service mesh network; you should see output similar to the animation below.

![Vistio view animation](https://github.com/rootsongjc/kubernetes-vagrant-centos-cluster/raw/master/images/vistio-animation.gif)

For more details, see [Vistio: Visualize your Istio Mesh Using Netflix's Vizceral](https://servicemesher.github.io/blog/vistio-visualize-your-istio-mesh-using-netflixs-vizceral/).
## Management

Unless otherwise noted, the following commands are all run from the root of this repo.

### Suspend

Suspend the current VMs so they can be resumed later.

```bash
vagrant suspend
```

### Resume

Restore the VMs to their last state.

```bash
vagrant resume
```

Note: every time the VMs are suspended and brought back up, their clocks still show the time of suspension, which makes monitoring data awkward to read. Consider halting the VMs and restarting them instead.
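If you do resume suspended VMs and only need the clocks fixed, one option is to resync time inside each guest; this assumes the `ntpdate` utility is available in the CentOS 7 guests, which the project itself does not guarantee:

```bash
# Resync the clock in each guest after "vagrant resume".
for n in node1 node2 node3; do
  vagrant ssh "$n" -c "sudo ntpdate -u pool.ntp.org"
done
```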
### Restart

Halt the VMs and then start them again.

```bash
vagrant halt
vagrant up
# login to node1
vagrant ssh node1
# run the provision scripts
/vagrant/hack/k8s-init.sh
exit
# login to node2
vagrant ssh node2
# run the provision scripts
/vagrant/hack/k8s-init.sh
exit
# login to node3
vagrant ssh node3
# run the provision scripts
/vagrant/hack/k8s-init.sh
sudo -i
cd /vagrant/hack
./deploy-base-services.sh
exit
```

You now have a complete basic Kubernetes environment again. Run the following command in the root of the repo to get the token of the Kubernetes dashboard admin user.

```bash
hack/get-dashboard-token.sh
```

Log in as prompted.
### Cleanup

Destroy the VMs.

```bash
vagrant destroy
rm -rf .vagrant
```

### Caveat

This project is for development and testing only; do not use it in production.

## References

- [Kubernetes Handbook - jimmysong.io](https://jimmysong.io/kubernetes-handbook)
- [duffqiu/centos-vagrant](https://github.com/duffqiu/centos-vagrant)
- [Enabling ipvs for kube-proxy on Kubernetes 1.8](https://mritd.me/2017/10/10/kube-proxy-use-ipvs-on-kubernetes-1.8/#%E4%B8%80%E7%8E%AF%E5%A2%83%E5%87%86%E5%A4%87)
@ -476,6 +476,3 @@ Spark natively supports standalone, Mesos, and YARN resource scheduling, and now supports Kubernetes
* [Cloud Native Go - published by Publishing House of Electronics Industry](https://jimmysong.io/cloud-native-go)
* [Cloud Native Python - to be published by Publishing House of Electronics Industry](https://jimmysong.io/posts/cloud-native-python)
* [Istio Service Mesh Chinese documentation](http://istio.doczh.cn/)
@ -0,0 +1,394 @@
### Kubernetes Networking Explained: the flannel Example

Earlier we used [kubernetes-vagrant-centos-cluster](https://github.com/rootsongjc/kubernetes-vagrant-centos-cluster) to install a three-node Kubernetes cluster; the node status is as follows.

```bash
[root@node1 ~]# kubectl get nodes -o wide
NAME      STATUS    ROLES     AGE       VERSION   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION               CONTAINER-RUNTIME
node1     Ready     <none>    2d        v1.9.1    <none>        CentOS Linux 7 (Core)   3.10.0-693.11.6.el7.x86_64   docker://1.12.6
node2     Ready     <none>    2d        v1.9.1    <none>        CentOS Linux 7 (Core)   3.10.0-693.11.6.el7.x86_64   docker://1.12.6
node3     Ready     <none>    2d        v1.9.1    <none>        CentOS Linux 7 (Core)   3.10.0-693.11.6.el7.x86_64   docker://1.12.6
```
All Pods currently running in the Kubernetes cluster:

```bash
[root@node1 ~]# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                              READY     STATUS    RESTARTS   AGE       IP            NODE
kube-system   coredns-5984fb8cbb-sjqv9                          1/1       Running   0          1h        172.33.68.2   node1
kube-system   coredns-5984fb8cbb-tkfrc                          1/1       Running   1          1h        172.33.96.3   node3
kube-system   heapster-v1.5.0-684c7f9488-z6sdz                  4/4       Running   0          1h        172.33.31.3   node2
kube-system   kubernetes-dashboard-6b66b8b96c-mnm2c             1/1       Running   0          1h        172.33.31.2   node2
kube-system   monitoring-influxdb-grafana-v4-54b7854697-tw9cd   2/2       Running   2          1h        172.33.96.2   node3
```
The Pod subnets the hosts have registered in etcd:

```bash
[root@node1 ~]# etcdctl ls /kube-centos/network/subnets
/kube-centos/network/subnets/172.33.68.0-24
/kube-centos/network/subnets/172.33.31.0-24
/kube-centos/network/subnets/172.33.96.0-24
```

The Pod subnet on each node is carved out according to the configuration we supplied when installing flannel; view that configuration in etcd:

```bash
[root@node1 ~]# etcdctl get /kube-centos/network/config
{"Network":"172.33.0.0/16","SubnetLen":24,"Backend":{"Type":"host-gw"}}
```
There are three kinds of IPs inside a Kubernetes cluster:

- Node IP: the IP address of the host
- Pod IP: an IP created by the network plugin (such as flannel), which lets Pods on different hosts communicate
- Cluster IP: a virtual IP; services are reached through iptables rules (the commands below show how to observe each kind)

When a node is installed, its processes start in the order flannel -> docker -> kubelet -> kube-proxy. We will follow the same order below: how flannel divides the network and interacts with docker, and how services are reached through iptables.
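Each of the three kinds of IPs can be observed directly with kubectl; the commands below are a quick way to see them in this cluster (standard kubectl usage, nothing project-specific):

```bash
# Node IPs: the INTERNAL-IP / EXTERNAL-IP columns of the node list.
kubectl get nodes -o wide

# Pod IPs: the IP column, allocated by flannel from 172.33.0.0/16 here.
kubectl get pods --all-namespaces -o wide

# Cluster IPs: virtual service IPs, from 10.254.0.0/16 in this setup.
kubectl get svc --all-namespaces
```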
### Flannel

Flannel is deployed on every node as a single binary and performs two main functions:

- allocate a subnet to each node, from which containers automatically get their IP addresses
- add route entries on every node whenever a node joins the network

Below is the flannel network architecture with the `host-gw` backend:

![flannel network architecture (image from OpenShift)](../images/flannel-networking.png)

**Note**: the IPs above are not the ones used in this walkthrough, but that does not affect understanding.

The flannel configuration on node1 is as follows:
```bash
[root@node1 ~]# cat /usr/lib/systemd/system/flanneld.service
[Unit]
Description=Flanneld overlay address etcd agent
After=network.target
After=network-online.target
Wants=network-online.target
After=etcd.service
Before=docker.service

[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/flanneld
EnvironmentFile=-/etc/sysconfig/docker-network
ExecStart=/usr/bin/flanneld-start $FLANNEL_OPTIONS
ExecStartPost=/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
Restart=on-failure

[Install]
WantedBy=multi-user.target
RequiredBy=docker.service
```
The two environment files it references are configured as follows:

```bash
[root@node1 ~]# cat /etc/sysconfig/flanneld
# Flanneld configuration options
FLANNEL_ETCD_ENDPOINTS="http://172.17.8.101:2379"
FLANNEL_ETCD_PREFIX="/kube-centos/network"
FLANNEL_OPTIONS="-iface=eth2"
```

The file above is used only by flanneld.

```bash
[root@node1 ~]# cat /etc/sysconfig/docker-network
# /etc/sysconfig/docker-network
DOCKER_NETWORK_OPTIONS=
```
There is also `ExecStartPost=/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker`; the `/usr/libexec/flannel/mk-docker-opts.sh` script runs after flanneld starts and generates two environment files:

- /run/flannel/docker
- /run/flannel/subnet.env

Let's look at the contents of `/run/flannel/docker`.

```bash
[root@node1 ~]# cat /run/flannel/docker
DOCKER_OPT_BIP="--bip=172.33.68.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=true"
DOCKER_OPT_MTU="--mtu=1500"
DOCKER_NETWORK_OPTIONS=" --bip=172.33.68.1/24 --ip-masq=true --mtu=1500"
```

If you use `systemctl` to start flannel first and docker afterwards, docker will pick up these environment variables.

Now the contents of `/run/flannel/subnet.env`.

```bash
[root@node1 ~]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.33.0.0/16
FLANNEL_SUBNET=172.33.68.1/24
FLANNEL_MTU=1500
FLANNEL_IPMASQ=false
```

These variables reflect what flannel registered in etcd.
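Since `/run/flannel/subnet.env` is a plain shell environment file, other tooling can simply source it; the snippet below is a hypothetical consumer (not something this setup runs) that mirrors what mk-docker-opts.sh derives:

```bash
# Load flannel's results and derive docker's bridge options from them.
source /run/flannel/subnet.env
echo "docker should run with: --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}"
```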
### Docker

The docker configuration on node1 is as follows:
```bash
[root@node1 ~]# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target rhel-push-plugin.socket registries.service
Wants=docker-storage-setup.service
Requires=docker-cleanup.timer

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/run/containers/registries.conf
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
          --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
          --default-runtime=docker-runc \
          --exec-opt native.cgroupdriver=systemd \
          --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          $REGISTRIES
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
KillMode=process

[Install]
WantedBy=multi-user.target
```
Let's check the docker launch parameters on node1:

```bash
[root@node1 ~]# systemctl status -l docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: active (running) since Fri 2018-02-02 22:52:43 CST; 2h 28min ago
     Docs: http://docs.docker.com
 Main PID: 4334 (dockerd-current)
   CGroup: /system.slice/docker.service
           ‣ 4334 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --selinux-enabled --log-driver=journald --signature-verification=false --bip=172.33.68.1/24 --ip-masq=true --mtu=1500
```
We can see that docker was started with these parameters: `--bip=172.33.68.1/24 --ip-masq=true --mtu=1500`. They were generated by the script run at flannel startup and passed to docker via environment variables.

Let's inspect the network interfaces on the node1 host:
```bash
[root@node1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:00:57:32 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 85095sec preferred_lft 85095sec
    inet6 fe80::5054:ff:fe00:5732/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:7b:0f:b1 brd ff:ff:ff:ff:ff:ff
    inet 172.17.8.101/24 brd 172.17.8.255 scope global eth1
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:ef:25:06 brd ff:ff:ff:ff:ff:ff
    inet 172.30.113.231/21 brd 172.30.119.255 scope global dynamic eth2
       valid_lft 85096sec preferred_lft 85096sec
    inet6 fe80::a00:27ff:feef:2506/64 scope link
       valid_lft forever preferred_lft forever
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:d0:ae:80:ea brd ff:ff:ff:ff:ff:ff
    inet 172.33.68.1/24 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:d0ff:feae:80ea/64 scope link
       valid_lft forever preferred_lft forever
7: veth295bef2@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP
    link/ether 6a:72:d7:9f:29:19 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::6872:d7ff:fe9f:2919/64 scope link
       valid_lft forever preferred_lft forever
```
Here is what each network interface in this VM is for:

- lo: loopback, 127.0.0.1
- eth0: NAT network, automatically assigned when the VM is created; reachable only among the VMs
- eth1: bridge network, the address Vagrant assigns to the VM; reachable among the VMs and from the local machine
- eth2: bridge network, assigned by DHCP; the NIC used to reach the internet
- docker0: bridge network, the default NIC used by docker, acting as the virtual switch for all containers on this node
- veth295bef2@if6: a veth pair connecting docker0 with the container in a Pod. A veth pair can be thought of as two interfaces joined by a network cable; put the two ends into two namespaces and those namespaces can talk to each other (a small demo follows this list). See [Linux network virtualization: an introduction to network namespace](http://cizixs.com/2017/02/10/network-virtualization-network-namespace).
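A veth pair is easy to reproduce by hand, which makes the "network cable between namespaces" analogy concrete. The following demo (run as root on any Linux host; names and addresses are made up for illustration) creates two namespaces joined by a veth pair and pings across it:

```bash
# Create two network namespaces.
ip netns add ns1
ip netns add ns2

# Create a veth pair and put one end in each namespace.
ip link add veth-a type veth peer name veth-b
ip link set veth-a netns ns1
ip link set veth-b netns ns2

# Give each end an address on the same subnet and bring it up.
ip netns exec ns1 ip addr add 10.1.1.1/24 dev veth-a
ip netns exec ns1 ip link set veth-a up
ip netns exec ns2 ip addr add 10.1.1.2/24 dev veth-b
ip netns exec ns2 ip link set veth-b up

# The namespaces can now reach each other through the "cable".
ip netns exec ns1 ping -c 1 10.1.1.2

# Clean up.
ip netns del ns1
ip netns del ns2
```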
Now let's see which networks docker has on this node.

```bash
[root@node1 ~]# docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
940bb75e653b        bridge              bridge              local
d94c046e105d        host                host                local
2db7597fd546        none                null                local
```
Let's inspect the bridge network `940bb75e653b`.

```bash
[root@node1 ~]# docker network inspect 940bb75e653b
[
    {
        "Name": "bridge",
        "Id": "940bb75e653bfa10dab4cce8813c2b3ce17501e4e4935f7dc13805a61b732d2c",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.33.68.1/24",
                    "Gateway": "172.33.68.1"
                }
            ]
        },
        "Internal": false,
        "Containers": {
            "944d4aa660e30e1be9a18d30c9dcfa3b0504d1e5dbd00f3004b76582f1c9a85b": {
                "Name": "k8s_POD_coredns-5984fb8cbb-sjqv9_kube-system_c5a2e959-082a-11e8-b4cd-525400005732_0",
                "EndpointID": "7397d7282e464fc4ec5756d6b328df889cdf46134dbbe3753517e175d3844a85",
                "MacAddress": "02:42:ac:21:44:02",
                "IPv4Address": "172.33.68.2/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
```
The `Config` section of this network matches docker's launch configuration.

Containers running on node1:

```bash
[root@node1 ~]# docker ps
CONTAINER ID        IMAGE                                                                                                COMMAND                  CREATED             STATUS              PORTS               NAMES
a37407a234dd        docker.io/coredns/coredns@sha256:adf2e5b4504ef9ffa43f16010bd064273338759e92f6f616dd159115748799bc   "/coredns -conf /etc/"   About an hour ago   Up About an hour                        k8s_coredns_coredns-5984fb8cbb-sjqv9_kube-system_c5a2e959-082a-11e8-b4cd-525400005732_0
944d4aa660e3        docker.io/openshift/origin-pod                                                                       "/usr/bin/pod"           About an hour ago   Up About an hour                        k8s_POD_coredns-5984fb8cbb-sjqv9_kube-system_c5a2e959-082a-11e8-b4cd-525400005732_0
```

We can see that two containers are currently running.
Routing table on node1:

```bash
[root@node1 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.2.2        0.0.0.0         UG    100    0        0 eth0
0.0.0.0         172.30.116.1    0.0.0.0         UG    101    0        0 eth2
10.0.2.0        0.0.0.0         255.255.255.0   U     100    0        0 eth0
172.17.8.0      0.0.0.0         255.255.255.0   U     100    0        0 eth1
172.30.112.0    0.0.0.0         255.255.248.0   U     100    0        0 eth2
172.33.68.0     0.0.0.0         255.255.255.0   U     0      0        0 docker0
172.33.96.0     172.30.118.65   255.255.255.0   UG    0      0        0 eth2
```

The routes above were added by flannel; whenever a new node joins the Kubernetes cluster, the routing table on every node grows accordingly.
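With the `host-gw` backend, the route flannel installs for a remote node is an ordinary kernel route whose gateway is that node's host IP. The last line of the table above corresponds to a command like this (shown only for illustration; flannel manages these routes itself):

```bash
# Route node3's Pod subnet (172.33.96.0/24) via node3's host address
# on eth2, exactly as in the routing table above.
ip route add 172.33.96.0/24 via 172.30.118.65 dev eth2
```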
From node1, let's `traceroute` to the `coredns-5984fb8cbb-tkfrc` container on node3, whose IP address is `172.33.96.3`, and look at the path.

```bash
[root@node1 ~]# traceroute 172.33.96.3
traceroute to 172.33.96.3 (172.33.96.3), 30 hops max, 60 byte packets
 1  172.30.118.65 (172.30.118.65)  0.518 ms  0.367 ms  0.398 ms
 2  172.33.96.3 (172.33.96.3)  0.451 ms  0.352 ms  0.223 ms
```

The packet reaches the Pod on node3 after a single hop through node3's host (eth2) IP.
iptables on node1:

```bash
[root@node1 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  anywhere             anywhere
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forward rules */
DOCKER-ISOLATION  all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  anywhere             anywhere
KUBE-SERVICES  all  --  anywhere             anywhere             /* kubernetes service portals */

Chain DOCKER (1 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  10.254.0.0/16        anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             10.254.0.0/16        /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination
```
The iptables output above shows that many Kubernetes service rules have been injected; see [iptables rules](https://www.cnyunwei.cc/archives/393) for more details.
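Most of the service plumbing lives in the nat table rather than the filter table shown above; to see the rules kube-proxy actually programs for services, inspect the nat table (standard iptables usage):

```bash
# List the kube-proxy service chain in the nat table, with numeric addresses.
iptables -t nat -L KUBE-SERVICES -n | head -n 20
```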
## References

- [coreos/flannel - github.com](https://github.com/coreos/flannel)
- [Linux network virtualization: an introduction to network namespace](http://cizixs.com/2017/02/10/network-virtualization-network-namespace)
- [veth virtual network devices in Linux](https://segmentfault.com/a/1190000009251098)
- [iptables rules](https://www.cnyunwei.cc/archives/393)
- [flannel host-gw network](http://hustcat.github.io/flannel-host-gw-network/)
- [flannel - openshift.com](https://docs.openshift.com/container-platform/3.4/architecture/additional_concepts/flannel.html)
@ -1,394 +1,33 @@
# Networking in Kubernetes

For people who are new to Kubernetes or have no networking background, networking is arguably the hardest part of Kubernetes. Kubernetes does not provide networking itself; it only exposes a network interface, and the actual implementation is supplied by plugins.

## The Problems the Network Has to Solve

Given that container networking in Kubernetes is implemented through plugins, how is container connectivity actually solved?

If you run docker containers on a single local machine, you will notice that all containers live in the IP segment that the `docker0` bridge assigns automatically (172.17.0.1/16); the value can be set with the docker launch parameter `--bip`. All local containers therefore have IP addresses in the same segment and can communicate with each other directly.

Kubernetes, however, manages a cluster. The core networking problems in Kubernetes are how to divide the IP segments among the hosts and how to assign an IP address to each container. In short:

- ensure every Pod has a cluster-wide unique IP address
- ensure the IP ranges carved out for different nodes do not overlap
- ensure Pods on different nodes can communicate with each other
- ensure Pods can also communicate with hosts on other nodes

To solve these problems, a series of open source network plugins and solutions for Kubernetes have appeared, for example:

- flannel
- calico
- contiv
- weave net
- kube-router
- cilium
- canal

Many more are not listed here; you can write your own network plugin simply by implementing the official Kubernetes design, [CNI - Container Network Interface](cni.md). A minimal illustration follows.
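To make the interface concrete, here is a minimal CNI network configuration for the reference `bridge` plugin, written as a file the kubelet would pick up from `/etc/cni/net.d/` (the name, subnet, and path are illustrative, not from this book's setup):

```bash
# A minimal CNI config: a bridge network with host-local IP allocation.
cat > /etc/cni/net.d/10-mynet.conf <<'EOF'
{
  "cniVersion": "0.3.1",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
EOF
```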
The following pages analyze flannel and calico, currently the most widely used plugins, as examples.
- [Kubernetes Networking Explained: the flannel Example](flannel.md)
- [Kubernetes Networking Explained: the calico Example](calico.md)
@ -1,5 +1,9 @@
# Setting Up a Local Distributed Development Environment (Using Vagrant and VirtualBox)

**Note: this article is no longer updated; go directly to the [kubernetes-vagrant-centos-cluster](https://github.com/rootsongjc/kubernetes-vagrant-centos-cluster) repository for the latest version.**

---

When developing locally, we want a distributed development environment that works out of the box and is easy to customize, so that we can better test both Kubernetes itself and our applications. Here we use [Vagrant](https://www.vagrantup.com/) and [VirtualBox](https://www.virtualbox.org/wiki/Downloads) to create such an environment.

For the configuration files and `vagrantfile` used during deployment, see: https://github.com/rootsongjc/kubernetes-vagrant-centos-cluster