diff --git a/guide/cluster-management.md b/guide/cluster-management.md deleted file mode 100644 index 5c34d2d72..000000000 --- a/guide/cluster-management.md +++ /dev/null @@ -1,3 +0,0 @@ -# 集群管理 - -Kubernetes 集群的各种杂项管理。 \ No newline at end of file diff --git a/usecases/support-spark-natively-in-kubernetes.md b/usecases/support-spark-natively-in-kubernetes.md index bf93c2388..657f874c9 100644 --- a/usecases/support-spark-natively-in-kubernetes.md +++ b/usecases/support-spark-natively-in-kubernetes.md @@ -53,7 +53,7 @@ 还需要安装支持 kubernetes 的 spark 客户端,在这里下载:https://github.com/apache-spark-on-k8s/spark/releases -根据使用的镜像版本,我下载的是 [v2.1.0-kubernetes-0.3.1](https://github.com/apache-spark-on-k8s/spark/releases/tag/v2.1.0-kubernetes-0.3.1) +根据使用的镜像版本,我下载的是 [v2.1.0-kubernetes-0.3.1](https://github.com/apache-spark-on-k8s/spark/releases/tag/v2.1.0-kubernetes-0.3.1) **运行 SparkPi 测试** @@ -73,89 +73,24 @@ local:///opt/spark/examples/jars/spark-examples_2.11-2.1.0-k8s-0.3.1-SNAPSHOT.ja 关于该命令参数的介绍请参考:https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html -**注意:** 该 jar 包实际上是 +**注意:** 该 jar 包实际上是 `spark.kubernetes.executor.docker.image` 镜像中的。 -运行失败,报错信息: +这时候提交任务运行还是失败,报错信息中可以看到两个问题: -```Ini -2017-09-05 14:45:52 INFO Client:54 - Waiting for application spark-pi to finish... 
-2017-09-05 14:45:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: - pod name: spark-pi-1504593950039-driver - namespace: spark-cluster - labels: spark-app-selector -> spark-81cd1d33adbd4f728f7c609356b54c43, spark-role -> driver - pod uid: dbf66ecf-9205-11e7-970c-f4e9d49f8ed0 - creation time: 2017-09-05T06:45:52Z - service account name: default - volumes: default-token-klxp8 - node name: N/A - start time: N/A - container images: N/A - phase: Pending - status: [] -2017-09-05 14:45:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: - pod name: spark-pi-1504593950039-driver - namespace: spark-cluster - labels: spark-app-selector -> spark-81cd1d33adbd4f728f7c609356b54c43, spark-role -> driver - pod uid: dbf66ecf-9205-11e7-970c-f4e9d49f8ed0 - creation time: 2017-09-05T06:45:52Z - service account name: default - volumes: default-token-klxp8 - node name: 172.20.0.115 - start time: N/A - container images: N/A - phase: Pending - status: [] -2017-09-05 14:45:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: - pod name: spark-pi-1504593950039-driver - namespace: spark-cluster - labels: spark-app-selector -> spark-81cd1d33adbd4f728f7c609356b54c43, spark-role -> driver - pod uid: dbf66ecf-9205-11e7-970c-f4e9d49f8ed0 - creation time: 2017-09-05T06:45:52Z - service account name: default - volumes: default-token-klxp8 - node name: 172.20.0.115 - start time: 2017-09-05T06:45:52Z - container images: sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver:v2.1.0-kubernetes-0.3.1 - phase: Pending - status: [ContainerStatus(containerID=null, image=sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver:v2.1.0-kubernetes-0.3.1, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
additionalProperties={}), additionalProperties={}), additionalProperties={})] -2017-09-05 14:45:53 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: - pod name: spark-pi-1504593950039-driver - namespace: spark-cluster - labels: spark-app-selector -> spark-81cd1d33adbd4f728f7c609356b54c43, spark-role -> driver - pod uid: dbf66ecf-9205-11e7-970c-f4e9d49f8ed0 - creation time: 2017-09-05T06:45:52Z - service account name: default - volumes: default-token-klxp8 - node name: 172.20.0.115 - start time: 2017-09-05T06:45:52Z - container images: sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver:v2.1.0-kubernetes-0.3.1 - phase: Running - status: [ContainerStatus(containerID=docker://53de39eb83435a344ef780aae83139229d4d6d78fa4e1655f9f81da95d89f439, image=sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver:v2.1.0-kubernetes-0.3.1, imageID=docker-pullable://sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver@sha256:19c3b76a34fee02104de0d859a60d79608ebd0b7ebae33ec3b86a71af777c833, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2017-09-05T06:45:53Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})] -2017-09-05 14:45:56 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state: - pod name: spark-pi-1504593950039-driver - namespace: spark-cluster - labels: spark-app-selector -> spark-81cd1d33adbd4f728f7c609356b54c43, spark-role -> driver - pod uid: dbf66ecf-9205-11e7-970c-f4e9d49f8ed0 - creation time: 2017-09-05T06:45:52Z - service account name: default - volumes: default-token-klxp8 - node name: 172.20.0.115 - start time: 2017-09-05T06:45:52Z - container images: sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver:v2.1.0-kubernetes-0.3.1 - phase: Failed - status: 
[ContainerStatus(containerID=docker://53de39eb83435a344ef780aae83139229d4d6d78fa4e1655f9f81da95d89f439, image=sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver:v2.1.0-kubernetes-0.3.1, imageID=docker-pullable://sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver@sha256:19c3b76a34fee02104de0d859a60d79608ebd0b7ebae33ec3b86a71af777c833, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://53de39eb83435a344ef780aae83139229d4d6d78fa4e1655f9f81da95d89f439, exitCode=1, finishedAt=2017-09-05T06:45:55Z, message=null, reason=Error, signal=null, startedAt=null, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})] -2017-09-05 14:45:56 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses: - - - Container name: spark-kubernetes-driver - Container image: sz-pg-oam-docker-hub-001.tendcloud.com/library/kubespark-spark-driver:v2.1.0-kubernetes-0.3.1 - Container state: Terminated - Exit code: 1 -2017-09-05 14:45:56 INFO Client:54 - Application spark-pi finished. -``` +- Executor 无法找到 driver pod +- 用户 `system:serviceaccount:spark-cluster:default` 没有权限获取 `spark-cluster` 中的 pod 信息。 提了个 issue [Failed to run the sample spark-pi test using spark-submit on the doc #478](https://github.com/apache-spark-on-k8s/spark/issues/478) +需要为 spark 集群创建一个 `serviceaccount` 和 `rolebinding`: + +```bash +kubectl create serviceaccount spark --namespace spark-cluster +kubectl create rolebinding spark-edit --clusterrole=edit --serviceaccount=spark-cluster:spark --namespace=spark-cluster +``` + +该 Bug 将在新版本中修复。 + ## 参考 [Spark动态资源分配-Dynamic Resource Allocation](http://lxw1234.com/archives/2015/12/593.htm)