2017-08-31 14:23:44 +08:00
<!DOCTYPE HTML>
< html lang = "zh-cn" >
< head >
< meta charset = "UTF-8" >
< meta content = "text/html; charset=utf-8" http-equiv = "Content-Type" >
< title > 5.2.2 Running Spark Programs with Native Kubernetes Scheduling · Kubernetes Handbook< / title >
< meta http-equiv = "X-UA-Compatible" content = "IE=edge" / >
< meta name = "description" content = "" >
< meta name = "generator" content = "GitBook 3.2.2" >
< meta name = "author" content = "Jimmy Song" >
< link rel = "stylesheet" href = "../gitbook/style.css" >
< link rel = "stylesheet" href = "../gitbook/gitbook-plugin-splitter/splitter.css" >
< link rel = "stylesheet" href = "../gitbook/gitbook-plugin-page-toc-button/plugin.css" >
< link rel = "stylesheet" href = "../gitbook/gitbook-plugin-image-captions/image-captions.css" >
< link rel = "stylesheet" href = "../gitbook/gitbook-plugin-page-footer-ex/style/plugin.css" >
< link rel = "stylesheet" href = "../gitbook/gitbook-plugin-search-plus/search.css" >
< link rel = "stylesheet" href = "../gitbook/gitbook-plugin-highlight/website.css" >
< link rel = "stylesheet" href = "../gitbook/gitbook-plugin-fontsettings/website.css" >
< meta name = "HandheldFriendly" content = "true" / >
< meta name = "viewport" content = "width=device-width, initial-scale=1, user-scalable=no" >
< meta name = "apple-mobile-web-app-capable" content = "yes" >
< meta name = "apple-mobile-web-app-status-bar-style" content = "black" >
< link rel = "apple-touch-icon-precomposed" sizes = "152x152" href = "../gitbook/images/apple-touch-icon-precomposed-152.png" >
< link rel = "shortcut icon" href = "../gitbook/images/favicon.ico" type = "image/x-icon" >
< link rel = "next" href = "serverless.html" / >
< link rel = "prev" href = "spark-standalone-on-kubernetes.html" / >
< / head >
< body >
< div class = "book" >
< div class = "book-summary" >
< div id = "book-search-input" role = "search" >
< input type = "text" placeholder = "Type to search" / >
< / div >
< / div >
< div class = "book-body" >
< div class = "body-inner" >
< div class = "book-header" role = "navigation" >
<!-- Title -->
< h1 >
< i class = "fa fa-circle-o-notch fa-spin" > < / i >
< a href = ".." > 5.2.2 Running Spark Programs with Native Kubernetes Scheduling< / a >
< / h1 >
< / div >
< div class = "page-wrapper" tabindex = "-1" role = "main" >
< div class = "page-inner" >
< div class = "search-plus" id = "book-search-results" >
< div class = "search-noresults" >
< section class = "normal markdown-section" >
< h1 id = "运行支持kubernetes原生调度的spark程序" > Running Spark Programs with Native Kubernetes Scheduling < / h1 >
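< p > We have previously run a Spark cluster in standalone mode on Kubernetes; see < a href = "spark-standalone-on-kubernetes.html" > 5.2.1 Spark standalone on Kubernetes< / a > . < / p >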
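< h2 id = "spark-概念说明" > Spark Concepts < / h2 >
< p > < a href = "http://spark.apache.org" target = "_blank" > Apache Spark< / a > is a big-data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 at the AMPLab of the University of California, Berkeley, was open-sourced in 2010, and was later donated to the Apache Software Foundation. < / p >
< p > Spark includes the following components and concepts: < / p >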
< ul >
< li > < strong > Application< / strong > : A Spark Application is similar in concept to a MapReduce job in Hadoop. It is the Spark program written by the user, consisting of the Driver code and the Executor code that runs on multiple nodes across the cluster; < / li >
< li > < strong > Driver< / strong > : The Driver runs the main() function of the Application and creates the SparkContext, whose purpose is to prepare the runtime environment for the Spark application. The SparkContext communicates with the ClusterManager to request resources and to assign and monitor tasks; once the Executors have finished, the Driver closes the SparkContext. The SparkContext is commonly used to stand for the Driver; < / li >
< li > < strong > Executor< / strong > : A process that an Application runs on a Worker node. It runs Tasks and stores data in memory or on disk, and each Application has its own independent set of Executors. In Spark on Yarn mode the process is named < code > CoarseGrainedExecutorBackend< / code > , analogous to YarnChild in Hadoop MapReduce. Each < code > CoarseGrainedExecutorBackend< / code > process holds exactly one executor object, which wraps each Task in a taskRunner and runs it on an idle thread taken from a thread pool. The number of Tasks that a < code > CoarseGrainedExecutorBackend< / code > can run in parallel depends on the number of CPUs allocated to it; < / li >
< li > < strong > Cluster Manager< / strong > : The external service that acquires resources on the cluster. The options covered here are: < ul >
< li > Standalone: Spark's native resource management, in which the Master allocates resources; < / li >
< li > Hadoop Yarn: the ResourceManager in YARN allocates resources; < / li >
< / ul >
< / li >
< li > < strong > Worker< / strong > : Any node in the cluster that can run Application code, similar to a NodeManager node in YARN. In Standalone mode these are the Worker nodes configured in the slaves file; in Spark on Yarn mode they are the NodeManager nodes; < / li >
< li > < strong > Job< / strong > : A parallel computation made up of multiple Tasks, usually triggered by a Spark action. A Job covers multiple RDDs and the operations applied to them; < / li >
< li > < strong > Stage< / strong > : Each Job is split into groups of Tasks; each group is called a Stage (also known as a TaskSet), so a job consists of several stages. Stage boundaries fall at shuffle (wide) dependencies, whereas each action triggers a separate job. For example, the chain (transformation1 -> transformation2 -> action1 -> transformation3 -> action2) produces two jobs, triggered by action1 and action2, and each job is in turn divided into stages wherever a shuffle is required. < / li >
< li > < p > < strong > Task< / strong > : A unit of work sent to an Executor; < / p >
< / li >
< li > < p > < strong > Context< / strong > : Created when a Spark application starts; it serves as the Spark runtime environment. < / p >
< / li >
< li > < strong > Dynamic Allocation< / strong > : A configuration option that can be switched on. Since Spark 1.2, On Yarn mode has supported Dynamic Resource Allocation, so the number of executors can be increased and decreased dynamically according to the Application's load (its pending Tasks). This policy is well suited to using spark-sql on YARN for data development and analysis, and to running spark-sql as a long-lived service. Dynamic allocation of Executors requires the "external shuffle service" to be enabled in cluster mode. < / li >
< li > < strong > Dynamic allocation policy< / strong > : With dynamic allocation enabled, an application requests more resources when tasks are left pending for lack of resources, which means its current executors cannot run all tasks in parallel. Spark requests resources round by round: once tasks have been pending for < code > spark.dynamicAllocation.schedulerBacklogTimeout< / code > (default 1s), dynamic allocation starts; after that, a new request is issued every < code > spark.dynamicAllocation.sustainedSchedulerBacklogTimeout< / code > (default 1s) until enough resources have been obtained. The amount requested grows exponentially from round to round: 1, 2, 4, 8, and so on. There are two reasons for exponential growth: first, starting small allows for the case where the application is satisfied almost immediately; second, doubling ensures that an application that needs many resources is still satisfied after only a few requests. < / li >
< / ul >
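< p > The request schedule described in the last item can be sketched in a few lines of Python. This is a simplified model, not Spark's actual scheduler code: the function names are hypothetical, every request is assumed to be granted in full, and limits such as a maximum executor count are ignored. < / p >

```python
from typing import List


def executor_request_rounds(needed: int) -> List[int]:
    """Return how many executors are requested in each round.

    Requests start at 1 and double every round (1, 2, 4, 8, ...).
    The final round is capped so the total never exceeds `needed`.
    Assumption: every request is granted in full.
    """
    if needed < 0:
        raise ValueError("needed must be non-negative")
    rounds: List[int] = []
    granted = 0
    batch = 1
    while granted < needed:
        request = min(batch, needed - granted)
        rounds.append(request)
        granted += request
        batch *= 2
    return rounds


def request_times(needed: int,
                  backlog_timeout: float = 1.0,
                  sustained_timeout: float = 1.0) -> List[float]:
    """Seconds after tasks start backing up at which each round fires.

    The first round fires after `backlog_timeout`
    (spark.dynamicAllocation.schedulerBacklogTimeout); each later round
    fires `sustained_timeout` seconds after the previous one
    (spark.dynamicAllocation.sustainedSchedulerBacklogTimeout).
    """
    count = len(executor_request_rounds(needed))
    return [backlog_timeout + i * sustained_timeout for i in range(count)]


if __name__ == "__main__":
    print(executor_request_rounds(20))  # [1, 2, 4, 8, 5]
    print(request_times(20))            # [1.0, 2.0, 3.0, 4.0, 5.0]
```

< p > Under these assumptions an application that needs 20 executors is satisfied after five rounds, where requesting one executor per round would take twenty. < / p >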
< h2 id = "架构设计" > Architecture < / h2 >
< p > For the limitations of Spark standalone and how it differs from the Kubernetes-native Spark architecture, see the issue < a href = "https://github.com/kubernetes/kubernetes/issues/34377" target = "_blank" > Support Spark natively in Kubernetes #34377< / a > , filed by Anirudh Ramanathan on October 8, 2016. < / p >
< p > In short, Spark standalone on Kubernetes has the following drawbacks: < / p >
< ul >
< li > It provides no multi-tenant isolation: every user wants to request for their pods the maximum resources available on a node. < / li >
< li > Spark's master/worker components were not designed to use Kubernetes resource scheduling, so there are two layers of resource scheduling, which hinders integration with Kubernetes. < / li >
< / ul >
< p > In a Kubernetes-native Spark cluster, by contrast, Spark can call the Kubernetes API to obtain cluster resources and have its workloads scheduled. Implementing Kubernetes-native Spark requires giving Spark an external cluster manager that can interact with the Kubernetes API. < / p >
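< p > To make the idea of asking the Kubernetes API for resources more concrete, the sketch below builds the kind of Pod manifest a cluster manager could submit for one executor. It is an illustration only, not the code of the Spark on Kubernetes project: the function name, the pod naming scheme, and the label keys are hypothetical, and only the standard Pod fields (< code > apiVersion< / code > , < code > kind< / code > , < code > metadata< / code > , < code > spec.containers[].resources.requests< / code > ) come from Kubernetes itself. < / p >

```python
import json
from typing import Any, Dict


def executor_pod_manifest(app_name: str, index: int, image: str,
                          cores: int, memory_mib: int,
                          namespace: str = "default") -> Dict[str, Any]:
    """Build a Pod manifest for one executor (hypothetical layout).

    The CPU and memory requests let the Kubernetes scheduler place the
    executor directly, so there is no second scheduling layer.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            # Naming scheme and labels are assumptions for illustration.
            "name": f"{app_name}-exec-{index}",
            "namespace": namespace,
            "labels": {"spark-app": app_name, "spark-role": "executor"},
        },
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "executor",
                "image": image,
                "resources": {
                    "requests": {
                        "cpu": str(cores),
                        "memory": f"{memory_mib}Mi",
                    },
                },
            }],
        },
    }


if __name__ == "__main__":
    pod = executor_pod_manifest(
        "spark-pi", 1,
        "kubespark/spark-executor:v2.1.0-kubernetes-0.3.1",
        cores=1, memory_mib=1024)
    print(json.dumps(pod, indent=2))
```

< p > Because each executor is an ordinary pod with its own resource requests, namespaces and resource quotas can be applied per tenant, which addresses the isolation problem listed above. < / p >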
< h2 id = "安装指南" > Installation Guide < / h2 >
< p > We can deploy directly with the officially pre-built Docker images. < / p >
< table >
< thead >
< tr >
< th > Component < / th >
< th > Image < / th >
< / tr >
< / thead >
< tbody >
< tr >
< td > Spark Driver Image< / td >
< td > < code > kubespark/spark-driver:v2.1.0-kubernetes-0.3.1< / code > < / td >
< / tr >
< tr >
< td > Spark Executor Image< / td >
< td > < code > kubespark/spark-executor:v2.1.0-kubernetes-0.3.1< / code > < / td >
< / tr >
< tr >
< td > Spark Initialization Image< / td >
< td > < code > kubespark/spark-init:v2.1.0-kubernetes-0.3.1< / code > < / td >
< / tr >
< tr >
< td > Spark Staging Server Image< / td >
< td > < code > kubespark/spark-resource-staging-server:v2.1.0-kubernetes-0.3.1< / code > < / td >
< / tr >
< tr >
< td > PySpark Driver Image< / td >
< td > < code > kubespark/driver-py:v2.1.0-kubernetes-0.3.1< / code > < / td >
< / tr >
< tr >
< td > PySpark Executor Image< / td >
< td > < code > kubespark/executor-py:v2.1.0-kubernetes-0.3.1< / code > < / td >
< / tr >
< / tbody >
< / table >
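< p > As a rough sketch of how these images might be used, the Python snippet below assembles a < code > spark-submit< / code > command line that points the driver, executor, and init container at the images in the table. Several details are assumptions rather than facts from this page: the configuration keys < code > spark.kubernetes.*.docker.image< / code > are recalled from the Spark on Kubernetes user guide listed in the references and should be checked against it for the version in use, and the API server address and jar path are placeholders. The command is only built, not executed. < / p >

```python
from typing import Dict, List

IMAGE_TAG = "v2.1.0-kubernetes-0.3.1"

# Images taken from the table above.
IMAGES: Dict[str, str] = {
    "driver": f"kubespark/spark-driver:{IMAGE_TAG}",
    "executor": f"kubespark/spark-executor:{IMAGE_TAG}",
    "init": f"kubespark/spark-init:{IMAGE_TAG}",
}


def build_submit_command(api_server: str, main_class: str, app_jar: str,
                         executors: int = 2,
                         namespace: str = "default") -> List[str]:
    """Return the argv for a cluster-mode spark-submit against Kubernetes.

    The spark.kubernetes.* image keys are assumed from the fork's user
    guide; verify them against the documentation of your version.
    """
    conf = {
        "spark.executor.instances": str(executors),
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.driver.docker.image": IMAGES["driver"],
        "spark.kubernetes.executor.docker.image": IMAGES["executor"],
        "spark.kubernetes.initcontainer.docker.image": IMAGES["init"],
    }
    cmd = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--master", f"k8s://{api_server}",
    ]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    # Placeholder API server address and jar path, for illustration only.
    argv = build_submit_command(
        "https://kubernetes.example.com:6443",
        "org.apache.spark.examples.SparkPi",
        "local:///opt/spark/examples/jars/spark-examples.jar")
    print(" \\\n  ".join(argv))
```

< p > Keeping the image names in one dictionary makes it easy to move all three components to a newer tag together. < / p >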
< h2 id = "参考" > References < / h2 >
< p > < a href = "http://lxw1234.com/archives/2015/12/593.htm" target = "_blank" > Spark Dynamic Resource Allocation< / a > < / p >
< p > < a href = "https://apache-spark-on-k8s.github.io/userdocs/running-on-kubernetes.html" target = "_blank" > Running Spark on Kubernetes< / a > < / p >
< p > < a href = "https://issues.apache.org/jira/browse/SPARK-18278" target = "_blank" > Apache Spark Jira Issue - 18278 - SPIP: Support native submission of spark jobs to a kubernetes cluster< / a > < / p >
< p > < a href = "https://github.com/kubernetes/kubernetes/issues/34377" target = "_blank" > Kubernetes Github Issue - 34377 Support Spark natively in Kubernetes< / a > < / p >
< p > < a href = "https://github.com/kubernetes/kubernetes/tree/master/examples/spark" target = "_blank" > Kubernetes example spark< / a > < / p >
< p > < a href = "https://github.com/rootsongjc/spark-on-kubernetes" target = "_blank" > https://github.com/rootsongjc/spark-on-kubernetes< / a > < / p >
< footer class = "page-footer-ex" > < span class = "page-footer-ex-copyright" > for GitBook< / span >                       < span class = "page-footer-ex-footer-update" > update
2017-08-31 18:10:05
< / span > < / footer >
< / section >
< / div >
< div class = "search-results" >
< div class = "has-results" >
< h1 class = "search-results-title" > < span class = 'search-results-count' > < / span > results matching "< span class = 'search-query' > < / span > "< / h1 >
< ul class = "search-results-list" > < / ul >
< / div >
< div class = "no-results" >
< h1 class = "search-results-title" > No results matching "< span class = 'search-query' > < / span > "< / h1 >
< / div >
< / div >
< / div >
< / div >
< / div >
< / div >
< a href = "spark-standalone-on-kubernetes.html" class = "navigation navigation-prev " aria-label = "Previous page: 5.2.1 Spark standalone on Kubernetes" >
< i class = "fa fa-angle-left" > < / i >
< / a >
< a href = "serverless.html" class = "navigation navigation-next " aria-label = "Next page: 5.3 Serverless Architecture" >
< i class = "fa fa-angle-right" > < / i >
< / a >
< / div >
< / div >
< script src = "../gitbook/gitbook.js" > < / script >
< script src = "../gitbook/theme.js" > < / script >
< script src = "../gitbook/gitbook-plugin-github/plugin.js" > < / script >
< script src = "../gitbook/gitbook-plugin-splitter/splitter.js" > < / script >
< script src = "../gitbook/gitbook-plugin-page-toc-button/plugin.js" > < / script >
< script src = "../gitbook/gitbook-plugin-editlink/plugin.js" > < / script >
< script src = "../gitbook/gitbook-plugin-search-plus/jquery.mark.min.js" > < / script >
< script src = "../gitbook/gitbook-plugin-search-plus/search.js" > < / script >
< script src = "../gitbook/gitbook-plugin-sharing/buttons.js" > < / script >
< script src = "../gitbook/gitbook-plugin-fontsettings/fontsettings.js" > < / script >
< / body >
< / html >