#Submitting applications with Spark spark-submit
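##Spark supports three cluster managers
- Standalone: the simple cluster manager bundled with Spark, which makes it easy to set up a cluster.
- Apache Mesos: a general-purpose cluster manager that can also run Hadoop MapReduce and service applications.
- Hadoop YARN: the resource manager in Hadoop 2.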

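> **Note**:
> 1. When the cluster is not particularly large and there is no need to run MapReduce and Spark side by side, Standalone mode is the simplest and usually the most efficient choice.
> 2. Spark can schedule resources both across applications (through the cluster manager) and within an application (when a single SparkContext runs several computations).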
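
The two levels of scheduling in the note can be influenced from the command line through `--conf`. The sketch below is only a dry run that prints a command. It assumes a standalone cluster, where `spark.cores.max` caps the cores a single application may take, and uses `spark.scheduler.mode` to choose between FIFO and FAIR scheduling of jobs inside one SparkContext. The helper name `scheduling_conf` and the values `10` and `FAIR` are illustrative assumptions, not settings taken from this article.

```bash
# Hypothetical helper (not part of Spark): build two scheduling-related --conf flags.
#   spark.cores.max      - upper bound on cores for one application (across applications)
#   spark.scheduler.mode - FIFO or FAIR scheduling of jobs within one SparkContext
scheduling_conf() {
  max_cores="$1"
  sched_mode="$2"
  printf '%s\n' "--conf spark.cores.max=$max_cores --conf spark.scheduler.mode=$sched_mode"
}

# Dry run: print the command instead of executing it.
echo "./bin/spark-submit $(scheduling_conf 10 FAIR) --class org.apache.spark.examples.SparkPi /path/to/examples.jar 100"
```

Printing the command first makes it easy to review the flags before running it against a real cluster.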

##Running Spark on YARN
###cluster mode
```bash
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10
```

###client mode
```bash
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode client \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    lib/spark-examples*.jar \
    10
```
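
The two commands above differ only in the value of `--deploy-mode`. As a minimal sketch, a small wrapper can make that explicit. The function name `yarn_submit_cmd` is hypothetical, and the wrapper only prints the command rather than running it, so it can be tried without a cluster.

```bash
# Hypothetical wrapper: print (not run) the YARN submit command for a deploy mode.
yarn_submit_cmd() {
  mode="$1"
  case "$mode" in
    cluster|client) ;;  # the only two deploy modes used in this article
    *) echo "deploy mode must be 'cluster' or 'client'" >&2; return 1 ;;
  esac
  # The jar glob is quoted on purpose so it is printed, not expanded.
  printf '%s\n' "./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode $mode --driver-memory 4g --executor-memory 2g --executor-cores 1 lib/spark-examples*.jar 10"
}

yarn_submit_cmd cluster
yarn_submit_cmd client
```

Rejecting any other value early avoids submitting a job with a mistyped deploy mode.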

##spark-submit parameter reference

###Values of Master_URL
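
The examples in this article pass four forms of master URL to `--master`: `local[N]`, `spark://HOST:PORT`, `mesos://HOST:PORT`, and `yarn`. The sketch below maps each form to the cluster manager it selects. The helper `master_kind` is hypothetical and not part of Spark. It covers only these forms plus plain `local`, and reports anything else as unknown.

```bash
# Hypothetical helper (not part of Spark): name the cluster manager
# implied by a --master value, using only the URL forms shown on this page.
master_kind() {
  case "$1" in
    local|local\[*\]) echo "local" ;;       # run in-process, e.g. local[8] for 8 threads
    spark://*)        echo "standalone" ;;  # Spark standalone master
    mesos://*)        echo "mesos" ;;       # Apache Mesos
    yarn)             echo "yarn" ;;        # cluster location is read from HADOOP_CONF_DIR
    *)                echo "unknown" ;;
  esac
}

master_kind 'local[8]'
master_kind 'spark://207.184.161.138:7077'
master_kind 'mesos://207.184.161.138:7077'
master_kind 'yarn'
```

Quoting `local[8]` matters in an interactive shell, because unquoted square brackets may be treated as a glob pattern.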

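##Distinguishing client, cluster, and local modes
The figure below shows typical client mode: the Spark driver runs on the machine from which the job is submitted.
![Spark client run mode](/media/editor/file_1571154771000_20191015235254921462.png "Spark client run mode")
 
The figure below shows cluster mode: the Spark driver runs inside YARN.
![Spark cluster run mode](/media/editor/file_1571154809000_20191015235331638835.png "Spark cluster run mode")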

###Comparison of the three modes
| |YARN Cluster|YARN Client|Spark Standalone|
|--|--|--|--|
|Where the driver runs|Application Master|Client|Client (in client deploy mode)|
|Who requests resources|Application Master|Application Master|Client|
|Who starts executor processes|YARN NodeManager|YARN NodeManager|Spark Worker|
|Long-running daemons|1. YARN ResourceManager 2. NodeManager|1. YARN ResourceManager 2. NodeManager|1. Spark Master 2. Spark Worker|
|Supports Spark Shell|No|Yes|Yes|
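
As a rough illustration, the first and last rows of the table can be encoded as a lookup, for example to decide in a launcher script whether an interactive shell makes sense. The function `mode_summary` and its argument names (`yarn-cluster`, `yarn-client`, `standalone`) are hypothetical labels for the three table columns, not Spark options.

```bash
# Hypothetical lookup encoding two rows of the comparison table:
# where the driver runs, and whether Spark Shell is supported.
mode_summary() {
  case "$1" in
    yarn-cluster) echo "driver=application-master spark-shell=no" ;;
    yarn-client)  echo "driver=client spark-shell=yes" ;;
    standalone)   echo "driver=client spark-shell=yes" ;;  # as listed in the table
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

mode_summary yarn-cluster
mode_summary yarn-client
mode_summary standalone
```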

##Examples of submitting applications with spark-submit
```bash
# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster (use --deploy-mode client for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000
```
