The Spark application submitted via spark-submit has finished, but the spark-submit process itself has not exited

I scheduled a spark-submit job with Azkaban. The Spark application has finished, but the spark-submit process has not exited. Spark version is 1.6.
The spark-submit parameters are as follows:

[screenshot of the spark-submit parameters, not included]

 
In the YARN ResourceManager web UI the submitted Spark application shows as finished and succeeded, but spark-submit still has not exited.
I looked at the spark-submit source code; after it invokes the main method of my jar there is nothing else left to do, so why does it hang? Here is runMain from SparkSubmit:
  /**
   * Run the main method of the child class using the provided launch environment.
   *
   * Note that this main class will not be the one provided by the user if we're
   * running cluster deploy mode or python applications.
   */
  private def runMain(
      childArgs: Seq[String],
      childClasspath: Seq[String],
      sysProps: Map[String, String],
      childMainClass: String,
      verbose: Boolean): Unit = {
    // scalastyle:off println
    if (verbose) {
      printStream.println(s"Main class:\n$childMainClass")
      printStream.println(s"Arguments:\n${childArgs.mkString("\n")}")
      printStream.println(s"System properties:\n${sysProps.mkString("\n")}")
      printStream.println(s"Classpath elements:\n${childClasspath.mkString("\n")}")
      printStream.println("\n")
    }
    // scalastyle:on println

    val loader =
      if (sysProps.getOrElse("spark.driver.userClassPathFirst", "false").toBoolean) {
        new ChildFirstURLClassLoader(new Array[URL](0),
          Thread.currentThread.getContextClassLoader)
      } else {
        new MutableURLClassLoader(new Array[URL](0),
          Thread.currentThread.getContextClassLoader)
      }
    Thread.currentThread.setContextClassLoader(loader)

    for (jar <- childClasspath) {
      addJarToClasspath(jar, loader)
    }

    for ((key, value) <- sysProps) {
      System.setProperty(key, value)
    }

    var mainClass: Class[_] = null

    try {
      mainClass = Utils.classForName(childMainClass)
    } catch {
      case e: ClassNotFoundException =>
        e.printStackTrace(printStream)
        if (childMainClass.contains("thriftserver")) {
          // scalastyle:off println
          printStream.println(s"Failed to load main class $childMainClass.")
          printStream.println("You need to build Spark with -Phive and -Phive-thriftserver.")
          // scalastyle:on println
        }
        System.exit(CLASS_NOT_FOUND_EXIT_STATUS)
      case e: NoClassDefFoundError =>
        e.printStackTrace(printStream)
        if (e.getMessage.contains("org/apache/hadoop/hive")) {
          // scalastyle:off println
          printStream.println(s"Failed to load hive class.")
          printStream.println("You need to build Spark with -Phive and -Phive-thriftserver.")
          // scalastyle:on println
        }
        System.exit(CLASS_NOT_FOUND_EXIT_STATUS)
    }

    // SPARK-4170
    if (classOf[scala.App].isAssignableFrom(mainClass)) {
      printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
    }

    val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method in the given main class must be static")
    }

    def findCause(t: Throwable): Throwable = t match {
      case e: UndeclaredThrowableException =>
        if (e.getCause() != null) findCause(e.getCause()) else e
      case e: InvocationTargetException =>
        if (e.getCause() != null) findCause(e.getCause()) else e
      case e: Throwable =>
        e
    }

    try {
      // (asker's note) nothing else happens after the main method is invoked here
      mainMethod.invoke(null, childArgs.toArray)
    } catch {
      case t: Throwable =>
        findCause(t) match {
          case SparkUserAppException(exitCode) =>
            System.exit(exitCode)

          case t: Throwable =>
            throw t
        }
    }
  }
SparkSubmit source code, branch 1.6
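
A general JVM point that may explain the hang (not a confirmed diagnosis of this particular job): runMain only invokes the application's main method, and in client mode that happens inside the spark-submit JVM. A JVM does not exit when main returns as long as any non-daemon thread is still alive, so a thread pool, messaging client, or similar resource created by the application and never shut down can keep spark-submit running even though the YARN application already shows SUCCEEDED. A minimal sketch with a hypothetical driver class showing the effect and the shutdown that avoids it:

import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical driver, for illustration only: main returns, but the JVM
// (and therefore spark-submit in client mode) stays alive because the pool
// below uses non-daemon worker threads.
object HangingDriverExample {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(4) // non-daemon threads by default
    pool.submit(new Runnable { def run(): Unit = println("work done") })

    // ... Spark job logic would go here; call sc.stop() when finished ...

    // Without the shutdown below, main returns but the process never exits.
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
    // Some jobs scheduled from Azkaban simply end with System.exit(0) instead.
  }
}

A thread dump of the stuck spark-submit process (jstack <pid>) shows which non-daemon threads are still alive.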

Journey - a post-95 IT guy...


Give this a try:
spark.conf.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
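
For reference, a sketch of how this committer setting is usually applied in a Spark 1.6 program (the app name and job body are placeholders): entries prefixed with spark.hadoop. on the SparkConf are copied into the job's Hadoop Configuration, or the property can be set on sc.hadoopConfiguration directly.

import org.apache.spark.{SparkConf, SparkContext}

object CommitterV2Example {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("committer-v2-example")
      // "spark.hadoop.*" properties are copied into the job's Hadoop Configuration.
      .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
    val sc = new SparkContext(conf)

    // Equivalent: set it on the Hadoop Configuration after the context exists.
    sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")

    // ... job logic ...
    sc.stop()
  }
}

The same property can also be passed on the spark-submit command line as --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2.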

Potato - my potato is not just a dream


Did you ever solve your Spark on YARN problem?
I'm running Spark 1.6.0 on YARN in yarn-cluster mode. All tasks finish successfully, but the job stays stuck at 10% progress for three or four hours and then re-runs all the tasks from the beginning.
As shown in the three screenshots (not reproduced here).


I couldn't find a solution right away, so I switched to a standalone cluster in cluster mode. Every run then fails with "cannot open /tmp/spark-xxx", immediately followed by an exception saying "Permission denied". The /tmp directory was created when I installed the OS; /tmp/spark-xxx is created by the program itself while running and seems to be deleted automatically when the run ends. As shown below:
FetchFailed(BlockManagerId(0, worker, 7337), shuffleId=0, mapId=0, reduceId=24, message=
org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /tmp/spark-d77f97b2-246e-429e-b64d-cd9f83b710a6/executor-51312050-a885-4431-9a5a-a5f2c39f3851/blockmgr-ae86a07d-2c6f-45c3-a95f-6f1cb0ca1889/30/shuffle_0_0_0.index
Caused by: java.io.FileNotFoundException: /tmp/spark-d77f97b2-246e-429e-b64d-cd9f83b710a6/executor-51312050-a885-4431-9a5a-a5f2c39f3851/blockmgr-ae86a07d-2c6f-45c3-a95f-6f1cb0ca1889/30/shuffle_0_0_0.index (Permission denied)
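
Not a confirmed fix for this thread, but since the shuffle files that fail to open live under /tmp, one thing worth checking is where Spark writes its scratch data. In standalone mode spark.local.dir (or SPARK_LOCAL_DIRS on the workers) controls the directory used for block and shuffle files; pointing it at a location that the user running the executors, and the external shuffle service on port 7337 if one is enabled, can read and write may avoid the Permission denied. A minimal sketch with a placeholder path:

import org.apache.spark.{SparkConf, SparkContext}

object LocalDirExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("local-dir-example")
      // Placeholder path: must exist on every worker and be writable by the
      // user that runs the executors (and the shuffle service, if used).
      .set("spark.local.dir", "/data/spark/tmp")
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}

On YARN this setting is ignored and the NodeManager's local directories are used instead.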


 
Then, following someone's advice, I set this as the first statement inside the main method:
def main(args: Array[String]) {
  System.setProperty("HADOOP_USER_NAME","spark")
However, whether I set it to "spark" or to "root", the same error is still reported.
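
For what it's worth, HADOOP_USER_NAME is only read when Hadoop's UserGroupInformation performs its first login, so the property has to be set before the SparkContext (or any HDFS access) is created; setting it later has no effect. It also governs the HDFS identity rather than local file permissions, which may be why it does not help with the /tmp error above. A minimal sketch of the ordering, with placeholder names:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver, for illustration only.
object HadoopUserExample {
  def main(args: Array[String]): Unit = {
    // Must run before the SparkContext (or any FileSystem call) so that the
    // Hadoop login picks it up; after the first login the identity is fixed.
    System.setProperty("HADOOP_USER_NAME", "spark")

    val sc = new SparkContext(new SparkConf().setAppName("hadoop-user-example"))
    // ... job logic ...
    sc.stop()
  }
}

Exporting HADOOP_USER_NAME in the environment before running spark-submit achieves the same thing without touching the code.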
 
My mental state is about to collapse from trying to submit this job....
 
Also, I tried client mode on both YARN and standalone; probably because memory is limited, each run fails with a different error.
So I switched to local[*] mode, and it still fails. The errors are posted in the comments here: http://hbase.group/question/233
 
It feels suffocating...
 
Begging for help!!!
 
 


