生而为人

程序员的自我修养

0%

查看jar包中是否包含某个文件

1
jar vtf analysis-tool-1.0.0-SNAPSHOT.jar application.yml

yarn架构

结束任务:

yarn application -kill application_1628131882983_0053

触发spark合并小文件的阈值(1M)

SET spark.sql.mergeSmallFileSize=104857600;

spark shuffle默认分区数

set spark.sql.shuffle.partitions=3000;
set spark.dynamicAllocation.enabled=true;
set spark.dynamicAllocation.minExecutors=100;
set spark.dynamicAllocation.maxExecutors=1000;
set spark.speculation.multiplier=1.5;
set spark.speculation.quantile=0.85;
set spark.driver.memory=8g;
set spark.executor.memory=13G;

2 详细内容

2.1 参数

2.1.1 资源参数
托管平台参数名 xt平台参数 含义 建议
num-executors set spark.dynamicAllocation.enabled=true;set spark.dynamicAllocation.minExecutors=10;set spark.dynamicAllocation.maxExecutors=200; 决定作业会向yarn申请多少executor,如果设置过少会导致并行度不够,过多会导致浪费资源或pending 根据数据量,以及设置的并行度(最大最慢的stage的并行度)和executor-cores来决定设置,建议根据瓶颈的stage的并行度来决定申请资源,一般使并行度=(num-executors * executor-cores)*(2~3)
executor-memory set spark.executor.memory=2g; 缓存数据、代码执行的堆内存以及JVM运行时需要的内存,堆内+对外=一个container内存大小。spark任务并不是内存申请越大执行越快,而是合理利用内存之后才会变快。 根据处理的数据量来申请,或者debug过程中,oom的情况,建议调大参数,考虑到队列资源情况,可以1G为粒度向上加,直至解决问题
set spark.yarn.executor.memoryOverhead=1024; 堆外内存,直接向系统申请,如数据传输时的netty等。 如果shuffle数据量比较大,或者排序逻辑很复杂时可以调大参数
executor-cores set spark.executor.cores=5; 决定了每个executor进程并行执行task线程的能力。因为每个cu同一时间只能执行一个task线程,与并行度相关很大。 根据 并行度=(num-executors * executor-cores)*(2~3)
driver-memory set spark.driver.memory=4g; driver的内存,如果有collect操作,或者在driver有复杂计算,或者executor num和task num很大可以多申请些。 一般4g足够了,如果executor和并行度很大,建议设置高一些,避免oom
spark.memory.fraction set spark.memory.fraction=0.6 storage和execution相加占JVM内存占比,默认0.6,剩余0.4用于用户数据结构以及spark元数据(整体内存还有约300M系统预留) 建议根据缓存数据,自定义数据结构调整比例。
spark.memory.storageFraction set spark.memory.storageFraction=0.5 storage和execution内存比例,由于spark2.0后这部分内存动态占用机制,所以这块基本可以忽略 建议忽略
spark.network.timeout set spark.network.timeout=480s; 由网络或者gc引起,worker或executor没有接收到executor或task的心跳反馈。 适当提高可以提升稳定性,shuffle很重(shuffle read,write数据量很大)的场景下可以提升点
spark.task.maxFailures set spark.task.maxFailures=6; 单个task失败次数上限,触发后job会失败 适当提高可以提升稳定性,一般不需要调整
spark.speculation set spark.speculation=true 推测执行的开关, 默认是true,推测执行的各个参数调整都会使得任务处于多申请资源,无法避免慢节点的情况,因此调整需要尽量保证测试效果。 因为是分布式计算引擎,task分散在各个服务器上运行,而每个服务器又承载不止一个任务,因为大体量的集群下,慢节点(磁盘io满,网络io满,cpu满)这种情况几乎是无法避免的,因此出现个别task在慢节点执行时,需要推测执行(将相同的task分发到别的节点去执行,避免慢节点拖垮任务)
spark.speculation.interval set spark.speculation.interval=1000ms 检查task是否需要推测执行的时间间隔, 默认1000ms
spark.speculation.multiplier set spark.speculation.multiplier=3 当task的时间超过已完成task执行时间中位数的几倍后, 才对这个task开启推测执行, 默认是3倍 谨慎调整,如果任务平均task时间都很快,且任务很重要,可以适当调低这个值,更快的发起推测执行,尽量避免慢节点
spark.speculation.quantile set spark.speculation.quantile=0.99 当一个stage中有多少task已经完成后才开启推测执行, 默认0.75(75%) 谨慎调整,如果任务平均task时间都很快,且任务很重要,可以适当调低这个值,更大范围的发起推测执行,尽量避免慢节点
2.2.2 读写文件
参数 含义 建议
set hive.exec.orc.split.strategy=HYBRID; 读取ORC表时生成split的策略//BI策略以文件为粒度进行split划分;//ETL策略会将文件进行切分,多个stripe组成一个split;//HYBRID策略为:当文件的平均大小大于hadoop最大split值(默认256 * 1024 * 1024)时使用ETL策略,否则使用BI策略。//对于一些较大的ORC表,可能其footer较大,ETL策略可能会导致其从hdfs拉取大量的数据来切分split,甚至会导致driver端OOM,因此这类表的读取建议使用BI策略。//对于一些较小的尤其有数据倾斜的表(这里的数据倾斜指大量stripe存储于少数文件中),建议使用ETL策略。//不想麻烦的可以直接使用HYBRID策略//平台默认应该是HYBRID,个人感觉一般情况下不需要调整 默认值即可,但是如果读取文件数特别大,大小也特别大时需要考虑切分策略
set spark.hadoop.mapreduce.input.fileinputformat.split.maxsize=134217728set spark.hadoop.mapreduce.input.fileinputformat.split.minsize=134217728 如果使用ETL策略读取orc文件,设置以上参数可以完成对小文件的合并效果~
set spark.hadoopRDD.targetBytesInPartition=134217728; //Spark在读取hive表时,默认会为每个文件创建一个task,如果一个SQL没有shuffle类型的算子,每个task执行完都会产生一个文件写回HDFS,这样就潜在存在小文件问题。//该参数可以将多个文件放到一个task中处理,默认为33554432,即如果一个文件和另一个文件大小之和小于32M,就会被放到一个task钟处理。适当提高该值,可以降低调度压力,避免无shuffle作业产生 读取文件慢的时候,可以调小
set spark.sql.mergeSmallFileSize=134217728; //如果最终写入分区目录下文件平均大小小于参数,会再启动一个map only的stage做合并,强烈建议设置该参数
2.2.3 shuffle参数
参数 spark-sql 含义 建议
spark.sql.shuffle.partitions set spark.sql.shuffle.partitions=800; 每个stage的默认task数量,也就是默认并行度。在shuffle操作时从mapper端写出的partition个数,平台默认2000 可以根据申请的executor和cores数来决定,建议代码中会对瓶颈的stage对rdd做repartition操作,
spark.sql.adaptive.enabled set spark.sql.adaptive.enabled=true 是否开启调整partition功能,如果开启,spark.sql.shuffle.partitions设置的partition可能会被合并到一个reducer里运行, 类似于repartition操作只不过是自动化的
spark.sql.adaptive.shuffle.targetPostShuffleInputSize set spark.sql.adaptive.shuffle.targetPostShuffleInputSize 开启spark.sql.adaptive.enabled后,两个partition的和低于该阈值会合并到一个reducer 同上
spark.sql.adaptive.minNumPostShufflePartitions set spark.sql.adaptive.minNumPostShufflePartitions 开启spark.sql.adaptive.enabled后,最小的分区数 同上
set spark.hadoop.mapreduce.input.fileinputformat.split.maxsize 当几个stripe的大小大于该值时,会合并到一个task中处理 同上
spark.shuffle.compress set spark.shuffle.compress=true; shuffle过程是否压缩 建议开启
spark.io.compression.codec set spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec; 压缩的编码格式 建议默认值即可,这块对性能提升影响不是很大
spark.io.compression.snappy.block.size set spark.io.compression.snappy.block.size=65536; 压缩时每个block的size 建议默认值即可,这块对性能提升影响不是很大
spark.shuffle.manager=rss; set spark.shuffle.manager=rss; 使用平台的rss服务,用于超大shuffle作业的性能优化
2.2.3 其他

2.2 日志界面介绍与问题定位

http://spark-his.data.sankuai.com/history/application_1600078095841_2155359/1/stages/

模块 介绍
job 作业的基本job信息
Stages 作业划分各个stage的完成情况与并行度,以及task的执行情况
Storage 缓存的数据情况,比如cache操作
Environment 环境变量以及参数的设置
Executors 申请到的executor与task执行情况,日志,driver日志,以及堆栈
SQL 任务的dag图,以及Logical Plan ,Physical Plan等
Debug 任务资源申请的metrics

2.3 问题解决思路

ETL的优化尽量按照 分析日志 ----> 找到问题点 ----> 思考如何解决问题点 的模式进行,解决瓶颈问题需要靠设置参数,改部分逻辑联动进行,才能得到最大收益,因此一定要找到瓶颈问题对症下药。

任务优化举例:https://km.sankuai.com/page/469785434

2.3.1 oom

内存不足需要看任务执行到哪个stage,对应sql的哪块逻辑,是否有数据倾斜。

调整的话调大spark.executor.memory一般都可以解决。

2.3.2 数据倾斜

https://km.sankuai.com/page/150274845

2.3.3 资源不足

https://km.sankuai.com/page/345269968

3 一些开发上的建议

ETL:

尽量根据作业处理数据量合理设置 至少是(set spark.dynamicAllocation.maxExecutors=200; set spark.executor.memory=2g; set spark.executor.cores=5; set spark.sql.shuffle.partitions=800;)这四个参数;

读取hive表时,尽量加上分区的限制,无论是左表还是右表;

关联时,小表写在右边,尽量对小表进行cache处理;

关联时,尽量把右表的限制条件写在子查询里;

关联时,尽量避免笛卡尔积,实在无法避免的情况下,请做好分区扩张(调大spark.sql.shuffle.partitiions);

对于sql中多次使用的数据,尽量做cache处理;

使用grouping set的时,尽量做到输入最小的数据,最少的字段,尽可能通过设计数据结构避免count distinct。

RDD:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
1,避免创建重复的rdd
避免以下操作
val rdd1 = sqlContext.sql("select A.a,A.b from A").rdd
rdd1.map()
val rdd2 = sqlContext.sql("select A.a,A.b from A").rdd
rdd2.reduce()
2,尽量复用同一个rdd
val rdd1 = sqlContext.sql("select A.a,A.b from A").rdd
rdd1.map(_._1).map(...)...
rdd1.filter(_._1 > 5).map(...)...
避免以下操作
val rdd2 = sqlContext.sql("select A.a,A.b from A where A.a > 5").rdd.map(...)
val rdd2 = rdd1.map(_._1)
rdd2.map(...)
3,对多次使用的rdd,尽量进行持久化
rdd.persist(storageLevel)
4,尽量避免shuffle算子,如无法避免,使用map端预聚合的shuffle算子
使用reduceByKey,aggregateByKey 替换 groupByKey
5,使用高性能算子
使用mapPartition代替map:
使用mapPartitions时,函数会一次接受一个partition的数据中计算,性能会高。
使用foreachPartitions代替foreach:
原理类似mapPartitions,在批量插入数据(mysql等)时优势明显,因为避免了每条数据都调用数据库连接。
rdd在filter后根据过滤数据量做coalesce/repartition:
6,大变量用广播
val map = Map(1 -> "a")
val mapBc = sc.broadcast(map)
避免直接在map或reduce算子中使用driver声明的大变量

4 附录

其实平台已经对spark文档做过很详细的整理了,这里附一些链接,希望能帮到大家,日常大家在优化过程中有问题也可以找zhaoxu05或者duntianlun,我们一起讨论,共同解决问题,共同学习:

总目录: https://km.sankuai.com/page/69786825

Spark UI : https://km.sankuai.com/page/237124654

常见问题:https://km.sankuai.com/page/50309540

待整理config

  1. spark.sql.shuffle.partitions

[toc]

总结

大数据基础—Spark整体复习

sparkUI

Spark UI (1) - Jobs页面

Spark Web UI 监控详解

sparkSQL

spark sql多维分析优化——细节是魔鬼

从一个sql引发的hive谓词下推的全面复盘及源码分析(上)

优化

spark执行map-join优化

使用Spark进行搜狗日志分析实例——map join的使用

记录一次spark sql的优化过程

spark sql多维分析优化——提高读取文件的并行度

SparkConfiguration

Spark性能优化指南——高级篇

Streaming

Streaming Join

Introducing Stream-Stream Joins in Apache Spark 2.3

架构

Running Spark on YARN

Job Scheduling

Spark 如何并行执行多个job

Task如何共享变量

解决方案

Spark项目实战-数据倾斜解决方案之将reduce join转换为map join

Add jars to a Spark Job - spark-submit

spark3 jar加载顺序

use batch or streaming

5 Minutes Spark Batch Job vs Streaming Job

常见错误汇总

Container xxx is running beyond physical memory limits

spark读文件

http://www.waitingfy.com/archives/4325

http://www.waitingfy.com/archives/4342

读文件tips:

  1. 可以写路径"path/xxx/*" 来读取全部的文件

本地搭建spark环境

原理

Dynamic resource allocation in Spark

序列化

spark之kryo 序列化

配置

spark广播变量大小

调试

  1. Get current number of partitions of a DataFrame

questions

  1. spark batch任务可以并行执行吗
  2. spark.streams.awaitAnyTermination()
  3. java.lang.RuntimeException: Could not serialize lambda broadcast
  4. 为什么spark任务,建议每个executor不超过5个core
  5. spark广播变量的最大值

[toc]

考虑driver内存不够

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 153 tasks (10.2 GB) is bigger than spark.driver.maxResultSize (10.0 GB)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1580)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1568)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1567)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1567)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:825)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:825)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:825)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1798)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1750)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1739)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:384)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:381)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:380)
at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:176)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:42)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

https://stackoverflow.com/questions/28901123/why-do-spark-jobs-fail-with-org-apache-spark-shuffle-metadatafetchfailedexceptio

https://stackoverflow.com/questions/35918095/why-does-spark-fail-with-fetchfailed-error?noredirect=1&lq=1

https://community.cloudera.com/t5/Support-Questions/Spark-Metadata-Fetch-Failed-Exception-Missing-an-output/td-p/203771

NPE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
21/07/01 19:58:59 WARN [SparkUI-42] server.HttpChannel: /api/v1/applications/application_1625117983144_0490/sql
javax.servlet.ServletException: java.lang.NullPointerException
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1633)
at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1609)
at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:162)
at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1609)
at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:561)
at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:766)
at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.sparkproject.jetty.server.Server.handle(Server.java:516)
at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:375)
at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.spark.status.api.v1.sql.SqlResource.getMetric$1(SqlResource.scala:120)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$printableMetrics$3(SqlResource.scala:130)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$printableMetrics$2(SqlResource.scala:130)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.status.api.v1.sql.SqlResource.printableMetrics(SqlResource.scala:127)
at org.apache.spark.status.api.v1.sql.SqlResource.prepareExecutionData(SqlResource.scala:97)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$sqlList$2(SqlResource.scala:45)
at scala.collection.immutable.Stream.map(Stream.scala:418)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$sqlList$1(SqlResource.scala:43)
at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:117)
at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:135)
at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:133)
at org.apache.spark.status.api.v1.sql.SqlResource.withUI(SqlResource.scala:31)
at org.apache.spark.status.api.v1.sql.SqlResource.sqlList(SqlResource.scala:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
... 39 more
21/07/01 19:58:59 WARN [SparkUI-42] server.HttpChannelState: unhandled due to prior sendError
javax.servlet.ServletException: java.lang.NullPointerException
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:763)
at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1633)
at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1609)
at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:162)
at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1609)
at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:561)
at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:766)
at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.sparkproject.jetty.server.Server.handle(Server.java:516)
at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:375)
at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.spark.status.api.v1.sql.SqlResource.getMetric$1(SqlResource.scala:120)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$printableMetrics$3(SqlResource.scala:130)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$printableMetrics$2(SqlResource.scala:130)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.status.api.v1.sql.SqlResource.printableMetrics(SqlResource.scala:127)
at org.apache.spark.status.api.v1.sql.SqlResource.prepareExecutionData(SqlResource.scala:97)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$sqlList$2(SqlResource.scala:45)
at scala.collection.immutable.Stream.map(Stream.scala:418)
at org.apache.spark.status.api.v1.sql.SqlResource.$anonfun$sqlList$1(SqlResource.scala:43)
at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140)
at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:117)
at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:135)
at org.apache.spark.status.api.v1.BaseAppResource.withUI$(ApiRootResource.scala:133)
at org.apache.spark.status.api.v1.sql.SqlResource.withUI(SqlResource.scala:31)
at org.apache.spark.status.api.v1.sql.SqlResource.sqlList(SqlResource.scala:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
... 39 more
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
21/08/20 02:52:56 ERROR TransportRequestHandler [rpc-server-3-3]: Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:140)
at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:655)
at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:274)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:105)
at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
1
FetchFailed(null, shuffleId=0, mapIndex=-1, mapId=-1, reduceId=0, message= org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0 at org.apache.spark.MapOutputTracker$.$anonfun$convertMapStatuses$2(MapOutputTracker.scala:1010) at org.apache.spark.MapOutputTracker$.$anonfun$convertMapStatuses$2$adapted(MapOutputTracker.scala:1006) at scala.collection.Iterator.foreach(Iterator.scala:941) at scala.collection.Iterator.foreach$(Iterator.scala:941) at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at org.apache.spark.MapOutputTracker$.convertMapStatuses(MapOutputTracker.scala:1006) at org.apache.spark.MapOutputTrackerWorker.getMapSizesByExecutorId(MapOutputTracker.scala:811) at org.apache.spark.shuffle.sort.SortShuffleManager.getReader(SortShuffleManager.scala:128) at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:185) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:91) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:106) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.sql.execution.SQLExecutionRDD.compute(SQLExecutionRDD.scala:55) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1374) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1301) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1365) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1189) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360) at org.apache.spark.rdd.RDD.iterator(RDD.scala:311) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:127) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) )

NoClassDefFoundError

怎么解决java.lang.NoClassDefFoundError错误

java.lang.NoClassDefFoundError: org/apache/logging/log4j/Logger

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
21/09/29 09:10:00 ERROR ApplicationMaster [Driver]: User class threw exception: java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/lookup/AbstractLookup
java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/lookup/AbstractLookup
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:406)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:168)
at org.apache.logging.log4j.util.LoaderUtil.newInstanceOf(LoaderUtil.java:207)
at org.apache.logging.log4j.util.LoaderUtil.newCheckedInstanceOf(LoaderUtil.java:228)
at org.apache.logging.log4j.core.util.Loader.newCheckedInstanceOf(Loader.java:311)
at org.apache.logging.log4j.core.lookup.Interpolator.<init>(Interpolator.java:124)
at org.apache.logging.log4j.core.config.AbstractConfiguration.<init>(AbstractConfiguration.java:129)
at org.apache.logging.log4j.core.config.NullConfiguration.<init>(NullConfiguration.java:32)
at org.apache.logging.log4j.core.LoggerContext.<clinit>(LoggerContext.java:85)
at org.apache.logging.log4j.core.selector.ClassLoaderContextSelector.createContext(ClassLoaderContextSelector.java:179)
at org.apache.logging.log4j.core.selector.ClassLoaderContextSelector.locateContext(ClassLoaderContextSelector.java:153)
at org.apache.logging.log4j.core.selector.ClassLoaderContextSelector.getContext(ClassLoaderContextSelector.java:78)
at org.apache.logging.log4j.core.selector.ClassLoaderContextSelector.getContext(ClassLoaderContextSelector.java:65)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:148)
at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:45)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:194)
at org.apache.logging.log4j.LogManager.getLogger(LogManager.java:581)
at com.microsoft.sam.SAMJobRunner$.<init>(SAMJobRunner.scala:9)
at com.microsoft.sam.SAMJobRunner$.<clinit>(SAMJobRunner.scala)
at com.microsoft.sam.SAMJobRunner.main(SAMJobRunner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.core.lookup.AbstractLookup
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
... 39 more

因为引入了以下jar?
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.12.1</version>
</dependency>

tips:
1. 程序中用的是3.0的spark,但集群是2.4的版本
2. 3.0中弃用了org.apache.log4j,使用的是log4j2(org.apache.logging.log4j)

解决办法:
1. 注释掉log4j-core的引用
2. 程序中的Logger类替换成 org.apache.log4j.Logger

Task not serializable

表面上看,该序列化的类都做了序列化,所以怀疑是processorMap.asJava导致的

Scala converters convert Java collections to Wrapper objects

How to Serialize HashMap in Java?

java. util. HashMap$Values is NOT Serializable

Cannot Assume Serializability for Java Map’s Nested Classes

另外,需要了解下,

1
2
sparkConf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer")
sparkConf.registerKryoClasses(Array(classOf[com.sdp.hbase.entity.IndexRowkeyMetaData]))

可以解决哪些序列化问题。

在 Spark 程序中,在 map 等算子内部使用了外部定义的变量和函数,从而引发 Task 未序列化问题。其中最普遍的是当引用了某个类的成员函数或变量时,会导致这个类的所有成员(整个类)都需要支持序列化。虽然许多情形下,类使用了extends Serializable声明支持序列化,但是由于某些字段不支持序列化,仍然会导致整个类序列化时出现问题,最终导致出现 Task 未序列化问题。

Apache Spark:Task not serializable异常的排查和解决

Spark Task未序列化(Task not serializable)问题分析及解决

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
public class LogRecordProcessorFactory implements Serializable {
private Map<String, GetRowsStragety> processorMap;

public LogRecordProcessorFactory(Map<String, GetRowsStragety> processorMap) {
this.processorMap = processorMap;
}

public GetRowsStragety getProcessor(String topic) {
return processorMap.getOrDefault(topic, null);
}
}

public interface GetRowsStragety {
List<MediationGenericRecord> getRows(byte[] bytes );
}

public class ServingLogProcessor extends LogRecordProcessor implements GetRowsStragety, Serializable {

private static Logger logger = Logger.getLogger(ServingLogProcessor.class);

private final LogType logType = LogType.Serving;

@Override
public List<MediationGenericRecord> getRows(byte[] bytes) {
...
}
}
1
2
3
4
5
6
import scala.collection.JavaConverters._

var processorMap : Map[String, GetRowsStragety] = Map()
processorMap += ("servinglog" -> new ServingLogProcessor())
logRecordProcessorFactory = new LogRecordProcessorFactory(processorMap.asJava)
logRecordProcessorFactory.getProcessor("servinglog")
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
21/09/30 03:03:18 ERROR MicroBatchExecution [stream execution thread for [id = 9e6c637b-d8a1-4709-8949-ee3cdf7bbef7, runId = 49c19ca4-3204-44ff-805c-44029b734f74]]: Query [id = 9e6c637b-d8a1-4709-8949-ee3cdf7bbef7, runId = 49c19ca4-3204-44ff-805c-44029b734f74] terminated with error
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2330)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:850)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:849)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:849)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:630)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.GenerateExec.doExecute(GenerateExec.scala:80)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.streaming.EventTimeWatermarkExec.doExecute(EventTimeWatermarkExec.scala:99)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.AppendColumnsExec.doExecute(objects.scala:261)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExec.doExecute(FlatMapGroupsWithStateExec.scala:107)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
at org.apache.spark.sql.execution.SerializeFromObjectExec.inputRDDs(objects.scala:110)
at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:582)
at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:582)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:582)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.streaming.EventTimeWatermarkExec.doExecute(EventTimeWatermarkExec.scala:99)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.AppendColumnsExec.doExecute(objects.scala:261)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExec.doExecute(FlatMapGroupsWithStateExec.scala:107)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
at org.apache.spark.sql.execution.SerializeFromObjectExec.inputRDDs(objects.scala:110)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:582)
at org.apache.spark.sql.execution.UnionExec$$anonfun$doExecute$1.apply(basicPhysicalOperators.scala:582)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:582)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:391)
at org.apache.spark.sql.execution.DeserializeToObjectExec.inputRDDs(objects.scala:76)
at org.apache.spark.sql.execution.MapElementsExec.inputRDDs(objects.scala:207)
at org.apache.spark.sql.execution.SerializeFromObjectExec.inputRDDs(objects.scala:110)
at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:627)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:33)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
Serialization stack:
- object not serializable (class: scala.collection.convert.Wrappers$MapWrapper, value: {samauctionlogs-ueq=com.microsoft.ads.bi.processor.impl.ServingLogProcessor@3d519353, sambeacon-ueq=com.microsoft.ads.bi.processor.impl.ViewabilityLogProcessor@90c5f43, samclickbeacon-ueq=com.microsoft.ads.bi.processor.impl.ClickLogProcessor@2c55e9c})
- field (class: com.microsoft.ads.bi.processor.LogRecordProcessorFactory, name: processorMap, type: interface java.util.Map)
- object (class com.microsoft.ads.bi.processor.LogRecordProcessorFactory, com.microsoft.ads.bi.processor.LogRecordProcessorFactory@6287391e)
- field (class: com.microsoft.sam.SAMStreamingJob, name: logRecordProcessorFactory, type: class com.microsoft.ads.bi.processor.LogRecordProcessorFactory)
- object (class com.microsoft.sam.SAMStreamingJob, com.microsoft.sam.SAMStreamingJob@d67a7c6)
- field (class: com.microsoft.sam.SAMStreamingJob$$anonfun$callLogRecordProcessor$1, name: $outer, type: class com.microsoft.sam.SAMStreamingJob)
- object (class com.microsoft.sam.SAMStreamingJob$$anonfun$callLogRecordProcessor$1, <function2>)
- element of array (index: 3)
- array (class [Ljava.lang.Object;, size 4)
- field (class: org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13, name: references$1, type: class [Ljava.lang.Object;)
- object (class org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13, <function2>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:400)
... 196 more
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
22/10/12 23:35:56 ERROR [rpc-server-4-6] client.TransportClient: Failed to send RPC RPC 6830745377217288298 to /100.103.234.130:38594: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
22/10/12 23:35:56 WARN [block-manager-ask-thread-pool-164719] storage.BlockManagerMasterEndpoint: Error trying to remove broadcast 18736 from block manager BlockManagerId(165, CO01APBA132288B, 38614, None)
java.io.IOException: Failed to send RPC RPC 6830745377217288298 to /100.103.234.130:38594: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:363) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:340) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
... 12 more
22/10/12 23:35:56 ERROR [rpc-server-4-6] client.TransportClient: Failed to send RPC RPC 6308174558094175038 to /100.103.230.170:56350: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
22/10/12 23:35:56 WARN [block-manager-ask-thread-pool-164714] storage.BlockManagerMasterEndpoint: Error trying to remove broadcast 18736 from block manager BlockManagerId(136, CO01AP2455B8898, 56404, None)
java.io.IOException: Failed to send RPC RPC 6308174558094175038 to /100.103.230.170:56350: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:363) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:340) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
... 12 more






22/10/12 23:34:57 INFO [Thread-44891] autopilot.AutopilotTopologyMapping: Read 0 physical mappings (removed 0) in 352 ms
22/10/12 23:35:01 INFO [task-result-getter-3] scheduler.TaskSetManager: Finished task 145.0 in stage 1734.0 (TID 7309398) in 148688 ms on CO01AP512A3516E (executor 8) (998/1000)
22/10/12 23:35:14 INFO [task-result-getter-0] scheduler.TaskSetManager: Finished task 749.0 in stage 1734.0 (TID 7309924) in 161694 ms on CO01AP7F54C63C6 (executor 209) (999/1000)
22/10/12 23:35:54 INFO [task-result-getter-1] scheduler.TaskSetManager: Finished task 480.0 in stage 1734.0 (TID 7309162) in 205635 ms on CO01AP36214B261 (executor 17) (1000/1000)
22/10/12 23:35:54 INFO [task-result-getter-1] cluster.YarnClusterScheduler: Removed TaskSet 1734.0, whose tasks have all completed, from pool
22/10/12 23:35:54 INFO [dag-scheduler-event-loop] scheduler.DAGScheduler: ResultStage 1734 (foreachPartition at M2BMain.scala:233) finished in 205.773 s
22/10/12 23:35:54 INFO [dag-scheduler-event-loop] scheduler.DAGScheduler: Job 1387 is finished. Cancelling potential speculative or zombie tasks for this job
22/10/12 23:35:54 INFO [dag-scheduler-event-loop] cluster.YarnClusterScheduler: Killing all running tasks in stage 1734: Stage finished
22/10/12 23:35:54 INFO [Driver] scheduler.DAGScheduler: Job 1387 finished: foreachPartition at M2BMain.scala:233, took 205.787261 s
22/10/12 23:35:54 ERROR [spark-listener-group-eventLog] scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Lease timeout of 30 seconds expired.
at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:763) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:656) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:434) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$600(LeaseRenewer.java:76) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:307) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
22/10/12 23:35:54 ERROR [spark-listener-group-eventLog] scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Lease timeout of 30 seconds expired.
at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:763) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:656) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:434) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$600(LeaseRenewer.java:76) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:307) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
22/10/12 23:35:54 ERROR [spark-listener-group-eventLog] scheduler.AsyncEventQueue: Listener EventLoggingListener threw an exception
java.io.IOException: Lease timeout of 30 seconds expired.
at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:763) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:656) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.run(LeaseRenewer.java:434) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer.access$600(LeaseRenewer.java:76) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at org.apache.hadoop.hdfs.client.impl.LeaseRenewer$1.run(LeaseRenewer.java:307) ~[hadoop-hdfs-client-2.9.2-MT-SNAPSHOT.jar:?]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_202]
22/10/12 23:35:54 ERROR [rpc-server-4-6] client.TransportClient: Failed to send RPC RPC 7505683107670261422 to /100.103.234.130:38594: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
22/10/12 23:35:54 WARN [block-manager-ask-thread-pool-164647] storage.BlockManagerMasterEndpoint: Error trying to remove broadcast 18734 from block manager BlockManagerId(165, CO01APBA132288B, 38614, None)
java.io.IOException: Failed to send RPC RPC 7505683107670261422 to /100.103.234.130:38594: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:363) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:340) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
... 12 more
22/10/12 23:35:54 ERROR [rpc-server-4-6] client.TransportClient: Failed to send RPC RPC 9181049462703589877 to /100.103.230.170:56350: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException: null
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
22/10/12 23:35:54 WARN [block-manager-ask-thread-pool-164648] storage.BlockManagerMasterEndpoint: Error trying to remove broadcast 18734 from block manager BlockManagerId(136, CO01AP2455B8898, 56404, None)
java.io.IOException: Failed to send RPC RPC 9181049462703589877 to /100.103.230.170:56350: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:363) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:340) ~[spark-network-common_2.12-3.0.1-mt-SNAPSHOT.jar:3.0.1-mt-SNAPSHOT]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:608) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:993) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1104) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957) ~[netty-all-4.1.47.Final.jar:4.1.47.Final]
... 12 more


这个节点一直在报错,是否开启blacklist会缓解这个问题?

lambda serialize

广播kafka连接,然后在foreach中send msg

How and Why to Serialize Lambdas

Lambda表达式秒用——SerializedLambda序列化

spark无法序列化lambda

spark Could not serialize lambda

How to serialize a lambda?

Configure function/lambda serialization in Spark

ExecutionPolicy

Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope MachinePolicy
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope UserPolicy
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope CurrentUser
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope LocalMachine

Get-ExecutionPolicy -List

Set-ExecutionPolicy Bypass -Scope Process -Force; Copy-Item \cosbj-01\public\CosmosPowerShell\builds\install.ps1 $HOME\Downloads\install.ps1; & $HOME\Downloads\install.ps1 -Force

timestamp to date

  1. second: =(A1+8*3600)/86400+70*365+19
  2. millsecond: =(A1+8*3600)/8640000+70*365+19

date to timestamp

  1. second: =(B1-70*365-19)*86400-8*3600