

21/09/08 03:17:10 WARN AbstractCoordinator [kafka-coordinator-heartbeat-thread | spark-kafka-source-babd462a-415a-413a-aeea-e55435fff762-448121830-driver-0]: [Consumer clientId=consumer-1, groupId=spark-kafka-source-babd462a-415a-413a-aeea-e55435fff762-448121830-driver-0] This member will leave the group because consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

21/09/06 07:15:58 WARN TaskSetManager [task-result-getter-3]: Lost task 78.0 in stage 11.0 (TID 3591, wn29-msnbi.awfbdxsze1iudhhki0l2sbzfaf.bx.internal.cloudapp.net, executor 22): org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time 60000 ms.

21/09/06 07:16:57 WARN TaskSetManager [task-result-getter-1]: Lost task 112.0 in stage 11.0 (TID 3625, wn29-msnbi.awfbdxsze1iudhhki0l2sbzfaf.bx.internal.cloudapp.net, executor 22): org.apache.kafka.common.errors.TimeoutException: Expiring 22 record(s) for sambeacon-pressure-test-183:120029 ms has passed since batch creation

21/09/06 07:16:57 WARN TaskSetManager [task-result-getter-0]: Lost task 248.0 in stage 11.0 (TID 3761, wn29-msnbi.awfbdxsze1iudhhki0l2sbzfaf.bx.internal.cloudapp.net, executor 22): org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.

Cannot fetch record for offset xxx in 120000 milliseconds

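Background:
1. The job reads 3 topics, each with 256 partitions.
2. The default shuffle partition count for the later stages is 1200, but execution never got that far.
3. Offset xxx is not the startingOffsets I specified.
4. Symptom: the job hangs in the stage that reads the data, and no task ever completes.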

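Conclusion:
1. maxOffsetsPerTrigger was not set, so the query tried to read all historical data after startingOffsets in a single batch; the data volume was too large and the job hung.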
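
A minimal PySpark sketch of the fix, assuming a Structured Streaming query on the Kafka source. The broker address, topic names, and the cap of 768,000 are hypothetical placeholders, not values from the original job. With 3 topics of 256 partitions each (768 partitions), that cap works out to roughly 1,000 offsets per partition per trigger if the partitions are evenly loaded; Spark splits the cap proportionally, so the real share per partition can differ.

```python
# Sketch only: broker address, topic names, and the cap value are hypothetical.

def kafka_source_options(bootstrap_servers, topics, starting_offsets, max_offsets_per_trigger):
    """Build the option map for Spark's Kafka source, including a per-trigger cap."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": ",".join(topics),
        "startingOffsets": starting_offsets,
        # Upper bound on the total number of offsets consumed per micro-batch.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def average_offsets_per_partition(max_offsets_per_trigger, topic_count, partitions_per_topic):
    """Rough per-partition share of the cap, assuming evenly loaded partitions."""
    return max_offsets_per_trigger // (topic_count * partitions_per_topic)


def read_kafka_stream(spark, options):
    """Apply the options to a streaming reader; `spark` is an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


options = kafka_source_options(
    bootstrap_servers="broker-1:9092",
    topics=["topic-a", "topic-b", "topic-c"],
    starting_offsets="earliest",
    max_offsets_per_trigger=768_000,
)
per_partition = average_offsets_per_partition(768_000, topic_count=3, partitions_per_topic=256)
```

Calling `read_kafka_stream(spark, options)` with a live SparkSession would then yield a stream whose batches are bounded by the cap, so the backlog after startingOffsets is drained over many small batches instead of one huge one.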