Airflow error: task killed externally


· 1 min read

Problem scenario

A task's state inexplicably turned failed. There were no task-run logs at all; the only clue was the following messages in the Scheduler's log, which indicate that the Celery worker failed while executing the task.

2023-06-29 10:03:07,330 scheduler_job.py:510 INFO - Executor reports execution of goods_day_dag.ods_goods_detail run_id=scheduled__2023-06-29T01:02:00+00:00 exited with status failed for try_number 1
2023-06-29 10:03:07,337 scheduler_job.py:563 INFO - TaskInstance Finished: dag_id=goods_day_dag, task_id=ods_goods_detail, run_id=scheduled__2023-06-29T01:02:00+00:00, run_start_date=None, run_end_date=None, run_duration=None, state=queued, executor_state=failed, try_number=1, max_tries=0, job_id=None, pool=default_pool, queue=default, priority_weight=19998, operator=HivePartitionSensor
2023-06-29 10:03:07,337 scheduler_job.py:572 ERROR - Executor reports task instance <TaskInstance: goods_day_dag.ods_goods_detail scheduled__2023-06-29T01:02:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
2023-06-29 10:03:07,338 taskinstance.py:1705 ERROR - Executor reports task instance <TaskInstance: goods_day_dag.ods_goods_detail scheduled__2023-06-29T01:02:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
2023-06-29 10:03:07,343 taskinstance.py:1280 INFO - Marking task as FAILED. dag_id=goods_day_dag, task_id=ods_goods_detail, execution_date=20230629T010200, start_date=, end_date=20230629T010307

Solution

Consulting the official documentation, https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#zombie-undead-tasks , the recommendation there is to add a retry mechanism.

Fix: restart the Celery worker, and give the tasks a retries parameter together with an appropriate retry_delay:

import datetime

from airflow import DAG

# Set retries and retry_delay in the DAG-level default_args so that
# every task in the DAG is retried on failure.
with DAG(
        ...
        default_args={
            ...
            'retries': 3,
            'retry_delay': datetime.timedelta(minutes=5),
            ...
        },
        ...
) as dag:
    pass
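
If only a few tasks are affected (such as the HivePartitionSensor shown in the log above), retries can also be set on the individual task, overriding default_args. A minimal sketch; the schedule, table, and partition values below are assumptions for illustration only:

import datetime

from airflow import DAG
from airflow.providers.apache.hive.sensors.hive_partition import HivePartitionSensor

# Per-task retries/retry_delay take precedence over the DAG-level default_args.
# schedule_interval, table, and partition are assumed values for illustration.
with DAG(
        dag_id="goods_day_dag",
        start_date=datetime.datetime(2023, 6, 1),
        schedule_interval="2 1 * * *",
        catchup=False,
) as dag:
    ods_goods_detail = HivePartitionSensor(
        task_id="ods_goods_detail",
        table="ods.goods_detail",          # assumed table name
        partition="dt='{{ ds_nodash }}'",  # assumed partition spec
        retries=3,
        retry_delay=datetime.timedelta(minutes=5),
    )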