How to kill a monster workflow 👹

danielhollas · December 17, 2023, 2:01pm

So, uh. Somebody (I hope inadvertently) decided to push the limits of our AiiDAlab deployment and submitted a WorkChain that is kind of big.
Specifically, verdi process list now counts ~18000 processes (and counting!) in various stages of “running”. I tried to increase the number of daemon workers and tweak various poll intervals, but the overall system is hopelessly stuck, and verdi process kill either hangs or times out saying that the process is unreachable.

While I have some ideas on how to prevent this situation in the future, the question now is: how do I stop this monster? Any advice would be greatly appreciated. (Thoughts and prayers are welcome as well in these trying times.)

sphuber · December 17, 2023, 2:57pm

Stop the daemon. Then you can simply delete the nodes. Once the daemon starts again, any worker that receives the task associated with a deleted node will notice this and just discard the task.
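
For anyone landing here later, the sequence above could look roughly like the sketch below. The function name stop_and_delete and the VERDI override variable are hypothetical helpers invented for this example; the only verdi commands used are daemon stop, node delete and daemon start. It assumes you already know the PK(s) of the nodes to delete.

```shell
# Hypothetical helper: stop the daemon, delete the given nodes, restart the daemon.
# VERDI is an assumed override hook (e.g. VERDI=echo to print the commands only).
stop_and_delete() {
    local verdi="${VERDI:-verdi}"

    # Stop all daemon workers first so nothing is acting on the nodes.
    "$verdi" daemon stop || return 1

    # Delete the nodes passed as arguments (PKs or UUIDs).
    # If this fails, the daemon is deliberately left stopped.
    "$verdi" node delete "$@" || return 1

    # Restart the daemon; per the advice above, tasks that point at
    # deleted nodes get discarded by the workers.
    "$verdi" daemon start
}

# Example call (1234 is a placeholder PK, not a real node):
# stop_and_delete 1234
```

Running it with VERDI=echo first just prints the three commands, which is a cheap way to check what would be executed before touching a production profile.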

danielhollas · December 23, 2023, 7:02pm

Thanks @sphuber! I am not sure what I expected, perhaps something more magical, but this indeed did the trick.

It is also really nice that I could just run verdi node delete on the root-level WorkChain and the whole workflow tree was deleted due to provenance rules.
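
A small sketch of how one might preview such a root-level delete before committing to it. The function name preview_tree_delete and the VERDI override variable are made up for illustration. It assumes the installed aiida-core version supports the --dry-run option of verdi node delete, and that verdi process status can still print the call tree for the given PK.

```shell
# Hypothetical helper: inspect a root WorkChain and preview what deleting it would remove.
# VERDI is an assumed override hook (e.g. VERDI=echo to print the commands only).
preview_tree_delete() {
    local verdi="${VERDI:-verdi}"
    local root_pk="$1"

    # Show the call hierarchy below the root process.
    "$verdi" process status "$root_pk" || return 1

    # Report the nodes that would be deleted, without deleting anything.
    "$verdi" node delete --dry-run "$root_pk"
}

# Example call (1234 is a placeholder PK, not a real node):
# preview_tree_delete 1234
```

If the dry run reports what you expect, the same verdi node delete call without --dry-run performs the actual deletion.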
