Kill job allocation when workflow ends

I’ve been working on a persistent scheduler plugin that creates an allocation, which it can then proxy into to submit more jobs. I have it all working, but I have now come to the part where I need to kill the allocation if the jobs finish before hitting the maximum allocated walltime. Is there a function in the scheduler section, or elsewhere in aiida-core, that gets triggered after a job finishes and could check whether there are more jobs to be submitted in the workchain?

Hi,

Just to confirm, do you want to terminate the job allocation from AiiDA when no more jobs exist? This might be possible but tricky. Some other similar solutions, e.g. HyperQueue, simply kill the job on the scheduler if no more jobs are submitted for, say, 5 minutes.

BTW, did you already check HyperQueue? Maybe it solves your needs, and we have a plugin for it: aiidateam/aiida-hyperqueue on GitHub, the AiiDA plugin for the HyperQueue metascheduler.

Thank you for the feedback. Yes, I’ve had HyperQueue suggested by several people. I’ll take a look to see how they implement this feature. I do still need to develop the Flux plugin, as that is what we are using at LLNL.

@giovannipizzi Could you point me to the example you were mentioning? I went through the HQ plugin but didn’t see any functionality that kills the job allocation.

Sorry, what I meant is that HyperQueue takes care of this itself. If you stop submitting, a worker on the scheduler eventually has no jobs left to poll and process; after a configurable time (default: 5 minutes) the worker terminates to avoid wasting computational nodes. This is the idle timeout in HyperQueue (see the "Workers" page of the HyperQueue documentation). I think it’s a good idea to push this outside of AiiDA and let the “meta” scheduler take care of it, as it will be more robust. For example, workers can be reused between independent workflows (even from independent AiiDA profiles/databases, if wanted), and it will “guarantee” that the jobs stop after X minutes independently of AiiDA. Say there is a connection issue or you rebooted the AiiDA machine: this might prevent AiiDA from promptly terminating your allocations.

Yeah, those points make a lot of sense. I ended up adding a simple bash command that kills the allocation if there have been no jobs for a set number of minutes. Thanks for the suggestions.
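
For readers who want the same idle-timeout behaviour outside of bash, here is a minimal Python sketch of the idea discussed in this thread. It is not the code used above, nor anything from aiida-core or HyperQueue. The names `watch_allocation`, `count_active_jobs` and `kill_allocation` are hypothetical: the two callables stand in for whatever your scheduler offers to list running jobs and to cancel the allocation.

```python
import time
from typing import Callable


def watch_allocation(
    count_active_jobs: Callable[[], int],   # hypothetical: wraps the scheduler's job listing
    kill_allocation: Callable[[], None],    # hypothetical: wraps the scheduler's cancel command
    idle_timeout: float = 300.0,            # seconds without jobs before giving up
    poll_interval: float = 30.0,            # seconds between checks
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until no job has been seen for `idle_timeout` seconds, then kill.

    Returns the idle time (in seconds) measured when the allocation was killed.
    """
    last_busy = clock()
    while True:
        now = clock()
        if count_active_jobs() > 0:
            # Work is still present: reset the idle timer.
            last_busy = now
        elif now - last_busy >= idle_timeout:
            # Idle for long enough: release the allocation and stop watching.
            kill_allocation()
            return now - last_busy
        sleep(poll_interval)
```

Running such a watchdog inside the allocation itself, rather than on the AiiDA machine, keeps the robustness argument made above: the allocation is released even if AiiDA is unreachable. The `clock` and `sleep` parameters are only there so the loop can be tested without waiting.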