Job Arrays / Task Farming

Schastef · April 22, 2025, 10:22am

Dear Aiida community,

I am currently using a QE workchain in AiiDA (specifically aiida_hubbard) that consists of a fairly long series of CalcJobs, namely a cycle of scf, vc-relax and hp calculations.

I run these calculations on the CSCS cluster daint.alps, which uses Slurm to schedule jobs. Individual calculations in my system are rather short, somewhere between 5 minutes and 1 hour.

Executing the workchain creates roughly 30 calculations, each submitted as its own Slurm job. This leads to the following problems:

The WorkChain takes a very long time to finish, because each individual job spends a long time waiting in the queue.
My Slurm priority decreases with each calculation, leading to even longer waiting times.

Is there a way to take advantage of the short run times in my workflow? I am looking for something where I request a node on CSCS for 24 hours, and all sequential steps of the WorkChain are executed on that node without requesting new resources. Does something like this already exist?

Thanks and warm regards,
Stefan

giovannipizzi · April 22, 2025, 10:42am

Yes: the suggestion is to use HyperQueue as a meta-scheduler. You can install it on Alps, and AiiDA supports it via the aiida-hyperqueue plugin (GitHub: aiidateam/aiida-hyperqueue, the AiiDA plugin for the HyperQueue metascheduler).
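To illustrate why a meta-scheduler helps here, below is a toy sketch. It is not HyperQueue or aiida-hyperqueue code, and `pack_sequential` and the durations are hypothetical. It assumes the workchain steps run strictly one after another and that each step's duration is known in advance. A step is only started if it fits in the remaining wall time of the current allocation. Otherwise a new allocation, meaning one more trip through the Slurm queue, is opened. HyperQueue, for instance, lets tasks declare a time request so they are only started on a worker with enough remaining lifetime.

```python
from typing import List, Sequence


def pack_sequential(durations_min: Sequence[float], walltime_min: float) -> List[List[int]]:
    """Greedily group a chain of sequential steps into consecutive allocations.

    Hypothetical helper for illustration only. Steps keep their order (step i+1
    needs the output of step i). Each returned inner list holds the indices of
    the steps that run inside one allocation, i.e. one Slurm queue wait.
    """
    allocations: List[List[int]] = []
    current: List[int] = []
    remaining = walltime_min
    for i, duration in enumerate(durations_min):
        if duration > walltime_min:
            raise ValueError(f"step {i} ({duration} min) never fits in {walltime_min} min")
        if duration > remaining:
            # Not enough time left: close this allocation and request a new one.
            allocations.append(current)
            current, remaining = [], walltime_min
        current.append(i)
        remaining -= duration
    if current:
        allocations.append(current)
    return allocations


if __name__ == "__main__":
    # 30 made-up step durations (minutes) in the 5 min to 1 h range from the question.
    durations = [10, 60, 30] * 10
    print(len(pack_sequential(durations, walltime_min=24 * 60)))  # 1
    print([len(a) for a in pack_sequential(durations, walltime_min=12 * 60)])  # [22, 8]
```

With these assumed durations (about 16.7 h in total), a single 24-hour allocation covers all 30 steps, so the chain waits in the queue once instead of 30 times. With a 12-hour limit it needs two allocations.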

I suggest you try it out and configure it for the CSCS machines; if you run into issues, @mbercx, @jusong.yu, @t-reents, @Minotakm or others can probably help. (Internal note: we could also write some internal documentation, or a dedicated section in the plugin documentation, on how to set it up on the Alps supercomputer.)
