Job Arrays / Task Farming

Dear AiiDA community,

I am currently using a QE workchain in AiiDA (specifically aiida_hubbard) that consists of a long series of calcjobs, namely a cycle of scf, vc-relax, and hp calculations.

I execute these tasks on the CSCS cluster daint.alps, which uses Slurm to schedule jobs. Individual calculations for my system are rather short, between 5 minutes and 1 hour.

Executing the workchain creates around 30 tasks, each submitted as its own Slurm job. As a result, I run into the following problems:

  1. The workchain takes very long to finish, because each individual task spends a long time waiting in the queue.
  2. My Slurm priority decreases with each calculation, leading to even longer waiting times.

Are there methods to exploit the short execution times of my workflow? I am looking for something where I request a node on CSCS for 24 hours, and all sequential steps of the workchain are executed on that same node without requesting new resources. Does something like that already exist?

Thanks and warm regards,
Stefan

Yes, the suggestion is to use HyperQueue as a metascheduler. You can install it on Alps, and AiiDA supports it via the aiida-hyperqueue plugin (GitHub: aiidateam/aiida-hyperqueue).

I suggest you try it out and configure it for the CSCS machines. If there are issues, @mbercx, @jusong.yu, @t-reents, @Minotakm, or others can probably help. (Internal note: we could also add some documentation, either internal or in a dedicated section of the plugin docs, on how to set up the Alps supercomputer.)
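
To get a feeling for how much a single long allocation could help, here is a rough back-of-the-envelope sketch in Python. It is not part of AiiDA or HyperQueue. The function names are made up for illustration, and the 20-minute run time and 60-minute queue wait are assumed placeholder numbers, not measurements from Alps. It also ignores the priority decay mentioned in the question, which would make the per-job case worse.

```python
# Rough turnaround estimate for ~30 sequential short calculations.
# All numbers are hypothetical placeholders, not measured on any cluster.

def per_job_turnaround(run_minutes, queue_wait_minutes):
    """Every calculation is its own Slurm job and waits in the queue before running."""
    return sum(queue_wait_minutes + run for run in run_minutes)

def single_allocation_turnaround(run_minutes, queue_wait_minutes):
    """One long allocation: wait in the queue once, then run all steps back to back."""
    return queue_wait_minutes + sum(run_minutes)

# 30 sequential calculations of 20 minutes each
# (inside the 5 minutes to 1 hour range given in the question).
runs = [20] * 30
wait = 60  # assumed average queue wait per Slurm submission, in minutes

separate = per_job_turnaround(runs, wait)
pooled = single_allocation_turnaround(runs, wait)

print(f"separate Slurm jobs: {separate / 60:.1f} h")
print(f"one allocation:      {pooled / 60:.1f} h")
```

With these assumed numbers the separate submissions add up to 40 hours of turnaround, while the single allocation needs 11 hours and would fit in one 24-hour request. The real gain depends entirely on the actual queue waits and run times.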