AiiDA Submission Controller Help

Hi all,

I'm finally getting around to implementing my WorkChain and could use some help with aiida_submission_controller.

First, I want to know if what I want is possible. It seems to be, by using the FromGroupSubmissionController.

My ideal steps are:

  1. Provide a list of input files to my WorkChain
  2. It does some processing and launches a number of Gaussian calculations on a remote cluster. This does not need to be limited, as the Slurm scheduler will take care of that
  3. These then need to be processed by a local program, but I can’t run 100 at a time. This is where I’d like to use the FromGroupSubmissionController to run, e.g., 4 at a time. It might need to periodically check (or be rerun) to see if new calculations need to be submitted
  4. Then I want to resubmit a calculation using those results to the cluster, and I don’t need the controller for that (and really should not have it, as that would be limiting)

So, firstly, just want to verify that this is possible!

Next, I already have the base WorkChain for all the above steps, which I can break apart for the submission controller.

I see these use Groups and extras. I imagine I can just modify the WorkChain to store all the Gaussian calculation nodes in a group with a label that I provide to the controller.

However, for my second question - can someone explain the ‘extras’ that this controller works with? How, or do I, need to modify my Gaussian calculation nodes for this?

Thanks in advance!

Hi @kmlefran, I am not the author of the aiida-submission-controller, but I have had a quick look and think I have a good understanding of how it is supposed to work. In principle, you subclass one of the controllers and implement the necessary methods. To use it, you then repeatedly call submit_new_batch, and it will submit as many new processes as there are slots available.

The examples show that you can typically add this call in a Python script and then call this using a bash loop: while true ; do verdi run add_in_batches.py ; sleep 5 ; done
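To make the "as many as there are slots available" idea concrete, here is a toy sketch in plain Python. It does not import or reproduce aiida-submission-controller; the names pick_new_batch and simulate and all their parameters are made up for illustration, and the simulation assumes every running item finishes between two calls.

```python
def pick_new_batch(candidates, already_submitted, active_count, max_concurrent):
    """Return the candidates to submit now (toy model, not the library's code).

    candidates: ordered list of identifiers for all work items
    already_submitted: set of identifiers submitted in earlier calls
    active_count: number of submitted items that are still running
    max_concurrent: maximum number of items allowed to run at once
    """
    free_slots = max(0, max_concurrent - active_count)
    pending = [c for c in candidates if c not in already_submitted]
    return pending[:free_slots]


def simulate(candidates, max_concurrent):
    """Call pick_new_batch repeatedly and return the batches in order.

    Assumption for this sketch: everything submitted in one call has
    finished by the next call, so active_count is always 0.
    """
    submitted = set()
    batches = []
    while True:
        batch = pick_new_batch(candidates, submitted, 0, max_concurrent)
        if not batch:
            break  # nothing left to submit (or no slots at all)
        submitted.update(batch)
        batches.append(batch)
    return batches
```

Each pass of the bash loop corresponds to one call of pick_new_batch here: the script looks at what is still pending, fills the free slots, and exits until the next pass.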

Now I am not sure that this will work when integrating with a WorkChain. At some point you will have to add a step in your workchain that does this loop to submit new processes as slots come available, but this while-loop will be blocking and the workchain cannot do anything else. With the current API, it is not possible for you to relinquish control to the workchain and have it do other things, such as submit more Gaussian calculations or check on their status.

But that being said, it is not necessary to wrap all of your functionality in a single WorkChain. Rather, you can have a simple Python script that launches the gaussian calculations and then starts the submission controller to launch the post-processing jobs locally.

However, for my second question - can someone explain the ‘extras’ that this controller works with? How, or do I, need to modify my Gaussian calculation nodes for this?

Extras in AiiDA function kind of like “tags”. Any node can have extras, and they are essentially a JSON dictionary. The submission controller uses these extras to tag certain nodes in order to figure out which nodes to submit next.
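As a rough illustration of how tags can drive the "what to submit next" decision, here is a plain-Python toy model. It is not the controller's actual code: nodes are represented as ordinary dictionaries, and the function name tags_still_to_submit and the key name "tag" are hypothetical.

```python
def tags_still_to_submit(parent_nodes, submitted_processes, key):
    """Return tags of parent nodes that have no matching submitted process.

    Both arguments are lists of dicts with an "extras" dict, standing in
    for real nodes. A parent counts as handled once some submitted
    process carries the same value under `key` in its extras.
    """
    done = {
        proc["extras"][key]
        for proc in submitted_processes
        if key in proc["extras"]
    }
    return [
        node["extras"][key]
        for node in parent_nodes
        if key in node["extras"] and node["extras"][key] not in done
    ]


# Three stand-in parent nodes, one of which already has a tagged process.
parents = [
    {"pk": 1, "extras": {"tag": "A"}},
    {"pk": 2, "extras": {"tag": "B"}},
    {"pk": 3, "extras": {"tag": "C"}},
]
submitted = [{"pk": 10, "extras": {"tag": "B"}}]
remaining = tags_still_to_submit(parents, submitted, "tag")
```

In this model the extras are the only link between a parent node and the process launched for it, which is why the nodes you want processed need to carry the tag.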

Okay, I think I’ve figured it out. Thanks for your reply!

Like you said, it doesn’t need to be in a single WorkChain, so I’ll have an initial WorkChain that does my initial processing and then the Gaussian calculations, and I’ll have a script similar to add_in_batches running, but using the FromGroupSubmissionController. This will launch a second WorkChain, and I’ll have a similar script looking for outputs from that WorkChain to submit back to Gaussian.

Just need to split my code apart, and add some modifications, but I think it will work.

For the extras, I was having trouble figuring out how to add them for the FromGroupSubmissionController, so I’ll just put my solution here in case anyone searches for this in the future:

from aiida.orm.extras import EntityExtras
my_node_extras = EntityExtras(my_node)
my_node_extras.set('test_key','test_value')
# verify extras added with
my_node.extras

Good to hear you found a solution. Regarding the extras, those you can access directly on the node:

my_node.base.extras.set('test_key','test_value')
my_node.base.extras.set_many({
    'a': 1,
    'b': 2,
})
my_node.base.extras.get('a')
my_node.base.extras.delete('a')

etc.

Ah, I was missing the .base

I had been trying my_node.extras.set(…) etc., but obviously my_node.extras is just a dictionary.

Thanks!


Hi @kmlefran,

would you be open to sharing your final workflow / WorkChain? Is it by any chance available somewhere on GitHub? I am currently dealing with a similar issue and am interested in the technical details. Thanks!

Hi!

I just made the repository public and released it on PyPI.

Of interest to you:

  • I created an example folder with the notebook I’m currently using to run things.
  • The controllers.py in aiida_aimall/ shows how I defined the four controllers I use
  • The calculations.py and workchains.py show how I added the extras as tags in my output. (Search for “extras”)

Here I used a chemical identifier, a SMILES code, as the tag. Note that when you are using the controller in your case the tags need to be unique!
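A quick way to check the uniqueness requirement before tagging anything is to look for repeated values in the list of tags. The snippet below is a plain-Python sketch, not code from the repository; find_duplicate_tags is a made-up helper and the SMILES strings are only example values.

```python
from collections import Counter


def find_duplicate_tags(tags):
    """Return a sorted list of tag values that occur more than once."""
    counts = Counter(tags)
    return sorted(tag for tag, n in counts.items() if n > 1)


# Example values only: ethanol appears twice, so it is reported.
example_tags = ["CCO", "C", "CCO", "O"]
duplicates = find_duplicate_tags(example_tags)
```

If the returned list is not empty, those inputs would presumably be indistinguishable to a tag-based controller, so they need a more specific tag (for example, the identifier plus a suffix).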

That’s all the time I have now to write an explanation, but take a look and let me know if I can help you any more!


Oh, and I didn’t include the data I’m running on, so the cell looking for the files won’t execute properly.
