Finally getting around to implementing my WorkChain and could use some help with aiida_submission_controller
First, I want to know whether what I have in mind is possible. It seems to be, using the FromGroupSubmissionController.
My ideal steps are:
Provide a list of input files to my WorkChain
It does some processing and launches a number of Gaussian calculations on a remote cluster. These do not need to be limited, as the Slurm scheduler will take care of that
These then need to be processed by a local program, but I can't run 100 at a time. This is where I'd like to use the FromGroupSubmissionController to run, e.g., 4 at a time. I might need to periodically check (or rerun) to see if new calculations need to be submitted
Then I want to resubmit a calculation to the cluster using those results. This step doesn't need the controller (and really should not have it, as that would be limiting)
So, firstly, just want to verify that this is possible!
Next, I already have the base WorkChain for all the above steps, which I can break apart for the submission controller.
I see these use Groups and extras. I imagine I can just modify the WorkChain to store all the Gaussian calculation nodes in a group with a label that I provide to the controller.
However, for my second question - can someone explain the "extras" that this controller works with? Do I need to modify my Gaussian calculation nodes for this, and if so, how?
Hi @kmlefran, I am not the author of aiida-submission-controller, but I have had a quick look and think I have a good understanding of how it is supposed to work. In principle, you subclass one of the controllers and implement the necessary methods. To use it, you then repeatedly call submit_new_batch, and it will submit as many new processes as there are slots available.
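To make the "slots available" idea concrete, here is a minimal sketch in plain Python. It is not the package's actual implementation; the function name `select_next_batch` and its parameters are hypothetical and only illustrate the throttling arithmetic (limit minus currently running processes).

```python
def select_next_batch(pending, num_active, max_concurrent):
    """Return the pending items that fit into the currently free slots.

    Hypothetical helper for illustration only; not part of aiida-submission-controller.
    """
    # Never go negative if more processes are running than the limit allows.
    free_slots = max(0, max_concurrent - num_active)
    return pending[:free_slots]


# Example: 10 post-processing jobs waiting, 3 already running, limit of 4.
pending_jobs = [f"job_{i}" for i in range(10)]
batch = select_next_batch(pending_jobs, num_active=3, max_concurrent=4)
print(batch)  # ['job_0']
```

Calling this repeatedly as running jobs finish gives the "4 at a time" behaviour described in the question.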
The examples show that you can typically add this call in a Python script and then call this using a bash loop: while true ; do verdi run add_in_batches.py ; sleep 5 ; done
Now I am not sure that this will work when integrating with a WorkChain. At some point you will have to add a step in your workchain that does this loop to submit new processes as slots become available, but this while-loop will be blocking and the workchain cannot do anything else in the meantime. With the current API, it is not possible for you to relinquish control to the workchain and have it do other things, such as submit more Gaussian calculations or check on their status.
That being said, it is not necessary to wrap all of your functionality in a single WorkChain. Rather, you can have a simple Python script that launches the Gaussian calculations and then starts the submission controller to launch the post-processing jobs locally.
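As a rough sketch of what such a driver script could look like, the polling loop from the bash one-liner can also be written in Python. Everything here is an assumption for illustration: `run_polling_loop` and `FakeController` are hypothetical names, and the fake controller stands in for a real one so that the snippet runs without AiiDA.

```python
import time


def run_polling_loop(submit_new_batch, is_finished, interval_seconds=5.0, max_rounds=None):
    """Call submit_new_batch() repeatedly until is_finished() returns True.

    Returns the number of polling rounds that were executed.
    """
    rounds = 0
    while not is_finished():
        if max_rounds is not None and rounds >= max_rounds:
            break  # safety valve so the loop cannot run forever
        submit_new_batch()
        rounds += 1
        time.sleep(interval_seconds)
    return rounds


class FakeController:
    """Hypothetical stand-in for a submission controller (no AiiDA needed)."""

    def __init__(self, num_jobs, max_concurrent):
        self.remaining = num_jobs
        self.max_concurrent = max_concurrent
        self.submitted = 0

    def submit_new_batch(self):
        # Simplification: pretend all earlier jobs finished between two polls,
        # so every round can fill all slots again.
        batch = min(self.remaining, self.max_concurrent)
        self.remaining -= batch
        self.submitted += batch


controller = FakeController(num_jobs=10, max_concurrent=4)
rounds = run_polling_loop(
    controller.submit_new_batch,
    is_finished=lambda: controller.remaining == 0,
    interval_seconds=0.0,  # no waiting in this demo
)
print(rounds, controller.submitted)  # 3 10
```

In a real script the callable would be the `submit_new_batch` method of your controller subclass, and the stop condition would depend on how you decide that nothing is left to submit.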
However, for my second question - can someone explain the "extras" that this controller works with? Do I need to modify my Gaussian calculation nodes for this, and if so, how?
Extras in AiiDA function kind of like "tags". Any node can have extras, and they are essentially a JSON dictionary. The submission controller uses these extras to tag certain nodes in order to figure out which nodes to submit next.
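A conceptual sketch of that tagging idea, using plain dictionaries instead of real nodes. This reflects my understanding of the mechanism rather than the package's actual code: the extras of the nodes in the parent group are compared against the extras of what has already been submitted, and only the difference is submitted. The key names (`molecule`, `conformer`) and both helper functions are hypothetical.

```python
def extras_key(extras, unique_keys):
    """Build a hashable identifier from the chosen extras of one node."""
    return tuple(extras[key] for key in unique_keys)


def find_unsubmitted(parent_extras, submitted_extras, unique_keys):
    """Return the extras dicts of parent nodes that have no submitted counterpart."""
    already = {extras_key(extras, unique_keys) for extras in submitted_extras}
    return [extras for extras in parent_extras if extras_key(extras, unique_keys) not in already]


# Hypothetical extras that together identify one Gaussian calculation uniquely.
unique_keys = ("molecule", "conformer")

# Extras of the nodes in the parent group (e.g. the finished Gaussian calculations).
parent = [
    {"molecule": "benzene", "conformer": 0},
    {"molecule": "benzene", "conformer": 1},
    {"molecule": "toluene", "conformer": 0},
]

# Extras of the post-processing jobs that were already submitted.
submitted = [{"molecule": "benzene", "conformer": 0}]

todo = find_unsubmitted(parent, submitted, unique_keys)
print(todo)
# [{'molecule': 'benzene', 'conformer': 1}, {'molecule': 'toluene', 'conformer': 0}]
```

The practical consequence is that each node in the parent group needs extras whose combination is unique, otherwise two nodes cannot be told apart.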
Okay, I think I've figured it out. Thanks for your reply!
Like you said, it doesn't need to be in a single WorkChain, so I'll have an initial WorkChain that does my initial processing and then the Gaussian calculations, and I'll have a script similar to add_in_batches running, but using the FromGroupSubmissionController. This will launch a second WorkChain, and I'll have a similar script looking for outputs from that WorkChain to submit back to Gaussian.
Just need to split my code apart, and add some modifications, but I think it will work.
For the extras, I was having trouble figuring out how to add them for the FromGroupSubmissionController, so I'll put my solution here in case anyone searches for this in the future:
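from aiida.orm.extras import EntityExtras

# my_node is any existing node, e.g. one of the Gaussian calculation nodes
my_node_extras = EntityExtras(my_node)
my_node_extras.set('test_key', 'test_value')
# verify that the extra was added
# (in AiiDA 2.x, my_node.base.extras.all should return the same dictionary)
my_node.extras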
Would you be open to sharing your final workflow / WorkChain? Is it by any chance available somewhere on GitHub? I am currently dealing with a similar issue and am interested in the technical details. Thanks!