We recently finished a hackathon and I was discussing with one of the developers of another code, INQ, who was interested in AiiDA. He has written his code in such a way that it can be dynamically controlled at any point and have its settings changed on the fly (or at least that’s my understanding). I’m wondering if there is a way that AiiDA could take advantage of this currently, or if that would fall too far outside its scope.
From my current understanding, AiiDA simply checks whether the job is currently running on the designated computer, and once it finds it to be done, it runs the parser and moves on to the next step in a workflow. His question was whether AiiDA could be configured to check the status of the job and make small adjustments as needed, which made me think of something along the lines of the error handlers that the workflows have. I know this is different from what is currently implemented, but I thought it was a nice idea and wanted to see if anyone else is thinking about this. I’m also not sure if perhaps ASE or some other Python module is already capable of this and might be a better place to look. Thanks in advance for the thoughts and discussion.
Hi Nathan, interesting point, thanks for raising it. It is true that, at the core of AiiDA’s data model, a calculation on a remote computer is treated somewhat as a black box. AiiDA submits it to the scheduler and then just polls the scheduler until it reports that the job has terminated, at which point the files are retrieved. In that sense, it doesn’t really allow for much “workflow” logic on the remote. This was a design choice to make it easy to distribute AiiDA workloads to heterogeneous compute resources, since it doesn’t require installing AiiDA and its dependencies there.
Fireworks is another well-known project in the workflow management sphere of computational materials science that has taken a different approach: there, the manager does run on the remote. This has been leveraged by Custodian (also by the Materials Project) to provide workflow management at the job level, exactly as you describe. It has particularly good support for VASP: in their model, the scheduler job runs Custodian (and not VASP directly), and Custodian is responsible for launching VASP and reacting to problems or adjusting parameters as the simulation progresses. This model is definitely worth looking into as a potential solution.
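To give a rough idea of that model, here is a minimal sketch of what the script executed by the scheduler job might look like with Custodian; the handler selection and the VASP command are placeholders that would need to be adapted to the cluster at hand:

```python
from custodian.custodian import Custodian
from custodian.vasp.handlers import UnconvergedErrorHandler, VaspErrorHandler
from custodian.vasp.jobs import VaspJob

# Handlers watch the output while VASP runs and apply fixes when they
# detect known error patterns, restarting the job with adjusted inputs.
handlers = [VaspErrorHandler(), UnconvergedErrorHandler()]

# The scheduler job runs this script; Custodian then launches VASP itself.
# The command is a placeholder for however VASP is invoked on the cluster.
jobs = [VaspJob(vasp_cmd=["mpirun", "vasp_std"])]

# Give up after a handful of corrective restarts.
c = Custodian(handlers, jobs, max_errors=5)
c.run()
```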
Since this is a common use case though, I have since implemented a similar feature in AiiDA called “job monitors”. Essentially, when launching a calculation job to run on a remote computer, you can attach one or multiple monitors, which are simple Python functions. While the job is running, they are called every so often by AiiDA’s engine, and they have access to the remote working directory of the job, allowing them to check output files and execute commands, for example to create new files, edit existing files, or cancel the job. For more details, I would refer you to the documentation, which gives examples and more information on all supported options and configuration.
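To give a flavor of what this looks like, here is a minimal sketch of a monitor that kills a job when a particular message appears in an output file; the output filename, the error string, and the entry point name `my_package.monitors.output` are hypothetical and would need to be adapted to your code:

```python
import os
import tempfile

from aiida.orm import CalcJobNode
from aiida.transports import Transport


def monitor_output(node: CalcJobNode, transport: Transport) -> str | None:
    """Inspect an output file in the job's remote working directory.

    Returning a string instructs the engine to kill the job (the string is
    logged as the reason); returning ``None`` lets the job continue.
    """
    # `output.txt` and the error string below are placeholders.
    filepath = os.path.join(node.get_remote_workdir(), 'output.txt')

    with tempfile.NamedTemporaryFile('w+') as handle:
        transport.getfile(filepath, handle.name)
        handle.seek(0)
        if 'SCF failed to converge' in handle.read():
            return 'Detected a convergence failure in the output file.'
```

The monitor has to be registered as an entry point in the `aiida.calculations.monitors` group, after which it can be attached when launching the job:

```python
from aiida.engine import submit
from aiida.orm import Dict, load_code

builder = load_code('my-code@my-computer').get_builder()
# The key is a label of your choosing; the entry point must be registered
# by the package that provides the monitor function.
builder.monitors = {'output': Dict({'entry_point': 'my_package.monitors.output'})}
submit(builder)
```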
Without knowing how INQ works or exactly what they would want to accomplish, it is difficult to say whether the monitors would be a good solution, but it seems worth taking a look. If they have a more concrete example or questions, I would be happy to provide more feedback and/or suggestions.