Obtain a `CalcJob` instance from a corresponding `CalcJobNode` or builder

Hello everybody,
for the implementation of PR #6276 (dumping of workchain files for inspection), for each CalcJobNode, we currently dump the files of the node’s repository, the retrieved outputs, and any file-based input-nodes. This works well for inspection of the workchain and doesn’t require the respective aiida-plugin to be installed.

In addition, we would also like to provide an option to dump the files related to each CalcJobNode in such a way that one could directly re-submit each job*. One option to achieve this could be through the use of the prepare_for_submission method of the respective CalcJob. However, as CalcJob and CalcJobNode originate from separate, unrelated inheritance hierarchies**, I don’t think there is a direct way to reconstruct the former Process from the latter ProcessNode (possibly related to this outdated PR)?

An alternative approach is to get a prepopulated builder via the get_builder_restart method of the CalcJobNode and perform a dry run. However, that dumps the files into directories named by the local time under submit_test in the current working directory. I haven’t found a way to provide a custom path here, which would be necessary to mirror the workchain logic (which is what verdi workchain dump would otherwise do).

Maybe somebody with more experience knows how to make one of the two options work as desired.
Thank you!

* This would require the respective aiida-plugin to be installed, and jobs would still have to be run manually one by one
** CalcJob -> Process -> plumpy.processes.Process, whereas CalcJobNode -> CalculationNode -> ProcessNode -> (Sealable, Node)

You are correct that there is no way (once the process is terminated) to reconstruct a Process from a ProcessNode. The reason is that it is the Process class that is leading and that is instantiated when a new process is launched. The Process simply uses an instance of ProcessNode to persist some of its data in the provenance graph. The Process instance lives in memory of the interpreter for the duration of the process but goes away once the process terminates. The ProcessNode is a permanent representation of its execution.

Since it is merely a representation of the process’ execution, the node interface does not have any methods that are used in running the process. To construct a Process instance from a ProcessNode, the only solution is to recreate a new process instance with the original inputs. You can do this as follows:

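from aiida.engine.utils import instantiate_process
from aiida.manage import get_manager
from aiida.orm import load_node

# Load the node of the terminated process.
process_node = load_node(...)
# Get a builder prepopulated with the original inputs.
builder = process_node.get_builder_restart()
# Create a new process instance from those inputs, using the current runner.
runner = get_manager().get_runner()
process = instantiate_process(runner, builder)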

You now have a Process instance that should be equal to the original process that created the ProcessNode. This is not how the API is supposed to be used, however, so I am not sure whether there might be subtle problems or differences.
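To picture the relationship described here, the following is a toy model in plain Python. It is only a sketch: ToyNode, ToyProcess and recreate_process are hypothetical names and not part of AiiDA. It shows a transient object that holds the run logic, a permanent record that holds only data, and a new process built from the recorded inputs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Permanent record (hypothetical): stores inputs and outputs, has no run logic."""
    process_type: str
    inputs: dict
    outputs: dict = field(default_factory=dict)
    sealed: bool = False


class ToyProcess:
    """Transient object (hypothetical): holds the run logic, writes results to its node."""

    def __init__(self, inputs):
        self.inputs = dict(inputs)
        self.node = ToyNode(process_type=type(self).__name__, inputs=dict(inputs))

    def run(self):
        # Persist the result on the node and mark the record as final.
        self.node.outputs["sum"] = self.inputs["x"] + self.inputs["y"]
        self.node.sealed = True
        return self.node


def recreate_process(node, registry):
    """Build a fresh process from the inputs recorded on a node."""
    process_class = registry[node.process_type]
    return process_class(node.inputs)


original = ToyProcess({"x": 1, "y": 2})
node = original.run()
del original  # the process object is gone; only the record remains

# The node cannot be turned back into the original process, but its recorded
# inputs are enough to create a new, equivalent one with its own new node.
clone = recreate_process(node, {"ToyProcess": ToyProcess})
```

The recreated process gets a fresh, unsealed record of its own, which matches the point that the result is a new process with the same inputs and not the original one.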

For the rest of your purposes, you could now call process.presubmit if the process is a CalcJob. Note that presubmit will call the prepare_for_submission method of the plugin implementation. You will need to pass in the sandbox folder to which the input files will be written.
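A minimal sketch of that step could look as follows. It assumes a configured AiiDA profile, an installed plugin for the calculation, and that SandboxFolder from aiida.common.folders is used as the folder. The function name presubmit_into_sandbox is hypothetical, and since this goes beyond the intended use of the API, instantiating the process may have side effects on the profile (for example, creating a new node).

```python
def presubmit_into_sandbox(pk):
    """Recreate the CalcJob behind node `pk` and list the files presubmit writes.

    Hypothetical helper: untested against any specific AiiDA version.
    """
    # Imports are local so that the sketch can be defined without AiiDA installed.
    from aiida import load_profile
    from aiida.common.folders import SandboxFolder
    from aiida.engine.utils import instantiate_process
    from aiida.manage import get_manager
    from aiida.orm import load_node

    load_profile()

    # Rebuild a process instance from the original inputs.
    node = load_node(pk)
    builder = node.get_builder_restart()
    process = instantiate_process(get_manager().get_runner(), builder)

    # presubmit calls the plugin's prepare_for_submission, which writes the
    # input files into the folder that is passed in.
    with SandboxFolder() as folder:
        calc_info = process.presubmit(folder)
        written = sorted(folder.get_content_list())

    # The sandbox is removed on exit, so only the file names are returned here.
    return calc_info, written
```

To keep the files, they would have to be copied out of the sandbox before the with block exits.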

Note that this is not enough to mirror the actual execution of a CalcJob. This method only prepares the instructions for copying the local and remote files that are specified in the local_copy_list and remote_copy_list (and remote_symlink_list). The copying itself is done by aiida.engine.daemon.execmanager.upload_calculation, so that function needs to be called as well. However, this will require some refactoring, as it currently does not allow specifying the target directory but determines it automatically. In your use case, you also have to think about what to do with files specified in the remote_copy_list, which can be on a remote computer and may not exist anymore.


Hi @sphuber, wow, thank you for this detailed explanation! Everything makes a lot of sense. instantiate_process and get_manager().get_runner() were the missing pieces.

I’ve been playing around with this implementation a bit now, and I’m arriving at the same conclusion as you: this is not very straightforward. Indeed, I’m not getting the actual file structure by running presubmit and upload_calculation, because of the message “CalcJobNode already has a remote_folder output: skipping upload”. The whole setup also seems like quite the API abuse :sweat_smile: so I’d just issue a warning for now when this option is chosen, and we can continue the discussion on that at the actual PR.

I think the initial question is nicely answered here, and will be helpful for people that might wonder the same thing, thanks again!
