I’m sure this has been asked 100+ times, but I could not find the relevant post.
I would like to make a workchain that starts from a given calcjob, i.e. it uses the same inputs and the produced temporary files (namely the wavefunctions) and creates new calculations by changing specific input parameters one at a time.
I know I cannot start from the CalcJob node itself, so I wonder how you would suggest proceeding:
expose the input for the calculation, copy all inputs (including metadata) outside the workchain and add as a second input for the remote folder?
use the uuid as input? This would allow me to use calc.get_builder_restart() that is ideal for the task I described. But a uuid of a calcjob as input looks a bit ugly, right?
Hi @bonfus! I’ll just give some off the cuff suggestions, none of these are tested.
expose the input for the calculation, copy all inputs (including metadata) outside the workchain and add as a second input for the remote folder?
That would be my first inclination as well. I assume you want to be able to set any input of the CalcJob you are wrapping here, so exposing the inputs makes sense. Excluding the parent_folder when exposing the CalcJob input and making that a top-level one also seems reasonable.
Using the UUID as an input won’t properly link the CalcJob node in the provenance. If the remote_folder of the CalcJob is the parent_folder input of your work chain, it’s only two steps to get back to the parent CalcJob anyways:
workchain.inputs.parent_folder.creator
To make it easier to use the workchain, I’d add a get_builder_from_calcjob method, where the first input argument is calcjob (can provide the CalcJob node itself, or any identifier). This will use get_builder_restart() and also set the remote_folder as the parent_folder input in the builder. Afterwards you can adapt the builder to set the new parameters, or you could consider also adding other input arguments to set the new parameters directly in the method.
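A minimal, untested sketch of what such a helper could look like. It is written against duck-typed objects so it runs without an AiiDA profile: in practice `calcjob` would be a `CalcJobNode` (for example obtained via `load_node`), which provides `get_builder_restart()` and `outputs.remote_folder`. The `overrides` argument is a hypothetical convenience, and the input name `parent_folder` is an assumption about the wrapped CalcJob plugin.

```python
def get_builder_from_calcjob(calcjob, overrides=None):
    """Return a restart builder for `calcjob` that reuses its remote folder.

    `calcjob` is expected to behave like an AiiDA `CalcJobNode`: it must
    provide `get_builder_restart()` and `outputs.remote_folder`.
    `overrides` (hypothetical) maps builder input names to new values.
    """
    # Start from a builder pre-populated with the inputs of the original run.
    builder = calcjob.get_builder_restart()

    # Reuse the files (e.g. wavefunctions) left in the remote working directory.
    # Assumption: the wrapped CalcJob names this input `parent_folder`.
    builder.parent_folder = calcjob.outputs.remote_folder

    # Optionally replace selected inputs with new values.
    for name, value in (overrides or {}).items():
        setattr(builder, name, value)

    return builder
```

Note that this returns a builder for the CalcJob itself. If the work chain exposes the CalcJob inputs under a namespace, the same values would have to be copied into that namespace of the work chain builder instead.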
However, you may want to be more explicit in your API in what parameters are adapted, i.e. by adding a new_parameters input. Could you explain the use case some more, i.e. what is the goal of this workchain?
Just one additional comment. Otherwise, I agree with what has been suggested.
Depending on how much you want to keep track of this in the provenance, you could also follow a similar approach to what is done in the AiiDA-common-workflows. There, we also want to pass some of the input parameters (e.g. the k-mesh) between the different calculations when computing the equation of state.
In case keeping track of the provenance for this calculation is less critical, you could pass the WorkChain or CalcJob node as an input and make it non_db:
However, since you want to reuse some of the produced files, I think that the approach via the parent_folder is probably more suitable, as I assume that you want to keep track of the provenance.
Just for completeness, I need this because I’m writing some kind of convergence algorithm and this is the structure I have in mind:
check_convergence_workchain:
task: checks the variation between simulations differing by one input parameter. Can also consider two calculations and report relative differences.
inputs:
calcjob1: a reference calcjob
calcjob2 (optional): a second reference calcjob whose results should be subtracted from calcjob1
input_params: dictionary with the parameters to be changed and the step to be used for each. For example, in the context of QE it could be {'ecutwfc': +20}.
outputs:
dE: dictionary of energy differences between calcjob1 and calcjob1 with the parameter in input_params incremented by the specified value. Alternatively, (calcjob1 - calcjob2) - (calcjob1_with_incremented_param - calcjob2_with_incremented_param). I hope the notation is clear enough.
dF: same as dE for forces
dM: same as dE for moments
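To make the notation concrete, here is a plain-Python sketch of the bookkeeping only, with no AiiDA involved. The function names are hypothetical, the parameters are assumed to live in a flat dictionary (real QE inputs are nested by namelist), and the sign convention (reference minus incremented) is my reading of the formula above.

```python
def increment_parameters(parameters, input_params):
    """Build one modified copy of `parameters` per entry of `input_params`.

    Each copy differs from the reference by a single incremented parameter,
    so the parameters are changed one at a time.
    """
    variants = {}
    for name, step in input_params.items():
        modified = dict(parameters)  # leave the reference untouched
        modified[name] = parameters[name] + step
        variants[name] = modified
    return variants


def delta_energy(e1, e1_inc, e2=None, e2_inc=None):
    """Energy variation caused by incrementing one parameter.

    With one reference:  e1 - e1_inc
    With two references: (e1 - e2) - (e1_inc - e2_inc)
    """
    if e2 is None or e2_inc is None:
        return e1 - e1_inc
    return (e1 - e2) - (e1_inc - e2_inc)


reference = {"ecutwfc": 40, "ecutrho": 320}
variants = increment_parameters(reference, {"ecutwfc": +20})
# variants["ecutwfc"] holds the inputs for the calculation with ecutwfc = 60.
```

dF and dM would follow the same pattern, applied element-wise to the force and moment arrays.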
This can be used in a variety of contexts, and one could be:
auto_convergence_workchain:
task: run check_convergence_workchain with relatively small steps, find the parameter with the largest squared entry in dE, increment that parameter, and run a new check_convergence_workchain starting from the updated set of parameters, repeating until the threshold is reached.
inputs:
exposed check_convergence_workchain
thresholds: values below which convergence is considered reached and the loop stops
boundaries: when to stop because simulations would be too expensive
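The loop I have in mind can be sketched in plain Python as follows. This is only an illustration of the control flow: `evaluate` is a hypothetical stand-in for running check_convergence_workchain and returning dE per parameter, and comparing the squared dE directly against a single threshold is an assumption.

```python
def auto_converge(parameters, steps, evaluate, threshold, boundaries,
                  max_iterations=50):
    """Increment the least converged parameter until dE**2 drops below threshold.

    parameters: starting values, e.g. {"ecutwfc": 20}
    steps:      increment per parameter, e.g. {"ecutwfc": 20}
    evaluate:   callable(parameters, steps) -> {name: dE}  (hypothetical)
    boundaries: maximum allowed value per parameter
    Returns the final parameters and a status string.
    """
    current = dict(parameters)  # do not modify the caller's dictionary
    for _ in range(max_iterations):
        d_e = evaluate(current, steps)
        # Pick the parameter whose increment changes the energy the most.
        worst = max(d_e, key=lambda name: d_e[name] ** 2)
        if d_e[worst] ** 2 < threshold:
            return current, "converged"
        new_value = current[worst] + steps[worst]
        if new_value > boundaries[worst]:
            # The next calculation would be too expensive: stop here.
            return current, "boundary_reached"
        current[worst] = new_value
    return current, "max_iterations"


def toy_evaluate(params, steps):
    """Toy model: the energy variation shrinks as the cutoff grows."""
    return {"ecutwfc": 8.0 / params["ecutwfc"]}


result = auto_converge({"ecutwfc": 20}, {"ecutwfc": 20}, toy_evaluate,
                       threshold=0.02, boundaries={"ecutwfc": 200})
```

In the actual work chain each call to `evaluate` would correspond to submitting a new check_convergence_workchain from the outline, and the loop would become a `while_` construct.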