Hi,
I have a couple of questions concerning the usage of `with_mpi` in `CalcJob`s.
More precisely, I am referring to the Wannier90 and QE plugins, but this behavior seems to be more fundamental, and I am not 100% sure whether it is intended. If it is, I would be really interested in the reasoning behind it.
There are three levels at which `withmpi` can be specified: the `CalcJob` options, the `CodeInfo`, and the code. In the QE plugin, `withmpi` is set to `True` by default, but submitting a `PwCalculation` shows that it runs without MPI. A small example to reproduce the behavior:
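```python
from aiida import load_profile
from aiida.engine import submit
from aiida.orm import KpointsData
from aiida.plugins import CalculationFactory

load_profile()
PwCalculation = CalculationFactory("quantumespresso.pw")

# `code`, `structure`, `pseudos` and `pw.parameters` are defined elsewhere
b = PwCalculation.get_builder()
b.code = code
b.structure = structure
kpoints = KpointsData()
kpoints.set_kpoints_mesh([4, 4, 4])
b.kpoints = kpoints
b.metadata.options.resources = {"num_machines": 1, "num_mpiprocs_per_machine": 4}
b.parameters = pw.parameters
b.pseudos = pseudos
submit(b)  # note: metadata.options.withmpi is not set explicitly
```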
If one adds the line `b.metadata.options.withmpi = True`, MPI is used correctly. `withmpi` isn't specified on the code level, so this isn't an issue of the different levels overriding each other.
On the other hand, this behavior does not occur when using the `PwBaseWorkChain`; in that case, the builder correctly picks up the default value (`True`) of `withmpi`.
I followed the relevant parts of the source code down into the `plumpy` package, and my naive understanding is the following:
The `CalcJob` looks for `withmpi` in the `raw_inputs` (https://github.com/aiidateam/aiida-core/blob/ec64780c206cdb040eee740b17865e6f0ff81cd8/aiida/engine/processes/calcjobs/calcjob.py#L925C31-L925C42), and I would expect this to be populated with the default value specified in the QE plugin.
If this is not the case, and since `withmpi` is specified neither in the `CodeInfo` nor on the code level, all three levels would be `None` and the default of the aiida-core `CalcJob` implementation, which is `False`, would be used (this seems to be what is happening in the example).

When one checks the `_inputs` method of the builder, one sees that `withmpi` isn't returned unless it is explicitly specified. It seems that only these inputs end up in the `raw_inputs`, which would somewhat explain why it works when a WorkChain is used. Please correct me if I'm wrong.
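The difference between reading only the explicitly set inputs and reading the inputs with the spec defaults applied can be illustrated with a toy model. `ToyBuilder`, `SPEC_DEFAULTS` and `withmpi_from_raw_inputs` are hypothetical stand-ins, not aiida-core or plumpy APIs; the code and `CodeInfo` levels are collapsed into the `False` fallback for brevity.

```python
from typing import Any, Dict, Optional

# Hypothetical spec defaults standing in for the plugin's port defaults;
# here `withmpi` defaults to True, as described for PwCalculation.
SPEC_DEFAULTS: Dict[str, Any] = {"withmpi": True}


class ToyBuilder:
    """Toy stand-in for a process builder that records explicitly set options."""

    def __init__(self) -> None:
        self._explicit: Dict[str, Any] = {}

    def set_option(self, name: str, value: Any) -> None:
        self._explicit[name] = value

    def _inputs(self) -> Dict[str, Any]:
        # Mirrors the observation: only explicitly set values are returned.
        return dict(self._explicit)


def withmpi_from_raw_inputs(raw_inputs: Dict[str, Any], fallback: bool = False) -> bool:
    # If `withmpi` is missing, fall back (all other levels assumed unset).
    value: Optional[bool] = raw_inputs.get("withmpi")
    return fallback if value is None else value


builder = ToyBuilder()
raw = builder._inputs()                   # {}: withmpi is missing
with_defaults = {**SPEC_DEFAULTS, **raw}  # spec defaults applied first
```

In this model, `withmpi_from_raw_inputs(raw)` gives `False`, whereas `withmpi_from_raw_inputs(with_defaults)` gives `True`; calling `builder.set_option("withmpi", True)` before `_inputs()` also gives `True`, matching the workaround above.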
Although it is understandable from the code why this happens, I'm not 100% sure whether this is the intended behavior. As a user, one would expect the default value of the `CalcJob` to be used.
Thanks a lot in advance!