Questions about `with_mpi`

Hi,

I have a couple of questions concerning the usage of with_mpi in CalcJobs.

More precisely, I am referring to the Wannier90 and QE plugins, but it seems that this behavior is fundamental and I am not 100% sure whether it is intended like this. In case it is, I would be really interested in the line of reasoning.

There are 3 different levels at which withmpi can be specified. In the QE plugin, withmpi is set to True by default, but submitting a PwCalculation shows that it is submitted without MPI. A small example to reproduce the behavior:

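from aiida.engine import submit
from aiida.orm import KpointsData
from aiida_quantumespresso.calculations.pw import PwCalculation

# `code`, `structure`, `pseudos` and `pw.parameters` are assumed to be defined beforehand.
b = PwCalculation.get_builder()
b.code = code
b.structure = structure
kpoints = KpointsData()
kpoints.set_kpoints_mesh([4, 4, 4])
b.kpoints = kpoints
b.metadata.options.resources = {"num_machines": 1, "num_mpiprocs_per_machine": 4}
b.parameters = pw.parameters
b.pseudos = pseudos
# Note: metadata.options.withmpi is deliberately left unset here.
submit(b)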

If one adds the line b.metadata.options.withmpi = True, MPI is used correctly. withmpi isn't specified on the code level, so this isn't an issue of the different levels overwriting each other.
On the other hand, one doesn't observe this behavior when using the PwBaseWorkChain; in that case, the builder correctly recognizes the default (True) value for withmpi.

I followed the different parts of the source code, ending up in the plumpy package and my naive understanding is the following:
The CalcJob looks for withmpi in the raw_inputs (https://github.com/aiidateam/aiida-core/blob/ec64780c206cdb040eee740b17865e6f0ff81cd8/aiida/engine/processes/calcjobs/calcjob.py#L925C31-L925C42), and I would expect this to be populated with the default value that is specified in the QE plugin.
If that is not the case, then since withmpi is specified neither in the CodeInfo nor on the code level, all three levels would be None and the default of the aiida-core CalcJob implementation, which is False, would be used (this seems to be what is happening in the example). When one checks the _inputs method of the builder, one observes that withmpi isn't returned unless it is explicitly specified. It seems that these inputs would be included in the raw_inputs, which would explain why it works when a WorkChain is used. Please correct me if I'm wrong.
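
To make the suspected mechanism concrete, here is a toy model in plain Python. It is not aiida-core code: `PORT_DEFAULTS` and `parse_inputs` are hypothetical names that I made up for illustration. It only shows the general point that a default declared on a port is invisible if one looks at the raw, user-supplied inputs instead of the inputs after the defaults have been filled in.

```python
# Toy model only; none of these names exist in aiida-core.

# Default that a plugin might declare for its `withmpi` option port (assumption).
PORT_DEFAULTS = {"withmpi": True}


def parse_inputs(raw_options):
    """Return a copy of the options with port defaults filled in for missing keys."""
    return {**PORT_DEFAULTS, **raw_options}


# What the user actually passed: `withmpi` is not among the keys.
raw_options = {"resources": {"num_machines": 1, "num_mpiprocs_per_machine": 4}}
parsed_options = parse_inputs(raw_options)

# Looking only at the raw inputs misses the port default ...
raw_value = raw_options.get("withmpi")
# ... while the parsed inputs carry it.
parsed_value = parsed_options.get("withmpi")

print("raw:", raw_value, "parsed:", parsed_value)
```

If the real lookup behaves like `raw_value` above, that would match what I see for the plain PwCalculation.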

Although it is understandable from the code why this is happening, I'm not 100% sure whether this is the intended behavior.
As a user, one would expect that the default value of the CalcJob is used.

Thanks a lot in advance!

Can you please specify which versions of aiida-core and aiida-quantumespresso you are using? The behavior of with_mpi changed in AiiDA v2.3. Its current behavior and design are described here in the documentation.

Yes, we also checked the wiki. I'm using aiida-core v2.4.0 and aiida-quantumespresso v4.4.0.
My colleague who initially experienced this behavior of Wannier90 is also using aiida-core v2.4.0 and aiida-wannier90 v2.1.0. So basically the latest versions.

It looks like there might be a bug in the current behaviour of the MPI setting.

According to the documentation if the feature is not set in any of those places:

The Code input

The CalcJob implementation

The metadata.options.withmpi input

Then the value is taken from the default of metadata.options.withmpi.
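
As I read the documentation, the rule could be sketched roughly as follows. This is a simplified model in plain Python, not the aiida-core implementation: the function name and its arguments are hypothetical, and raising `RuntimeError` on conflicting explicit values is my assumption about how conflicts are reported.

```python
def resolve_withmpi(code_value, calcjob_value, option_value, option_default):
    """Model of the documented resolution of the MPI setting (hypothetical helper).

    Each of the first three arguments is True, False, or None (meaning "not set").
    `option_default` is the default declared on the metadata.options.withmpi port.
    """
    explicit = [v for v in (code_value, calcjob_value, option_value) if v is not None]

    if not explicit:
        # Nothing is set anywhere: fall back to the default of the option port.
        return option_default

    if len(set(explicit)) > 1:
        # Explicit values disagree (exception type is an assumption).
        raise RuntimeError(f"conflicting MPI settings: {explicit}")

    return explicit[0]


# Our situation: nothing is set explicitly, the plugin default is True.
expected = resolve_withmpi(None, None, None, option_default=True)
print(expected)
```

Under this reading, our case (all three unset, plugin default True) should end up with MPI enabled, which is not what we observe.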

After migrating AiiDA from 2.2 to 2.4, our simulations started to fail because MPI was not enabled. One particular example is the Cp2kPdosWorkChain. Internally, it uses an overlap plugin where the default of metadata.options.withmpi is set to True.

For your information, this is the output of verdi code show:

PK                       18904
UUID                     cd30b62f-9df2-412f-879b-414d591a984f
Type                     core.code.installed
Computer                 daint-hybrid-s1141 (daint.cscs.ch), pk: 2
Filepath executable      /users/cpi/aiida-soft/cp2k-spm-tools/overlap_from_wfns.py
Label                    k-overlap
Description              Overlap code from https://github.com/nanotech-empa/cp2k-spm-tools.git.
Default calc job plugin  nanotech_empa.overlap
Use double quotes        False
With mpi
Prepend text             module load cray-python
                         export PYTHONPATH=$PYTHONPATH:/users/keimre/soft/ase
Append text

As you can see, the With mpi attribute is not set.

We are a bit at a loss currently. Any help would be appreciated.

I think I see where the bug is. It seems that the implementation does not actually respect the documentation and ignores the default of the metadata.options.withmpi port. I have opened a PR that should fix the problem: `CalcJob`: Fix MPI behavior if `withmpi` option default is True by sphuber · Pull Request #6218 · aiidateam/aiida-core · GitHub

Could you please give that branch a go? Thanks!

Hi @sphuber, I just gave that branch a try and it solved the issue. Thanks a lot!

Great, thanks for testing and letting me know. I will make sure it gets merged and released with the upcoming v2.5 which should be out soon.


Thanks for taking care of that @sphuber.

Just a note that this was also backported and released in v2.4.3.
