HarmonicWorkChain fails

Hi,
I am trying to run the example given in:
https://aiida-vibroscopy.readthedocs.io/en/latest/5_iraman_functionals.html
But the workflow keeps failing. Here is the summary of the relevant section:

builder = IRamanSpectraWorkChain.get_builder_from_protocol(
    code=pw_code,
    structure=structure,
    protocol='fast',
    options={'withmpi': True},
    overrides={'dielectric': scf_overrides, 'phonon': scf_overrides},
)

File "./python3.9/site-packages/aiida_quantumespresso/parsers/parse_raw/pw.py", line 416, in
    trajectory_data['atomic_species_name'] = [data_lines[i + 1 + j].split()[1] for j in range(nat)]
IndexError: list index out of range

<35686> aiida.parser.PwParser: [WARNING] Error while parsing stress tensor: "P=" not found within 15 lines from the start of the stress block
<35686> aiida.parser.PwParser: [WARNING] the scf_accuracy array was parsed but the scf_iterations was not.
<35686> aiida.parser.PwParser: [ERROR] ERROR_BROYDEN_FACTORIZATION
<35686> aiida.parser.PwParser: [ERROR] ERROR_OUTPUT_STDOUT_INCOMPLETE
<35686> aiida.parser.PwParser: [ERROR] The factorization in the Broyden routine failed.
<35686> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [WARNING] output parser returned exit code<468>: The factorization in the Broyden routine failed.
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|report_error_handled]: PwCalculation<2480> failed with exit status 468: The factorization in the Broyden routine failed.
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|report_error_handled]: Action taken: found diagonalization issues but already exploited all supported algorithms, aborting…
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|inspect_process]: PwCalculation<2480> failed but a handler detected an unrecoverable problem, aborting
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2233|DielectricWorkChain|inspect_null_field_scfs]: electric field scf failed with exit status 300
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2233|DielectricWorkChain|on_terminated]: cleaned remote folders of calculations: 2262 2282 …
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2229|HarmonicWorkChain|inspect_processes]: the child PhononWorkChain with <PK=2232> failed
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2229|HarmonicWorkChain|on_terminated]: cleaned remote folders of calculations: 2270 2303 …
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2227|IRamanSpectraWorkChain|inspect_process]: HarmonicWorkChain failed with exit status 400

I have tried the two other options for 'protocol', i.e. 'precise' and 'moderate', and also tried different values for ecutwfc and ecutrho, but the issue persists.
I would really appreciate it if someone could help me resolve this,
Vahid

Hi @Vahid_Bozorgi

your pw.x calculation is apparently failing in the Broyden routine. To get an idea of what is going on, you should check the output of your calculation.

You can use verdi calcjob gotocomputer <pk of the failed PwCalculation> to connect to the remote working directory. Maybe you can already figure out the source of the error when inspecting the output file. Otherwise, feel free to report back additional information about the failed job.

If you receive the message that the working directory has already been cleaned, you can use verdi calcjob outputcat <pk of the failed PwCalculation>, which will print the whole output file into your terminal.

Thank you very much for your reply. I checked the outputs but could not determine what was causing the problem. I have attached the outputs for ecutwfc = 50 and ecutwfc = 30.
ecutwfc-30.txt (183.5 KB)
ecutwfc-50.txt (314.3 KB)

Best,

Dear Vahid,

Your outputs clearly show that pw.x is either not compiled correctly or not run properly. As you can see, the header and different parts of the output are repeated multiple times, meaning the MPI parallelization is not working properly. It would be great to understand how you compiled the code and how you are running it.

Note: if you installed Quantum ESPRESSO locally using anaconda/mamba, you should add "export OMP_NUM_THREADS=1" to the prepend_text, so that it appears in the submission script.
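
For reference, a minimal sketch of where that setting could go in the overrides (the nesting mirrors the scf_overrides structure passed to get_builder_from_protocol; everything else in the dictionary is left out for brevity):

scf_overrides = {
    'scf': {
        'pw': {
            'metadata': {
                'options': {
                    # make sure OpenMP threading is disabled in the submission script
                    'prepend_text': 'export OMP_NUM_THREADS=1',
                },
            },
        },
    },
}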

Hi,
Thank you very much for your reply. I did what you suggested:
scf_overrides = {
    'scf': {
        'pw': {
            'parameters': {
                'SYSTEM': {
                    'ecutwfc': 40.0,
                    'ecutrho': 40 * 8,
                },
            },
            'metadata': {
                'options': {
                    'resources': {'num_machines': 1, 'num_mpiprocs_per_machine': 5, 'num_cores_per_mpiproc': 4},
                    'prepend_text': 'export OMP_NUM_THREADS=1',
                    'max_wallclock_seconds': 36000,
                    'queue_name': 'normal',
                    'withmpi': True,
                    'qos': 'express',
                    'account': 'MyAccount',
                    'max_memory_kb': 10000000,
                },
            },
        },
    },
}

builder = IRamanSpectraWorkChain.get_builder_from_protocol(
    code=pw_code,
    structure=structure,
    protocol='fast',
    options={'withmpi': True},
    overrides={'dielectric': scf_overrides, 'phonon': scf_overrides},
)

But pw.x still failed. I have attached the output.
Output.txt (13.9 KB)

Hello again,
unfortunately, what you just reported is not very informative. It would be great if you could first manage to run a simple PwBaseWorkChain (i.e., a simple SCF calculation using pw.x). Have you managed to do that already? Even more important: can you run pw.x without AiiDA at all? You can try one of the examples provided by the QE input generator (https://qeinputgenerator.materialscloud.io/) to check whether your submission configuration is working.
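
For example, a minimal standalone SCF could look something like the sketch below (the code label 'pw@my_cluster' is a placeholder, and structure is assumed to be a StructureData node you already have):

from aiida import load_profile
from aiida.engine import run
from aiida.orm import load_code
from aiida.plugins import WorkflowFactory

load_profile()

# placeholder code label; use the label of your configured pw.x code
pw_code = load_code('pw@my_cluster')
PwBaseWorkChain = WorkflowFactory('quantumespresso.pw.base')

builder = PwBaseWorkChain.get_builder_from_protocol(
    code=pw_code,
    structure=structure,
    protocol='fast',
)
results = run(builder)

If this simple SCF completes, the problem is more likely in your cluster/scheduler configuration than in the workflow itself.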

These are fundamental steps before using more advanced workflows.

Please have a look at this tutorial (Running processes — AiiDA Tutorials) and try to replicate the results on your cluster.

My general feeling is that you have one of the following issues:

  1. A bad Quantum ESPRESSO installation
  2. An improper submission/computer (SLURM?) configuration
  3. In general, some issues with the cluster/scheduler and QE

If you can check and solve these points, I am sure that you will then be able to run the HarmonicWorkChain.