HarmonicWorkChain fails

Hi,
I am trying to run the example given in:
https://aiida-vibroscopy.readthedocs.io/en/latest/5_iraman_functionals.html
But the workflow keeps failing. Here is the relevant section of my script, followed by a summary of the error:

builder = IRamanSpectraWorkChain.get_builder_from_protocol(
    code=pw_code,
    structure=structure,
    protocol="fast",
    options={"withmpi": True},
    overrides={"dielectric": scf_overrides, "phonon": scf_overrides},
)

File "./python3.9/site-packages/aiida_quantumespresso/parsers/parse_raw/pw.py", line 416, in
trajectory_data['atomic_species_name'] = [data_lines[i + 1 + j].split()[1] for j in range(nat)]
IndexError: list index out of range

<35686> aiida.parser.PwParser: [WARNING] Error while parsing stress tensor: "P=" not found within 15 lines from the start of the stress block
<35686> aiida.parser.PwParser: [WARNING] "the scf_accuracy array was parsed but the scf_iterations was not.
<35686> aiida.parser.PwParser: [ERROR] ERROR_BROYDEN_FACTORIZATION
<35686> aiida.parser.PwParser: [ERROR] ERROR_OUTPUT_STDOUT_INCOMPLETE
<35686> aiida.parser.PwParser: [ERROR] The factorization in the Broyden routine failed.
<35686> aiida.orm.nodes.process.calculation.calcjob.CalcJobNode: [WARNING] output parser returned exit code<468>: The factorization in the Broyden routine failed.
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|report_error_handled]: PwCalculation<2480> failed with exit status 468: The factorization in the Broyden routine failed.
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|report_error_handled]: Action taken: found diagonalization issues but already exploited all supported algorithms, aborting…
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|inspect_process]: PwCalculation<2480> failed but a handler detected an unrecoverable problem, aborting
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2333|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2233|DielectricWorkChain|inspect_null_field_scfs]: electric field scf failed with exit status 300
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2233|DielectricWorkChain|on_terminated]: cleaned remote folders of calculations: 2262 2282 …
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2229|HarmonicWorkChain|inspect_processes]: the child PhononWorkChain with <PK=2232> failed
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2229|HarmonicWorkChain|on_terminated]: cleaned remote folders of calculations: 2270 2303 …
<35686> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [2227|IRamanSpectraWorkChain|inspect_process]: HarmonicWorkChain failed with exit status 400

I have tried two other options for protocol, i.e. 'precise' and 'moderate', and also different values for ecutwfc and ecutrho, but the issue persists.
I would really appreciate it if someone could help me resolve this.
Vahid

Hi @Vahid_Bozorgi

Your pw.x calculation is apparently failing in the Broyden routine. To get an idea of what is going on, you should check the output of your calculation.

You can use verdi calcjob gotocomputer <pk of the failed PwCalculation> to connect to the remote working directory. Maybe you can already figure out the source of the error when inspecting the output file. Otherwise, feel free to report back additional information about the failed job.

If you receive the message that the working directory has already been cleaned, you can use verdi calcjob outputcat <pk of the failed PwCalculation>, which will print the whole output file into your terminal.
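When the report is long, the PK to use with these commands can be read off the lines of the form PwCalculation<PK> failed with exit status N. A minimal sketch, assuming the report lines look like the log in the first post; failed_calcjob_pks is a hypothetical helper, not part of AiiDA:

```python
import re

# Matches fragments such as "PwCalculation<2480> failed with exit status 468"
FAILED_CALC = re.compile(r"(\w+Calculation)<(\d+)> failed with exit status (\d+)")


def failed_calcjob_pks(report: str) -> list:
    """Return (calculation class, PK, exit status) for every failed calculation in a report."""
    results = []
    for match in FAILED_CALC.finditer(report):
        name, pk, status = match.groups()
        results.append((name, int(pk), int(status)))
    return results


report = (
    "[2333|PwBaseWorkChain|report_error_handled]: PwCalculation<2480> failed "
    "with exit status 468: The factorization in the Broyden routine failed."
)
for name, pk, status in failed_calcjob_pks(report):
    # Each PK can then be passed to verdi calcjob gotocomputer / verdi calcjob outputcat
    print(f"{name} {pk} (exit status {status}): verdi calcjob outputcat {pk}")
```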

Thank you very much for your reply. I checked the outputs but could not determine what was causing the problem. I have attached the outputs for ecutwfc = 50 and ecutwfc = 30.
ecutwfc-30.txt (183.5 KB)
ecutwfc-50.txt (314.3 KB)

Best,

Dear Vahid,

Your outputs clearly show that pw.x is either not compiled correctly or not run properly. As you can see, the header and other parts of the output are repeated multiple times, which means the MPI parallelization is not working properly. It would be great to understand how you compiled the code and how you are running it.
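A quick way to confirm this symptom in a saved output file is to count the pw.x start banners: a single MPI run prints one, while several uncoordinated copies launched side by side print one each, typically each reporting only a single process. A rough sketch; it assumes the banner lines contain "Program PWSCF" and "running on <N> processor" as in recent QE versions (the exact wording may differ), diagnose_pw_output is a hypothetical helper, and the sample text is made up:

```python
import re

# One "Program PWSCF" banner is printed at the start of every independent pw.x run
HEADER = re.compile(r"^\s*Program PWSCF", re.MULTILINE)
# Process count reported by an MPI build, e.g. "Parallel version (MPI), running on 4 processors"
PROCESSES = re.compile(r"running on\s+(\d+) processor")


def diagnose_pw_output(text: str) -> dict:
    """Count pw.x banners and the process counts they report."""
    n_headers = len(HEADER.findall(text))
    return {
        "n_headers": n_headers,
        "processes": [int(n) for n in PROCESSES.findall(text)],
        # More than one banner in a single file hints at several uncoordinated pw.x copies
        "suspicious": n_headers > 1,
    }


sample = """
     Program PWSCF v.7.3.1 starts on  1Jan2024 at 10: 0: 0
     Parallel version (MPI), running on     1 processors
     Program PWSCF v.7.3.1 starts on  1Jan2024 at 10: 0: 0
     Parallel version (MPI), running on     1 processors
"""
print(diagnose_pw_output(sample))
```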

Note: if you installed Quantum ESPRESSO locally using anaconda/mamba, you should add export OMP_NUM_THREADS=1 to the prepend_text, so that it appears in the submission script.
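A minimal sketch of adding that line to every pw.x run of the work chain, assuming the overrides layout used in this thread (dielectric/phonon, then scf.pw.metadata.options); with_prepend_text is a hypothetical helper, and it keeps any prepend_text that is already set:

```python
import copy


def with_prepend_text(overrides: dict, line: str = "export OMP_NUM_THREADS=1") -> dict:
    """Return a copy of the overrides in which every pw.x run gets `line` in its prepend_text."""
    updated = copy.deepcopy(overrides)
    for sub_workflow in ("dielectric", "phonon"):
        options = (
            updated.setdefault(sub_workflow, {})
            .setdefault("scf", {})
            .setdefault("pw", {})
            .setdefault("metadata", {})
            .setdefault("options", {})
        )
        existing = options.get("prepend_text", "")
        if line not in existing:
            # Keep whatever was already there and add the new line after it
            options["prepend_text"] = f"{existing}\n{line}" if existing else line
    return updated


overrides = with_prepend_text({"phonon": {"scf": {"kpoints_distance": 0.6}}})
print(overrides["phonon"]["scf"]["pw"]["metadata"]["options"]["prepend_text"])
```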

Hi,
Thank you very much for your reply. I did what you suggested:
scf_overrides = {
    "scf": {
        "pw": {
            "parameters": {
                "SYSTEM": {
                    "ecutwfc": 40.0,
                    "ecutrho": 40 * 8,
                },
            },
            "metadata": {
                "options": {
                    "resources": {"num_machines": 1, "num_mpiprocs_per_machine": 5, "num_cores_per_mpiproc": 4},
                    "prepend_text": "export OMP_NUM_THREADS=1",
                    "max_wallclock_seconds": 36000,
                    "queue_name": "normal",
                    "withmpi": True,
                    "qos": "express",
                    "account": "MyAccount",
                    "max_memory_kb": 10000000,
                },
            },
        },
    },
}

builder = IRamanSpectraWorkChain.get_builder_from_protocol(
    code=pw_code,
    structure=structure,
    protocol="fast",
    options={"withmpi": True},
    overrides={"dielectric": scf_overrides, "phonon": scf_overrides},
)

But pw.x still failed. I have attached the output.
Output.txt (13.9 KB)

Hello again,
Unfortunately, what you just reported is not very informative. It would be great if you could first manage to run a simple PwBaseWorkChain (i.e., a simple SCF calculation using pw.x). Have you already done that? Even more important: can you run pw.x without AiiDA? You can try one of the examples provided by the QE input generator (https://qeinputgenerator.materialscloud.io/) to check whether your submission configuration works.

These are fundamental steps before using more advanced workflows.

Please have a look at this tutorial ("Running processes", AiiDA Tutorials) and try to replicate the results on your cluster.

My general feeling is that you have one of the following issues:

  1. A bad Quantum ESPRESSO installation
  2. An improper submission/computer (SLURM?) configuration
  3. In general, some issues with the cluster/scheduler and QE

If you could try to check/solve these points, I am sure that you will be then able to run the HarmonicWorkChain.
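Once pw.x has been run by hand (outside AiiDA), a couple of text markers in its output give a first indication for point 1: a converged SCF prints "convergence has been achieved", and a run that reaches the end prints "JOB DONE.". A missing end marker roughly corresponds to the ERROR_OUTPUT_STDOUT_INCOMPLETE error in the first post. A rough sketch; summarize_pw_run is a hypothetical helper, and the marker strings are assumptions based on recent QE versions:

```python
def summarize_pw_run(text: str) -> dict:
    """Rough health check of a pw.x stdout; the marker strings are assumptions."""
    summary = {
        "scf_converged": "convergence has been achieved" in text,
        "job_done": "JOB DONE." in text,
    }
    summary["looks_ok"] = all(summary.values())
    return summary


# Hypothetical usage on a hand-made run, e.g. mpirun -np 4 pw.x -in scf.in > scf.out
# with open("scf.out") as handle:
#     print(summarize_pw_run(handle.read()))
print(summarize_pw_run("     convergence has been achieved in   7 iterations\n   JOB DONE.\n"))
```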

Hi,
I have tried to fix some issues with pw.x and have run the workflow without MPI. See the full output (output.txt) in the attachment. I have also attached the outputs of some PwCalculations and of the DielectricWorkChain, labelled with their corresponding PKs. It seems that the pw.x calculations and the whole work chain ran successfully. However, the following section still crashes with the error LinAlgError: Eigenvalues did not converge:
from aiida_vibroscopy.utils.plotting import get_spectra_plot

# `calc` is assumed to be the finished IRamanSpectraWorkChain node
vibro = calc.outputs.vibrational_data.numerical_accuracy_4
# Powder-averaged Raman intensities; this is where the LinAlgError shows up
polarized_intensities, unpolarized_intensities, frequencies_pbesol, labels = vibro.run_powder_raman_intensities(
    frequency_laser=532, temperature=300
)
total_intensities_pbesol = polarized_intensities + unpolarized_intensities
plt = get_spectra_plot(frequencies_pbesol, total_intensities_pbesol)
plt.show()
I have also attached my inputs for the calculation section (input.txt).
I really have no idea how to fix this, any help would be greatly appreciated.
Best,
Vahid

DiElc_7882.txt (14.1 KB)
Output.txt (17.5 KB)
PW_7911.txt (62.4 KB)
PW_7919.txt (54.7 KB)
PW_7997.txt (119.4 KB)
Input.txt (1.8 KB)

Dear Vahid,

Thanks a lot for still using the code! This error seems to be related to phonopy, i.e. phonopy does not manage to diagonalize the dynamical matrix, which is a bit strange. Could you report:

  1. The full provenance of the IRamanSpectraWorkChain (or HarmonicWorkChain)?
  2. The code versions you are using for aiida-quantumespresso, aiida-vibroscopy, aiida-phonopy, phonopy, and numpy? (You can get each version with the command pip freeze | grep <code name>, where <code name> is one of the names above.)
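The same information can also be collected from Python with the standard library, which avoids one grep per package. A minimal sketch using importlib.metadata (Python 3.8+); installed_versions is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["aiida-quantumespresso", "aiida-vibroscopy", "aiida-phonopy", "phonopy", "numpy"]


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    # Same "name==version" format as pip freeze
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```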

And could you confirm that you managed to run the silicon example?
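The LinAlgError message itself comes from NumPy's LAPACK-backed eigensolvers, which raise it when the underlying routine fails; a matrix containing NaN or inf entries is one frequent trigger, though not the only one. A generic sanity check that could be run on a suspicious matrix before diagonalizing it can be sketched as follows; check_before_eigh is a hypothetical helper and is not how phonopy or aiida-vibroscopy handle this internally:

```python
import numpy as np


def check_before_eigh(matrix, atol=1e-8):
    """Check a matrix for problems that commonly break Hermitian eigensolvers, then diagonalize it."""
    m = np.asarray(matrix, dtype=complex)
    square = m.ndim == 2 and m.shape[0] == m.shape[1]
    finite = bool(np.all(np.isfinite(m)))
    # Only compare with the conjugate transpose when that comparison is well defined
    hermitian = square and finite and bool(np.allclose(m, m.conj().T, atol=atol))
    report = {"square": square, "finite": finite, "hermitian": hermitian}
    # eigh assumes Hermitian input, so it is only called when all checks pass
    eigenvalues = np.linalg.eigh(m)[0] if hermitian else None
    return report, eigenvalues


print(check_before_eigh([[2.0, 1.0], [1.0, 2.0]]))
print(check_before_eigh([[float("nan"), 0.0], [0.0, 1.0]]))
```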

Hi,
Thanks a lot for your reply!
aiida-quantumespresso==4.7.0
aiida-vibroscopy==1.1.1
aiida-phonopy==1.1.4
phonopy==2.25.0
numpy==1.26.4
I have attached the notebook that I used.
Best,
Vahid
IR_RA_1-Copy.ipynb.txt (11.9 KB)

Hello again,
I managed to run everything smoothly and plot the spectra, so I am not sure why it fails for you. Could you report the provenance of the IRamanSpectraWorkChain you ran? Which version of Quantum ESPRESSO are you using?
I ran the calculations with QE v7.2 installed via conda on my laptop (it took around 15 minutes on a 10-core CPU).

Hi,
Thanks a lot for testing the notebook.
qe-7.3.1
Please find the provenance enclosed.
Best,
Vahid


Process_Report.txt (12.9 KB)

Could you try running the following in the verdi shell:

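n = load_node(7881)  # Your PhononWorkChain PK
ph = n.outputs.phonopy_data.get_phonopy_instance()  # phonopy object built from the stored data
ph.produce_force_constants()  # compute the force constants from the stored forces
ph.get_frequencies([0, 0, 0])  # phonon frequencies at the Gamma point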

and see whether it works? It seems everything worked, but it would be more useful if you reported the output of verdi process status instead of verdi process report.

Here is the output of the verdi shell command:
In [1]: n = load_node(7881)
   ...: ph = n.outputs.phonopy_data.get_phonopy_instance()
   ...: ph.produce_force_constants()
   ...: ph.get_frequencies([0,0,0])
Out[1]:
array([-2.00266291e-02, -1.39820465e-02, -1.39820465e-02,  7.27788649e+00,
        7.27788649e+00,  1.16944780e+01,  1.41530356e+01,  1.41530356e+01,
        1.61898654e+01,  1.63304745e+01,  1.63304745e+01,  1.75850572e+01])
The output of 'verdi process status' is enclosed.
Process_Report_2.txt (5.2 KB)

Best,
Vahid

Interesting: phonons seem to work and the diagonalization succeeds, so I don't understand why you are getting that error.
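One way to make "phonons work" concrete for the Gamma-point output above: the three acoustic modes should be numerically zero (the tiny negative values are noise of that kind), while every other mode should be real and positive. A minimal sketch; gamma_modes_look_physical and the 0.1 tolerance (in the same units as the printed frequencies) are assumptions:

```python
def gamma_modes_look_physical(frequencies, n_acoustic=3, tol=0.1):
    """True if the n_acoustic lowest modes are ~0 and all remaining modes are clearly positive."""
    ordered = sorted(frequencies)
    acoustic, optical = ordered[:n_acoustic], ordered[n_acoustic:]
    return all(abs(f) < tol for f in acoustic) and all(f > tol for f in optical)


# Frequencies at Gamma as printed by ph.get_frequencies([0, 0, 0]) above
freqs = [
    -2.00266291e-02, -1.39820465e-02, -1.39820465e-02, 7.27788649e+00,
    7.27788649e+00, 1.16944780e+01, 1.41530356e+01, 1.41530356e+01,
    1.61898654e+01, 1.63304745e+01, 1.63304745e+01, 1.75850572e+01,
]
print(gamma_modes_look_physical(freqs))  # True
```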
What if you now do:

n = load_node(7878)
v = n.outputs.vibrational_data.numerical_accuracy_4
v.run_powder_raman_intensities()

Can you tell me whether this works now?

Please find the output of the command attached:
Process_Report_3.txt (4.3 KB)
Best,
Vahid

Very curious, I honestly do not understand.
I would kindly ask you to try rerunning with the overrides updated as follows:

overrides = {
    "phonon": {
        "scf":{
            "kpoints_distance": 0.6,
            ...
        },
    },
    "dielectric":{
        "kpoints_parallel_distance": 0.6,
        "scf":{
            "kpoints_distance": 0.6,
            ...
        },
    },
}
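To combine these k-point settings with the pw settings used earlier (cutoffs, scheduler options) without retyping them, a recursive dictionary merge is one option. A minimal sketch; deep_merge is a hypothetical helper, and the nested layout simply mirrors the overrides used in this thread:

```python
import copy


def deep_merge(base: dict, extra: dict) -> dict:
    """Return a copy of `base` with `extra` merged in recursively; `extra` wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed-down version of the pw settings posted earlier in the thread
scf_overrides = {"scf": {"pw": {"parameters": {"SYSTEM": {"ecutwfc": 40.0, "ecutrho": 40 * 8}}}}}
kpoint_overrides = {
    "phonon": {"scf": {"kpoints_distance": 0.6}},
    "dielectric": {"kpoints_parallel_distance": 0.6, "scf": {"kpoints_distance": 0.6}},
}
overrides = deep_merge({"phonon": scf_overrides, "dielectric": scf_overrides}, kpoint_overrides)
print(overrides["dielectric"])
```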

In the meantime, could you export the data using verdi archive create -N 7878 -- workchain.aiida and send it here, so that I can inspect further what is going on?

All the best.

Thanks again! Please find the exported archive attached.
I have updated the overrides as you suggested and rerun the calculation. I will report the output as soon as it finishes.

workchain.aiida.txt (2.6 MB)

Best,
Vahid

The calculation with the updated overrides finished with the same error:
'LinAlgError: Eigenvalues did not converge'
Enclosed is the output report.
Process_Report_Update_1.txt (21.6 KB)

Best,
Vahid