Hello everyone. When using AiiDA-Siesta with the SLURM queue, I specified the default memory in the computer settings, but I encountered an error when trying to change the number of cores using the resources parameter in my specific task. The settings are as follows:
I received the following error: ValueError: these parameters were not recognized: tot_memory. How can I resolve this error, or how should I set the memory so that the job runs correctly?
zhouchao
@agoscinski is right that memory limits are defined through the max_memory_kb option (see documentation on calcjob options). Note that it should be an option and not be part of the resources. So something like
from aiida.orm import Dict

options = Dict(
    {
        "max_wallclock_seconds": 1360,
        "withmpi": True,
        # "account": "tcphy113c",
        "queue_name": "hfacnormal01,hfacnormal02,hfacnormal04",
        # Top-level option, not part of "resources"
        "max_memory_kb": 29600,  # Make sure to correct the units to kB
        "resources": {
            "num_machines": 1,
            "num_mpiprocs_per_machine": 16,
        },
    }
)
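Since the value has to be given in kB, a small conversion helper can avoid unit mistakes. The following is only a sketch: gib_to_kb is a hypothetical helper (not part of aiida-core or aiida-siesta), the 28 GiB figure is an arbitrary example, and it assumes the binary convention of 1024 kB per MB, which as far as I know is how the SLURM plugin converts the value for the --mem directive.

```python
def gib_to_kb(gib: float) -> int:
    """Convert GiB to kB, assuming 1 GiB = 1024 * 1024 kB (binary convention)."""
    return int(gib * 1024 * 1024)


# Plain dict with the same keys as the options above; wrap it in
# aiida.orm.Dict if the workflow expects a Dict node.
options = {
    "max_wallclock_seconds": 1360,
    "max_memory_kb": gib_to_kb(28),  # example value, adjust to your node
    "resources": {
        "num_machines": 1,
        "num_mpiprocs_per_machine": 16,
    },
}

print(options["max_memory_kb"])
```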
Thank you both for your help. I followed Mr. Sphuber’s instructions and managed to rewrite the SLURM script. However, the submission still exceeds the memory limit. Is there any way to cancel the --mem and --time directives? I commented out these two parameters, but they still appear in the submission script by default.
I am a bit confused. Do I understand correctly that the --mem and --time flags are specified in the generated submission script, and you don’t want them to be specified?
Since you are running something from aiida-siesta, it is possible that the plugin is defining some options by default, so even if you don’t specify them they get set. Can you share exactly what class you are running, with what inputs, and what the generated submission script is? Then tell us exactly what changes you expect.
Yes, I am also a bit confused. Why don’t you just increase it as much as you want/can, looking at the limits of your HPC? Or is the problem that your inputs are ignored?
Specifying 0 for the max_memory_kb should give you the maximum available memory access in slurm. If you don’t specify max_wallclock_seconds, it should not appear in the SLURM script, but then I think the DefaultTime defined on your HPC is used, so you might get a higher time limit by explicitly requesting the max time of the node (you should see this with sinfo).
I don’t know exactly which issues @chao_zhou is experiencing but as another aiida-siesta user I can maybe chip in.
aiida-siesta does not set any defaults for max_wallclock_seconds or max_memory_kb. It does require you to set max_wallclock_seconds since it also uses that value in the input file for SIESTA, so I don’t think it’s possible to run with aiida-siesta without setting a time limit.
Specifying 0 for the max_memory_kb should give you the maximum available memory access in slurm.
@agoscinski I have never been able to get that to work. If I set it to 0 in the options of a siesta job, the generated SLURM script always seems to use the default value for max_memory_kb that was set for the computer that the code runs on. Do you need to do something extra to get that to work?
Would be useful to understand what people expect - probably the default should be used if unset, but if set explicitly to zero, maybe we should actually use zero? (I guess this is what some schedulers interpret as “all memory”?)
If this is the case, maybe a change similar to the following might fix this:
max_memory_kb = self.node.get_option('max_memory_kb')
job_tmpl.max_memory_kb = max_memory_kb if max_memory_kb is not None else computer.get_default_memory_per_machine()
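To illustrate why an explicit 0 might be replaced by the computer default, here is a minimal standalone sketch. It assumes the existing fallback is written with a truthiness check (`or`), which would match the behaviour reported above; I have not verified this against a specific aiida-core version. Both function names are hypothetical and only serve the comparison.

```python
def resolve_with_or(option, default):
    """Truthiness-based fallback: 0 is falsy, so the default wins."""
    return option or default


def resolve_with_none_check(option, default):
    """Explicit None check: 0 is kept, only an unset option falls back."""
    return option if option is not None else default


default_kb = 2048000  # example computer default, arbitrary value

print(resolve_with_or(0, default_kb))
print(resolve_with_none_check(0, default_kb))
print(resolve_with_none_check(None, default_kb))
```

With the None check, an unset option still picks up the computer default, while an explicit 0 is passed through to the scheduler.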
My 2 cents in case somebody is not aware: you can also do once and for all the following in the verdi shell:
c = load_computer('localhost')
c.set_default_memory_per_machine(None)
so the memory keyword won’t be specified in the job script (at least this works for the SLURM implementation AFAIK).
PS: this reminds me that one still cannot specify None (or ~) in a computer YAML configuration, so one has to do this manually afterwards (or maybe this is possible in newer aiida-core versions?).
In that use case you simply do not specify the key in the YAML and run verdi computer setup --non-interactive, in which case it won’t prompt. Or, if you need the prompts for other options, you can enter ! when it prompts for the default memory to set it to None. The command prints a reminder of this special token when it starts.
I did not know that you were allowed to set the default memory for a computer to None. I think on our HPC not specifying the memory keyword is equivalent to asking for all the memory of a node so that would work in my case. Thanks for the suggestions @bastonero and @sphuber !