When importing the SSSP archive (sssp-infos.aiida from the Materials Cloud Archive), memory usage spikes to over 15 GB, which causes an OOM on my cluster, so I can’t deploy this dataset. I tried passing --batch-size 4096 to verdi archive import, but that didn’t seem to make a difference.
Does anybody know a solution (even a hacky one) for this? Also, it might be interesting to investigate why this specific archive is problematic (the total size is not too high, 13 GB).
Hi @eimrek, just to make you aware: a while ago we investigated streaming directly to packed during archive import in this PR. @rabbull has now taken over and is currently working on this feature, so I suggest you get in touch with him. The changes in the PR might solve your issue here, and I think we should also keep your use case in mind while developing. Cheers!
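To illustrate what "streaming" means here in general terms, below is a minimal sketch using only the Python standard library. It is not the code from the PR and does not use AiiDA or its repository backend. The function name `stream_zip_members` and the flat output layout are hypothetical, chosen only for illustration. It copies each member of a zip file to disk in fixed-size chunks, so no single file is ever held fully in memory.

```python
import shutil
import zipfile
from pathlib import Path


def stream_zip_members(archive_path, dest_dir, chunk_size=1024 * 1024):
    """Copy every file in a zip archive to dest_dir, one chunk at a time.

    Hypothetical helper for illustration only. Files are written flat
    into dest_dir under their base name, so members sharing a base name
    would overwrite each other.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            target = dest / Path(info.filename).name
            # zf.open returns a file-like object that decompresses lazily;
            # copyfileobj reads at most chunk_size bytes per iteration.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, length=chunk_size)
            copied += 1
    return copied
```

Note that this only bounds the memory used per file. Whether per-file buffering is actually what drives the spike reported in this thread is not established here.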
For completeness, I also ran into this on the v2.7.0pre1 tag. Here’s the command, which shows about 15 GB of RAM usage (see Maximum resident set size (kbytes)). The spike happens at the beginning of the “Adding archive files to repository” stage; here I waited until 1.1% was done and then cancelled it.
$ /usr/bin/time -v verdi archive import sssp-infos.aiida --batch-size 512
/home/kristjan/opt/micromamba/envs/aiida2/lib/python3.10/site-packages/click/core.py:1193: UserWarning: The parameter -n is used more than once. Remove its duplicate as parameters should be unique.
parser = self.make_parser(ctx)
/home/kristjan/local_work/aiida/aiida-core/src/aiida/cmdline/groups/verdi.py:113: UserWarning: The parameter -n is used more than once. Remove its duplicate as parameters should be unique.
return super().parse_args(ctx, args)
Report: starting import: sssp-infos.aiida
Report: Parameters
------------------------------- ----------------
Archive sssp-infos.aiida
New Node Extras keep
Merge Node Extras (in database) (k)eep
Merge Node Extras (in archive) do (n)ot create
Merge Node Extras (in both) (l)eave existing
Merge Comments leave
Computer Authinfos exclude
Report: Adding 5 new user(s)
Report: Adding 14 new computer(s)
Report: Collecting Node(s) ...
Report: Adding 468701 new node(s)
Report: Gathering existing 'create' Link(s)
Report: Added 252929 new 'create' Link(s)
Report: Gathering existing 'input_calc' Link(s)
Report: Added 392203 new 'input_calc' Link(s)
Report: Adding 601 new group(s)
Report: Adding 979965 Node(s) to new Group(s)
Report: Created new import Group: PK=602, label=20250521-230231
Report: Checking keys against repository ...
Report: Adding 14184533 new repository files
Adding archive files to repository 1.1%|█ | 154516/14184533
Aborted!
Command exited with non-zero status 1
Command being timed: "verdi archive import sssp-infos.aiida --batch-size 512"
User time (seconds): 989.28
System time (seconds): 89.45
Percent of CPU this job got: 51%
Elapsed (wall clock) time (h:mm:ss or m:ss): 34:45.93
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 15422428
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 18
Minor (reclaiming a frame) page faults: 7017499
Voluntary context switches: 640176
Involuntary context switches: 82400
Swaps: 0
File system inputs: 6137944
File system outputs: 12858816
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
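For anyone comparing runs, here is a small sketch for pulling the peak memory figure out of `/usr/bin/time -v` output like the above and converting it to GiB. The helper name `peak_rss_gib` is hypothetical, and the conversion assumes that "kbytes" means units of 1024 bytes.

```python
import re


def peak_rss_gib(time_output: str) -> float:
    """Return the 'Maximum resident set size' from time -v output, in GiB.

    Assumes the reported value is in units of 1024 bytes.
    """
    match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", time_output)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1)) / 1024**2


sample = "Maximum resident set size (kbytes): 15422428"
print(f"{peak_rss_gib(sample):.1f} GiB")  # prints "14.7 GiB"
```

In decimal units, the same value is roughly 15.8 GB, which matches the "over 15 GB" reported at the start of the thread.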