High RAM usage during archive import

I’m running into the same problem that came up a few years ago: Testing of v2.0.0a1 (archive migration) · aiidateam · Discussion #5407 · GitHub

Basically, when importing the SSSP archive (sssp-infos.aiida from the Materials Cloud Archive), memory usage spikes to over 15 GB. This triggers an OOM kill on my cluster, so I can’t deploy this dataset. I tried passing --batch-size 4096 to verdi archive import, but that didn’t seem to make a difference.
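For context on what a batch size is normally expected to achieve, here is a minimal sketch (not AiiDA's actual import code) of consuming a large stream of keys in fixed-size batches, so that only one batch is held in memory at a time. The names `batched` and `fake_keys` are hypothetical and exist only for this illustration. If peak memory does not change with the batch size, one possible explanation (an assumption, not something confirmed in this thread) is that some step materializes the full collection before batching starts.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items from `items`.

    The input is consumed lazily, so only one batch is resident at a time,
    provided `items` itself is a lazy iterator and not a pre-built list.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_keys(count: int) -> Iterator[str]:
    """Hypothetical stand-in for a lazy stream of repository keys."""
    for index in range(count):
        yield f"key-{index:08d}"


# Process 10,000 fake keys in batches of 4096 and track the largest batch.
total = 0
largest = 0
for batch in batched(fake_keys(10_000), 4096):
    total += len(batch)
    largest = max(largest, len(batch))

print(f"processed {total} keys, largest batch held {largest} keys")
```

The batch size only bounds memory for the part of the pipeline that is actually lazy; anything upstream that builds a full list or set of all keys is unaffected by it.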

Does anybody know a solution (even a hacky one) for this? It might also be interesting to investigate why this specific archive is problematic, since its total size (13 GB) is not that large.

Pinging people more familiar with sssp: @jusong.yu @edan-bainglass

Hi @eimrek, just to make you aware: a while ago we investigated streaming directly to packed during archive import in this PR. @rabbull has now taken over and is currently working on this feature, so I suggest you get in touch with him. The changes in the PR might solve your issue, and I think we should also keep your use case in mind while developing. Cheers!

Thanks @geiger_j, @rabbull. Is the draft PR ready for me to test the archive import on, or is there another PR?

@agoscinski

For completeness, I also ran into this on the v2.7.0pre1 tag. Here’s the command, which shows 15 GB of RAM usage (see “Maximum resident set size (kbytes)”). The spike happens at the beginning of the “Adding archive files to repository” stage; here I waited for 1.1% to complete and then cancelled it.

$ /usr/bin/time -v verdi archive import sssp-infos.aiida --batch-size 512
/home/kristjan/opt/micromamba/envs/aiida2/lib/python3.10/site-packages/click/core.py:1193: UserWarning: The parameter -n is used more than once. Remove its duplicate as parameters should be unique.
  parser = self.make_parser(ctx)
/home/kristjan/local_work/aiida/aiida-core/src/aiida/cmdline/groups/verdi.py:113: UserWarning: The parameter -n is used more than once. Remove its duplicate as parameters should be unique.
  return super().parse_args(ctx, args)
Report: starting import: sssp-infos.aiida
Report: Parameters
-------------------------------  ----------------
Archive                          sssp-infos.aiida
New Node Extras                  keep
Merge Node Extras (in database)  (k)eep
Merge Node Extras (in archive)   do (n)ot create
Merge Node Extras (in both)      (l)eave existing
Merge Comments                   leave
Computer Authinfos               exclude

Report: Adding 5 new user(s)
Report: Adding 14 new computer(s)                                                                                                                                       
Report: Collecting Node(s) ...                                                                                                                                          
Report: Adding 468701 new node(s)
Report: Gathering existing 'create' Link(s)                                                                                                                             
Report: Added 252929 new 'create' Link(s)                                                                                                                               
Report: Gathering existing 'input_calc' Link(s)
Report: Added 392203 new 'input_calc' Link(s)                                                                                                                           
Report: Adding 601 new group(s)
Report: Adding 979965 Node(s) to new Group(s)                                                                                                                           
Report: Created new import Group: PK=602, label=20250521-230231                                                                                                         
Report: Checking keys against repository ...                                                                                                                            
Report: Adding 14184533 new repository files
Adding archive files to repository         1.1%|█                                                                                                      | 154516/14184533                                                                                                                                                                        
Aborted!
Command exited with non-zero status 1
        Command being timed: "verdi archive import sssp-infos.aiida --batch-size 512"
        User time (seconds): 989.28
        System time (seconds): 89.45
        Percent of CPU this job got: 51%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 34:45.93
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 15422428
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 18
        Minor (reclaiming a frame) page faults: 7017499
        Voluntary context switches: 640176
        Involuntary context switches: 82400
        Swaps: 0
        File system inputs: 6137944
        File system outputs: 12858816
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 1
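
To make the peak figure easier to read, here is a small sketch that pulls the “Maximum resident set size” line out of `/usr/bin/time -v` output and converts it. It assumes that one “kbyte” in this report is 1024 bytes; the helper name `max_rss_kbytes` is hypothetical and only used for this illustration.

```python
import re
from typing import Optional


def max_rss_kbytes(time_output: str) -> Optional[int]:
    """Return the peak RSS in kbytes from `/usr/bin/time -v` output, or None."""
    match = re.search(
        r"Maximum resident set size \(kbytes\):\s*(\d+)", time_output
    )
    return int(match.group(1)) if match else None


# The relevant line, copied from the report above.
sample = "        Maximum resident set size (kbytes): 15422428\n"

kbytes = max_rss_kbytes(sample)
assert kbytes is not None

# Assumption: 1 kbyte = 1024 bytes in this report.
peak_bytes = kbytes * 1024
gib = peak_bytes / 1024**3  # binary gigabytes
gb = peak_bytes / 1000**3   # decimal gigabytes

print(f"{kbytes} kbytes = {gib:.2f} GiB = {gb:.2f} GB")
```

Under that assumption the peak is roughly 14.7 GiB (about 15.8 GB), consistent with the “15 GB” quoted above.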

Dependencies:

$ pip freeze
-e git+ssh://git@github.com/aiidateam/aiida-core.git@fa81b493856a18819cca6acfef276c8a8c700f00#egg=aiida_core
aio-pika==9.4.3
aiormq==6.8.1
alembic==1.15.2
annotated-types==0.7.0
archive-path==0.4.2
asttokens==3.0.0
asyncssh==2.19.0
bcrypt==4.3.0
certifi==2025.4.26
cffi==1.17.1
charset-normalizer==3.4.2
circus==0.19.0
click==8.2.1
click-spinner==0.1.10
cryptography==45.0.2
decorator==5.2.1
deprecation==2.1.0
disk_objectstore==1.3.0
docstring_parser==0.16
exceptiongroup==1.3.0
executing==2.2.0
graphviz==0.20.3
greenlet==3.2.2
idna==3.10
importlib-metadata==6.11.0
ipython==8.36.0
jedi==0.18.2
Jinja2==3.1.6
kiwipy==0.8.5
Mako==1.3.10
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
multidict==6.4.4
nest-asyncio==1.6.0
numpy==1.26.4
packaging==25.0
pamqp==3.3.0
paramiko==3.5.1
parso==0.8.4
pexpect==4.9.0
pgsu==0.3.0
plumpy==0.25.0
prompt_toolkit==3.0.51
propcache==0.3.1
psutil==5.9.8
psycopg==3.2.9
psycopg-binary==3.2.9
ptyprocess==0.7.0
pure_eval==0.2.3
pycparser==2.22
pydantic==2.11.4
pydantic_core==2.33.2
Pygments==2.19.1
PyNaCl==1.5.0
pytray==0.3.4
pytz==2021.3
PyYAML==6.0.2
pyzmq==26.4.0
requests==2.32.3
shortuuid==1.0.13
SQLAlchemy==2.0.41
stack-data==0.6.3
tabulate==0.9.0
tornado==6.5
tqdm==4.67.1
traitlets==5.14.3
typing-inspection==0.4.0
typing_extensions==4.13.2
upf-to-json==0.9.5
urllib3==2.4.0
wcwidth==0.2.13
wrapt==1.17.2
yarl==1.20.0
zipp==3.21.0