`verdi archive` runs out of space even though space is available. Maybe chunking does not work?

During the archiving of nodes the process stops very early on. It is a very large database of 2TB, but the space should be there on the partition. This is the short error message (the whole log is attached at the end as a file).

Archiving database: nodes                  0.0%|                                                                                                                                                   | 378/16749234
Traceback (most recent call last):
  File "/home/aiida/.virtualenvs/aiida-fermisurf/lib/python3.8/site-packages/archive_path/zip_path.py", line 325, in open
    yield handle
  File "/home/aiida/envs/aiida-fermisurf/code/aiida-core/aiida/tools/archive/implementations/sqlite_zip/writer.py", line 182, in _stream_binary
    shutil.copyfileobj(handle, zip_handle)
  File "/usr/lib/python3.8/shutil.py", line 208, in copyfileobj
    fdst_write(buf)
  File "/usr/lib/python3.8/zipfile.py", line 1141, in write
    self._fileobj.write(data)
OSError: [Errno 28] No space left on device

Maybe the caveat is that the space is available on a different partition and not on the partition where the current AiiDA database is stored. I thought that this might be because the data is not chunked properly and added, for a local test, the `-b 1000` option to the archive command, but this still invokes this line

during archiving that does not chunk. I am not sure if this fixes the problem. I could hard-code it to try out, but I have doubts that this is the actual solution. On the directory we archive from there is still 1.6TB of space free, and the database consists of a lot of loose files that should not be that large individually.

Environment

py[aiida-fermisurf]  prnmarvelsrv3  ~/e/aiida-fermisurf   main ±  verdi status
 ✔ version:     AiiDA v2.3.1
 ✔ config:      /home/aiida/envs/aiida-fermisurf/.aiida
 ✔ profile:     main
 ✔ storage:     Storage for 'main' [open] @ postgresql://aiida:***@localhost:5432/aiida-fermisurf / DiskObjectStoreRepository: 580f68c6eb6347b7a203030615ef1d40 | /hith/aiida-fermisurf/repositories/main/container
 ✔ rabbitmq:    Connected to RabbitMQ v3.8.2 as amqp://guest:guest@127.0.0.1:5672?heartbeat=600
 ✔ daemon:      Daemon is running with PID 1193148

Full logs

verdi_archive-out_of_diskspace.txt (6.0 KB)

Just for completeness, what is the exact command that you invoked? And what is the output of `df -h` for the filesystem of the output file?

I thought that this might be because the data is not chunked properly and added, for a local test, the `-b 1000` option to the archive command, but this still invokes this line

The batch size option is not the same as the `buffer_size` argument of the `Writer._stream_binary` method. The batch size dictates the number of nodes that are queried at once from the source database before writing them to the target. The `buffer_size` here is the number of bytes with which the writer writes files to the archive.

Anyway, I don’t think the buffering is the problem here. This is mostly to prevent out of memory errors by not reading too much into memory at once. If there is not enough space on the target disk, changing the batch and/or buffer size is not going to help.
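To make the distinction concrete, here is a minimal sketch (not AiiDA's actual code; `query_nodes` and the file paths are hypothetical) of how a batch size and a buffer size play different roles: the former bounds how many rows are held in memory per query, the latter bounds the chunk size per write, and neither changes the total number of bytes that end up on disk.

import shutil

def export_nodes(query_nodes, batch_size=1000, buffer_size=64 * 1024):
    """Hypothetical export loop: query rows in batches, stream files in buffered chunks."""
    offset = 0
    while True:
        # batch_size only limits how many node rows are fetched and held in memory at once
        batch = query_nodes(offset=offset, limit=batch_size)
        if not batch:
            break
        for node in batch:
            with open(node['source'], 'rb') as src, open(node['target'], 'wb') as dst:
                # buffer_size only bounds the in-memory chunk per write;
                # the total bytes written (and hence the disk usage) are unchanged
                shutil.copyfileobj(src, dst, length=buffer_size)
        offset += batch_size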

It is a very large database of 2TB,

How did you determine this? When you say database, do you mean the total storage, or just the PostgreSQL database? When you create an archive you are writing the sum of the PSQL database and the file repository to a single file, so your target disk should have at least that much space available.

I didn't look into the details of the code, but most probably the code is creating some temporary files on disk (maybe here? Or maybe somewhere else as well). These will go by default in the location specified in env vars like TMPDIR, TEMP or TMP (see the more specific logic here). Typically this is /tmp, which is often on a different partition.
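To see where these temporary files would end up on your machine, something like the following (plain Python, no AiiDA required) prints the relevant environment variables and the directory that tempfile actually resolves to:

import os
import tempfile

# TMPDIR, TEMP and TMP are consulted before falling back to defaults such as /tmp
print({var: os.environ.get(var) for var in ('TMPDIR', 'TEMP', 'TMP')})
print(tempfile.gettempdir())  # the directory temporary files will actually go to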

My suggestion is to try, instead of

verdi archive XXX

to run

TMPDIR=/some/path/to/a/temp/folder/on/the/big/partition verdi archive XXX

If this works, we know this is the problem (my bet is that this is the cause). Then you can proceed, but we should then discuss the best strategy to prevent this kind of "surprise effect" - maybe changing the temporary folder for this specific operation to the same partition as the destination by default, while still allowing it to be optionally specified on the verdi command line (e.g. if the space left there is "just enough" and you want to put the temp files on a different, larger partition).

Thanks Giovanni. This is indeed very likely the case. The zip writer uses the temp folder as working space. The file repository contents are written directly to the archive (which is the filename specified in the `verdi archive create` command); the database contents, however, are written to an SQLite database (i.e. a file) inside that working space. When the context manager is exited, that database file is written directly to the archive.
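As a rough illustration of that pattern (a simplified sketch, not the actual sqlite_zip writer), the flow looks roughly like this: repository files are streamed straight into the zip, while the database is first materialised as an SQLite file in a temporary working directory, which must therefore be large enough to hold it.

import sqlite3
import tempfile
import zipfile
from pathlib import Path

def write_archive(archive_path, repo_files, node_labels):
    # the working directory is created in TMPDIR (or TEMP/TMP, or /tmp)
    with tempfile.TemporaryDirectory() as workdir:
        db_path = Path(workdir) / 'db.sqlite3'
        conn = sqlite3.connect(db_path)
        conn.execute('CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT)')
        conn.executemany('INSERT INTO node (label) VALUES (?)', [(label,) for label in node_labels])
        conn.commit()
        conn.close()

        with zipfile.ZipFile(archive_path, 'w') as zf:
            for path in repo_files:
                # repository contents go directly into the target archive file
                zf.write(path, arcname=f'repo/{Path(path).name}')
            # the temporary SQLite file is only added at the very end,
            # so the temp partition must be able to hold the full database
            zf.write(db_path, arcname='db.sqlite3')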

We should document this and for now provide the workaround you highlighted. And then we should discuss what a more permanent solution should look like.

Great! I'm curious to see if the workaround I suggested indeed fixes the problem. This is also marginally related to what I was discussing with @geiger_j, @agoscinski, @ali-khosravi and @eimrek, about having a profile access the sqlite_zip file directly, but in an efficient way that avoids re-decompressing at every profile load by caching the unzipped SQLite DB. This will probably have to be written in the profile's config.json file, and be configurable for similar reasons to those above. For context, this is to be able to serve the REST API (e.g. for Materials Cloud) in read-only mode directly from an .aiida file, without having to import everything into a new profile, thus making deployment much easier.


most probably the code is creating some temporary files on disk (maybe here? Or maybe somewhere else as well). These will go by default in the location specified in env vars like TMPDIR, TEMP or TMP (see the more specific logic here). Typically this is /tmp, which is often on a different partition.

Yes, that seems more promising. I will try it out.

Just for completeness, what is the exact command that you invoked? And what is the output of `df -h` for the filesystem of the output file?

Filesystem on the HPC and the verdi archive command

Right. So here again is the relevant information from `df -h`, with the whole log at the end.

df: /scratch: Stale file handle
Filesystem                         Size  Used Avail Use% Mounted on
...
prnmarvelsrv4:/hith                 18T  5.2T   12T  32% /hith_nfs/prnmarvelsrv4 <--- where to archive
...
prnmarvelcommonfs:/home            3.7T  1.9T  1.6T  55% /home  <-- repository and $AIIDA_PATH

Command to archive

We used this command

verdi archive create -a /hith_nfs/prnmarvelsrv4/paulish_qiao-main.aiidaarchive

We checked that there is only one profile, so with -a we don't potentially back up files from other profiles.

AiiDA database information

How did you determine this? When you say database, do you mean the total storage, or just the PostgreSQL database? When you create an archive you are writing the sum of the PSQL database and the file repository to a single file, so your target disk should have at least that much space available.

I meant the repository mainly. It is the same as here: Performance issues with verdi commands `verdi process` and `verdi storage`. I will copy the relevant things over.

$ du -h --max-depth=1 /home/aiida/repository/main/container
4.0K	./duplicates
4.0K	./sandbox
2.0T	./loose
4.0K	./packs
2.0T	.

Additional information: size of the AIIDA_PATH directory that keeps all the process IDs

$ du -h --max-depth=1 $AIIDA_PATH/.aiida/access
2.2G	/home/aiida/envs/aiida-fermisurf/.aiida/access/main
2.2G	/home/aiida/envs/aiida-fermisurf/.aiida/access

Number of files in this path

$ find $AIIDA_PATH/.aiida/access -type f | wc -l
557638
$ verdi storage info
entities:
  Users:
    count: 10
  Computers:
    count: 31
  Nodes:
    count: 4734923
  Groups:
    count: 331
  Comments:
    count: 0
  Logs:
    count: 1203422
  Links:
    count: 8936853
repository:
  SHA-hash algorithm: sha256
  Compression algorithm: zlib+1

Appendix

df_h.txt (1.8 KB)

This is already more or less possible. You can essentially “mount” an archive as a profile. For example:

verdi profile setup core.sqlite_zip --profile archive-test --filepath archive.aiida

You can then just use that profile as normal (read-only, of course). The SQLite database is cached (extracted to a temporary file on disk), but only as long as the storage is kept alive, so this is not quite yet maintained between profile loads. We would have to look at how to implement this gracefully.
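For instance, assuming a profile named archive-test was created with the command above, you can query it from Python with the regular API (a minimal example, nothing specific to the archive backend):

from aiida import load_profile, orm

load_profile('archive-test')  # read-only profile backed by the .aiida file

# count the nodes in the archive with the regular QueryBuilder API
count = orm.QueryBuilder().append(orm.Node).count()
print(f'The archive contains {count} nodes')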

Have you benchmarked the performance of the sqlite backend, though? I imagine that there may be a non-negligible difference between SQLite and PostgreSQL here, in which case it makes more sense to take the little extra effort to create a core.psql_dos profile and import the archive.

I see the core.sqlite_zip mounting more as a convenience for users in order to be able to quickly explore an archive without having to import it. Of course, we will have to document this and probably come up with a shortcut command that is easier to use.

I meant the repository mainly.

Right, but the database contents themselves are also not going to be negligible, so you need to take this into account. But if the hypothesis that TMPDIR is running out of space is correct, it is actually the database size that determines the problem, as that is what is copied to the temp dir, and not the file repository, whose contents are streamed directly to the target archive.

Size of the AIIDA_PATH directory that keeps all the process IDs

😱 2.2 gigs in PID files. Dear lord. I think you should definitely think of running `verdi storage maintain` before creating an archive. Not only will this potentially speed up the archiving (and backups), it will also clean up these stale PID files.

Finally, to support the hypothesis that the tmp dir size may be the bottleneck, could you please report the output of

df -h $(python -c "import tempfile;print(tempfile.gettempdir())")

Hi @sphuber, indeed the plan is to benchmark performance. I have the feeling that it will not be much slower; SQLite can in some cases even be faster than PostgreSQL, but we'll have to see the results.