Clean AIIDA storage after deleting the nodes

abmazitov · November 14, 2023, 1:16pm

Hello!

I recently ran into what looks like a bug in the way AiiDA maintains its database. I finished working on a particular part of my calculations, archived the corresponding group of nodes, and then deleted the nodes from AiiDA.

However, it seems like the disk space occupied by these nodes was not freed. Is there a way, perhaps, to update the database state in this situation to free the disk space?

mbercx · November 14, 2023, 1:29pm

Thanks for reposting here @abmazitov! As mentioned, I think running

verdi storage maintain

would already help, but maybe others have a better idea of how this works internally. I’m assuming the PostgreSQL database was actually reduced in size, but the repository part of the storage was perhaps not updated without running the maintenance.

t-reents · November 14, 2023, 2:15pm

If you are referring to the space that is occupied by the database (and not to the file repository), I think verdi storage maintain should indeed solve it. If you delete rows from a PostgreSQL database, the space can be reused by new nodes/rows but it is still reserved; therefore, you don’t see a change in the disk space (again, assuming that you refer to the occupied disk space of the database). This is the typical behavior of a PostgreSQL database, as described in the PostgreSQL 16 documentation on VACUUM.
This vacuum operation is called by verdi storage maintain.

Edit: Even if you are referring to the repository, verdi storage maintain would also take care of the files that don’t belong to a database entry anymore.
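
As a rough illustration of the "space is reserved but can be reused" behavior described above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. This is an assumption made purely to keep the example self-contained: it is not AiiDA code, PostgreSQL's internals differ, and the function name measure_sizes and the table name node are hypothetical.

```python
import os
import sqlite3
import tempfile


def measure_sizes(n_rows: int = 2000) -> dict:
    """Return the database file size after filling, deleting and vacuuming."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")
        # isolation_level=None puts the connection in autocommit mode,
        # so VACUUM is never issued inside an open transaction.
        conn = sqlite3.connect(path, isolation_level=None)
        conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, payload BLOB)")

        # Fill the table with roughly n_rows KiB of data in one transaction.
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO node (payload) VALUES (?)",
            ((b"x" * 1024,) for _ in range(n_rows)),
        )
        conn.execute("COMMIT")
        sizes = {"filled": os.path.getsize(path)}

        # Deleting rows marks pages as free but keeps them in the file.
        conn.execute("DELETE FROM node")
        sizes["after_delete"] = os.path.getsize(path)

        # VACUUM rewrites the file and hands the free pages back to the OS.
        conn.execute("VACUUM")
        sizes["after_vacuum"] = os.path.getsize(path)
        conn.close()
    return sizes
```

With a default SQLite build, the file should not shrink after the DELETE and should only get smaller once VACUUM has run.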

sphuber · November 14, 2023, 9:11pm

The solution has already been mentioned: verdi storage maintain will reclaim space that has been freed. For optimal effect, stop the daemon and run verdi storage maintain --full.

Then a slight correction to the provided explanation. verdi storage maintain does not actually trigger a vacuum on the PostgreSQL database. There is currently no API in AiiDA to force this and it is left to PostgreSQL itself to automatically decide when to do this. Instead, verdi storage maintain forces a cleanup of the repository. Since the repository has automatic deduplication, when a node gets deleted, we cannot automatically proceed and delete the file from the repository, because it may be referenced by another node. This is why, when you delete nodes, you do not see a reduction in the size of the repository on disk. When you call verdi storage maintain, AiiDA checks which files in the repo are no longer referenced by any nodes, and these are then fully deleted, which actually frees up the space on disk. There is a vacuum option for the storage maintain, but this is to vacuum the SQLite database that is used by the repository. It does not affect the PostgreSQL database.
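
To make the deduplication argument above concrete, here is a minimal toy sketch of a content-addressed store. It is not AiiDA's repository implementation: the class DedupRepository and its methods are hypothetical names, and in-memory dictionaries stand in for the files on disk and the database.

```python
import hashlib


class DedupRepository:
    """Toy content-addressed store: identical content is stored only once."""

    def __init__(self):
        self.objects = {}     # content hash -> bytes (stands in for files on disk)
        self.node_files = {}  # node pk -> content hash (stands in for the database)

    def store_node(self, pk: int, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        # setdefault keeps the existing object if the same content was stored before.
        self.objects.setdefault(key, content)
        self.node_files[pk] = key
        return key

    def delete_node(self, pk: int) -> None:
        # Only the reference is dropped; another node may still share the object.
        del self.node_files[pk]

    def maintain(self) -> int:
        """Delete objects that no node references; return how many were removed."""
        referenced = set(self.node_files.values())
        orphans = [key for key in self.objects if key not in referenced]
        for key in orphans:
            del self.objects[key]
        return len(orphans)
```

In this sketch, deleting a node never changes the number of stored objects. Only the separate maintain() pass, which checks every object against the remaining references, removes anything.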

t-reents · November 14, 2023, 10:28pm

@sphuber Thanks a lot for the clarification and sorry for the confusion!

sphuber · November 15, 2023, 6:33am

Not to worry @t-reents! I appreciate you joining in the discussion.

edan-bainglass · November 21, 2023, 9:49am

What if in addition, one wanted to also reset the pks? This is often the case when developing. I run a few jobs, test the output, etc. etc., then clean periodically or when satisfied. But my pks are out of control. I’m at 15k+

giovannipizzi · November 21, 2023, 10:14am

Well, this is a different thing from the original post. How important is it, apart from peace of mind?

I think you should just ignore the big numbers (and just create a new profile from scratch if it really bothers you); you could check some low-level PostgreSQL commands as e.g. discussed here, but I’d be very careful as you risk doing serious damage (e.g. codes have a PK, and unless you really want to delete everything, in which case it’s easier to make a new profile, you risk data loss).
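
As a small illustration of why PKs keep growing even after everything has been deleted, here is a sketch using Python's built-in sqlite3 module with an AUTOINCREMENT column. This is an assumed stand-in for the sequence behind the PK column in PostgreSQL, not AiiDA code, and the function name pk_after_cleanup is hypothetical.

```python
import sqlite3


def pk_after_cleanup(n_before: int = 5) -> int:
    """Insert n_before rows, delete them all, and return the next assigned pk."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (pk INTEGER PRIMARY KEY AUTOINCREMENT, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO node (label) VALUES (?)", [("test",)] * n_before
    )
    # Deleting the rows does not rewind the counter that hands out pks.
    conn.execute("DELETE FROM node")
    cursor = conn.execute("INSERT INTO node (label) VALUES ('fresh')")
    pk = cursor.lastrowid
    conn.close()
    return pk
```

Here pk_after_cleanup(5) returns 6 rather than 1: the counter lives independently of the rows, which is why a fresh profile (and thus a fresh database) is the simplest way to start again from 1.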

To facilitate this, we could discuss a way to recreate a new empty profile but with some default things already set up (e.g. computers and codes). Some kind of default data stored in YAML that gets filled right after you create the profile.

@mbercx had worked on something like this, I think, to create a new project? (aiida-project?). Good to check it and discuss with him

edan-bainglass · November 21, 2023, 10:34am

Not important, just annoying. Not enough to dig into PostgreSQL commands, but definitely enough to consider implementing a verdi command to handle it. I have not played with profiles much. If creating one allocates fresh DB space (blank slate w.r.t. pks), then perhaps this is one valid approach. The verdi command could take care of it then for automation/ease-of-use. Will check with @mbercx

mbercx · November 21, 2023, 10:49am

Haha, I sort of understand your pain, this also used to bother me. But I think after working with databases with 10m+ nodes for so long I’ve become desensitized.

AiiDA-project is more about quickly setting up properly separated (looking at you, $AIIDA_PATH) Python environments and quickly changing between them/nuking them once you’re done (although this is still not complete).

It’s already quite easy to set up new profiles and then delete them:

❯ verdi quicksetup -n --profile tmp --db-name tmp
Success: created new profile `tmp`.
Report: initialising the profile storage.
Report: initialising empty storage schema
Report: Migrating to the head of the main branch
Success: storage initialisation completed.
❯ verdi profile setdefault tmp
Success: tmp set as default profile
❯ verdi shell
Python 3.10.13 (main, Aug 24 2023, 22:43:20) [Clang 14.0.0 (clang-1400.0.29.202)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.12.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: Int(2).store()
Out[1]: <Int: uuid: c1b04942-88b8-4b84-a16f-474adf3298dd (pk: 1) value: 2>

In [2]: exit()
❯ verdi profile delete -f tmp
Warning: deleting profile `tmp` excluding: database user.
Warning: this operation cannot be undone, Success: profile `tmp` was deleted excluding: database user..
❯ psql -l
                          List of databases
    Name    | Owner  | Encoding | Collate | Ctype | Access privileges
------------+--------+----------+---------+-------+-------------------
 core-dev   | mbercx | UTF8     | C       | C     |
 cwf-dev    | mbercx | UTF8     | C       | C     |
 postgres   | mbercx | UTF8     | C       | C     |
 pseudo-dev | mbercx | UTF8     | C       | C     |
 qe-dev     | mbercx | UTF8     | C       | C     |
 subcon-dev | mbercx | UTF8     | C       | C     |
 super-dev  | mbercx | UTF8     | C       | C     |
 template0  | mbercx | UTF8     | C       | C     | =c/mbercx        +
            |        |          |         |       | mbercx=CTc/mbercx
 template1  | mbercx | UTF8     | C       | C     | =c/mbercx        +
            |        |          |         |       | mbercx=CTc/mbercx
 test-dev   | mbercx | UTF8     | C       | C     |
(10 rows)

But the tedious part is setting up all the computers/codes etc afterward which you might want in your dev environment. I typically have a Makefile with all the commands to do this, but that’s not really practical. I think what we’re looking for here would be more something along the lines of:

github.com/aiidateam/aiida-core

Allow for `verdi --config`

opened 03:54PM - 06 Jul 23 UTC

chrisjsewell

topic/verdi type/enhancement

It should be possible to supply a YAML file to `verdi` at the *top-level*

This …would be similar to how you use e.g. `verdi code core.installed --config config.yaml`, **but** the YAML would also provide the parts of the `verdi` *path*, i.e. `code` and `core.installed`.

The reasoning for the addition is that, especially with additions like `Code` types and potential changes like #6023, you can no longer store the "full" information about a code/profile configuration in the YAML, e.g. the `core.installed` and `core.sqlite_dos`

Such a YAML might look like then (note the `command` key):

```yaml
command:
- profile
- setup
- core.sqlite_dos
profile: test_aiida
email: aiida@localhost
first_name: Giuseppe
last_name: Verdi
institution: Khedivial
db_path: some/path/db.sqlite
repository: /tmp/test_repository_test_aiida/
broker_protocol: amqp
broker_username: guest
broker_password: guest
broker_host: 127.0.0.1
broker_port: 5672
broker_virtual_host: ''
```

and you would just run `verdi config --yaml config.yaml`

This could then also be integrated with AiiDA-project.

sphuber · November 21, 2023, 1:14pm

The issue you link for verdi --config is not really about that though. Rather, it tries to address the fact that the verdi commands that have dynamic sub-commands, such as verdi code create, no longer take YAML files that are completely self-contained. For example, if you have a YAML file to set up a code, that file doesn’t specify which subcommand of verdi code create should be called, so the user has to know that. The idea would be to add --config to the verdi code create base group, so that the config file can contain the actual subcommand in addition to the actual command options.
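
The idea of a config file that carries its own subcommand can be sketched roughly as follows. This is only an illustration of the proposal, not something verdi implements: the function build_verdi_argv is hypothetical, and the mapping from keys to option names (underscores replaced by hyphens) is an assumption.

```python
def build_verdi_argv(config: dict) -> list:
    """Turn a mapping with a `command` key into a command-line argument list."""
    options = dict(config)  # copy so the caller's mapping is left untouched
    command = options.pop("command")  # e.g. ["profile", "setup", "core.sqlite_dos"]
    argv = ["verdi", *command]
    for key, value in options.items():
        # Assumption: each remaining key maps to an option named after it,
        # with underscores replaced by hyphens.
        argv += [f"--{key.replace('_', '-')}", str(value)]
    return argv
```

With such a helper, the file alone would determine both which subcommand runs and which options it receives, so the user would not need to know the subcommand in advance.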

Then, to get back on topic. What we really need is a command that can easily export codes (and their computers). There is the verdi archive create command, but archives are not the ideal vehicle for this. Rather, we would ideally have them exported to a YAML file. Then there would be a single command that can recreate the codes and computers from said YAML. The latter is actually quite straightforward, especially with the recent improvements using pydantic for these models, and I already have a working version that we use internally at Microsoft. The trickier part is the former though. That is to say, an inflexible, hard-coded, ad-hoc solution for each code/computer type would be easy, but writing a solution that is applicable to any plugin is what is tricky.

Now that we have pydantic for storage configurations, I will migrate the Code implementations as well. This will make it easier to write such a generic solution. Then we just need to do the Transports as well, and we are almost there.

mbercx · November 21, 2023, 3:51pm

Ah yes, it seems I projected my desires on that issue quite a bit, thanks for correcting @sphuber.

I think we may be hijacking this topic for a discussion that is somewhat unrelated though, so I will resist the urge to respond here and open a new discussion in the Developer category later today, summarizing the notes above.

system · November 27, 2023, 11:52am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.
