ahxbcn
February 28, 2025, 9:06am
Hello, everyone! I started using AiiDA a few days ago to retrieve calculated atomic magnetic moments from spin-polarized calculations in the publicly available MC3D and MC2D datasets for my research program. I've downloaded the database files from the Materials Cloud Archive (MC3D: Materials Cloud Archive, MC2D: Materials Cloud Archive), but I find it difficult to use them. How can I use the data in these database files? Are there any tutorials about it?
Hello, I think this recent blog post might be what you are looking for!
Hi @ahxbcn,
Thanks for your post, and welcome! We hope you are enjoying your AiiDA journey so far.
To add to Giovanni’s response, we also recently published a blog post on getting data out of AiiDA, which can be found here:
Improvements in the ways to get your data out of AiiDA — AiiDA documentation.
Further information is also available on AiiDA’s ReadTheDocs page, e.g., here:
There are also past tutorials that cover this topic:
However, note that those guides were used in past, week-long AiiDA tutorials and are therefore not guaranteed to fully reflect the current state of the code base. Overall, your best bet is still the ReadTheDocs documentation of aiida-core.
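As a concrete first step, note that the files you downloaded are AiiDA export archives, which have to be imported into an AiiDA profile before you can work with them, either via `verdi archive import <filename>` on the command line or from Python. A minimal sketch (the filename is a placeholder for the file you actually downloaded):

```python
from aiida import load_profile
from aiida.tools.archive import import_archive

# Load the default AiiDA profile (create one first, e.g. with `verdi presto`).
load_profile()

# Import the downloaded Materials Cloud archive into the profile's database.
# 'mc2d.aiida' is a placeholder; use the actual filename you downloaded.
import_archive('mc2d.aiida')
```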
Lastly, I am currently working on a feature to extract data from AiiDA to disk in a sensible directory tree structure, in this PR:
main ← GeigerJ2:feature/verdi-profile-mirror (opened 22 Jan 2025)
## Design questions:
- If a Node itself is not newly created, but only added to a new group, and the mirroring is run, it is not dumped, as the node is not _new_... does adding to a group affect its `mtime`? Possibly not. Is there another way to detect this change, e.g., checking collections since the last dump?
- If a sub-workflow/calculation of another workflow is put into its own group, no folder for this group is created with the default settings
- When I delete a group, but don't delete its nodes, and I re-run the mirroring, this change is not picked up, as the filtering is done by the mtime of the node -> Currently working on this.
- Three dumping modes (illustrated in the sketch after this list):
  - `incremental=False, overwrite=False`: Will error out with `FileExistsError` if the directory exists; goes through if it doesn't exist or is empty.
  - `incremental=True, overwrite=False`: Will keep the original main directory, but update subdirectories with new data.
  - `incremental=False, overwrite=True`: Will clean the main directory and perform the dumping again from scratch.
  - If both options are set to `True`, `--overwrite` will take precedence, and a report message will be issued to the user. This is because `--incremental` is `True` by default, as it is the most sensible option, and should not have to be specified every time. However, if `--overwrite` is also set, we don't raise an exception (as I had initially implemented it), as that would require the user to always pass `--overwrite --no-incremental`, which is annoying. Automatically setting `--incremental` to `False` if `--overwrite` is specified could be handled by a `click` callback, but for now I just change the options on the fly at a later stage in the code.
- Ways to specify the output path for dumping/mirroring (for processes, groups, and the profile data alike), as sketched below:
  - Passing no value should generate a sensible default output path in all cases.
  - If a relative path is given, it should be created under the CWD.
  - If an absolute path is given, it should be used as the top-level directory of the dumping.
  - These three options should be handled in the same way via the verdi CLI as well as via the top-level `dump` method of each Dumper class, so that the path can also be set accordingly through the Python API.
  - To achieve this, the path is internally split into a `dump_parent_path` (absolute, defaults to the CWD) and an `output_path` part (relative, either provided by the user or automatically generated), which, combined, yield the full top-level path where the files are dumped.
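Since the flag precedence and path-handling rules above are easy to get wrong, here is a minimal sketch of the intended behavior. The function names and the plain `print` report are illustrative only, not the actual PR code:

```python
from __future__ import annotations

from pathlib import Path


def resolve_dump_mode(incremental: bool, overwrite: bool) -> tuple[bool, bool]:
    """`--overwrite` takes precedence over the default `--incremental`."""
    if incremental and overwrite:
        # Report rather than raise, so that `--overwrite` alone suffices and
        # the user never has to pass `--overwrite --no-incremental` together.
        print('Report: `--overwrite` given, disabling `--incremental`.')
        incremental = False
    return incremental, overwrite


def resolve_output_path(user_path: str | None, default_name: str) -> Path:
    """Split the target into an absolute parent part and a relative output part."""
    dump_parent_path = Path.cwd()  # absolute part, defaults to the CWD
    if user_path is None:
        # No value given: fall back to an auto-generated default name.
        return dump_parent_path / default_name
    if Path(user_path).is_absolute():
        # Absolute path: use it directly as the top-level dump directory.
        return Path(user_path)
    # Relative path: create it under the CWD.
    return dump_parent_path / user_path
```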
## General notes:
- What happens if I delete a calculation that was called by another workchain from AiiDA's DB, and I run with the `--delete-missing` option? -> Possibly use `graph_traversal_rules`, as for `verdi node delete`, when updating directories after a node was deleted.
- What to do if a group gets deleted and `verdi profile mirror --delete-missing` is executed? Should also keep track of the groups in the `DumpLogger` and delete the directory in that case.
- `dump_parent_path` is the CWD from which the dumping/mirroring command is called, while `dump` still provides an `output_path` parameter to denote the directory name of the profile, group, or process that will be dumped. This is optional, and if not provided by the user, it will be auto-generated.
- Possibly use `graph_traversal_rules` and add `get_nodes_dump` to `src/aiida/tools/graph/graph_traversers.py`, as well as `AiidaEntitySet` from `src/aiida/tools/graph/age_entities.py`, etc., to first obtain the nodes, and then run the dumping.
- Use `@property.setter` and `@property.getter`
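As an aside on the last note, the standard property pattern applied to a hypothetical `Dumper` attribute (illustrative only, not the actual PR code) would look like:

```python
class Dumper:
    """Hypothetical minimal Dumper illustrating the property pattern."""

    def __init__(self, output_path=None):
        self._output_path = output_path

    @property
    def output_path(self):
        """Getter: return the user-provided or auto-generated output path."""
        return self._output_path

    @output_path.setter
    def output_path(self, value):
        """Setter: allow changing the target directory after instantiation."""
        self._output_path = value
```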
## (Possible) future TODOs:
- [ ] Allow specifying options via config file
- [ ] Add data dumping (raw/rich)
- [ ] Support batch operations?
- [ ] Keep track of symlinks and/or dumped node types in `DumpLogger`?
- [x] Expose endpoint to `CollectionDumper` and allow for mixed node types
- [ ] Add option to check directory existence/mtime/contents to determine what to do for incremental dumping
## Bugs
- `README` for dumped processes ends up in the wrong (too high) directory
While it is still a work in progress, it might be a feature you are interested in, so feel free to subscribe to the PR. It will allow dumping files in the same way as if the calculations had been run without AiiDA, making it simpler to use common Linux CLI tools like `grep`. On the other hand, for programmatic access directly from AiiDA's DB, the QueryBuilder is still the way to go. Hope this helps!
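To close the loop on your original question, here is a minimal QueryBuilder sketch for pulling results out of the imported archive. Note that the dictionary key `atomic_magnetic_moments` is a placeholder assumption: the exact key under which the MC3D/MC2D calculations store spin-polarized results depends on the workflow that produced them, so inspect the attributes of a few `output_parameters` nodes to find the right one.

```python
from aiida import load_profile
from aiida.orm import QueryBuilder, CalcJobNode, Dict

load_profile()

# Match every calculation job together with its 'output_parameters' Dict node.
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calc', project=['uuid'])
qb.append(
    Dict,
    with_incoming='calc',
    edge_filters={'label': 'output_parameters'},
    project=['attributes'],
)

for calc_uuid, attributes in qb.iterall():
    # 'atomic_magnetic_moments' is a placeholder key; check attributes.keys()
    # to see where the magnetic moments actually live for these workflows.
    moments = attributes.get('atomic_magnetic_moments')
    if moments is not None:
        print(calc_uuid, moments)
```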