Quantumespresso.pw.relax seems to copy label from input to output structures

(Apologies if this should be an Issue against the relevant plugin repo. I see lots of questions about aiida-quantumespresso here so it seemed ok!)

I have been trying to use aiida-quantumespresso for geometry relaxation with a few different XC functionals. To easily find/reuse my input structure it is marked with a label.

However, the workchain is creating new structures with the same label. I think these are the relaxed output? This then breaks subsequent code that tries to access the node by label, as it no longer resolves unambiguously.

Is that a bug, or expected/idiomatic behaviour of an aiida workflow? Surely we should use “group” to tag multiple nodes and “label” should always be unique?

(What I would really like to do is provide a fresh label that is assigned to the relaxed structure, so I can easily find it later. I think this would have to be managed at workflow level, as my outer script doesn’t know when that file has been created.)

Hi @ajjackson

this is indeed not a bug, but also not necessarily an expected behavior for all AiiDA workflows in general. It is rather a decision that has been made in the case of AiiDA-QE.

The reason why you observe that behavior is the following: When the calculation has finished and the output structure is parsed/created (aiida-quantumespresso/src/aiida_quantumespresso/parsers/pw.py at main · aiidateam/aiida-quantumespresso · GitHub), the input structure is cloned (aiida-quantumespresso/src/aiida_quantumespresso/parsers/parse_raw/base.py at main · aiidateam/aiida-quantumespresso · GitHub). This means that the input structure and all its attributes are first copied to the new output structure and afterwards the positions and the cell are updated.
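To make the effect concrete, here is a toy model in plain Python. It is not the AiiDA or aiida-quantumespresso code: ToyStructure and parse_output_structure are hypothetical stand-ins, and the only behavior taken from the description above is "clone everything, then overwrite the cell and positions".

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ToyStructure:
    """Hypothetical stand-in for a structure node (not an AiiDA class)."""

    label: str = ""
    cell: list = field(default_factory=list)
    positions: list = field(default_factory=list)

    def clone(self):
        # Copies every field, including the label.
        return deepcopy(self)


def parse_output_structure(input_structure, new_cell, new_positions):
    """Mimic the parser step: clone the input, then overwrite the geometry."""
    output = input_structure.clone()
    output.cell = new_cell
    output.positions = new_positions
    return output


initial = ToyStructure(
    label="my-structure",
    cell=[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]],
    positions=[[0.0, 0.0, 0.0]],
)
relaxed = parse_output_structure(
    initial,
    new_cell=[[5.1, 0.0, 0.0], [0.0, 5.1, 0.0], [0.0, 0.0, 5.1]],
    new_positions=[[0.0, 0.0, 0.0]],
)
print(relaxed.label)  # prints: my-structure
```

The geometry differs between the two objects, but the label is carried along unchanged, which is why a lookup by label stops being unambiguous.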

One way to handle this could be to define a label in the extras of the WorkChain. After submission, you can do it like this: workchain_node.base.extras.set('label', 'my-label') (note that this is not the same as the node's own label attribute, so to avoid confusion you could use any other key instead of 'label').

In a next step, when doing the post-processing/analysis, you can then query for the WorkChainNodes that define such a key in the extras and do a nested query, i.e., get the uuid or node of the input and output structure. (How to find and query for data — AiiDA 2.7.1 documentation)
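A rough sketch of how the tagging and the nested query could look is below. The function names and the extras key my_tag are made up for illustration, and the link labels structure and output_structure are an assumption about how the relax work chain names its input and output, so please check them against your own provenance graph (e.g. with verdi process show) before relying on this.

```python
def tag_workchain(workchain_node, tag, key="my_tag"):
    """Store a user-defined tag in the extras of a submitted work chain node.

    `key` is a hypothetical extras key; any name that does not clash
    with your other extras will do.
    """
    workchain_node.base.extras.set(key, tag)


def find_structures_by_tag(tag, key="my_tag"):
    """Return [input_structure, output_structure] pairs for tagged work chains.

    Assumes a profile has been loaded and that the work chain links are
    labelled 'structure' (input) and 'output_structure' (output).
    """
    from aiida import orm  # imported here so the module loads without AiiDA

    qb = orm.QueryBuilder()
    # Work chains whose extras contain the requested tag.
    qb.append(orm.WorkChainNode, filters={f"extras.{key}": tag}, tag="wc")
    # The structure that was passed into the work chain.
    qb.append(
        orm.StructureData,
        with_outgoing="wc",
        edge_filters={"label": "structure"},
        project="*",
    )
    # The structure that the work chain returned.
    qb.append(
        orm.StructureData,
        with_incoming="wc",
        edge_filters={"label": "output_structure"},
        project="*",
    )
    return qb.all()
```

Since this returns node objects, you can read off the uuid of each structure from the result, which stays valid after exporting to and importing from an archive.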

In case you think that this approach seems reasonable for your use-case, but you don’t have much experience with the QueryBuilder, don’t hesitate to reach out again. We can provide further assistance.

I see! clone() is implemented on the base Data class and does not have any special treatment for label (aiida-core/src/aiida/orm/nodes/data/data.py at 669f249e895623c1edcdb9cca4195baa4baa3ecc · aiidateam/aiida-core · GitHub).
Perhaps there is a question to be addressed at that level: does it ever make sense to clone a label? Should that be the default behaviour of Clone when various parts of aiida expect labels to be unique?

In my use-case I am currently calling multiple workchains (with different XC functionals, dispersion parameters) on the same structure input, and would prefer to maintain that connection in the graph. I’m leaning towards a workflow that frequently dumps data to .aiida files and imports them into fresh VMs, so would prefer not to rely on PK values too much.

Defining a label in the workchain extras solves half the problem here, of retrieving data related to a particular workchain instance. (I can already achieve something similar by labelling the workchain node directly, just after submitting it.) But the problem remains that my labelled input data becomes inaccessible to load_node(label="mylabel"); the workflow breaks a reasonable-looking operation that worked before. A more flexible query might not raise an error, but instead return the wrong node.

I suppose I could replace load_node(label=...) with a query function that finds a node carrying the label without any parents. But that could break down once I start doing e.g. convergence tests on relaxed structures.

Thanks for the interesting questions!

In general, clone tries to clone the node identically, including label, description, extras, …

As a note, label is not required to be unique; the only part where this is highly recommended is for codes, because indeed the label is often used for fetching it. The fact that load_node allows loading from label is for that reason, to have a single interface also for codes.

However, I see your problem. We could argue that in the QE parser, we should unset label/description/extras when cloning. But I don’t know if someone is relying now on that functionality…

In your case, I suggest that you keep using the label (or even better use extras, since you seem to have more than 1 parameter, so the queries can be more flexible - but that’s up to you). In addition, you could put all the nodes you want to be able to retrieve in a group, e.g. source_structures.

You can then define a helper function, e.g. def load_node_from_group(group_name, label), that internally uses the query builder to return the node in that group with the required label. (If you only use one group name, you can of course even hardcode the group in the function, e.g. def load_node_from_source(label).) I suggest this is better than looking for nodes without parents, both because that is more complex to describe in the query builder, and because, as you say, it is more prone to unexpected behavior. Explicitly putting nodes in a group makes you decide which nodes you want, and you can even replace them in the future if e.g. you decide not to use a structure anymore.
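Something along these lines should work as a starting point. Treat it as a sketch: add_to_group is an extra helper I made up for completeness, the default group name source_structures is just the example from above, and it assumes a profile is already loaded and the nodes are stored.

```python
def add_to_group(node, group_label="source_structures"):
    """Put a stored node into the named group, creating the group if needed."""
    from aiida import orm  # imported here so the module loads without AiiDA

    group, _created = orm.Group.collection.get_or_create(group_label)
    group.add_nodes(node)
    return group


def load_node_from_group(group_label, label):
    """Return the single node in the group that carries the given label.

    QueryBuilder.one() raises if no node or more than one node matches,
    so an ambiguous label inside the group is reported, not silently resolved.
    """
    from aiida import orm

    qb = orm.QueryBuilder()
    qb.append(orm.Group, filters={"label": group_label}, tag="group")
    qb.append(orm.Node, with_group="group", filters={"label": label})
    return qb.one()[0]


def load_node_from_source(label):
    """Same lookup with the group name hardcoded."""
    return load_node_from_group("source_structures", label)
```

Cloned output structures keep the label but are never added to the group, so load_node_from_source("mylabel") keeps returning the original input structure.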

However, I see your problem. We could argue that in the QE parser, we should unset label/description/extras when cloning. But I don’t know if someone is relying now on that functionality…

I don’t have a strong opinion here, because one would typically use the pk or uuid to load data nodes, instead of the label, as outlined above (or query based on the extras). Pinging @mbercx here, as this could in principle be addressed in AiiDA-QE v5.0, in case we decide that it makes sense to rethink the cloning.

I’m leaning towards a workflow that frequently dumps data to .aiida files and imports them into fresh VMs, so would prefer not to rely on PK values too much

I see, that’s why I was also suggesting the uuid instead of the pk as an alternative identifier. However, I also understand if you prefer a more “human-readable” identifier; in that case, the helper function would indeed be a useful approach.

PS - the admin above was me, I was logged in as admin by mistake

I could maintain a lookup table of UUIDs somewhere, but as a new user I am trying to learn the existing features of AiiDA. Combining label with a group such as source_structures seems a very nice solution that improves the overall usability of the profile/graph :smiley:

How safe am I, then, to assume that clone() and similar operations used inside workfunctions will not put other nodes into the group? Are there conventions around this?

No process (or cloning operation) should add nodes to a group; that only happens when you add them explicitly. So this is a safe approach, as long as the user doesn’t explicitly add unwanted nodes to that group.

Sorry for being late to the party, and thanks @ajjackson for the comments! I never really used the label in this way, but I can see why you did. Although I also use extras for this purpose as @giovannipizzi suggested, using load_node(label=...) is a fair bit more succinct than a query. ^^

Seems I last touched that piece of code, but that was just changing variable names. The .clone approach was added some time ago:

Since .copy() doesn’t exist anymore, I assume this is from the pre-v1.0 aiida-core days. If only the commit message would explain. :wink:

I doubt the label was considered in this change. Perhaps someone is using the fact that the label remains the same. I can imagine that e.g. for the MC3D we could have labeled each structure with the source ID and it would be handy if that was passed on. I’ve opened an issue to track this:


As a side note: I was also initially tripped up by label not being unique, in the context of Code instances. Now I use it all the time (pw@localhost, pw@eiger, …). Funnily enough, I never think of regular Nodes having labels, only codes, computers, groups, …
