Strange behaviour of calcfunctions when many links are generated in outputs

bastonero · March 12, 2025, 10:07am

Dear all, I am experiencing weird behaviour from a calcfunction I wrote to extract structures. It can generate ~10^3–10^4 output nodes, and something very strange happens in the provenance graph:

...
        └── FirstWorkChain<5065414> Waiting [2:if_(...)]
            ├── SecondWorkChain<5065417> Finished [0] [2:run_...]
            │   ├── ThirdWorkChain<5065418> Finished [0] [4:if_(...)]
            │   │   ...
            │   ├── get_structures<5065474> Running [0]
            │   └── get_structures<5066148> Finished [0]
...

There is no loop in the SecondWorkChain (where get_structures is called) that could call the calcfunction a second time. It seems that the first calcfunction job finishes successfully but somehow is not fully locked (I guess because it is still creating the output links). At that point a second call to the same function is somehow made, and it finishes correctly. This is very weird, and I have seen this behaviour multiple times, sometimes with more than two calls (up to 10 at times).
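The symptom described above, the same calcfunction listed more than once under one parent, can be spotted mechanically in larger trees. The helper below is a hypothetical sketch (`find_duplicate_calls` is not part of aiida-core), assuming the tree layout shown above: one entry per line of the form `Label<pk> State`, with each child starting further to the right than its parent. It only parses the text; it does not query the database.

```python
import re
from collections import defaultdict

# Hypothetical helper (not part of aiida-core). Assumes the tree layout shown
# above: each entry looks like "Label<pk> State", and a child starts further
# to the right on its line than its parent does.
ENTRY = re.compile(r"(?P<label>\w+)<(?P<pk>\d+)>\s+(?P<state>\w+)")


def find_duplicate_calls(tree_text):
    """Return {(parent_pk, label): [(pk, state), ...]} for every process label
    that appears more than once directly under the same parent."""
    ancestors = []  # stack of (column, pk) for the current chain of parents
    children = defaultdict(list)
    for line in tree_text.splitlines():
        match = ENTRY.search(line)
        if match is None:
            continue  # skip "..." and any other non-entry lines
        column = match.start()
        pk = int(match.group("pk"))
        # Drop entries that are not strictly to the left: they are siblings
        # or deeper descendants, not ancestors of this entry.
        while ancestors and ancestors[-1][0] >= column:
            ancestors.pop()
        parent_pk = ancestors[-1][1] if ancestors else None
        children[(parent_pk, match.group("label"))].append((pk, match.group("state")))
        ancestors.append((column, pk))
    return {key: calls for key, calls in children.items() if len(calls) > 1}
```

Run on the excerpt above, it returns `{(5065417, 'get_structures'): [(5065474, 'Running'), (5066148, 'Finished')]}`, i.e. two calls of get_structures directly under SecondWorkChain<5065417>. Grouping by parent as well as label avoids flagging legitimate calls of the same function from different work chains.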

Is there a known problem or limitation with having so many output nodes? How can I circumvent it?

Thanks a lot for any pointer.

giovannipizzi · March 13, 2025, 8:04pm

Hi Lorenzo! I don’t think there should be a limitation. Which versions of aiida-core and its main dependencies are you using? Which storage backend (PSQL + dostore?)? Would you be able to make a reproducible mock example so we can debug more easily? Pinging @agoscinski @geiger_j @jusong.yu

bastonero · March 13, 2025, 8:30pm

Thanks Giovanni for the help! It is PostgreSQL + RabbitMQ. The AiiDA version I am using is 2.4.0, with a “special patch” from Sebastiaan from a while ago (this is the discussion: Excepted workchains, due to strange error from kiwipy/plumpy?).
I don’t know whether that fix has since been merged into the main branch of aiida-core.
The error, though, seems somewhat random and also depends on the rest of the daemon workload: even when using caching to skip the earlier steps, the behaviour was quite different each time. So I guess it is probably related to the version of aiida-core I am using, and to the fact that I am running quite a heavy workload on my workstation (?).

giovannipizzi · March 13, 2025, 8:48pm

Mmm, it might be. A few things have been fixed recently in the main branch. If you could check whether you can reproduce this in a test environment with the main branch (soon to be released as 2.7), that would be great.

sphuber · March 14, 2025, 10:38am

This is very likely due to bugs in the engine that could result in multiple daemon workers running the same task (exactly as in the thread you linked). These have been fixed, and the fixes should be released with v2.7, if I remember correctly.

bastonero · March 14, 2025, 11:26am

Thanks a lot @sphuber !

system · March 20, 2025, 7:26am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.
