How to declare dependencies between calculations?

I am submitting calculation jobs as well as workchains to the daemon. Is there a way to declare that one calculation depends on another?

In my case, the outputs of one calculation should be the inputs of another, and I would like to submit several “pipelines” of these dependent calculations to an HPC system.

I know that I can use, e.g., run() instead of submit(), and perhaps also await_processes(), to get the right execution order, but this increases the overall runtime (wait time) a lot.

I also think that aiida-workgraph may allow this structure of dependent calculations, but maybe there is also some AiiDA-native functionality?

Thanks!
Lars

The AiiDA-native way of defining a workflow with sequential dependencies is to use a workfunction or a WorkChain. The workfunction is the simplest but also the most limited: you write a normal Python function, decorated with @workfunction, that calls the processes you want to run one after the other. Since it is a normal blocking execution, the dependencies are handled implicitly by the Python invocation order itself.
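As a minimal sketch (the calcfunctions here are hypothetical stand-ins for your actual calculations):

```python
from aiida.engine import calcfunction, workfunction
from aiida.orm import Int

@calcfunction
def add(x, y):
    return x + y  # arithmetic on AiiDA numeric types returns new nodes

@calcfunction
def multiply(x, y):
    return x * y

@workfunction
def add_then_multiply(x, y, z):
    # The second call only starts once the first has returned,
    # so the dependency follows implicitly from the call order.
    total = add(x, y)
    return multiply(total, z)

result = add_then_multiply(Int(1), Int(2), Int(3))
```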

In most cases, you would want to use a WorkChain instead. The outline defined in the spec encodes the higher-level logic. In each step you can submit new subprocesses, and by using WorkChain.to_context (or returning a ToContext instance from the step) you tell the engine that the workchain should wait until those processes have finished before going to the next step.
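For example, a two-step pipeline where the output of the first job feeds the second could look roughly like this. It uses the core.arithmetic.add test plugin that ships with aiida-core; the add@localhost code label is an assumption about your setup:

```python
from aiida.engine import ToContext, WorkChain
from aiida.orm import Int, load_code
from aiida.plugins import CalculationFactory

ArithmeticAdd = CalculationFactory('core.arithmetic.add')

class AddPipelineWorkChain(WorkChain):
    """Hypothetical pipeline: the sum of the first job is an input of the second."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('x', valid_type=Int)
        spec.input('y', valid_type=Int)
        spec.input('z', valid_type=Int)
        spec.outline(cls.run_first, cls.run_second, cls.finalize)
        spec.output('result', valid_type=Int)

    def run_first(self):
        code = load_code('add@localhost')  # assumes such a code is configured
        node = self.submit(ArithmeticAdd, x=self.inputs.x, y=self.inputs.y, code=code)
        # ToContext tells the engine to wait for `node` before the next step runs.
        return ToContext(first=node)

    def run_second(self):
        code = load_code('add@localhost')
        node = self.submit(ArithmeticAdd, x=self.ctx.first.outputs.sum, y=self.inputs.z, code=code)
        return ToContext(second=node)

    def finalize(self):
        self.out('result', self.ctx.second.outputs.sum)
```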

The downside of writing a WorkChain is that it requires a class with quite a bit of boilerplate, and it has to be discoverable by the daemon: you either have to make sure it is on the PYTHONPATH or register it with an entry point. aiida-workgraph was conceived exactly to make this easier. Like aiida-shell, it aims to make writing custom calculations/workchains on the fly a lot easier. Note that aiida-workgraph is under very active development and its API is not stable yet, so if you are writing a workflow that you will use more often, it might still make sense to invest the time to write it as a WorkChain.


Hello Lars,

Yes, aiida-workgraph is specifically designed to facilitate the creation of workflows where tasks depend on one another. In this framework, you can link the output of one node directly to the input of another, establishing a clear dependency. Besides that, aiida-workgraph also allows explicit dependencies to be set up between nodes even in cases where there is no direct data transfer between them. This feature can be invaluable for managing complex workflows and for ensuring that certain conditions or staging requirements are met before proceeding.
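A rough sketch of the data-dependency pattern (since the API is still evolving, the exact names used here, such as add_task, may differ between versions):

```python
from aiida_workgraph import WorkGraph, task

# Hypothetical calcfunction task standing in for a real calculation.
@task.calcfunction()
def add(x, y):
    return x + y

wg = WorkGraph('pipeline')
add1 = wg.add_task(add, name='add1', x=1, y=2)
# Passing add1's output as an input creates the dependency.
add2 = wg.add_task(add, name='add2', x=add1.outputs.result, y=3)
wg.submit()
```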

I personally use aiida-workgraph in my daily research and have found it to be extremely flexible and beneficial. As @sphuber mentioned, the tool is still under active development, and we are in the process of finalizing its API. Eventually, WorkGraph is expected to become an integral component of AiiDA, similar to existing tools like WorkFunction and WorkChain.

If you decide to incorporate aiida-workgraph into your workflow, your feedback would be incredibly valuable. We encourage you to share your experiences, and feel free to contribute by reporting issues or submitting pull requests.

Best regards,
Xing


Hi Lars,

Interestingly, another user also recently asked for such a feature; in particular, to be able to set the --dependency <jobid> option of the SLURM scheduler. This would allow submitting multiple jobs immediately, possibly reducing the queueing time, rather than writing the logic in a WorkChain, where AiiDA waits until a job completes before the next one is submitted to the (daemon and) scheduler. Is that in line with your use case as well?

In principle, I don’t think such a functionality would be too difficult to implement. Once a job is submitted, AiiDA has access to its scheduler job id, so one could write the correct --dependency option into the prepend_text of the following CalcJob. Of course, one doesn’t get all the advantages that writing an actual WorkChain provides (and the idea is specific to SLURM right now), but it might still be a useful feature that I’ll be working on one of these days. aiida-hyperqueue, or task farming in general, doesn’t seem to cover exactly this use case: there, running multiple jobs on one node seems to be the main concern, while that doesn’t necessarily have to be the case here. Maybe @sphuber has some additional insight.
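To illustrate the idea, a manual version of this workaround might look like the following. SomeCalculation and first_pk are placeholders, and whether SLURM honours an #SBATCH directive in the prepend text depends on where it ends up in the generated submit script:

```python
from aiida.engine import submit
from aiida.orm import load_node

# Load a CalcJobNode that has already been handed to the scheduler;
# at that point AiiDA has stored the SLURM job id on the node.
first = load_node(first_pk)  # first_pk: pk of the already-submitted job
job_id = first.get_job_id()

builder = SomeCalculation.get_builder()  # placeholder for your CalcJob plugin
builder.metadata.options.prepend_text = f'#SBATCH --dependency=afterok:{job_id}'
second = submit(builder)
```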

Cheers,
Julian


Hi, I think this is indeed an interesting feature (we’ve received various requests on this, I think; @geiger_j also in the context of W&C workflows).

One note: I think the concept of job dependencies exists not only in SLURM but in many schedulers. We should quickly check, but I believe so.

We could then add the concept in an abstract way so it can be implemented by the various schedulers, e.g. by adding a new attribute to the [JobTemplate](https://github.com/aiidateam/aiida-core/blob/589a3b2c03d44cebd26e88243ca34fcdb0e23ff4/src/aiida/schedulers/datastructures.py) data structure, say depends_on, whose value is a list of job IDs, which every plugin would then convert appropriately.
We just need to think about how to deal with plugins that do not implement it.

Then, when submitting or in a workflow, one just has to set the corresponding metadata of the CalcJob.
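As a rough sketch of what the plugin-side conversion could look like (nothing here exists in aiida-core today; both the depends_on attribute and this subclass are hypothetical):

```python
from aiida.schedulers.plugins.slurm import SlurmScheduler

class SlurmSchedulerWithDependencies(SlurmScheduler):
    """Hypothetical SLURM plugin that honours a `depends_on` list of job ids."""

    def _get_submit_script_header(self, job_tmpl):
        # Build the normal #SBATCH header, then append the dependency directive.
        header = super()._get_submit_script_header(job_tmpl)
        depends_on = getattr(job_tmpl, 'depends_on', None)  # hypothetical attribute
        if depends_on:
            job_ids = ':'.join(str(job_id) for job_id in depends_on)
            header += f'\n#SBATCH --dependency=afterok:{job_ids}'
        return header
```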