@ahkole, you’ve posed some excellent questions that would be a great addition to the FAQ section of our documentation website. I’ll do my best to provide answers to each of them.
BTW, what @sphuber suggested is very useful.
(1) Is there a difference between @calcfunction and @task.calcfunction? …
They are the same if you don't pass any arguments to `@task.calcfunction`. However, if the calcfunction returns multiple AiiDA data nodes, then you need to define the outputs explicitly:
```python
@task.calcfunction(outputs=[{"name": "sum"}, {"name": "difference"}])
def add_minus(x, y):
    return {"sum": x + y, "difference": x - y}
```
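Stripped of the decorator, the function body is ordinary Python: the keys of the returned dict are what must line up with the names declared in `outputs`. A minimal sketch with plain integers standing in for `Int` nodes:

```python
# Plain-Python analogue of the calcfunction above (no AiiDA needed).
# The returned dict keys must match the names declared in `outputs`.
def add_minus(x, y):
    return {"sum": x + y, "difference": x - y}

result = add_minus(3, 2)
print(result)  # {'sum': 5, 'difference': 1}
```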
WorkGraph requires users to explicitly define ports to establish connections between tasks. This strict requirement reduces flexibility, but it clearly defines dependencies, enhancing the workflow’s readability and robustness.
- What exactly is a @task task and … For which kinds of tasks would you use @task?
When running a `@task` task, i.e., a *normal* task, in the engine, it is just a normal Python function; it does not store any provenance information. If you are familiar with writing workflows using `WorkChain`, consider for example the `while_` in a workchain:
```python
spec.outline(
    cls.setup,
    while_(cls.should_run_process)(
        cls.run_process,
        cls.inspect_process,
    ),
    cls.results,
)

# example
def should_run_process(self):
    return self.ctx.some_force < 0.01
```

This `should_run_process` method does some calculation (comparing a value) but does not need to store provenance information. Similarly, in a workgraph you can use `@task` to define a task that does not need to store provenance information. Actually, all the methods listed in the `outline` do not generate new AiiDA data nodes; they are used to gather the results and transfer them to the next step through the context variable.
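The control flow above can be mimicked in plain Python to show that the check is pure computation. The values here are hypothetical, and the check is written to return `True` while the force is still above a threshold:

```python
# Plain-Python sketch of the while_ loop above (hypothetical values, no AiiDA).
def should_run_process(ctx):
    # pure comparison: no AiiDA data node is created here
    return ctx["some_force"] >= 0.01

ctx = {"some_force": 1.0, "iterations": 0}
while should_run_process(ctx):
    # stand-in for run_process / inspect_process updating the context
    ctx["some_force"] /= 10
    ctx["iterations"] += 1

print(ctx)  # {'some_force': 0.001, 'iterations': 3}
```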
(3) I am also trying to understand which (kinds of) ports are generated when using different function signatures. …
For a single input port `kwargs`: because it is a variable keyword argument, when the function runs in the engine, the engine passes `**kwargs` to the function, e.g., `f(**{"x": Int(1), "y": Int(2)})`. This is equivalent to `f(x=Int(1), y=Int(2))`. You can run the example code and use `verdi process show <pk>` to show the ports of a process. Here is the info:
```
Inputs          PK      Type
----------      ------  -------------
energies
    Al_0        134994  Float
    Al_1        134995  Float
    Al_2        134996  Float
structures
    Al_0        134988  StructureData
    Al_1        134989  StructureData
    Al_2        134990  StructureData
```
Here is the info for the `graph_builder`:
```
Outputs              PK      Type
---------------      ------  -------------
datas
    structures
        Al_0         134988  StructureData
        Al_1         134989  StructureData
        Al_2         134990  StructureData
    energies
        Al_0         134994  Float
        Al_1         134995  Float
        Al_2         134996  Float
execution_count      134993  Int
```
The `datas` output is a nested namespace. In the engine, when it is passed to the `kwargs` port of the next task, the engine will run `get_stabest_structure(**datas)`, thus generating the inputs shown before.
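The namespace-to-`**kwargs` mechanics can be reproduced in plain Python. Here `get_stabest_structure` is given a hypothetical signature, and plain strings and floats stand in for the AiiDA nodes:

```python
# Hypothetical stand-ins for the nested `datas` namespace shown above.
datas = {
    "structures": {"Al_0": "s0", "Al_1": "s1", "Al_2": "s2"},
    "energies": {"Al_0": 3.0, "Al_1": 1.0, "Al_2": 2.0},
}

def get_stabest_structure(structures, energies):
    # illustrative signature: pick the structure with the lowest energy
    best = min(energies, key=energies.get)
    return structures[best]

# the engine effectively performs the same dict unpacking:
result = get_stabest_structure(**datas)
print(result)  # 's1'
```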
Is there also a @task.workfunction that you can define? And…
Yes, you can use `@task.workfunction` to define a workfunction task.

Yes, you can add input ports with the same syntax.
And what kind of task is a @task.graph_builder? Is it more similar to a calculation or a workflow? I.e., is it allowed to create new data nodes? Is it allowed to return existing data nodes? Both?
I would say it is a normal task: it does not store any provenance information on its inputs and outputs. Looking at the return value of the `graph_builder` function, it does not return an AiiDA node; it always returns a `WorkGraph` object. Similarly, you can pass any data type (AiiDA data node, Python object, etc.) to the `graph_builder` function. That's why it is very flexible, and you can create truly dynamic (complex) nested workflows using `graph_builder`.
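The pattern of a builder function that returns a graph object (rather than computed values) can be sketched in plain Python; the `Graph` class here is a hypothetical minimal stand-in for `WorkGraph`:

```python
# Hypothetical minimal stand-in for WorkGraph, for illustration only.
class Graph:
    def __init__(self):
        self.tasks = []

    def add_task(self, name):
        self.tasks.append(name)

def build(structures):
    """Builder: returns a graph object; an engine would evaluate it later."""
    g = Graph()
    for key in structures:          # plain Python inputs are fine here
        g.add_task(f"dft_{key}")
    return g                        # a graph, not a data node

g = build({"Al_0": None, "Al_1": None})
print(g.tasks)  # ['dft_Al_0', 'dft_Al_1']
```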
Again, let's compare it with the WorkChain. For example, say we have a WorkChain that calculates the energies of a set of structures and then finds the minimum energy.
```python
from aiida.engine import WorkChain
from aiida.orm import StructureData

class ExampleWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input_namespace('structures', valid_type=StructureData)
        spec.outline(
            cls.run_dft,
            cls.inspect_dft_result,
            cls.find_minimum,
            cls.results,
        )
        spec.output('result')

    def run_dft(self):
        processes = {}
        for key, structure in self.inputs.structures.items():
            process = self.submit(SomeCalculation, inputs={'structure': structure})
            processes[key] = process
        self.to_context(**processes)

    def inspect_dft_result(self):
        dft_outputs = {}
        for key in self.inputs.structures:
            process = self.ctx[key]
            dft_outputs[key] = process.outputs.some_output
        self.ctx.dft_outputs = dft_outputs

    def find_minimum(self):
        self.ctx.minimum = find_minimum(**self.ctx.dft_outputs)

    def results(self):
        self.out('result', self.ctx.minimum)
```
Using WorkGraph and `@task.graph_builder`:
```python
@task.graph_builder(outputs=[{"name": "dft_outputs", "from": "context.dft_outputs"}])
def run_dft(structures):
    """Run DFT calculations for a set of structures."""
    wg = WorkGraph()
    for key, structure in structures.items():
        dft_task = wg.add_task(SomeCalculation, inputs={'structure': structure})
        dft_task.set_context({"some_output": f"dft_outputs.{key}"})
    # return the workgraph
    return wg
```
```python
wg = WorkGraph()
run_dft_task = wg.add_task(run_dft, structures=structures)
# pass the nested "dft_outputs" namespace on to the next task
wg.add_task(find_minimum, inputs=run_dft_task.outputs["dft_outputs"])
```
The `run_dft` graph_builder task has similar functionality to the `run_dft` and `inspect_dft_result` methods in the WorkChain. But do we lose provenance information by using `@task.graph_builder`? No: the key is the WorkGraph object returned by the graph_builder. After launching the returned WorkGraph object, a new process node is created, which will store all the provenance information.
But is there any risk of losing provenance information when using `@task.graph_builder`? Yes: if you create an AiiDA data node inside the graph_builder function and pass it into the returned WorkGraph object, then you will lose the provenance information.

The same risk exists in the WorkChain: if you create an AiiDA data node inside a method, e.g., `inspect_dft_result`, and pass it to the next step, you will lose the provenance information. And I have found this is very common in WorkChains! For example, the `PdosWorkChain` generates a `Dict` node and passes it as an input node for the dos calculation. Ultimately, the developer (or user) needs to decide whether to keep strict provenance information or not; it is a trade-off between flexibility and provenance.