OK, thanks for the additional information; everything is clear now. Unfortunately, what you are trying to do is currently not possible in the way you suggest. The WorkChain
API only starts running subprocesses submitted in a workchain step after that step has returned.
If you are looping over the dates (i.e. number of independent runs) as you suggest:
spec.outline(
    while_(cls.dates)(
        cls.transfer_data,
        cls.run_sim
    ),
    cls.finalize,
)
When you launch the transfer job in transfer_data, it will only actually be started once transfer_data returns, and only when that job has finished will the workchain move on to run_sim. The dates are therefore processed strictly one after another.
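To make the ordering concrete, here is a toy model in plain Python. It is not AiiDA code and does not reflect how the engine is implemented; ToyEngine and its methods are hypothetical names invented for this illustration. It only mimics the behaviour described above: a job submitted inside a step is queued, started after the step returns, and the next step is entered once that job is done.

```python
# Toy model only: this is NOT AiiDA's engine, just an illustration of the
# "submit now, start after the step returns" behaviour described above.
class ToyEngine:
    def __init__(self):
        self.pending = []  # jobs submitted during the current step
        self.log = []      # ordered record of what happened

    def submit(self, name):
        # Submitting only queues the job; nothing runs yet.
        self.pending.append(name)
        self.log.append(f"submitted {name}")

    def run_outline(self, steps):
        for step in steps:
            step(self)
            self.log.append(f"{step.__name__} returned")
            # Only now are the queued jobs started, and the next step is
            # not entered until all of them have finished.
            for name in self.pending:
                self.log.append(f"ran {name}")
            self.pending.clear()


def transfer_data(engine):
    engine.submit("transfer job")


def run_sim(engine):
    engine.submit("simulation job")


engine = ToyEngine()
engine.run_outline([transfer_data, run_sim])
print("\n".join(engine.log))
```

The log shows the transfer job running only after transfer_data has returned, and run_sim being entered only after the transfer job has run.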
One alternative would be to change the logic to:
spec.outline(
    cls.transfer_all_data,
    cls.run_all_sim,
    cls.finalize
)
Here, instead of looping in the outline, you simply call transfer_all_data once and submit all transfer jobs in it. This way they run in parallel, and once all of them are done the workchain calls run_all_sim, where you can submit all simulation jobs. One potential downside is that if the runtime of the transfer jobs varies a lot (either intrinsically or due to waiting times in the queue), the slowest one determines the start time for all simulation runs.
There is a way around this, and that is to write a sub-workflow that handles just one date. It would have two steps: launch the transfer job, then run the simulation job. You could then have a parent workflow that loops over the dates and calls the sub-workflow:
spec.outline(
    while_(cls.dates)(
        cls.run_sub_workflow
    ),
    cls.finalize,
)
and the sub-workflow would take date as an input and then just have
spec.outline(
    cls.transfer_data,
    cls.run_sim
)
In this way, the transfer and simulation jobs for each date are truly independent of one another, which I think should minimize unnecessary waiting time.
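As a rough illustration of the difference in waiting time between the three outlines, here is a back-of-the-envelope sketch in plain Python (no AiiDA involved). The durations are made-up numbers in minutes, the function names are hypothetical, and the model ignores queue waiting times and resource limits. It also assumes that the per-date sub-workflows are launched so that they actually run concurrently.

```python
# Hypothetical durations in minutes for three dates; the numbers are made up.
transfer = [10, 60, 20]
sim = [30, 30, 30]


def sequential_loop(transfer, sim):
    # Outline loop: transfer then simulation for one date, then the next date.
    finish, clock = [], 0
    for t, s in zip(transfer, sim):
        clock += t + s
        finish.append(clock)
    return finish


def transfer_all_then_sim_all(transfer, sim):
    # All transfers run in parallel; every simulation waits for the slowest one.
    barrier = max(transfer)
    return [barrier + s for s in sim]


def per_date_sub_workflow(transfer, sim):
    # Each date runs its own transfer -> simulation chain independently.
    return [t + s for t, s in zip(transfer, sim)]


print("sequential loop:     ", sequential_loop(transfer, sim))
print("transfer all, then sim:", transfer_all_then_sim_all(transfer, sim))
print("per-date sub-workflow:", per_date_sub_workflow(transfer, sim))
```

With these numbers, the simulations for the first and third dates finish well before the slowest transfer would have released them in the "transfer all, then simulate all" variant, which is the waiting time the sub-workflow approach is meant to avoid.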
Hopefully that helps.