How to debug `engine.submit` processes?

agoscinski · September 26, 2024, 4:48pm

Hello all,

Sometimes I would like to debug submitted jobs to see what is going wrong, but I am not sure if this is possible. Often I can fall back to engine.run, but some things only fail during engine.submit, so it would be helpful to also be able to debug engine.submit jobs.

For one thing, I am puzzled by how processes are executed with engine.submit. When I engine.run a job I end up in the plumpy.processes.Process.execute function, but it appears that this does not happen in the engine.submit case (I write to a file at an absolute path as a check). I tried to see where engine.submit goes and, besides the initialization, I could track it to send_task with the message

{'task': 'continue', 'args': {'pid': 354, 'nowait': False, 'tag': None}}

which goes to kiwipy.rmq.threadcomms.RmqThreadCommunicator. Is it possible to see where this task is received? I tried the web interface from rabbitmq_management, but I cannot find anything there. In this stackexchange other alternatives are suggested, but maybe there is already a solution that people have established?

Another problem I have is that the process is run by a daemon worker, so I guess I would need to get into its running instance. I tried verdi daemon start --foreground and verdi daemon worker to see if this somehow allows me to receive kill commands, for which I know where to set breakpoints in the plumpy code, but it does not really work.

sphuber · September 26, 2024, 10:45pm

send_task sends a task to RabbitMQ. RabbitMQ then forwards the task to any worker that is subscribed to the task queue, and that worker deserializes the process from the database and continues running it.
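To make the flow concrete, here is a toy model of it using only the standard library. It is not how AiiDA or kiwipy implement this: a dict stands in for the database, a `queue.Queue` stands in for the RabbitMQ task queue, and the function names are made up for illustration. The point is that the submitting side only sends a small message with the pid, and a separate worker picks it up and runs the process, which is why breakpoints set in the submitting script are never hit.

```python
import json
import queue
import threading

# Toy stand-ins (assumptions for illustration, not AiiDA objects):
# a dict plays the database, a queue.Queue plays the RabbitMQ task queue.
database = {354: {"state": "created"}}
task_queue = queue.Queue()


def send_task(pid):
    """Submitting side: only a small message is sent, not the process itself."""
    message = {"task": "continue", "args": {"pid": pid, "nowait": False, "tag": None}}
    task_queue.put(json.dumps(message))


def worker():
    """Worker side: receive the message, load the process by pid, continue it."""
    message = json.loads(task_queue.get())  # blocks until a task arrives
    pid = message["args"]["pid"]
    process = database[pid]        # stands in for deserializing from the database
    process["state"] = "finished"  # stands in for continuing to run the process
    task_queue.task_done()


# The worker runs in its own thread, separate from the code that submits.
thread = threading.Thread(target=worker)
thread.start()
send_task(354)
thread.join()
print(database[354]["state"])
```

In the real setup the worker is a separate operating system process (the daemon worker), so the debugger has to be attached there rather than to the script that called engine.submit.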

So you are right, it will probably end up in a daemon worker if you are running the daemon. To debug this, I would shut down the daemon, and then run verdi daemon worker which runs a single worker in the foreground. Or rather, you probably want to run the actual python function that the CLI command calls, so that you can call it from a debugger. It should be aiida.engine.daemon.worker.start_daemon_worker.
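A minimal sketch of that second option, assuming a recent aiida-core 2.x where `start_daemon_worker` accepts a `foreground` keyword (please check the signature in your installed version). The helper names `run_under_debugger` and `debug_daemon_worker` are made up for this example, and `load_profile()` without arguments assumes that the default profile is the one the process was submitted with.

```python
import pdb


def run_under_debugger(func, *args, debugger=pdb.runcall, **kwargs):
    """Call ``func`` under a debugger; ``pdb.runcall`` stops at its first line."""
    return debugger(func, *args, **kwargs)


def debug_daemon_worker():
    """Start a single daemon worker in the current process under pdb.

    Assumption: aiida-core 2.x, where ``start_daemon_worker`` takes ``foreground``.
    """
    # Imported lazily so the helper above can be used without AiiDA installed.
    from aiida import load_profile
    from aiida.engine.daemon.worker import start_daemon_worker

    load_profile()  # assumption: the default profile is the one that submitted
    return run_under_debugger(start_daemon_worker, foreground=True)
```

Stop the daemon first, then call `debug_daemon_worker()` from a script or an interactive session. Once pdb stops at the first line of the worker function, set breakpoints in the plumpy or aiida code you are interested in and type `continue`.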

agoscinski · October 4, 2024, 9:04am

Thanks for the reply! Now it works. I had tried this before, but I think my verdi daemon worker did not pick up the jobs because I had a daemon running for some other project that used the same ports for RabbitMQ. It is also important to decrease the number of existing daemon workers to 0, so one can be sure that only the verdi daemon worker picks up the jobs.

Or rather, you probably want to run the actual python function that the CLI command calls, so that you can call it from a debugger. It should be aiida.engine.daemon.worker.start_daemon_worker.

One can also run the verdi commands in debug mode:

python -m pdb $(which verdi) daemon worker

One has to step through some CLI layers, but setting a breakpoint in the aiida source code makes this easy (aiida-core/src/aiida/cmdline/commands/cmd_daemon.py at c7c289d3892bf76894714f53f58b7ce5b0761178 · aiidateam/aiida-core · GitHub). In the end, your suggestion does not require making temporary changes to the existing code base.
