How to debug a workflow stuck in "Waiting" state

Hi all,

I am trying to debug an issue where the top-level WorkChain is stuck in the “Waiting” Process State, even though all the sub-WorkChains have finished successfully (Exit Code 0).

$ verdi process list
    PK  Created    Process label      Process State    Process status
------  ---------  -----------------  ---------------  ----------------
117419  34D ago    ATMOSPEC workflow  ⏵ Waiting

Total results: 1

Report: last time an entry changed state: 1D ago (at 04:53:21 on 2024-01-16)
Report: Checking daemon load... OK
Report: Using 0% of the available daemon worker slots.

I’ve tried all the usual stuff - restarting the daemon and RabbitMQ, running verdi devel rabbitmq tasks analyze --fix with no luck. What else can I do? The workflow itself was launched with aiida-core 2.2. I’ve since upgraded to aiida-core 2.4 since I was seeing some other issue that was fixed in 2.4.

The only other thing is that when this WorkChain was launched, the daemon workers were very busy, so I tried to pause it via verdi process pause 117419. But at that time I got an error that the process was unreachable. So I am not sure if this process was succesfully paused. But since then I’ve issued the verdi process play many times with no effect.

Another observation: When I try to pause to process via verdi process pause, I sometimes get:

Error: failed to pause Process<117419>: deep copying a base Node is not supported

Other times it says Report: request to pause Process<117419> sent

Could you run the following in a shell and report the output

load_node(117419).checkpoint

Another observation: When I try to pause to process via verdi process pause, I sometimes get:

Error: failed to pause Process<117419>: deep copying a base Node is not supported

That I have never seen before, very weird. Don’t have an idea what could be causing that.

1 Like

The resulting output has 12k lines (there are 1000 Caljobs within this Workchain). I am attaching as a file.

checkpoint.txt (319.1 KB)

(side note: I noticed that we were not allowing file uploads so I changed to settings to allow text/document and image files to be uploaded. CC @mbercx)

Back when I was using v2.2 I was getting the full stack trace. I’ll try to get it again so we can further debug this. But it does not seem to be the immediate cause of this issue I guess. For the record, the error message comes from here:

We disabled this because we have a 5 GB storage limit. Is there now at least a size limit? If not, I would revert the change to be honest.

Actually, I think this may very well be the problem. We do not guarantee that processes that have been started with a specific version of AiiDA, will continue to work when AiiDA is upgraded. I now actually remember having addressed a bug in the last year that affected the CalcJob engine. The necessary fix would also change the expected layout of the process checkpoint, and so old checkpoints may no longer be compatible. I will try to remember what the change was and to see if I can remember what the change was, but I am not even sure if we can “reverse” engineer the correct checkpoint file. Do you have caching enabled for these subprocesses? In that case, your best bet would be to simply rerun the workflow I’m afraid.

By the way, did you check the daemon logs for an exception? I expect there should be an exception when the daemon worker that picks up the workflow tries to reconstruct the Process instance from the checkpoint and continues it. Presumable, the checkpoint is corrupt due to the upgrade of aiida-core and the daemon worker fails to continue running it.

Hmm, but I had the issue already in the 2.2 version, i.e. before the upgrade.

I remember checking and did not see it. What’s the best way to double check? Currently the verdi daemon logshow doesn’t show any exceptions. Presumable, if I restart the daemon it should try to pick up the checkpoint right?

Yes, if you hard restart the daemon verdi daemon restart --reset it should pick it up again and try to recreate the process and continue it. verdi daemon logshow essentially just does tail -f so this just shows the latest. You would want to look directly at the file. It should be in .aiida/daemon/log/aiida-{profile}.log

Hmm, after doing this I see nothing in the logs (at info verbosity level).

I don’t unfortunately (this is all happening inside the AiiDAlab Docker image btw). If we convince ourselves that the checkpoint is indeed the problem, then I am happy to just relaunch. But right now I don’t think it is since all the symptoms were already present for the v2.2 aiida-core version.

Looking at the older logs, I see a bunch of these:

12/17/2023 03:16:05 AM <964422> kiwipy.rmq.tasks: [ERROR] Exception occurred while processing task.

But I don’t know if they correspond to the WorkChain in question. This was about the time where I was dealing with How to kill a monster workflow :japanese_ogre:

Okay, this log entry is actually suspicious (I don’t see other entries for this PK)

Error: A subscriber with the process id<117419> already exists, which most likely means this worker is already working on it and this task was sent as a duplicate by mistake. Deleting the task now.

Should I try to run

verdi devel rabbitmq tasks revive

?

But is that a recent message? That error means the tasks was sent out again, erroneously, and this check is preventing that a second daemon worker starts working on the same process, which would be very bad, most likely leading to it excepting. Reviving the task shouldn’t help in this case, because apparently there already was an existing task.

I also wouldn’t ever use the revive command, because that has no security checks built-in. If you have to, stop the daemon and run verdi devel rabbitmq tasks analyze --fix. Or if you are on v2.5 you can run verdi process repair, that is the same command but easier to remember.

As for how to debug further. We would have to check what happens when the task gets picked up. Is that workchain 117419 the only one marked as active by verdi process list? Please stop the daemon and then run verdi devel rabbitmq tasks analyze. What is the output? It should say that there is one active process and it has a corresponding task.

If that is true, then keep the daemon stopped and increase the log level:

verdi config set logging.aiida_loglevel INFO

and start a single daemon worker in the foreground:

verdi daemon worker

This should now “block” the shell as it is running a daemon runner. You should now see log messages that it receives the task of the active process and tries to continue the process. Please report the output.

Yep!

$ verdi process list
     PK  Created    Process label      Process State    Process status
------  ---------  -----------------  ---------------  ----------------
117419  34D ago    ATMOSPEC workflow  ⏵ Waiting

Total results: 1
$ verdi daemon stop
Profile: default
Stopping the daemon... OK

$ verdi devel rabbitmq tasks analyze -v info
...
...
...
Info: RELEASE SAVEPOINT sa_savepoint_1
Info: [no key 0.00040s] {}
Info: Active processes: [117419]
Info: Process tasks: [117419]
Success: No inconsistencies detected between database and RabbitMQ.

$ verdi devel rabbitmq tasks list
117419
$ verdi config set logging.aiida_loglevel INFO
Success: 'logging.aiida_loglevel' set to INFO for 'default' profile

$ verdi daemon worker
/

The worker does not show any output. Perhaps the logging level config does not apply to it? When I run it with increased verbosity explicitly I get

$ verdi daemon worker -v INFO
...
...
...
Info: DbLink.output relationship direction symbol('MANYTOONE')
Info: DbLink.output lazy loading clause :param_1 = db_dbnode.id
Info: DbLink.output will use Session.get() to optimize instance loads
Info: (DbLink|db_dblink) initialize prop id
Info: (DbLink|db_dblink) initialize prop input_id
Info: (DbLink|db_dblink) initialize prop output_id
Info: (DbLink|db_dblink) initialize prop label
Info: (DbLink|db_dblink) initialize prop type
Info: (DbLink|db_dblink) _post_configure_properties() complete
Info: SELECT db_dbsetting.val 
FROM db_dbsetting 
WHERE db_dbsetting.key = %(key_1)s
Info: [generated in 0.00014s] {'key_1': 'repository|uuid'}
Info: ROLLBACK
Info: Pool disposed. Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Info: Pool recreating
Full output
-Info: (Obj|db_object) _configure_property(id, Column)
Info: (Obj|db_object) _configure_property(hashkey, Column)
Info: (Obj|db_object) _configure_property(compressed, Column)
Info: (Obj|db_object) _configure_property(size, Column)
Info: (Obj|db_object) _configure_property(offset, Column)
Info: (Obj|db_object) _configure_property(length, Column)
Info: (Obj|db_object) _configure_property(pack_id, Column)
Info: (Obj|db_object) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_object>, primary_key=True, nullable=False)])
Info: (Obj|db_object) constructed
-Info: (DbSetting|db_dbsetting) _configure_property(id, Column)
Info: (DbSetting|db_dbsetting) _configure_property(key, Column)
Info: (DbSetting|db_dbsetting) _configure_property(val, Column)
Info: (DbSetting|db_dbsetting) _configure_property(description, Column)
Info: (DbSetting|db_dbsetting) _configure_property(time, Column)
Info: (DbSetting|db_dbsetting) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbsetting>, primary_key=True, nullable=False)])
Info: (DbSetting|db_dbsetting) constructed
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(_metadata, Column)
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(aiidauser, RelationshipProperty)
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(dbcomputer, RelationshipProperty)
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(id, Column)
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(aiidauser_id, Column)
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(dbcomputer_id, Column)
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(auth_params, Column)
Info: (DbAuthInfo|db_dbauthinfo) _configure_property(enabled, Column)
Info: (DbAuthInfo|db_dbauthinfo) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbauthinfo>, primary_key=True, nullable=False)])
Info: (DbAuthInfo|db_dbauthinfo) constructed
Info: (DbComputer|db_dbcomputer) _configure_property(_metadata, Column)
Info: (DbComputer|db_dbcomputer) _configure_property(id, Column)
Info: (DbComputer|db_dbcomputer) _configure_property(uuid, Column)
Info: (DbComputer|db_dbcomputer) _configure_property(label, Column)
Info: (DbComputer|db_dbcomputer) _configure_property(hostname, Column)
Info: (DbComputer|db_dbcomputer) _configure_property(description, Column)
Info: (DbComputer|db_dbcomputer) _configure_property(scheduler_type, Column)
Info: (DbComputer|db_dbcomputer) _configure_property(transport_type, Column)
Info: (DbComputer|db_dbcomputer) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbcomputer>, primary_key=True, nullable=False)])
Info: (DbComputer|db_dbcomputer) constructed
Info: (DbUser|db_dbuser) _configure_property(id, Column)
Info: (DbUser|db_dbuser) _configure_property(email, Column)
Info: (DbUser|db_dbuser) _configure_property(first_name, Column)
Info: (DbUser|db_dbuser) _configure_property(last_name, Column)
Info: (DbUser|db_dbuser) _configure_property(institution, Column)
Info: (DbUser|db_dbuser) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbuser>, primary_key=True, nullable=False)])
Info: (DbUser|db_dbuser) constructed
Info: (DbComment|db_dbcomment) _configure_property(dbnode, RelationshipProperty)
Info: (DbComment|db_dbcomment) _configure_property(user, RelationshipProperty)
Info: (DbComment|db_dbcomment) _configure_property(id, Column)
Info: (DbComment|db_dbcomment) _configure_property(uuid, Column)
Info: (DbComment|db_dbcomment) _configure_property(dbnode_id, Column)
Info: (DbComment|db_dbcomment) _configure_property(ctime, Column)
Info: (DbComment|db_dbcomment) _configure_property(mtime, Column)
Info: (DbComment|db_dbcomment) _configure_property(user_id, Column)
Info: (DbComment|db_dbcomment) _configure_property(content, Column)
Info: (DbComment|db_dbcomment) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbcomment>, primary_key=True, nullable=False)])
Info: (DbComment|db_dbcomment) constructed
Info: (DbGroupNode|db_dbgroup_dbnodes) _configure_property(id, Column)
Info: (DbGroupNode|db_dbgroup_dbnodes) _configure_property(dbnode_id, Column)
Info: (DbGroupNode|db_dbgroup_dbnodes) _configure_property(dbgroup_id, Column)
Info: (DbGroupNode|db_dbgroup_dbnodes) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbgroup_dbnodes>, primary_key=True, nullable=False)])
Info: (DbGroupNode|db_dbgroup_dbnodes) constructed
Info: (DbGroup|db_dbgroup) _configure_property(user, RelationshipProperty)
Info: (DbGroup|db_dbgroup) _configure_property(dbnodes, RelationshipProperty)
Info: (DbGroup|db_dbgroup) _configure_property(id, Column)
Info: (DbGroup|db_dbgroup) _configure_property(uuid, Column)
Info: (DbGroup|db_dbgroup) _configure_property(label, Column)
Info: (DbGroup|db_dbgroup) _configure_property(type_string, Column)
Info: (DbGroup|db_dbgroup) _configure_property(time, Column)
Info: (DbGroup|db_dbgroup) _configure_property(description, Column)
Info: (DbGroup|db_dbgroup) _configure_property(extras, Column)
Info: (DbGroup|db_dbgroup) _configure_property(user_id, Column)
Info: (DbGroup|db_dbgroup) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbgroup>, primary_key=True, nullable=False)])
Info: (DbGroup|db_dbgroup) constructed
Info: (DbLog|db_dblog) _configure_property(_metadata, Column)
Info: (DbLog|db_dblog) _configure_property(dbnode, RelationshipProperty)
Info: (DbLog|db_dblog) _configure_property(id, Column)
Info: (DbLog|db_dblog) _configure_property(uuid, Column)
Info: (DbLog|db_dblog) _configure_property(time, Column)
Info: (DbLog|db_dblog) _configure_property(loggername, Column)
Info: (DbLog|db_dblog) _configure_property(levelname, Column)
Info: (DbLog|db_dblog) _configure_property(dbnode_id, Column)
Info: (DbLog|db_dblog) _configure_property(message, Column)
Info: (DbLog|db_dblog) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dblog>, primary_key=True, nullable=False)])
Info: (DbLog|db_dblog) constructed
Info: (DbNode|db_dbnode) _configure_property(dbcomputer, RelationshipProperty)
Info: (DbNode|db_dbnode) _configure_property(user, RelationshipProperty)
Info: (DbNode|db_dbnode) _configure_property(outputs_q, RelationshipProperty)
Info: (DbNode|db_dbnode) _configure_property(id, Column)
Info: (DbNode|db_dbnode) _configure_property(uuid, Column)
Info: (DbNode|db_dbnode) _configure_property(node_type, Column)
Info: (DbNode|db_dbnode) _configure_property(process_type, Column)
Info: (DbNode|db_dbnode) _configure_property(label, Column)
Info: (DbNode|db_dbnode) _configure_property(description, Column)
Info: (DbNode|db_dbnode) _configure_property(ctime, Column)
Info: (DbNode|db_dbnode) _configure_property(mtime, Column)
Info: (DbNode|db_dbnode) _configure_property(attributes, Column)
Info: (DbNode|db_dbnode) _configure_property(extras, Column)
Info: (DbNode|db_dbnode) _configure_property(repository_metadata, Column)
Info: (DbNode|db_dbnode) _configure_property(dbcomputer_id, Column)
Info: (DbNode|db_dbnode) _configure_property(user_id, Column)
Info: (DbNode|db_dbnode) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dbnode>, primary_key=True, nullable=False)])
Info: (DbNode|db_dbnode) constructed
Info: (DbLink|db_dblink) _configure_property(input, RelationshipProperty)
Info: (DbLink|db_dblink) _configure_property(output, RelationshipProperty)
Info: (DbLink|db_dblink) _configure_property(id, Column)
Info: (DbLink|db_dblink) _configure_property(input_id, Column)
Info: (DbLink|db_dblink) _configure_property(output_id, Column)
Info: (DbLink|db_dblink) _configure_property(label, Column)
Info: (DbLink|db_dblink) _configure_property(type, Column)
Info: (DbLink|db_dblink) Identified primary key columns: ColumnSet([Column('id', Integer(), table=<db_dblink>, primary_key=True, nullable=False)])
Info: (DbLink|db_dblink) constructed
Info: select pg_catalog.version()
Info: [raw sql] {}
Info: select current_schema()
Info: [raw sql] {}
Info: show standard_conforming_strings
Info: [raw sql] {}
Info: BEGIN (implicit)
Info: select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where pg_catalog.pg_table_is_visible(c.oid) and relname=%(name)s
Info: [generated in 0.00024s] {'name': 'alembic_version'}
Info: Context impl PostgresqlImpl.
Info: Will assume transactional DDL.
Info: select relname from pg_class c join pg_namespace n on n.oid=c.relnamespace where pg_catalog.pg_table_is_visible(c.oid) and relname=%(name)s
Info: [cached since 0.0306s ago] {'name': 'alembic_version'}
Info: SELECT alembic_version.version_num 
FROM alembic_version
Info: [generated in 0.00011s] {}
Info: (DbSetting|db_dbsetting) _post_configure_properties() started
Info: (DbSetting|db_dbsetting) initialize prop id
Info: (DbSetting|db_dbsetting) initialize prop key
Info: (DbSetting|db_dbsetting) initialize prop val
Info: (DbSetting|db_dbsetting) initialize prop description
Info: (DbSetting|db_dbsetting) initialize prop time
Info: (DbSetting|db_dbsetting) _post_configure_properties() complete
Info: (DbAuthInfo|db_dbauthinfo) _post_configure_properties() started
Info: (DbAuthInfo|db_dbauthinfo) initialize prop _metadata
Info: (DbAuthInfo|db_dbauthinfo) initialize prop aiidauser
Info: DbAuthInfo.aiidauser setup primary join db_dbuser.id = db_dbauthinfo.aiidauser_id
Info: DbAuthInfo.aiidauser setup secondary join None
Info: DbAuthInfo.aiidauser synchronize pairs [(db_dbuser.id => db_dbauthinfo.aiidauser_id)]
Info: DbAuthInfo.aiidauser secondary synchronize pairs []
Info: DbAuthInfo.aiidauser local/remote pairs [(db_dbauthinfo.aiidauser_id / db_dbuser.id)]
Info: DbAuthInfo.aiidauser remote columns [db_dbuser.id]
Info: DbAuthInfo.aiidauser local columns [db_dbauthinfo.aiidauser_id]
Info: DbAuthInfo.aiidauser relationship direction symbol('MANYTOONE')
Info: (DbUser|db_dbuser) _configure_property(authinfos, RelationshipProperty)
Info: DbUser.authinfos setup primary join db_dbuser.id = db_dbauthinfo.aiidauser_id
Info: DbUser.authinfos setup secondary join None
Info: DbUser.authinfos synchronize pairs [(db_dbuser.id => db_dbauthinfo.aiidauser_id)]
Info: DbUser.authinfos secondary synchronize pairs []
Info: DbUser.authinfos local/remote pairs [(db_dbuser.id / db_dbauthinfo.aiidauser_id)]
Info: DbUser.authinfos remote columns [db_dbauthinfo.aiidauser_id]
Info: DbUser.authinfos local columns [db_dbuser.id]
Info: DbUser.authinfos relationship direction symbol('ONETOMANY')
Info: DbUser.authinfos lazy loading clause :param_1 = db_dbauthinfo.aiidauser_id
Info: DbAuthInfo.aiidauser lazy loading clause db_dbuser.id = :param_1
Info: DbAuthInfo.aiidauser will use Session.get() to optimize instance loads
Info: (DbAuthInfo|db_dbauthinfo) initialize prop dbcomputer
Info: DbAuthInfo.dbcomputer setup primary join db_dbcomputer.id = db_dbauthinfo.dbcomputer_id
Info: DbAuthInfo.dbcomputer setup secondary join None
Info: DbAuthInfo.dbcomputer synchronize pairs [(db_dbcomputer.id => db_dbauthinfo.dbcomputer_id)]
Info: DbAuthInfo.dbcomputer secondary synchronize pairs []
Info: DbAuthInfo.dbcomputer local/remote pairs [(db_dbauthinfo.dbcomputer_id / db_dbcomputer.id)]
Info: DbAuthInfo.dbcomputer remote columns [db_dbcomputer.id]
Info: DbAuthInfo.dbcomputer local columns [db_dbauthinfo.dbcomputer_id]
Info: DbAuthInfo.dbcomputer relationship direction symbol('MANYTOONE')
Info: (DbComputer|db_dbcomputer) _configure_property(authinfos, RelationshipProperty)
Info: DbComputer.authinfos setup primary join db_dbcomputer.id = db_dbauthinfo.dbcomputer_id
Info: DbComputer.authinfos setup secondary join None
Info: DbComputer.authinfos synchronize pairs [(db_dbcomputer.id => db_dbauthinfo.dbcomputer_id)]
Info: DbComputer.authinfos secondary synchronize pairs []
Info: DbComputer.authinfos local/remote pairs [(db_dbcomputer.id / db_dbauthinfo.dbcomputer_id)]
Info: DbComputer.authinfos remote columns [db_dbauthinfo.dbcomputer_id]
Info: DbComputer.authinfos local columns [db_dbcomputer.id]
Info: DbComputer.authinfos relationship direction symbol('ONETOMANY')
Info: DbComputer.authinfos lazy loading clause :param_1 = db_dbauthinfo.dbcomputer_id
Info: DbAuthInfo.dbcomputer lazy loading clause db_dbcomputer.id = :param_1
Info: DbAuthInfo.dbcomputer will use Session.get() to optimize instance loads
Info: (DbAuthInfo|db_dbauthinfo) initialize prop id
Info: (DbAuthInfo|db_dbauthinfo) initialize prop aiidauser_id
Info: (DbAuthInfo|db_dbauthinfo) initialize prop dbcomputer_id
Info: (DbAuthInfo|db_dbauthinfo) initialize prop auth_params
Info: (DbAuthInfo|db_dbauthinfo) initialize prop enabled
Info: (DbAuthInfo|db_dbauthinfo) _post_configure_properties() complete
Info: (DbComputer|db_dbcomputer) _post_configure_properties() started
Info: (DbComputer|db_dbcomputer) initialize prop _metadata
Info: (DbComputer|db_dbcomputer) initialize prop id
Info: (DbComputer|db_dbcomputer) initialize prop uuid
Info: (DbComputer|db_dbcomputer) initialize prop label
Info: (DbComputer|db_dbcomputer) initialize prop hostname
Info: (DbComputer|db_dbcomputer) initialize prop description
Info: (DbComputer|db_dbcomputer) initialize prop scheduler_type
Info: (DbComputer|db_dbcomputer) initialize prop transport_type
Info: (DbComputer|db_dbcomputer) initialize prop authinfos
Info: (DbComputer|db_dbcomputer) _post_configure_properties() complete
Info: (DbUser|db_dbuser) _post_configure_properties() started
Info: (DbUser|db_dbuser) initialize prop id
Info: (DbUser|db_dbuser) initialize prop email
Info: (DbUser|db_dbuser) initialize prop first_name
Info: (DbUser|db_dbuser) initialize prop last_name
Info: (DbUser|db_dbuser) initialize prop institution
Info: (DbUser|db_dbuser) initialize prop authinfos
Info: (DbUser|db_dbuser) _post_configure_properties() complete
Info: (DbComment|db_dbcomment) _post_configure_properties() started
Info: (DbComment|db_dbcomment) initialize prop dbnode
Info: DbComment.dbnode setup primary join db_dbnode.id = db_dbcomment.dbnode_id
Info: DbComment.dbnode setup secondary join None
Info: DbComment.dbnode synchronize pairs [(db_dbnode.id => db_dbcomment.dbnode_id)]
Info: DbComment.dbnode secondary synchronize pairs []
Info: DbComment.dbnode local/remote pairs [(db_dbcomment.dbnode_id / db_dbnode.id)]
Info: DbComment.dbnode remote columns [db_dbnode.id]
Info: DbComment.dbnode local columns [db_dbcomment.dbnode_id]
Info: DbComment.dbnode relationship direction symbol('MANYTOONE')
Info: (DbNode|db_dbnode) _configure_property(dbcomments, RelationshipProperty)
Info: DbNode.dbcomments setup primary join db_dbnode.id = db_dbcomment.dbnode_id
Info: DbNode.dbcomments setup secondary join None
Info: DbNode.dbcomments synchronize pairs [(db_dbnode.id => db_dbcomment.dbnode_id)]
Info: DbNode.dbcomments secondary synchronize pairs []
Info: DbNode.dbcomments local/remote pairs [(db_dbnode.id / db_dbcomment.dbnode_id)]
Info: DbNode.dbcomments remote columns [db_dbcomment.dbnode_id]
Info: DbNode.dbcomments local columns [db_dbnode.id]
Info: DbNode.dbcomments relationship direction symbol('ONETOMANY')
Info: DbNode.dbcomments lazy loading clause :param_1 = db_dbcomment.dbnode_id
Info: DbComment.dbnode lazy loading clause db_dbnode.id = :param_1
Info: DbComment.dbnode will use Session.get() to optimize instance loads
Info: (DbComment|db_dbcomment) initialize prop user
Info: DbComment.user setup primary join db_dbuser.id = db_dbcomment.user_id
Info: DbComment.user setup secondary join None
Info: DbComment.user synchronize pairs [(db_dbuser.id => db_dbcomment.user_id)]
Info: DbComment.user secondary synchronize pairs []
Info: DbComment.user local/remote pairs [(db_dbcomment.user_id / db_dbuser.id)]
Info: DbComment.user remote columns [db_dbuser.id]
Info: DbComment.user local columns [db_dbcomment.user_id]
Info: DbComment.user relationship direction symbol('MANYTOONE')
Info: DbComment.user lazy loading clause db_dbuser.id = :param_1
Info: DbComment.user will use Session.get() to optimize instance loads
Info: (DbComment|db_dbcomment) initialize prop id
Info: (DbComment|db_dbcomment) initialize prop uuid
Info: (DbComment|db_dbcomment) initialize prop dbnode_id
Info: (DbComment|db_dbcomment) initialize prop ctime
Info: (DbComment|db_dbcomment) initialize prop mtime
Info: (DbComment|db_dbcomment) initialize prop user_id
Info: (DbComment|db_dbcomment) initialize prop content
Info: (DbComment|db_dbcomment) _post_configure_properties() complete
Info: (DbGroupNode|db_dbgroup_dbnodes) _post_configure_properties() started
Info: (DbGroupNode|db_dbgroup_dbnodes) initialize prop id
Info: (DbGroupNode|db_dbgroup_dbnodes) initialize prop dbnode_id
Info: (DbGroupNode|db_dbgroup_dbnodes) initialize prop dbgroup_id
Info: (DbGroupNode|db_dbgroup_dbnodes) _post_configure_properties() complete
Info: (DbGroup|db_dbgroup) _post_configure_properties() started
Info: (DbGroup|db_dbgroup) initialize prop user
Info: DbGroup.user setup primary join db_dbuser.id = db_dbgroup.user_id
Info: DbGroup.user setup secondary join None
Info: DbGroup.user synchronize pairs [(db_dbuser.id => db_dbgroup.user_id)]
Info: DbGroup.user secondary synchronize pairs []
Info: DbGroup.user local/remote pairs [(db_dbgroup.user_id / db_dbuser.id)]
Info: DbGroup.user remote columns [db_dbuser.id]
Info: DbGroup.user local columns [db_dbgroup.user_id]
Info: DbGroup.user relationship direction symbol('MANYTOONE')
Info: (DbUser|db_dbuser) _configure_property(dbgroups, RelationshipProperty)
Info: DbUser.dbgroups setup primary join db_dbuser.id = db_dbgroup.user_id
Info: DbUser.dbgroups setup secondary join None
Info: DbUser.dbgroups synchronize pairs [(db_dbuser.id => db_dbgroup.user_id)]
Info: DbUser.dbgroups secondary synchronize pairs []
Info: DbUser.dbgroups local/remote pairs [(db_dbuser.id / db_dbgroup.user_id)]
Info: DbUser.dbgroups remote columns [db_dbgroup.user_id]
Info: DbUser.dbgroups local columns [db_dbuser.id]
Info: DbUser.dbgroups relationship direction symbol('ONETOMANY')
Info: DbUser.dbgroups lazy loading clause :param_1 = db_dbgroup.user_id
Info: DbGroup.user lazy loading clause db_dbuser.id = :param_1
Info: DbGroup.user will use Session.get() to optimize instance loads
Info: (DbGroup|db_dbgroup) initialize prop dbnodes
Info: DbGroup.dbnodes setup primary join db_dbgroup.id = db_dbgroup_dbnodes.dbgroup_id
Info: DbGroup.dbnodes setup secondary join db_dbnode.id = db_dbgroup_dbnodes.dbnode_id
Info: DbGroup.dbnodes synchronize pairs [(db_dbgroup.id => db_dbgroup_dbnodes.dbgroup_id)]
Info: DbGroup.dbnodes secondary synchronize pairs [(db_dbnode.id => db_dbgroup_dbnodes.dbnode_id)]
Info: DbGroup.dbnodes local/remote pairs [(db_dbgroup.id / db_dbgroup_dbnodes.dbgroup_id),(db_dbnode.id / db_dbgroup_dbnodes.dbnode_id)]
Info: DbGroup.dbnodes remote columns [db_dbgroup_dbnodes.dbgroup_id,db_dbgroup_dbnodes.dbnode_id]
Info: DbGroup.dbnodes local columns [db_dbgroup.id]
Info: DbGroup.dbnodes relationship direction symbol('MANYTOMANY')
Info: (DbNode|db_dbnode) _configure_property(dbgroups, RelationshipProperty)
Info: DbNode.dbgroups setup primary join db_dbnode.id = db_dbgroup_dbnodes.dbnode_id
Info: DbNode.dbgroups setup secondary join db_dbgroup.id = db_dbgroup_dbnodes.dbgroup_id
Info: DbNode.dbgroups synchronize pairs [(db_dbnode.id => db_dbgroup_dbnodes.dbnode_id)]
Info: DbNode.dbgroups secondary synchronize pairs [(db_dbgroup.id => db_dbgroup_dbnodes.dbgroup_id)]
Info: DbNode.dbgroups local/remote pairs [(db_dbnode.id / db_dbgroup_dbnodes.dbnode_id),(db_dbgroup.id / db_dbgroup_dbnodes.dbgroup_id)]
Info: DbNode.dbgroups remote columns [db_dbgroup_dbnodes.dbgroup_id,db_dbgroup_dbnodes.dbnode_id]
Info: DbNode.dbgroups local columns [db_dbnode.id]
Info: DbNode.dbgroups relationship direction symbol('MANYTOMANY')
Info: DbNode.dbgroups lazy loading clause :param_1 = db_dbgroup_dbnodes.dbnode_id AND db_dbgroup.id = db_dbgroup_dbnodes.dbgroup_id
Info: DbGroup.dbnodes lazy loading clause :param_1 = db_dbgroup_dbnodes.dbgroup_id AND db_dbnode.id = db_dbgroup_dbnodes.dbnode_id
Info: (DbGroup|db_dbgroup) initialize prop id
Info: (DbGroup|db_dbgroup) initialize prop uuid
Info: (DbGroup|db_dbgroup) initialize prop label
Info: (DbGroup|db_dbgroup) initialize prop type_string
Info: (DbGroup|db_dbgroup) initialize prop time
Info: (DbGroup|db_dbgroup) initialize prop description
Info: (DbGroup|db_dbgroup) initialize prop extras
Info: (DbGroup|db_dbgroup) initialize prop user_id
Info: (DbGroup|db_dbgroup) _post_configure_properties() complete
Info: (DbLog|db_dblog) _post_configure_properties() started
Info: (DbLog|db_dblog) initialize prop _metadata
Info: (DbLog|db_dblog) initialize prop dbnode
Info: DbLog.dbnode setup primary join db_dbnode.id = db_dblog.dbnode_id
Info: DbLog.dbnode setup secondary join None
Info: DbLog.dbnode synchronize pairs [(db_dbnode.id => db_dblog.dbnode_id)]
Info: DbLog.dbnode secondary synchronize pairs []
Info: DbLog.dbnode local/remote pairs [(db_dblog.dbnode_id / db_dbnode.id)]
Info: DbLog.dbnode remote columns [db_dbnode.id]
Info: DbLog.dbnode local columns [db_dblog.dbnode_id]
Info: DbLog.dbnode relationship direction symbol('MANYTOONE')
Info: (DbNode|db_dbnode) _configure_property(dblogs, RelationshipProperty)
Info: DbNode.dblogs setup primary join db_dbnode.id = db_dblog.dbnode_id
Info: DbNode.dblogs setup secondary join None
Info: DbNode.dblogs synchronize pairs [(db_dbnode.id => db_dblog.dbnode_id)]
Info: DbNode.dblogs secondary synchronize pairs []
Info: DbNode.dblogs local/remote pairs [(db_dbnode.id / db_dblog.dbnode_id)]
Info: DbNode.dblogs remote columns [db_dblog.dbnode_id]
Info: DbNode.dblogs local columns [db_dbnode.id]
Info: DbNode.dblogs relationship direction symbol('ONETOMANY')
Info: DbNode.dblogs lazy loading clause :param_1 = db_dblog.dbnode_id
Info: DbLog.dbnode lazy loading clause db_dbnode.id = :param_1
Info: DbLog.dbnode will use Session.get() to optimize instance loads
Info: (DbLog|db_dblog) initialize prop id
Info: (DbLog|db_dblog) initialize prop uuid
Info: (DbLog|db_dblog) initialize prop time
Info: (DbLog|db_dblog) initialize prop loggername
Info: (DbLog|db_dblog) initialize prop levelname
Info: (DbLog|db_dblog) initialize prop dbnode_id
Info: (DbLog|db_dblog) initialize prop message
Info: (DbLog|db_dblog) _post_configure_properties() complete
Info: (DbNode|db_dbnode) _post_configure_properties() started
Info: (DbNode|db_dbnode) initialize prop dbcomputer
Info: DbNode.dbcomputer setup primary join db_dbcomputer.id = db_dbnode.dbcomputer_id
Info: DbNode.dbcomputer setup secondary join None
Info: DbNode.dbcomputer synchronize pairs [(db_dbcomputer.id => db_dbnode.dbcomputer_id)]
Info: DbNode.dbcomputer secondary synchronize pairs []
Info: DbNode.dbcomputer local/remote pairs [(db_dbnode.dbcomputer_id / db_dbcomputer.id)]
Info: DbNode.dbcomputer remote columns [db_dbcomputer.id]
Info: DbNode.dbcomputer local columns [db_dbnode.dbcomputer_id]
Info: DbNode.dbcomputer relationship direction symbol('MANYTOONE')
Info: (DbComputer|db_dbcomputer) _configure_property(dbnodes, RelationshipProperty)
Info: DbComputer.dbnodes setup primary join db_dbcomputer.id = db_dbnode.dbcomputer_id
Info: DbComputer.dbnodes setup secondary join None
Info: DbComputer.dbnodes synchronize pairs [(db_dbcomputer.id => db_dbnode.dbcomputer_id)]
Info: DbComputer.dbnodes secondary synchronize pairs []
Info: DbComputer.dbnodes local/remote pairs [(db_dbcomputer.id / db_dbnode.dbcomputer_id)]
Info: DbComputer.dbnodes remote columns [db_dbnode.dbcomputer_id]
Info: DbComputer.dbnodes local columns [db_dbcomputer.id]
Info: DbComputer.dbnodes relationship direction symbol('ONETOMANY')
Info: DbComputer.dbnodes lazy loading clause :param_1 = db_dbnode.dbcomputer_id
Info: DbNode.dbcomputer lazy loading clause db_dbcomputer.id = :param_1
Info: DbNode.dbcomputer will use Session.get() to optimize instance loads
Info: (DbNode|db_dbnode) initialize prop user
Info: DbNode.user setup primary join db_dbuser.id = db_dbnode.user_id
Info: DbNode.user setup secondary join None
Info: DbNode.user synchronize pairs [(db_dbuser.id => db_dbnode.user_id)]
Info: DbNode.user secondary synchronize pairs []
Info: DbNode.user local/remote pairs [(db_dbnode.user_id / db_dbuser.id)]
Info: DbNode.user remote columns [db_dbuser.id]
Info: DbNode.user local columns [db_dbnode.user_id]
Info: DbNode.user relationship direction symbol('MANYTOONE')
Info: (DbUser|db_dbuser) _configure_property(dbnodes, RelationshipProperty)
Info: DbUser.dbnodes setup primary join db_dbuser.id = db_dbnode.user_id
Info: DbUser.dbnodes setup secondary join None
Info: DbUser.dbnodes synchronize pairs [(db_dbuser.id => db_dbnode.user_id)]
Info: DbUser.dbnodes secondary synchronize pairs []
Info: DbUser.dbnodes local/remote pairs [(db_dbuser.id / db_dbnode.user_id)]
Info: DbUser.dbnodes remote columns [db_dbnode.user_id]
Info: DbUser.dbnodes local columns [db_dbuser.id]
Info: DbUser.dbnodes relationship direction symbol('ONETOMANY')
Info: DbUser.dbnodes lazy loading clause :param_1 = db_dbnode.user_id
Info: DbNode.user lazy loading clause db_dbuser.id = :param_1
Info: DbNode.user will use Session.get() to optimize instance loads
Info: (DbNode|db_dbnode) initialize prop outputs_q
Info: DbNode.outputs_q setup primary join db_dbnode.id = db_dblink.input_id
Info: DbNode.outputs_q setup secondary join db_dbnode.id = db_dblink.output_id
Info: DbNode.outputs_q synchronize pairs [(db_dbnode.id => db_dblink.input_id)]
Info: DbNode.outputs_q secondary synchronize pairs [(db_dbnode.id => db_dblink.output_id)]
Info: DbNode.outputs_q local/remote pairs [(db_dbnode.id / db_dblink.input_id),(db_dbnode.id / db_dblink.output_id)]
Info: DbNode.outputs_q remote columns [db_dblink.input_id,db_dblink.output_id]
Info: DbNode.outputs_q local columns [db_dbnode.id]
Info: DbNode.outputs_q relationship direction symbol('MANYTOMANY')
Info: (DbNode|db_dbnode) _configure_property(inputs_q, RelationshipProperty)
Info: DbNode.inputs_q setup primary join db_dbnode.id = db_dblink.output_id
Info: DbNode.inputs_q setup secondary join db_dbnode.id = db_dblink.input_id
Info: DbNode.inputs_q synchronize pairs [(db_dbnode.id => db_dblink.output_id)]
Info: DbNode.inputs_q secondary synchronize pairs [(db_dbnode.id => db_dblink.input_id)]
Info: DbNode.inputs_q local/remote pairs [(db_dbnode.id / db_dblink.output_id),(db_dbnode.id / db_dblink.input_id)]
Info: DbNode.inputs_q remote columns [db_dblink.input_id,db_dblink.output_id]
Info: DbNode.inputs_q local columns [db_dbnode.id]
Info: DbNode.inputs_q relationship direction symbol('MANYTOMANY')
Info: DbNode.inputs_q lazy loading clause :param_1 = db_dblink.output_id AND db_dbnode.id = db_dblink.input_id
Info: DbNode.outputs_q lazy loading clause :param_1 = db_dblink.input_id AND db_dbnode.id = db_dblink.output_id
Info: (DbNode|db_dbnode) initialize prop id
Info: (DbNode|db_dbnode) initialize prop uuid
Info: (DbNode|db_dbnode) initialize prop node_type
Info: (DbNode|db_dbnode) initialize prop process_type
Info: (DbNode|db_dbnode) initialize prop label
Info: (DbNode|db_dbnode) initialize prop description
Info: (DbNode|db_dbnode) initialize prop ctime
Info: (DbNode|db_dbnode) initialize prop mtime
Info: (DbNode|db_dbnode) initialize prop attributes
Info: (DbNode|db_dbnode) initialize prop extras
Info: (DbNode|db_dbnode) initialize prop repository_metadata
Info: (DbNode|db_dbnode) initialize prop dbcomputer_id
Info: (DbNode|db_dbnode) initialize prop user_id
Info: (DbNode|db_dbnode) initialize prop dbcomments
Info: (DbNode|db_dbnode) initialize prop dbgroups
Info: (DbNode|db_dbnode) initialize prop dblogs
Info: (DbNode|db_dbnode) _post_configure_properties() complete
Info: (DbLink|db_dblink) _post_configure_properties() started
Info: (DbLink|db_dblink) initialize prop input
Info: DbLink.input setup primary join db_dblink.input_id = db_dbnode.id
Info: DbLink.input setup secondary join None
Info: DbLink.input synchronize pairs [(db_dbnode.id => db_dblink.input_id)]
Info: DbLink.input secondary synchronize pairs []
Info: DbLink.input local/remote pairs [(db_dblink.input_id / db_dbnode.id)]
Info: DbLink.input remote columns [db_dbnode.id]
Info: DbLink.input local columns [db_dblink.input_id]
Info: DbLink.input relationship direction symbol('MANYTOONE')
Info: DbLink.input lazy loading clause :param_1 = db_dbnode.id
Info: DbLink.input will use Session.get() to optimize instance loads
Info: (DbLink|db_dblink) initialize prop output
Info: DbLink.output setup primary join db_dblink.output_id = db_dbnode.id
Info: DbLink.output setup secondary join None
Info: DbLink.output synchronize pairs [(db_dbnode.id => db_dblink.output_id)]
Info: DbLink.output secondary synchronize pairs []
Info: DbLink.output local/remote pairs [(db_dblink.output_id / db_dbnode.id)]
Info: DbLink.output remote columns [db_dbnode.id]
Info: DbLink.output local columns [db_dblink.output_id]
Info: DbLink.output relationship direction symbol('MANYTOONE')
Info: DbLink.output lazy loading clause :param_1 = db_dbnode.id
Info: DbLink.output will use Session.get() to optimize instance loads
Info: (DbLink|db_dblink) initialize prop id
Info: (DbLink|db_dblink) initialize prop input_id
Info: (DbLink|db_dblink) initialize prop output_id
Info: (DbLink|db_dblink) initialize prop label
Info: (DbLink|db_dblink) initialize prop type
Info: (DbLink|db_dblink) _post_configure_properties() complete
Info: SELECT db_dbsetting.val 
FROM db_dbsetting 
WHERE db_dbsetting.key = %(key_1)s
Info: [generated in 0.00014s] {'key_1': 'repository|uuid'}
Info: ROLLBACK
Info: Pool disposed. Pool size: 5  Connections in pool: 0 Current Overflow: -5 Current Checked out connections: 0
Info: Pool recreating

Is playing and pausing the process going through the daemon worker? When I pause and subsequently play the process, I can see the Process status changing in verdi process list but I don’t see anything pop up in the daemon worker log.

Is playing and pausing the process going through the daemon worker? When I pause and subsequently play the process, I can see the Process status changing in verdi process list but I don’t see anything pop up in the daemon worker log.

Yes, if the process is not with a daemon worker, you would get a UnreachableProcessError. So the fact that works, and the fact that verdi devel rabbitmq tasks analyze says there are no problems, suggests to me that everything is fine on this level and the task is making it to the daemon worker.

What I don’t understand is that if there is some problem with continuing the process, we would see some kind of an error. Could you please update this method in your installation: aiida-core/src/aiida/manage/external/rmq/launcher.py at 06ea130df8854f621e25853af6ac723c37397ed0 · aiidateam/aiida-core · GitHub
Add some print statements just before and after the _continue call on line 86. Then repeat the steps of stopping the daemon and starting verdi daemon worker. This should cause the print statements to show in your terminal and we can see if it gets to and past the self._continue call or not.

1 Like

I don’t see any extra output when I do that so looks like we’re not reaching this function?

EDIT: I do see some output when I use a print statement and not a LOGGER, very strange. Let me investigate a bit more.

EDIT2: I see with printstament that I get to the super()._continue call, but I don’t see anything after that so it seems like it’s stuck somewhere in the parent method.