Getting "database is locked" error

Hi everyone!

I just tried to run a script that submits several workchains. I designed it long ago and it worked for previous AiiDA versions. The error I am getting reads:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: UPDATE db_dbsetting SET val=?, time=? WHERE db_dbsetting."key" = ?]
[parameters: ('"2024-07-16T17:14:33.810058+02:00"', '2024-07-16 17:14:33.810624', 'process|state_change|calculation')]
(Background on this error at: Error Messages — SQLAlchemy 2.0 Documentation)

While writing this post, I got the suggestion to read this one: sqlite_dos backend sqlite3.OperationalError: database is locked error

And in fact, from what I see, it seems that we are having exactly the same problem.

Giovanni Pizzi commented:

but here multiple tasks are updating the same field, so for sure one write is locking the DB until it finishes writing.

I am afraid that this might be my case since my script is basically submitting 4 workchains and waiting for one of them to finish to keep submitting more workchains.

Giovanni also added:

It would be useful to have a bit more information on how you were running (were you running in parallel from multiple Python scripts? Or submitting many workflows? Or running a workflow that internally runs a lot of parallel steps?), to understand where the source of concurrency is.

In case it helps, I am using a single Python script that has at most 4 workchains running simultaneously. It starts by submitting 4 workchains and then checks their status, submitting another workchain whenever one of them finishes. I can give more details if necessary.
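In pseudocode, the loop looks roughly like this (a generic, self-contained sketch; the helper functions are placeholders standing in for the real AiiDA submit and state-check calls):

```python
import time

# Placeholder stand-ins for the real AiiDA calls (submitting a workchain
# and checking whether it has finished); the names are illustrative only.
def submit_next_workchain(job_queue):
    return job_queue.pop(0)

def is_finished(handle):
    return handle["done"]

def run_with_max_concurrency(job_queue, max_running=4, poll_interval=0.0):
    """Keep at most `max_running` workchains active at any time."""
    running, completed = [], []
    while job_queue or running:
        # Top up to the concurrency limit.
        while job_queue and len(running) < max_running:
            running.append(submit_next_workchain(job_queue))
        # Move finished workchains out of the running set.
        completed += [h for h in running if is_finished(h)]
        running = [h for h in running if not is_finished(h)]
        time.sleep(poll_interval)
    return completed
```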

So, does anyone know how to solve this issue without introducing changes to the workchain or the Python script?

Thanks in advance

Jaime

Hi Jaime, thanks for the report. This does indeed seem to be a confirmed bug for the sqlite_dos backend, which is brand new and a bit experimental. Are you running more than 1 daemon worker by any chance? I think that if there is just a single worker, the problem might not occur.

Thanks!

More than a bug, I would say this highlights a missing feature of SQLite, i.e., support for concurrent writes to the DB (which cannot be obtained easily with a file-based DB, though such a DB has the advantage of simplicity). Having said that, of course this needs to be somehow fixed, or at least the limitation documented; otherwise it is indeed a bug from the point of view of the users.

Now, I have the feeling that this specific error might be the most frequent concurrent-write situation people might encounter, and if we prevent it, we already solve most of the problems, or at least this will allow us to discover the next concurrent-write issue.

So I have a practical suggestion.
The ‘process|state_change|calculation’ (in the DbSetting) stores the last time a process changed state.
This is always done by the engine, IIRC, even if not running with the daemon. The main goal is to have some useful information, efficient to access, that gives a sense of whether the daemon is running. The main (only?) place where this is used is at the end of verdi process list, so a user can quickly realize that the daemon is not running, if that is the case.

Now, even if in the latest release one can combine SQLite with RMQ, I think in most cases people run SQLite without RMQ, so they do not run via a daemon (@jgarridoa - is this your case? Or are you using RMQ + starting a daemon?).
So, if you are running in the same interpreter, the information is not really useful/needed, as you know if the processes are running.

Moreover, even if people are running with a daemon, I don’t see an obvious way to reduce concurrency of writes (I mean, there are ways, e.g. adding a new line for every state change; but this is going to have even worse performance issues after some days of use, with a huge table of all state changes - we used to have this in AiiDA 0.x).

So my suggestion is, for the SQLite backend, to drop this feature.
This can be implemented with a backend class attribute that defines which features we support, e.g. backend.store_last_process_change = False in SQLite.

Then, if this flag is False, the engine does not attempt to store the state change.
Essentially, this would need to be implemented only here.
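A minimal sketch of what I have in mind, with illustrative class and attribute names (not the actual aiida-core API):

```python
class StorageBackend:
    """Base backend: by default the feature is supported."""
    store_last_process_change = True

    def set_process_state_change_timestamp(self, timestamp):
        # Imagine this performs the UPDATE on the DbSetting table.
        self._last_change = timestamp

class SqliteBackend(StorageBackend):
    # SQLite cannot handle these concurrent writes, so opt out.
    store_last_process_change = False

def record_state_change(backend, timestamp):
    """Engine-side guard: skip the write when the backend opts out."""
    if not backend.store_last_process_change:
        return
    backend.set_process_state_change_timestamp(timestamp)
```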

One would then probably also change the verdi process list code to check the same flag, and either report nothing at the end, or issue a warning that says something like “With SQLite, the last state change is not reported.”
This needs to be done here, probably.

Finally, very importantly, we need to document the list of limitations in the docs (I don’t remember if we already have a page with the comparison of the backends, if not we should create it - I think @yakutovicha @Xing worked on it about a year ago, not sure where this is?), and add this as a limitation.

Any suggestions?

@jgarridoa if you want to try a quick hack, you could try to just go here in your code, and put a return as the first line of the set_process_state_change_timestamp function, so the function never does anything. Then, try to run again and it would be great to report if you encounter other issues, or if this solves all problems.
If this works, we’ll open an issue and discuss/plan for implementing this in AiiDA.
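If editing the installed source file is inconvenient, the same effect can also be obtained by overriding the function at runtime from the top of your submission script. The following self-contained sketch demonstrates the idea with a stand-in module (at the time of writing the real function lives in aiida.engine.utils, but that path is an assumption you should verify for your version):

```python
import types

# Stand-in for the engine utils module; in the real one,
# set_process_state_change_timestamp writes a row to DbSetting,
# which is exactly the write we want to suppress.
engine_utils = types.ModuleType("engine_utils")
engine_utils.writes = []

def _original(process_type):
    engine_utils.writes.append(process_type)  # mimics the DB write

engine_utils.set_process_state_change_timestamp = _original

# The quick hack, applied at runtime instead of editing the source:
# replace the function with a no-op before submitting anything.
engine_utils.set_process_state_change_timestamp = lambda *args, **kwargs: None

# Any later call through the module is now harmless.
engine_utils.set_process_state_change_timestamp("calculation")
```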

Finally, a note about this:

Are you aware of the aiida-submission-controller? Even if you now don’t want to migrate your scripts to it, we would be very interested if the submission controller would work for your use case or if you see features missing, as we are planning to officially recommend it when you want to submit max X work chains at any given time, and submit more only when some have finished, keeping the total number <=X.

Are you running more than 1 daemon worker by any chance?

Nope, I am using only one

(@jgarridoa - is this your case? Or are you using RMQ + starting a daemon?)

Yes, this is my case. I wanted to have a similar experience to the previous AiiDA versions, so I chose to use RMQ as well. I opted for SQLite instead of PostgreSQL just to check whether it would feel simpler without any significant disadvantage. Now that I know this, I guess I will try to move to PostgreSQL.

@jgarridoa if you want to try a quick hack, you could try to just go here in your code, and put a return as the first line of the set_process_state_change_timestamp function, so the function never does anything. Then, try to run again and it would be great to report if you encounter other issues, or if this solves all problems.
If this works, we’ll open an issue and discuss/plan for implementing this in AiiDA.

Let me try. I will post again reporting the result.

Are you aware of the aiida-submission-controller? Even if you now don’t want to migrate your scripts to it, we would be very interested if the submission controller would work for your use case or if you see features missing, as we are planning to officially recommend it when you want to submit max X work chains at any given time, and submit more only when some have finished, keeping the total number <=X.

Yes, I am aware. In fact, I was encouraged to use it in this post.
I decided not to use it because, for my use case, it was faster to write my own Python script than to learn how to use someone else’s work. I will probably migrate my scripts in the future, when I have more time.

I did, and it seems to work fine. I am not getting any errors, and when running verdi process list, I see the 4 simultaneous workchains I wanted to submit. Sorry for my lack of knowledge, but does this quick hack remove any functionality I should be aware of?

Thanks for the quick test! No, the only thing that you will miss is the report at the end of verdi process list (probably, in your case, it will forever show the time of the last state change from before you changed the code).

I just opened an issue so we discuss there how to properly implement this: SQLite backend often raises `Database is locked` when dealing with multiple processes · Issue #6532 · aiidateam/aiida-core · GitHub