Thanks!
More than a bug, I would say this highlights a missing feature of SQLite, i.e., concurrent writes to the DB (that cannot be obtained easily with a file-based DB, that however has the advantage of simplicity). Having said that, of course this needs to be somehow fixed, or at least the limitation documented, otherwise it’s indeed a bug from the point of view of the users.
Now, I have the feeling that this specific error might be the most frequent concurrent-write situation people might encounter, and if we prevent it, we already solve most of the problems, or at least this will allow us to discover the next concurrent-write issue.
So I have a practical suggestion.
The ‘process|state_change|calculation’ (in the DbSetting) stores the last time a process changed state.
This is always done by the engine, IIRC, even if not running with the daemon. The main goal is to have some useful information, that is efficient to access, to have a sense if a daemon is running. The main (only?) place where this is used is at the end of verdi process list, so a user might quickly realize that the daemon is not running, if this is the case.
Now, even if in the latest release one can combine SQLite with RMQ, I think in most cases people run SQLite without RMQ, so they do not run via a daemon (@jgarridoa - is this case? or are you suing RMQ + starting a daemon?).
So, if you are running in the same interpreter, the information is not really useful/needed, as you know if the processes are running.
Moreover, even if people are running with a daemon, I don’t see an obvious way to reduce concurrency of writes (I mean, there are ways, e.g. adding a new line for every state change; but this is going to have even worse performance issues after some days of use, with a huge table of all state changes - we used to have this in AiiDA 0.x).
So my suggestion is, for the SQLite backend, to drop this feature.
This can be implemented with a backend class attribute that defines which features we support, e.g. backend.store_last_process_change = False in SQLite.
Then, if this flag is False, the engine does not attempt to store the state change.
Essentially, this would need to be implemented only here.
One would then also probably change the verdi process list code to check the same flag, and either do not report anything at the end, or just issue a warning that says something like “With SQLite, the last state change is not reported.”
This needs to be done here, probably.
Finally, very importantly, we need to document the list of limitations in the docs (I don’t remember if we already have a page with the comparison of the backends, if not we should create it - I think @yakutovicha @Xing worked on it about a year ago, not sure where this is?), and add this as a limitation.
Any suggestions?
@jgarridoa if you want to try a quick hack, you could try to just go here in your code, and put a return as the first line of the set_process_state_change_timestamp function, so the function never does anything. Then, try to run again and it would be great to report if you encounter other issues, or if this solves all problems.
If this works, we’ll open an issue and discuss/plan for implementing this in AiiDA.
Finally, a note about this:
Are you aware of the aiida-submission-controller? Even if you now don’t want to migrate your scripts to it, we would be very interested if the submission controller would work for your use case or if you see features missing, as we are planning to officially recommend it when you want to submit max X work chains at any given time, and submit more only when some have finished, keeping the total number <=X.