`sqlite_dos` backend `sqlite3.OperationalError: database is locked` error

While developing a Weather & Climate workflow using WorkGraph and ShellJobs, I just obtained the sqlite3.OperationalError: database is locked error. The tool is still in a very early development stage, i.e. I’m not running the actual executables, rather dummy scripts that just echo a string and sleep for a second, so this error might not occur in the final tool. However, as we’re planning to investigate the limitations of SQLite soon, especially concurrent writes to the db, I think this might be interesting for future reference. For now, I’ll use a psql_dos profile to avoid the error, but attached are the outputs of verdi process show:

Property     Value                                                                                                                                                                                                                                                                                                                                                        
-----------  ----------------------------------------------------------------------------------------------------------------------------------------------                                                                                                                                                                                                               
type         ShellJob<extpar@localhost>                                                                                                                                                                                                                                                                                                                                   
state        Excepted <Traceback (most recent call last):                                                                                                                                                                                                                                                                                                                 
               File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1971, in _exec_single_context                                                                                                                                                                                                               
                 self.dialect.do_execute(                                                                                                                                                                                                                                                                                                                                 
               File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 919, in do_execute                                                                                                                                                                                                                       
                 cursor.execute(statement, parameters)                                                                                                                                                                                                                                                                                                                    
             sqlite3.OperationalError: database is locked                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                                                                          
             The above exception was the direct cause of the following exception:                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                                                                          
             sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked                                                                                                                                                                                                                                                                               
             [SQL: UPDATE db_dbsetting SET val=?, time=? WHERE db_dbsetting."key" = ?]                                                                                                                                                                                                                                                                                    
             [parameters: ('"2024-06-10T10:15:37.606335+02:00"', '2024-06-10 10:15:37.607414', 'process|state_change|calculation')]                                                                                                                                                                                                                                       
             (Background on this error at: https://sqlalche.me/e/20/e3q8)>                                                                                                                                                                                                                                                                                                
pk           719                                                                                                                                                                                                                                                                                                                                                          
uuid         164d98a2-5c2f-4f77-84ab-9e36dadd5095                                                                                                                                                                                                                                                                                                                         
label        Extpar_0_2025_01_01_00_00_00                                                                                                                                                                                                                                                                                                                                 
description                                                                                                                                                                                                                                                                                                                                                               
ctime        2024-06-10 08:15:37.249568+00:00                                                                                                                                                                                                                                                                                                                             
mtime        2024-06-10 08:15:37.678615+00:00                                                                                                                                                                                                                                                                                                                             
computer     [2] localhost                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                          
Inputs        PK    Type                                                                                                                                                                                                                                                                                                                                                  
------------  ----  -------------                                                                                                                                                                                                                                                                                                                                         
nodes                                                                                                                                                                                                                                                                                                                                                                     
    obs_data  711   FolderData                                                            
arguments     717   List                                                                  
code          309   InstalledCode                                                         
filenames     716   Dict                                                                  
outputs       718   List                                                                  

Caller                          PK  Type                                                  
----------------------------  ----  -----------------------                               
Extpar_0_2025_01_01_00_00_00   715  WorkGraph<WC-WorkGraph>                               

Log messages                                                                              
---------------------------------------------                                             
There are 1 log messages for this calculation                                             
Run 'verdi process report 719' to see them

and verdi process report:

*** 719 [Extpar_0_2025_01_01_00_00_00]: None
*** Scheduler output: N/A
*** Scheduler errors: N/A
*** 1 LOG MESSAGES:
+-> REPORT at 2024-06-10 08:15:37.634384+00:00
 | [719|ShellJob|on_except]: Traceback (most recent call last):
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1971, in _exec_single_context
 |     self.dialect.do_execute(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 919, in do_execute
 |     cursor.execute(statement, parameters)
 | sqlite3.OperationalError: database is locked
 | 
 | The above exception was the direct cause of the following exception:
 | 
 | Traceback (most recent call last):
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/plumpy/base/state_machine.py", line 324, in transition_to
 |     self._enter_next_state(new_state)
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/plumpy/base/state_machine.py", line 388, in _enter_next_state
 |     self._fire_state_event(StateEventHook.ENTERED_STATE, last_state)
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/plumpy/base/state_machine.py", line 300, in _fire_state_event
 |     callback(self, hook, state)
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/plumpy/processes.py", line 337, in <lambda>
 |     lambda _s, _h, from_state: self.on_entered(cast(Optional[process_states.State], from_state)),
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/aiida/engine/processes/process.py", line 424, in on_entered
 |     set_process_state_change_timestamp(self)
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/aiida/engine/utils.py", line 287, in set_process_state_change_timestamp
 |     backend.set_global_variable(key, value, description)
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/aiida/storage/psql_dos/backend.py", line 411, in set_global_variable
 |     session.query(DbSetting).filter(DbSetting.key == key).update(dict(val=value))
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/orm/query.py", line 3251, in update
 |     result: CursorResult[Any] = self.session.execute(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2306, in execute
 |     return self._execute_internal(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2191, in _execute_internal
 |     result: Result[Any] = compile_state_cls.orm_execute_statement(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/orm/bulk_persistence.py", line 1617, in orm_execute_statement
 |     return super().orm_execute_statement(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
 |     result = conn.execute(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1422, in execute
 |     return meth(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 514, in _execute_on_connection
 |     return connection._execute_clauseelement(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1644, in _execute_clauseelement
 |     ret = self._execute_context(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1850, in _execute_context
 |     return self._exec_single_context(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1990, in _exec_single_context
 |     self._handle_dbapi_exception(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2357, in _handle_dbapi_exception
 |     raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1971, in _exec_single_context
 |     self.dialect.do_execute(
 |   File "/home/geiger_j/.aiida_venvs/aiida-icon-clm/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 919, in do_execute
 |     cursor.execute(statement, parameters)
 | sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
 | [SQL: UPDATE db_dbsetting SET val=?, time=? WHERE db_dbsetting."key" = ?]
 | [parameters: ('"2024-06-10T10:15:37.606335+02:00"', '2024-06-10 10:15:37.607414', 'process|state_change|calculation')]
 | (Background on this error at: https://sqlalche.me/e/20/e3q8)

Thanks, very useful.

The exception seems to be triggered when updating the ENTERED_STATE in the DbSetting, that happens at every state transition. This is indeed something that happens rather frequently, especially if you have dummy scripts that run very fast. Only one write can happen at a given time on a SQLite DB (maybe there is some level of concurrency possible with journaling activated, not 100% sure) but here multiple tasks are updating the same field, so for sure one write is locking the DB until it finishes writing.
I’m not sure of how the client (SQLAlchemy) manages the timeout (I imagine that it tries, it waits and retries, not sure if it retries often or only once or a few times at the end of the timeout), nor how long the timeout is.

It would be useful to have a bit more information of how you were running (were you running in parallel from multiple python scripts? Or submitting many workflows? or running a workflow that internally runs a lot of parallel steps? To understand where the source of concurrency is.

We’ll then probably need to better understand how the timeouts are dealt with, if these exceptions can be dealt with in our code (and we retry), or if e.g. we can at least alleviate the problem by reducing the number of places where we do concurrent writes.