`verdi archive import` fails when importing large dataset to a profile with `sqlite_dos` backend

On AiiDA v2.6.1.post0, I’m trying to import a large dataset ~12GB
from MaterialsCloud

First, I made a profile with sqlite_dos backend:

$ verdi profile setup core.sqlite_dos 

Then

$ verdi archive import MC3D-provenance.aiida 
Report: starting import: MC3D-provenance.aiida
Report: incompatible version detected for MC3D-provenance.aiida, trying migration
Report: Legacy migrations required from zip format
Report: Extracting data.json ...
Report: Legacy migration pathway: 0.10 -> 0.11 -> 0.12 -> 0.13
Report: legacy '0.13' -> 'main_0000' conversion required                                                                                                         
Report: Initialising new archive...
Report: Unique repository files written: 837443                                                                                                                  
Report: Converting DB to SQLite
Report: Performing SQLite migrations:                                                                                                                            
Report: - main_0000 -> main_0000a
Report: - main_0000a -> main_0000b
Report: - main_0000b -> main_0001
Report: Finalising the migration ...
Report: proceeding with import of migrated archive
Report: Parameters
-------------------------------  ----------------------
Archive                          migrated_archive.aiida
New Node Extras                  keep
Merge Node Extras (in database)  (k)eep
Merge Node Extras (in archive)   do (n)ot create
Merge Node Extras (in both)      (l)eave existing
Merge Comments                   leave
Computer Authinfos               exclude

Report: Adding 1 new user(s)
Critical: an exception occurred while trying to import the migrated archive MC3D-provenance.aiida: InvalidRequestError: Can't use explicit RETURNING for bulk INSERT operation with sqlite+pysqlite backend; executemany with RETURNING is not enabled for this dialect.

After ~ 40 minute of migration, it fails with critical error.

Most likely, there is something I’m missing on the process.
Please advise.

There is not anything you are missing. Simply importing it should be working. I also tried importing the same archive in a profile using core.sqlite_dos and it worked fine for me.

What you could maybe do is first migrate the archive manually with verdi archive migrate. It seems the migration works fine but then the import fails. By migrating it first, you can try to import from that straight away, so you don’t have to wait for the migration each time. If you can then reproduce the exception, I would modify the code to have it print the full stack trace so we get more information.

Thanks verdi archive migrate certainlly speeds up things.

It turned out the error is raised here.

Does it also make a user for you? Perhaps you have a profile with the same username as in the archive.

Report: Adding 1 new user(s)
*** sqlalchemy.exc.InvalidRequestError: Can't use explicit RETURNING for bulk INSERT operation with sqlite+pysqlite backend; executemany with RETURNING is not enabled for this dialect.```

So it excepts during the migration, not the importing? If that is the case, it shouldn’t matter what users are present in the profile. An archive migration shouldn’t really even need an existing profile.

From the top message, I think it’s during the import.
Would be great to also double check the versions of python, AiiDA, and sqlalchemy.

Note that here it expects a return value from bulk_insert, that is returned by the psql version but it’s not returned by the sqlite version and I think this is the issue. It might indeed be because the user exists, or because the warning was added only relatively recently. So it might be a bug.

I think @giovannipizzi is right, the error is raised precisely here. Apparently bulk INSERT operation, is not supported by the sqlite version. So this is more like a bug.

Here is more detail on the versions:
Python 3.11.9
AiiDA version 2.6.2.post0
sqlalchemy 2.0.30

@sphuber Indeed, the migration works as expected. What I was saying is that perhaps you couldn’t reproduce the issue because the user [parameters: ('sebastiaan.huber@epfl.ch', 'Sebastiaan', 'Huber', 'EPFL')] already exists in your profile; therefore verdi archive import doesn’t go through creating it.
Basically, this condition for you is False and for me is True.
Maybe I’m wrong, but that’s how I imagine it.

No you are right, that could be the problem on importing. But from the way you responded, I thought the problem was occuring during migration in which case the error wouldn’t make sense.

But I just checked and I don’t have that user in my profile yet, so I am still not sure why it is failing over at your end. It also wouldn’t really explain why this bug would only show up now. I think we have tested importing into sqlite backends plenty.

If you can reliably reproduce the exception, could you modify the code to print the entire stack trace and post it here? I will try to reproduce it as well, but haven’t been able to so far.

Hi Sebastiaan, in parallel, could you also report your python and code versions, including sqla? I think the difference might be there.
Also, would you have a quick way to check if the import was correct? (eg, if all uuids from the archive are correctly imported)

I had checked this. I am also main with Python 3.11. I had sqlalchemy==2.0.28 but I tried it with sqlalchemy==2.0.30 and that also worked.

Out of curiosity, could you both check the output of printing sqlite3.sqlite_version (after importing sqlite3)? I think one needs a recent sqlite underlying version for this returning feature, not sure if this is the problem

In [1]: import sqlite3

In [2]: sqlite3.sqlite_version
Out[2]: '3.43.0'

sqlite3.sqlite_version after import is 3.31.1
However, now that I tried a fresh installation on another laptop, everything seems to be working, and the err disappeared. The only difference to me is sqlite3.sqlite_version which in the working system is 3.45.1

The traceback.print_exc() (from the previous machine) says

verdi archive import MC3D-provenance__migrated_to_sqlite_dos.aiida 
Report: starting import: MC3D-provenance__migrated_to_sqlite_dos.aiida
Report: Parameters
-------------------------------  ---------------------------------------------
Archive                          MC3D-provenance__migrated_to_sqlite_dos.aiida
New Node Extras                  keep
Merge Node Extras (in database)  (k)eep
Merge Node Extras (in archive)   do (n)ot create
Merge Node Extras (in both)      (l)eave existing
Merge Comments                   leave
Computer Authinfos               exclude

Report: Adding 1 new user(s)
Adding new user(s)                         0.0%|                                                                                                                | 0/1Traceback (most recent call last):
  File "/home/khosra_a/development/repo/aiida-core/src/aiida/storage/psql_dos/backend.py", line 343, in bulk_insert
    result = session.execute(insert(mapper).returning(mapper, column('id')), rows).fetchall()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khosra_a/development/venv/aiida-apache/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2351, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khosra_a/development/venv/aiida-apache/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 2236, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khosra_a/development/venv/aiida-apache/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py", line 1269, in orm_execute_statement
    result = _bulk_insert(
             ^^^^^^^^^^^^^
  File "/home/khosra_a/development/venv/aiida-apache/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py", line 200, in _bulk_insert
    result = persistence._emit_insert_statements(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khosra_a/development/venv/aiida-apache/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 1107, in _emit_insert_statements
    raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: Can't use explicit RETURNING for bulk INSERT operation with sqlite+pysqlite backend; executemany with RETURNING is not enabled for this dialect.
Critical: an exception occurred while trying to import the archive MC3D-provenance__migrated_to_sqlite_dos.aiida: InvalidRequestError: Can't use explicit RETURNING for bulk INSERT operation with sqlite+pysqlite backend; executemany with RETURNING is not enabled for this dialect.

Thanks @ali-khosravi seems that that may have been the problem. We will have to figure out the minimum version that added support for this feature in sqlite3 and declare it as an explicit dependency in aiida-core because currently we don’t declare it at all

Indeed, the RETURNING clause was added after 3.35.0 see here
I agree, we need to enforce this somehow.

Thanks a lot, @sphuber and @giovannipizzi

Not quite sure how to do this though. The sqlite3 module comes from the standard library: sqlite3 — DB-API 2.0 interface for SQLite databases — Python 3.12.5 documentation

So really the version that matters is the version of SQLite installed on the system. But I get conflicting results when requesting the version directly on the CLI and through Python’s sqlite3:

$ sqlite3 --version
3.31.1 2020-01-27 19:55:54 3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837balt1

and through Python

$ python -c 'import sqlite3;print(sqlite3.sqlite_version)'
3.43.0

Yes, I had the same issue when I tried to upgrade.
One has to compile from the source and then purge and remove the previous version:

wget https://www.sqlite.org/2024/sqlite-autoconf-3460000.tar.gz
tar -xvf sqlite-autoconf-3460000.tar.gz
cd sqlite-autoconf-3460000/
sudo apt-get install libreadline-dev
./configure
make -j 8
sudo apt-get purge sqlite3
sudo make install
export PATH="/usr/local/bin:$PATH"  # also add in your .bashrc
sudo apt-get remove -y --auto-remove sqlite3

After this, both versions matched fo me

taken from here and here

The point is that one can have multiple versions installed. Normally python will us the library shipped with it. Probably Sebastiaan you installed a newer one as well. It depends on the order of things in the path. For the binary, it’s the path, while for python, it’s the shared library (installed on Ubuntu with a package starting with lib, usually, probably something like libsqlite3 and/or libsqlite3-dev (the second needed if one needs to recompile cpython).
Now, if we just need to check the version, we just need to check the python module as you did earlier.
The issue is: if a user wants to upgrade, what we suggest.
Here there are some hacks to the library path: python - Upgrading SQLite3 version used in python3 on linux? - Stack Overflow but not sure how much I’d recommend them to typical users.

Maybe for now we just check and show a warning in Verdi status or somewhere similar, if a sqlite profile exists and the version is old.
It would be interesting to check if there is an alternative equivalent query that would work and does not use the returning clause