Hello all,
@npaulish has some issues with the performance of verdi commands and we try to figure out what this relates to. We first thought that this is due to the massive amount of loose files that are not packed, but we also have issues with commands like verdi process list
that should not access the repository databases and are similar slow. In comparison with verdi profile list
we did not have any issues in terms of speed, so I am a bit puzzled. @mbercx mentioned that it might be because of slow Python code that is run during these command.
Here the log the top 5 functions sorted according to percall and tottime using cProfile (see attachment for whole logs).
verdi process list percall
8456574 function calls (8421386 primitive calls) in 36.534 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
2 12.884 6.442 13.091 6.545 pathlib.py:532(_select_from)
3 15.762 5.254 15.767 5.256 {method 'fetchmany' of 'psycopg2.extensions.cursor' objects}
24 3.810 0.159 3.810 0.159 {method 'execute' of 'psycopg2.extensions.cursor' objects}
2 0.095 0.047 13.186 6.593 pathlib.py:1130(glob)
1 0.013 0.013 0.013 0.013 context.py:64(__init__)
verdi process list tottime
8456574 function calls (8421386 primitive calls) in 36.534 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
3 15.762 5.254 15.767 5.256 {method 'fetchmany' of 'psycopg2.extensions.cursor' objects}
2 12.884 6.442 13.091 6.545 pathlib.py:532(_select_from)
24 3.810 0.159 3.810 0.159 {method 'execute' of 'psycopg2.extensions.cursor' objects}
747601 0.435 0.000 1.242 0.000 _py39compat.py:25(ep_matches)
747601 0.388 0.000 0.748 0.000 __init__.py:241(matches)
verdi storage info percall
6587071 function calls (6555618 primitive calls) in 17.590 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
2 13.256 6.628 13.449 6.724 pathlib.py:532(_select_from)
21 0.896 0.043 0.896 0.043 {method 'execute' of 'psycopg2.extensions.cursor' objects}
2 0.085 0.043 13.535 6.767 pathlib.py:1130(glob)
2 0.019 0.010 0.020 0.010 {built-in method psycopg2._psycopg._connect}
46/45 0.032 0.001 0.033 0.001 {built-in method _imp.create_dynamic}
verdi storage info tottime
6587071 function calls (6555618 primitive calls) in 17.590 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
2 13.256 6.628 13.449 6.724 pathlib.py:532(_select_from)
21 0.896 0.043 0.896 0.043 {method 'execute' of 'psycopg2.extensions.cursor' objects}
541061 0.323 0.000 0.914 0.000 _py39compat.py:25(ep_matches)
541061 0.283 0.000 0.548 0.000 __init__.py:241(matches)
1250 0.239 0.000 0.239 0.000 {built-in method io.open_code}
Environment and other information
$ verdi status
:heavy_check_mark: version: AiiDA v2.3.1
:heavy_check_mark: config: /home/aiida/envs/aiida-fermisurf/.aiida
:heavy_check_mark: profile: main
:heavy_check_mark: storage: Storage for 'main' [open] @ postgresql://aiida:***@localhost:5432/aiida-fermisurf / DiskObjectStoreRepository: 580f68c6eb6347b7a203030615ef1d40 | /hith/aiida-fermisurf/repositories/main/container
:heavy_check_mark: rabbitmq: Connected to RabbitMQ v3.8.2 as amqp://guest:guest@127.0.0.1:5672?heartbeat=600
:heavy_check_mark: daemon: Daemon is running with PID 1193148
$ verdi storage info
entities:
Users:
count: 10
Computers:
count: 31
Nodes:
count: 4734923
Groups:
count: 331
Comments:
count: 0
Logs:
count: 1203422
Links:
count: 8936853
repository:
SHA-hash algorithm: sha256
Compression algorithm: zlib+1
Sizes in the repository folder hith/aiida-fermisurf/repositories/main/container
. Note the 2TB loose folder.
$ du -h --max-depth=1
4.0K ./duplicates
4.0K ./sandbox
2.0T ./loose
4.0K ./packs
2.0T .
cProfile-verdi_storage_info-20top_tottime.txt (1.7 KB)
cProfile-verdi_storage_info-20top_percall.txt (1.7 KB)
cProfile-verdi_process_list-20top_tottime.txt (1.7 KB)
cProfile-verdi_process_list-20top_percall.txt (1.7 KB)
raw-cProfile-verdi_storage_info.txt (652.3 KB)
raw-cProfile-verdi_process_list.txt (706.6 KB)