Why is AiiDAlab base widget import so slow?

I think it is fair to say that AiiDAlab Apps do not load very quickly. :face_with_open_eyes_and_hand_over_mouth:
It would be great to understand a bit more where the time is spent and if we can improve it.

One strange thing I noticed is that it takes anywhere between 2 and 3 seconds just to import the aiidalab_widgets_base package. :exploding_head:

time python -c "import aiidalab_widgets_base"

During the coding week, I’d like to dig a little bit to understand what is going on. CC @mbercx
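A good starting point for digging in is CPython's built-in `-X importtime` flag, which prints a cumulative per-module import timing tree to stderr, making slow subtrees easy to spot. A minimal example (using `json` as a stand-in for the actual package):

```shell
# Print per-module import timings to stderr; each line shows the
# self time and cumulative time spent importing that module.
# 'json' stands in for aiidalab_widgets_base here.
python -X importtime -c "import json" 2>&1 | tail -n 5
```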

PS: I’ve created a dedicated AiiDAlab category, I hope that is fine @jusong.yu

Thanks a lot for creating this! Totally makes sense to make it a dedicated category!

I think it is fair to say that AiiDAlab Apps do not load very quickly. :face_with_open_eyes_and_hand_over_mouth:

I agree. I have a feeling it is caused by having too many modules loaded in the __init__.py; I am not sure that is necessary. Great that you can have a look at it.

1 Like

I’ve spent a fair amount of time today looking into this, without going anywhere in the end. :frowning: I haven’t identified any quick wins or big offenders here — it just seems we’re importing a lot of dependencies and that takes time.

There are definitely some offenders that take longer to import than others. For example, importing from aiida.orm takes more than a second, significantly longer than other imports. Here's the output of running from aiida.orm import StructureData under cProfile:

$ python -m cProfile -s cumtime test_import.py
         3609825 function calls (3596342 primitive calls) in 1.726 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       24    0.001    0.000    6.252    0.260 __init__.py:10(<module>)
   1041/1    0.015    0.000    1.727    1.727 {built-in method builtins.exec}
        1    0.000    0.000    1.727    1.727 test_import.py:1(<module>)
    823/1    0.004    0.000    1.727    1.727 <frozen importlib._bootstrap>:1002(_find_and_load)
    822/1    0.003    0.000    1.727    1.727 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
   1039/2    0.001    0.000    1.727    0.863 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
    790/2    0.004    0.000    1.727    0.863 <frozen importlib._bootstrap>:659(_load_unlocked)
    726/2    0.002    0.000    1.727    0.863 <frozen importlib._bootstrap_external>:844(exec_module)
2417/2314    0.037    0.000    1.277    0.001 {built-in method builtins.__build_class__}
       43    0.003    0.000    1.111    0.026 entry_point.py:319(get_entry_point_from_class)
     1100    0.003    0.000    1.099    0.001 __init__.py:502(select)
     1097    0.001    0.000    1.098    0.001 entry_point.py:261(get_entry_points)
     1101    0.002    0.000    1.083    0.001 __init__.py:376(select)
    11309    0.352    0.000    1.080    0.000 __init__.py:381(<genexpr>)
       39    0.000    0.000    0.973    0.025 node.py:160(__new__)
       39    0.000    0.000    0.970    0.025 node.py:78(get_type_string_from_class)
   498948    0.384    0.000    0.681    0.000 __init__.py:237(matches)
       12    0.000    0.000    0.644    0.054 __init__.py:2(<module>)
   556/19    0.001    0.000    0.569    0.030 {built-in method builtins.__import__}
       42    0.001    0.000    0.546    0.013 __init__.py:1(<module>)
 1198/373    0.002    0.000    0.438    0.001 <frozen importlib._bootstrap>:1033(_handle_fromlist)
        7    0.000    0.000    0.292    0.042 __init__.py:8(<module>)
        1    0.000    0.000    0.267    0.267 array.py:10(<module>)
   498966    0.107    0.000    0.256    0.000 {built-in method builtins.all}
        2    0.000    0.000    0.245    0.123 data.py:10(<module>)
        1    0.000    0.000    0.220    0.220 node.py:11(<module>)
        1    0.000    0.000    0.219    0.219 postgres.py:10(<module>)

The first non-Python-internal entry seems to be get_entry_point_from_class in aiida/plugins/entry_point.py, which is called by the AbstractNodeMeta metaclass constructor.

I was a bit surprised to see the entry point system being used even if I import a Node explicitly (e.g. from aiida.orm import StructureData) instead of using the DataFactory for example. Looking at the code, it doesn’t seem like there’s an easy way out of this, perhaps @sphuber could have some insight here?

The reason for this is that when a Node class is imported, the _plugin_type_string class attribute needs to be defined, which is based on the associated entry point if it has one. It is a reverse lookup, if you will. For example, when StructureData is imported, this code will look through the entry points and find that it has the entry point string aiida.data:core.structure, and so the _plugin_type_string is set to data.core.structure.StructureData.. This is in turn stored in the node_type column of the node instance when it is stored. This is what allows AiiDA to load a StructureData instance, instead of just a Node instance, when the node is loaded from the database (it reconstructs the actual class from the data.core.structure.StructureData. node type string).

I am not sure whether this functionality could be implemented “lazily”, such that it is just executed when it is actually needed and not just on import. Since this is done in the metaclasses I have the feeling that it will be tricky to not have this happen on import of the class.

1 Like

Just a thought, could the laziness be implemented via a getter method?

@danielhollas thanks a lot for looking into this. I am a bit confused now: is this an aiidalab-widgets-base problem, or is importing StructureData slow in general?

It is slow in general; you can time it just by importing from aiida-core.

Perhaps, but it is a class attribute, so it would have to be a class property, which is not really native to Python. We have been using our own implementation (aiida.common.lang.classproperty) but not sure if this will work in all cases and get around the up-front load time on import. Worth a try perhaps.
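For illustration, here is a minimal classproperty descriptor in the spirit of aiida.common.lang.classproperty (a sketch, not the actual implementation) that defers the lookup to first access. Note that this naive version recomputes on every access, so a real version would want to cache the result:

```python
class classproperty:
    """Minimal read-only class-level property (sketch)."""

    def __init__(self, getter):
        self.getter = getter

    def __get__(self, instance, owner):
        # Called on both instance and class access; dispatch to the
        # getter with the class, mimicking a property at class level.
        return self.getter(owner)


class LazyNode:
    _lookups = 0  # counts how often the "entry point lookup" ran

    @classproperty
    def _plugin_type_string(cls):
        # The expensive entry-point reverse lookup would happen here,
        # on first access instead of at import time.
        LazyNode._lookups += 1
        return f"data.core.{cls.__name__.lower()}.{cls.__name__}."
```

At "import time" (class creation) the getter has not run; only accessing `LazyNode._plugin_type_string` triggers the lookup.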

Yes, loading the aiida.orm module is slow, which is why we avoid doing that in the aiida.cmdline module outside of actual commands. Otherwise tab-completion would be horrifically slow.
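The pattern of keeping heavy imports out of the CLI module can be sketched with a toy argparse-based CLI (hypothetical names; sqlite3 stands in for aiida.orm):

```python
import argparse


def cmd_list(args):
    # The heavy import happens only when the command body actually runs,
    # so merely importing this CLI module (as tab-completion does) stays
    # cheap. sqlite3 is a stand-in for aiida.orm here.
    import sqlite3
    return sqlite3.sqlite_version


parser = argparse.ArgumentParser(prog="verdi-like")
subparsers = parser.add_subparsers(dest="command", required=True)
list_parser = subparsers.add_parser("list")
list_parser.set_defaults(func=cmd_list)

args = parser.parse_args(["list"])
version = args.func(args)  # only now does the expensive import run
```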

1 Like

@danielhollas made a quick proof of concept here: https://github.com/sphuber/aiida-core/tree/fix/lazy-node-metaclass
Would be great if you could give that a spin and see if it improves things. I have to say that I didn’t get huge loading times when calling these methods, so couldn’t quite reproduce it. Maybe there is still something else that is really slowing things down?

1 Like

Thank you very much @sphuber :clap:

I am using hyperfine to benchmark, running this command

 hyperfine --warmup 2 "python -c 'import aiida.orm'"

The following results were done with Python 3.10.6.

Current master

   Time (mean ± σ):     846.5 ms ±  91.5 ms    [User: 1641.7 ms, System: 3146.2 ms]
   Range (min … max):   693.7 ms … 987.5 ms    10 runs

Lazy branch

  Time (mean ± σ):     585.2 ms ±  16.5 ms    [User: 1422.4 ms, System: 3082.6 ms]
  Range (min … max):   561.0 ms … 606.7 ms    10 runs

So we have a very nice 260ms improvement! :tada: Importantly, the results are now much more consistent (the original code varied wildly between executions, the first one usually being the slowest).

I’ll do a bit more benchmarking soon. Here’s importing aiida.plugins

  Time (mean ± σ):     295.5 ms ±   3.4 ms    [User: 265.1 ms, System: 29.9 ms]
  Range (min … max):   290.5 ms … 302.0 ms    10 runs

So there are certainly other reasons for the slowness.

Another 50ms is just from importing numpy. It is imported in a fair number of places, but in some of them it is already imported lazily so I’ve tried to do the rest in this commit.

The only place where I don’t know how to get rid of the import is here:

In aiida/orm/nodes/data/bool.py

@to_aiida_type.register(numpy.bool_)
def _(value):
    return Bool(value)
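One possible (hypothetical) workaround is to register the numpy overload only when numpy is already in sys.modules, with the obvious caveat that the overload is missing if numpy gets imported later. A self-contained sketch with a toy to_aiida_type:

```python
import sys
from functools import singledispatch


@singledispatch
def to_aiida_type(value):
    # Fallback for unregistered types.
    raise NotImplementedError(type(value))


@to_aiida_type.register(bool)
def _(value):
    return ("Bool", value)


# Hypothetical workaround: only register the numpy overload when numpy
# is already loaded, so a plain import never pays for importing numpy.
# Caveat: if numpy is imported *after* this module, numpy.bool_ values
# will fall through to the NotImplementedError fallback.
if "numpy" in sys.modules:
    import numpy

    @to_aiida_type.register(numpy.bool_)
    def _(value):
        return ("Bool", bool(value))
```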

A much bigger win comes from lazily importing disk_objectstore (its slowness in turn comes from importing sqlalchemy). This is implemented in this commit:
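The standard pattern for this kind of deferral is a module-level __getattr__ (PEP 562), which resolves an attribute, and thereby triggers the heavy import, only on first access. A self-contained sketch using a synthetic module, with json standing in for disk_objectstore:

```python
import importlib
import sys
import types

# Build a stand-in package whose heavy dependency is imported lazily.
lazy_mod = types.ModuleType("fake_repo")


def _lazy_getattr(name):
    if name == "Container":
        # The expensive import (sqlalchemy via disk_objectstore in the
        # real case; 'json' here) only happens on first attribute access.
        heavy = importlib.import_module("json")
        return heavy.dumps
    raise AttributeError(name)


lazy_mod.__getattr__ = _lazy_getattr  # PEP 562 module __getattr__ hook
sys.modules["fake_repo"] = lazy_mod

import fake_repo  # cheap: the heavy dependency is not imported yet

result = fake_repo.Container({"ok": True})  # triggers the lazy import
```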

Together with the numpy import change, here are the new timings :tada:

  Time (mean ± σ):     354.2 ms ±  14.2 ms    [User: 324.9 ms, System: 28.7 ms]
  Range (min … max):   333.5 ms … 377.5 ms    10 runs

This is only about 60ms slower than import aiida

  Time (mean ± σ):     293.7 ms ±   4.0 ms    [User: 259.9 ms, System: 33.4 ms]
  Range (min … max):   289.4 ms … 300.5 ms    10 runs

Continuing the import rabbit hole… :smile:

Another sizeable win: not importing requests and jsonschema packages. This helps all imports, not just aiida.orm so it’s worth doing and not much work. Here’s the commit:

Here’s a new timing for import aiida :tada:

  Time (mean ± σ):     199.5 ms ±   1.9 ms    [User: 178.0 ms, System: 21.1 ms]
  Range (min … max):   196.6 ms … 203.7 ms    14 runs

Thanks a lot for the extensive benchmarking @danielhollas. What is the timing on your machine for import aiida with the current main branch? Just to get a sense of the impact of all these changes. I think the original numbers you showed were for importing the widget and not just import aiida.

So here’s the thing: get_entry_point_from_class was called 43 times on import. It contains two nested for loops, so that becomes roughly 1500 calls to the importlib_metadata entry_points().select() method. Unfortunately, that method has its own for loops… so we end up with over 800,000 (800k) calls into a specific part of importlib_metadata. :scream: :exploding_head:

Here’s the output from cProfile when importing aiida.orm on the main branch, filtered on importlib_metadata:

$ python -m cProfile -o timings.out test_import.py
$ python -m pstats timings.out 
Welcome to the profile statistics browser.
timings_faster.out% sort cumtime
timings_faster.out% stats importlib_ 20
Wed Jul 12 04:03:26 2023    timings_faster.out

         3830317 function calls (3816176 primitive calls) in 1.813 seconds

   Ordered by: cumulative time
   List reduced from 5134 to 108 due to restriction <'importlib_'>
   List reduced from 108 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1514    0.002    0.000    1.208    0.001 /home/bl22441/micromamba/envs/aiida/lib/python3.10/site-packages/importlib_metadata/__init__.py:507(select)
     1515    0.002    0.000    1.197    0.001 /home/bl22441/micromamba/envs/aiida/lib/python3.10/site-packages/importlib_metadata/__init__.py:380(select)
     1519    0.066    0.000    1.196    0.001 /home/bl22441/micromamba/envs/aiida/lib/python3.10/site-packages/importlib_metadata/__init__.py:386(<genexpr>)
   403975    0.224    0.000    1.129    0.000 /home/bl22441/micromamba/envs/aiida/lib/python3.10/site-packages/importlib_metadata/__init__.py:385(<genexpr>)
   402460    0.325    0.000    0.905    0.000 /home/bl22441/micromamba/envs/aiida/lib/python3.10/site-packages/importlib_metadata/_py39compat.py:25(ep_matches)
   402460    0.286    0.000    0.541    0.000 /home/bl22441/micromamba/envs/aiida/lib/python3.10/site-packages/importlib_metadata/__init__.py:241(matches)
   816392    0.131    0.000    0.171    0.000 /home/bl22441/micromamba/envs/aiida/lib/python3.10/site-packages/importlib_metadata/__init

Putting an extra cache around entry_points().select() helps a lot, and should be combined with your fix.

Moreover, there was another abstract class with the same issue. After a similar fix, the entry point stuff is not called at all when importing aiida.orm (but it still gets executed when importing aiida.cmdline.commands used by verdi).

I will post updated timings in a post below (spoiler alert: things are :sparkles: blazingly :sparkles: fast now :sunglasses: )

So I’ve realized I reached the end of the rabbit hole when I started chasing imports in the 5-10ms range. :smiley: Turns out there was a lot of low- or medium-hanging fruit throughout the whole aiida-core codebase, not just aiida.orm. The timings below compare the main branch and my faster-import branch; feel free to take it for a spin: Commits · danielhollas/aiida-core · GitHub

tl;dr importing aiida.orm is now as fast as importing aiida.cmdline. :rocket: In fact, I needed to avoid importing aiida.cmdline from within aiida.orm since it was meaningfully slowing it down. :sweat_smile: All the other submodules are a lot faster as well (I may have gone a bit overboard, but I think the result is well worth it). Importing just aiida is essentially instant.

The toughest nut to crack was aiida.cmdline.commands, which is used by verdi and thus super important for CLI responsiveness, but at the same time very hard to optimize, since there’s a bunch of stuff getting executed at import time (I think it’s for dynamic click completion). Nevertheless, even there I managed to shave off 150ms, so this should be very noticeable. Also, I’ve verified that the much faster aiida.orm import significantly speeds up verdi commands that interact with the DB (e.g. try running verdi code list).

Ok, timings in a post below…

I am now directly comparing the main branch with my faster-import branch (located at $FASTAIIDA), by running hyperfine like this

hyperfine "PYTHONPATH=$FASTAIIDA python -c 'import aiida'" "python -c 'import aiida'"

import aiida

3.58 ± 0.23 times faster

Benchmark 1: PYTHONPATH=/tmp/faster-import python -c 'import aiida'
  Time (mean ± σ):      77.3 ms ±   4.0 ms    [User: 67.4 ms, System: 9.7 ms]
  Range (min … max):    54.1 ms …  80.0 ms    39 runs
 
Benchmark 2: python -c 'import aiida'
  Time (mean ± σ):     276.9 ms ±  10.4 ms    [User: 247.1 ms, System: 29.1 ms]
  Range (min … max):   257.5 ms … 291.4 ms    10 runs
 
Summary
  'PYTHONPATH=/tmp/faster-import python -c 'import aiida'' ran
    3.58 ± 0.23 times faster than 'python -c 'import aiida''

import aiida.cmdline.commands (verdi)

1.92 ± 0.13 times faster

Benchmark 1: PYTHONPATH=/tmp/faster-import python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     162.8 ms ±   8.9 ms    [User: 142.4 ms, System: 18.9 ms]
  Range (min … max):   140.0 ms … 186.4 ms    18 runs
 
Benchmark 2: python -c 'import aiida.cmdline.commands'
  Time (mean ± σ):     312.9 ms ±  12.6 ms    [User: 281.9 ms, System: 30.6 ms]
  Range (min … max):   297.9 ms … 333.4 ms    10 runs
 
Summary
  'PYTHONPATH=/tmp/faster-import python -c 'import aiida.cmdline.commands'' ran
    1.92 ± 0.13 times faster than 'python -c 'import aiida.cmdline.commands''

import aiida.transports

2.75 ± 0.22 times faster

Benchmark 1: PYTHONPATH=/tmp/faster-import python -c 'import aiida.transports'
  Time (mean ± σ):     129.2 ms ±  10.1 ms    [User: 109.3 ms, System: 19.3 ms]
  Range (min … max):   108.7 ms … 136.2 ms    21 runs
 
Benchmark 2: python -c 'import aiida.transports'
  Time (mean ± σ):     355.4 ms ±   2.6 ms    [User: 316.4 ms, System: 38.3 ms]
  Range (min … max):   352.0 ms … 359.5 ms    10 runs
 
Summary
  'PYTHONPATH=/tmp/faster-import python -c 'import aiida.transports'' ran
    2.75 ± 0.22 times faster than 'python -c 'import aiida.transports''

import aiida.orm

4.13 ± 0.62 times faster

Benchmark 1: PYTHONPATH=/tmp/faster-import python -c 'import aiida.orm'
  Time (mean ± σ):     207.0 ms ±  13.0 ms    [User: 181.9 ms, System: 22.0 ms]
  Range (min … max):   199.1 ms … 251.0 ms    14 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: python -c 'import aiida.orm'
  Time (mean ± σ):     854.9 ms ± 117.5 ms    [User: 1687.1 ms, System: 3095.8 ms]
  Range (min … max):   706.9 ms … 1063.5 ms    10 runs
 
Summary
  'PYTHONPATH=/tmp/faster-import python -c 'import aiida.orm'' ran
    4.13 ± 0.62 times faster than 'python -c 'import aiida.orm''

import aiida.engine

4.03 ± 0.50 times faster

Benchmark 1: PYTHONPATH=/tmp/faster-import python -c 'import aiida.engine'
  Time (mean ± σ):     234.6 ms ±   7.4 ms    [User: 211.2 ms, System: 23.2 ms]
  Range (min … max):   213.8 ms … 243.2 ms    12 runs
 
Benchmark 2: python -c 'import aiida.engine'
  Time (mean ± σ):     944.8 ms ± 113.9 ms    [User: 1756.5 ms, System: 3128.3 ms]
  Range (min … max):   775.2 ms … 1177.7 ms    10 runs
 
Summary
  'PYTHONPATH=/tmp/faster-import python -c 'import aiida.engine'' ran
    4.03 ± 0.50 times faster than 'python -c 'import aiida.engine''

importing everything together

Benchmark 1: PYTHONPATH=/tmp/faster-import python -c 'import aiida.engine;import aiida.cmdline.commands;import aiida.transports'
  Time (mean ± σ):     306.5 ms ±  12.8 ms    [User: 276.0 ms, System: 29.9 ms]
  Range (min … max):   293.9 ms … 329.9 ms    10 runs
 
Benchmark 2: python -c 'import aiida.engine;import aiida.cmdline.commands;import aiida.transports'
  Time (mean ± σ):     988.0 ms ± 109.8 ms    [User: 1804.0 ms, System: 3157.6 ms]
  Range (min … max):   846.4 ms … 1145.6 ms    10 runs
 
Summary
  'PYTHONPATH=/tmp/faster-import python -c 'import aiida.engine;import aiida.cmdline.commands;import aiida.transports'' ran
    3.22 ± 0.38 times faster than 'python -c 'import aiida.engine;import aiida.cmdline.commands;import aiida.transports''

@sphuber if you’re happy with the progress here I can start submitting individual PRs from my branch. :blush:

I don’t really see further avenues for improvement in aiida-core. I’ve also submitted some PRs to external dependencies.

(fastjsonschema is a faster alternative to jsonschema that we could potentially migrate to)

Thanks for all the work @danielhollas . Why don’t you add a commit with all your changes on to my branch, then we can create a PR from there