What does "contains" actually mean in the QueryBuilder?

edan-bainglass · February 20, 2024, 11:37am

I’m trying to get a sense of how the “contains” query works for list/dictionary types. Let’s focus on lists first. Suppose the following:

orm.QueryBuilder().append(
    orm.StructureData,
    filters={
        "attributes.cell": {
            "contains": [[1.0, 0.0, 1.0]],
        }
    }
)

I would think this means that the cell contains the vector [1.0, 0.0, 1.0]. However, the query fetches a result with a cell [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]. This suggests it’s doing some sort of partial check through each element? Unclear.

Also, the documentation itself is a bit ambiguous.

Contains all? Any? Note that neither seems to explain the above behavior.

A bit of digging suggests that “contains” points to SQLAlchemy’s sqlalchemy.dialects.postgresql.JSONB.Comparator.contains (I think). If so, the docs say

“Boolean expression. Test if keys (or array) are a superset of/contained the keys of the argument jsonb expression.”

Not sure what to make of it.

giovannipizzi · February 20, 2024, 4:37pm

We need to check the actual query in postgres.
You can store the query builder object e.g. in qb and then run print(qb.as_sql(inline=True)). And then we need to check the PostgreSQL docs.

From some quick testing, I think that:

{'contains': ['a', 'b']} means that it matches if both elements are in the list.
For one more level of nesting, I tried this example:
```
from aiida import orm
n = orm.Dict({'test': [['a', 'b', 'c'], ['a', 'd', 'f']]})
n.store()
```
Then, I tried various queries similar to print(orm.QueryBuilder().append(orm.Dict, filters={"attributes.test": {"contains": [["f"], ["b"]]}}).all(flat=True)).
I think that:
- "contains": [["a", "d"]] means that at least one element of the outer list is a list containing both elements. This matches (there is ['a', 'd', 'f']).
- So, "contains": [["f"]] means at least one element in the list is a list with the element f.
- "contains": [["f"], ["b"]] then means that both conditions apply: there is at least one element in the list with f, and at least one (the same or another) element in the same list with the element “b”. This matches, again.

You can try adding other letters to see what matches and what does not. Then we should double-check Postgres to confirm our intuition. But this should explain your results: "contains": [[1.0, 0.0, 1.0]] means:

In the list of lists, there is at least one internal list that matches “[1.0, 0.0, 1.0]”, and the latter statement means: at least one internal list has both elements 0.0 and 1.0.

(which is not the query you had in mind, but can be at least explained).
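
The reading above can be sketched in plain Python. This is only a rough emulation of the containment rule as described in this thread and in the PostgreSQL JSONB documentation, not AiiDA or SQLAlchemy code. The function name `jsonb_contains` is made up for illustration, and the special case where a top-level array may contain a bare primitive value is deliberately left out.

```python
def jsonb_contains(left, right):
    """Approximate PostgreSQL's `left @> right` for JSON-like Python values.

    Hypothetical helper for illustration only; it does not talk to a database.
    """
    if isinstance(right, dict):
        # Every key of `right` must exist in `left` with a contained value.
        return isinstance(left, dict) and all(
            key in left and jsonb_contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Every element of `right` must be contained in SOME element of `left`.
        # Order and duplicates in `right` do not matter.
        return isinstance(left, list) and all(
            any(jsonb_contains(candidate, item) for candidate in left)
            for item in right
        )
    # Scalars: plain equality (keeping JSON booleans distinct from numbers).
    if isinstance(left, (dict, list)):
        return False
    if isinstance(left, bool) or isinstance(right, bool):
        return left is right
    return left == right


cell = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
nested = [["a", "b", "c"], ["a", "d", "f"]]

print(jsonb_contains(cell, [[1.0, 0.0, 1.0]]))   # True: the first row has both 1.0 and 0.0
print(jsonb_contains(nested, [["f"], ["b"]]))    # True: "f" and "b" may sit in different rows
print(jsonb_contains(nested, [["b", "d"]]))      # False: no single row has both "b" and "d"
```

Under this sketch, the query `[[1.0, 0.0, 1.0]]` only asks for some row that holds the values 1.0 and 0.0, which is why the cell in the original question matches.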

As a final note, let’s be very careful when querying float numbers, because of their finite precision (I’m not sure how “=” compares two float numbers in Postgres).

(And note that in the SQLite backend, contains is not supported).

edan-bainglass · February 21, 2024, 11:44am

Thanks @giovannipizzi. That’s very peculiar behavior. Is this the behavior we wish to reflect in the QueryBuilder? As for floats, that’s interesting. I think I’ll take a deeper look at this. Do we presently not handle float comparisons?

giovannipizzi · February 21, 2024, 1:45pm

I agree that this is not what one would expect for the cell. But for general lists of lists, I see how the current behaviour covers at least some use cases. Anyway, I think we’ll be limited by the actual query capabilities of PostgreSQL, so we first need to check what is possible in raw Postgres SQL syntax.

If one really cares, it’s still possible to make a relatively complex query to achieve the goal you want, by making enough and/or filters. But again, do we really need it?
I don’t think you ever want to query for the first vector being exactly [1, 0, 0].
You might care whether the system is cubic, for instance. Then you could use, e.g., a sequence of and statements checking that the 6 off-diagonal components are zero. But in this case I think it’s best to add extras, e.g. with the spacegroup, and then query for those.
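
As a rough sketch of what such a sequence of conditions could look like, the snippet below only builds a filters dictionary in plain Python. It assumes the QueryBuilder resolves numeric path components such as `attributes.cell.0.1` as array indices and accepts the `and` combinator with `>` and `<`; both are worth verifying against your AiiDA version. The helper name and the tolerance value are made up for illustration.

```python
def off_diagonal_zero_filters(tolerance=1e-6):
    """Build one filter per off-diagonal cell component (hypothetical helper).

    Each component is required to lie within (-tolerance, tolerance) instead of
    being compared to 0.0 exactly, to avoid float equality checks.
    """
    filters = {}
    for row in range(3):
        for col in range(3):
            if row != col:
                # Assumed path syntax: attributes.cell.<row>.<col>
                key = f"attributes.cell.{row}.{col}"
                filters[key] = {"and": [{">": -tolerance}, {"<": tolerance}]}
    return filters


filters = off_diagonal_zero_filters()
for key, condition in filters.items():
    print(key, condition)
```

The resulting dictionary would then be passed as `filters=` when appending `orm.StructureData`; separate keys in one filters dictionary are combined with a logical and.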

Float comparisons are OK and should work as intended. It’s equality that is tricky (not only for us, but for anything on a computer with float numbers).

Just try this in your Python…

(4/3 - 1) == 1/3

(it should be True, but it will return False, or at least it does for me, as they differ by ~1e-17).

You typically want to check if the difference in abs value is < some threshold (e.g. 1.e-6 or smaller, depending on the application).
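
A minimal sketch of that threshold check, using only the standard library. The tolerance values are arbitrary examples and should be chosen per application.

```python
import math

lhs = 4 / 3 - 1
rhs = 1 / 3

# Exact equality fails because the two values differ in the last bits.
exact_equal = lhs == rhs

# Absolute-difference check against a fixed threshold.
close_abs = abs(lhs - rhs) < 1e-6

# Relative-tolerance check from the standard library.
close_rel = math.isclose(lhs, rhs, rel_tol=1e-9)

print(exact_equal, close_abs, close_rel)  # False True True
```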

rabbull · November 26, 2024, 8:30am

In addition to the current discussion, I added some comments in a GitHub issue. Please feel free to check it as well for further discussion:

github.com/aiidateam/aiida-core

Counterintuitive Behaviors of `contains` Filter Operator over Arrays in PostgreSQL Backend

opened 02:36PM - 19 Nov 24 UTC

rabbull

- [x] [AiiDA Troubleshooting Documentation](https://aiida.readthedocs.io/project…s/aiida-core/en/stable/installation/troubleshooting.html)
- [x] [AiiDA Discourse Forum](https://aiida.discourse.group/)

### Describe the bug

The `contains` operator on the PostgreSQL backend lacks comprehensive documentation. To better understand its exact behavior, I conducted additional tests in #6617. However, some of the observed behaviors are counterintuitive, and these are detailed below:

#### 1. Non-Existent `attr_key`

```python
import pytest
from aiida.orm import Dict, QueryBuilder

@pytest.mark.usefixtures('aiida_profile_clean')
@pytest.mark.requires_psql
def test():
    Dict({
        'arr': [114, 514]
    }).store()

    qb = QueryBuilder().append(Dict, filters={
        'attributes.oops': {'contains': []},
    })
    print(len(qb.all()))  # prints 0

    qb = QueryBuilder().append(Dict, filters={
        'attributes.oops': {'!contains': []},
    })
    print(len(qb.all()))  # also prints 0
```

In a test where the `attributes` column does not contain a key named `oops`, the query executes successfully. However, the results are confusing: neither the affirmation nor the negation of the `contains` operation matches the entry. This behavior is unexpected and counterintuitive. What best fits my expectation is to fail loudly, e.g. to raise an Exception.

#### 2. Nested Array

This is a known issue discussed in a [previous thread](https://aiida.discourse.group/t/what-does-contains-actually-mean-in-the-querybuilder/282) on discourse.

```python
@pytest.mark.usefixtures('aiida_profile_clean')
@pytest.mark.requires_psql
def test():
    Dict({
        'arr': [[1, 2], [3]]
    }).store()

    qb = QueryBuilder().append(Dict, filters={
        'attributes.arr': {'contains': [[4]]},
    })
    assert len(qb.all()) == 0  # OK

    qb = QueryBuilder().append(Dict, filters={
        'attributes.arr': {'contains': [[2]]},
    })
    assert len(qb.all()) == 0  # AssertionError: assert 1 == 0
```

When testing with nested arrays, the `contains` operation unexpectedly matches entries even when the contained elements do not strictly align with the expected structure. For example, a query attempting to match `[2]` against an array `[[1, 2], [3]]` may return a match, even though `[2]` is not directly an element of the array. Note that this actually complies with PostgreSQL's native JSONB containment semantics:

```
postgres=# select '[[1, 2], [3]]'::jsonb @> '[[2]]'::jsonb;
 ?column?
----------
 t
(1 row)

postgres=# select '[[1, 2], [3]]'::jsonb @> '[[4]]'::jsonb;
 ?column?
----------
 f
(1 row)
```

However, this behavior contrasts quite a lot with intuition, and consequently makes the abstraction hard to understand. In addition, it would be nice to mention in the documentation that `contains` does not care about the order of array elements. See the example below:

```python
@pytest.mark.usefixtures('aiida_profile_clean')
@pytest.mark.requires_psql
def test():
    Dict({
        'arr': [[1, 2], [3]]
    }).store()

    qb = QueryBuilder().append(Dict, filters={
        'attributes.arr': {'contains': [[2, 1]]},
    })
    assert len(qb.all()) == 1  # OK
```

### Steps to reproduce

See above.

### Expected behavior

See above.

### Your environment

- Operating system [e.g. Linux]: Linux 6.11.6-arch1-1
- Python version [e.g. 3.7.1]: 3.11.10
- aiida-core version [e.g. 1.2.1]: main (779cc29d8a47eddabdf9b274d7fa711220ee1aa9)

Other relevant software versions, e.g. Postgres & RabbitMQ

- PostgreSQL version: PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 14.2.1 20240805, 64-bit

### Additional context

This might be an issue in SQLAlchemy and needs further investigation.
