How to distinguish between surface and bulk calculations?

We have tens of thousands of immigrated VASP calculations.
Now we need to categorize them in surface and bulk calculations.
How would you approach this task?

The StructureData nodes that represent the crystal structures have a pbc property which is a tuple of booleans to indicate whether there are periodic boundary conditions along each axis of the crystal cell. You could check this if any of the axes is set to False, which might indicate the surface normal. However, it is quite possible that the calculation is still going to be periodic even long the surface normal, so the pbc might still be set to (True, True, True) in which case it won’t be helpful.

I am not a VASP user, but does it require an input in the INCAR to indicate the structure is a surface? If so, I believe the INCAR is generated based on the parameters input and you could query for the particular input flag.

One more possibility if there are no distinghuishing features in the calculation inputs, you could analyse the structure from the StructureData itself. Using tools like ASE or pymatgen, you might be able to determine whether it is a surface or not.

Once you have a function that can determine whether a StructureData is surface or not, you can “tag” it using the extras of the node. Any node can have extras set, which are like tags. You can then use the QueryBuilder to query for them. For example:

structure = load_node(..)
structure.base.extras('is_surface', True)
QueryBuilder().append(StructureData, filters={'extras.is_surface': True}).all()

Another option is to use groups. You can create two groups and add the different StructureData nodes (or their associated VaspCalculation nodes).

from aiida.orm import Group
bulk_structures = [load_node(), load_node()]
surface_group = Group('structures/surface').store()
bulk_group = Group('structures/bulk').store()
bulk_group.add_nodes(bulk_structures)

You can then use the Group API to retrieve the nodes of interest, or use verdi group list.

Thank you for your answer.

This is the query for the option based on StructureData:

QueryBuilder().append(StructureData, filters = {"attributes.pbc1": {"==": True}, "attributes.pbc2": {"==": True}, "attributes.pbc3": {"==": False}})

Not very successful in my case.
For me it is often not obvious how attributes are named. Luckily, I found it in the source code.

Is there an easier way to find out what are the available attributes for the filters?

Another approach utilizing kpoints:
Often kpoint mesh in z direction is set to 1.

QueryBuilder().append(KpointsData, filters = {"attributes.mesh.2": {"==": 1}}, tag="kpoints").append(CalcJobNode, with_incoming="kpoints", tag="calc").append(StructureData, with_outgoing="calc")

I’m not aware of any flag in INCAR indicate that a surface is calculated.

Is it possible to propagate this is_surface property (group or extra) of the input structure automatically to the relaxed output structure of a calculation?

As a matter of fact, we just merged a PR that provides this functionality. With that feature, you would have been able to do

QueryBuilder().append(
    StructureData, filters=(StructureData.fields.pbc1 == True)
)

The StructureData.fields is tab-completable, so in an interactive shell or IDE you can discover the fields that can be queried and projected.

Is it possible to propagate this is_surface property (group or extra) of the input structure automatically to the relaxed output structure of a calculation?

Since the output nodes already exist, there is no automation. But you can manually propagate them. Or if you want to have them on the output structure, why not directly set the extra there or add the output structures to the group?

Great!

I see. Thanks.