Issues with .aiida/container folder

Hello,

I had an issue where the .aiida/container folder filled up the 2 TB of space I have on my server. I figured a quick remedy might be to clear out some of the folders, and this got my project going again. I’m not sure I have anywhere else to put this folder on the server, but I’ll look into it with the server administrators.

I have a couple of questions about this. I am wondering why it needs to download the files and if there’s a way in my workflow to clear out these files afterwards.

The other issue I’m running into is that I’m now having problems with my POTCAR files for VASP. I realized I had probably deleted those files, so I deleted the group with all the related nodes and made a new vasp-potcar entry. When I try to load any of the POTCARs, though, I get the following error:

FileNotFoundError: object with key 679f44230d585e003a0295961cbe62660749dcd423da4eea0826dd0938dde087 does not exist.

I’m guessing something is messed up at this point, although I’m unsure why, since it’s a new entry with a new name and the files say they’ve been uploaded. I know there are sometimes caching issues; could that be it?

Thanks for the help and discussion.

Nathan

Hi Nathan,

I figured a quick remedy might be to clear out some of the folders

What exactly did you “clear out”? Did you delete anything in the .aiida/container folder? Did you perhaps delete anything in the loose folder or any of the pack files? That would definitely explain the behavior you are seeing. The container is the file repository, and it automatically deduplicates content. It keeps track of which files it contains (based on their SHA256 checksum) in an SQLite database. So if you deleted the pack files (which contain the actual file content) but the SQLite database was not updated, it still thinks it has those files. When you then upload the same files again, AiiDA thinks it already has them and doesn’t store them again. If this is the case, restoring the lost data is essentially impossible.
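To make the failure mode concrete, here is a toy content-addressed store in plain Python. It is NOT the disk-objectstore implementation; the class name `ToyContainer` and its in-memory `index` set (standing in for the SQLite database) are purely illustrative. It shows how deleting a file by hand while the index still lists its key leads to a deduplicated “re-upload” that writes nothing, followed by a `FileNotFoundError` on read.

```python
import hashlib
import tempfile
from pathlib import Path


class ToyContainer:
    """Toy content-addressed store (NOT the real disk-objectstore code).

    `index` stands in for the SQLite database; files on disk hold the content.
    """

    def __init__(self, root: Path):
        self.root = root
        self.index = set()  # keys the store *believes* it holds

    def add(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        if key in self.index:
            # Deduplication: the index says the object exists, so nothing is written.
            return key
        (self.root / key).write_bytes(content)
        self.index.add(key)
        return key

    def get(self, key: str) -> bytes:
        path = self.root / key
        if key not in self.index or not path.exists():
            raise FileNotFoundError(f"object with key {key} does not exist.")
        return path.read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ToyContainer(Path(tmp))
        key = store.add(b"POTCAR contents")
        (Path(tmp) / key).unlink()     # simulate deleting the file by hand
        store.add(b"POTCAR contents")  # "re-upload": deduplicated, nothing written
        try:
            store.get(key)
        except FileNotFoundError as exc:
            print(exc)
```

The key is derived only from the content, so uploading identical files under a new name or group still maps to the same key.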

I have a couple of questions about this. I am wondering why it needs to download the files and if there’s a way in my workflow to clear out these files afterwards.

Which output files of completed calcjobs are stored permanently by AiiDA is determined by the CalcJob plugin. Each plugin specifies explicitly which files need to be retrieved and permanently stored as part of the retrieved output node. Certain plugins may allow you to tune this in the input parameters, but this is highly plugin specific. Which plugins are you running? We should look at which plugins and which output files are responsible for the bulk of your output.

Hi Sebastiaan,

I did delete files from the loose folder, as it held most of the data. What would be the best workaround? I know you said it’s impossible, but is there a way to reset the SQLite database for this?

I’m using a personalized version of the NWChem plugin. I’ve been meaning to push some updates for the general functionality of the plugin, but I have some customized workflows that I’m not sure are covered by our open development policy, so I need to filter those files out first. I’ll take a look at where you mentioned and see what it’s currently set to.

Nathan

There is no API for doing this, at least not a straightforward one that I am aware of. This is really a part of the data that is not supposed to be touched manually, let alone deleted. You would have to open the database manually and manipulate it to remove the keys of the files that you removed. But without those files, it is difficult to determine which keys those are. Your best bet is to restore from a backup.

I’ll take a look at where you mentioned and see what it’s currently set to.

You should look at what the CalcJob sets for the retrieve_list attribute of the CalcInfo object that is created and returned by the prepare_for_submission method. Alternatively, you can take a completed CalcJob from your database and run load_node(<PK>).base.attributes.get('retrieve_list'). That should return the list used for that calculation.

Okay, it sounds like I messed things up quite a bit. The best thing at this point seems to be to export my data and re-import it into a fresh database, since I haven’t been backing up my data as well as I should have. Things seem to be working well enough for NWChem, but other programs are going to have issues.

The retrieve list is currently set to this.

calcinfo.retrieve_list = [self._DEFAULT_OUTPUT_FILE,
                          self._DEFAULT_ERROR_FILE]

I think I had copied the formatting from QE at the time. It seems that it retrieves the output file and the error file, which is what I’m seeing in those folders. Should these perhaps go into the retrieve_temporary_list instead? I could also check what the current format is in some of the main plugins.

Are the files that these correspond to typically very large? If it is just a text output file, I would be surprised that you are creating TBs of data. Can you look at a typical example calculation and check the size of those files? If they are indeed very large, then a sensible choice would be to move them to the retrieve_temporary_list and update the parser to extract any information it needs from them and store that separately, instead of the entire file.
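One way to check file sizes for a single calculation is to copy its retrieved files to a local folder (assuming your aiida-core version provides `verdi node repo dump <PK> <DIR>`, with `<PK>` being the retrieved node) and then list the largest files. The helper below is a hypothetical sketch in plain Python; `largest_files` is not part of AiiDA.

```python
from pathlib import Path


def largest_files(folder: str, top: int = 5) -> list:
    """Return the `top` largest files under `folder` as (relative path, bytes)."""
    root = Path(folder)
    sizes = [
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    ]
    # Largest first; ties broken by name so the order is stable.
    sizes.sort(key=lambda item: (-item[1], item[0]))
    return sizes[:top]
```

For example, `for name, size in largest_files("retrieved_dump"): print(f"{size / 1e6:.1f} MB  {name}")` prints the biggest files first, which makes it easy to see which entry of the retrieve list dominates.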

One comment from my side: as Sebastiaan wrote, if you deleted files from the repository, they are gone for good. I’m not sure a backup will fix this (it also depends on how you make it, but the missing files cannot be restored).

One thing that is not clear to me: do you want to restore the missing files (very hard, or practically impossible), or tell AiiDA that those files no longer exist (or never existed)? The latter is technically possible, but requires quite some scripting. You would have to go into the disk-objectstore (= the repository), look for objects whose corresponding files are now missing, and remove the corresponding entries from the Object table of the disk-objectstore’s SQLite DB. Then you would go into the AiiDA PostgreSQL DB, find the references to these objects in any AiiDA node, and decide what to do. For example, you could remove the references as if those files never existed (but this is a bit “weird”: nodes are supposed to be immutable, and you would be changing their content), or just identify the “problematic” nodes and delete them from the AiiDA DB altogether (if you don’t need those nodes anymore and don’t want to get errors when working with AiiDA).
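The steps above can be sketched on a toy model. Everything here is an assumption for illustration: the single-column table `obj(hashkey)` is a hypothetical schema (the real disk-objectstore index has a different layout), and the `node_keys` dict stands in for what you would collect from the AiiDA PostgreSQL DB (node PK mapped to the repository keys it references). It only shows the logic; on real data you would work on a copy of the container.

```python
import sqlite3


def find_missing(conn: sqlite3.Connection, present: set) -> list:
    """Return keys listed in the index whose content is no longer on disk."""
    rows = conn.execute("SELECT hashkey FROM obj ORDER BY hashkey").fetchall()
    return [key for (key,) in rows if key not in present]


def drop_from_index(conn: sqlite3.Connection, keys: list) -> None:
    """Remove index entries for objects that no longer exist."""
    conn.executemany("DELETE FROM obj WHERE hashkey = ?", [(k,) for k in keys])
    conn.commit()


def affected_nodes(node_keys: dict, missing: list) -> list:
    """Return the PKs of nodes that reference at least one missing key."""
    gone = set(missing)
    return sorted(pk for pk, keys in node_keys.items() if set(keys) & gone)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obj (hashkey TEXT PRIMARY KEY)")  # toy schema
    conn.executemany("INSERT INTO obj VALUES (?)", [("aaa",), ("bbb",), ("ccc",)])
    missing = find_missing(conn, present={"aaa", "ccc"})
    drop_from_index(conn, missing)
    print(missing, affected_nodes({1: ["aaa"], 2: ["bbb", "ccc"]}, missing))  # ['bbb'] [2]
```

The nodes returned by `affected_nodes` are the “problematic” ones; whether to delete them or strip their references is the judgment call described above.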

Hope this helps…

They are in the range of 30-100 MB in size, but I’m running tens of thousands of simulations. I’ll move these over to the retrieve_temporary_list, and that should hopefully fix this issue. Thanks.
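For scale, a quick back-of-the-envelope check, assuming 20,000 calculations as one example of “tens of thousands” and decimal units (1 TB = 1,000,000 MB):

```python
def storage_tb(n_calcs: int, mb_per_calc: float) -> float:
    """Retrieved-file storage in TB, using decimal units (1 TB = 1,000,000 MB)."""
    return n_calcs * mb_per_calc / 1_000_000


if __name__ == "__main__":
    # Assumed 20,000 calculations as one example of "tens of thousands".
    for size_mb in (30, 100):
        print(f"{size_mb} MB each -> {storage_tb(20_000, size_mb):.1f} TB")
```

At 100 MB per calculation this already reaches 2 TB, which is consistent with the disk filling up.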

Note that if you move them to the retrieve_temporary_list, they will be retrieved, but as soon as Parser.parse returns, the files are deleted. So if they contain data that you need to keep, you need to make sure to update the Parser to extract that data from those files and store it as a separate output node.
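As far as I know, aiida-core passes the path of the temporary folder to `Parser.parse` as the `retrieved_temporary_folder` keyword argument, so the parser can read the big file from there and keep only what it needs (for example wrapped in an `orm.Dict` and attached with `self.out(...)` under an output name of your choosing). The function below is a hypothetical, AiiDA-independent sketch of the “keep only what you need” part; the NWChem line format it matches (`Total DFT energy = ...` or `Total SCF energy = ...`) is an assumption, so adapt the pattern to your actual output.

```python
import re

# Assumed line format, e.g. "         Total DFT energy =     -76.419737017693"
ENERGY_RE = re.compile(r"Total (DFT|SCF) energy\s*=\s*(-?\d+\.\d+)")


def extract_final_energy(output_path: str) -> dict:
    """Stream a (possibly very large) output file and keep only the last total energy."""
    result = None
    with open(output_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:  # line by line: the whole file is never held in memory
            match = ENERGY_RE.search(line)
            if match:
                result = {"method": match.group(1), "total_energy": float(match.group(2))}
    if result is None:
        raise ValueError(f"no total energy found in {output_path}")
    return result
```

The returned dictionary is a few bytes instead of 30-100 MB per calculation, and the full file disappears once parsing finishes.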

I’m not so much interested in restoring the files as in just having AiiDA stop looking for them. I thought I had fixed this when I uploaded new vasp-potcar data with a new family group, but AiiDA is obviously smarter than me and can still recognize the earlier uploads that I thought had also been deleted from the database. I wouldn’t mind deleting every reference to those POTCAR files, as I hadn’t gotten very far with these VASP simulations; I’ve mostly been using NWChem in AiiDA so far. I was already able to find all the PotcarData entries in the database and was trying to decide the best way to delete those. I know I’ve seen a reference to the location of the SQLite file before, but where is it stored? I’m drawing a blank and can’t find it right now.
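A small helper for poking around the container, assuming the default disk-objectstore layout: the SQLite index is a file named `packs.idx` at the top of the container, and loose objects live at `loose/<first 2 hex chars>/<remaining chars>`. The container itself is typically under `~/.aiida/repository/<profile>/container`, and `verdi profile show` should list the repository location for your profile. The function name `locate` is hypothetical. Note that a key that is not found as a loose file may still be stored in a pack file.

```python
from pathlib import Path


def locate(container: str, hashkey: str, prefix_len: int = 2) -> dict:
    """Return the assumed index-DB path and loose-object path for `hashkey`."""
    root = Path(container).expanduser()
    # Assumed default layout: loose/<first prefix_len chars>/<rest of the key>
    loose = root / "loose" / hashkey[:prefix_len] / hashkey[prefix_len:]
    return {
        "index_db": root / "packs.idx",  # assumed name of the SQLite index
        "loose_path": loose,
        "loose_exists": loose.is_file(),
    }
```

For example, `locate("~/.aiida/repository/<profile>/container", "679f44230d585e003a0295961cbe62660749dcd423da4eea0826dd0938dde087")` reports whether the object from the error message still exists as a loose file.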

Yes, I do remember this. I’m actually already doing this for the cube files I retrieve. I just never imagined I would run into issues with the output files.