New Era of AiiDA

Hi @AiiDA Team

I hope you’re doing well.

I’m currently preparing my GSoC proposal for the project “Natural Language Interface for AiiDA using Multi-Agent AI”, and I would really appreciate your feedback on the idea and its direction.

The main problem I’m trying to address is that AiiDA currently requires researchers to write fairly complex Python scripts and interact directly with its API to manage computational workflows. This is time-consuming and can be a barrier, especially for users who are less comfortable with programming.

In my previous project QueryGenius, I worked on a similar challenge. I built a local LLM + RAG-based system that allows users to perform database operations (CRUD and analytics) using plain English, removing the need for manual querying. This significantly reduced effort and allowed users to focus more on system design and analysis rather than low-level operations.

Building on that experience, my proposal is to design an AI pipeline for AiiDA that takes natural language input and converts it into secure Python API executions using the Model Context Protocol (MCP). The goal is to enable users to define and run computational workflows using simple human language.

One of the key challenges in such systems is handling LLM hallucinations, which can lead to invalid or non-executable code. To address this, I plan to apply fault-tolerant techniques from my project AgenticIQ, where I used sandboxing, silent retries via Inngest, and intelligent fallback mechanisms to prevent execution of faulty AI-generated code and significantly reduce runtime failures.

Instead of relying on costly cloud models, I’m proposing a locally deployed, multi-agent architecture (using tools like Ollama), both for cost efficiency and for better modularity. Inspired by my work on AI Gossip Hub, where multiple AI agents collaborate via shared context, I plan to implement an agent-to-agent communication protocol in AiiDA.

The system would consist of three specialized agents:

  • Execution Agent: Translates user prompts into MCP tool calls and configures workflows (e.g., DFT simulations using Quantum ESPRESSO).

  • Validation Agent: Runs in the background using sandboxing to verify and sanitize generated code/arguments before execution.

  • Analysis & Diagnostic Agent: Interacts with AiiDA’s provenance graph to analyze results and help debug failed computations.

Overall, the goal is to make AiiDA more accessible, reduce manual scripting effort, and provide a more intelligent and fault-tolerant interface for computational workflows.

I would really value your feedback on:

  • Whether this idea aligns with AiiDA’s vision and technical direction

  • Any concerns regarding feasibility, scope, or integration with the existing ecosystem

  • Suggestions to improve or better scope this into a strong GSoC proposal

Thank you for your time and guidance!

Best regards,
Mukul Sharma

I’m finding it a bit difficult to imagine how this would work in practice as a user. Do you have some user stories in mind, of:

  • what the user might like to achieve
  • how they would do it currently
  • how they might achieve it through the natural language interface

AiiDA users already work at a few different levels of depth/complexity; who is the target for this tool?

e.g. would the scope include

  • executing largely-preconfigured workflows from public packages;
  • combining existing workflows from public packages into a workgraph;
  • executing custom workflows from the user’s environment;
  • writing custom workflows (workgraphs?) in the user’s environment?

In the AiiDA spirit of provenance-everywhere, could some record of the original prompt and its consequences be recorded in the graph?

Thank you @ajjackson for the excellent feedback. These questions really help bridge the gap between the architectural backend and the actual user experience. Here are my thoughts on how this would work in practice:

1. User Stories (Current vs. Natural Language Interface): My target audience includes both novice users (lowering the barrier to entry so they can focus on physics rather than AiiDA syntax) and power users (automating repetitive diagnostic tasks).

  • Scenario A: Setting up a standard calculation

    • The Goal: Run a geometry optimization on a specific crystal structure.

    • Currently: The user has to write a Python script to load the StructureData node, load the Quantum ESPRESSO code, construct a nested dictionary of input parameters, configure the builder, and submit it to the engine.

    • With the NL Interface: The user types: “Run a geometry optimization on structure node 1234 using Quantum ESPRESSO with default precision.” The Execution Agent translates this intent into the correct MCP tool calls, populates the schema, passes it through the Validation Agent, and submits the workflow.

  • Scenario B: Diagnosing a failed workflow

    • The Goal: Figure out why a simulation crashed.

    • Currently: The user manually queries the database via the verdi CLI or Python shell, fetches the failed node, and parses the log files to identify an SCF convergence issue.

    • With the NL Interface: The user types: “Why did calculation 5678 fail?” The Analysis & Diagnostic Agent uses MCP tools to query the AiiDA provenance graph, reads the exit status, and responds: “The calculation failed due to an SCF convergence error. Would you like me to set up a retry with a higher mixing beta?”
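
For concreteness, the scripted path described in Scenario A might look like the minimal sketch below. The pw.x keyword values are illustrative, and the actual AiiDA calls (which need a configured profile to run) are shown as comments:

```python
# Sketch of the manual script a user writes today for Scenario A.
# The hand-built nested dictionary is what the NL interface would
# populate instead. Keyword values here are illustrative only.

def build_relax_inputs(structure_pk: int, code_label: str) -> dict:
    """Assemble the nested pw.x input dictionary by hand."""
    return {
        "code": code_label,          # e.g. "pw@local"
        "structure": structure_pk,   # e.g. 1234
        "parameters": {
            "CONTROL": {"calculation": "relax"},
            "SYSTEM": {"ecutwfc": 60.0, "ecutrho": 480.0},
            "ELECTRONS": {"conv_thr": 1.0e-8},
        },
    }

inputs = build_relax_inputs(1234, "pw@local")

# The remaining steps need a configured AiiDA profile:
# from aiida.orm import load_node, load_code, Dict
# from aiida.engine import submit
# structure = load_node(inputs["structure"])
# code = load_code(inputs["code"])
# ...attach structure, code, and Dict(inputs["parameters"]) to a builder
# node = submit(builder)
```

The Execution Agent’s job would be to produce exactly this kind of input payload from the user’s sentence, so the Validation Agent has a concrete, schema-checkable object to verify before anything is submitted.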

2. Scope of the Tool: To manage hallucination risks and ensure reliability, I propose scoping the tool to focus heavily on orchestration rather than code generation:

  • Yes - Executing largely-preconfigured workflows: This is the primary target. The agents will map prompts to standard AiiDA tools exposed via MCP.

  • Yes - Combining existing workflows / Executing custom workflows: As long as the workflows or workgraphs are registered in the user’s local AiiDA environment and exposed via MCP tools, the agent can trigger them.

  • No/Later - Writing custom workflows from scratch: Asking the LLM to write arbitrary new Python workflow logic introduces high hallucination risks. For this exploratory project, the focus will be on executing and diagnosing existing logic safely.

3. Provenance of the Prompt: I completely agree with capturing the prompt in the AiiDA spirit of “provenance-everywhere”. Because the system uses MCP to interface with the AiiDA Python API, we can enforce a rule where the Execution Agent saves the original natural-language prompt, the LLM model version used, and the generated MCP arguments directly into the calculation node’s extras or as a distinct Data node. This ensures that the AI’s decision-making process is fully auditable and reproducible within the provenance graph.
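
As a minimal sketch, the audit record might look like the payload below. The field names and helper function are my own illustration; in AiiDA 2.x extras can be set via `node.base.extras.set`, or the record could instead be wrapped in a `Dict` node linked into the provenance graph:

```python
def prompt_provenance_record(prompt: str, model: str, mcp_args: dict) -> dict:
    """Audit payload the Execution Agent could attach to a calculation node.

    Field names are illustrative. The record could be stored via the
    extras API (node.base.extras.set in AiiDA 2.x) or as a Dict data
    node so the AI's decision is queryable alongside the results.
    """
    return {
        "nl_prompt": prompt,
        "llm_model": model,
        "mcp_arguments": mcp_args,
    }

record = prompt_provenance_record(
    "Run a geometry optimization on structure node 1234",
    "llama3:8b",  # example local Ollama model tag
    {"tool": "launch_pw_relax", "structure_pk": 1234},
)
```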

I would love to hear your thoughts on scoping it this way!

Best regards,
Mukul

For this “standard” case I don’t think they do need to write a script; they could instead use the command-line interface included with the aiida-quantumespresso plugin?

So the equivalent would be something like

aiida-quantumespresso calculation launch pw --code pw@local --structure ID --pseudo-family sssp --calculation-mode relax

If the main value of the language interface is to provide default values for a few more parameters, there are simpler and lower-CPU ways to achieve this!

I do see the appeal of a “conversation” interface for debugging and diagnostics, but of course it has to be tested on a good range of broken scenarios. How would you collect those?

Great point about the existing aiida-quantumespresso CLI! You are completely right that an LLM is overkill for a simple, standard run.

Here is where I see the NL interface adding real value over the standard CLI:

  1. Discoverability: Beginners don’t need to memorize different commands and flags across various plugins (e.g., switching between Quantum ESPRESSO and VASP).
  2. Smart Orchestration: It can handle dynamic, conditional steps that a basic CLI can’t. (e.g., “Run a geometry optimization, and if the volume changes >10%, retry with a denser k-point mesh”).
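
As an illustration of the conditional step in point 2, the orchestrating agent could run a simple post-processing check between submissions; the function name and 10% threshold below are hypothetical:

```python
def needs_denser_kmesh(v_initial: float, v_final: float,
                       threshold: float = 0.10) -> bool:
    """Return True if the relaxed cell volume changed by more than
    `threshold` (as a fraction), signalling that the agent should
    retry the optimization with a denser k-point mesh."""
    return abs(v_final - v_initial) / v_initial > threshold

# A 12% volume change triggers the retry branch; 3% does not:
needs_denser_kmesh(100.0, 112.0)  # True
needs_denser_kmesh(100.0, 103.0)  # False
```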

Regarding the broken scenarios for the Diagnostic Agent, here is my plan to collect them:

  1. Manual fault injection: I will intentionally write scripts to trigger common crashes (like SCF non-convergence or bad pseudopotentials) in my local setup.
  2. Community mining: I will pull common tracebacks and errors that users post about on the AiiDA Discourse forum and GitHub issues.
  3. Historical data: Does the AiiDA team have a repository or database dump of real failed nodes that I could use for benchmarking?

Does this sound like a solid plan for the exploratory phase?

Best regards,
Mukul

Hi,

Thanks to @ajjackson for the thoughtful questions, very much appreciated, and to @mukul-sharma-tech for the detailed responses.

As a general note, we’d ask that GSoC-related discussions stay in the official GSoC thread rather than separate topics. I also just posted an update there that addresses some of the points raised here.

For GSoC applicants: please submit your proposals via the GSoC portal before March 31st at 18:00 UTC. We will not engage in detailed pre-submission technical discussions with individual candidates, as that would not be fair to other applicants. That’s what the interviews are for.

@ajjackson would you be OK if I close this topic? Happy to continue the discussion in the main GSoC thread if you have further thoughts.

The AiiDA team


Of course, please do!


Thanks @geiger_j for this. @ajjackson, if you have any further questions, I am happy to answer them in the GSoC thread.