Proposal: AI Copilot for AiiDA (Multi-Agent + RAG)

Hi everyone,

After reading this year’s GSoC proposal on implementing a natural language interface for AiiDA using multi-agent AI, I’ve been thinking about a possible direction and would love to get early feedback.

Rather than approaching this as a simple chat-based wrapper around CLI commands, I’m considering framing it as an “AI Copilot” for AiiDA: an assistant that integrates into the workflow lifecycle and helps users in a structured, architecture-aware way.

AiiDA has a rich architecture (engine, workflows, provenance graph, plugins), and a natural language interface should ideally:

  • Respect provenance integrity
  • Avoid unsafe or careless code generation
  • Integrate cleanly with existing abstractions
  • Be maintainable and modular

Given the team’s concerns about low-effort AI-generated contributions, I think this project should focus heavily on architectural design and validation mechanisms rather than just LLM integration.

High-Level Concept

The AI Copilot would assist users in:

  1. Workflow Design

    • Generate WorkChain skeletons
    • Suggest input structures
    • Guide plugin usage

  2. Debugging

    • Inspect failed processes
    • Parse logs
    • Explain likely causes
    • Suggest corrective actions

  3. Provenance Exploration

    • Translate natural language into structured QueryBuilder queries
    • Summarize provenance graphs
    • Explain data lineage

  4. Optimization & Suggestions

    • Detect common misconfigurations
    • Suggest improvements based on past runs
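
To make the provenance-exploration item more concrete, here is a minimal sketch of how a natural-language request could be turned into a constrained query description before it touches the database. Everything here is an assumption for illustration: the `QuerySpec` class, the whitelists, and the `to_append_kwargs` helper are hypothetical names, not part of AiiDA. The idea is that the LLM only fills in the spec, and the spec is validated before it is handed to `QueryBuilder.append`.

```python
from dataclasses import dataclass, field

# Hypothetical whitelists: the only node types and filter operators
# the assistant would be allowed to emit.
ALLOWED_NODE_TYPES = {"ProcessNode", "CalcJobNode", "WorkChainNode"}
ALLOWED_OPERATORS = {"==", "!==", ">", "<", ">=", "<=", "in", "like"}


@dataclass
class QuerySpec:
    """Constrained description of a single-node query (hypothetical schema)."""

    node_type: str
    filters: dict = field(default_factory=dict)
    project: list = field(default_factory=lambda: ["id"])

    def validate(self) -> None:
        """Raise ValueError if the spec leaves the whitelisted subset."""
        if self.node_type not in ALLOWED_NODE_TYPES:
            raise ValueError(f"node type not allowed: {self.node_type}")
        for key, condition in self.filters.items():
            if not isinstance(condition, dict) or len(condition) != 1:
                raise ValueError(f"filter on {key!r} must be a one-entry dict")
            operator = next(iter(condition))
            if operator not in ALLOWED_OPERATORS:
                raise ValueError(f"operator not allowed: {operator}")

    def to_append_kwargs(self) -> dict:
        """Return keyword arguments intended for QueryBuilder.append()."""
        self.validate()
        return {"filters": self.filters, "project": self.project}


# "Show me calculations with a nonzero exit status" could map to:
spec = QuerySpec(
    node_type="CalcJobNode",
    filters={"attributes.exit_status": {"!==": 0}},
    project=["id", "attributes.exit_status"],
)
kwargs = spec.to_append_kwargs()
```

With AiiDA installed and a profile loaded, the result would be used roughly as `QueryBuilder().append(CalcJobNode, **kwargs)`. The sketch itself deliberately avoids importing AiiDA so the validation logic can be tested in isolation.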

Proposed Architecture

Instead of a single LLM, I’m considering a modular agent structure:

  1. Intent Agent

Classifies user requests (create workflow, debug, inspect provenance, etc.).
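
As a rough illustration of the routing step, the sketch below uses plain keyword matching as a stand-in for an LLM-based classifier. The `Intent` categories mirror the examples above; the keyword lists and the `classify` function are hypothetical and only meant to show the shape of the interface (free text in, one fixed category out).

```python
from enum import Enum


class Intent(Enum):
    CREATE_WORKFLOW = "create_workflow"
    DEBUG = "debug"
    INSPECT_PROVENANCE = "inspect_provenance"
    UNKNOWN = "unknown"


# Hypothetical keyword lists; a real system would likely use an LLM
# or a trained classifier instead of substring matching.
_KEYWORDS = {
    Intent.DEBUG: ("fail", "error", "crash", "debug"),
    Intent.CREATE_WORKFLOW: ("create", "workchain", "workflow", "skeleton"),
    Intent.INSPECT_PROVENANCE: ("provenance", "lineage", "came from", "query"),
}


def classify(request: str) -> Intent:
    """Pick the intent whose keywords occur most often in the request."""
    text = request.lower()
    scores = {
        intent: sum(word in text for word in words)
        for intent, words in _KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to UNKNOWN rather than guessing when nothing matches.
    return best if scores[best] > 0 else Intent.UNKNOWN
```

Returning an explicit `UNKNOWN` instead of a best guess lets downstream agents ask the user for clarification rather than acting on a misread request.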

  2. Context / Retrieval Agent (RAG)

Retrieves relevant information from:

  • AiiDA documentation
  • Local workflow code
  • Execution logs
  • Node metadata / provenance graph

This would ground responses and reduce hallucinations.

  3. Execution / Generation Agent

Generates:

  • WorkChain templates
  • QueryBuilder queries
  • Explanations
  • Debugging suggestions

All outputs would be structured and constrained to AiiDA’s abstractions.

  4. Validation / Safety Layer

Checks:

  • Compatibility with AiiDA’s architecture
  • Proper engine usage (run, submit, etc.)
  • Schema correctness

This layer would help address concerns around uncontrolled LLM-generated code.
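
One possible shape for this layer is a static check on generated code before anything is shown to the user or executed. The sketch below uses Python's standard `ast` module. The specific rules (only `aiida` imports, no `eval`/`exec`, every `WorkChain` subclass must have a `define` method) are illustrative assumptions, not an agreed policy, and `validate_generated_code` is a hypothetical name.

```python
import ast

# Hypothetical policy: generated code may only import from these packages
# and may not call these builtins.
ALLOWED_IMPORT_ROOTS = {"aiida"}
FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__"}


def validate_generated_code(source: str) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORT_ROOTS:
                    problems.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected as well.
            if node.level > 0 or root not in ALLOWED_IMPORT_ROOTS:
                problems.append(f"import not allowed: {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                problems.append(f"call not allowed: {node.func.id}")
        elif isinstance(node, ast.ClassDef):
            base_names = {b.id for b in node.bases if isinstance(b, ast.Name)}
            has_define = any(
                isinstance(item, ast.FunctionDef) and item.name == "define"
                for item in node.body
            )
            if "WorkChain" in base_names and not has_define:
                problems.append(f"{node.name} has no define() method")
    return problems
```

Because the check works on the syntax tree only, the generated code is never imported or run during validation. Checks on engine usage (`run` vs. `submit`) or on input schemas could be added as further rules in the same loop.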

Preliminary Technical Goals

If this direction aligns with the team’s expectations, I would aim to:

  • Draft a more detailed architectural design (components, data flow, extension points)
  • Define strict output schemas for generated code
  • Prototype a minimal but well-structured implementation

Open Questions

I’d really appreciate feedback on this direction. I look forward to your thoughts and am excited about the possibility of building with AiiDA.

Thanks!
Unnati Kadam


Hi Unnati,

Thanks for your interest and for sharing your thinking!

As I just mentioned in the official GSoC thread, we are not providing individual feedback on proposal directions before the portal closes. Please submit your application via the GSoC portal before March 31st at 18:00 UTC, covering all points in our application requirements.

We also ask that GSoC-related discussions be kept in the main GSoC thread rather than separate topics.

The AiiDA team