Google Summer of Code (GSoC) 2025

AiiDA is plan to participate in Google Summer of Code (GSOC) 2025 as the sub-organization under the NumFocus umbrella in 2025.

This thread is for any discussions leading up to GSOC 2025. If you’re interested in working on AiiDA as a student, use this thread to say hi and ask any questions you may have.
Have a look at our project ideas for GSOC 2025, and see the NumFOCUS guide for GSOC students.

According to the timeline below, we will find out whether we receive any student slots for GSoC after the accepted mentoring organizations are announced. At this moment, we cannot guarantee that we will secure a slot this year. However, contributions are always welcome!

As a reminder, here is an excerpt from the GSOC 2025 timeline:

January 27 - 18:00 UTC

  • Mentoring organizations can begin submitting applications to Google

February 11 - 18:00 UTC

  • Mentoring organization application deadline

February 11 - 26

  • Google program administrators review organization applications

February 27 - 18:00 UTC

  • List of accepted mentoring organizations published

February 27 - March 24

  • Potential GSoC contributors discuss application ideas with mentoring organizations

March 24 - 18:00 UTC

  • GSoC contributor application period begins

April 8 - 18:00 UTC

  • GSoC contributor application deadline

April 29 - 18:00 UTC

  • GSoC contributor proposal rankings due from Org Admins

May 8 - 18:00 UTC

  • Accepted GSoC contributor projects announced

Please show your enthusiasm here, instead of opening new topics!

You may also list all of your merged PRs in AiiDA, here.

Cheers :wink:
Ali

2 Likes

Hi! My name is Jigyasu and I am an engineering undergraduate. I have been researching LLM-generated text watermarking techniques with a team at University of South Carolina. I am also an active contributor and member of sktime, a machine learning framework for time series forecasting.

I have experience in training LLMs on specific datasets using transfer learning, I did that once on gemma using the medquad dataset. I hope to contribute to aiida this summer in the GSoC '25. I have set up the development server and will be opening PRs soon : )

Hi @ali-khosravi and @jusong.yu, I have an approved PR in AiiDA now https://github.com/aiidateam/aiida-core/pull/6770 and I want to work on Training an LLM to generate a queries from natural language prompts over the summer.

Would you like to assign me any tasks or kindly let me know how to proceed?

Hi @ali-khosravi , @jusong.yu and everyone .
This is Muhammad Rebaal , current lead at GDG on Campus University of Karachi , β-MLSA @Microsoft , a 3rd year student , an associate software engineer who loves to do MLE and OpenSource Developer.

I’ve solved these issues : #6758 , #6705

and would love to contribute through GSoC Project Idea
Project 1 - Standardize Type Annotations as well

Happy to connect : LinkedIn

Hi @jusong.yu, @geiger_j, and @ali-khosravi,

I’m Ayush, a second-year undergraduate student at IIIT Sonepat, India. I recently came across the “Standardize Type Annotations” project in AiiDA’s GSoC 2025 project ideas, and I am highly interested in contributing to it.

I believe this project aligns well with my skills and learning goals. I have already contributed to AiiDA, with a merged PR (#6735), and I am currently working on #6736 and #6751, both of which are in their final stages. Through these contributions, I have gained a foundational understanding of the codebase, and I would love the opportunity to further deepen my engagement by working on this project as part of GSoC 2025.

I would appreciate any guidance on how to get started and prepare effectively. Looking forward to collaborating with the community!

Hello Respected Mentors @ali-khosravi @jusong.yu @geiger_j ,

"Hi, I’m Fardeen, a 5th-semester Computer Science student at AKTU. My technical skills include Python, NumPy, Pandas, TensorFlow, and Scikit-learn (basic). I have a solid understanding of neural networks and core machine learning concepts. I’m passionate about data science and AI, continuously expanding my knowledge and applying these tools to solve real-world problems. I’m looking forward to contributing to Aiida , on working Projects

I have also made a PR #6788 ,

Iam looking Forward to Contribute to the projects ,

Looking Forward to Contribute to Project(Gsoc 2025) through your Guidance
Thank u

Hello Everyone,

I am Gulshan Kumar, a second-year undergraduate student at IIT Bombay with a strong interest in Python, C++, and Web Development. I am new to Open Source contributions but excited to start my journey by contributing to AiiDA. My goal is to make at least 1-2 meaningful contributions to the project.

I have worked on several projects in machine learning (including reinforcement learning), web development, and web scraping. To begin, I will focus on building and compiling AiiDA, followed by selecting a good-first-issue from the repository to contribute.

Looking forward to learning and collaborating with the community!

Thank you!

Hello everyone,

My name is Karanveer Gautam, and I’m a 2nd-year IIT Patna student pursuing Mathematics and Computing. I’m excited about the project on training an LLM to generate AiiDA queries from natural language prompts.
@ali-khosravi @jusong.yu @geiger_j
As I begin working on building the dataset, I wanted to get some advice on the best format to use for our training data. Specifically, should we follow the Alpaca format, or would the Phi-3 format be more appropriate for our task? I’m interested in understanding any advantages or drawbacks you might have observed with either approach, especially in the context of mapping natural language prompts to AiiDA QueryBuilder code.

Thank you in advance for your insights and guidance!

Hello Everyone,

My name is Shibu Meher, and I am a Ph.D. student at the Materials Research Centre, Indian Institute of Science, Bangalore. My research focuses on machine learning-driven high-throughput search of point defects for quantum technology applications. I am currently developing a high-throughput framework to automate the complex workflow of first principles calculations of point defect.

I am particularly interested in contributing to AiiDA as it closely aligns with my research interests. I am in the process of understanding the aiida-core and will soon be submitting a pull request.

I am thrilled to join the AiiDA community and contribute to its exciting projects. I would like to extend my gratitude to the mentors ( @jusong.yu @geiger_j @ali-khosravi ) for their inspiring project ideas.

You can connect with me on:

GitHub: Shibu778 (Shibu Meher)
LinkedIn: SHIBU MEHER | LinkedIn
Portfolio: Shibu Meher

Regards,
Shibu Meher

Hi @jusong.yu , @ali-khosravi !

Hope you all are doing well! I’ve got these concerns about the project in GSoC :
Project 3 - Training an LLM to generate a queries from natural language prompts
and the answer of these questions would help me better understand the project and help me write a better proposal for the project and I’d be eager to listen to you ?

(1) Is there any specific model that you prefer to fine-tune for AiiDA’s Query Builder ?

(2) Is there any existing dataset or we’ve to collect?

(3) When considering a user-friendly interface, do you picture the tool as a stand-alone application or something that is incorporated straight into AiiDA or Materials Cloud? Would you want an interface that is simple with just a command-line, or more developed with a web-based UI?

(4) Are there particular sources of training data or domain-specific literature that I should consult to fine-tune the LLM for AiiDA-specific queries?

(5) What would be the critical milestone for the project?

Looking forward to your guidance

Muhammad Rebaal