This thread is for any discussions leading up to GSOC 2025. If you’re interested in working on AiiDA as a student, use this thread to say hi and ask any questions you may have.
Have a look at our project ideas for GSOC 2025, and see the NumFOCUS guide for GSOC students.
According to the timeline below, we will find out whether we receive any student slots for GSoC after the accepted mentoring organizations are announced. At this moment, we cannot guarantee that we will secure a slot this year. However, contributions are always welcome!
Hi! My name is Jigyasu and I am an engineering undergraduate. I have been researching LLM-generated text watermarking techniques with a team at University of South Carolina. I am also an active contributor and member of sktime, a machine learning framework for time series forecasting.
I have experience in training LLMs on specific datasets using transfer learning, I did that once on gemma using the medquad dataset. I hope to contribute to aiida this summer in the GSoC '25. I have set up the development server and will be opening PRs soon : )
I’m Ayush, a second-year undergraduate student at IIIT Sonepat, India. I recently came across the “Standardize Type Annotations” project in AiiDA’s GSoC 2025 project ideas, and I am highly interested in contributing to it.
I believe this project aligns well with my skills and learning goals. I have already contributed to AiiDA, with a merged PR (#6735), and I am currently working on #6736 and #6751, both of which are in their final stages. Through these contributions, I have gained a foundational understanding of the codebase, and I would love the opportunity to further deepen my engagement by working on this project as part of GSoC 2025.
I would appreciate any guidance on how to get started and prepare effectively. Looking forward to collaborating with the community!
"Hi, I’m Fardeen, a 5th-semester Computer Science student at AKTU. My technical skills include Python, NumPy, Pandas, TensorFlow, and Scikit-learn (basic). I have a solid understanding of neural networks and core machine learning concepts. I’m passionate about data science and AI, continuously expanding my knowledge and applying these tools to solve real-world problems. I’m looking forward to contributing to Aiida , on working Projects
I have also made a PR #6788 ,
Iam looking Forward to Contribute to the projects ,
Looking Forward to Contribute to Project(Gsoc 2025) through your Guidance
Thank u
I am Gulshan Kumar, a second-year undergraduate student at IIT Bombay with a strong interest in Python, C++, and Web Development. I am new to Open Source contributions but excited to start my journey by contributing to AiiDA. My goal is to make at least 1-2 meaningful contributions to the project.
I have worked on several projects in machine learning (including reinforcement learning), web development, and web scraping. To begin, I will focus on building and compiling AiiDA, followed by selecting a good-first-issue from the repository to contribute.
Looking forward to learning and collaborating with the community!
My name is Karanveer Gautam, and I’m a 2nd-year IIT Patna student pursuing Mathematics and Computing. I’m excited about the project on training an LLM to generate AiiDA queries from natural language prompts. @ali-khosravi@jusong.yu@geiger_j
As I begin working on building the dataset, I wanted to get some advice on the best format to use for our training data. Specifically, should we follow the Alpaca format, or would the Phi-3 format be more appropriate for our task? I’m interested in understanding any advantages or drawbacks you might have observed with either approach, especially in the context of mapping natural language prompts to AiiDA QueryBuilder code.
Thank you in advance for your insights and guidance!
My name is Shibu Meher, and I am a Ph.D. student at the Materials Research Centre, Indian Institute of Science, Bangalore. My research focuses on machine learning-driven high-throughput search of point defects for quantum technology applications. I am currently developing a high-throughput framework to automate the complex workflow of first principles calculations of point defect.
I am particularly interested in contributing to AiiDA as it closely aligns with my research interests. I am in the process of understanding the aiida-core and will soon be submitting a pull request.
I am thrilled to join the AiiDA community and contribute to its exciting projects. I would like to extend my gratitude to the mentors ( @jusong.yu@geiger_j@ali-khosravi ) for their inspiring project ideas.
Hope you all are doing well! I’ve got these concerns about the project in GSoC : Project 3 - Training an LLM to generate a queries from natural language prompts
and the answer of these questions would help me better understand the project and help me write a better proposal for the project and I’d be eager to listen to you ?
(1) Is there any specific model that you prefer to fine-tune for AiiDA’s Query Builder ?
(2) Is there any existing dataset or we’ve to collect?
(3) When considering a user-friendly interface, do you picture the tool as a stand-alone application or something that is incorporated straight into AiiDA or Materials Cloud? Would you want an interface that is simple with just a command-line, or more developed with a web-based UI?
(4) Are there particular sources of training data or domain-specific literature that I should consult to fine-tune the LLM for AiiDA-specific queries?
(5) What would be the critical milestone for the project?