Anuj Solanki
GSoC Community Bonding
01 May 2024 - 26 May 2024
Introduction
I am a third-year BTech student in Computer Science and Engineering at IIT Mandi. My main areas are Machine Learning and app development, with a focus on Deep Learning and Natural Language Processing (NLP); I have worked extensively with LSTMs and transformers. I have a keen interest in open source, driven by a wish to help people globally, and I continually look for opportunities to contribute. I enjoy learning and am always looking to improve.
About GSoC
Google Summer of Code (GSoC) is a global program focused on bringing more students and developers into open source software development. Selected Contributors work with an open source organization on a programming project during their break from school. The program is designed to encourage collaboration and help students gain exposure to real-world software development.
GSoC offers three types of projects to accommodate different levels of commitment and experience:
- 90-Hour Projects: These are smaller projects designed for those who can dedicate approximately 8-10 hours per week. Ideal for newcomers or those with limited time but still eager to contribute.
- 175-Hour Projects: These medium-sized projects require around 15-20 hours per week. Suitable for students who have some experience and can commit more time to developing their skills.
- 350-Hour Projects: These are large projects designed for those who can commit around 30-35 hours per week. These projects are comprehensive and ideal for students looking to make a significant impact and gain substantial experience in open source development.
By offering various project sizes, GSoC ensures that students with different levels of expertise and availability can participate and benefit from the program.
About My Project
Title: Enhance Speech Recognition for AGL using Whisper AI
Key Deliverables:
- Implementation of Whisper AI for Speech-to-Text: Support both online and offline modes, integrating the tiny and base models for offline use and a larger model for online use.
- Expansion of Natural Language Understanding (NLU): Add support for executing more commands by integrating available APIs, such as soundmanager and weather, into the voice agent service.
- Integration with Other Applications (Stretch Goal): Develop functionality to interact with other applications such as navigation and phone (open, close, start an activity) and with services such as WiFi, Bluetooth, and GPS.
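To make the online/offline split in the first deliverable concrete, here is a minimal sketch of how a voice agent might pick a Whisper model size depending on connectivity. It is only an illustration under stated assumptions: the names `SttChoice` and `choose_stt` are hypothetical and not taken from the AGL voice agent code, and `"large"` stands in for the unspecified "larger model" used online.

```python
from dataclasses import dataclass

# Hypothetical names for illustration; not taken from the AGL voice agent code.
OFFLINE_MODELS = ("tiny", "base")  # model sizes named in the deliverables
ONLINE_MODEL = "large"             # assumption: the plan only says "a larger model"


@dataclass(frozen=True)
class SttChoice:
    mode: str   # "online" or "offline"
    model: str  # Whisper model size to load or request


def choose_stt(network_available: bool, prefer_accuracy: bool = False) -> SttChoice:
    """Pick a speech-to-text mode and Whisper model size."""
    if network_available:
        # Online mode can afford the larger model.
        return SttChoice(mode="online", model=ONLINE_MODEL)
    # Offline mode: "base" is the larger of the two on-device options,
    # "tiny" is the lighter one.
    model = "base" if prefer_accuracy else "tiny"
    return SttChoice(mode="offline", model=model)
```

For example, `choose_stt(network_available=False)` would select the tiny model for on-device use, while passing `prefer_accuracy=True` would select base instead. How the real service decides between the two offline models is not specified here.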
Community Bonding Period
During this period, I:
- Talked to mentors Jan-Simon Möller, Scott Murray, and Walt Miner.
- Researched Whisper AI containerization.
- Set up and tested the previous build of the AGL Voice Assistant.