Anuj Solanki
GSoC Week 12 Summary
Date: 12 Aug 2024 - 18 Aug 2024
Introduction
This week focused on enhancing the AGL Voice Agent by integrating Whisper-cpp for improved speech-to-text performance, updating Snips-NLU for media control intents, and testing the overall functionality on Raspberry Pi 5.
Tasks Completed This Week
# Updated agl-service-voiceagent to Use Whisper-cpp
Revised the STTModel
class to utilize Whisper-cpp for speech-to-text functionality, replacing OpenAI’s Whisper AI. This change is expected to provide faster and more efficient transcription, particularly on resource-constrained devices like Raspberry Pi 5.
# Tested Whisper-cpp in agl-service-voiceagent on Raspberry Pi 5
Performed extensive testing of the Whisper-cpp integration within agl-service-voiceagent
on Raspberry Pi 5. The results confirmed that Whisper-cpp significantly enhances the voice agent's responsiveness compared to the previous implementation.
# Updated Snips-NLU for MediaControl Intents
Enhanced the Snips-NLU model to accurately extract intents related to media control, such as PLAY, PAUSE, NEXT, and PREVIOUS.
# Added Media Control Support in agl-service-voiceagent
Integrated support for media control within agl-service-voiceagent
, allowing users to control media playback through voice commands.
# Identified Issue with Audio Recording in Manual Mode on Raspberry Pi 5
Encountered an issue where audio recording in manual mode is not functioning properly on Raspberry Pi 5.
Tasks for Next Week
- Fix the audio recording issue on Raspberry Pi 5 to ensure reliable voice input capture.
- Continue working on the UI of the Flutter app to improve user interaction and experience.
Conclusion
This week was productive, with significant improvements in the AGL Voice Agent's speech-to-text performance and media control capabilities. The upcoming week will focus on resolving the audio recording issue on Raspberry Pi 5 and further refining the Flutter app's UI.