Description of image

Anuj Solanki

GSoC Week 12 Summary

Date: 12 Aug 2024 - 18 Aug 2024

Introduction

This week focused on enhancing the AGL Voice Agent by integrating Whisper-cpp for improved speech-to-text performance, updating Snips-NLU for media control intents, and testing the overall functionality on Raspberry Pi 5.

Tasks Completed This Week

# Updated agl-service-voiceagent to Use Whisper-cpp

Revised the STTModel class to utilize Whisper-cpp for speech-to-text functionality, replacing OpenAI’s Whisper AI. This change is expected to provide faster and more efficient transcription, particularly on resource-constrained devices like Raspberry Pi 5.

# Tested Whisper-cpp in agl-service-voiceagent on Raspberry Pi 5

Performed extensive testing of the Whisper-cpp integration within agl-service-voiceagent on Raspberry Pi 5. The results confirmed that Whisper-cpp significantly enhances the voice agent's responsiveness compared to the previous implementation.

# Updated Snips-NLU for MediaControl Intents

Enhanced the Snips-NLU model to accurately extract intents related to media control, such as PLAY, PAUSE, NEXT, and PREVIOUS.

# Added Media Control Support in agl-service-voiceagent

Integrated support for media control within agl-service-voiceagent, allowing users to control media playback through voice commands.

# Identified Issue with Audio Recording in Manual Mode on Raspberry Pi 5

Encountered an issue where audio recording in manual mode is not functioning properly on Raspberry Pi 5.

Tasks for Next Week

  • Fix the audio recording issue on Raspberry Pi 5 to ensure reliable voice input capture.
  • Continue working on the UI of the Flutter app to improve user interaction and experience.

Conclusion

This week was productive, with significant improvements in the AGL Voice Agent's speech-to-text performance and media control capabilities. The upcoming week will focus on resolving the audio recording issue on Raspberry Pi 5 and further refining the Flutter app's UI.