Anuj Solanki

GSoC Week 11 Summary

Date: 05 Aug 2024 - 11 Aug 2024

Introduction

This week was dedicated to enhancing the dataset for key intents, testing Whisper AI's performance on Raspberry Pi 5, and exploring alternative implementations to improve transcription speed. Additionally, new Yocto recipes were created and tested to optimize the voice assistant's functionality on Raspberry Pi 5.

Tasks Completed This Week

# Improved the Dataset for Key Intents

Expanded and refined the dataset for the following intents to improve the accuracy and reliability of the voice assistant's responses:

VolumeControl
HVACFanSpeed
HVACTemperature
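
To illustrate what entries for these intents can look like, here is a minimal sketch that assumes the Snips NLU JSON dataset layout (snips-nlu is the NLU engine mentioned under next week's tasks). The utterances, the slot names, and the `add_utterance` helper are hypothetical examples, not entries from the actual dataset.

```python
import json


def add_utterance(dataset, intent, chunks):
    """Append one utterance (a list of text chunks) to the given intent."""
    dataset["intents"].setdefault(intent, {"utterances": []})
    dataset["intents"][intent]["utterances"].append({"data": chunks})


# Skeleton of a dataset; "snips/number" is a builtin Snips entity.
dataset = {"language": "en", "intents": {}, "entities": {"snips/number": {}}}

# Hypothetical utterances. Slot-tagged chunks carry "entity" and "slot_name".
add_utterance(dataset, "VolumeControl", [
    {"text": "set the volume to "},
    {"text": "40", "entity": "snips/number", "slot_name": "volume_level"},
])
add_utterance(dataset, "HVACFanSpeed", [
    {"text": "change the fan speed to "},
    {"text": "3", "entity": "snips/number", "slot_name": "fan_speed"},
])
add_utterance(dataset, "HVACTemperature", [
    {"text": "set the temperature to "},
    {"text": "22", "entity": "snips/number", "slot_name": "temperature"},
    {"text": " degrees"},
])

print(json.dumps(dataset, indent=2))
```

Adding more phrasings per intent, each tagged the same way, is one way to expand such a dataset.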

# Tested Whisper AI on Raspberry Pi 5

Conducted tests on the Whisper AI integration within the agl-ivi-demo-flutter image on Raspberry Pi 5:

Using the base model, Whisper AI took 20-22 seconds to transcribe a 4-second audio clip.
With the tiny model, the transcription time was reduced to 12-13 seconds.
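
Timings like these can be collected with a small harness. The sketch below is an assumption about how such a test could be run, not the exact script used: `time_transcription` is a hypothetical helper, and the commented usage assumes the `openai-whisper` Python package and a hypothetical file `sample_4s.wav`.

```python
import time
from statistics import mean


def time_transcription(transcribe, audio_path, runs=3):
    """Call transcribe(audio_path) `runs` times; return (last_text, avg_seconds)."""
    durations = []
    text = ""
    for _ in range(runs):
        start = time.perf_counter()
        text = transcribe(audio_path)
        durations.append(time.perf_counter() - start)
    return text, mean(durations)


# Usage with openai-whisper (assumed setup; requires `pip install openai-whisper`):
#
#   import whisper
#   model = whisper.load_model("tiny")  # or "base"
#   text, seconds = time_transcription(
#       lambda path: model.transcribe(path)["text"], "sample_4s.wav"
#   )
#   print(f"{seconds:.1f} s: {text}")
```

Loading the model outside the timed call keeps model start-up cost out of the transcription figure.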

# Researched Alternative Implementations of Whisper AI

Due to the long transcription times, I explored other Whisper AI implementations that are optimized for performance on Raspberry Pi. Notable alternatives include:

Faster-whisper
Whisper-cpp

# Created Recipes for Whisper-cpp

Developed the following Yocto recipes to facilitate the use of Whisper-cpp on embedded systems:

whisper-cpp
whisper-cpp-tiny
whisper-cpp-base

# Tested Whisper-cpp on Raspberry Pi 5

Evaluated the performance of Whisper-cpp in the agl-ivi-demo-flutter image on Raspberry Pi 5:

Whisper-cpp significantly outperformed OpenAI's Whisper, running roughly three to four times faster on the same hardware.
It transcribed the same 4-second audio clip in 6-7 seconds with the base model and around 3 seconds with the tiny model.
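
The comparison can be made concrete with a little arithmetic on the reported timings. This sketch uses the midpoint of each reported range, which is a simplifying assumption; the helper names are illustrative.

```python
CLIP_SECONDS = 4.0

# (min, max) transcription times in seconds, taken from the tests reported here.
# "around 3 seconds" is represented as (3.0, 3.0).
TIMINGS = {
    "openai-whisper": {"base": (20.0, 22.0), "tiny": (12.0, 13.0)},
    "whisper-cpp": {"base": (6.0, 7.0), "tiny": (3.0, 3.0)},
}


def midpoint(bounds):
    low, high = bounds
    return (low + high) / 2


def real_time_factor(bounds, clip_seconds=CLIP_SECONDS):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return midpoint(bounds) / clip_seconds


def speedup(model):
    """How many times faster whisper-cpp is than openai-whisper for one model size."""
    slow = midpoint(TIMINGS["openai-whisper"][model])
    fast = midpoint(TIMINGS["whisper-cpp"][model])
    return slow / fast


for model in ("base", "tiny"):
    rtf = real_time_factor(TIMINGS["whisper-cpp"][model])
    print(f"{model}: speedup {speedup(model):.1f}x, whisper-cpp RTF {rtf:.2f}")
```

With these figures, only the Whisper-cpp tiny model has a real-time factor below 1.0 (0.75), meaning it transcribes faster than the clip plays.
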
[Figure: Whisper-cpp on Raspberry Pi 5]

Tasks for Next Week

Continue working on snips-nlu to include more intents.
Discuss whisper-cpp with the mentors.
Start working on the UI improvements for the Flutter app.

Conclusion

This week covered both dataset enhancements and an exploration of alternative Whisper AI implementations. The performance tests on Raspberry Pi 5 showed that Whisper-cpp cuts the transcription time for a 4-second clip from 12-22 seconds to 3-7 seconds, a valuable insight for optimizing the voice assistant's transcription speed.