Voice-controlled services for remote operation of DLNA/UPnP devices

The goal of this project was to gain experience with Google's voice recognition services. To that end, a use case was set up to control multimedia throughout the house with voice commands. The use case turned out to be successful, although there are a few drawbacks.

First of all, the system makes use of a standard media server such as Windows Media server. This server serves audio and video data. To control which data is played where, a Media Controller is needed (this is part of the UPnP specification). In this project it was built in Object Pascal. It controls all discovered media servers and media renderers (such as DLNA-enabled TVs). The code for this controller can be found on GitHub. The self-built Media Controller has extra options to make voice control possible: it can be controlled through a REST service that interprets short orders like ‘play’ and ‘pause’ and, based on longer commands, selects an item to be played on a media renderer (TV or audio device).
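The interpretation step described above can be sketched as follows. This is only a minimal illustration in Java, not the Object Pascal code of the actual controller: the class name `VoiceCommand`, the action names and the "play <title> on <renderer>" grammar are all assumptions made for the example.

```java
import java.util.Locale;

// Hypothetical sketch: turns a recognized phrase into a controller action.
// Action names and the "play <title> on <renderer>" grammar are assumptions,
// not the grammar of the real Media Controller.
final class VoiceCommand {
    final String action;   // "PLAY", "PAUSE", "SELECT" or "UNKNOWN"
    final String item;     // title to play, or null when not given
    final String renderer; // target renderer name, or null when not given

    VoiceCommand(String action, String item, String renderer) {
        this.action = action;
        this.item = item;
        this.renderer = renderer;
    }

    static VoiceCommand parse(String phrase) {
        // Normalize so that " Pause " and "pause" are treated alike.
        String text = phrase == null ? "" : phrase.trim().toLowerCase(Locale.ROOT);

        // Short orders map directly to transport actions.
        if (text.equals("play")) {
            return new VoiceCommand("PLAY", null, null);
        }
        if (text.equals("pause")) {
            return new VoiceCommand("PAUSE", null, null);
        }

        // Longer commands: "play <title>" or "play <title> on <renderer>".
        if (text.startsWith("play ")) {
            String rest = text.substring(5).trim();
            int on = rest.lastIndexOf(" on ");
            if (on > 0) {
                String title = rest.substring(0, on).trim();
                String target = rest.substring(on + 4).trim();
                return new VoiceCommand("SELECT", title, target);
            }
            return new VoiceCommand("SELECT", rest, null);
        }

        // Anything else is rejected rather than guessed at.
        return new VoiceCommand("UNKNOWN", null, null);
    }
}
```

With this sketch, `VoiceCommand.parse("play big buck bunny on living room tv")` yields the action `SELECT` with the item `big buck bunny` and the renderer `living room tv`, while an unrecognized phrase yields `UNKNOWN` so that nothing is started by accident.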

The Android application has built-in speech recognition, for which it communicates with Google servers. It uses the following classes from the android.speech package:

  • import android.speech.RecognitionListener;
  • import android.speech.RecognizerIntent;
  • import android.speech.SpeechRecognizer;

With speech recognition, the following application can be controlled by voice. It is just an example that shows the endless possibilities:


In practice, the speech engine turned out not to be as useful as expected. One major drawback is the way the Google engine works. It is not possible to upload an audio fragment together with suggestions of what it might contain. Instead, the engine just returns a list of terms it thinks the user said, which is quite error-prone. Another problem is that it is quite easy to forget that you are using speech to control your home: several times, devices just started playing or selected other movies. All in all, with some fine-tuning, this can be integrated into IoT applications.
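One possible piece of such fine-tuning is to check the returned list of terms against the small vocabulary the controller actually understands, and to ignore everything else. The sketch below assumes the candidates arrive as a list of strings ordered from most to least likely (on Android this would typically be the list stored under `SpeechRecognizer.RESULTS_RECOGNITION`). The class name `CandidateFilter` and the command words are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch: picks the first recognizer candidate that starts with
// a command the controller knows, and rejects everything else.
final class CandidateFilter {
    // Known command words, expected in lower case (an assumption of this sketch).
    private final List<String> knownCommands;

    CandidateFilter(List<String> knownCommands) {
        this.knownCommands = knownCommands;
    }

    // Returns the first normalized candidate that equals a known command or
    // begins with one followed by a space; empty if no candidate qualifies.
    Optional<String> firstKnown(List<String> candidates) {
        for (String candidate : candidates) {
            String text = candidate.trim().toLowerCase(Locale.ROOT);
            for (String command : knownCommands) {
                if (text.equals(command) || text.startsWith(command + " ")) {
                    return Optional.of(text);
                }
            }
        }
        return Optional.empty();
    }
}
```

For example, with the known commands `play` and `pause`, the candidate list `"clay"`, `"Play the movie"`, `"pause"` results in `play the movie`, and a list that contains no known command results in an empty value, so the controller does nothing. This does not replace uploading suggestions to the engine, but it keeps misheard phrases from triggering playback.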