Zoelen Media Center v3 VR part 1


ZMC v3
The VR system

(Warning: this article contains a lot of explanation, but it is essential for a better understanding of what is going on.)

For most people, voice recognition (VR) is the holy grail of home automation (domotica). However, it isn't that easy. Sure, you can get a good result when carrying a quality headset around, but that doesn't feel right. What we want is open-air speech and, if possible, something like what is used in Star Trek or other SF movies.

Now let me clear the fog ahead and skip that idea. With the present technology this is not feasible. You can get quite far, but forget about giving commands in a crowded and lively environment. And skip the whole idea that you can have a dialog with the computer to get an answer. Having said that, is VR still possible? I would say yes. But only if you accept its flaws, use a good setup, use the right equipment and are prepared to do testing, tinkering and experimenting for the next few months.

Tools
Besides a microphone and a mixer unit such as the AP/XAP800, you absolutely need this tool.
Tip #1
You need a female jack plug to connect your headphones to, and enough electrical wire to get from your AP/XAP800 unit to any location in your house that you want to control by voice. Solder the jack plug to the wire and attach the wire to the output of the unit that goes to the mic input on the PC. This way you can clearly hear, in real time, the result of your settings and of the position of the microphone. Walking around with a laptop and setting up a remote connection won't give you good results. It's essential that you listen directly to what's going into the mic input on the PC.

The microphone
The most obvious part is the microphone. I've tried many, many different microphones, ranging from condenser mics to intercoms, phones, headsets and baby monitors. They are all crap and a waste of funds that you could have better spent on a quality microphone in the first place. The most obvious location for the microphone is somewhere in the room out of sight, and you want to be able to speak to it from any other place in that same room. For this there are special microphones on the market called boundary microphones. I use four Samson CM11B omnidirectional boundary microphones. They aren't cheap, but they are really very good. The first time I used a boundary microphone I was surprised. I could hear a clock ticking in a cupboard in a different room! I never managed that with any other microphone.

Tip #2
Don't fix the microphone in place right away. First try the reception: move the mic around the room and try different orientations before attaching it. Also keep a fair distance from any speaker or any other source of noise.

The AP/XAP800 unit
The AP/XAP800 units can power all sorts of microphones via phantom power. This is a very handy feature. Just plug in the microphone, switch the power on via software and you are ready to go. If you are using only one microphone then things are rather easy and the results can be very good. But when you add another microphone in another room, things go downhill. For example, you have two rooms next to each other and each room has an active microphone installed. You give a command and both microphones will pick it up. Now where do you want to hear the response of the system? Obviously in the same room you are standing in. And it is a bit lame to have to say 'kitchen lights on' while you are in the kitchen. You just want to say 'lights on' and have them turned on in the kitchen. Now how is that possible?

The Gate
The AP/XAP800 unit has a feature called gating. You can put a threshold on the sound level it hears and use that to trigger the gate of the microphone. In more detail, the microphone listens all the time to the noise level and to what is said. This gives the system a baseline that moves up and down over time. If you put a threshold a few decibels (dB) above it, the threshold moves up and down with the baseline. If the sound level rises above this threshold due to a spoken command, the gate of the microphone will trigger and give a signal. Your configuration dictates the behavior of all this. It's possible to keep all mics silent until the threshold is passed, and anything in between. Each mic input has a gate, and when a gate fires you know which mic is active. From that you know which room the mic is in and where you must direct the response to, or that you mean the kitchen when you just say 'lights on'. I will describe the problems you get with this in a livelier environment and some possible countermeasures.
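The idea of a threshold that floats a few dB above a moving baseline can be sketched in a few lines of Python. This is only an illustration of the concept, not the algorithm the unit uses: the function name, the 6 dB margin, the smoothing factor and the choice to freeze the baseline while the gate is open are all assumptions.

```python
def gate_states(levels_db, margin_db=6.0, smoothing=0.05):
    """Return one boolean per input level: True where the gate would be open.

    levels_db : measured sound levels in dB, one per time step
    margin_db : how far above the baseline the threshold sits (assumed value)
    smoothing : how quickly the baseline follows the noise level (assumed value)
    """
    if not levels_db:
        return []
    baseline = levels_db[0]
    states = []
    for level in levels_db:
        is_open = level > baseline + margin_db
        states.append(is_open)
        if not is_open:
            # Assumption: the baseline only follows the level while the gate
            # is closed, so a spoken command does not raise the noise floor.
            baseline += smoothing * (level - baseline)
    return states


# Quiet room at -50 dB, a spoken command at -30 dB, then quiet again.
levels = [-50.0] * 20 + [-30.0] * 5 + [-50.0] * 20
states = gate_states(levels)
print(sum(states), "time steps with the gate open")
```

With a larger margin the gate becomes less sensitive, and with a smaller one it fires on background noise more often, which is the trade-off you tune on the real unit.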

"That's neat, so now I am done!"
Not really. There is an odd behavior with gating. The gate fires and then falls back to its previous state although you are still speaking. You need a way to catch the gating info and store it for further use. Ideally you want to have just one gate active, not two or more. You can ask the unit for the state of the gate via serial communication. The attention sentence is heard and your system triggers on it. It makes a call to the COM port to get the gate ID of the gate that's active. The unit picks this up, processes it, gets the gate info and sends it back. But sometimes your system has some delay, and then the command to get the gate info comes in too late and you won't get any gate info back. Or worse, you get back the info that another microphone is gated.

There are a few ways around this.
Tip #3
On the back of the unit there are two DB25 connectors that are used for I/O. Look in the manual for more info about these connectors. Eight of these pins report the state of the gates; however, they behave like momentary switches. They are on for a short time and then off again. You can connect an Arduino (or any other board) to these pins and handle the I/O from there. The Arduino can then push the info back to your system, where you handle it further.
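Because the pins only pulse briefly, whatever reads them has to remember the last pulse for a while. Here is a minimal sketch of such a latch. It is written in Python only to show the logic (on an Arduino you would port it to the board's own language), and the class name, the 5 second hold time and the "most recent pulse wins" rule are assumptions.

```python
import time


class GateLatch:
    """Remember the most recent gate pulse for a limited time."""

    def __init__(self, hold_seconds=5.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds  # assumed value, tune to taste
        self._clock = clock               # injectable so it can be tested
        self._gate = None
        self._stamp = None

    def pulse(self, gate_id):
        """Call this whenever a gate pin goes high."""
        self._gate = gate_id
        self._stamp = self._clock()

    def last_gate(self):
        """Return the latched gate id, or None if there is none or it is stale."""
        if self._stamp is None:
            return None
        if self._clock() - self._stamp > self.hold_seconds:
            return None
        return self._gate


latch = GateLatch()
latch.pulse(3)            # gate 3 fired for a moment
print(latch.last_gate())  # still available when the system asks shortly after
```

The hold time decides how late your system may ask and still get an answer. Too long, and a stale pulse from another room may be reported instead.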

Tip #4
I figured out this solution, which is more elegant. It uses the same DB25 connector on the back mentioned in tip #3. You can wire the gate output pin directly to a free input pin on the same connector. You can stick a piece of wire into the pins, or get a DB25 male connector, solder the wires to it and plug in the connector afterward (the best solution). For gate 1, connect pin 17 directly to pin 1 on connector A. Pin 1 is a control pin and its behavior can be programmed. Next, open the software for your unit (G-Ware for the XAP800) and go to the GPIO builder.
Configure pin 1 so that it sends a string to the RS232 port. This will actively push the gate info into your system. (Btw, the ZMC script handles this for you.) You can also configure the text that is sent back with the command string editor, but there is no real need for that.
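On the receiving side, your system only has to recognize the pushed strings and turn them into a gate number. The sketch below shows one way to do that in Python. The strings in `GATE_STRINGS` are purely hypothetical placeholders (the real text is whatever your unit sends or whatever you configured in the command string editor), and the lines can come from any serial reader.

```python
# Hypothetical strings: replace them with what your unit actually sends.
GATE_STRINGS = {"GATE1": 1, "GATE2": 2, "GATE3": 3, "GATE4": 4}


def gate_from_line(raw, table=GATE_STRINGS):
    """Translate one received line into a gate number, or None if unknown."""
    if isinstance(raw, bytes):
        text = raw.decode("ascii", errors="ignore")
    else:
        text = raw
    # Strip line endings and spaces, and ignore upper/lower case differences.
    return table.get(text.strip().upper())


def gates_from_stream(lines, table=GATE_STRINGS):
    """Yield a gate number for every recognized line, skipping the rest."""
    for raw in lines:
        gate = gate_from_line(raw, table)
        if gate is not None:
            yield gate


received = [b"GATE2\r\n", "noise", " gate4 "]
print(list(gates_from_stream(received)))
```

Unknown lines are skipped silently here. While you are still testing the wiring you may prefer to log them instead.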


The reason I am telling you this is that it offers you some solutions for a lively environment. If you live at home alone, you don't have to do all of this. But when there are multiple people in multiple rooms, or there is a lot of surrounding noise, it gets hard to determine where you are.

Tip #5
The gating problem gets even worse when you equip more rooms with mics. I have abandoned automatic gating altogether. I've put all mics in listening mode (open) and I allow a maximum of two mics to be active. This prevents me from losing all control over the system when my wife uses a hair dryer. You can overrule that with the 'chairman override' setting, but that is bound to a location and won't work if you are in another place. Or even worse, the loud noise is in the same room that has this override set. What I did is make four attention sentences that include the location, e.g. 'Attention in the kitchen'. Now the system knows where to put the focus and I don't have to repeat the location anymore. For me this is the best setup, but it may be different for you.
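To make the idea of location-aware attention sentences concrete, here is a small Python sketch. Only 'Attention in the kitchen' comes from my setup as described above. The other three rooms, the class name and the way sentences are normalized are assumptions for illustration.

```python
# One attention sentence per room. Only the kitchen one is from the article,
# the other rooms are hypothetical examples.
ATTENTION_PHRASES = {
    "attention in the kitchen": "kitchen",
    "attention in the living room": "living room",
    "attention in the bedroom": "bedroom",
    "attention in the office": "office",
}


def normalize(sentence):
    """Lower-case the sentence and collapse repeated whitespace."""
    return " ".join(sentence.lower().split())


class Focus:
    """Remember which room the last attention sentence pointed at."""

    def __init__(self):
        self.room = None

    def hear(self, sentence):
        """Return (room, command) for a command, or None for anything else."""
        text = normalize(sentence)
        room = ATTENTION_PHRASES.get(text)
        if room is not None:
            self.room = room  # attention sentence: just move the focus
            return None
        if self.room is None:
            return None       # no focus yet, so ignore the command
        return (self.room, text)


focus = Focus()
focus.hear("Attention in the kitchen")
print(focus.hear("lights on"))
```

Because the room travels with the command, the rest of the system never has to guess which microphone was the right one.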

Active noise control
The unit has another cool feature: it is capable of canceling out noise sources. It does so by subtracting one audio source from another. This concept may seem strange, but the Wikipedia article about active noise control may make it clearer. Your microphone will not only pick up your voice but also the music in the background. If you followed the ZMC v3 setup this far, you have your voice and your music in two separate channels in the unit. What you want is to subtract all the other sound channels that might be playing in the same room as the microphone from the signal coming from the microphone. If you do so, you will keep a clear signal with only your voice in it and nothing else. Even the loud music in the background will be gone. If you are using an AP800, you do this by sending all other audio to one output (an output that doesn't go anywhere) and using that output as the reference. When you have a XAP800, things are simpler. You can set up a virtual reference without losing an output. I only use one channel to block out the music. This shows the strength of having a processor in between. The virtual reference is connected to the AEC setting found under the microphone input settings.
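The subtraction idea can be shown with a toy example in Python. This is a deliberately simplified sketch: it assumes the music reaches the microphone as an exact scaled copy of the reference, without delay or room acoustics, and estimates that single scale factor by least squares. A real echo canceller such as the one in the unit has to adapt to delay and reverberation, so do not read this as how the AP/XAP800 works internally. All names are made up for the example.

```python
import math


def cancel_reference(mic, reference):
    """Subtract the best-fitting scaled copy of `reference` from `mic`."""
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return list(mic)  # silent reference: nothing to cancel
    # Least-squares estimate of how loudly the reference shows up in the mic.
    gain = sum(m * r for m, r in zip(mic, reference)) / energy
    return [m - gain * r for m, r in zip(mic, reference)]


n = 1000
voice = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]   # stand-in for speech
music = [math.sin(2 * math.pi * 50 * i / n) for i in range(n)]  # stand-in for music
mic = [v + 0.6 * m for v, m in zip(voice, music)]               # what the mic hears

cleaned = cancel_reference(mic, music)
worst = max(abs(c - v) for c, v in zip(cleaned, voice))
print("largest difference between cleaned signal and voice:", worst)
```

The example only works this cleanly because the two test tones do not overlap. It does show why the unit needs the music as a separate reference channel: without knowing what to subtract, there is nothing to cancel.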

Tip #6
Ideally you should feed in all audio sources that you want to cancel. If your TV interferes with your speech, you must feed the audio from your TV into the unit and cancel it out there. This will surely be the case when English is your native tongue. Don't be surprised when something said on television triggers your attention phrase and controls your lights.

Tip #7
Don't use a short attention phrase, as it is more likely to trigger your VR by accident.

Naming convention
VR requires you to think ahead when setting things up. For example, what will you use for device and event names? I want to speak Dutch to my system, but then it is weird to have events like 'Tell me the weather' or a device called 'living room couch lamp'.

Tip #8
You have to think about the naming scheme of your system to maximize the VR experience. On the status screen I use floor (location2 field) and room (location field), so I don't have to say the floor. The room I get from the context of the attention phrase or from the gating info. And I made sure that all devices in the same room have a distinct but logical name, something like 'main light' for the main lamp on the ceiling.
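A naming scheme like this makes the lookup from spoken command to device almost trivial. The Python sketch below illustrates it with a made-up device table. The field names mirror floor and room as described above, but the devices themselves, the function names and the matching rules are assumptions.

```python
from collections import Counter

# Hypothetical device table: floor, room and a name that is unique per room.
DEVICES = [
    {"floor": "ground floor", "room": "kitchen", "name": "main light"},
    {"floor": "ground floor", "room": "living room", "name": "main light"},
    {"floor": "ground floor", "room": "living room", "name": "couch lamp"},
]


def find_device(room, spoken_name, devices=DEVICES):
    """Return the one device with this name in this room, or None."""
    wanted = " ".join(spoken_name.lower().split())
    matches = [d for d in devices if d["room"] == room and d["name"] == wanted]
    return matches[0] if len(matches) == 1 else None


def duplicate_names(devices=DEVICES):
    """Return the (room, name) pairs that occur more than once."""
    counts = Counter((d["room"], d["name"]) for d in devices)
    return sorted(pair for pair, count in counts.items() if count > 1)


print(find_device("kitchen", "Main light"))
print(duplicate_names())
```

The same name may exist in two rooms because the room always comes from the context. Running a check like `duplicate_names` after adding devices catches names that would be ambiguous within one room.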

Voice recognition software
You need something that reacts to an attention phrase and to the command(s) that follow. This could be the Speaker Client from HomeSeer or any other system. The HS Speaker Client actually does a good job, but the voice recognition itself is done by the operating system that the domotica system runs on. I use Windows 7, which has a very capable VR system. But (there is always a but for me, it seems) I am a Dutch speaker and my wife prefers to give voice commands in Dutch. This is a problem for me since Windows 7 doesn't support Dutch VR. There is, however, a VR SDK with Dutch grammar. So I programmed my own VR software to do exactly what I want, keeping in mind that in the future I might switch to something else. My software listens for the attention phrase, and when it is heard it turns a HS device called 'VR active' ON and stores the gate number of the microphone that picked up the attention phrase in another device. A trigger in HomeSeer fires when the device 'VR Active' is ON. The event then starts AZ_ZMC3.vb("VRStart"), which sets the whole system in VR mode. Another event triggers when the device 'VR Active' goes OFF again and kicks the script with AZ_ZMC3.vb("VREnd") to revert the system to its previous state. Knowing this, you can use any kind of VR software that can set HS devices, either directly or by running a script that does it for you. There are multiple ways to accomplish this with third-party VR software.
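The flow described above can be summarized in a short Python sketch. It does not talk to HomeSeer at all: `FakeHomeSeer` is a stand-in that only imitates the two events, and the device name 'VR gate' is a hypothetical name for the device that stores the gate number. Only 'VR active' and the AZ_ZMC3.vb calls with "VRStart" and "VREnd" are taken from the text.

```python
class FakeHomeSeer:
    """Stand-in for the real system: a device table plus two imitated events."""

    def __init__(self):
        self.devices = {"VR active": "OFF", "VR gate": None}
        self.script_calls = []  # records which script entry points would run

    def set_device(self, name, value):
        previous = self.devices.get(name)
        self.devices[name] = value
        # Imitate the two events that trigger on 'VR active' changing state.
        if name == "VR active" and value != previous:
            entry = "VRStart" if value == "ON" else "VREnd"
            self.script_calls.append(("AZ_ZMC3.vb", entry))


def on_attention(hs, gate_number):
    """What the VR software does when it hears the attention phrase."""
    # Store the gate first so it is already there when VR mode starts.
    hs.set_device("VR gate", gate_number)
    hs.set_device("VR active", "ON")


def on_session_end(hs):
    """What the VR software does when the command session is over."""
    hs.set_device("VR active", "OFF")


hs = FakeHomeSeer()
on_attention(hs, gate_number=2)
on_session_end(hs)
print(hs.script_calls)
```

Any third-party VR software that can perform these two device changes, directly or through a script, fits into the same flow.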

System testing
This is how you test your setup. First mute all microphones in use. Turn off the music. Wire your headset to the output of the unit. The output meant here is the one that goes to the mic input on your PC. Now stand in the room you want to test and listen. It should be silent.

Now unmute the microphone and listen to the sounds you hear. Tune your system so that it gives a clear sound of your voice. That is what you want: a clear sound that doesn't clip. Your voice doesn't have to be very loud, but it must be clear. (Have I stressed this enough to be noticed?) Move the microphone around and speak to it from different locations. The microphone should be close to the location that you will be using most in that room.

Next turn on the music. You should hear the music and your voice at the same time. Listen carefully to make sure you are not hearing feedback from the speaker to your microphone. Adjust your settings to compensate, move the microphone around and speak from different angles to get the best results. Accept the fact that you probably won't be able to get it right for every situation, but try to get the best out of it.

Now, to get rid of the background music, press the AEC button found under the mic input you are using and select the virtual ref. 1 reference you have set up (described in the next article) in the PA Adapt and AEC Reference pull-down box. The background music must now disappear completely or nearly completely. Again check that everything still sounds clear.

Now move to the next microphone and repeat adjusting the settings. When this second microphone sounds good, turn both microphones on and check that they don't interfere. Repeat all of this over and over again till you get it right.
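Listening remains the real test, but if you also record a bit of what arrives at the mic input, a few lines of Python can tell you whether the signal clips. This is an optional aid and only a sketch: it assumes the samples are already available as floats between -1.0 and 1.0, and the function names and the 0.999 limit are made up.

```python
import math


def peak_dbfs(samples):
    """Return the peak level in dB relative to full scale (0 dBFS = 1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")  # pure silence
    return 20.0 * math.log10(peak)


def clipped_fraction(samples, limit=0.999):
    """Return the fraction of samples at or beyond the clipping limit."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


recording = [0.0, 0.5, -0.25, 0.1]
print(round(peak_dbfs(recording), 1), "dBFS peak")
print(clipped_fraction(recording), "of the samples clipped")
```

A peak that sits well below 0 dBFS with no clipped samples matches what you are after: a voice that is clear rather than loud.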

Tip #9
Remember that the AP/XAP800 unit is a dynamic unit. Everything floats around levels and nearly nothing is fixed. This also means that the right threshold in the morning is different from the right threshold in the evening. Although the threshold is still x dB above the noise level, the system can be more responsive after a silent night, when the system baseline is at a low noise level. After a few minutes this baseline has changed again. It is possible to set the threshold at a fixed level so you won't get this behavior.

The last article describes how I run this system: ZMC VR v3 Part 2