I am trying to implement automatic voice recording functionality, similar to the Talking Tom app. I use the following code to read input from the audio recorder and analyse the
One way to process the input is to run it through a specialised noise-removal library or tool.
For example, Audacity (http://audacity.sourceforge.net) does noise removal.
Provided you have characterised the main types of noise in the recording, removing them should leave mostly speech.
It would also be worthwhile to record a short sample just before the user's capture starts and another just after it ends, as these give you samples of the noise in the environment at the time of recording. This is useful if each user faces unique background noise challenges.
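
As a rough illustration of using the before and after samples, here is a minimal sketch in Python. It assumes the audio is already available as a list of signed PCM sample values. The function names (`rms`, `noise_floor`, `speech_frames`) and the `frame_size` and `margin` parameters are made up for this example, and the approach is a simple level threshold, not the noise removal that Audacity performs.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_floor(pre_capture, post_capture):
    """Estimate ambient noise as the louder of the two surrounding samples.

    Taking the maximum is an assumption; an average would also be reasonable.
    """
    return max(rms(pre_capture), rms(post_capture))


def speech_frames(samples, floor, frame_size=160, margin=2.0):
    """Return the indices of frames whose RMS level exceeds margin * floor.

    frame_size and margin are hypothetical tuning values, not recommendations.
    """
    threshold = floor * margin
    hits = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if rms(frame) > threshold:
            hits.append(start // frame_size)
    return hits
```

You would call `noise_floor` with the two ambient recordings, then pass the result to `speech_frames` along with the user's capture. Frames that are not returned can be treated as background noise for that particular user's environment.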