I am developing an application that should display caption text matching what is heard through the PC microphone.
It works partially, because the text is returned only wh