Speech Recognition in Python using Google Speech API

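Speech recognition is an important feature in applications such as home automation and artificial intelligence. This article introduces how to use Python's SpeechRecognition library. This is useful because, with the help of an external microphone, it can run on single-board computers such as the Raspberry Pi.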


Required Installations

The following must be installed:

Python Speech Recognition module:
```
 sudo pip install SpeechRecognition 
```
PyAudio: Linux users can use the following command
```
sudo apt-get install python-pyaudio python3-pyaudio
```
If the version in the repositories is too old, install pyaudio using the following commands
```
sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && 
sudo pip install pyaudio
```
Use pip3 instead of pip for Python 3. Windows users can install pyaudio by running the following command in a terminal
```
pip install pyaudio
```
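
Before going further, it can help to confirm that both packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this illustration, and it assumes the packages install under the import names `speech_recognition` and `pyaudio`.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: SpeechRecognition -> speech_recognition,
    # PyAudio -> pyaudio.
    missing = missing_modules(["speech_recognition", "pyaudio"])
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

If a module is reported missing, rerun the matching installation command above with the pip that belongs to the Python interpreter you are using.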

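Speech Input Using a Microphone and Translation of Speech to Text

Configure the microphone (for external microphones): It is advisable to specify the microphone in the program to avoid any glitches. Type lsusb in the terminal. A list of connected devices will be shown. The microphone name would look like this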
```
USB Device 0x46d:0x825: Audio (hw:1, 0)
```
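Note this down, as it will be used in the program. The name must match one of the entries returned by sr.Microphone.list_microphone_names(), which the program uses to look up the device.
Set chunk size: This specifies how much audio data is read at a time. The value is usually a power of 2, such as 1024 or 2048.
Set sampling rate: The sampling rate defines how often values are recorded for processing.
Set the device ID to the selected microphone: In this step, we specify the device ID of the microphone we want to use, to avoid ambiguity when there are multiple microphones. This also helps with debugging, since running the program tells us whether the specified microphone was recognized. In the program, we specify a parameter device_id. If the microphone is not recognized, the program reports that device_id could not be found.
Allow adjusting for ambient noise: Since the surrounding noise varies, we must give the program a second or more to adjust the energy threshold of recording so that it matches the external noise level.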
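
The device lookup described above can be sketched as a small function. This is an illustration rather than part of the original program: `find_device_index` and `list_microphones` are hypothetical helper names, and the first entry in the example list is a made-up device name. Unlike a bare loop, the function returns None when the microphone is not found, so the caller can report the problem explicitly.

```python
def find_device_index(names, wanted):
    """Return the index of the first entry in `names` equal to `wanted`, or None."""
    for index, name in enumerate(names):
        if name == wanted:
            return index
    return None


def list_microphones():
    """Return the microphone names known to SpeechRecognition."""
    # Imported lazily so find_device_index works without the audio libraries.
    import speech_recognition as sr
    return sr.Microphone.list_microphone_names()


# A hard-coded list standing in for the output of list_microphones().
# The first name is hypothetical; the second is the example name from above.
example_names = [
    "Built-in Audio Analog (hw:0, 0)",
    "USB Device 0x46d:0x825: Audio (hw:1, 0)",
]
print(find_device_index(example_names, "USB Device 0x46d:0x825: Audio (hw:1, 0)"))  # prints 1
```

On a machine with a microphone attached, passing `list_microphones()` instead of `example_names` should give the index to use as `device_index`.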

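Speech to text translation: This is done with the help of Google Speech Recognition, which requires an active internet connection. There are offline recognition systems such as PocketSphinx, but they have a rigorous installation process that requires several dependencies. Google Speech Recognition is one of the easiest to use.

The above steps have been implemented below: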

    # Python 3 program for Speech Recognition

    import speech_recognition as sr

    # Enter the name of the USB microphone that you found using lsusb.
    # The following name is only used as an example.
    mic_name = "USB Device 0x46d:0x825: Audio (hw:1, 0)"
    # Sample rate is how often values are recorded.
    sample_rate = 48000
    # Chunk is like a buffer. It stores 2048 samples here.
    # It is advisable to use powers of 2 such as 1024 or 2048.
    chunk_size = 2048
    # Initialize the recognizer.
    r = sr.Recognizer()

    # Generate a list of all audio cards/microphones.
    mic_list = sr.Microphone.list_microphone_names()

    # The following loop sets the device ID of the mic that we
    # specifically want to use, to avoid ambiguity.
    for i, microphone_name in enumerate(mic_list):
        if microphone_name == mic_name:
            device_id = i

    # Use the microphone as the source for input. We also specify which
    # device ID to look for. If the microphone was not found, an error
    # is raised saying that device_id is not defined.
    with sr.Microphone(device_index=device_id, sample_rate=sample_rate,
                       chunk_size=chunk_size) as source:
        # Wait for a second to let the recognizer adjust the
        # energy threshold based on the surrounding noise level.
        r.adjust_for_ambient_noise(source)
        print("Say Something")
        # Listen for the user's input.
        audio = r.listen(source)

        try:
            text = r.recognize_google(audio)
            print("you said: " + text)

        # This error occurs when Google could not understand what was said.
        except sr.UnknownValueError:
            print("Google Speech Recognition could not understand audio")

        except sr.RequestError as e:
            print("Could not request results from Google "
                  "Speech Recognition service; {0}".format(e))
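
To get a feel for the sample rate and chunk size used above, the arithmetic can be worked through in a few lines. This sketch assumes that chunk_size counts samples (frames) and that the microphone delivers 16-bit mono audio, that is, 2 bytes per sample. Both are assumptions about the audio backend, not statements from the article, and the function names are made up for the illustration.

```python
def chunk_duration_ms(chunk_size, sample_rate):
    """Return how many milliseconds of audio one chunk holds."""
    return 1000.0 * chunk_size / sample_rate


def chunk_bytes(chunk_size, sample_width=2, channels=1):
    """Return the size of one chunk in bytes (assumes 16-bit mono by default)."""
    return chunk_size * sample_width * channels


print(round(chunk_duration_ms(2048, 48000), 2))  # prints 42.67
print(chunk_bytes(2048))  # prints 4096
```

Under these assumptions, each 2048-sample chunk at 48000 Hz covers roughly 43 ms of audio.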

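Transcribing an Audio File to Text

If we have an audio file that we want to translate to text, we simply replace the source with the audio file instead of a microphone. For convenience, place the audio file and the program in the same folder. This works for WAV, AIFF and FLAC files. An implementation is shown below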

    # Python 3 program to transcribe an audio file
    import speech_recognition as sr

    AUDIO_FILE = "example.wav"

    # Use the audio file as the audio source.

    r = sr.Recognizer()

    with sr.AudioFile(AUDIO_FILE) as source:
        # Read the audio file. Here we use record instead of listen.
        audio = r.record(source)

    try:
        print("The audio file contains: " + r.recognize_google(audio))

    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")

    except sr.RequestError as e:
        print("Could not request results from Google Speech "
              "Recognition service; {0}".format(e))
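
Before sending a WAV file for transcription, it can be useful to check that it opens correctly and to see how long it is. The following is a minimal sketch using the standard library `wave` module; the helper name `wav_info` is hypothetical, and it handles only WAV files, not AIFF or FLAC.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate, channels, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        duration = wav_file.getnframes() / float(sample_rate)
    return sample_rate, channels, duration


if __name__ == "__main__":
    # "example.wav" is the file name used in the program above.
    if os.path.exists("example.wav"):
        print(wav_info("example.wav"))
    else:
        print("example.wav was not found in the current folder")
```

A file that fails to open here, or that reports a duration of zero, is unlikely to transcribe successfully.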

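Troubleshooting

The following problems are commonly encountered

Muted microphone: This leads to input not being received. To check for this, you can use alsamixer. It can be installed using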
```
sudo apt-get install libasound2 alsa-utils alsa-oss
```
Type amixer. The output will look somewhat like this
```
Simple mixer control 'Master', 0
  Capabilities: pvolume pswitch pswitch-joined
  Playback channels: Front Left - Front Right
  Limits: Playback 0 - 65536
  Mono:
  Front Left: Playback 41855 [64%] [on]
  Front Right: Playback 65536 [100%] [on]
Simple mixer control 'Capture', 0
  Capabilities: cvolume cswitch cswitch-joined
  Capture channels: Front Left - Front Right
  Limits: Capture 0 - 65536
  Front Left: Capture 0 [0%] [off] #switched off
  Front Right: Capture 0 [0%] [off]
```
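As you can see, the capture device is currently switched off. To switch it on, type alsamixer. The first screen shows the playback devices. Press F4 to toggle to the capture devices.

In the capture view, the highlighted entry shows that the capture device is muted. To unmute it, press the space bar.

Once it is unmuted, the highlighted entry confirms that the capture device is no longer muted.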
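
The same check can be done from Python by parsing text in the amixer format shown above. This is a rough sketch, not a robust parser: the function name `capture_switch_states` is made up, and it assumes the control is named 'Capture' and that each channel line ends with [on] or [off], as in the sample output.

```python
import re

# A shortened copy of the amixer output shown above.
SAMPLE = """\
Simple mixer control 'Master', 0
  Front Left: Playback 41855 [64%] [on]
  Front Right: Playback 65536 [100%] [on]
Simple mixer control 'Capture', 0
  Limits: Capture 0 - 65536
  Front Left: Capture 0 [0%] [off]
  Front Right: Capture 0 [0%] [off]
"""


def capture_switch_states(amixer_output):
    """Return {channel: is_on} for the 'Capture' control in amixer output."""
    states = {}
    in_capture = False
    for line in amixer_output.splitlines():
        if line.startswith("Simple mixer control"):
            # Track whether we are inside the 'Capture' control section.
            in_capture = "'Capture'" in line
            continue
        if in_capture:
            match = re.match(r"\s*(.+?): Capture .*\[(on|off)\]", line)
            if match:
                states[match.group(1)] = (match.group(2) == "on")
    return states


print(capture_switch_states(SAMPLE))
# prints {'Front Left': False, 'Front Right': False}
```

To run it against a live system, the text could be obtained with something like `subprocess.run(["amixer"], capture_output=True, text=True).stdout`, assuming amixer is installed.
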
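Current microphone not selected as the capture device: In this case, the microphone can be set by typing alsamixer and selecting the sound card (F6). A selection screen lists the available sound cards, and from there you can choose the default microphone device.
No internet connection: Speech-to-text conversion requires an active internet connection.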
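
A quick connectivity check can rule out the last problem before debugging the audio setup. The following is a minimal sketch using only the standard library; the function name `can_reach` and the default host and port are assumptions chosen for illustration, not values taken from the SpeechRecognition library.

```python
import socket


def can_reach(host="www.google.com", port=443, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Calling `can_reach()` before `r.recognize_google(audio)` gives a clearer message than waiting for a RequestError, although a successful TCP connection does not guarantee that the speech service itself is available.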

This article is contributed by Deepak Srivatsav.
