From audio to tensor, back to audio in tensorflow

Submitted by 核能气质少年 on 2021-01-21 12:04:26

Question


Is there any way to load an audio file (WAV) directly into a tensor in TensorFlow, and then convert the tensor back into an audio file? I have seen people transform audio into spectrograms, but I couldn't find anyone who converts a spectrogram back to audio.


Answer 1:


TensorFlow 1.x:

The tf.contrib.ffmpeg.decode_audio() op can load audio data (including WAV) into a tensor, and the tf.contrib.ffmpeg.encode_audio() op can convert it back into audio data.

import tensorflow as tf  # TensorFlow 1.x

input_filename = tf.placeholder(tf.string, shape=[])
output_filename = tf.placeholder(tf.string, shape=[])

input_signal = tf.contrib.ffmpeg.decode_audio(
    tf.read_file(input_filename), file_format="wav",
    samples_per_second=44100, channel_count=2)

# ...

output_signal = ...  # A 2-D tensor, [samples x channels]
encoded_audio_data = tf.contrib.ffmpeg.encode_audio(
    output_signal, file_format="wav", samples_per_second=44100)

write_file_op = tf.write_file(output_filename, encoded_audio_data)

with tf.Session() as sess:
  sess.run(write_file_op, {input_filename: "input.wav",
                           output_filename: "output.wav"})

TensorFlow 2.x:

The tf.contrib module was removed in TensorFlow 2.x, but you can still load and save audio files in 16-bit PCM WAV format using eager execution and tf.audio:

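import tensorflow as tf

# decode_wav expects the file contents, not a path, so read the file first.
# It returns a tuple of Tensor objects (audio, sample_rate).
input_signal = tf.audio.decode_wav(tf.io.read_file("input.wav"))

# Returns a Tensor of type string holding the encoded WAV data.
output_signal = tf.audio.encode_wav(input_signal[0], input_signal[1])
tf.io.write_file("output.wav", output_signal)  # save the encoded data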
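
To check the tf.audio round trip without needing an existing recording, a minimal sketch along the following lines may help. It assumes TensorFlow 2.x with eager execution. The helper name wav_round_trip, the 440 Hz test tone, and the 16 kHz sample rate are illustrative choices and not part of the original answer. Note that tf.audio.decode_wav takes the file contents (a string tensor), so the file is read with tf.io.read_file first.

```python
import math
import os
import tempfile

import tensorflow as tf


def wav_round_trip(path, sample_rate=16000, seconds=1.0, freq_hz=440.0):
    """Hypothetical helper: write a sine tone to `path` as WAV and read it back."""
    n = int(sample_rate * seconds)
    t = tf.range(n, dtype=tf.float32) / float(sample_rate)

    # encode_wav expects float32 audio in [-1.0, 1.0], shaped [samples, channels].
    audio = tf.reshape(0.5 * tf.sin(2.0 * math.pi * freq_hz * t), [n, 1])

    # Tensor -> 16-bit PCM WAV data (a string tensor) -> file on disk.
    tf.io.write_file(path, tf.audio.encode_wav(audio, sample_rate))

    # File on disk -> string tensor -> (audio, sample_rate) tensors.
    decoded, decoded_rate = tf.audio.decode_wav(tf.io.read_file(path))
    return audio, decoded, decoded_rate


if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    original, decoded, rate = wav_round_trip(out_path)
    print("sample rate:", int(rate))
    print("decoded shape:", tuple(decoded.shape))
    print("max abs error:", float(tf.reduce_max(tf.abs(original - decoded))))
```

The decoded tensor will not match the original exactly, because the samples are quantized to 16 bits on the way out, but the difference should be very small.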


Source: https://stackoverflow.com/questions/48675097/from-audio-to-tensor-back-to-audio-in-tensorflow
