boto3 S3 Object Parsing

旧街凉风 提交于 2019-12-08 10:50:33

问题


I'm trying to write a Python script for processing audio data stored on S3.

I have an S3 object which I'm calling using

def grabAudio(filename, directory):

     obj = s3client.get_object(Bucket=bucketname, Key=directory+'/'+filename)

return obj['Body'].read()

Accessing the data using

print(obj['Body'].read())

yields the correct audio information. So its accessing the data from the bucket just fine.

When I try to then use this data in my audio processing library (pydub), it fails:

audio = AudioSegment.from_wav(grabAudio(filename, bucketname))

Traceback (most recent call last): File "split_audio.py", line 38, in <module> audio = AudioSegment.from_wav(grabAudio(filename, bucketname)) File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 544, in from_wav return cls.from_file(file, 'wav', parameters) File "C:\Users\jmk_m\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pydub\audio_segment.py", line 456, in from_file file.seek(0) AttributeError: 'bytes' object has no attribute 'seek'

What is the format of the object coming in from s3? Byte array I presume? If so, is there a way of parsing it into a .wav format without having to save to disk? I'm trying to refrain from saving to disk.

Also open to alternative audio processing libraries.


回答1:


Thanks to Linas for linking a similar issue, and Jiaaro for the answer.

 import io
    s = io.BytesIO(y['data'])
    AudioSegment.from_file(s).export(x, format='mp3')

Allows me to pull directly from the bucket into memory with

obj = s3client.get_object(Bucket=bucketname, Key=customername+'/'+filename)

data = io.BytesIO(obj['Body'].read())
audio = AudioSegment.from_file(data)


来源:https://stackoverflow.com/questions/48997852/boto3-s3-object-parsing

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!