Question:
I have a list of floating-point values in Python:
floats = [3.14, 2.7, 0.0, -1.0, 1.1]
I would like to write these values out to a binary file using IEEE 32-bit encoding. What is the best way to do this in Python? My list actually contains about 200 MB of data, so something "not too slow" would be best.
Since there are 5 values, I just want a 20-byte file as output.
Answer 1:
Alex is absolutely right; it's more efficient to do it this way:

from array import array

output_file = open('file', 'wb')
float_array = array('f', [3.14, 2.7, 0.0, -1.0, 1.1])  # 'f' = IEEE 32-bit floats, as the question asks
float_array.tofile(output_file)
output_file.close()
And then read the array back like this:

input_file = open('file', 'rb')  # binary mode
float_array = array('f')
float_array.fromstring(input_file.read())  # frombytes() on Python 3
array.array objects also have a .fromfile method which can be used for reading the file, if you know the count of items in advance (e.g. from the file size, or some other mechanism).
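For example, a minimal sketch of reading back with .fromfile, assuming the item count is derived from the file size (file name and variable names are placeholders):

import os
from array import array

input_file = open('file', 'rb')
float_array = array('f')                                  # 4 bytes per item
count = os.path.getsize('file') // float_array.itemsize   # number of floats in the file
float_array.fromfile(input_file, count)                   # read exactly `count` items
input_file.close()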
Answer 2:
See: Python's struct module
import struct

s = struct.pack('f' * len(floats), *floats)
f = open('file', 'wb')
f.write(s)
f.close()
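Reading the data back is the mirror image, using struct.unpack; a minimal sketch, assuming the whole file fits in memory:

import struct

with open('file', 'rb') as f:
    data = f.read()
num_floats = len(data) // 4                        # 4 bytes per 32-bit float
floats = list(struct.unpack('f' * num_floats, data))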
Answer 3:
The array module in the standard library may be more suitable for this task than the struct module, which everybody is suggesting. Performance with 200 MB of data should be substantially better with array.
If you'd like to look at a variety of options, try profiling on your system with something like the sketch below.
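A rough timing sketch in that spirit (my own code, not the original snippet), comparing the array and struct approaches with timeit; N and the file names are placeholders to adjust toward your real data size:

import struct
import timeit
from array import array

N = 1000000
floats = [3.14] * N

def write_with_array():
    with open('out_array.bin', 'wb') as f:
        array('f', floats).tofile(f)

def write_with_struct():
    with open('out_struct.bin', 'wb') as f:
        f.write(struct.pack('%df' % len(floats), *floats))

print('array :', timeit.timeit(write_with_array, number=5))
print('struct:', timeit.timeit(write_with_struct, number=5))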
Answer 4:
I'm not sure how NumPy will compare performance-wise for your application, but it may be worth investigating.
Using NumPy:
from numpy import array

a = array(floats, 'float32')        # build a float32 array from the list
output_file = open('file', 'wb')
a.tofile(output_file)
output_file.close()
This results in a 20-byte file as well.
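Reading it back is symmetric; a minimal sketch using numpy.fromfile (the file name is a placeholder):

import numpy as np

with open('file', 'rb') as f:
    a = np.fromfile(f, dtype='float32')   # reads all 32-bit floats in the file
print(a)                                  # the five original values, now as float32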
Answer 7:
I ran into a similar issue while inadvertently writing a 100+ GB csv file. The answers here were extremely helpful but, to get to the bottom of it, I profiled all of the solutions mentioned and then some. All profiling runs were done on a 2014 MacBook Pro with an SSD, using Python 2.7. From what I'm seeing, the struct approach is definitely the fastest from a performance point of view:
6.465 seconds  print_approach   print list of floats
4.621 seconds  csv_approach     write csv file
4.819 seconds  csvgz_approach   compress csv output using gzip
0.374 seconds  array_approach   array.array.tofile
0.238 seconds  numpy_approach   numpy.array.tofile
0.178 seconds  struct_approach  struct.pack method
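For data in the 100+ GB range, packing everything in a single struct.pack call may not be practical memory-wise; a minimal sketch of my own (function name, chunk size and file name are illustrative) that writes the fastest approach in fixed-size chunks:

import struct

CHUNK = 1 << 16                                   # floats per write

def write_floats_chunked(path, floats):
    # Write a list of floats as 32-bit IEEE values, one chunk at a time.
    with open(path, 'wb') as f:
        for i in range(0, len(floats), CHUNK):
            chunk = floats[i:i + CHUNK]
            f.write(struct.pack('%df' % len(chunk), *chunk))

write_floats_chunked('file', [3.14, 2.7, 0.0, -1.0, 1.1])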