Question
I want to read a CSV file into a list in an Apache Beam application, where each element in the list is a tuple or a list (it doesn't really matter which), so that I would have the CSV
1,2,3
4,5,6
become
[(1,2,3) , (4,5,6)]
or
[ [1,2,3], [4,5,6] ]
I tried following the instructions in "How to convert csv into a dictionary in apache beam dataflow", but when I try to use
from beam_utils.sources import CsvFileSource
I get
from beam_utils.sources import CsvFileSource
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/beam_utils/sources.py", line 9, in <module>
    from apache_beam.io import fileio
ImportError: cannot import name fileio
If I try to import fileio directly with
from apache_beam.io import fileio
I get the same error. However, I can import both of these
import apache_beam.io
import beam_utils
without any issues. Does anyone know what the issue might be, or have a good idea of how I could do this a different way?
I currently have
import apache_beam as beam
from apache_beam.io import ReadFromText
with beam.Pipeline(options=pipeline_options) as p:
    csvfile = p | ReadFromText(known_args.input)
so if I can turn csvfile into the desired format another way, that works too.
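One way to get from csvfile to the desired format without beam_utils is to parse each line with Python's standard csv module and apply that parser per element with beam.Map. This is a minimal sketch, assuming every field is an integer as in the 1,2,3 / 4,5,6 example; parse_csv_line is a hypothetical helper name. Note that inside a pipeline the result is a PCollection of tuples, not an in-memory Python list.

```python
import csv


def parse_csv_line(line):
    """Parse one CSV line (as emitted by ReadFromText) into a tuple of ints.

    Assumption: every field is an integer, as in the 1,2,3 / 4,5,6 example.
    Drop the int() conversion to keep the fields as strings.
    """
    # csv.reader accepts any iterable of strings, so wrapping the single
    # line in a list lets it handle quoting for that one record.
    fields = next(csv.reader([line]))
    return tuple(int(field) for field in fields)


# Hypothetical use inside the pipeline from the question:
#   rows = csvfile | beam.Map(parse_csv_line)
# Each element of rows would then be a tuple such as (1, 2, 3).

if __name__ == "__main__":
    sample = ["1,2,3", "4,5,6"]
    print([parse_csv_line(line) for line in sample])  # [(1, 2, 3), (4, 5, 6)]
```

If the file has a header row, ReadFromText's skip_header_lines argument can drop it before parsing. Using csv.reader rather than line.split(",") also copes with quoted fields, though records containing newlines inside quotes will still break, because ReadFromText splits the file on newlines first.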
Answer 1:
Just ran into this same problem a few minutes ago. The issue is that fileio is apparently no longer in apache_beam (at least not in the version I had installed). It appears to have been replaced by filesystem.
Not a great solution, but in sources.py from beam_utils I replaced all instances of "fileio" with "filesystem".
So
from apache_beam.io import fileio
becomes
from apache_beam.io import filesystem
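If you would rather not edit every occurrence, a variant of the same fix is to bind the name fileio once, at the top of sources.py, to whichever module the installed Beam provides. This is a sketch under assumptions: import_first is a hypothetical helper, and it only behaves like the textual replacement above if the rest of sources.py uses attributes that exist in apache_beam.io.filesystem.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that exists.

    Raises ImportError if none of the candidates can be imported.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not available in this installation; try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(module_names))


# Hypothetical replacement for line 9 of beam_utils/sources.py, so the rest
# of the file can keep referring to the name fileio:
#   fileio = import_first("apache_beam.io.filesystem", "apache_beam.io.fileio")
```

filesystem is listed first on purpose: later Beam releases reintroduced an apache_beam.io.fileio module with a different API (MatchFiles, ReadMatches and so on), so trying fileio first could pick up a module that beam_utils was never written against.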
Source: https://stackoverflow.com/questions/46787428/python-from-apache-beam-io-import-fileio-gives-error-cannot-import-name-filei