AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis', using pandas eval


Question


I have a series of the form:

s

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object

Note that its elements are strings:

s[0]
'[133, 115, 3, 1]'

I'm trying to use pd.eval to parse this string into a column of lists. This works for this sample data.

pd.eval(s)

array([[133, 115, 3, 1],
       [114, 115, 2, 3],
       [51, 59, 1, 1]], dtype=object)

However, on much larger data (order of 10K), this fails miserably!

len(s)
300000

pd.eval(s)
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

What am I missing here? Is there something wrong with the function or my data?


Answer 1:


TL;DR
As of v0.21, this is a bug, and an open issue on GitHub. See GH16289.


Why am I getting this error?
This is (in all probability) pd.eval's fault: it cannot parse a Series with more than 100 rows. Here's an example.

len(s)
300000

pd.eval(s.head(100))  # returns a parsed result

Whereas,

pd.eval(s.head(101))
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

This issue persists, regardless of the parser or the engine.
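
For reference, here is a minimal reproduction sketch. It assumes an affected pandas version (around 0.21, while GH16289 is still open); later releases may behave differently.

import pandas as pd

# Build Series of stringified lists; on affected versions, anything
# longer than 100 rows triggers the AttributeError.
s_ok  = pd.Series(['[1, 2, 3]'] * 100)
s_bad = pd.Series(['[1, 2, 3]'] * 101)

pd.eval(s_ok)    # parses into an object array
pd.eval(s_bad)   # AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'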


What does this error mean?
When a Series with more than 100 rows is passed, pd.eval operates on the __repr__ of the Series, rather than the objects contained within it (which is the cause of this bug). The __repr__ truncates rows, replacing them with a ... (ellipsis). This ellipsis is misinterpreted by the engine as an Ellipsis object -

...
Ellipsis

pd.eval('...')
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_Ellipsis'

Which is exactly the cause for this error.
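
You can verify the truncation yourself; a small sketch (the exact cutoff depends on your pandas display options, notably display.max_rows):

import pandas as pd

s_big = pd.Series(['[1, 2, 3]'] * 101)
print('...' in repr(s_big))   # True - the repr elides the middle rows with an ellipsis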


What can I do to make this work?
Right now, there isn't a solution (the issue is still open as of 12/28/2017); however, there are a couple of workarounds.

Option 1
ast.literal_eval
This option should work out of the box if you can guarantee that you do not have any malformed strings.

from ast import literal_eval

s.apply(literal_eval)

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object 

If there is a possibility of malformed data, you'll need to write a little error handling code. You can do that with a function -

import numpy as np  # needed for the np.nan placeholder

def safe_parse(x):
    try:
        return literal_eval(x)
    except (SyntaxError, ValueError):
        return np.nan  # replace with any suitable placeholder value

Pass this function to apply -

s.apply(safe_parse)

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object

ast works for any number of rows; it is slow, but reliable. You can also use pd.json.loads for JSON data, applying the same ideas as with literal_eval.
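
As a sketch of that idea, the standard library's json.loads handles the same job for strings that are valid JSON (handy if pd.json.loads isn't available in your pandas version):

import json
import pandas as pd

s = pd.Series(['[133, 115, 3, 1]', '[114, 115, 2, 3]', '[51, 59, 1, 1]'])
s.apply(json.loads)   # each string becomes a real Python list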

Option 2
yaml.load
Another great option for parsing simple data; I picked this up from @ayhan a while ago.

import yaml
s.apply(yaml.load)

0    [133, 115, 3, 1]
1    [114, 115, 2, 3]
2      [51, 59, 1, 1]
dtype: object

I haven't tested this on more complex structures, but this should work for almost any basic string representation of data.

You can find the documentation for PyYAML here. Scroll down a bit and you'll find more details on the load function.
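
One caveat: on PyYAML 5.1 and later, calling yaml.load without an explicit Loader emits a warning. A minimal sketch using the safe loader instead:

import yaml
import pandas as pd

s = pd.Series(['[133, 115, 3, 1]', '[114, 115, 2, 3]', '[51, 59, 1, 1]'])
s.apply(yaml.safe_load)   # same result, without the Loader warning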


Note

  • If you're working with JSON data, it might be suitable to read your file using pd.read_json or pd.io.json.json_normalize to begin with.
  • You can also perform parsing as you read in your data, using read_csv -

    s = pd.read_csv(filepath, converters={'col': literal_eval}, squeeze=True)  # filepath and 'col' are placeholders for your file and column
    

    Here, the converters argument applies the given function to that column as it is read, so you don't have to deal with parsing later.

  • Continuing the point above, if you're working with a dataframe, pass a dict -

    df = pd.read_csv(filepath, converters={'col': literal_eval})  # filepath is a placeholder
    

    Where col is the column that needs to be parsed. You can also pass pd.json.loads (for JSON data) or pd.eval (if you have 100 rows or less). A complete, self-contained sketch of this approach follows below.
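
To make the read_csv approach concrete, here is a minimal self-contained sketch; io.StringIO stands in for a real file, and the column name data is just an assumption for illustration:

import io
from ast import literal_eval
import pandas as pd

csv_text = 'data\n"[133, 115, 3, 1]"\n"[114, 115, 2, 3]"\n"[51, 59, 1, 1]"\n'

# converters parses each cell of the 'data' column as the file is read
df = pd.read_csv(io.StringIO(csv_text), converters={'data': literal_eval})
df['data'].iloc[0]   # [133, 115, 3, 1], a real Python list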


Credits to MaxU and Moondra for uncovering this issue.




Answer 2:


Your data is fine, and pandas.eval is buggy, but not in the way you think. A hint on the relevant GitHub issue page urged me to take a closer look at the documentation.

pandas.eval(expr, parser='pandas', engine=None, truediv=True, local_dict=None,
            global_dict=None, resolvers=(), level=0, target=None, inplace=False)

    Evaluate a Python expression as a string using various backends.

    Parameters:
        expr: str or unicode
            The expression to evaluate. This string cannot contain any Python
            statements, only Python expressions.
        [...]

As you can see, the documented behaviour is to pass strings to pd.eval, in line with the general (and expected) behaviour of the eval/exec class of functions. You pass a string, and end up with an arbitrary object.

As I see it, pandas.eval is buggy because it doesn't reject the Series input expr up front, leading it to guess in the face of ambiguity. The fact that the default shortening of the Series' __repr__, designed for pretty printing, can drastically affect your result is the best proof of this.

The solution, then, is to step back from the XY problem, use the right tool to convert your data, and preferably stop using pandas.eval for this purpose entirely. Even in the working cases where the Series is small, you can't really be sure that future pandas versions won't break this "feature" completely.



Source: https://stackoverflow.com/questions/48008191/attributeerror-pandasexprvisitor-object-has-no-attribute-visit-ellipsis-us
