PySpark DataFrames: filter where some value is in array column
Question

I have a DataFrame in PySpark that has a nested array value for one of its fields. I would like to filter the DataFrame where the array contains a certain string. I'm not seeing how I can do that.

The schema looks like this:

    root
     |-- name: string (nullable = true)
     |-- lastName: array (nullable = true)
     |    |-- element: string (containsNull = false)

I want to return all the rows where upper(name) == 'JOHN' and where the lastName column (the array) contains 'SMITH', and the equality there should be case-insensitive, like the comparison on name.
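A minimal sketch of one way to do this, assuming Spark 3.1+ (for the exists higher-order function in pyspark.sql.functions) and made-up sample data; array_contains handles the exact membership test, while exists lets you apply upper() to each array element for the case-insensitive match:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import array_contains, col, exists, upper

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data matching the schema above.
    df = spark.createDataFrame(
        [
            ("John", ["SMITH", "Williams"]),  # matches both filters
            ("john", ["Smith"]),              # matches only the case-insensitive filter
            ("Jane", ["Doe"]),                # matches neither
        ],
        ["name", "lastName"],
    )

    # Exact (case-sensitive) membership test on the array column.
    exact = df.filter(
        (upper(col("name")) == "JOHN") & array_contains(col("lastName"), "SMITH")
    )

    # Case-insensitive membership test: upper-case each element before comparing.
    case_insensitive = df.filter(
        (upper(col("name")) == "JOHN")
        & exists(col("lastName"), lambda x: upper(x) == "SMITH")
    )

    exact.show()
    case_insensitive.show()

On Spark 2.4–3.0 the same higher-order function is available only through SQL syntax, e.g. expr("exists(lastName, x -> upper(x) = 'SMITH')").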