Removing rows in a nested struct in a Spark DataFrame using PySpark (details in text)
I am using PySpark and I have a DataFrame object df. This is what the output of df.printSchema() looks like:

    root
     |-- M_MRN: string (nullable = true)
     |-- measurements: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- Observation_ID: string (nullable = true)
     |    |    |-- Observation_Name: string (nullable = true)
     |    |    |-- Observation_Result: string (nullable = true)

I would like to filter out all the structs in the 'measurements' array whose Observation_ID is not '5' or '10'.