Outlier detection in PySpark
Question

I have a PySpark DataFrame as shown below.

+---+-------+--------+
|age|balance|duration|
+---+-------+--------+
|  2|   2143|     261|
| 44|     29|     151|
| 33|      2|      76|
| 50|   1506|      92|
| 33|      1|     198|
| 35|    231|     139|
| 28|    447|     217|
|  2|      2|     380|
| 58|    121|      50|
| 43|    693|      55|
| 41|    270|     222|
| 50|    390|     137|
| 53|      6|     517|
| 58|     71|      71|
| 57|    162|     174|
| 40|    229|     353|
| 45|     13|      98|
| 57|     52|      38|
|  3|      0|     219|
|  4|      0|      54|
+---+-------+--------+

My expected output should look like this:

+---+-------+--------+-