How to reference a dataframe from inside a UDF applied to another dataframe?
Question

How do you reference a PySpark dataframe from inside a UDF that is executing on another dataframe?

Here's a dummy example. I am creating two dataframes, scores and lastnames, and each contains a column that is the same across the two dataframes. In the UDF applied to scores, I want to filter lastnames and return a string found in lastname.

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.sql.types import *
sc = SparkContext(