'RDD' object has no attribute '_jdf' pyspark RDD
Question

I'm new to PySpark. I would like to perform some machine learning on a text file.

    from pyspark import Row
    from pyspark.context import SparkContext
    from pyspark.sql.session import SparkSession
    from pyspark import SparkConf

    sc = SparkContext
    spark = SparkSession.builder.appName("ML").getOrCreate()

    train_data = spark.read.text("20ng-train-all-terms.txt")
    td = train_data.rdd  # transform df to rdd
    tr_data = td.map(lambda line: line.split()).map(lambda words: Row(label=words[0], words=words[1:]))
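For reference, the parsing step in the last line (first token becomes the label, the remaining tokens become the words) can be sketched in plain Python without Spark. This is only an illustration under assumptions: `LabeledWords`, `parse_line`, and the sample line are hypothetical names and data, not part of the original code. Note that `spark.read.text` produces a DataFrame with a single string column named `value`, so each element of `train_data.rdd` is a `Row` rather than a `str`; inside Spark the split would likely need to be applied to `line.value`.

```python
from typing import List, NamedTuple


class LabeledWords(NamedTuple):
    """Hypothetical stand-in for the Row(label=..., words=...) used in the question."""
    label: str
    words: List[str]


def parse_line(line: str) -> LabeledWords:
    # Split on whitespace: the first token is the label, the rest are the words.
    tokens = line.split()
    return LabeledWords(label=tokens[0], words=tokens[1:])


# Made-up sample line in the assumed "label word word ..." layout.
sample = "alt.atheism some example words"
parsed = parse_line(sample)
print(parsed.label, parsed.words)
```

The error in the title typically appears when an RDD is handed to an API that expects a DataFrame, so the mapped RDD would probably have to be converted back first, for example with `spark.createDataFrame(tr_data)`.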