Spark MLlib TF-IDF implementation for LogisticRegression

借酒劲吻你 2020-12-08 23:35

I am trying to use the new TF-IDF algorithm that Spark 1.1.0 offers. I'm writing my MLlib job in Java, but I can't figure out how to get the TF-IDF implementation working. For

1 answer
  • 2020-12-09 00:26

    IDFModel.transform() accepts a JavaRDD or RDD of Vector, as you can see. It does not make sense to compute a model over a single Vector, so that's not what you're looking for, right?

    I assume you're working in Java, so you mean you want to apply this to a JavaRDD<LabeledPoint>. A LabeledPoint contains a Vector and a label. IDF is not a classifier or regressor, so it needs no label. You can map over the LabeledPoints to extract just their Vectors.

    But you already have a JavaRDD<Vector> above. TF-IDF is merely a way of mapping words to real-valued features based on word frequencies in the corpus. It also does not output a label. Maybe you mean you want to develop a classifier from TF-IDF-derived feature vectors, and some other labels you already have?
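    To make the split between "fit on the corpus", "transform each document" and "attach labels" concrete, here is a minimal plain-Java sketch. It does not use Spark at all: TfIdfSketch, LabeledExample and every method name below are hypothetical, chosen only for illustration. It assumes the smoothed formula idf(t) = log((m + 1) / (df(t) + 1)), which is, as far as I know, the one the MLlib documentation describes, and it assumes the labels come from somewhere else, in the same order as the documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustration only: none of these names are Spark API.
class TfIdfSketch {

    /** A label paired with a sparse TF-IDF feature map (term -> weight). */
    static final class LabeledExample {
        final double label;
        final Map<String, Double> features;

        LabeledExample(double label, Map<String, Double> features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Raw term counts for one tokenized document. */
    static Map<String, Double> termFrequencies(List<String> doc) {
        Map<String, Double> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1.0, Double::sum);
        }
        return tf;
    }

    /** Fit step: needs the whole corpus, because it counts documents per term. */
    static Map<String, Double> fitIdf(List<List<String>> corpus) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : corpus) {
            // HashSet: each term counts at most once per document.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        int m = corpus.size();
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            // Assumed smoothed formula: log((m + 1) / (df + 1)).
            idf.put(e.getKey(), Math.log((m + 1.0) / (e.getValue() + 1.0)));
        }
        return idf;
    }

    /** Transform step: scale term counts by IDF weights. No label is involved. */
    static Map<String, Double> transform(Map<String, Double> idf, List<String> doc) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : termFrequencies(doc).entrySet()) {
            // Simplification: terms absent from the fitted corpus get weight 0.
            out.put(e.getKey(), e.getValue() * idf.getOrDefault(e.getKey(), 0.0));
        }
        return out;
    }

    /** Pair each document's TF-IDF features with a label supplied separately. */
    static List<LabeledExample> labelAll(Map<String, Double> idf,
                                         List<List<String>> corpus,
                                         List<Double> labels) {
        if (corpus.size() != labels.size()) {
            throw new IllegalArgumentException("need exactly one label per document");
        }
        List<LabeledExample> examples = new ArrayList<>();
        for (int i = 0; i < corpus.size(); i++) {
            examples.add(new LabeledExample(labels.get(i), transform(idf, corpus.get(i))));
        }
        return examples;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("spark", "mllib", "spark"),
                Arrays.asList("spark", "java"),
                Arrays.asList("java", "tfidf"));
        List<Double> labels = Arrays.asList(1.0, 0.0, 0.0);

        Map<String, Double> idf = fitIdf(corpus);
        for (LabeledExample ex : labelAll(idf, corpus, labels)) {
            System.out.println(ex.label + " -> " + ex.features);
        }
    }
}
```

    Note that fitting on a corpus of one document gives every term the weight log(2 / 2) = 0 under this formula, which is one way to see why a model over a single Vector is not useful. In Spark the corresponding steps would be, roughly, HashingTF.transform to get term-frequency vectors, IDF.fit on the resulting RDD, IDFModel.transform to rescale it, and then building a LabeledPoint from each label and feature Vector before handing the RDD to the classifier.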

    Maybe that clears things up; otherwise, you'd have to clarify what you are trying to achieve with TF-IDF.
