How to access a web URL using a Spark context

小蘑菇 2020-12-19 03:32

I am trying to read a web URL from spark-shell using the textFile method, but I get an error. This is probably not the right way, so can someone please tell me how to access a web

1 answer
  • 2020-12-19 04:10

    You cannot fetch URL content with textFile directly. textFile is meant to:

    Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI

    As you can see, HTTP/HTTPS URLs are not included.

    You can fetch the content on the driver first and then turn it into an RDD.

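    import scala.io.Source

    // Download the page on the driver and close the connection once it is read.
    val source = Source.fromURL("https://spark.apache.org/")
    val html = try source.mkString finally source.close()
    val lines = html.split("\n").filter(_.nonEmpty)  // drop empty lines
    val rdd = sc.parallelize(lines)                   // distribute the lines as an RDD
    val count = rdd.filter(_.contains("Spark")).count()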
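
If you need this in more than one place, the fetch-then-parallelize idea can be wrapped in small helpers. The following is only a sketch meant to be pasted into spark-shell: `nonEmptyLines` and `fetchLines` are hypothetical names, not part of Spark, and the UTF-8 encoding is an assumption about the page. Since the whole page is downloaded on the driver, it has to fit in driver memory.

```scala
import scala.io.Source
import scala.util.Try

// Hypothetical helper (not a Spark API): split text into lines and drop empty ones.
// Accepts both "\n" and "\r\n" line endings.
def nonEmptyLines(text: String): Seq[String] =
  text.split("\r?\n").filter(_.nonEmpty).toSeq

// Hypothetical helper (not a Spark API): download a URL on the driver.
// Assumes UTF-8 content; returns a Failure instead of throwing when the
// URL is malformed or the request fails. The connection is always closed.
def fetchLines(url: String): Try[Seq[String]] = Try {
  val source = Source.fromURL(url, "UTF-8")
  try nonEmptyLines(source.mkString) finally source.close()
}
```

In spark-shell, where `sc` is already defined, something like `fetchLines("https://spark.apache.org/").map(lines => sc.parallelize(lines))` would then give you a `Try[RDD[String]]` that you can inspect before counting or filtering.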
    