I am trying to read a web URL from spark-shell using the textFile method, but I am getting an error. This is probably not the right way. Can someone please tell me how to access a web
You cannot get URL content using textFile directly. textFile is meant to:
Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI
As you can see, HTTP/HTTPS URLs are not included.
You can fetch the content on the driver first, and then turn it into an RDD.
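// Fetch the page on the driver (textFile cannot read HTTP/HTTPS URLs)
val html = scala.io.Source.fromURL("https://spark.apache.org/").mkString
// Split into lines and drop the empty ones
val list = html.split("\n").filter(_ != "")
// Distribute the lines across the cluster as an RDD
val rdds = sc.parallelize(list)
// Count the lines that mention "Spark"
val count = rdds.filter(_.contains("Spark")).count()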
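
One possible refinement of the snippet above: Source.fromURL leaves the underlying stream open unless you close it yourself. Here is a minimal sketch of a helper that reads the non-empty lines and then closes the source. The name fetchLines and the UTF-8 encoding are my own assumptions, not part of Spark or of the original answer.

```scala
import scala.io.Source

// Hypothetical helper (not a Spark API): download a URL on the driver,
// return its non-empty lines, and close the underlying stream afterwards.
// The UTF-8 encoding is an assumption; adjust it for the page you read.
def fetchLines(url: String): List[String] = {
  val source = Source.fromURL(url, "UTF-8")
  try source.getLines().filter(_.nonEmpty).toList // toList forces the read before close()
  finally source.close()
}

// In spark-shell, where `sc` is already defined, usage might look like:
// val lines = sc.parallelize(fetchLines("https://spark.apache.org/"))
// val count = lines.filter(_.contains("Spark")).count()
```

Note that the whole page is still downloaded on the driver, so this only suits content that fits comfortably in driver memory.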