Trying to get spark streaming to read data stream from website, what is the socket?

谁说胖子不能爱 submitted on 2019-12-01 12:52:45

Question


I am trying to read the data stream at http://stream.meetup.com/2/rsvps into Spark Streaming.

The stream consists of JSON objects. I know the lines will arrive as strings; I just want the stream to work before I try parsing the JSON.

I am not sure what to use as the port, and I assume that is the problem.

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("Spark Streaming");

JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

JavaReceiverInputDStream<String> lines = jssc.socketTextStream("http://stream.meetup.com/2/rsvps", 80);


lines.print();

jssc.start();
jssc.awaitTermination();

Here is my error:

java.net.UnknownHostException: http://stream.meetup.com/2/rsvps
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at java.net.Socket.<init>(Socket.java:425)
    at java.net.Socket.<init>(Socket.java:208)

Answer 1:


socketTextStream isn't designed to work as an HTTP client: it opens a plain TCP socket to a host name and port, so it cannot take a URL. As you noticed, you will need to create a custom receiver. One potential starting point is the receiver created as part of the meetup streaming data source (see https://github.com/actions/meetup-stream/blob/master/src/main/scala/receiver/MeetupReceiver.scala ).
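To make the host/port point concrete, here is a minimal sketch in plain Java (no Spark involved) that splits the URL from the question into the parts a socket call would need. The class name StreamUrlParts and its helper methods are hypothetical, introduced only for illustration. Note that passing just the host and port to socketTextStream would most likely still not produce data, because nothing would send the HTTP request for the /2/rsvps path.

```java
import java.net.URI;

public class StreamUrlParts {
    // Host name only: this is the kind of value a raw socket expects,
    // not the full "http://..." string.
    static String host(String url) {
        return URI.create(url).getHost();
    }

    // Explicit port if the URL has one, otherwise the scheme's default.
    static int port(String url) {
        URI uri = URI.create(url);
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // The path is part of the HTTP request, which a raw socket never sends.
    static String path(String url) {
        return URI.create(url).getRawPath();
    }

    public static void main(String[] args) {
        String url = "http://stream.meetup.com/2/rsvps";
        System.out.println("host=" + host(url));
        System.out.println("port=" + port(url));
        System.out.println("path=" + path(url));
    }
}
```

The UnknownHostException in the question is consistent with this: the whole URL string was passed where only the host name belongs.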




Answer 2:


Here is a custom UrlReceiver that follows the Spark documentation on custom receivers:

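import java.io.{BufferedReader, InputStreamReader}
import java.net.{URL, URLConnection}

// Logging import assumes Spark 1.x; later versions moved the trait to org.apache.spark.internal.Logging
import org.apache.spark.Logging
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class UrlReceiver(urlStr: String) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {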

  override def onStart() = {
    new Thread("Url Receiver") {
      override def run() = {
        val urlConnection: URLConnection = new URL(urlStr).openConnection
        val bufferedReader: BufferedReader = new BufferedReader(
          new InputStreamReader(urlConnection.getInputStream)
        )
        var msg = bufferedReader.readLine
        while (msg != null) {
          if (!msg.isEmpty) {
            store(msg)
          }
          msg = bufferedReader.readLine
        }
        bufferedReader.close()
      }
    }.start()
  }

  override def onStop() = {
    // nothing to do
  }
}

Then use it like this, where sc is the StreamingContext:

val lines = sc.receiverStream(new UrlReceiver("http://stream.meetup.com/2/rsvps"))
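Since the question uses the Java API, the read loop at the heart of the receiver can be sketched in plain Java and tried without Spark. This is only an illustration under assumptions: the class LineForwarder and its forward method are hypothetical names, and the Consumer stands in for the receiver's store(...) call.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LineForwarder {
    // Reads lines until end of stream and hands each non-empty line to the sink.
    // Inside a Spark receiver, the sink would be the receiver's store(...) call.
    // Returns the number of lines forwarded.
    static int forward(Reader source, Consumer<String> sink) throws IOException {
        int count = 0;
        // try-with-resources closes the reader even if reading fails midway
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    sink.accept(line);
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the HTTP response body here.
        List<String> received = new ArrayList<>();
        int n = forward(new StringReader("{\"a\":1}\n\n{\"b\":2}\n"), received::add);
        System.out.println(n + " lines forwarded: " + received);
    }
}
```

In a real receiver the Reader would wrap the URL connection's input stream, as in the Scala code above. The loop ends when the server closes the connection, so a long-running job would probably also want to reconnect at that point.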


Source: https://stackoverflow.com/questions/30672898/trying-to-get-spark-streaming-to-read-data-stream-from-website-what-is-the-sock
