I'm using Spark with Java, and I have an RDD of 5 million rows. Is there a solution that allows me to count the number of rows in my RDD? I've tried RDD.count(
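Daniel's explanation of count is right on the money. If you are willing to accept an approximation, though, you could try the countApprox(timeout: Long, confidence: Double = 0.95): PartialResult[BoundedDouble] RDD method. It returns a possibly incomplete result once the timeout (in milliseconds) expires, even if not all tasks have finished, along with low and high bounds at the requested confidence level. (Note, though, that this is tagged as "Experimental".)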
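Since the question is about Java, here is a minimal sketch of how countApprox might be called through JavaRDD. The class name ApproxCountExample, the helper approxCount, the local[2] master, and the sample data are all made up for illustration, and it assumes Spark is on the classpath. Treat it as a starting point, not a definitive implementation.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.partial.BoundedDouble;
import org.apache.spark.partial.PartialResult;

// Hypothetical example class; not part of Spark.
public class ApproxCountExample {

    // Returns the estimate available once timeoutMs has elapsed
    // (or the exact count if every task finished before then).
    static <T> BoundedDouble approxCount(JavaRDD<T> rdd, long timeoutMs, double confidence) {
        PartialResult<BoundedDouble> partial = rdd.countApprox(timeoutMs, confidence);
        return partial.initialValue();
    }

    public static void main(String[] args) {
        // Assumption: a local master, just to make the sketch runnable.
        SparkConf conf = new SparkConf()
                .setAppName("approx-count-sketch")
                .setMaster("local[2]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Stand-in data; in practice this would be your own RDD.
            List<Integer> data = IntStream.rangeClosed(1, 100_000)
                    .boxed()
                    .collect(Collectors.toList());
            JavaRDD<Integer> rdd = sc.parallelize(data, 8);

            // Wait at most 1 second, ask for 95% confidence bounds.
            BoundedDouble estimate = approxCount(rdd, 1_000L, 0.95);

            System.out.println("estimate = " + estimate.mean()
                    + ", low = " + estimate.low()
                    + ", high = " + estimate.high()
                    + ", confidence = " + estimate.confidence());
        }
    }
}
```

If you later need the exact figure, PartialResult also has getFinalValue(), which blocks until the job completes; at that point you are paying the same cost as a plain count().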