I have a zipped file containing multiple text files. I want to read each file and build a list of RDDs containing the content of each file.
val
This picks up only the first line. Can anyone share some insight? I am trying to read a zipped CSV file and create a JavaRDD for further processing.
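import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.input.PortableDataStream;
import scala.Tuple2;

// Each element is (file path, lazily opened stream over the whole zip file).
JavaPairRDD<String, PortableDataStream> zipData =
    sc.binaryFiles("hdfs://temp.zip");
JavaRDD<Record> newRDDRecord = zipData.flatMap(
    new FlatMapFunction<Tuple2<String, PortableDataStream>, Record>() {
        public Iterator<Record> call(Tuple2<String, PortableDataStream> content) throws Exception {
            List<Record> records = new ArrayList<Record>();
            ZipInputStream zin = new ZipInputStream(content._2.open());
            ZipEntry zipEntry;
            while ((zipEntry = zin.getNextEntry()) != null) {
                count++; // counter assumed to be declared outside this snippet
                if (!zipEntry.isDirectory()) {
                    Record sd;
                    String line;
                    InputStreamReader streamReader = new InputStreamReader(zin);
                    BufferedReader bufferedReader = new BufferedReader(streamReader);
                    // readLine() is called once, so only the first line of each entry is parsed.
                    line = bufferedReader.readLine();
                    // Renamed from "records" to avoid clashing with the list above.
                    String[] fields = new CSVParser().parseLineMulti(line);
                    sd = new Record(TimeBuilder.convertStringToTimestamp(fields[0]),
                        getDefaultValue(fields[1]),
                        getDefaultValue(fields[22]));
                    records.add(sd);
                }
            }
            return records.iterator();
        }
    });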
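
The likely reason only the first line shows up is that readLine() is called once per zip entry rather than in a loop. Here is a minimal sketch of the reading part in plain Java, without Spark, so it can be run on its own. The class ZipLineReader and its methods readEntries and buildZip are hypothetical names made up for this illustration, and UTF-8 is an assumed encoding. It keeps the lines grouped by entry name, which may also help if the content of each file is needed separately.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; the name and methods are illustrative only.
class ZipLineReader {

    // Returns entry name -> all lines of that entry, in archive order.
    public static Map<String, List<String>> readEntries(InputStream in) throws IOException {
        Map<String, List<String>> linesByEntry = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // The zip stream reports end-of-stream at the end of the current
                // entry, so a fresh reader per entry stops at the entry boundary.
                // It is deliberately not closed: closing it would close zin too.
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(zin, StandardCharsets.UTF_8));
                List<String> lines = new ArrayList<>();
                String line;
                // Loop until readLine() returns null instead of calling it once.
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
                linesByEntry.put(entry.getName(), lines);
            }
        }
        return linesByEntry;
    }

    // Builds a small in-memory zip so the sketch can run without HDFS.
    public static byte[] buildZip(String[][] nameAndContent) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (String[] pair : nameAndContent) {
                zout.putNextEntry(new ZipEntry(pair[0]));
                zout.write(pair[1].getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildZip(new String[][] {
            {"a.csv", "1,x\n2,y\n"},
            {"b.csv", "3,z\n"}
        });
        Map<String, List<String>> result = readEntries(new ByteArrayInputStream(zip));
        System.out.println(result);
    }
}
```

Inside the flatMap above, the same inner while loop could replace the single readLine() call, with each line parsed into a Record and added to the list. Note that this yields one RDD holding the records of all entries; getting a separate RDD per file would need an extra step, for example keeping the entry name alongside each record and filtering on it.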