How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to kno
The answers that advocate casting to FileSplit no longer work when the job is configured with multiple inputs, because FileSplit instances are no longer returned in that case (you will get a ClassCastException). Instead, org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit instances are returned. Unfortunately, the TaggedInputSplit class is package-private, so it is not accessible without using reflection. Here's a utility class I wrote for this. Just do:
Path path = MapperUtils.getPath(context.getInputSplit());
in your Mapper.setup(Context context) method.
Here is the source code for my MapperUtils class:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Optional;

public class MapperUtils {

    /** Returns the path of the file behind the split, or throws if none can be found. */
    public static Path getPath(InputSplit split) {
        return getFileSplit(split).map(FileSplit::getPath).orElseThrow(() ->
            new AssertionError("cannot find path from split " + split.getClass()));
    }

    /** Unwraps TaggedInputSplit layers until a FileSplit is found, if there is one. */
    public static Optional<FileSplit> getFileSplit(InputSplit split) {
        if (split instanceof FileSplit) {
            return Optional.of((FileSplit) split);
        } else if (TaggedInputSplit.clazz.isInstance(split)) {
            return getFileSplit(TaggedInputSplit.getInputSplit(split));
        } else {
            return Optional.empty();
        }
    }

    /** Reflective accessor for Hadoop's package-private TaggedInputSplit class. */
    private static final class TaggedInputSplit {
        private static final Class<?> clazz;
        private static final MethodHandle method;

        static {
            try {
                clazz = Class.forName("org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit");
                Method m = clazz.getDeclaredMethod("getInputSplit");
                m.setAccessible(true);
                // Adapt the handle to (InputSplit) -> InputSplit so that invokeExact
                // can be called without naming the inaccessible class.
                method = MethodHandles.lookup().unreflect(m).asType(
                    MethodType.methodType(InputSplit.class, InputSplit.class));
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }

        static InputSplit getInputSplit(InputSplit o) {
            try {
                return (InputSplit) method.invokeExact(o);
            } catch (Throwable e) {
                throw new AssertionError(e);
            }
        }
    }

    private MapperUtils() { }
}
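If you want to try the reflection trick without a Hadoop cluster, here is a minimal sketch of the same technique in plain Java. Everything in it is a hypothetical stand-in, not Hadoop API: Split plays the role of InputSplit, PlainSplit plays FileSplit, and the private WrappedSplit plays TaggedInputSplit. It assumes the wrapper exposes a no-argument getter for the wrapped split, as getInputSplit does above.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Optional;

final class ReflectiveUnwrapDemo {

    /** Hypothetical stand-in for InputSplit. */
    public abstract static class Split { }

    /** Hypothetical stand-in for FileSplit: the only split that knows its path. */
    public static final class PlainSplit extends Split {
        private final String path;
        public PlainSplit(String path) { this.path = path; }
        public String getPath() { return path; }
    }

    /** Hypothetical stand-in for TaggedInputSplit: looked up by name below. */
    private static final class WrappedSplit extends Split {
        private final Split inner;
        WrappedSplit(Split inner) { this.inner = inner; }
        Split getInner() { return inner; }
    }

    private static final Class<?> WRAPPER_CLASS;
    private static final MethodHandle UNWRAP;

    static {
        try {
            // Look the wrapper up by its binary name, as if we could not name it directly.
            WRAPPER_CLASS = Class.forName(
                ReflectiveUnwrapDemo.class.getName() + "$WrappedSplit");
            Method m = WRAPPER_CLASS.getDeclaredMethod("getInner");
            m.setAccessible(true);
            // Retype the handle to (Split) -> Split so invokeExact matches the call site.
            UNWRAP = MethodHandles.lookup().unreflect(m)
                .asType(MethodType.methodType(Split.class, Split.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    /** Test helper: builds a wrapper around the given split. */
    static Split wrap(Split inner) {
        return new WrappedSplit(inner);
    }

    /** Peels off wrapper layers, then returns the path if a PlainSplit is underneath. */
    static Optional<String> pathOf(Split split) {
        Split current = split;
        while (WRAPPER_CLASS.isInstance(current)) {
            current = unwrapOnce(current);
        }
        return current instanceof PlainSplit
            ? Optional.of(((PlainSplit) current).getPath())
            : Optional.empty();
    }

    private static Split unwrapOnce(Split wrapped) {
        try {
            return (Split) UNWRAP.invokeExact(wrapped);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        Split split = wrap(new PlainSplit("/input/a.txt"));
        System.out.println(pathOf(split).orElse("<no path>"));
    }
}
```

The asType call is what makes invokeExact usable here: the raw handle takes the hidden wrapper type as its receiver, which the calling code cannot write down, so the handle is retyped to accept the public base type and cast internally.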