The context of this question is that I am trying to use the MaxMind Java API in a Pig script that I have written. I do not think that knowing about either is necessary to
One recommended approach is to ship the GeoIP.dat database through Hadoop's Distributed Cache rather than trying to bundle it into a jar.
Zip GeoIP.dat, copy the archive to hdfs://host:port/path/GeoIP.dat.zip, and then add these options to the Pig command:
```shell
pig ... \
    -Dmapred.cache.archives=hdfs://host:port/path/GeoIP.dat.zip#GeoIP.dat \
    -Dmapred.create.symlink=yes \
    ...
```
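The `#GeoIP.dat` suffix is an ordinary URI fragment: with `mapred.create.symlink=yes`, Hadoop uses the part after `#` as the name of the symlink it creates in each task's working directory. The minimal sketch below uses only `java.net.URI` to show how such a cache URI splits into the archive path and the link name. `CacheArchiveSpec` is a hypothetical helper, and `namenode:8020` is a placeholder authority; neither is provided by Pig or Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper illustrating how a Distributed Cache URI with a '#' fragment is structured.
final class CacheArchiveSpec {
    private CacheArchiveSpec() {}

    /** Builds the -D option string: archive URI plus "#linkName". */
    static String archiveOption(String archiveUri, String linkName) {
        return "-Dmapred.cache.archives=" + archiveUri + "#" + linkName;
    }

    /** Returns the URI fragment, i.e. the symlink name, or null if there is none. */
    static String linkName(String cacheUri) throws URISyntaxException {
        return new URI(cacheUri).getFragment();
    }

    /** Returns the path of the archive itself, without the fragment. */
    static String archivePath(String cacheUri) throws URISyntaxException {
        return new URI(cacheUri).getPath();
    }
}
```

For `hdfs://namenode:8020/path/GeoIP.dat.zip#GeoIP.dat`, `linkName` returns `GeoIP.dat` and `archivePath` returns `/path/GeoIP.dat.zip`, which is why the UDF refers to the database by the short name `GeoIP.dat` rather than by its HDFS location.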
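With those options in place, the database is present locally to the tasks on each node, and the UDF can open it through the `GeoIP.dat` symlink in the task's working directory. Note that an archive is unpacked into a directory, so the symlink points at that directory rather than at the file; if GeoIP.dat sits at the root of the zip, the call is `LookupService lookupService = new LookupService("./GeoIP.dat/GeoIP.dat");`.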
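Because the symlink is a file when the database is shipped as a plain cache file but a directory when it is shipped as an archive, a small path-resolution step in the UDF avoids hard-coding either layout. The sketch below is a hypothetical helper (`GeoIpPaths.resolve` is not part of Pig or the MaxMind API) that assumes the symlink name and the file name inside the zip are both `GeoIP.dat`; it uses only `java.io.File`.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper for locating the cached GeoIP database from inside a UDF.
final class GeoIpPaths {
    private GeoIpPaths() {}

    /**
     * Returns the path to pass to the LookupService constructor.
     *
     * @param workDir  the task's working directory (new File(".") inside a task)
     * @param linkName the symlink name given after '#' in the cache URI
     * @param fileName the database file name inside the archive
     */
    static String resolve(File workDir, String linkName, String fileName)
            throws FileNotFoundException {
        File link = new File(workDir, linkName);
        // Plain cache file: the symlink is the database itself.
        if (link.isFile()) {
            return link.getPath();
        }
        // Archive: the symlink points at the unpacked directory holding the database.
        File inside = new File(link, fileName);
        if (inside.isFile()) {
            return inside.getPath();
        }
        throw new FileNotFoundException(
                "No " + fileName + " found at " + link.getPath() + " or " + inside.getPath());
    }
}
```

In the UDF this would be used as `new LookupService(GeoIpPaths.resolve(new File("."), "GeoIP.dat", "GeoIP.dat"))`, ideally in a lazily initialized field so the database is opened once per task rather than once per tuple.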