Push
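Within run() (the place where you set up your mapper and reducer classes), register the file with the cache. Do this before you create the Job from conf, because the Job works on its own copy of the configuration: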
DistributedCache.addCacheFile(new URI("file:///home/raymond/file.txt"), conf);
Here conf is the Configuration I set up with:
Configuration conf = new Configuration();
Use file:// for a file on the local filesystem and hdfs:// for a file on HDFS.
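Because a missing or mistyped scheme is easy to overlook, a small check before calling addCacheFile can help. The sketch below uses only java.net.URI; parseCacheUri is a hypothetical helper (not part of Hadoop), and it is deliberately stricter than Hadoop, which accepts any scheme it has a FileSystem for. The namenode host and port are placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriCheck {
    // Hypothetical helper: parse the string you would pass to
    // DistributedCache.addCacheFile and insist on an explicit file:// or hdfs://
    // scheme, so a missing or mistyped scheme fails before the job is submitted.
    static URI parseCacheUri(String uri) throws URISyntaxException {
        URI parsed = new URI(uri);
        String scheme = parsed.getScheme(); // null when no scheme was given
        if (!"file".equals(scheme) && !"hdfs".equals(scheme)) {
            throw new IllegalArgumentException("expected a file:// or hdfs:// URI, got: " + uri);
        }
        return parsed;
    }

    public static void main(String[] args) throws URISyntaxException {
        // Local file, as in the example above
        System.out.println(parseCacheUri("file:///home/raymond/file.txt").getPath());
        // Placeholder HDFS location: namenode:8020 is an assumption, not a real cluster
        System.out.println(parseCacheUri("hdfs://namenode:8020/user/raymond/file.txt").getHost());
    }
}
```

The returned URI is what you would then hand to DistributedCache.addCacheFile(uri, conf).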
Pull
Now you need to know where the framework has cached your file. Within your mapper (or reducer) class you can call:
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
localFiles now contains the local paths of all cached files. If you added only one file, you can access it with:
String localPath = localFiles[0].toString();
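If you cached several files, looking one up by name can be clearer than relying on its index in localFiles. Here is a minimal sketch, assuming you already have the strings from localFiles[i].toString(); findByName is a hypothetical helper, and the /tmp/cache/... paths are made-up examples. On a Hadoop Path you could get the same last component with Path.getName().

```java
import java.util.Optional;

public class CachedFileLookup {
    // Hypothetical helper: return the cached path whose last component equals fileName.
    // Hadoop paths always use '/' as the separator, so splitting on '/' is enough here.
    static Optional<String> findByName(String[] localPaths, String fileName) {
        for (String p : localPaths) {
            String name = p.substring(p.lastIndexOf('/') + 1);
            if (name.equals(fileName)) {
                return Optional.of(p);
            }
        }
        return Optional.empty(); // the requested file was not cached
    }

    public static void main(String[] args) {
        // Made-up local cache paths, standing in for localFiles[i].toString()
        String[] localPaths = { "/tmp/cache/1/stopwords.txt", "/tmp/cache/2/file.txt" };
        // prints "/tmp/cache/2/file.txt"
        System.out.println(findByName(localPaths, "file.txt").orElse("not cached"));
    }
}
```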
Now you know where it's located, and you can pass the path to a FileReader to read the contents.
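Reading the cached file is then plain Java I/O. The sketch below wraps a FileReader in a BufferedReader and collects the lines; readLines is a hypothetical helper name. In a mapper you would typically call something like it once in setup() rather than in every map() call. Note that FileReader uses the platform default charset.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CachedFileReader {
    // Hypothetical helper: read every line of the file at localPath
    // (for example localFiles[0].toString()). FileReader uses the default charset.
    static List<String> readLines(String localPath) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(new FileReader(localPath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CachedFileReader <local path>");
            return;
        }
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```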
As usual, I'm trying to keep things as simple as possible. Let me know if you think I should definitely mention something else about this process :)