Wednesday, November 14, 2012

Push and Pull files within Hadoop distributed cache

Push

Within run() (the place where you set up your mapper and reducer classes):

DistributedCache.addCacheFile(new URI("file:///home/raymond/file.txt"), conf);

Here conf is the Configuration I created earlier in run() with:

Configuration conf = new Configuration();

Use the file:// scheme for the local filesystem and hdfs:// to target HDFS.
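To make the scheme part concrete, here is a minimal sketch that uses only java.net.URI, no Hadoop involved. The class name CacheUris and the HDFS host and port (namenode:8020) are made up for illustration. It also uses URI.create, which, unlike new URI(...), does not force you to handle the checked URISyntaxException.

```java
import java.net.URI;

final class CacheUris {
    // Returns the scheme of a cache URI, e.g. "file" or "hdfs".
    static String schemeOf(String uri) {
        return URI.create(uri).getScheme();
    }

    public static void main(String[] args) {
        // Local filesystem path from the example above.
        System.out.println(schemeOf("file:///home/raymond/file.txt"));
        // Hypothetical HDFS location; host and port are assumptions.
        System.out.println(schemeOf("hdfs://namenode:8020/user/raymond/file.txt"));
    }
}
```

The scheme is simply everything before the first colon, so a quick check like this can help catch a mistyped URI before the job is submitted.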

Pull

Now you need to know where the framework has cached your file. Within the mapper (or reducer) class you can ask for the local paths:

Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

localFiles now contains the paths of all cached files. If you cached only one file, you can get its path with:

String path = localFiles[0].toString();

Now you know where it's located. You can pass the path to a FileReader to read the contents.
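Reading the file could look roughly like the sketch below. It is plain Java with no Hadoop dependency, and the class name CachedFileLines and the method readLines are my own names, not part of any Hadoop API.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class CachedFileLines {
    // Reads every line of the file at the given local path into a list.
    static List<String> readLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

A reasonable place to call something like readLines(localFiles[0].toString()) is the mapper's setup() method, so the file is read once per task instead of once per input record.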


As usual I'm trying to keep things as simple as possible. Let me know if there is something else about this process that I should definitely mention :)
