Wednesday, December 19, 2012

[Solved] Hadoop is processing "ghost data" from its input folder

This damn problem cost me more than four hours to solve. I have to write it down somewhere to take the edge off.

If you are using Linux (Ubuntu in my case)
If your MapReduce job is picking up input data that is not in your input folder
If you delete the contents of your input folder and MapReduce still reads "ghost data"*
If you have renamed your input folder and the problem persists
If you have deleted Hadoop's temp directories and the problem persists
If you have rebooted your machine, emptied your trash, and the problem persists

FFS, make sure that the input directory is really empty! Many Linux editors (gedit, for example) leave backup copies ending with "~" next to the files you edit. Ubuntu's file manager treats these as hidden by default, so they don't show up when you browse the folder or glance at its size, and depending on your setup plain "ls" can miss them too (e.g. if it is aliased with --ignore-backups). Run "ls -la" inside the input folder to see everything that is actually there. Hadoop's default input filter only skips files whose names start with "_" or ".", so anything ending in "~" gets fed straight into the job.
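
If you cannot guarantee a clean input folder, you can also make the job itself ignore backup files with a custom PathFilter. Here is a minimal sketch against the new (org.apache.hadoop.mapreduce) API; the class name BackupFileFilter and the path below are made up for the example:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    // Rejects editor backup files ("somefile~") before they reach the mappers.
    // Hadoop's built-in filter only hides names starting with "_" or ".",
    // so this closes exactly the gap described above.
    public class BackupFileFilter implements PathFilter {
        @Override
        public boolean accept(Path path) {
            return !path.getName().endsWith("~");
        }
    }

Wire it up in your driver before submitting the job (Hadoop combines it with its built-in hidden-file filter automatically):

    FileInputFormat.setInputPathFilter(job, BackupFileFilter.class);
    FileInputFormat.addInputPath(job, new Path("/user/me/input"));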

:|

*Ghost data: data the job reads even though, by all appearances, it isn't there.
