I recently looked into learning Apache Hadoop. Big data and cloud services are where the internet is headed, and since I hadn’t worked with big data yet, I thought it’d be a good idea to at least get my feet wet. I found a great article on CTO Vision that walks you through getting Hadoop installed and started. Addendum: If you’re having problems with their instructions, like I was, there’s a complete breakdown at pyfunc’s page.
Fortunately, the config is XML-based, so it’s easy enough to understand once you get the basic syntax down. Unfortunately, the config that CTO Vision linked to for Pseudo Distributed mode turned out to be a dead link (404), because Apache removed the Hadoop r0.20.2 documentation from their site sometime in the first half of 2013.
Luckily, I’m resourceful and know of a site other than Google that caches websites, and it just so happened to have a copy of the document I needed.
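To give a feel for the syntax, here is a rough sketch of what a pseudo-distributed setup looks like in the 0.20.x line. It reflects my understanding of the r0.20.2 quick start rather than a verbatim copy of it, so treat the host name and port numbers as assumptions and check them against the cached document. Each `<configuration>` block lives in its own file under `conf/`.

```xml
<!-- conf/core-site.xml: where the default filesystem (HDFS NameNode) lives.
     localhost:9000 is the conventional single-machine value (assumed here). -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: one copy of each block, since there is only one node. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: address of the JobTracker (port assumed). -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```

Every setting follows the same pattern: a `<property>` with a `<name>` and a `<value>`, wrapped in a single `<configuration>` root per file. Once you’ve seen one, the rest read the same way.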