Hadoop r0.20.2 documentation — dead link — found a cached copy

I recently looked into learning Apache Hadoop. Big data and cloud services are where the internet is headed, and since I hadn’t worked with big data yet, I thought it’d be a good idea to at least get my feet wet. I found a great article on CTO Vision that walks you through installing and starting Hadoop. Addendum: if you’re having problems with their instructions, as I was, there’s a complete breakdown on pyfunc’s page.

Fortunately, the config is XML-based, so it’s easy enough to understand once you get the basic syntax down. Unfortunately, the config page CTO Vision linked to for pseudo-distributed mode turned out to be a dead link (404), because Apache removed the Hadoop r0.20.2 documentation from its site sometime in the first half of 2013.
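To show how little XML is actually involved, here is a small Python sketch that generates the three pseudo-distributed config files. The property names and values (fs.default.name on hdfs://localhost:9000, dfs.replication set to 1, mapred.job.tracker on localhost:9001) are my recollection of the r0.20.2 quickstart rather than a copy of it, so check them against the archived page. The helper names build_config and write_configs are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

# Assumed values, recalled from the r0.20.2 quickstart's pseudo-distributed
# section; verify them against the archived page before relying on them.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def build_config(properties):
    """Return a Hadoop-style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def write_configs(conf_dir):
    """Write each config file into conf_dir (e.g. the install's conf/ directory)."""
    for filename, props in PSEUDO_DISTRIBUTED.items():
        path = os.path.join(conf_dir, filename)
        with open(path, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0"?>\n')
            f.write(build_config(props) + "\n")
```

Calling write_configs("conf") from the root of the Hadoop install would overwrite the three files in place, so back up the originals first.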

Luckily, I’m resourceful and know of a site other than Google that caches websites, and it just so happened to have a copy of the document I needed.

So, in the hope that Google will pick up the text of these two URLs, here are the pages that are now dead:
http://hadoop.apache.org/docs/r0.20.2/quickstart.html#PseudoDistributed
or
http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html#PseudoDistributed

And without further ado, you can find an archived copy of the page both of those links pointed to at:
http://web.archive.org/web/20130123211137/http://hadoop.apache.org/docs/r0.20.2/quickstart.html#PseudoDistributed

That’ll help you complete the initial setup for a test install of Hadoop in pseudo-distributed mode. The rest of the r0.20.2 documentation is archived there too, but this page seems to be a frequently searched one: it came up in Google’s autocomplete shortly after I started typing the first few characters of the URL.
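If you need other pages from the r0.20.2 docs, the archive URL follows a simple pattern: the archive prefix, a YYYYMMDDhhmmss timestamp, then the original URL. Here is a minimal Python sketch of building one under that assumption. The function name wayback_url is hypothetical, and the default timestamp is simply the one from the archived link above.

```python
from urllib.parse import urlsplit

# Assumed URL pattern: <prefix><14-digit timestamp>/<original URL>
ARCHIVE_PREFIX = "http://web.archive.org/web/"


def wayback_url(original_url, timestamp="20130123211137"):
    """Build an archive URL for original_url at the given capture timestamp."""
    if not (len(timestamp) == 14 and timestamp.isdigit()):
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    if urlsplit(original_url).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    return ARCHIVE_PREFIX + timestamp + "/" + original_url


print(wayback_url(
    "http://hadoop.apache.org/docs/r0.20.2/quickstart.html#PseudoDistributed"
))
```

Swap in the URL of any other r0.20.2 page. If there is no capture at that exact timestamp, the archive generally redirects to the closest one it has.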