One of the things that I would very much like to do is to work on big data. My current job does not offer an opportunity for that, but that’s OK. So I decided to find my own big data problem and build my own cluster. Along the way, I want to share my experience with you.
The decision on tool selection was easy for me. I have a Raspberry Pi 2 Model B. Putting a few of them together should serve as a base case for much bigger and more capable clusters.
On top of that, installing Hadoop as the big data processing framework will be easy. There are many good books on that, such as Hadoop: The Definitive Guide and Pro Apache Hadoop. Also, Hadoop and its ecosystem provide a wide enough context to be applicable to any scalable big data problem.
Running Hadoop on Raspberry Pi is nothing new. There are really good guides for that, and I definitely used a few of them. Jonas Widriksson’s guide is the first one that I encountered. It is a little bit outdated but worth every single second I devoted to reading it. itToby also deserves a mention. The guide prepared by Monning and Schiller covers the most current version of Hadoop.
As I said before, I did, and still do, read all of them. What I want to do here is collect the bits and pieces from many sources and provide a more or less comprehensive Hadoop setup on Raspberry Pi nodes, enriched by my own pitfalls and experience.
The next post will be about preparing a Raspberry Pi for Hadoop installation using a headless setup with a Mac.