With the advent of Private Networking on DigitalOcean, I would like to replace my local physical Cloudera Hadoop cluster with a droplet-based cluster. One of the great things about using DigitalOcean droplets is that you can take snapshots of any droplet and destroy the VMs while they are not in use, so you aren’t charged for them. On the downside, the way private networking is implemented on DigitalOcean doesn’t provide any security against other hosts on the same private network. You should certainly take this into consideration when using the private network: while the bandwidth is free, it isn’t really private.
This post outlines a 4-host cluster costing $0.15/hour (1 x $0.06 + 3 x $0.03 per hour), making it an extremely accessible platform.
If you’re not familiar with DigitalOcean, they provide very simple, very cheap virtual servers (droplets in DigitalOcean parlance).
I will be using the Cloudera Manager Automated Installer Guide. I’ve found Cloudera Manager to be a great tool for managing a cluster.
This post doesn’t cover the happy path. There are some pitfalls that I’ve documented here, along with errors (for those who run into the same issues). I’ll do a Happy Path post soon (and I’m making notes throughout this post), but this guide will get you there if you read through it carefully. The most important thing this post provides is an indication that you CAN do this on DigitalOcean Droplets! Below are a few gotchas to keep in mind.
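The hourly figure above is simple arithmetic, and it is easy to redo if you pick different droplet sizes. Here is a minimal sketch; the function name `cluster_cost_per_hour` is my own invention, and the prices are only the ones quoted in this post, so check current pricing before relying on them.

```shell
# Hourly cost of a cluster: one management droplet plus N worker droplets.
# Hypothetical helper; arguments are:
#   $1 = management droplet price per hour
#   $2 = number of worker droplets
#   $3 = worker droplet price per hour
cluster_cost_per_hour() {
    # awk handles the floating-point math that plain sh arithmetic cannot.
    awk -v m="$1" -v n="$2" -v w="$3" 'BEGIN { printf "%.2f\n", m + n * w }'
}

# The cluster described in this post: 1 x $0.06 + 3 x $0.03
cluster_cost_per_hour 0.06 3 0.03   # prints 0.15
```

At that rate an 8 hour session works out to about $1.20, which is why destroying the droplets between sessions is so attractive.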
- All nodes must use root with the same password (or a superuser with the same password and a no-password sudo capability).
- Each and every node must have a hosts file with a mapping of the private IPs to hostnames for ALL the hosts (modify the /etc/hosts file).
- When you get the web interface up and running there will be a search screen. You need to enter ALL the nodes, including node 1 (the one you’re logged into, with Cloudera Manager installed).
- By the time I finally got the cluster to install properly, I had bumped the Cloudera Manager server up to a 4GB droplet, based on the amount of RAM used during a failed, incomplete install.
- Because of the lack of security in this situation, the first thing I do when the web interface for Cloudera Manager comes up is change the admin password. Logging out and back in restarts the install.
- Look for highlighted notes about issues that were resolved on later attempts.
Creating the Droplet
Signing up on DigitalOcean is a snap; however, it does require a credit card to get started. Once you have an account you can sign in and create Droplets (Virtual Servers) very simply and quickly.
It’s helpful to use a hostname scheme that reflects what you are trying to accomplish.
The requirements for Cloudera Manager state that 2GB RAM may be sufficient for non-Oracle deployments involving fewer than 100 hosts. I started out with a much smaller VS, knowing that I can resize the droplet on the fly. I ended up using a 4GB management server by the time I successfully set up a cluster; the other hosts were created at 2GB.
Click the “Create” button.
|Select Size||4GB (earlier tried 512MB & 2GB, not certain this was an issue)|
|Select Region||New York 2 (this is the only region that supports private networking at the time of this post)|
|Select Image||CentOS 6.4 x64|
|Settings||Select “Enable VirtIO” and “Private Networking”|
Click “Create Droplet” button.
You will momentarily receive an email from Digital Ocean Support with your new Droplet IP and credentials.
The first thing I do with each new droplet is to change the root password:
root@hadoop1:~# passwd
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
root@hadoop1:~#
I believe it is simpler for the Cloudera automated installer if you use the same root password on all of your nodes.
Installing Cloudera Manager and CDH
I followed the instructions in Installation Path A – Automated Installation by Cloudera Manager to complete this section. The process is very straightforward; sorting through all the documentation is more difficult than doing the installation.
Download the Cloudera Manager Installer (cloudera-manager-installer.bin) & upload the file to the host where you are installing.
[root@hadoop1 ~]# chmod u+x cloudera-manager-installer.bin
[root@hadoop1 ~]# sudo ./cloudera-manager-installer.bin
Click through a few screens, accept the licence, wait a few minutes… Voilà!
On initial login via the web, be sure to select the appropriate license (Cloudera Standard in my case).
Somewhere about here I should have added entries for all of the nodes into the /etc/hosts file on each of the nodes. I didn’t and it caused me some grief.
And add all of your node host names and [private] IP’s to each node (this must be done on all of your nodes!) (And later I added the short names):
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.128.6.202 hadoop1.nerdnuts.com hadoop1
10.128.6.236 hadoop2.nerdnuts.com hadoop2
10.128.6.246 hadoop3.nerdnuts.com hadoop3
10.128.6.247 hadoop4.nerdnuts.com hadoop4
Specify hosts for your CDH cluster installation.
Well, at this point in the setup the Cloudera installer wants some other hosts to add to the cluster, but it’s dinner time and the kids are hungry. I stepped away from the computer and came back to it hours later. When I attempted to add the new hosts, it appeared that the admin node was down. Instead of troubleshooting, I just deleted the droplets and started over.
Now that I’ve re-run the entire previous section of this post (creating 2GB Droplets instead of 512MB), I’m back to this screen, so let’s set up a few other nodes to add. I’m going to create 3 more identical droplets with the hostnames hadoop2.nerdnuts.com, hadoop3.nerdnuts.com & hadoop4.nerdnuts.com.
On the search screen you need to enter ALL of the nodes for the cluster, including the one that the GUI is running on (if it is intended to be a member). The first and second times through I only added the new nodes (the three new nodes) and not the node that I started with which was already running. The third time I started over I included all 4 nodes and this process worked much better!
After setting up 3 hosts I received an error on all 3 additional hosts (this was caused by missing entries in the /etc/hosts file on each host):
The “Details” revealed the following [very long] error (scroll from the horizontal rule to the horizontal rule to skip):
/tmp/scm_prepare_node.G4WA613r
using SSH_CLIENT to get the SCM hostname: 10.128.6.236 43470 22
opening logging file descriptor
Starting installation script...
Acquiring installation lock...
BEGIN flock 4
END (0)
Detecting root privileges...
effective UID is 0
Detecting distribution...
BEGIN grep Tikanga /etc/redhat-release
. . .
>>agent.py: error: argument --hostname is required
>>[15/Sep/2013 06:24:47 +0000] 1928 Dummy-1 agent INFO Stopping agent...
>>/usr/lib64/cmf/agent/src/cmf/parcel.py:15: DeprecationWarning: the sets module is deprecated
>> from sets import Set
>>/usr/lib64/cmf/agent/src/cmf/agent.py:31: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
>> import sha
>>[15/Sep/2013 06:24:47 +0000] 1928 MainThread agent INFO SCM Agent Version: 4.7.1
>>[15/Sep/2013 06:24:47 +0000] 1928 MainThread agent ERROR Could not determine hostname or ip address; proceeding.
>>Traceback (most recent call last):
>> File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1600, in parse_arguments
>> ip_address = socket.gethostbyname(fqdn)
>>gaierror: [Errno -2] Name or service not known
>>usage: agent.py [-h] [--agent_dir AGENT_DIR]
>> [--agent_httpd_port AGENT_HTTPD_PORT] --package_dir
>> PACKAGE_DIR [--parcel_dir PARCEL_DIR]
>> [--supervisord_path SUPERVISORD_PATH]
>> [--supervisord_httpd_port SUPERVISORD_HTTPD_PORT]
>> [--standalone STANDALONE] [--master MASTER]
>> [--environment ENVIRONMENT] [--host_id HOST_ID]
>> [--disable_supervisord_events] --hostname HOSTNAME
>> --ip_address IP_ADDRESS [--use_tls]
>> [--client_key_file CLIENT_KEY_FILE]
>> [--client_cert_file CLIENT_CERT_FILE]
>> [--verify_cert_file VERIFY_CERT_FILE]
>> [--client_keypw_file CLIENT_KEYPW_FILE] [--logfile LOGFILE]
>> [--logdir LOGDIR] [--optional_token] [--clear_agent_dir]
>>agent.py: error: argument --hostname is required
>>[15/Sep/2013 06:24:47 +0000] 1928 Dummy-1 agent INFO Stopping agent...
END (0)
BEGIN tail -n 50 /var/log/cloudera-scm-agent//cloudera-scm-agent.log | sed 's/^/>>/'
tail: tail: cannot open `/var/log/cloudera-scm-agent//cloudera-scm-agent.log' for reading: No such file or directory
cannot open `/var/log/cloudera-scm-agent//cloudera-scm-agent.log' for reading: No such file or directory
END (0)
end of agent logs.
scm agent could not be started, giving up
waiting for rollback request
So, something is wrong here… The man sitting next to me tells me that the system requires host entries to work properly.
[root@hadoop1 ~]# vi /etc/hosts
And add all of your node host names and [private] IP’s to each node (this must be done on all of your nodes!):
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.128.6.236 hadoop1.nerdnuts.com
10.128.6.202 hadoop2.nerdnuts.com
10.128.6.246 hadoop3.nerdnuts.com
10.128.6.247 hadoop4.nerdnuts.com
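Since the agent failure above comes down to a name that would not resolve ("Name or service not known"), it may be worth checking each node's hosts file before retrying. The following is only a sketch: `missing_from_hosts` is a helper name I made up, and it simply scans a hosts-format file for each name rather than asking the real resolver.

```shell
# Print every hostname that does NOT appear in the given hosts file.
# Hypothetical helper: $1 = path to a hosts-format file, remaining args = names.
missing_from_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Skip comment lines; fields 2..NF of each entry are the host names.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example (run on each node); no output means every name has an entry:
# missing_from_hosts /etc/hosts hadoop1.nerdnuts.com hadoop2.nerdnuts.com \
#     hadoop3.nerdnuts.com hadoop4.nerdnuts.com
```

To test what the system resolver actually returns, `getent hosts "$(hostname -f)"` on each node is a reasonable complementary check.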
Then I clicked on “Retry Failed Hosts” on the last screen from above.
Installation completed successfully (much better).
Installing Selected Parcels (this takes a while):
The install is stalled here. I finally hit refresh to see what would happen and it’s stalled again at a download screen… this isn’t going as well as I had hoped. We followed these same basic steps on a local VM install at the same time and it’s up and running.
- the nodes appear to be communicating over the private network just fine.
- the install isn’t complete
- there appear to be 2 clusters defined within the main node now (I’m assuming that this has happened as a result of refreshing the page)
Should I start over?
Nothing ever came up, so yes, I should have started from the beginning.
So I switch from my Mac to my PC, punch in the management IP, and log in to the management console, and it asks for the host nodes again. When I enter the nodes and click next, a strange message comes up telling me to enter the management IP or host, so I enter the management node. It finds the management node as a node and continues to install packages on it:
This completes successfully and I click “Continue”:
I end up back on the cluster installation window that was hanging, but now there is progress!
It completes, then “Continue”
An “Inspect hosts for correctness” screen appears. After a moment there is a detailed breakdown of statuses (there are a lot of warnings on this screen).
“Choose the CDH4 services that you want to install on your cluster.”
I select a basic install of “Core Hadoop” and click “Continue”.
“Database Setup” > “Use Embedded Database” > “Test Connection” >
I get the feeling that I have fubared the happy path to installation. Let’s see… there isn’t a way to enter connection data for the embedded database… I’ve seen this work without setting up my own MySQL or Postgres install… I think I’ll start over and see if I can get on the happy path. This time around I will add all 4 hosts to the search initially, and hopefully that will avoid the issues I’ve had so far.
2nd Time I’ve completely restarted by destroying all the VM’s.
Destroy all droplets.
Change passwords. I like to log into one box “passwd” then ssh into the next box, then as I’m exiting boxes I can add the hosts entry[ies].
Create host entries.
[root@hadoop1 ~]# vi /etc/hosts
And add all of your node host names and [private] IP’s to each node (this must be done on all of your nodes!):
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.128.6.202 hadoop1.nerdnuts.com
10.128.6.236 hadoop2.nerdnuts.com
10.128.6.246 hadoop3.nerdnuts.com
10.128.6.247 hadoop4.nerdnuts.com
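Editing the same file by hand on four nodes is tedious and easy to get wrong, so one option is to edit it once on hadoop1 and copy it out. This is a sketch under assumptions, not what I actually did: `push_hosts_cmds` is a made-up helper, it assumes root SSH access to every node, and it assumes that overwriting /etc/hosts wholesale is acceptable on your nodes. It only prints the scp commands so you can read them before running anything.

```shell
# Print (not run) one scp command per node that would copy a hosts file
# to /etc/hosts on that node. Hypothetical helper:
#   $1 = local hosts file to push, remaining args = target node names.
push_hosts_cmds() {
    src=$1
    shift
    for node in "$@"; do
        printf 'scp %s root@%s:/etc/hosts\n' "$src" "$node"
    done
}

# Review the commands first:
# push_hosts_cmds /etc/hosts hadoop2.nerdnuts.com hadoop3.nerdnuts.com hadoop4.nerdnuts.com
# Then, if they look right, pipe the same output to sh to execute them:
# push_hosts_cmds /etc/hosts hadoop2.nerdnuts.com hadoop3.nerdnuts.com hadoop4.nerdnuts.com | sh
```

Printing the commands instead of running them directly keeps the destructive step (overwriting a system file on a remote host) explicit.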
Upload bin to 1, change permissions, execute…
Go through a few simple GUI screens.
Search for all the hosts (all 4 private ip’s), it finds them.
Now installing on all hosts (not just the additional nodes).
Installation completed successfully.
Installing Selected Parcels
I’ve stalled here before… Completed! <Continue>
Inspect hosts for correctness
Much better results! No warnings, lots of green! <Continue>
Choose the CDH4 services that you want to install on your cluster.
Core Hadoop <Continue>
Use Embedded Database <Test Connection>
Review configuration changes
Starting your cluster services.
This takes a while, but has an indicator of the service(s) it’s working on.
It’s been stalled at “Creating Oozie database” for quite some time (11 of 17 services).
It never completed… It ran for 8 hours, Oozie is still hanging, and Cloudera Manager is no longer responsive. Re-running the bin installer gives me a message that Cloudera Manager is already installed and to run /usr/share/cmf/uninstall-cloudera-manager.sh to uninstall. I’m unable to reach the management web interface (http://184.108.40.206:7180), so I check the status of the Cloudera Management service:
[root@hadoop1 ~]# service cloudera-scm-server status
cloudera-scm-server dead but pid file exists
[root@hadoop1 ~]# service cloudera-scm-server stop
Stopping cloudera-scm-server: [FAILED]
[root@hadoop1 ~]# service cloudera-scm-server start
Starting cloudera-scm-server: [ OK ]
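The status / stop / start sequence above could be wrapped in a small check if you expect to hit this again. This is a sketch only: `is_dead_with_stale_pid` is a hypothetical helper that matches the exact status text shown above, and other init scripts or versions may word it differently.

```shell
# Return success if a service status line reports the "dead but pid file
# exists" condition seen above. Hypothetical helper; $1 = status output text.
is_dead_with_stale_pid() {
    case "$1" in
        *"dead but pid file exists"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the management node (commented out so nothing runs by accident):
# status=$(service cloudera-scm-server status 2>&1)
# if is_dead_with_stale_pid "$status"; then
#     service cloudera-scm-server start
# fi
```

In my case the stop command failed but the start command worked, which is why the example goes straight to start.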
Now the admin interface is coming up:
Right on the home page is an error stating “Unable to issue query: the Host Monitor is not running”. Without the Host Monitor I’m not seeing hardware monitoring details for each node. It looks like the cluster may be up and running otherwise. Let’s test out the functionality in a simple way (Word Count Tutorial).
I’m unable to even create directories on the filesystem:
[hdfs@hadoop1 wordcount]$ hadoop fs -mkdir /user/cloudera /user/cloudera/wordcount /user/cloudera/wordcount/input
mkdir: `/user/cloudera': Input/output error
mkdir: `/user/cloudera/wordcount': Input/output error
mkdir: `/user/cloudera/wordcount/input': Input/output error
[hdfs@hadoop1 wordcount]$
Is it time for a 3rd complete restart? I haven’t been able to definitively identify a single issue that I would do differently on the next try, even after combing through all the log files that I can locate on the primary server.
The only things that come to mind are:
- The version of CentOS – maybe there is something going awry in the virtualization of 6.4, it’s fairly new and I’ve already had other issues in the past few weeks.
- Perhaps the multi-homed network adapters are causing an issue – I could try VM’s without private networking to test this.
Before destroying the cluster I took a look at the resources being used by each host. It appears that I could get away with using 1GB host nodes for everything except the management node. Next time through I will be upping the size of the first (management) node to ensure that it has the resources required to function properly.
3rd Time I’ve completely restarted by destroying all the VM’s.
Based on the snapshot of the resources above, I’ve decided to create a 4GB VM for hadoop1 and 2GB VS’s for hadoop2-4. The smallest server I’ve ever seen Cloudera Manager running on was in the range of 3GB in a 4 host cluster.
The other change for this round of install is that I’m adding the short name to the hosts file:
10.128.6.246 hadoop1.nerdnuts.com hadoop1
10.128.6.247 hadoop2.nerdnuts.com hadoop2
10.128.6.236 hadoop3.nerdnuts.com hadoop3
10.128.6.202 hadoop4.nerdnuts.com hadoop4
I also changed the password for the Cloudera Management interface from within the web interface as soon as I had access to it.
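Each of these lines follows the same pattern (private IP, fully qualified name, short name), so they can be generated rather than typed. A minimal sketch, assuming the short name is simply everything before the first dot; `hosts_entry` is a name I made up for illustration.

```shell
# Build one hosts-file line: "<ip> <fqdn> <short name>".
# Hypothetical helper: $1 = private IP, $2 = fully qualified host name.
hosts_entry() {
    # ${2%%.*} strips everything from the first dot onward, leaving the short name.
    printf '%s %s %s\n' "$1" "$2" "${2%%.*}"
}

hosts_entry 10.128.6.246 hadoop1.nerdnuts.com   # prints: 10.128.6.246 hadoop1.nerdnuts.com hadoop1
```

The output can be appended to /etc/hosts with `>>` once you have checked it.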
So after several tries I now get the following message: