ContainerShip Blog

Crate.io on ContainerShip

Background

We continue our “on ContainerShip” series with Crate.io, which is, as they describe it, a “Scalable SQL Database with all the NoSQL Goodies”. We feel a connection to Crate because of our TechCrunch Disrupt Startup Battlefield experience. We weren’t quite as lucky, but as you are likely aware, Crate was the winner of Disrupt Europe 2014! Crate allows users to query and compute data with SQL in real time by providing a distributed aggregation engine, native search, and super simple scalability.

Crate is awesome, but if you take a look at the instructions for setting up a cluster using the Docker container, it can seem a little intimidating. Recently, there was also a post about running Crate using Docker Machine and Docker Swarm, but the length of that post and the number of steps in it leave something to be desired in the ease-of-use department.

We’re going to show you how easy it can be to:

  • Set up a Crate cluster that spans multiple follower hosts
  • Visit the web management console and add some data using their nifty built-in Twitter feed demo
  • Scale up our ContainerShip cluster from 2 follower hosts to 4, automatically scaling Crate to 4 nodes in the process
  • Try out crash, the Crate command line tool
  • Back up the entire cluster state so we can restore it easily the next time we want to play with Crate

Setup Guide

The steps below will guide you through setting up a scalable Crate cluster on ContainerShip. Each follower node in the cluster will run an instance of Crate, and join the existing cluster. The guide below assumes you already have a ContainerShip cluster running; you can find more information about setting up a ContainerShip cluster from our official docs. This guide also assumes you are utilizing Navigator, a web-ui plugin for ContainerShip, to launch your applications. More information on using Navigator can be found here. While we do not demonstrate CLI or API usage in this article, setting up Crate using either will also work.

  • To get started, open Navigator and create a new application with a sensible name (such as crate).
  • Set the image to containership/crate:0.49.1 (the most recent version at the time of writing; check the Docker Hub page for additional tags in the future).
  • The default command is acceptable, so nothing needs to be entered for this field.
  • Crate uses port 4200 for HTTP (clients and the management console) and port 4300 for inter-node communication. Since we are configuring our cluster to run one instance of Crate on each of our follower nodes, setting the networking mode to host is desirable.
  • Give the application some resources. Remember, Crate is Elasticsearch under the hood (a Java project), so be sure to allocate enough memory for each container.
  • Since we are utilizing host networking mode, the container port can be left blank (autoassigned).
  • Set a sensible host volume location, which the Crate container will bind mount and write its data to. This location can be any directory on the host system large enough to hold all of Crate’s data. Once CodexD is merged into our next release, leaving the host volume blank will automatically use a copy-on-write subvolume in a default location. More on that in a later post.
  • Set the container volume to /data, which is where Crate is configured to write.
  • Environment variables are not necessary, but setting CRATE_CLUSTER_NAME will override the default value of “ContainerShip Crate”.
  • Click create!
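
The settings above can be summarized as a single application definition. The sketch below is a plain Python dictionary with a small sanity check, not an actual ContainerShip API payload: the key names, the CPU and memory figures, the host path, and the cluster name are all assumptions chosen for illustration.

```python
# Hypothetical summary of the Navigator form fields described above.
# Key names are illustrative and are NOT taken from the ContainerShip API.
crate_app = {
    "name": "crate",
    "image": "containership/crate:0.49.1",
    "command": "",                 # the default command is fine
    "network_mode": "host",
    "container_port": None,        # left blank (auto-assigned)
    "cpus": 1.0,                   # assumed value
    "memory_mb": 2048,             # assumed value; Crate runs on the JVM
    "host_volume": "/mnt/crate",   # assumed path on the follower host
    "container_volume": "/data",
    "env_vars": {"CRATE_CLUSTER_NAME": "my-crate-cluster"},  # optional override
}


def check_app(app):
    """Return a list of problems found in the definition (empty if none)."""
    problems = []
    if app["network_mode"] != "host":
        problems.append("network mode should be host")
    if app["container_volume"] != "/data":
        problems.append("container volume should be /data")
    if not app["host_volume"]:
        problems.append("host volume must be set")
    if ":" not in app["image"]:
        problems.append("image should carry an explicit tag")
    return problems


print(check_app(crate_app))
```

Writing the values down in one place like this can make it easier to spot a missing volume or the wrong networking mode before clicking create.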

Once your application is created, you can ensure your cluster scales horizontally by running it once on every follower node. You can easily configure your Crate application to do this by setting the constraints.per_host=1 tag, as seen below. More information about tags is available in our docs.

Your Crate application should automatically scale to n/n (where n is the number of active follower hosts in your cluster). As more follower hosts come online, the cluster will scale linearly. In this case we have 2 followers in our ContainerShip cluster, so Crate automatically scales up to 2.
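
To make the one-instance-per-host behavior concrete, here is a toy model of it in Python. It is only an illustration of the idea behind constraints.per_host=1, not ContainerShip's scheduler; the function name and the follower host names are made up for the example.

```python
def reconcile(running, follower_hosts, per_host=1):
    """Return how many instances to start on each host to reach per_host.

    running maps a host name to the number of instances already on it.
    Hosts that already run per_host instances are left out of the result.
    """
    to_start = {}
    for host in follower_hosts:
        missing = per_host - running.get(host, 0)
        if missing > 0:
            to_start[host] = missing
    return to_start


# Two followers, each already running one Crate instance (2/2).
running = {"follower-1": 1, "follower-2": 1}

# Two more followers join the cluster.
followers = ["follower-1", "follower-2", "follower-3", "follower-4"]

print(reconcile(running, followers))
```

In this model, the two new followers each get one instance and the existing ones are left alone, which is the 2-to-4 scale-up described above.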

Checking Out the Crate Management Console

Congratulations! Your Crate cluster is now ready to use!

Now that we have our Crate cluster up and running, let’s visit the management console and start having some fun.

All you need to do is visit the public IP of one of your follower hosts on the port we exposed earlier (4200).

For example:

http://follower.ip.address:4200/_plugin/crate-admin
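
If you have several followers, a small helper can print the console URL for each of them. This is a minimal sketch: the IP addresses are placeholders from a documentation-only range, and the function name is our own.

```python
from urllib.parse import urlunsplit


def admin_console_url(host, port=4200):
    """Build the management console URL for one follower host."""
    return urlunsplit(("http", f"{host}:{port}", "/_plugin/crate-admin", "", ""))


# Placeholder addresses; substitute the public IPs of your follower hosts.
for follower in ["203.0.113.10", "203.0.113.11"]:
    print(admin_console_url(follower))
```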

You should be presented with the management console, and see 2 instances in the cluster.

[Image: management console]

Unfortunately, you won’t see my awesome test database called dogs, and you won’t see the tweets database yet either. Those are there because my co-founder set this up last night on AWS, then backed it up and tore it down. Today I restored the backup to DigitalOcean to write this post, and now I’m just rolling with it. ContainerShip Cloud is awesome.

Anyways… moving on!

Crate has a really cool feature in the Getting Started section of the console that lets you connect your Twitter account and pull in some tweets, so you have test data to play with.

[Image: Getting Started section]

After importing some tweets you can jump over to the tables view and see how things are looking.

[Image: tables view]

So now that you have some data in your tweets database, you’ll probably want to run some queries to see how it works and how it performs. Crate has a built-in console where you can input SQL queries and see results right in your browser. Let’s see how that works.
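
The browser console is the focus here, but the same kind of statement can also be sent from a script to Crate's HTTP `_sql` endpoint on port 4200. The sketch below uses only the Python standard library. The follower IP is a placeholder, the helper names are our own, and the table name tweets is assumed from the demo import above.

```python
import json
from urllib.request import Request, urlopen


def build_sql_request(host, stmt, port=4200):
    """Build (but do not send) a POST request for Crate's _sql endpoint."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return Request(
        f"http://{host}:{port}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(host, stmt, port=4200, timeout=10):
    """Send the statement and return the decoded JSON response."""
    with urlopen(build_sql_request(host, stmt, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no running cluster; run_sql() does.
req = build_sql_request("203.0.113.10", "SELECT count(*) FROM tweets")
print(req.get_method(), req.full_url)
```

Calling run_sql() with the IP of one of your followers should return a JSON document containing the result rows, assuming the node is reachable from your workstation.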

[Image: SQL queries]

Cool, that works well! Now what about crash, the Crate command line tool?

First, you need to install crash on your local workstation:

sudo pip install crash

Then run it and connect to your two follower hosts where the Crate containers are running. Finally, run some queries to test.
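
As a rough sketch of what that invocation can look like, the helper below assembles a crash command line for a list of follower hosts. It assumes your installed crash version accepts the --hosts and --command options (check crash --help to confirm); the IP addresses are placeholders and the query reuses the tweets table from the demo import.

```python
import shlex


def crash_command(hosts, statement=None, port=4200):
    """Return the argv list for connecting crash to the given hosts."""
    argv = ["crash", "--hosts"] + [f"{host}:{port}" for host in hosts]
    if statement is not None:
        # Run a single statement instead of opening the interactive shell.
        argv += ["--command", statement]
    return argv


# Placeholder addresses; substitute the public IPs of your follower hosts.
followers = ["203.0.113.10", "203.0.113.11"]

# Interactive session against both followers.
print(shlex.join(crash_command(followers)))

# One-off query.
print(shlex.join(crash_command(followers, "SELECT count(*) FROM tweets")))
```

The printed lines can be pasted into a terminal, or the argv list can be passed to subprocess.run if you would rather drive crash from a script.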
