
Analyze Data With MongoDB and Go

A guest post by William Kennedy, a managing partner at Ardan Studios in Miami, FL, a mobile and web app development company, and the author of GoingGo.Net.

My company is building a mobile application called Outcast. The idea behind Outcast is to give people who love the outdoors the ability to get ahead of the weather. By combining real-time buoy, tide, lunar and solar data with user preferences and experiences, the application can deliver relevant information and forecasts. Users help with the forecasting by providing an experience review after their outdoor activities have ended. Over time, the application learns which conditions are best for each individual user, activity and favorite location. Read about the open source debate that was had about this application.

The forecasting engine is built using MongoDB and Go. I created a small program called MongoRules that shows how MongoDB and Go can be used to quickly mine and analyze data.

MongoDB has done a great job providing documentation for all their tools. If you want to set up and maintain your own environment, go for it. I was doing this myself for a while. Eventually, like all devops and IT related work, it became a full-time job. I want to build software, not maintain computing environments.

Fortunately for me, I found a company called MongoLab. They provide managed MongoDB databases in the cloud. They have automated backups, 24/7 monitoring, and excellent support. They also partnered with Google to provide their services inside the Google cloud data centers. This service is currently being beta tested and I can’t wait for it to be available to the general public. This is huge for me because I plan to run the Outcast Go applications in the Google Compute Engine. This is going to help with both performance and bandwidth costs.

Thanks to MongoLab, I created a free database called goinggo. If you have the MongoDB client already installed, use the following parameters to connect to the database. If you use a MongoDB client application, like MongoVue or MongoHub, take the parameters and configure a new database.

./mongo --username guest --password welcome ds035428.mongolab.com:35428/goinggo

In the goinggo database you will find a MongoDB collection called buoy_stations. I imported 114 documents from my Outcast database that represent actual buoy stations in the Gulf of Mexico. The MongoRules program has implemented one rule that identifies and analyzes certain buoy stations to determine if you should go fishing in Tampa, FL. The analysis is very basic, but it will give you an idea of how I use MongoDB to analyze data and build rules.

When I am analyzing data using MongoDB I always use the aggregation framework. What I love about the aggregation framework is how I can create a series of operations that allow MongoDB to do the bulk of the work. The more I can leverage MongoDB to do the heavy lifting, the better. Here is a link to the documentation for the aggregation framework: http://docs.mongodb.org/manual/aggregation/.

Here is a trimmed down version of a MongoDB document that you will find in the buoy_stations collection:

{
  "station_id": "42036",
  "name": "Station 42036 - West Tampa",
  "location_desc": "112 NM WNW of Tampa, FL",
  "condition": {
    "date": ISODate("2013-07-19T21:50:00Z"),
    "wind_speed_milehour": 17.895520000000001204,
    "wind_direction_degnorth": 190,
    "gust_wind_speed_milehour": 22.36939999999999884
  },
  "location": {
    "type": "Point",
    "coordinates": [
      -84.516999999999995907,
      28.5
    ]
  }
}

A MongoDB document is displayed using JSON (JavaScript Object Notation); internally MongoDB stores it in a binary form called BSON. The buoy station document has two sub-documents. The first sub-document contains the buoy’s current weather condition and the second sub-document contains the geographic location. The geographic location is written with the longitude first and the latitude second. MongoDB has great support for performing geospatial queries, as you will see.
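To make the shape of this document concrete, here is a minimal sketch of Go types that mirror it. The type and field names (BuoyStation, Condition, Location, sampleStation) are hypothetical and are not taken from the MongoRules source. The bson struct tags follow the convention the mgo driver's bson package reads, but this sketch does not import the driver; it only builds the sample document by hand.

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors the "condition" sub-document (hypothetical type name).
type Condition struct {
	Date          time.Time `bson:"date"`
	WindSpeed     float64   `bson:"wind_speed_milehour"`
	WindDirection float64   `bson:"wind_direction_degnorth"`
	GustWindSpeed float64   `bson:"gust_wind_speed_milehour"`
}

// Location mirrors the "location" sub-document.
type Location struct {
	Type        string    `bson:"type"`
	Coordinates []float64 `bson:"coordinates"` // longitude first, then latitude
}

// BuoyStation mirrors the trimmed-down buoy station document.
type BuoyStation struct {
	StationID    string    `bson:"station_id"`
	Name         string    `bson:"name"`
	LocationDesc string    `bson:"location_desc"`
	Condition    Condition `bson:"condition"`
	Location     Location  `bson:"location"`
}

// sampleStation builds the example document shown above by hand.
func sampleStation() BuoyStation {
	return BuoyStation{
		StationID:    "42036",
		Name:         "Station 42036 - West Tampa",
		LocationDesc: "112 NM WNW of Tampa, FL",
		Condition: Condition{
			Date:          time.Date(2013, time.July, 19, 21, 50, 0, 0, time.UTC),
			WindSpeed:     17.89552,
			WindDirection: 190,
			GustWindSpeed: 22.3694,
		},
		Location: Location{
			Type:        "Point",
			Coordinates: []float64{-84.517, 28.5},
		},
	}
}

func main() {
	s := sampleStation()
	fmt.Printf("%s at lon %.3f, lat %.3f: wind %.2f mph\n",
		s.StationID, s.Location.Coordinates[0], s.Location.Coordinates[1],
		s.Condition.WindSpeed)
}
```

Using float64 for the wind direction is an assumption made here to stay on the safe side; the shell output alone does not say whether the stored value is an integer or a double.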

The rules for fishing in Tampa require that the average wind speed of all buoys in the Tampa area be less than or equal to 15 miles per hour. You can see that the wind at buoy station 42036 is blowing faster than that, at over 17 miles per hour. Or at least it was on July 19th at 9:50 PM UTC. In Outcast, these conditions are updated every five minutes, and we keep historical data.

Using the aggregation framework we can calculate the average wind speed for Tampa buoys by running these three operations through the pipeline:

db.buoy_stations.aggregate(
{"$geoNear": {
    "near": [-82.798676,27.945886],
    "query": {"condition.wind_speed_milehour" : {"$ne" : null}},
    "distanceField": "distance",
    "maxDistance": 0.00756965597428,
    "spherical": true,
    "distanceMultiplier": 3963.192
  }
},
{"$project" : {
  "station_id" : "$station_id",
  "wind_speed" : "$condition.wind_speed_milehour",
  "_id" : 0
  }
},
{"$group" : {
    "_id" : 1,
    "total_stations" : {"$sum" : 1},
    "average_wind_speed" : {"$avg" : "$wind_speed"}
  }
}
)

As each operation is executed in the pipeline, its results are passed to the next operation. This makes running operations through the pipeline very efficient. The idea is to keep filtering and aggregating the results until you get the data you need. The $geoNear operation performs a geospatial query to filter the data. If we look up the coordinates the program is using on Google Maps, we can see where in the Tampa area the program is telling MongoDB our point of origin is:

https://maps.google.com/maps?q=27.945886,-82.798676&z=10

The program only wants to look at buoys within a 30-mile radius of that geo-location. The max distance for this query needs to be provided in radians. To calculate the radian value for 30 miles, divide 30 by the radius of the earth in miles or 3963.192.

The $project operation reduces the size of each document that was selected from the previous operation. Passing only the fields we need through the pipeline helps with efficiency and performance. You can also create calculated fields, rename fields and project fields out of a sub-document using the $project operation. In the $project operation we are keeping the station_id, projecting and renaming the condition sub-document field wind_speed_milehour to just wind_speed, and removing the internal MongoDB object id.

If we run the first two operations through the pipeline we get the following result:

{
	"result" : [
		{
			"station_id" : "cwbf1",
			"wind_speed" : 9.171453786668778
		},
		{
			"station_id" : "fhpf1",
			"wind_speed" : 14.98749757333756
		},
		{
			"station_id" : "camf1",
			"wind_speed" : 3.35541
		},
		{
			"station_id" : "optf1",
			"wind_speed" : 8.052983786668777
		},
		{
			"station_id" : "sapf1",
			"wind_speed" : 8.052983786668777
		},
		{
			"station_id" : "sblf1",
			"wind_speed" : 4.697573786668777
		},
		{
			"station_id" : "tpaf1",
			"wind_speed" : 2.23694
		},
		{
			"station_id" : "tshf1",
			"wind_speed" : 3.35541
		},
		{
			"station_id" : "egkf1",
			"wind_speed" : 12.75055757333756
		}
	],
	"ok" : 1
}

The $geoNear and $project operations have produced an array of nine documents. We now have one document for each buoy within a 30-mile radius of the target geo-location. The current wind speed has also been projected out of the condition sub-document.

The $group operation will be passed this array of nine documents and calculate the average wind speed. When we run the entire operation through the pipeline we get the following result:

{
	"result" : [
		{
			"_id" : 1,
			"total_stations" : 9,
			"average_wind_speed" : 7.406756699261136
		}
	],
	"ok" : 1
}

The average wind speed of those nine buoys is about 7.4 miles per hour, well under the 15 miles per hour limit. There are other operations that the MongoRules program runs to get the final answer about fishing in Tampa. You can see those operations in the code.

For now, let’s look at how we can run these same three operations through the aggregation framework inside of a Go program. To access a MongoDB database in Go, I use the mgo driver written by Gustavo Niemeyer. I have included the latest version of the mgo driver in my repository to make it easy for you to compile and run the program. If you want to install the mgo driver code directly from the Labix repository, you can find the instructions here: http://labix.org/mgo.

This is how we connect to a Mongo database and prepare the program for use using the mgo driver:

// Create MongoDB connectivity parameters
dialInfo := &mgo.DialInfo{
    Addrs:    []string{MONGODB_HOST},
    Timeout:  10 * time.Second,
    Database: MONGODB_DATABASE,
    Username: MONGODB_USERNAME,
    Password: MONGODB_PASSWORD,
}

// Connect to MongoDB and establish a connection
// Only do this once in your application.
// There is a lot of overhead with this call.
session, err := mgo.DialWithInfo(dialInfo)
if err != nil {
    fmt.Printf("ERROR : %s\n", err)
    return
}

// Release the connection resources when the program is done.
defer session.Close()

// Capture a reference to the collection
collection := session.DB(MONGODB_DATABASE).C("buoy_stations")

I have created constants to hold the connection parameters to the MongoDB database. Then we create a DialInfo object and use that to call the DialWithInfo function. This function returns a Session object, which can be used to gain access to our collection.

With access to the collection, we can run our three operations through the aggregation framework. The bson package that ships with the mgo driver defines a special type called M to help write MongoDB queries and operations. This type is implemented as follows:

type M map[string]interface{}

Our first operation looks like this in JavaScript code:

{"$geoNear": {
    "near": [-82.798676,27.945886],
    "query": {"condition.wind_speed_milehour" : {"$ne" : null}},
    "distanceField": "distance",
    "maxDistance": 0.00756965597428,
    "spherical": true,
    "distanceMultiplier": 3963.192
  }
}

This is how we translate that JavaScript code to an M type:

bson.M{
    "$geoNear": bson.M{
        "near": []float64{this.Longitude, this.Latitude},
        "query": bson.M{
            "condition.wind_speed_milehour": bson.M{"$ne": nil},
        },
        "distanceField":      "distance",
        "maxDistance":        this.MaxDistance,
        "spherical":          true,
        "distanceMultiplier": DISTANCE_MULTIPLIER,
    },
}

The main M object contains the “$geoNear” operation as the map key. Then a new M object is created as the value for the “$geoNear” map key. This will hold all of the parameters for the operation. Each parameter is a new key and value pair. Look at how the coordinates are provided as a slice of float64 values, and how the JavaScript null becomes nil in Go.

Here are all three operations coded using M objects:

latitude := 27.945886
longitude := -82.798676
maxDistance := 30.0 / DISTANCE_MULTIPLIER // 30 miles expressed in radians

// Filter to stations near the point of origin that report a wind speed.
o1 := bson.M{
    "$geoNear": bson.M{
        "near": []float64{longitude, latitude},
        "query": bson.M{
            "condition.wind_speed_milehour": bson.M{"$ne": nil},
        },
        "distanceField":      "distance",
        "maxDistance":        maxDistance,
        "spherical":          true,
        "distanceMultiplier": DISTANCE_MULTIPLIER,
    },
}

// Keep only the station id and the wind speed.
o2 := bson.M{
    "$project": bson.M{
        "station_id": "$station_id",
        "wind_speed": "$condition.wind_speed_milehour",
        "_id":        0,
    },
}

// Average the wind speed across the remaining stations.
o3 := bson.M{
    "$group": bson.M{
        "_id": 1,
        "average_wind_speed": bson.M{
            "$avg": "$wind_speed",
        },
    },
}

operations := []bson.M{o1, o2, o3}

Now that our operations have been translated into M objects, we can run them through the MongoDB aggregation framework:

// Prepare the query to run in the MongoDB aggregation pipeline
pipe := collection.Pipe(operations)

// Run the queries and capture the results
results := []bson.M{}
err := pipe.All(&results)

if err != nil {
    fmt.Printf("ERROR : %s\n", err)
    return
}

// Capture the average wind speed
avgWindSpeed := results[0]["average_wind_speed"].(float64)

fmt.Printf("Average Wind Speed : %.2f\n", avgWindSpeed)

With the collection object, we create a Pipe object, passing the slice of operations. With the Pipe object, we run all of the operations through MongoDB and retrieve the results. One nice feature of mgo is that, instead of loading everything at once with All, you can call the Pipe object's Iter method and walk through the result documents one at a time.

Once the operations are complete, we get a slice of M objects back with the results, one M object per result document. At the end of the code sample, we extract the average wind speed from the first M object and display the value to the screen.

The MongoRules program provides a working version of this code and more. Here is the output of the MongoRules program for the Tampa rule.

./MongoRules tampa

Tampa Buoy With Lowest Wind Gust
Station Id			: tpaf1
Name			 		: Station TPAF1 - 8726694
Location				: TPA Cruise Terminal 2, Tampa, FL
Latitude				: 27.933000
Logitude				: -82.433000
Distance				: 22.363652 Miles
Wind Speed			: 2.24 Miles/Hour
Wind Direction			: 340 From True North
Wind Gust				: 4.70 Miles/Hour
Avg Wind Gust			: 7.41 Miles Per Hour

Tampa Buoy Closest To Your Location
Station Id			: cwbf1
Name					: Station CWBF1 - 8726724
Location				: Clearwater Beach, FL
Latitude				: 27.977000
Logitude				: -82.832000
Distance				: 2.962600 Miles
Wind Speed			: 9.17 Miles/Hour
Wind Direction			: 260 From True North
Wind Gust				: 12.75 Miles/Hour
Avg Wind Gust			: 7.41 Miles Per Hour

MongoDB is an excellent database and Go is the perfect programming language to leverage all of its functionality. The MongoRules program should get you started exploring ways to use MongoDB for all of your data storage and analytical needs. We use MongoDB in most of our development projects that require a database. We also use it when programming in Ruby and .Net.

The mgo driver is why I am a Go programmer today. When I was looking for a new programming language, the first requirement I had was that a driver for MongoDB had to exist. The mgo driver has been a great find and as you can see, it gives you access to all of the MongoDB functionality. With the power of the Go programming language and MongoDB combined, you can build high-powered data analytic engines to drive applications like Outcast.

Safari Books Online has the content you need

Below are some MongoDB and Go books from Safari Books Online that will help you with all sorts of tips and information.

MongoDB: The Definitive Guide, 2nd Edition shows you the many advantages of using document-oriented databases, and demonstrates how this reliable, high-performance system allows for almost infinite horizontal scalability.
Programming in Go: Creating Applications for the 21st Century brings together all of the knowledge you need to evaluate Go, think in Go, and write high-performance software with Go. The author explains everything from the absolute basics through Go’s lock-free channel-based concurrency and its flexible and unusual duck-typing type-safe approach to object-orientation.
MongoDB in Action is a comprehensive guide to MongoDB for application developers. The book begins by explaining what makes MongoDB unique and describing its ideal use cases. A series of tutorials designed for MongoDB mastery then leads into detailed examples for leveraging MongoDB in e-commerce, social networking, analytics, and other common applications.
Learn how to create large MongoDB clusters! Scaling MongoDB shows you how to use MongoDB efficiently for very large databases. It covers sharding, cluster setup, and administration.
The Go Programming Language Phrasebook gives you the code phrases you need to quickly and effectively complete a wide variety of projects with Go, today’s most exciting new programming language. Tested, easy-to-adapt code examples illuminate every step of Go development, helping you write highly scalable, concurrent software.

About the author

William Kennedy is a managing partner at Ardan Studios in Miami, FL and is the author of GoingGo.Net. Ardan Studios is a mobile and web app development company. Bill spent his first 10 years as a professional developer writing low level C/C++ for the healthcare and call center industries on the Microsoft stack. Then in 2003 he switched to C#, developing those same back end systems for the call center and gaming industries. In May 2013, Bill looked for a new language that would allow him to develop back end systems in Linux. Bill found Go and has never looked back. He has been married for 18 years. He and his wife enjoy their five kids, four cats, one dog and all the wild animals who have found a home in the Kennedy backyard.
