|

Clojure: Thinking in Hashmaps

A guest post by Timothy Pratley, who currently works for Tideworks Technology as a Development Manager building Traffic Control software for logistics clients including SSA Marine, CSX, and BNSF.

A Clojure hashmap is like a dictionary, but it is an efficient immutable implementation. Instead of modifying a hashmap in place, associating a key/value efficiently returns a new hashmap. Edmund Jackson’s post also covers some good Clojure hashmap tips.

Hashmaps have a special form {}, and can be declared as follows:

(def city {"Seattle" "cloudy"
           "Phoenix" "sunny"
           "New York" "busy"})

Most languages provide a syntax for accessing values in a dictionary by key:

city["Seattle"]

Clojure takes a different approach. Hashmaps are instead functions that you call to get the value associated with a key:

(city "Seattle")
-> "cloudy"

So, hashmaps can be passed as a function to higher order functions such as map:

(map city ["Seattle" "Phoenix"])
-> ("cloudy" "sunny")

A sequence of values returned from the city hashmap is the result of calling city across a sequence of keys.

One way to count all of the words in a document is to build a dictionary of words to the count of occurrences. The imperative style algorithm is as follows:

foreach word in document
  if word is not a key in dictionary
    then dictionary[word] = 1
    else dictionary[word]++

In Clojure, we draw on more abstractions:

over a sequence of words
aggregate a hashmap of word to count
by a function that takes a map and a word and returns a new map
where the new map value associated with word is incremented
starting with an empty map

What a mouthful! So how do we implement it? First, we define a function that does the incrementing:

(defn dict-inc [m word]
  (update-in m [word] (fnil inc 0)))

update-in takes a hashmap, calls a function on the value found by a key sequence, and associates that result to a hashmap, which is the final result:

(update-in city ["Seattle"] clojure.string/upper-case)
-> {"Seattle" "CLOUDY",
    "Phoenix" "sunny",
    "New York" "busy"}

Calling update-in does not modify city, since the result is a new hashmap value. This is an important property for composability.

fnil is a function, which creates a function that will use a default value when passed nil:

((fnil inc 0) nil)
-> 1
((fnil inc 0) 10)
-> 11

This is shorthand to TryGet the value and provide if then else branches. Performing an ‘update function’ is a useful concept that requires less code than get/modify/create if you have a convenient way of dealing with the absence of a value.

When we call dict-inc on an empty hashmap, the increment of 0 is associated with the input word:

(dict-inc {} "book")
-> {"book" 1}

If the word already has a count associated, the new hashmap associates the increment:

(dict-inc {"book" 1} "book")
-> {"book" 2}

Here is a sequence of words we can use as input:

(def words
 (re-seq #"\w+"
   "the quick fox jumped the quick fox quick"))

We aggregate a hashmap over the sequence of words:

(reduce dict-inc {} words)
-> {"jumped" 1,
    "fox" 2,
    "quick" 3,
    "the" 2}

The code definition of dict-inc and the call to reduce is very terse and packed with abstraction. Such dense code can be intimidating, but the payoff is you get leverage. This is the same leverage we experienced with sequences, and it applies to hashmaps.

Conclusion

Hashmaps as functions is a subtle and powerful abstraction. Hashmap transformation functions provide leverage from composability. So, using hashmaps to solve problems will change your way of thinking about problems. You will start thinking “in hashmaps” rather than “about hashmaps,” because they are tightly integrated into the language.

Be sure to look at the Clojure resources that you can find in Safari Books Online.

Safari Books Online has the content you need

Clojure Inside Out is a video where you’ll not only learn how to tackle practical problems with this functional language, but you’ll learn how to think in Clojure—and why you should want to. Neal Ford (software architect and meme wrangler at ThoughWorks) and Stuart Halloway (CEO of Relevance, Inc.) show you what makes programming with Clojure swift, surgical, and accurate.
Clojure Programming, helps you learn the fundamentals of Clojure with examples relating it to the languages you know already—whether you’re focused on data modeling, concurrency and parallelism, web programming, statistics and data analysis, and more.
The Joy of Clojure goes beyond the syntax, and shows how to write fluent, idiomatic Clojure code. You will learn to approach programming challenges from a Functional perspective and master the Lisp techniques that make Clojure so elegant and efficient. This book will help you think about problems the “Clojure way,” and recognize when they simply need to change the way they program.
Practical Clojure is the first definitive reference for the Clojure language, providing both an introduction to functional programming in general and a more specific introduction to Clojure’s features. This book demonstrates the use of the language through examples, including features such as STM and immutability, which may be new to programmers coming from other languages.

About the author

timhead Timothy Pratley currently works for Tideworks Technology as a Development Manager building Traffic Control software for logistics clients including SSA Marine, CSX, and BNSF. He can be reached at timothypratley@gmail.com.

About Safari Books Online

Safari Books Online is an online learning library that provides access to thousands of technical, engineering, business, and digital media books and training videos. Get the latest information on topics like Windows 8, Android Development, iOS Development, Cloud Computing, HTML5, and so much more – sometimes even before the book is published or on bookshelves. Learn something new today with a free subscription to Safari Books Online.
|

One Response to Clojure: Thinking in Hashmaps

  1. markc says:

    As with a huge number of useful patterns like this, Clojure provides a handy built-in function. For counting occurrences of words in a sequence, use “frequencies”.

    (frequencies words) ==> {“the” 2, “quick” 3, “fox” 2, “jumped” 1}

    Nice to see it derived from primitives though.. Thanks.