
Parallel Programming Paradigms in Clojure, Part I

As processors have shifted toward multi-core designs, languages need to support parallel programming paradigms, and developers need to use them effectively to get the most out of the hardware. Clojure, a functional programming language, supports parallel execution through agents and a set of built-in concurrency functions that offer a quick way to parallelize your code. You can also use Java’s native threading capabilities from Clojure, which suits some situations better. In this article we introduce some of the ways in which Clojure can help you achieve concurrency. The second part of this article explores more advanced topics such as atomics, software transactional memory (STM), Java executors, and more.

Agents for multi-threading

In Clojure, agents manage shared state that changes asynchronously. Each agent holds a value, and you change that value by dispatching an action (a function) to the agent. The action runs later on another thread, so the calling thread never waits for it and threads do not depend on one another’s order of execution. Actions run in thread pools managed by the Clojure runtime. There are two such pools, and the function you use to dispatch an action, send or send-off, determines which pool runs it. Read Chapter 6.4, Asynchronous Agents, in Practical Clojure to learn more about agents.

send function

The send function uses a fixed-size thread pool whose size is based on the number of processor cores available to the JVM. This makes it a good fit for CPU-bound actions that do not block: when cores are free, queued actions run in parallel on them, and when every pool thread is busy, further actions wait in a queue until a thread becomes available. Either way, the actions sent to a given agent are applied one at a time, so the agent’s state stays consistent.

send-off function

The send-off function uses the second thread pool, which grows as needed and is not limited by the number of physical cores in the system. This makes it suitable for actions that may block, for example while waiting on I/O, because a blocked action does not tie up one of the limited threads in the send pool.

The choice between send and send-off is up to the programmer and depends mainly on whether the action is CPU-bound or may block.
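
The difference between the two dispatch functions can be sketched as follows. This is a minimal illustration, not a recommendation for any particular application: the names cpu-agent, io-agent, and slow-lookup are hypothetical, and the Thread/sleep call merely stands in for a blocking operation such as a network or disk read.

```clojure
;; Hypothetical names: cpu-agent, io-agent and slow-lookup are illustrative only.
(def cpu-agent (agent 0))
(def io-agent (agent []))

;; Stand-in for a blocking call such as a network or disk read (assumption).
(defn slow-lookup [id]
  (Thread/sleep 50)
  (str "record-" id))

;; Short, non-blocking work: dispatch with send (fixed-size pool).
(send cpu-agent + 42)

;; Potentially blocking work: dispatch with send-off (expandable pool).
(send-off io-agent (fn [results] (conj results (slow-lookup 7))))

;; await blocks until the actions dispatched so far from this thread have run.
(await cpu-agent io-agent)

@cpu-agent ;; => 42
@io-agent  ;; => ["record-7"]
```

When code like this runs as a standalone script rather than at the REPL, calling (shutdown-agents) at the end lets the JVM exit promptly, because the agent thread pools otherwise keep it alive.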

For example, let’s say we have an agent that keeps the sum of a list of numbers. The value of the agent is a map holding the vector of numbers seen so far and their current sum:

usman=> (def currentsum (agent {:nums [] :sum 0}))
#'usman/currentsum

Next, we define an action function for the agent. It takes the current value of the agent and the number to add to the sum, and returns the new value of the agent:

(defn update-currentsum [current s]
    (let [new-nums (conj (:nums current) s)]
        {:nums new-nums
         :sum (reduce + new-nums)}))

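Now, we’ll dispatch with send rather than send-off, since the action is a short computation that does not block. Let’s test the function with some values. Note that send returns immediately, before the action has run, so the agent printed after each call may still show the previous state: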

usman=> (send currentsum update-currentsum 5)
#<Agent @4cdac8 {:nums [], :sum 0}>

usman=> (send currentsum update-currentsum 10)
#<Agent @4cdac8 {:nums [5], :sum 5}>

usman=> (send currentsum update-currentsum 15)
#<Agent @4cdac8 {:nums [5 10], :sum 15}>

usman=> (send currentsum update-currentsum 200)
#<Agent @4cdac8 {:nums [5 10 15], :sum 30}>

usman=> @currentsum
{:nums [5 10 15 200], :sum 230}

This example shows a basic use case that can be extended to larger data sets, where the performance benefit on multi-core systems becomes visible. A natural next step is to write a similar example for a larger data set and benchmark it against a single-threaded version. Again, the choice between send and send-off depends on several factors, including the type of application and the underlying system or environment. Because the agent thread pools are managed by the runtime, code written with agents can move from a single-core machine to a many-core machine without changes.
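
As a rough sketch of that next step, the agent can be fed a larger range of numbers. The names running-sum and add-number are hypothetical, and this variant updates the sum incrementally rather than re-reducing the whole vector on every action. Keep in mind that actions sent to a single agent are applied one at a time, so any parallel gain would come from using several agents or from other work proceeding on the calling thread.

```clojure
;; Hypothetical names: running-sum and add-number are illustrative only.
(def running-sum (agent {:nums [] :sum 0}))

;; Adds n to the stored sum incrementally instead of re-reducing the vector.
(defn add-number [current n]
  (-> current
      (update :nums conj n)
      (update :sum + n)))

;; Queue one action per number; send returns immediately each time.
(doseq [n (range 1 10001)]
  (send running-sum add-number n))

;; Block until every action queued above has been applied.
(await running-sum)

(:sum @running-sum)          ;; => 50005000
(count (:nums @running-sum)) ;; => 10000
```

Calling await before reading the agent ensures that the dereferenced value reflects all of the queued actions, which avoids the lagging output seen in the REPL session above.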

Concurrency Macros

In addition, the Clojure standard library comes with a collection of concurrency macros and functions. These are already built for multi-threaded execution, and using them is much easier than coding your own. Some of them are described next.

pmap

The pmap function takes the same arguments and returns the same results as the normal map function; however, it applies the function to the elements in parallel. This can give a significant performance increase on multi-core and multi-CPU systems when the function being mapped is expensive enough to outweigh the cost of coordinating the threads.

For example, take the case of a simple increment function using map:

usman=> (map inc [5 6 7 8])
(6 7 8 9)

This can be replaced using pmap as shown in the following:

usman=> (pmap inc [5 6 7 8])
(6 7 8 9)

Instead of one thread performing four operations serially, the work is spread across multiple threads. For an operation as cheap as inc, the coordination overhead outweighs any gain, but when the same approach is applied to larger data sets and more expensive functions, a four-core machine can see a speedup of up to four times.
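
To see a difference between map and pmap, the mapped function has to do enough work per element. The following sketch assumes a hypothetical slow-inc function in which Thread/sleep stands in for an expensive computation; the timings printed by time will vary from machine to machine.

```clojure
;; Hypothetical helper: slow-inc simulates an expensive computation.
(defn slow-inc [n]
  (Thread/sleep 100) ; stand-in for real work (assumption)
  (inc n))

;; doall forces the lazy sequences so that time measures the actual work.
(time (doall (map slow-inc [5 6 7 8])))  ;; => (6 7 8 9)
(time (doall (pmap slow-inc [5 6 7 8]))) ;; => (6 7 8 9)
```

On a multi-core machine the pmap version should report a noticeably shorter elapsed time, since the four simulated computations can overlap.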

pvalues

The pvalues macro evaluates an arbitrary number of expressions in parallel and returns a lazy sequence of their values.

Consider this example:

usman=> (pvalues (* 5 3) (+ 6 9) (/ 4 2) (- 7 4))
(15 15 2 3)

Note that these four expressions may be evaluated in parallel on separate threads; the threading details are hidden behind pvalues.

pcalls

pcalls follows the same logic as pmap and pvalues: it takes any number of no-argument functions, calls them in parallel, and returns a lazy sequence of their results.
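
A minimal sketch of pcalls next to pvalues might look like this. The functions load-config and count-users are hypothetical placeholders for independent tasks and are not part of any library.

```clojure
;; Hypothetical no-argument functions standing in for independent tasks.
(defn load-config [] {:retries 3})
(defn count-users [] 42)

;; pcalls takes functions and calls each one, possibly in parallel.
(pcalls load-config count-users #(* 6 7))
;; => ({:retries 3} 42 42)

;; pvalues is the macro form: it wraps each expression in a function for you.
(pvalues (load-config) (count-users) (* 6 7))
;; => ({:retries 3} 42 42)
```

Both forms return results in the order the functions or expressions were given, regardless of which one finishes first.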

Futures

A future represents a computation that runs on another thread. Creating a future hands its computation to a background thread, and the calling thread continues immediately. The future macro creates a future from a body of expressions to be evaluated. If more than one future is created, they can execute in parallel and make effective use of the underlying parallel hardware. Dereferencing a future with @ returns its result, blocking until the computation has finished if necessary. This simple example creates a future with a simple computation:

usman=> (def future-example (future (+ 5 15)))
#'usman/future-example
usman=> @future-example
20
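
Several futures running side by side can be sketched as follows. The helper slow-square is hypothetical, and its Thread/sleep call is only a stand-in for a long-running computation.

```clojure
;; Hypothetical helper: slow-square simulates a long-running computation.
(defn slow-square [n]
  (Thread/sleep 200) ; stand-in for real work (assumption)
  (* n n))

;; Both futures start running as soon as they are created.
(def f1 (future (slow-square 3)))
(def f2 (future (slow-square 4)))

;; realized? reports whether a future has finished, without blocking.
(realized? f1)

;; deref with a timeout: wait at most 1000 ms, otherwise return :timed-out.
(deref f1 1000 :timed-out)

;; @ blocks until each value is available.
(+ @f1 @f2) ;; => 25
```

Because the two futures run concurrently, the final sum should be available after roughly the time of one computation rather than two.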

Java Multi-Threading

Clojure also provides Java interoperability, which lets you use Java features from Clojure, including multi-threading. Java’s threading classes can be called directly from Clojure code, so you get concise code backed by Java’s extensive concurrency support. The following example creates a thread, performs an operation, and shows the result:

usman=> (def sample (atom 0))
#'usman/sample
usman=> (def pthread (Thread. #(swap! sample inc)))
#'usman/pthread
usman=> (.start pthread)
nil
usman=> (.join pthread)
nil
usman=> @sample
1
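
The same idea extends to several threads. In this sketch the names counter and workers are hypothetical; each thread increments a shared atom once, and .join is used to wait for all of them before reading the result.

```clojure
;; Hypothetical names: counter and workers are illustrative only.
(def counter (atom 0))

;; Clojure functions implement Runnable, so they can be passed to Thread.
(def workers
  (doall (repeatedly 4 #(Thread. (fn [] (swap! counter inc))))))

;; Start every thread, then wait for each one to finish.
(doseq [t workers] (.start t))
(doseq [t workers] (.join t))

@counter ;; => 4
```

Because swap! applies its function atomically, the counter reaches 4 no matter how the threads interleave.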

You can learn more about concurrency in Java directly from the official Java documentation.

You should now have a good introduction to how to take advantage of concurrency using Clojure.

Safari Books Online has the content you need

Check out these Clojure books available from Safari Books Online:

Clojure Programming helps you learn the fundamentals of Clojure with examples relating it to the languages you know already, whether you’re focused on data modeling, concurrency and parallelism, web programming, statistics and data analysis, or something else.
Practical Clojure is the first definitive reference for the Clojure language, providing both an introduction to functional programming in general and a more specific introduction to Clojure’s features. This book demonstrates the use of the language through examples, including features such as STM and immutability, which may be new to programmers coming from other languages.

About the author

Usman Aziz is a technical lead at TunaCode, Inc., a startup that delivers GPU-accelerated computing solutions to time-critical application domains. He holds a degree in Computer Systems Engineering. His current focus is on protecting bulk data. He can be reached at usman@tunacode.com.

2 Responses to Parallel Programming Paradigms in Clojure, Part I

  1. Dan Winkler says:

    Looks like your code is tracking the average, not the sum: (/ (reduce + new-nums) (count new-nums))

    Also, you called the agent currentsum when you created it but then refer to it later as my-currentsum.

  2. Usman says:

    Thank you for pointing it out Dan, the code is now updated.
