When it comes to Clojure, there are many tutorials, websites, and even blog posts about how to get started (language syntax, set up a project, configure your IDE, etc.). There are also many tutorials, websites, and even blog posts about how language features work (protocols, transducers, core.async, etc.). There are precious few tutorials, websites, and even blog posts about when and how to use Clojure's features.
This blog post is not a getting started tutorial. Neither is it a deep dive on a particular Clojure feature. It's more like a mini-class in comparative architecture. If you're reading this post, I'll assume you are familiar with Clojure and even a bit proficient at it.
In this post, we'll talk about polymorphism and look at some example problems, solve them with different tools, and then pick them apart for what is good and what is bad. There will not be one right answer, but you will learn principles about when to appropriately apply Clojure's polymorphic tools.
This is not a cookbook. This is me conveying my experience writing large Clojure systems. If I am successful, then I am not teaching you recipes; instead, I am helping you develop a taste for good Clojure design.
When I say "Clojure" I am usually referring to both Clojure and ClojureScript. Sometimes I am referring only to JVM Clojure, but context should make it clear.
So let's examine the theme of polymorphism. Polymorphism is a property of a function. When a function is polymorphic, its behavior depends on the arguments you give to it. The most common type of polymorphism is type-based. In a language like Java you call methods on objects; the object on which you call a method is an implicit first argument to the method, and the code that runs depends on the type of that object.
The decision about which code to run is called dispatch. Aside from type-based dispatch on the first argument, dispatch could also consider the number of arguments, the types of each of the arguments, etc.
Clojure gives you a few options for polymorphic dispatch:
- **defmulti**: a multimethod is the most general type of dispatch. You can decide how the dispatch will be performed by providing a dispatch function, and this dispatch function can run arbitrary Clojure code.
- **defprotocol**: a protocol function is a very specific (but common) dispatch. A protocol function will dispatch on the type of its first argument.
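To make the difference between the two dispatch mechanisms concrete, here is a minimal sketch. The `area` multimethod, the `Describable` protocol, and the shape maps are hypothetical names invented for illustration; they are not part of the storage example that follows.

```clojure
;; Multimethod: the dispatch function is arbitrary code. Here it is the
;; keyword :shape, used as a function on the argument map.
(defmulti area :shape)

(defmethod area :square [{:keys [side]}]
  (* side side))

(defmethod area :rectangle [{:keys [width height]}]
  (* width height))

;; Protocol: dispatch is always on the type of the first argument.
;; String and Long are JVM types; ClojureScript would extend string and number.
(defprotocol Describable
  (describe [this]))

(extend-protocol Describable
  String
  (describe [s] (str "a string of length " (count s)))
  Long
  (describe [n] (str "the integer " n)))

(area {:shape :rectangle :width 2 :height 3}) ;; => 6
(describe "abc")                              ;; => "a string of length 3"
(describe 42)                                 ;; => "the integer 42"
```

The multimethod can dispatch on anything computable from its arguments, while the protocol function only looks at the type of its first argument.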
Those are mechanisms for defining abstractions in Clojure. You also have a couple of options for implementing abstractions:
- **reify**: using `reify` you can implement protocols and interfaces as an anonymous class. You cannot define fields, but the reified class is a closure and you can capture an atom or something appropriate from the context.
- **deftype**: `deftype` generates a class that implements several protocols and interfaces. A `deftype` is not a closure, but it can have fields (including mutable fields).
- **defrecord**: `defrecord` generates a new immutable, "map-like" data structure. A `defrecord` is also not a closure, but you can define immutable fields and implement several protocols and interfaces. A `defrecord` can be used as a map, destructured like a map, have additional keys `assoc`ed on to it, etc.
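The three implementation options can be compared side by side with a small sketch. The `Counter` protocol and the three counters below are hypothetical and exist only to show the closure, the mutable field, and the map-like record; this is JVM Clojure (ClojureScript would use `^:mutable` for the field).

```clojure
(defprotocol Counter
  (tick! [this] "Advance the counter and return the new count."))

;; reify: an anonymous implementation that closes over local state (an atom).
(defn closure-counter []
  (let [state (atom 0)]
    (reify Counter
      (tick! [_] (swap! state inc)))))

;; deftype: a named class with a field; this field is declared mutable.
(deftype FieldCounter [^:volatile-mutable n]
  Counter
  (tick! [_] (set! n (inc n))))

;; defrecord: immutable and map-like. Mutable state, if needed, has to be
;; held in a field such as an atom.
(defrecord NamedCounter [label state]
  Counter
  (tick! [_] (swap! state inc)))

(def c (->NamedCounter "requests" (atom 0)))
(tick! c)                        ;; => 1
(:label c)                       ;; => "requests"
(:unit (assoc c :unit "hits"))   ;; => "hits"
```

All three satisfy the same protocol, so callers of `tick!` do not need to know which one they were given.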
There are a couple of other language features that are specific to either Clojure or ClojureScript, but `defmulti`/`defprotocol`/`reify`/`deftype`/`defrecord` is where I will spend most of my time. This collection of features, common to both Clojure and ClojureScript, is a very flexible way to define and implement abstractions.
Polymorphism is primarily used to create an abstraction that can be implemented several ways. An abstraction that I have created many times is a service abstraction. For instance, you may want to define a storage service abstraction modeled as a key-value store for binary objects. You could then implement it for S3, CloudFiles, and Azure. For use in tests, you could even implement a local filesystem backend.
My abstraction will consist of five functions: `connect`, `get`, `put`, `delete`, and `close`. You call `connect` to construct a service object. The service object will be used with `get` to fetch an object from the store, `put` to store an object in the store, `delete` to delete an object from the store, and `close` to clean up the service object.
There are a couple of different ways of defining and implementing this abstraction. Each will have its own advantages and disadvantages. First let us look at them, then I will pick them apart for some design principles.
An abstraction can be defined and implemented using only plain functions. In this case the abstraction is not explicit in the code, but is based on using conventional names for definitions:
Use the abstraction as
If you want to change the "implementation" you are using you would pull in and `connect` a different namespace:
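As a rough sketch of the plain-functions approach, assume a hypothetical in-memory backend standing in for S3. The namespace names `plain.memory` and `plain.app`, and the use of strings rather than binary objects, are illustrative assumptions only.

```clojure
;; Hypothetical in-memory backend. An S3 backend would live in its own
;; namespace and define the same five names with the same signatures.
(ns plain.memory
  (:refer-clojure :exclude [get]))

(defn connect []
  (atom {}))

(defn put [conn bucket k value]
  {:pre [(some? bucket)]}
  (swap! conn assoc-in [bucket k] value)
  nil)

(defn get [conn bucket k]
  {:pre [(some? bucket)]}
  (get-in @conn [bucket k]))

(defn delete [conn bucket k]
  {:pre [(some? bucket)]}
  (swap! conn update bucket dissoc k)
  nil)

(defn close [conn]
  (reset! conn {})
  nil)

;; Client code names the backend namespace directly. Switching backends means
;; changing this :require wherever the store is used.
(ns plain.app
  (:require [plain.memory :as storage]))

(def conn (storage/connect))
(storage/put conn "widgets" "a.txt" "hello")
(storage/get conn "widgets" "a.txt")    ;; => "hello"
(storage/delete conn "widgets" "a.txt")
(storage/get conn "widgets" "a.txt")    ;; => nil
(storage/close conn)
```

Nothing in the code states that the two backend namespaces share an interface; the abstraction holds only as long as every backend keeps to the same names and argument lists.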
A second option is to define the abstraction with multimethods:
Implement them for each backend:
Use the multimethods directly as
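A minimal sketch of the multimethod approach might look like the following. The `mm.storage` and `mm.memory` namespaces, the in-memory backend, and the choice to dispatch on a `:backend` key in the connection map are all assumptions made for illustration.

```clojure
;; The abstraction: one multimethod per operation. connect dispatches on a
;; backend keyword; the rest dispatch on the :backend key of the connection.
(ns mm.storage
  (:refer-clojure :exclude [get]))

(defmulti connect (fn [backend _options] backend))
(defmulti get     (fn [conn _bucket _k] (:backend conn)))
(defmulti put     (fn [conn _bucket _k _value] (:backend conn)))
(defmulti delete  (fn [conn _bucket _k] (:backend conn)))
(defmulti close   :backend)

;; Hypothetical in-memory backend. Each backend has to repeat the bucket
;; validation, because callers invoke the multimethods directly.
(ns mm.memory
  (:require [mm.storage :as storage]))

(defmethod storage/connect :memory [_ _options]
  {:backend :memory :state (atom {})})

(defmethod storage/get :memory [{:keys [state]} bucket k]
  (assert (some? bucket) "bucket must not be nil")
  (get-in @state [bucket k]))

(defmethod storage/put :memory [{:keys [state]} bucket k value]
  (assert (some? bucket) "bucket must not be nil")
  (swap! state assoc-in [bucket k] value)
  nil)

(defmethod storage/delete :memory [{:keys [state]} bucket k]
  (assert (some? bucket) "bucket must not be nil")
  (swap! state update bucket dissoc k)
  nil)

(defmethod storage/close :memory [{:keys [state]}]
  (reset! state {})
  nil)

;; Callers use the multimethods directly.
(def conn (storage/connect :memory {}))
(storage/put conn "widgets" "a.txt" "hello")
(storage/get conn "widgets" "a.txt") ;; => "hello"
(storage/close conn)
```

Call sites no longer mention a backend namespace, but the validation is duplicated in every `defmethod`.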
Protocol + Client Namespace
Thirdly, the abstraction can be defined using a protocol:
Implement it for each backend:
Implement a client namespace:
Use the client namespace as
storage-conn as an instance of
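One way to sketch the protocol plus client namespace arrangement is shown here. The namespace names, the `MemoryStorage` record, and the dash-prefixed protocol method names are assumptions of this sketch, chosen so that the client functions can keep the plain names.

```clojure
;; The protocol is the contract that backends implement.
(ns proto.storage)

(defprotocol Storage
  (-get [this bucket k])
  (-put [this bucket k value])
  (-delete [this bucket k])
  (-close [this]))

;; Hypothetical in-memory backend: no validation, just storage.
(ns proto.memory
  (:require [proto.storage :as p]))

(defrecord MemoryStorage [state]
  p/Storage
  (-get [_ bucket k] (get-in @state [bucket k]))
  (-put [_ bucket k value] (swap! state assoc-in [bucket k] value) nil)
  (-delete [_ bucket k] (swap! state update bucket dissoc k) nil)
  (-close [_] (reset! state {}) nil))

(defn connect []
  (->MemoryStorage (atom {})))

;; The client namespace is what application code calls. Validation lives
;; here, once, for every backend.
(ns proto.client
  (:refer-clojure :exclude [get])
  (:require [proto.storage :as p]))

(defn get [conn bucket k]
  {:pre [(some? bucket)]}
  (p/-get conn bucket k))

(defn put [conn bucket k value]
  {:pre [(some? bucket)]}
  (p/-put conn bucket k value))

(defn delete [conn bucket k]
  {:pre [(some? bucket)]}
  (p/-delete conn bucket k))

(defn close [conn]
  (p/-close conn))

;; Only the place that creates storage-conn knows which backend is in use.
(ns proto.app
  (:require [proto.client :as storage]
            [proto.memory :as memory]))

(def storage-conn (memory/connect))
(storage/put storage-conn "widgets" "a.txt" "hello")
(storage/get storage-conn "widgets" "a.txt") ;; => "hello"
(storage/close storage-conn)
```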
Protocol + Client Namespace + Multimethod
Finally, the protocol can be combined with a client namespace, with `connect` as a multimethod:
Implement the protocol and multimethod for each backend:
Implement a client namespace:
Use the client namespace as
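A compact sketch of this combined approach follows. It assumes a hypothetical in-memory backend built with `reify`, dash-prefixed protocol method names, and a `connect` multimethod that dispatches on a `:backend` key in an options map; none of these specifics come from the original listings.

```clojure
;; Backend contract plus the connect multimethod.
(ns full.storage)

(defprotocol Storage
  (-get [this bucket k])
  (-put [this bucket k value])
  (-delete [this bucket k])
  (-close [this]))

(defmulti connect :backend)

;; Hypothetical in-memory backend: implements the protocol and registers
;; itself with the connect multimethod.
(ns full.memory
  (:require [full.storage :as p]))

(defmethod p/connect :memory [_options]
  (let [state (atom {})]
    (reify p/Storage
      (-get [_ bucket k] (get-in @state [bucket k]))
      (-put [_ bucket k value] (swap! state assoc-in [bucket k] value) nil)
      (-delete [_ bucket k] (swap! state update bucket dissoc k) nil)
      (-close [_] (reset! state {}) nil))))

;; Client namespace: validation in one place for every backend.
(ns full.client
  (:refer-clojure :exclude [get])
  (:require [full.storage :as p]))

(defn connect [options]
  (p/connect options))

(defn get [conn bucket k]
  {:pre [(some? bucket)]}
  (p/-get conn bucket k))

(defn put [conn bucket k value]
  {:pre [(some? bucket)]}
  (p/-put conn bucket k value))

(defn delete [conn bucket k]
  {:pre [(some? bucket)]}
  (p/-delete conn bucket k))

(defn close [conn]
  (p/-close conn))

;; The backend namespace must be loaded so its defmethod is registered, but
;; the choice of backend is now plain data.
(ns full.app
  (:require [full.client :as storage]
            [full.memory]))

(def storage-conn (storage/connect {:backend :memory}))
(storage/put storage-conn "widgets" "a.txt" "hello")
(storage/get storage-conn "widgets" "a.txt") ;; => "hello"
(storage/close storage-conn)
```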
That was a lot of code, but hopefully it gave you a broad view of different approaches to the same problem. These approaches can be analyzed along a few different dimensions: code elasticity, separation of concerns, and dynamism. I'll pull apart the good and the bad and draw out some principles.
An important question is how easy it is to change the code. You may want to add a new backend. How much code would have to change?
Adding a new backend to the "Plain Functions" approach would be simple, but making use of the new backend would require changing every single callsite. If the likelihood of adding a new backend is pretty low, this may be appropriate. You might wonder why the approach should be used at all. The "Plain Functions" approach is a kind of abstraction. It is a step up from, and far better than, scattering direct calls to an S3 library across your application. As a kind of primordial polymorphism, it can be used for temporary experiments where you would like to keep the original implementation around while you test a new one.
Adding a new backend with "Protocol + Client Namespace" would be about the same as without the client namespace. However, making use of a new backend would not require any changes at client callsites. The site at which the `storage-conn` object is created would have to change, but the impact of that would likely be much less.
The impact of changes for the other approaches is similar to "Protocol + Client Namespace". The client namespace in "Protocol + Client Namespace" is a dispatch mechanism similar to how a very simple object-oriented language might work. Multimethods and protocol functions are just more complex dispatch mechanisms. Once you make the jump to a dispatch mechanism you remove the need for client code to know implementation details, and that's the big win for code elasticity.
Separation of Concerns
To help illustrate separation of concerns, I have included validation that the `bucket` parameter is not `nil`. If you pay attention to how the validation code moves between the approaches you will get a sense for separation of concerns.
Even when using multimethods or protocol functions as a dispatch mechanism, you will notice that there is still some value in having a client namespace. It mediates between client concerns (code that uses the abstraction) and backend concerns (code that implements the abstraction).
Without the client namespace each backend implementation must enforce validation. I've only shown one implementation, but you can imagine how that would go. It would be much better to enforce validation in one place for all backend implementations.
The client namespace can also mediate client concerns. Imagine if, to make using the library easier, I decided to make `bucket` an optional parameter that defaults to `"widgets"`. That would require updating every backend implementation with a new arity for most of the API functions, and they must all agree on the correct default value. When I implement a backend I don't want concerns about defaults and such. Having a separate client namespace reduces the surface area of the backend implementations.
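As a small sketch of the default-bucket idea, the extra arity can live entirely in the client namespace. The `defaults.client` namespace, the one-method `Storage` protocol, and the throwaway backend here are hypothetical stand-ins used only to keep the example self-contained.

```clojure
(ns defaults.client
  (:refer-clojure :exclude [get]))

;; Stand-in for the backend protocol (one method is enough for this sketch).
(defprotocol Storage
  (-get [this bucket k]))

(def default-bucket "widgets")

;; The default lives in the client namespace; backends only ever see the
;; three-argument call.
(defn get
  ([conn k] (get conn default-bucket k))
  ([conn bucket k]
   {:pre [(some? bucket)]}
   (-get conn bucket k)))

;; A throwaway backend over a fixed nested map, for demonstration only.
(def conn
  (reify Storage
    (-get [_ bucket k]
      (get-in {"widgets" {"a.txt" "hello"}} [bucket k]))))

(get conn "a.txt")           ;; => "hello"
(get conn "widgets" "a.txt") ;; => "hello"
```

Changing the default later touches one `def`, and no backend needs a new arity.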
Another separation of concerns is using a lifecycle (i.e. `connect` and `close` functions). Not having a lifecycle unnecessarily ties the life of your service object to the life of your VM. It makes it impossible to choose a backend dynamically, which at the very least makes writing tests more difficult. It may seem like overkill for this S3 backend, but presumably some of the other backends will actually have use for `close`. Even for the S3 backend you could imagine starting up an HTTP connection pool and shutting it down.
There are two times at which dispatch can occur: compile-time and run-time. The "Plain Functions" approach is obviously doing dispatch at compile-time. To change the dispatch I must change the code, recompile it, redeploy it, and re-run it. If the dispatch will not change often, this may be fine.
Though some may not take advantage of it, the other approaches all dispatch at run-time. Run-time dispatch will not only make the code more elastic (as we saw above), but it is possible to dispatch on values that are not available at compile-time. For example, you could allow each user to configure which storage backend they would like to use.
Using run-time dispatch you can also create other moments for dispatch, like deploy-time. With the "Protocol + Client Namespace + Multimethod" approach you could define the multimethod dispatch like this:
Here, `config` would look up the deploy-time config for your application. To change a backend you would only need to change the deploy-time config and restart your application. This would also allow a different storage backend for each deployment of your application.
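A hedged sketch of deploy-time dispatch could look like this. The `config` function below is hypothetical: it reads a JVM system property named `storage.backend` purely for illustration, where a real application would consult its own configuration source. The in-memory connection map is likewise a placeholder.

```clojure
;; Hypothetical deploy-time config: reads a JVM system property and falls
;; back to "memory" when it is not set.
(defn config []
  {:storage-backend (keyword (System/getProperty "storage.backend" "memory"))})

;; The dispatch function ignores its argument and consults the config.
(defmulti connect (fn [_options] (:storage-backend (config))))

(defmethod connect :memory [_options]
  {:backend :memory :state (atom {})})

;; Fallback for configured backends that have no registered implementation.
(defmethod connect :default [_options]
  (throw (ex-info "No storage backend registered for this configuration"
                  {:storage-backend (:storage-backend (config))})))

(:backend (connect {})) ;; => :memory when storage.backend is not set
```

Because the dispatch value is read each time `connect` is called, the backend is fixed by whatever the configuration says at that moment, not by anything in the calling code.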
5 Principles to Take Away
1. If you have a reasonable expectation that your code may change, then you should use some kind of run-time dispatch mechanism.
2. Using run-time dispatch, even if you don't necessarily need it, will increase the elasticity of your code.
3. Most likely you should have a client namespace to separate client concerns (code that uses the abstraction) from backend concerns (code that implements the abstraction) and to limit the surface area of each implementation.
4. Use a lifecycle for components in your application, unless it would always be overkill for every situation that you can possibly imagine.
5. If you need to construct a service object based on run-time information (or deploy-time information), consider using the "Protocol + Client + Multimethod" approach.