Don't Repeat Yourself and (Bad) Coupling

Featured on Hashnode

I've heard and received (even recently, which is what made me write this article) many recommendations to always apply the Don't Repeat Yourself Principle, more commonly known as the DRY principle, even when there are just 2 lines of code that are repeated. In this article, I want to explain why I don't always want to do that and why I think everyone should think twice before applying this principle and what you need to look out for when you do that.

The problem

I've had many situations where DRY has led to painful complications in the code because it was applied thoroughly and vigorously, with every occasion it had.

The problem is that sometime later, developers start adding features on top of the DRY'ed-up code and they don't always see the pitfalls.

When application behaviour starts deviating between two code units that are using the same function for example, my experience has almost always been that, instead of creating copies of the common function and changing one of them to comply with the new requirements, developers tend to add more branching logic to that common function, to keep the old behaviour as well as implement the new one. Worse off even, they add new parameters to that function (which was previously simple and handled a single concern) that are artificially made optional so they can introduce the new logic as a new branch in the code, even though when looking at the correct business logic, the things those new parameters refer to would never be optional if they had stood in their appropriate context!

Am I the only one seeing issues with DRY?

Too often I see a huge effort to make everything DRY. In my opinion, this can lead to messy, unmaintainable code and complex branching logic that is hard to understand. Such codebases can end up being impossible to maintain once they grow past a certain size. It's a tale of two extremes: just as a complete lack of application of coding principles and design patterns to your software can lead to its downfall, an overprescribed application of these can also do the same (over-abstraction).

The DRY principle has been prescribed in a lot of books on programming and software engineering and software craftsmanship. Most of them say that you MUST APPLY DRY if you want to write high-quality software, otherwise, your code becomes hard to evolve, because if you change a behaviour you have to remember to change it in many places now, which is not necessarily true (it can easily be mitigated by tests).

But then very few people talk about the pitfalls of DRY. Only in some conference talks did some notable speakers talk about the issues DRY has created. Here's what Greg Young (the creator of CQRS) said at QCon London 2012 to InfoQ, when asked if he sees any issues with DRY:

The basic argument against following DRY is that there is another side to things. When following "DRY", it is quite common that people start building coupling and complexity into their software. One side of the trade-off is very easy to measure (number of face plants per hour when needing to fix bugs in multiple places) while the other is rather difficult (coupling and complexity built into software in the name of DRY).

One can argue that if DRY is followed "properly" there will never be coupling and complexity built into the software. This is even anecdotally visible, I can write you a code base that perfectly follows DRY while not introducing coupling and complexity. This however assumes that I have perfect knowledge.

This perfectly sums up the issues I've seen with DRY and there are probably very few more authoritative opinions than that of Greg on this question.

My suggestion

If you need to apply the DRY principle and "scratch that itch", my suggestion is to try and make it obvious to future developers that there is coupling in the code. Help them immediately see where the coupling is and make it easier for them to decouple it. Don't do that with comments. Do it with code.

But, a simple function declaration, reference, or call, doesn't immediately scream "Possible bad coupling here!"– by the way, that is another thing that I don't have time to go into here: good coupling vs. bad coupling.

Example

Here's a somewhat contrived example.

Say you have two React components that are displaying the same data, but from different perspectives, and you want to write an event handler called handleOnClick for both and assign it to each component's onClick prop. Now these two components directly depend on handleOnClick and can't evolve independently.

function handleOnClick(event) {
  // do X, do Y... handle the event...
}

return (
  <Foo onClick={handleOnClick} />
  <Bar onClick={handleOnClick} />
)

I know this looks simple but imagine these components having multiple event handlers, so many that it would become hard to distinguish between them (on click, before click, after click, on completion, etc., some of them even being related or linked somehow).

If the two components weren't in the same file and the click handler was in another file, say in a useClickHandlers hook, it can become harder to spot the coupling, especially if when you are working on adding new features to the handlers, you are working only in the context of the useClickHandlers file. This is another thing about coupling, it becomes harder to spot when the dependent code is further away.

Instead of the above, I would create different event handlers per component, say for example fooHandleOnClick and barHandleOnClick and I would remodel the shared code that was in handleOnClick more descriptively.

For example, if the function was simple, I would rename it to give more information on what it does, like doSomeThing, then I would call that function in my separate event handlers. This is effectively the extract method refactoring pattern from Martin Fowler. We have extracted the shared logic from two event handlers into a separate function.

function doSomeThing() {
  // do X, do Y...
}

function handleFooOnClick(event) {
  doSomeThing();
}

function handleBarOnClick(event) {
  doSomeThing();
}

return (
  <Foo onClick={handleFooOnClick} />
  <Bar onClick={handleBarOnClick} />
)

Or, if it was a more complex function and it was doing multiple things (and doing multiple different, unrelated things or things of different abstraction levels in the same function is a code smell), I would extract those pieces of functionality from the old handleOnClick handler into separate functions, like selectTask, or findPairTask, etc. Then I would call these functions in each independent handler:

function selectTask(taskId) {
  // update React context, etc.
}

function findPairTask(taskId) {
  // find pair of taskId and return it
}

function handleFooOnClick(event) {
  selectTask(event.taskId);
  const pairTask = findPairTask(event.taskId);
  // do something with pairTask...
}

function handleBarOnClick(event) {
  selectTask(event.taskId);
  const pairTask = findPairTask(event.taskId);
  // do something with pairTask...
}

return (
  <Foo onClick={handleFooOnClick} />
  <Bar onClick={handleBarOnClick} />
)

You'll notice that I am still left with some code duplication. Both handlers are still doing the same things, they are just extracted into separate "utility" functions now.

The trap

This is where I think the trap is. Someone would be looking at the code above thinking it is very "WET" and would immediately want to "DRY it out". They would replace each handler with a single function called handleOnClick, and then we're back to where we started!

Try and resist that urge. You've already extracted the most complicated part of the code into separate functions! The code is much more readable now, descriptive, self-documenting, loosely coupled, composable, separated into concerns, and all that good stuff.

Reassessment (or seeing a different picture)

What we are left with now is nothing but simple and declarative workflows for each component's click event: fooHandleOnClick and barHandleOnClick just describe some steps in a workflow, and in a more declarative manner (you describe the steps needed to be taken, and not how they are implemented), which is much easier to read and understand than imperative, low-level code.

Should you care too much that both handlers describe the same workflow twice?

I argue that you shouldn't. A simple glance over each handler now tells you immediately what each does. Leaving them like this allows me to quickly evolve them separately when needed, and if the workflow ever changes, I will have to think about what handleFooOnClick and handleBarOnClick need to do, not about what handleOnClick needs to do and who might be consuming it and how this change might interfere with them. I have now a reduced dependency on the outside whenever I need to change one of the handler's behaviour and a much smaller surface area affected by a change.

DRY should be applied to the point where it doesn't introduce coupling and where code is readable enough, cohesive and has a separation of concerns that makes evolving that code in the future much easier to do!

Sometimes you can't resist the urge to fiddle with code

When I can't resist the urge to DRY up my code, sometimes I might do this, which is halfway, I think, between what we had at the start (one common handler) and the solution I proposed before (two separate handlers):

function fooHandleOnClick() {
  // do X...
  // do Y...
}

function barHandleOnClick() {
  fooHandleOnClick();
}

return (
  <Foo onClick={handleFooOnClick} />
  <Bar onClick={handleBarOnClick} />
)

There is now no code duplication here and the coupling is singular, clear and unidirectional: barHandleOnClick depends on fooHandleOnClick, so there is only one point to decouple when needed, as opposed to fooHandleOnClick and barHandleOnClick both depending on handleOnClick.

Conclusion

Knowing when to apply DRY is not so cut and... dry (pun intended). Be wary. Every time you notice that DRY has led to coupling that you later regret, learn from the mistake. It's the only way to get a feeling for when it's right or wrong.

We've all learned that code duplication is bad. That is why we write functions or procedures, and even classes. We have rarely been told though that duplication can also be good! It's like one of those counterintuitive things that are in reality (sometimes) good.

Leave me your comments and links. Let me know what you or other developers you hold in high regards think about DRY.