More and more people seem to be adopting static code analysis tools. On the one hand, project sponsors aspire to express the process of software development in measurable numbers through tools like Sonar; of course, not without reason, and often justified by Tom DeMarco's famous "You can't control what you can't measure". On the other hand, software developers are interested in automating metrics in order to continuously enforce a certain level of code standards. In many cases, metrics even tend to become the Key Performance Indicators (KPIs) of the source code, which in my opinion is absolutely not advisable. There seems to be a common misunderstanding about the power and limits of metrics, which I would like to tackle in this post.

Metrics in a nutshell

To go into detail, we first need to take a step back and figure out what metrics are all about. The task of a metric is to quantify a complex matter by means of a standardized, or at least well-documented, procedure. In layman's terms, metrics provide an interpretable and comparable value for complex stuff. And there is usually a lot of complex stuff in software, which results in a wide range of available metrics; probably not as many as JavaScript frameworks these days, but still enough to get one confused.

Thankfully, most tools use similar metrics for measuring attributes of the source code, such as cyclomatic complexity, lack of cohesion, coupling, or code duplication. All these attributes have their justification: low cohesion, high coupling, or a high number of duplications are almost always negative indicators for maintainability. The assumption, therefore, is that these identified technical issues in the source code provide good insight into the overall maintainability of a codebase.
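To make one of these metrics concrete: cyclomatic complexity is essentially "number of decision points plus one". Real analyzers like Sonar use more elaborate rules, but the principle can be sketched in a few lines of Python by walking a function's syntax tree. This is a hypothetical, minimal approximation for illustration only, not how any particular tool implements it:

```python
import ast

# Node types that introduce a branch in the control flow.
# (A deliberately simplified selection; real tools count more cases.)
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: decision points + 1."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1

sample = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(3):
        pass
    return "positive"
"""

# Two `if` branches (the elif is a nested If) plus one loop -> 3 + 1
print(cyclomatic_complexity(sample))  # prints 4
```

The appeal of such a metric is obvious: it compresses a whole function into a single comparable number. Its limits, as argued below, are just as inherent to that compression.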

Use them with caution

Let's examine these metrics through the example of test coverage. Having ten tests is surely better than having zero tests. But are ten flaky and hard-to-maintain tests still better than five robust and meaningful ones? Or how about method documentation? People tend to expect documented method signatures in high-quality codebases. But is meaningless, listlessly written, or even autogenerated method documentation better than none at all?

So it seems to me that these technical metrics are prone to false positives, as they often only work in one direction. People usually sum this up as "you should never optimize a negative indicator." I like to illustrate this with blood pressure: it is almost always a bad sign if somebody has very high blood pressure, but that does not mean constantly trying to minimize it is advantageous. After all, a blood pressure of zero is undoubtedly very unfavorable. The same goes for metrics like test coverage. Zero or very low test coverage can be identified as a problem, but high test coverage does not automatically imply good tests in a codebase.

The tip of the iceberg

That's one of the reasons why, in my experience, metrics can only inspect the tip of the iceberg. Metrics certainly provide a tried and tested way to get a first impression, but they fall short of providing deeper insight into important assets like consistency, a tight mapping of domain concepts to technical implementations, or the right level of abstraction, just to name a few.

Of course, we can try to find metrics that evaluate whether abstractions are well chosen or whether concepts in a codebase are consistent, but I'm quite sure going down this path is, unfortunately, a road to nowhere, as these source code properties can no longer be measured usefully by the simple application of negative indicators. They depend on a wide variety of complex factors, which can be part of the codebase, but often aren't. The latter case makes the application of metrics almost impossible. For example, you won't be able to verify the consistent usage of the ubiquitous language, or a valid mapping between the technical implementation and the real-world model, by only observing a given codebase.

What maintainability really needs

So, if metrics only observe a fraction of maintainability, what's the alternative? Well, as software is a social construct, the most important ingredient for a maintainable codebase is commitment. People will always find a way to sidestep given rules if they don't believe in their benefits. But if they share a common understanding of and commitment to maintainability in their codebase, they will no longer question it.

This definition of maintainability within a team is not rigid but dynamic; it depends on the context, the experience, and the mindset of the team, and is therefore in most cases not definable by measures. If you foster this commitment, the team will follow this shared vision of maintainability by themselves, for example by using concepts like pull requests, or maybe even some well-selected metrics on which they agree as a team.

I think there can be cases where this approach does not lead to the expected result. But that's usually an indicator of missing or inadequate underlying conditions. Applying metrics there would only combat the symptoms of the problem instead of the cause.

Conclusion

Metrics have the great advantage of automation, but they can only cover a small part of what actually impacts the maintainability of source code. Therefore, a common understanding of, and commitment to, a definition of maintainability should be fostered within the team, as it usually leads to better results in the long run. Like so many other suggestions on achieving a maintainable codebase, this one seems awfully incomplete. Yet I have found it useful, and in the end, that is what counts.