The Point of Coverage

100% Code Coverage is Good, Actually

I’ve long said that 100% test coverage should be table stakes for any production system. Yes, I know all the objections - “coverage for coverage’s sake,” “you get what you measure,” “it measures the wrong thing.” I agree with every single criticism. Yet I still insist on 100%.

Test coverage has long been a controversial topic. Management often equates coverage percentage with code quality, pushing teams to meet some arbitrary threshold that will magically ensure we’re writing high-quality code. While I agree with the criticisms of that approach, I still insist that 100% coverage must be our baseline for any production system.

🍀 100% Coverage: What Is It Good For?

While the clichéd retorts against coverage metrics are not without merit, they are only directionally correct. Insisting on 100% coverage for its own sake is a fool’s errand, but total coverage remains a worthy goal as a means of developing quality software. Coverage doesn’t indicate quality; the deliberate pursuit of coverage, however, enables quality in unique and powerful ways.

By striving to test each and every function, branch, and statement we write, we force ourselves to engage with our code’s rough edges in the most intimate ways, as early as possible. Our tests are often the first place our code is exercised - but they will not be the last! Testing puts the evolutionary pressure of “testability” on our design. By listening closely to what that pressure tells us, we can immediately spot issues with the long-term maintainability of our code before anyone else even sees it.

If we approach testing our code with this mindset, 100% code coverage is simply an artifact of a job well done.

🎓 The Hidden Curriculum of Test Coverage

Most professional engineers I’ve had the pleasure of working with do attempt to thoroughly test the code they write. They identify the “happy path” and ensure that a few obvious edge cases are covered as well. If they can achieve 100% coverage in this way, all the better - but as long as they’re not dipping below any automated baselines (e.g., failing continuous integration builds), this is an optional cherry on top of their work.

Those programmers who do set their sights higher often buckle at the first sign of adversity. Can’t come up with a clear API invocation that covers that one branch in that private method you wrote? “Eh, it’s probably not critical to test that branch. We do need to keep it, though, just in case of X, Y, or Z.” This self-delusion is a trap - any production code that warrants being written definitionally deserves to be tested. If we anticipate this pattern, however, we can identify it for what it really is: a signal about the quality of our source code.

🙉 When Testing Gets Hard, Listen

There are often cases where contriving a test that exercises exactly the edge case some code was written for is simply too difficult, or even impossible. It is exactly these cases that inform us that our design needs refactoring.

Private methods are by definition implementation details and as such should not be tested explicitly. Each implementation detail, however, reflects a real use case of our system, and it is usually trivial to establish a test whose setup puts the system in a state that mirrors that “real-world scenario”. Any engineer who has written tests for any substantial stretch of their career, though, can tell you that this is not always the case.

Imagine that you’re building a feature that sends notifications to users, but only if they haven’t been notified recently (to avoid spam). This might look something like this:

class NotificationService
  def notify_user(user_id, message)
    user = User.find(user_id)
    
    if should_send_notification?(user)
      send_email(user.email, message)
      send_sms(user.phone, message) if user.premium?
      log_notification(user_id, message)
    end
  end
  
  private
  
  def should_send_notification?(user)
    return false if user.notifications_paused?
    
    last_notification = Notification.where(user_id: user.id)
                                   .order(created_at: :desc)
                                   .first
    
    return true if last_notification.nil?
    
    time_since_last = Time.now - last_notification.created_at
    time_since_last > throttle_period(user)
  end
  
  def throttle_period(user)
    user.premium? ? 1.hour : 24.hours
  end
end

This isn’t a particularly complicated class. It is generally cohesive, as all of this logic lives together. However, when we attempt to fully test it, we run into problems quickly. For example, how do you test the branch where last_notification.nil? is false and the time-based logic kicks in?

The private method clearly does too much: database access, time calculations, business rules. Attempting to test this, while not impossible, introduces quite a lot of friction: you need to create database records, manipulate timestamps, and stub Time.now - to name just a few things.
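
To make that friction concrete, here is a rough sketch of what covering the throttle branch through the public API might require. It assumes RSpec with ActiveRecord-backed User and Notification models; the attribute names and defaults (e.g., notifications_paused starting out false) are illustrative assumptions, not part of the original code:

RSpec.describe NotificationService do
  it "suppresses a second notification inside the throttle window" do
    frozen_now = Time.utc(2024, 1, 1, 12, 0, 0)
    # Stub the clock so the arithmetic inside the private method is deterministic.
    allow(Time).to receive(:now).and_return(frozen_now)

    # Persist real records just to steer one private branch (attribute names are
    # illustrative; this assumes notifications_paused defaults to false).
    user = User.create!(email: "user@example.com", phone: "555-0100", premium: false)
    Notification.create!(user_id: user.id, created_at: frozen_now - (2 * 3600))

    service = NotificationService.new
    # ...and stub the delivery side effect just to assert it never happens.
    expect(service).not_to receive(:send_email)

    service.notify_user(user.id, "hello")
  end
end

None of this is impossible, but every line of that setup exists only to maneuver the system into a state that one private branch happens to care about.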

🐣 Code that is Easy to Test is Easy to Maintain

Below is just one example of how we might respond to that difficulty. We can turn the difficult-to-validate should_send_notification? method into a dedicated class that more clearly models our notification domain:

class NotificationService
  def initialize(notification_policy:)
    @notification_policy = notification_policy
  end
  
  def notify_user(user_id, message)
    user = User.find(user_id)
    
    if @notification_policy.allowed?(user)
      deliver_notification(user, message)
    end
  end
  
  private
  
  def deliver_notification(user, message)
    send_email(user.email, message)
    send_sms(user.phone, message) if user.premium?
    log_notification(user.id, message)
  end
end

class NotificationPolicy
  def allowed?(user)
    return false if user.notifications_paused?
    
    last_sent_at = user.last_notification_sent_at
    return true if last_sent_at.nil?
    
    time_since_last = Time.now - last_sent_at
    time_since_last > throttle_period(user)
  end
  
  private
  
  def throttle_period(user)
    user.premium? ? 1.hour : 24.hours
  end
end

This approach has several immediate benefits, such as:

  • NotificationPolicy can be tested in isolation with simple user objects/doubles (as sketched below)
  • Time-based logic is separated from delivery mechanism
  • Adding new notification channels doesn’t touch the policy logic
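
For instance, here is a minimal sketch (assuming RSpec) of how the extracted policy can be specified with nothing but plain doubles - no database rows, no query stubbing, no delivery mechanisms in sight:

RSpec.describe NotificationPolicy do
  let(:policy) { NotificationPolicy.new }

  it "allows a user who has never been notified" do
    user = double(notifications_paused?: false, last_notification_sent_at: nil)
    expect(policy.allowed?(user)).to be(true)
  end

  it "throttles a free user who was notified two hours ago" do
    user = double(notifications_paused?: false,
                  premium?: false,
                  last_notification_sent_at: Time.now - (2 * 3600))
    expect(policy.allowed?(user)).to be(false)
  end
end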

But perhaps more important than these immediate improvements is the reduced cost of maintenance over time. Imagine that six months later, you need to add “quiet hours” (no notifications from 10pm to 8am). In the original implementation, you’d be digging into that tangled private method. In our refactored version, you add one method to NotificationPolicy and one additional condition to allowed?. The separation makes the change obvious and safe.
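
As a sketch of what that change might look like (the 10pm-8am window comes from the example above; quiet_hours? is a hypothetical helper, not part of the original code):

class NotificationPolicy
  def allowed?(user)
    return false if user.notifications_paused?
    return false if quiet_hours? # the one new condition

    last_sent_at = user.last_notification_sent_at
    return true if last_sent_at.nil?

    Time.now - last_sent_at > throttle_period(user)
  end

  private

  # The one new method: no notifications between 10pm and 8am.
  def quiet_hours?(now = Time.now)
    now.hour >= 22 || now.hour < 8
  end

  def throttle_period(user)
    user.premium? ? 1.hour : 24.hours
  end
end

And the new rule can be covered the same way the throttle rule was: hand allowed? a plain double and a frozen clock.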

By leaving difficult-to-test code untested, we abandon a signal that is crucial to keeping our code maximally resistant to entropy. Untested paths are where these design smells hide, and you cannot improve that which you cannot smell.

Scaling these marginal improvements in maintainable design across every feature we implement pays dividends that counteract any perceived development costs. Each new feature becomes easier and easier to implement as our systems become more and more flexible.

⚡️ TDD as a Real-Time Feedback Loop

Anyone who has worked with me for any stretch of time knows that I am a passionate advocate for Test-Driven Development. By flipping the traditional script - writing the tests before the code they exercise - we force the signals we care about to appear far earlier in the software development lifecycle than ever before!

Think about it: even in a world where you’re writing tests after production code, you’re thinking of edge cases and considering their costs and mitigations in the moment. By codifying these considerations in your specifications before writing the code, you put yourself in a position to design the right APIs and the right system before a single line of production code has been authored! This is the real power of TDD.

Done correctly, TDD also implies one hundred percent test coverage.

🎯 Coverage is a Process, not a Target

While I strongly believe that TDD is the best way to approach writing our code, it’s not a hard requirement for using code coverage to force better design decisions. What’s important is that we ensure total coverage and use each point of friction we find as a means of identifying improvements to be made.

It’s important to note that simply surfacing opportunities for improvement is not itself a silver bullet. Improved testability does not equate to better design; it is simply one dimension of good code. Testable code isn’t necessarily well-designed code, but well-designed code is inherently testable. While aiming for testability puts pressure on the software’s design, that pressure is not dictatorial. You are still the one making the design decisions, and those decisions are where things can still go wrong.

🦍 Brute Forcing Coverage is an Anti-pattern

Never before has it been more tempting to achieve a higher coverage number through brute force. LLMs, and the agents they power, make it easy to define a metric of success (coverage) and hand over the context needed to achieve it. But viewing coverage as a target removes any value derived from the process.

Without very detailed prompting, an LLM will simply attempt to write the easiest possible test that moves the coverage metric in the right direction - whether by abusing mocks and stubs or by forcing calls to private methods. Agent-driven coverage efforts optimize for the metric to the exclusion of all else.
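
As a caricature of what that tends to look like against our original service (a sketch assuming RSpec; receive_message_chain and #send are real, if frowned-upon, tools):

RSpec.describe NotificationService do
  it "covers the throttle branch" do
    service = NotificationService.new
    user = double(notifications_paused?: false, id: 1, premium?: false)
    stale = double(created_at: Time.now - (48 * 3600))

    # Stub away the entire query so the branch lights up green...
    allow(Notification).to receive_message_chain(:where, :order, :first).and_return(stale)

    # ...and reach straight into the private method. The line is "covered",
    # but nothing about the system's public behaviour has been specified,
    # and the design pressure the friction was signalling is gone.
    expect(service.send(:should_send_notification?, user)).to be(true)
  end
end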

“No pain, no gain” is an ancient adage for a reason. Offloading the pain of achieving coverage to an LLM - which cannot feel that pain, let alone recognize it for what it is - abandons the signal entirely and removes the human spark needed to turn that signal into a more maintainable design.

🌶️ 100% Coverage is Good, Actually

It is easy to fall into the trap of seeing gaps in coverage as a problem to be solved, and in doing so to miss what those gaps really are - missed signals. By aiming for complete coverage as part of day-to-day engineering work, we shift that crucial signal leftward and amortize the cost of design across the entire SDLC, at the time of highest leverage - initial development. Even if we approach coverage gaps retroactively with this same mentality, we may lose some flexibility, but we can still find useful signal pointing us in the direction of a successful refactor.

So the next time someone argues against 100% coverage, ask them this: are you arguing against a metric, or are you arguing against listening to your code?