Mocks and Code Maintenance

A Twitter mutual recently tweeted about issues with mocks in his codebase. I was surprised by his tweet; I used to think that Scala users have it all nice and tidy 😅. Coincidentally, I had just had my day's worth of mocking problems in Go. It turns out that we have the same problems, regardless of language. Here, I will share some of the problems I have had with mocks in Go, and why I prefer to test the real thing.

What I Mean by Mocks

By mocks, I mean any test double that records interactions through its interface and asserts expectations against those interactions. Mocks are the most active kind of test doubles. We set them up with preset outputs and expectations that they assert automatically.

I will also loosely refer to spies as mocks in this article. Spies are less active than mocks. We set up spies with preset outputs only, then assert on their recorded interactions manually. Basically: if the object asserts our expectations automatically, it is a mock; if we have to ask what arguments it saw and assert them ourselves, it is a spy.
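
To make the distinction concrete, here is a minimal hand-written spy in Go (the Mailer dependency and all names here are hypothetical):

// MailerSpy is a spy: preset output in, recorded interactions out.
type MailerSpy struct {
  Err        error    // preset output returned by Send
  Recipients []string // recorded interactions
}

func (s *MailerSpy) Send(to string) error {
  s.Recipients = append(s.Recipients, to)
  return s.Err
}

The test later inspects Recipients and asserts on it; a mock would run those assertions itself and fail the test on its own.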

Mocks in Go

Go codebases use interfaces extensively; they are the language's primary means of expressing common behaviour. A program that wants to vary a part of its behaviour will accept that part as an interface and call methods on the interface. For example, programs that want to read input as byte streams from any source will often accept an io.Reader interface.
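
A minimal sketch of that idea (the function and names are mine): countLines accepts io.Reader, so it works the same on a file, a network connection, or an in-memory string in a test.

package main

import (
  "bufio"
  "fmt"
  "io"
  "strings"
)

// countLines counts lines in any byte stream.
func countLines(r io.Reader) (int, error) {
  n := 0
  sc := bufio.NewScanner(r)
  for sc.Scan() {
    n++
  }
  return n, sc.Err()
}

func main() {
  n, _ := countLines(strings.NewReader("a\nb\nc\n"))
  fmt.Println(n) // 3
}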

Interfaces, therefore, are natural seams for overriding behaviour when writing tests in Go. To test a function that accepts an interface, we have to pass an implementation of that interface. We can pass the same implementation that we use in production, or a fake implementation (a test double). The function does not care about the details; it only wants something that looks the part.

However, we often do not pass the real implementations. The behaviours we abstract are usually routines that read from databases, filesystems, and networks. We try to limit their presence in our tests, so we pass their doubles instead. We can write those doubles by hand or generate them automatically.

Manually writing mocks

Given a function that we want to test with an injected interface, we can write an implementation for the interface right there in the test file and be on our way. This is easy to do in three steps:

  1. define a new type that will implement that interface;

  2. define the necessary methods for the type;

  3. instantiate the type and pass it to the system under test (SUT).

This is my go-to approach for writing test doubles in Go.
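
Here is a sketch of those three steps; the UserStore interface and CanLogin function are hypothetical stand-ins for the dependency and the SUT.

import (
  "errors"
  "testing"
)

// The interface the SUT depends on.
type UserStore interface {
  FindByEmail(email string) (User, error)
}

type User struct{ ID, Email string }

// Hypothetical SUT: a login check that only needs the store.
func CanLogin(store UserStore, email string) bool {
  _, err := store.FindByEmail(email)
  return err == nil
}

// Step 1: define a new type that will implement the interface.
type stubUserStore struct {
  user User
  err  error
}

// Step 2: define the necessary methods.
func (s stubUserStore) FindByEmail(email string) (User, error) {
  return s.user, s.err
}

// Step 3: instantiate the type and pass it to the SUT.
func TestCanLogin_UnknownUser(t *testing.T) {
  store := stubUserStore{err: errors.New("not found")}
  if CanLogin(store, "ghost@example.com") {
    t.Fatal("expected login to fail for an unknown user")
  }
}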

Advantage

Writing mocks manually keeps us aware of the cost of our API. This is important for both application code and library code. Every interface we require has some cost to the users of our API. By writing mocks manually, we can sense when this cost can be reduced by refactoring.

An example of an expensive interface is an interface that requires too many methods. This is common in poorly factored code written in the layered architectural style. If a function requires UserService, where UserService itself defines 10 methods to be implemented, we can tell that testing the clients of UserService will be difficult. We have to implement all 10 methods just to use one.

Drawback

How would we add tests to a codebase with such huge interfaces everywhere? Writing manual mocks would be tedious. It is easier to generate the mocks so that we can write some characterisation tests for safety. Manual mocks do not play well with legacy codebases.

Generating mocks

Mocking libraries exist to reduce the tedium of writing mocks. In Go, we cannot conjure mock implementations at runtime, so we generate the scaffolding code ahead of time, typically with go generate. The tool I am most familiar with is called counterfeiter. It scans interfaces for code generation directives that request mock scaffolding, then generates that scaffolding into a separate package.
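
The directives look roughly like this as of counterfeiter v6 (UserStore is the hypothetical interface from earlier); running go generate ./... writes the fake into a sibling ...fakes package:

//go:generate go run github.com/maxbrunsfeld/counterfeiter/v6 -generate

//counterfeiter:generate . UserStore
type UserStore interface {
  FindByEmail(email string) (User, error)
}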

Advantage

The tedium of implementing large interfaces in legacy code is a compelling reason to generate mocks. Generated mocks satisfy the interface automatically, letting us focus on specifying only the behaviour and expectations we want.
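
For instance, a counterfeiter-generated fake exposes per-method helpers (its Returns, CallCount, and ArgsForCall conventions), so a test for the hypothetical CanLogin above can preset one return value and check one interaction while ignoring the rest of the interface:

func TestCanLogin_KnownUser(t *testing.T) {
  fake := &loginfakes.FakeUserStore{}
  fake.FindByEmailReturns(User{ID: "u1", Email: "u1@example.com"}, nil)

  if !CanLogin(fake, "u1@example.com") {
    t.Fatal("expected login to succeed")
  }

  // Only the interaction we care about is specified and checked.
  if got := fake.FindByEmailCallCount(); got != 1 {
    t.Fatalf("expected 1 lookup, got %d", got)
  }
}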

Furthermore, generating mocks speeds up the test-driven development (TDD) process: we can refactor interfaces freely without redoing fake implementations by hand after every change.

Drawbacks

Generating mocks takes from us even as it gives. We gain blanket mocking of large interfaces, but that convenience is a problem in itself. We gain rapid iteration in the TDD process, but we lose the direct feel for the cost of our API requirements.

I will describe the problems of broad interface mocking a bit more and then suggest some guidelines for using mocks effectively.

Pitfalls of broad interface mocking

Without care against cruft, interfaces tend to grow as features are added to the software. Consider this hypothetical UserService interface with humble beginnings.

type UserService interface {
  // Snip
  Login(ctx context.Context, email string, password string) (User, error)
  ChangePassword(ctx context.Context, userID, oldPassword string, newPassword string) (User, error)
}

You may look at it and think, "This looks wrong. Why is authentication here?" Yes, authentication should not be here. But your lowly programmer has to deliver that new feature in record time to keep the velocity high. That burndown chart gotta burn. More features until morale improves.

type UserService interface {
  // Snip
  Login(ctx context.Context, email string, password string) (User, error)
  ChangePassword(ctx context.Context, userID, oldPassword string, newPassword string) (User, error)
  // New methods for password reset
  ChangePasswordAfterReset(ctx context.Context, token string, newPassword string) (User, error)
  ResetPassword(ctx context.Context, email string) error
  // Snip
}

Interfaces exhibit the Lindy effect too: bigger interfaces tend to grow bigger because they often are the path of least resistance to adding features. They already have some methods that we need, we just need one more method. One more method and we'll make it, bro. This is how you grow the god interface.

If we tried to use manual mocking in such a codebase, we'd be in pain. So we reach for candy, er, a mocking library, and make it happen. Look ma, we have unit tests! The proper thing to do is to refactor that code and put things where they belong: in objects with focused responsibilities. But what can stop us when we can simply mock the broad interface?

This is okay until it is not. I recently had to extend such a system to support different login methods, where the system would authenticate the user against different backends, depending on the feature the user wanted to access. Since the new auth system was specific to the new feature we were building, we had to split the interface using the interface segregation principle. It was difficult because... Never mind.

Lacking that feedback from manual mocking, we lose the design benefit of TDD and write poorly organised code. Broad interfaces beget broader interfaces and more blanket mocking. Of course, an experienced programmer would watch out for this, so we can affectionately file this as a skill issue.

Managing the problems of mocks

Reducing logic duplication

In addition to the problems of poor use of mocks that I have presented so far, there is an inherent problem with mocks: they duplicate implementation around tests. Each test case that uses a mock has to be taught how to behave like the real implementation. Whenever the expected outputs for each scenario change, we must update the mocks.

We can reduce the duplication by factoring out the mock setup using creation methods, but that gets us only so far. There will remain at least one place where we teach a mock how to do what the real implementation would do. This is a tradeoff we accept when we use mocks.
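
A creation method might look like this, again assuming the counterfeiter-generated FakeUserStore from earlier; the "user exists" behaviour is now taught in exactly one place:

// newFoundUserStore teaches the fake the "user exists" behaviour
// once, so individual tests do not re-implement it.
func newFoundUserStore(t *testing.T, u User) *loginfakes.FakeUserStore {
  t.Helper()
  fake := &loginfakes.FakeUserStore{}
  fake.FindByEmailReturns(u, nil)
  return fake
}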

Also, avoid mocks that run custom logic to compute indirect outputs or verify expectations. A mock should verify interactions with plain, direct assertions on its indirect outputs.

Reducing test coupling

Another problem with mocks is that they can couple tests too tightly to the implementation. This is by design: mocks test indirect outputs and interactions among objects. But if we mock an entire interface to observe how it plays in an interaction, we may be checking at too high a level. Consequently, slight changes to the interaction will break the tests. We can avoid this problem by increasing the granularity of the mocks (deep mocking).

Also, mocking an interface that contains any logic (I/O retries, backpressure, etc.) hides the effects of that logic from the test. This often indicates that the mocks are too coarse. We can solve the problem by moving such concerns into a separately tested component.

Deep mocking as an alternative

Consider the offending UserService above. First, we can get rid of the interface (it's an article, so we're not breaking any call site 😉). It is an application service that can have only one implementation. We can tell that by its inputs: it has to look up entities of interest using the email, ID, or token that it receives.

type UserService struct {
  dbclient http.Client
}

func (s UserService) Login(ctx context.Context, email string, password string) (User, error) {
  // snip
}

Assume that our database is a remote HTTP service. We can test UserService.Login by faking the underlying http.Client that it uses for dbclient. This lets us test the entire logic while pushing the mocks to the leaves of the dependency graph.
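
A minimal sketch of such a fake using only the standard library: http.Client exposes a Transport field of type http.RoundTripper, and we can supply our own (the helper names are mine):

import (
  "io"
  "net/http"
  "strings"
)

// roundTripFunc adapts a plain function into an http.RoundTripper.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) {
  return f(r)
}

// newFakeDBClient answers every request in-process, so
// UserService.Login exercises its full logic with no network I/O.
func newFakeDBClient(status int, body string) http.Client {
  return http.Client{
    Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
      return &http.Response{
        StatusCode: status,
        Body:       io.NopCloser(strings.NewReader(body)),
        Header:     make(http.Header),
        Request:    r,
      }, nil
    }),
  }
}

A test then builds the service as UserService{dbclient: newFakeDBClient(200, body)} and asserts on what Login returns, not on which methods it called.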

With deep mocking, we can mock at the leaves of our dependency graph and assert actual outcomes rather than internal plumbing. We can change the internal plumbing without changing the tests. This is useful for limiting side effects in integration tests. I wrote a library (transportest) that simplifies generating mocks for HTTP clients.

Extract separate concerns

If we want to support different database clients, we can abstract dbclient to a narrow interface (only the methods we need for this use case). The database then becomes a separate concern that we can mock when testing this use case, while the real implementation can be tested separately. Retries, rate limiting, backpressure handling, etc. can also be injected as separately tested concerns.
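
A sketch of that extraction, with names of my choosing: the use case declares only the method it needs, and the HTTP-backed client satisfies the interface elsewhere.

// userStore is the narrow seam: only what this use case needs.
type userStore interface {
  FindByEmail(ctx context.Context, email string) (User, error)
}

type UserService struct {
  store userStore // satisfied by the HTTP-backed client in production
}

func (s UserService) Login(ctx context.Context, email, password string) (User, error) {
  u, err := s.store.FindByEmail(ctx, email)
  if err != nil {
    return User{}, err
  }
  // snip: verify password, create session, etc.
  return u, nil
}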

When we extract separate concerns, the logic in the SUT is delegated to other components. We may use mocks to verify the interactions and accept the coupling. We may also forgo verifying such details through mocks and use the integration patterns I described in Testing Event-Based Workflows Using Node.js and RabbitMQ instead.

Key takeaways

  1. Mocks are useful, but they may impose some undesirable properties on your code. Use them carefully.

  2. Avoid creating spurious interfaces. Application services have no business being behind interfaces. Before creating an interface, stop and ask why you need it.

  3. Keep interfaces small and focused. Writing mocks manually can help us feel the cost of our interfaces. Mocking libraries can hide this cost, so we should be vigilant when using them. Like ORMs and DI containers, mocking libraries can hide problems that we would easily notice if we had to write test doubles manually.

  4. Large interfaces tend to grow larger, and broad interface mocking makes that growth easy.

  5. Mocks often duplicate application logic, hide important logic from test paths, and couple tests to internal program interactions. Refactor tests to reduce duplication when it occurs. Extract service logic as separate concerns, and consider other integration patterns for component interactions in tests.

  6. If you must verify internal component interactions by asserting method calls, do it sparingly.