Writing Better Errors In Go

It is no secret that nobody loves Go's error handling. We pretend to love it, or at least tolerate it, but deep down we know the feeling. It is not the if err != nil that I'm interested in today; that is one editor snippet away. It is how much information we lose through the standard library's error interface.

Go is billed as the language for writing robust web services, in part because of its error handling doctrine. "Errors are values and must be handled when they are encountered", we say. But what even is an error? What do we mean when we return an error?

The lowdown on errors

There are two facets to encountering an error value: the information content and the control flow decision. When we encounter an error value, we want to know what went wrong and what we must do next. This is summed up in the common pattern:

if err != nil {
    slog.Error("Message for the operator", "error", err)
    return possiblyWrappedErrorValue
}

These two facets also have different audiences in interactive applications like web services. There is the internal audience (the program and its operator) and the external audience (the caller of the service). Each reacts to the error differently.

The program wants to log rich information for the operator and then alter its behaviour accordingly. The caller wants only the minimum necessary information to know what to do next (retry, correct its input and retry, or just give up and blame the service).

Both facets of errors should be considered separately. More importantly, the audiences must not be coupled: it should be easy to separate the content for the client from that for the operator. How can we achieve this?

An example from Rust

Error handling in Rust is still evolving. It has been left to the community to build a robust mechanism that serves these use cases sufficiently. This is what the standard library has to say about errors.

core::error

pub trait Error
where
    Self: Debug + Display,

---

Error is a trait representing the basic expectations for error values,
i.e., values of type E in Result<T, E>.

Errors must describe themselves through the Display and Debug
traits. Error messages are typically concise lowercase sentences without
trailing punctuation:

let err = "NaN".parse::<u32>().unwrap_err();
assert_eq!(err.to_string(), "invalid digit found in string");

Errors may provide cause information. Error::source is generally
used when errors cross "abstraction boundaries". If one module must report
an error that is caused by an error from a lower-level module, it can allow
accessing that error via Error::source. This makes it possible for the
high-level module to provide its own errors while also revealing some of the
implementation for debugging.

This is brilliant. First, the Error trait clearly represents both audiences we introduced: the operator is served by Debug and the client by Display. This gives us two channels for reporting errors, so the audiences are not coupled. Furthermore, Error::source lets us expose lower-level errors for reporting without mixing them into the message of the error that wraps them.

Also, notice that the error is still a value. It is any value that implements Error, and it can be carried in the second type parameter E of Result<T, E>, or in any other data type. The reporting is not coupled to the control flow: you can imagine a different data type that carries an Error and influences control flow differently from Result. Rust meets our criteria for the basic affordances of an error value. How about Go?

An example from Go

To demonstrate how this works in Go, let us take a minimal example of tamper-proofing an event log. The mechanism is simple: we chain the events using a simple hash function and store the HEAD hash in a separate audit log. On verification, we reconstruct the HEAD hash and compare it to the value in the tree. The tree is validated separately.
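
The chaining step itself fits in a few lines. This is a minimal sketch under assumptions of our own: the Event type, its fields, the JSON serialisation and the use of SHA-256 are hypothetical choices, since chainEvents is not defined here. Each link hashes the previous link together with the serialised event, so altering any earlier event changes the final HEAD.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Event is a hypothetical event type used only for this sketch.
type Event struct {
	ID      int    `json:"id"`
	Payload string `json:"payload"`
}

// chainEvents is a hypothetical implementation: each link hashes the previous
// link together with the serialised event, and the last link is the HEAD.
func chainEvents(events []Event) (string, error) {
	prev := make([]byte, sha256.Size) // genesis link: all zero bytes
	for _, ev := range events {
		data, err := json.Marshal(ev)
		if err != nil {
			// The kind of serialisation fault that is not tampering.
			return "", fmt.Errorf("serialise event %d: %w", ev.ID, err)
		}
		h := sha256.New()
		h.Write(prev)
		h.Write(data)
		prev = h.Sum(nil)
	}
	return hex.EncodeToString(prev), nil
}

func main() {
	events := []Event{{ID: 1, Payload: "created"}, {ID: 2, Payload: "updated"}}
	head, err := chainEvents(events)
	if err != nil {
		panic(err)
	}
	fmt.Println(head)

	// Rewriting history changes the HEAD, which is what verification detects.
	events[0].Payload = "deleted"
	tampered, _ := chainEvents(events)
	fmt.Println(head == tampered) // false
}
```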

How can we communicate a detected tampering as an error? A first attempt may be as follows.

var ErrEventLogTampered = errors.New("event log tampered")

func VerifyEventLog(HEAD string, auditLogLength int, events []Event) error {
    if len(events) != auditLogLength {
        return fmt.Errorf("audit log and event log lengths differ: %w",
            ErrEventLogTampered)
    }

    headHash, err := chainEvents(events)
    if err != nil {
        return fmt.Errorf("could not chain events: %w", err)
    }

    if headHash != HEAD {
        return fmt.Errorf("event head hash %q differs from audit log HEAD %q: %w",
            headHash, HEAD, ErrEventLogTampered)
    }

    return nil
}

We have a package-level error variable ErrEventLogTampered that we can use to differentiate tampering errors from other program faults (like serialisation errors from chainEvents). How does this simple function fare on our criteria for error values?

Reporting and control flow

We have a package-level error variable that we can test for with errors.Is to drive control flow, and the message of the returned error can be logged for the operator. This is fine.

Lower-level source reporting

Error wrapping using fmt.Errorf lets us expose nested errors like a linked list. However, this facility has a terrible flaw: the wrapped error's message is formatted into the wrapper's message, so the entire error chain is always revealed at the first level. The following code snippet demonstrates this.

errA := errors.New("error A")
errB := fmt.Errorf("error B: %w", errA)
for err := errB; err != nil; err = errors.Unwrap(err) {
    fmt.Printf("error: %s\n", err)
}
// Output
// error: error B: error A
// error: error A

This is not great. If the caller inspected the error returned after a chainEvents failure, we would hope they see only "could not chain events", and could get the underlying cause, if they care, by calling errors.Unwrap. Instead, the inner error's string representation pollutes the outer one. 👎

We can test for underlying errors with errors.Is. But that is clunky in its own way, because we cannot know which functions in a package return a given package-level error variable without reading the source code.

Audience coupling

In the fine tradition of clear software requirements, an additional requirement arrives, and the conversation plays out as follows.

PM: When we detect tampering, we want to communicate the cause to the caller. For differing log lengths, only the text "different log lengths" should be reported to the caller. For different hashes, the expected hash and the actual hash should be reported to the caller.

Devs: Have you no pity?

Notice that fmt.Errorf cannot help us achieve this unless we parse the error's string to extract those bits of information. We can improve on that by implementing our own custom error type, but the information meant for the caller and the information meant for the operator still travel through the same stringly typed error interface.

type error interface {
    Error() string
}

This interface is the core of the problem. It collapses all the structure of an error into an opaque string. An anemic interface for errors was baked into the language too early, and we are stuck with it. Let us make a custom error type to see how that solution fares.


var ErrDifferentLogLengths = errors.New("audit log and event log lengths differ")

type WrongHashesError struct {
    Expected string `json:"expected_hash"`
    Computed string `json:"computed_hash"`
}

// Error implements error.
func (w WrongHashesError) Error() string {
    return fmt.Sprintf("computed hash %q does not match expected hash %q", w.Computed, w.Expected)
}

func VerifyEventLog(HEAD string, auditLogLength int, events []Event) error {
    if len(events) != auditLogLength {
        return ErrDifferentLogLengths
    }

    headHash, err := chainEvents(events)
    if err != nil {
        return fmt.Errorf("could not chain events: %w", err)
    }

    if headHash != HEAD {
        return WrongHashesError{Expected: HEAD, Computed: headHash}
    }

    return nil
}

We can test for the custom error type and extract the contained information using errors.As (fun fact: errors.Is will not help here, because it compares by equality and so only matches a WrongHashesError whose fields are identical to the target's 🫠). We can then log the intended messages for the operator and report the required information to the caller. Doable, but inconvenient. Just like with Rust, we need a bit more work to separate the audiences. But it is more inconvenient in Go, because in Rust we can use crates like thiserror and anyhow to remove the boilerplate.

By default, the error interface couples both audiences by having only one method for getting the error message. It takes extra care to separate them, and you will be swimming against the tide, as most Go packages do not. Some standard library packages expose custom error types, but many do not. You may even get some objections from your team, because separating them is not a habit among Go programmers. Please promote the better practice, especially if you are a package author.

Go's room for improvement

The upside to this situation is that Go can improve. Here are two possible improvements that would benefit error handling.

With a decent macro or comptime system, we could derive most of this boilerplate at compile time.
Allowing type-set constraints in return position would give us sum types. By returning a type list, we could explicitly signal the errors that a function can return. This would make it easier to understand a function's failure modes without reference to the entire package scope.

I won't hold my breath. I have no reason to believe that either of these will ever be supported in Go. But writing this cleared my head of the frustration that I felt, and that is enough. Until next time.