Formatting is Unreasonably Expensive for Embedded Rust


As a (late) part of the #rust2020 request, I wanted to talk about one of the paper-cuts that has become decently problematic for new and experienced users of Embedded Rust:

The size costs of formatting in Rust today are unreasonably expensive for users of Rust on bare metal embedded systems.

In particular there are two root issues I can see:

  1. The out-of-the-box formatting machinery in Rust, including format!(), write!(), and panic!(), is not program space efficient
  2. More problematically, there is no way to change or opt-out of this cost, and it can show up in unexpected places that are outside of developer control

To be clear: I don't think that these issues are a failing of the language, but rather a sign of growth. The current formatting solution has managed to work without issue for most people (and is still okay-ish in edge cases like embedded) for the better part of 4 years. However, as we push Rust into more use cases, the language will need to adapt if it is to grow into these spaces.

This post aims to illuminate the challenges faced today, and to sketch an attainable set of paths forward.

The Situation

As of December 2019, the machinery in the Rust language for formatting is generally optimized for three things:

  1. User flexibility and convenience (to format whatever and however, with as little effort as possible)
  2. Compile time complexity (don't blow up build times)
  3. Run time speed (printing shouldn't be slow)

This design trade-off works well for most users: it allows for maximum flexibility and dynamic-ish behavior at the cost of slightly larger (on the order of 10s-100s of KiB) and more complex (trait objects, recursion) generated code.

However for embedded users, size is a primary concern. Often we only have 32-128KiB for all code, as well as 16-64KiB of RAM (or less).

There is currently no way to disable, configure, or replace the provided formatting machinery, or to customize it in any way. Formatting (through panics) can be invoked in unexpected places, and can be difficult to remove consistently.
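
For intuition, here is a minimal sketch (not taken from the example project below) of the kind of innocuous-looking code that pulls in this machinery: any use of write!() routes its arguments through core::fmt::Arguments and core::fmt::write, which is where most of the size cost lives.

use core::fmt::Write;

// A minimal sketch: formatting a single integer over some serial writer.
// Behind the scenes, write! builds a core::fmt::Arguments value and hands
// it to core::fmt::write, pulling in Formatter::pad_integral and friends.
fn report<W: Write>(serial: &mut W, value: u32) {
    // Any error from the writer is ignored here for brevity.
    let _ = write!(serial, "value = {}\r\n", value);
}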

A quick disclaimer

I have not been able to find a good written overview of how the formatting machinery currently works, or what conversations were had about its design.

My understanding of how it works and the design trade offs that were made is largely informed by previous informal conversations with rustc contributors. If you know of a good written reference, please let me know!

Examples

All examples provided below are sourced from this repository. On each commit, there is a file called size.txt, which contains the output of arm-none-eabi-size demonstrating the total size of the code, as well as cargo bloat, which generally shows the Rust components that make up the total size.

This project is based on the Nordic nRF52, an Arm Cortex-M based micro-controller. All examples have been compiled with opt-level = 's', though the core/std library was not recompiled from source, so those components will be at opt-level = 3. Additionally, the following settings were made to the release profile:

[profile.release]
lto             = true
panic           = "abort"
debug           = true
incremental     = false
codegen-units   = 1
opt-level       = 's'

Baseline

For our baseline, we use a fairly regular minimal embedded Rust example. This firmware image will periodically blink an LED and output text to a serial port, and it uses the panic-persist crate, which stores panic messages to RAM and reboots the device in the case of a panic.

Here is the reference firmware main.rs:

#![no_std]
#![no_main]

// Panic provider crate
use panic_persist as _;

// Used to set the program entry point
use cortex_m_rt::entry;

// Provides definitions for our development board
use dwm1001::{
    nrf52832_hal::prelude::*,
    DWM1001,
};

#[entry]
fn main() -> ! {
    let mut board = DWM1001::take().unwrap();
    let mut timer = board.TIMER0.constrain();
    let mut toggle = false;

    loop {
        // board.leds.D10 - Bottom LED BLUE
        if toggle {
            board.leds.D10.enable();
        } else {
            board.leds.D10.disable();
        }

        toggle = !toggle;

        // nRF52 requires data to be in RAM, not Flash
        const MSG: &[u8] = "Hello, world!\r\n".as_bytes();
        let mut buf = [0u8; MSG.len()];
        buf.copy_from_slice(MSG);

        board.uart.write(&buf).unwrap();

        // Delay 1 second (in microseconds)
        timer.delay(1_000_000);
    }
}

Unfortunately, this generates a 14.2KiB binary, which isn't prohibitively large, but is somewhat unreasonable for the limited functionality demonstrated.

   text    data     bss     dec     hex filename
  14512       0       4   14516    38b4 tiny-nrf52

===============

File  .text   Size            Crate Name
0.2%   9.7%   812B              std <char as core::fmt::Debug>::fmt
0.2%   9.4%   790B              std core::str::slice_error_fail
0.2%   8.7%   728B              std core::fmt::Formatter::pad
0.2%   7.6%   634B        [Unknown] main
0.1%   6.6%   550B              std core::fmt::Formatter::pad_integral
0.1%   6.3%   530B              std core::fmt::write
0.1%   5.9%   490B              std core::slice::memchr::memchr
0.1%   4.3%   360B nrf52_hal_common <nrf52_hal_common::uarte::Error as core::fmt::Debug...
0.1%   4.3%   360B              std core::fmt::num::<impl core::fmt::Debug for usize>::fmt

Here we see that our actual code (main) is only the 4th largest item in the binary at 634 bytes, and nearly every other item at the top of the list is a component of formatting, or a Debug trait implementation.

panic-reset

As an initial comparison, we can use a panic provider crate that doesn't use the panic output message at all, and hope that the optimizer is smart enough to discard any undesired panic formatting entirely. In this case, we swap panic-persist for panic-reset:

 // Panic provider crate
-use panic_persist as _;
+use panic_reset as _;

You can see the full changes necessary in the silent panic diff, but without changing the primary functionality of our code at all (we still blink a light, print text to the UART, and reboot on a panic condition), we see a huge difference:

   text    data     bss     dec     hex filename
   1176       0       4    1180     49c tiny-nrf52

===============

File  .text Size       Crate Name
0.3%  72.5% 618B   [Unknown] main
0.1%  13.1% 112B cortex_m_rt Reset
0.0%   4.5%  38B    cortex_m cortex_m::peripheral::scb::<impl cortex_m::peripheral::SCB...
0.0%   0.7%   6B panic_reset rust_begin_unwind
0.0%   0.7%   6B cortex_m_rt ResetTrampoline
0.0%   0.7%   6B         std core::result::unwrap_failed
0.0%   0.7%   6B         std core::panicking::panic
0.0%   0.7%   6B         std core::panicking::panic_fmt
0.0%   0.7%   6B         std core::panicking::panic_bounds_check
0.0%   0.2%   2B cortex_m_rt HardFault_
0.0%   0.2%   2B cortex_m_rt DefaultPreInit
0.0%   0.2%   2B cortex_m_rt DefaultHandler_

The output of cargo-bloat is now the entirety of our produced binary, weighing in at 1.2KiB, or a 92% reduction in size.

Unfortunately this is a very "all or nothing" proposition, as we no longer have our panic messages available for debugging, and it still requires the optimizer to intelligently recognize that we never use any of the formatting machinery.
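
For intuition, a handler in the spirit of panic-reset (a sketch, not the crate's actual source) simply ignores the PanicInfo and resets the core, which is why all of the message formatting can be optimized away:

use core::panic::PanicInfo;

// Sketch of a panic-reset-style handler: the PanicInfo, and therefore any
// formatted panic message, is never touched, so the optimizer is free to
// discard the formatting code behind it.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    cortex_m::peripheral::SCB::sys_reset()
}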

But what happened? We never explicitly print, we never explicitly format, so where does the size come from?

Swallowing errors

The spoiler is that we call .unwrap() in two places in our example:

// Number 1: Unwrapping an Option<T>
let mut board = DWM1001::take().unwrap();

// Number 2: Unwrapping a Result<T, E>
board.uart.write(&buf).unwrap();

When you hit the error case of unwrap (either a None or an Err(E)), formatting code is used to display the Debug representation of the error value. Depending on the data returned, this can be pretty big.
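
For intuition, here is a simplified stand-in for what Result::unwrap effectively does (a sketch, not the actual libcore source); the Debug bound on the error payload is what drags in the formatting machinery:

use core::fmt::Debug;

// Sketch of the unwrap pattern: on the Err path the payload is rendered
// through its Debug impl inside the panic message, pulling in core::fmt.
fn unwrap_or_panic<T, E: Debug>(res: Result<T, E>) -> T {
    match res {
        Ok(val) => val,
        Err(e) => panic!("called `unwrap` on an `Err` value: {:?}", e),
    }
}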

Here, rather than entirely discarding the panics by switching to panic-reset, we instead handle the errors by moving the functionality of main into an inner function:

#[entry]
fn main() -> ! {
    let _ = inner_main();

    // We should never get here unless there is
    // an error condition
    panic!();
}

fn inner_main() -> Result<(), ()> {
    // The old contents of main
}

Additionally, we use ok_or and map_err to replace the error cases previously handled by .unwrap(), so that they instead return an empty tuple:

-    let mut board = DWM1001::take().unwrap();
+    let mut board = DWM1001::take().ok_or(())?;

-        board.uart.write(&buf).unwrap();
+        board.uart.write(&buf).map_err(drop)?;
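
Putting the pieces together, the body of inner_main ends up looking roughly like this (an abridged sketch based on the listing above, with the LED toggling omitted for brevity):

fn inner_main() -> Result<(), ()> {
    // Any failure is collapsed to Err(()), so no Debug impl is required.
    let mut board = DWM1001::take().ok_or(())?;
    let mut timer = board.TIMER0.constrain();

    loop {
        const MSG: &[u8] = "Hello, world!\r\n".as_bytes();
        let mut buf = [0u8; MSG.len()];
        buf.copy_from_slice(MSG);

        board.uart.write(&buf).map_err(drop)?;
        timer.delay(1_000_000);
    }
}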

You can see the entirety of the changes in the swallow error diff, which still uses the original panic-persist crate and still has some formatting in use, but the savings are significant:

   text    data     bss     dec     hex filename
   4708       0       4    4712    1268 tiny-nrf52

===============

File  .text   Size         Crate Name
0.2%  19.4%   728B           std core::fmt::Formatter::pad
0.2%  16.4%   616B    tiny_nrf52 tiny_nrf52::inner_main
0.2%  15.7%   588B           std core::fmt::num::imp::fmt_u32
0.1%  14.1%   530B           std core::fmt::write
0.1%   6.0%   224B panic_persist <&T as core::fmt::Display>::fmt
0.1%   5.5%   208B panic_persist <&mut W as core::fmt::Write>::write_char
0.0%   3.0%   112B   cortex_m_rt Reset
0.0%   2.8%   104B     [Unknown] __aeabi_memcpy

We were able to knock nearly 10KiB off the binary size, just by replacing two .unwrap() calls!

A quick interlude

In actuality, the majority of the 10KiB of code comes from unwrapping the error in sending data to the UART. To demonstrate how unpredictable the formatting code size really is, the following example shows what happens to the total code size when changing just this one line:

// A - nrf52_hal_common::uarte::Error
board.uart.write(&buf).unwrap();

// B - ()
board.uart.write(&buf).map_err(drop).unwrap();

// C - i32
board.uart.write(&buf).map_err(|_| 5).unwrap();

// D - &str
board.uart.write(&buf).map_err(|_| "argh").unwrap();

// E - None
board.uart.write(&buf).map_err(|_| <Option<()>>::None).unwrap();
 Size   Error Type
=====   ==========
14KiB   A - nrf52_hal_common::uarte::Error
 4KiB   B - ()
 5KiB   C - i32
10KiB   D - &str
14KiB   E - None

Bring your own Context

After our last change, we don't get any additional information about the errors that can occur: we just return Err(()), and we always hit the same panic!() message in main(), which means we don't know what actually happened, only that there was an error.

We can add back our own context by using human readable error messages to get "just enough context" for our use case. First, we change main() to gracefully reboot if inner_main() returns Ok(()), and to panic with an error if inner_main() returns an Err:

 #[entry]
 fn main() -> ! {
-    let _ = inner_main();
-    panic!();
+    match inner_main() {
+        Ok(()) => cortex_m::peripheral::SCB::sys_reset(),
+        Err(e) => panic!(e),
+    }
 }

Then we change inner_main() to return Result<(), &'static str> instead of Result<(), ()>:

-fn inner_main() -> Result<(), ()> {
+fn inner_main() -> Result<(), &'static str> {

And we change our error cases to return a little context:

-    let mut board = DWM1001::take().ok_or(())?;
+    let mut board = DWM1001::take().ok_or("Err getting board!")?;

-        board.uart.write(&buf).map_err(drop)?;
+        board.uart.write(&buf).map_err(|_| "Err writing hello!")?;

We even take advantage of panic-persist's ability to retrieve panic messages from the previous boot:

if let Some(msg) = panic_persist::get_panic_message_bytes() {
    // write the error message in reasonable chunks
    for i in msg.chunks(255) {
        board
            .uart
            .write(i)
            .map_err(|_| "Err writing error!")?;
    }
}

With all of these changes, we've increased the binary size a bit, but it is still far smaller than our original code, and we still have the ability to see where the errors occurred. See the context diff for all changes, but here are the new size numbers:

   text    data     bss     dec     hex filename
   4864       0       4    4868    1304 tiny-nrf52

===============

File  .text   Size         Crate Name
0.2%  19.9%   776B     [Unknown] main
0.2%  18.7%   728B           std core::fmt::Formatter::pad
0.2%  15.1%   588B           std core::fmt::num::imp::fmt_u32
0.1%  13.6%   530B           std core::fmt::write
0.1%   5.7%   224B panic_persist <&T as core::fmt::Display>::fmt
0.1%   5.3%   208B panic_persist <&mut W as core::fmt::Write>::write_char
0.0%   2.9%   112B   cortex_m_rt Reset
0.0%   2.7%   104B     [Unknown] __aeabi_memcpy

Wrapping up our examples

We've now been able to bring our code back to roughly equivalent functionality, at around 1/3 of the original code size.

For a total summary of code sizes:

   text    data     bss     dec     hex example
  =====    ====     ===   =====    ==== =======
   1176       0       4    1180     49c panic-reset
   4708       0       4    4712    1268 without context
   4864       0       4    4868    1304 with context
  14512       0       4   14516    38b4 original with unwrap

Workarounds

There are a couple of workarounds to these issues that are possible today, with various trade-offs:

Using panic-halt or panic-reset

A user could choose to use an existing panic crate that discards panic messages completely. This helps to handle cases where code size increases were caused by an errant .unwrap() or panic!().

This reduces the code size greatly, with extremely minor code changes.

However, this doesn't help with normal formatting cases, and forces the user to do without panic messages of any kind. It also relies on the optimizer to eliminate these code paths, which is not necessarily guaranteed, especially without codegen-units = 1, incremental = false, and lto = true set.

Manually avoiding formatting, unwraps, etc.

A user could carefully avoid the use of formatting and ensure that they don't use unwrap (as we have done here); however, this puts a lot of manual review work on the end developer. Tools like panic-never can help automate the discovery of these problems, but cannot assist in fixing them.
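
As a sketch of the kind of hand-rolled pattern this implies (reusing the board and uart handles from the examples above), handling both arms explicitly means the error value is never routed through core::fmt:

// Sketch: handle the error arm explicitly instead of calling .unwrap(),
// so no Debug formatting of the error is ever needed.
match board.uart.write(&buf) {
    Ok(_) => {}
    Err(_) => cortex_m::peripheral::SCB::sys_reset(),
}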

Furthermore, unwraps or panics could exist in code provided by another crate, making it difficult to patch directly without waiting for a PR to land, or maintaining a fork of the project.

Replacing the built-in format machinery

The user could choose to replace their usage of the format machinery with libraries like japaric's μfmt, which implements a similar API to the built-in Display, Debug, format!(), and write!() interfaces, but in a more efficient, if more limited, fashion.
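
A minimal sketch of what using μfmt looks like (names per the ufmt crate's uWrite trait and uwrite! macro as I understand them; the UartWriter type here is a hypothetical stand-in for a real UART wrapper):

use ufmt::{uwrite, uWrite};

// Hypothetical writer wrapping a UART peripheral.
struct UartWriter;

impl uWrite for UartWriter {
    type Error = core::convert::Infallible;

    fn write_str(&mut self, s: &str) -> Result<(), Self::Error> {
        // Push s.as_bytes() out over the UART here.
        Ok(())
    }
}

fn log_value(writer: &mut UartWriter, value: u32) {
    // Monomorphized formatting: no trait objects, no core::fmt machinery.
    let _ = uwrite!(writer, "value = {}", value);
}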

Unfortunately μfmt is not a drop-in replacement, and it won't help with cases where an unexpected panic or unwrap is the source of the code size increase, as those still rely on the official Debug trait.

This option would combine well with the previous workarounds by avoiding an expensive panic while also addressing the needs to format code within a project.

Long term Solutions

Let's look back at my two problem statements:

  1. The out-of-the-box formatting machinery in Rust, including format!(), write!(), and panic!(), is not program space efficient
  2. More problematically, there is no way to change or opt-out of this cost, and it can show up in unexpected places that are outside of developer control

Solution 1: Changing Panic

The largest issue, in my opinion, is that formatting through panic is impossible to completely opt out of.

One could imagine an alternative, opt-in interface for panicking, one that doesn't invoke any of the formatting machinery while still giving some context.
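
As a sketch of what is already achievable without formatting, PanicInfo::location() exposes the panicking file and line as plain data; the store_in_ram helper below is hypothetical:

use core::panic::PanicInfo;

// Hypothetical helper that stashes raw bytes in a RAM region that survives
// a reboot (in the spirit of what panic-persist does for formatted messages).
fn store_in_ram(_bytes: &[u8]) { /* ... */ }

// Sketch: record only the panic location; file() and line() are plain data,
// so no core::fmt code is required at all.
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
    if let Some(loc) = info.location() {
        store_in_ram(loc.file().as_bytes());
        store_in_ram(&loc.line().to_le_bytes());
    }
    cortex_m::peripheral::SCB::sys_reset()
}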

The panic_info_message feature currently in nightly (see its tracking issue) has a couple of aspects that could make it easier to implement a no_std panic handler that doesn't require the default formatting machinery, though it is still under discussion.

Note that this solution makes no attempt to address the default formatting machinery as it currently stands.

Solution 2: Tweaking Format

A second option would be to provide two or more implementations of the formatting machinery in libcore/libstd, and allow the user to select which one is used at compile time. This could show up in the Cargo.toml of your application:

# Options are: 'smol', 'fast', and 'balanced'.
# Defaults to 'balanced'
format-machinery = "smol"

This would allow for multiple end-user options, without forcing any component of how formatting works under the hood to be stabilized. However, this would require changes to the upstream standard library whenever a new optimization is needed.

This solution does not address the problem of unexpected panics introducing an unwanted formatting cost, but this may be acceptable if the formatting machinery is sufficiently compact.

Solution 3: Replacing Format

Think "panic handler crates", but instead, "format handler crates". This one is a bit trickier, but in general, it would require:

I have a feeling this is the best long-term solution; however, I would expect it to be relatively slow to implement for a few reasons:

  1. Format handling is fairly "baked in" to the compiler itself
  2. Stabilizing low level parts of the compilation process can take time to get right, as seen in the slow push for stable panic handlers and global or local allocators.

Summing it all up

In general, I think we need all three solutions, as well as some of the workarounds. For now, making it easier to write lighter-weight panic handlers (as in Solution #1) and using crates like μfmt can help reduce the cost in the short term, and we should document those patterns.

In the medium term (in 2020), I'd like to see Solution #2 happen, as a way to test out implementing a new formatting handler, but without being forced to stabilize a particular interface yet.

In the long term (2020 and beyond), I'd like to see Solution #3 happen, once we have learned from implementing Solution #2.