Postcard 1.0.0 Release

2022-06-21

Quoting from the README:

Postcard is a #![no_std] focused serializer and deserializer for Serde. Postcard aims to be convenient for developers in constrained environments, while allowing for flexibility to customize behavior as needed.

I first published postcard back in 2019, as a way to get "all the good stuff from Serde" in a format that would work for embedded systems. Since then, people all over Rust have started using Postcard as a general purpose, compact, and flexible Serde format, not just embedded folks! Now, 3 years later or so, it has reached 1.0.0!

As of yesterday, June 20th, postcard v1.0.0 has been released.

This blog post is an extended overview of the changes since the last stable release, v0.7.3.

This work was made possible thanks to sponsorship from the Mozilla Corporation, and I'd like to thank them again for their support!

`varint` all the things!

The largest user-visible change is that more integers are now variably encoded on the wire.

Background

Previously, only enum discriminants and the length of slices were variably encoded. This was typically an "easy win" for enums as it is rare to have enums with more than 127 variants, which meant saving three bytes per enum on the wire. Additionally it was generally a positive for slices, as it is also rare to send slices with counts close to the max usize amount. All other integers were sent as an array of little-endian bytes, basically what you'd get if you called the .to_le_bytes() function on that integer.

However, there was a subtle compatibility issue here: serde doesn't actually have a wire type for usize or isize. This means when you serialize a usize on a 64-bit platform, it is eight bytes on the wire. When you serialize a usize on a 32-bit platform, it is four bytes on the wire.

Since postcard is designed to facilitate communication between dissimilar systems, especially embedded systems and desktops/servers, this was a real problem!

The actual change

In order to resolve this, postcard now encodes ALL integers larger than one byte as a variable length integer. This includes u16, u32, u64, u128, i16, i32, i64, and i128. Enum discriminants and slice lengths are still encoded as variable length integers. Now, we still encode varint(usize) as a varint(u32) on a 32-bit platform, but with variable length encoding, we can also now detect when a "larger system" is sending too big of a number to a "smaller" system. This detection now correctly leads to a reported error while decoding.
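As a rough sketch of that detection (not postcard's actual implementation), here is a decoder for a 32-bit target that rejects values that do not fit. It assumes the varint layout used by the examples in this post: seven payload bits per byte, least significant group first, with the high bit marking "more bytes follow". The names `decode_varint_u32` and `VarintError` are made up for illustration.

```rust
/// Illustrative error type; this is not postcard's real error enum.
#[derive(Debug, PartialEq)]
pub enum VarintError {
    /// The input ended while the continuation bit was still set.
    UnexpectedEnd,
    /// The encoded value does not fit in the target integer type.
    Overflow,
}

/// Decode a varint into a `u32`.
/// Returns the value and the number of bytes consumed.
pub fn decode_varint_u32(input: &[u8]) -> Result<(u32, usize), VarintError> {
    let mut value: u32 = 0;
    for (i, &byte) in input.iter().enumerate() {
        // A u32 needs at most five groups of seven bits.
        if i >= 5 {
            return Err(VarintError::Overflow);
        }
        let payload = (byte & 0x7F) as u32;
        // The fifth byte may only carry the top four bits of a u32.
        if i == 4 && payload > 0x0F {
            return Err(VarintError::Overflow);
        }
        value |= payload << (7 * i);
        // A clear high bit marks the final byte.
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    Err(VarintError::UnexpectedEnd)
}
```

With this sketch, a sender that encodes 2^32 (which is a valid u64 or 64-bit usize) produces five bytes whose last payload is 0x10, and the decoder reports `Overflow` instead of silently truncating.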

As a specific example, here's how some u32s would be encoded in postcard 0.7 and 1.0:

Number	Hex	Postcard 0.7	Postcard 1.0
64	`0x0000_0040`	`[0x40, 0x00, 0x00, 0x00]`	`[0x40]`
69420	`0x0001_0f2c`	`[0x2C, 0x0F, 0x01, 0x00]`	`[0xAC, 0x9E, 0x04]`
2000000000	`0x7735_9400`	`[0x00, 0x94, 0x35, 0x77]`	`[0x80, 0xA8, 0xD6, 0xB9, 0x07]`

In most cases (whenever your number is not at the very top of the integer range), this will translate to either a reduction or neutral change in wire size. Performance impact has been measured and is minimal (+/- single digit percentages in benchmarks) due to the simple nature of the encoding and decoding scheme.

More details on the encoding scheme and additional examples are provided in the Unsigned Integer Encoding section of the format specification.
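For readers who prefer code to tables, the unsigned encoding can be sketched in a few lines. This is a minimal illustration under the assumption that the scheme is the usual "seven bits per byte, low group first, high bit as continuation flag" layout that the examples above show; `encode_varint_u32` is a hypothetical helper, not a postcard function.

```rust
/// Encode a `u32` as a varint: seven bits per byte, least significant
/// group first, with the high bit set on every byte except the last.
pub fn encode_varint_u32(mut value: u32) -> Vec<u8> {
    // A u32 never needs more than five bytes in this scheme.
    let mut out = Vec::with_capacity(5);
    loop {
        let low = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            // Last group: leave the continuation bit clear.
            out.push(low);
            return out;
        }
        // More groups follow: set the continuation bit.
        out.push(low | 0x80);
    }
}
```

Calling `encode_varint_u32(69420)` walks through the groups 0x2C, 0x1E, and 0x04, which matches the three-byte row in the table above.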

Zigzag Encoding

One issue with this encoding scheme is that two's complement signed numbers use the most significant bit to store the sign. This means that a small negative number like -1_i32 would be encoded as 0xFFFF_FFFF, the WORST case for this encoding scheme!

To address this, signed integers are first Zigzag encoded, then encoded with variable length. Zigzag encoding stores the sign bit in the LEAST significant bit of the integer, rather than the MOST significant bit. This means that signed integers of low absolute magnitude (e.g. 1, -1) can be encoded using a much smaller space.

For example, the following 16-bit signed numbers would be encoded as follows:

Dec	Hex*	Zigzag (hex)
0	`0x00_00`	`0x00_00`
-1	`0xFF_FF`	`0x00_01`
1	`0x00_01`	`0x00_02`
63	`0x00_3F`	`0x00_7E`
-64	`0xFF_C0`	`0x00_7F`
64	`0x00_40`	`0x00_80`
-65	`0xFF_BF`	`0x00_81`
32767	`0x7F_FF`	`0xFF_FE`
-32768	`0x80_00`	`0xFF_FF`

*: This column is represented in sixteen-bit, two's complement form
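The zigzag mapping itself is a pair of shifts and an XOR. The following is a minimal sketch for 16-bit values, using the common formulation of zigzag encoding; the function names are illustrative and are not taken from postcard.

```rust
/// Map a signed 16-bit integer to an unsigned one so that values of
/// small magnitude (positive or negative) become small numbers.
pub fn zigzag_encode_i16(n: i16) -> u16 {
    // `n >> 15` is an arithmetic shift: all ones for negatives, zero otherwise.
    ((n << 1) ^ (n >> 15)) as u16
}

/// Inverse of `zigzag_encode_i16`.
pub fn zigzag_decode_u16(z: u16) -> i16 {
    // The low bit holds the sign; the remaining bits hold the magnitude.
    ((z >> 1) as i16) ^ -((z & 1) as i16)
}
```

For instance, `zigzag_encode_i16(-65)` gives 0x0081 and `zigzag_decode_u16(0xFFFF)` gives -32768, in line with the rows of the table.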

Comparing Postcard 0.7 to 1.0:

Dec	Postcard 0.7	Postcard 1.0
0	`[0x00, 0x00]`	`[0x00]`
-1	`[0xFF, 0xFF]`	`[0x01]`
1	`[0x01, 0x00]`	`[0x02]`
63	`[0x3F, 0x00]`	`[0x7E]`
-64	`[0xC0, 0xFF]`	`[0x7F]`
64	`[0x40, 0x00]`	`[0x80, 0x01]`
-65	`[0xBF, 0xFF]`	`[0x81, 0x01]`
32767	`[0xFF, 0x7F]`	`[0xFE, 0xFF, 0x03]`
-32768	`[0x00, 0x80]`	`[0xFF, 0xFF, 0x03]`

More details on the encoding scheme and additional examples are provided in the Signed Integer Encoding section of the format specification.
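Putting the two steps together, a signed value is zigzag encoded first and the result is then written as a varint. Here is a small sketch for i16, assuming the same seven-bits-per-byte varint layout as the unsigned examples; `encode_i16` is a hypothetical name and not part of postcard's API.

```rust
/// Encode an `i16` the way this post describes: zigzag first, then varint.
pub fn encode_i16(n: i16) -> Vec<u8> {
    // Step 1: zigzag, so the sign ends up in the least significant bit.
    let mut z = ((n << 1) ^ (n >> 15)) as u16;

    // Step 2: varint, seven bits per byte with the low group first.
    let mut out = Vec::with_capacity(3);
    loop {
        let low = (z & 0x7F) as u8;
        z >>= 7;
        if z == 0 {
            // Final byte: continuation bit stays clear.
            out.push(low);
            return out;
        }
        out.push(low | 0x80);
    }
}
```

Values between -64 and 63 fit in a single byte, and `encode_i16(64)` is the first positive value that spills into two bytes.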

An escape hatch

In some cases, it is not desirable to always use variable length encoded data. In particular, I've had reports from users that have serialized 16-bit floating point numbers (fp16) as u16s, which often hit the "worst case" encoding size, increasing their message sizes by 50%.
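To make the fp16 case concrete: a u16 takes three varint bytes as soon as it no longer fits in 14 bits, and many half-precision bit patterns are above that threshold. The sketch below assumes the seven-bits-per-byte varint layout from earlier in this post; `varint_len_u16` is a hypothetical helper, and the constants in the comments are the standard IEEE 754 binary16 bit patterns.

```rust
/// Number of bytes a `u16` occupies once varint encoded.
pub fn varint_len_u16(value: u16) -> usize {
    match value {
        0..=0x7F => 1,      // fits in 7 bits
        0x80..=0x3FFF => 2, // fits in 14 bits
        _ => 3,             // needs all 16 bits
    }
}

// Some fp16 bit patterns, stored as raw u16 values:
//   1.0  is 0x3C00, which still fits in two bytes
//   2.0  is 0x4000, which already needs three bytes
//   -1.0 is 0xBC00, which needs three bytes (the sign bit is bit 15)
```

Any negative fp16 value, and any value with magnitude of 2.0 or more, lands in the three-byte case, which is where the 50% growth over a fixed two-byte encoding comes from.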

Additionally, although postcard has always encoded data in little-endian format, some users have asked for the ability to encode big-endian data, often for zero-copy purposes or compatibility with other message formats.

For this reason, postcard has added a pair of convenience wrapper types, FixintLE<T> and FixintBE<T>, which do not use the encoding schemes described above, and are provided for all 16-128 bit integer types.

As a specific example, here's how some u32s would be encoded using these types:

Number	Hex	`FixintLE<u32>`	`FixintBE<u32>`
64	`0x0000_0040`	`[0x40, 0x00, 0x00, 0x00]`	`[0x00, 0x00, 0x00, 0x40]`
69420	`0x0001_0f2c`	`[0x2C, 0x0F, 0x01, 0x00]`	`[0x00, 0x01, 0x0F, 0x2C]`
2000000000	`0x7735_9400`	`[0x00, 0x94, 0x35, 0x77]`	`[0x77, 0x35, 0x94, 0x00]`

As a note, these were (accidentally) not included in the postcard 1.0.0 release, but will be released shortly as part of postcard v1.0.1.

Additionally, as postcard is "just another" serde backend, it is always possible to override the default serialization and deserialization methods for your types if you find that the default options do not suit you well. Under the hood, these types just implement the Serialize and Deserialize traits manually as such:

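use serde::{Deserialize, Serialize, Serializer};

// Serialize the wrapped u32 as a fixed four-byte little-endian array,
// which bypasses the variable length encoding.
impl Serialize for FixintLE<u32> {
    #[inline]
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        self.0.to_le_bytes().serialize(serializer)
    }
}

// Read the four bytes back and rebuild the u32 from them.
impl<'de> Deserialize<'de> for FixintLE<u32> {
    #[inline]
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        <_ as Deserialize>::deserialize(deserializer)
            .map(u32::from_le_bytes)
            .map(Self)
    }
}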

A written and stable wire specification

Although postcard's wire format has rarely changed over the years, it has never officially had a "stable" wire format. This has led to some questions, particularly for folks interested in using it more formally, or looking to write protocol implementations in other languages.

Postcard now has a documented and tested wire specification, located in the repository, and in a hosted format.

This specification is available under a CC-BY-SA 4.0 license, and is written in markdown using mdBook.

The specification includes two main parts:

An elaborated definition of the upstream Serde Data Model, which describes the fundamental types that can be serialized or deserialized.
The Postcard Wire Format, which describes how each type is encoded to the wire, or decoded from the wire.

Starting with postcard 1.0.0, breaking changes to the wire format are also considered breaking changes to the library, and will require a version bump to postcard 2.0.0.

Revamped "Flavors"

Postcard has had a concept of "flavors", which acted as "middlewares" for the serialization process. This enabled certain functionality, such as encoding the serialized data using the COBS encoding scheme, as the data was serialized.

This was convenient, as multiple steps could be done during serialization, without requiring additional temporary buffers.
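To illustrate the idea of stacking steps without an intermediate buffer, here is a small sketch of the middleware pattern. It is NOT postcard's flavor trait: `ByteSink`, `VecSink`, and `ChecksumSink` are hypothetical names, and a one-byte additive checksum stands in for COBS to keep the example short.

```rust
/// Hypothetical stand-in for a serialization "flavor".
pub trait ByteSink {
    type Output;
    /// Accept one serialized byte.
    fn push(&mut self, byte: u8);
    /// Finish the stream and hand back the result.
    fn finalize(self) -> Self::Output;
}

/// Terminal sink: collects bytes into a `Vec<u8>`.
#[derive(Default)]
pub struct VecSink(Vec<u8>);

impl ByteSink for VecSink {
    type Output = Vec<u8>;
    fn push(&mut self, byte: u8) {
        self.0.push(byte);
    }
    fn finalize(self) -> Vec<u8> {
        self.0
    }
}

/// Middleware: forwards every byte to the inner sink as it arrives and
/// appends a one-byte wrapping checksum when the stream is finalized.
pub struct ChecksumSink<S: ByteSink> {
    inner: S,
    sum: u8,
}

impl<S: ByteSink> ChecksumSink<S> {
    pub fn new(inner: S) -> Self {
        Self { inner, sum: 0 }
    }
}

impl<S: ByteSink> ByteSink for ChecksumSink<S> {
    type Output = S::Output;
    fn push(&mut self, byte: u8) {
        self.sum = self.sum.wrapping_add(byte);
        self.inner.push(byte);
    }
    fn finalize(mut self) -> S::Output {
        let sum = self.sum;
        self.inner.push(sum);
        self.inner.finalize()
    }
}
```

A serializer written against `ByteSink` could be handed `ChecksumSink::new(VecSink::default())` and would produce checksummed output in a single pass, since each byte is processed by every layer as it is emitted.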

However, I had not previously been able to find a good way to bring this functionality to the deserialization side.

Luckily, while spending time on the Postcard interface during the 1.0 release process, I was able to find one! As of postcard 1.0, postcard now supports both Serialization Flavors as well as Deserialization Flavors.

There are some trade-offs for using flavors when compared to doing each step of the process separately (generally a speed vs memory usage trade-off), so make sure you take a look at the docs if you have questions!

Updated `cobs`

Historically, postcard has used a fork of the cobs crate, due to some necessary features that had not been merged upstream. A few months ago, I took over maintenance of the cobs crate, and have released a few v0.2.x versions that have integrated the changes that postcard needed, which means the forked postcard-cobs crate is no longer necessary!

There are generally no functional changes from a postcard perspective, however a few functions are now more robust (they do not panic when handed unexpected or malformed data), and the take_from_bytes_cobs function, which was generally incorrect, has been fixed.
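As an illustration of what "does not panic on malformed data" means for a COBS decoder, here is a minimal sketch that returns `None` instead of indexing out of bounds. It is not the cobs crate's API or implementation; `cobs_decode` is a hypothetical function, and it assumes the input is a single frame with the trailing 0x00 delimiter already removed.

```rust
/// Decode one COBS frame (without its trailing zero delimiter).
/// Returns `None` for malformed input rather than panicking.
pub fn cobs_decode(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        let code = input[i] as usize;
        // A zero byte can never appear inside an encoded frame.
        if code == 0 {
            return None;
        }
        // The code byte claims `code - 1` data bytes follow it.
        let end = i + code;
        if end > input.len() {
            return None; // truncated frame
        }
        let block = &input[i + 1..end];
        if block.contains(&0) {
            return None;
        }
        out.extend_from_slice(block);
        i = end;
        // Every block except a full 254-byte one (code 0xFF) stands for a
        // zero in the original data, unless it is the last block.
        if code != 0xFF && i < input.len() {
            out.push(0);
        }
    }
    Some(out)
}
```

For example, `[0x03, 0x11, 0x22, 0x02, 0x33]` decodes to `[0x11, 0x22, 0x00, 0x33]`, while a frame such as `[0x05, 0x11]`, whose code byte points past the end of the input, yields `None`.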

Thanks to Allen Welkie for sharing the cobs crate with me!

Get back `#[inline]`!

While validating performance, I noticed that I was missing #[inline] attributes on most serialization and deserialization functions, which means that without LTO, it would not be possible to inline these often very small functions. This is particularly important because they are called from serde functions, so each call that cannot be inlined adds overhead on a very hot path.

In benchmarks, this significantly increased the serialization performance by up to 5x, and deserialization performance by up to 2x. There was a slight improvement in builds that already used LTO (common in embedded builds), but far less dramatic.

Note that these are NOT #[inline(always)] markers, which means that the compiler will still make inlining decisions based on its own heuristics. This means in typical release builds, which are opt-level = 3, some increase in code size is expected. However, when changing the optimization level to opt-level = 's', the code returns to typical sizes seen in postcard v0.7.
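As a small illustration of the kind of function this applies to, here is a sketch of a tiny buffer-writing helper marked with #[inline]. The types `SliceWriter` and `BufferFull` are made up for this example and are not postcard's. The attribute is only a hint that makes the body available to callers in other crates; the compiler still decides whether inlining is worthwhile.

```rust
/// Illustrative error for a full output buffer.
#[derive(Debug, PartialEq)]
pub struct BufferFull;

/// Illustrative writer over a caller-provided byte slice.
pub struct SliceWriter<'a> {
    buf: &'a mut [u8],
    used: usize,
}

impl<'a> SliceWriter<'a> {
    pub fn new(buf: &'a mut [u8]) -> Self {
        Self { buf, used: 0 }
    }

    /// Tiny and called once per output byte: a good candidate for a hint.
    /// `#[inline]` is a hint, unlike `#[inline(always)]`.
    #[inline]
    pub fn try_push(&mut self, byte: u8) -> Result<(), BufferFull> {
        let slot = self.buf.get_mut(self.used).ok_or(BufferFull)?;
        *slot = byte;
        self.used += 1;
        Ok(())
    }

    /// Number of bytes written so far.
    #[inline]
    pub fn used(&self) -> usize {
        self.used
    }
}
```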

Experimental Features

Finally, thanks to the efforts of Lachlan S., this release also included two experimental features:

The first is a MaxSize derive macro and trait, which allows for automatically calculating the maximum serialized size of a postcard message. This was an often asked for feature, especially in embedded use cases where it is important to have suitably sized serialization and deserialization buffers.
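To give a feel for what such a calculation looks like, here is a hand-written sketch of the idea. It uses a local stand-in trait, `MaxWireSize`, rather than postcard's actual trait or derive, and the sizes assume the 1.0 encoding described in this post (varints for multi-byte integers) plus a one-byte tag in front of an Option's payload. The `Reading` struct is a made-up example type.

```rust
/// Local stand-in for a "maximum serialized size" trait.
pub trait MaxWireSize {
    const MAX_WIRE_SIZE: usize;
}

// Single-byte types are written as-is.
impl MaxWireSize for bool { const MAX_WIRE_SIZE: usize = 1; }
impl MaxWireSize for u8 { const MAX_WIRE_SIZE: usize = 1; }

// Varint worst cases: ceil(bits / 7) bytes.
impl MaxWireSize for u16 { const MAX_WIRE_SIZE: usize = 3; }
impl MaxWireSize for u32 { const MAX_WIRE_SIZE: usize = 5; }
impl MaxWireSize for u64 { const MAX_WIRE_SIZE: usize = 10; }

// One tag byte plus the payload in the `Some` case.
impl<T: MaxWireSize> MaxWireSize for Option<T> {
    const MAX_WIRE_SIZE: usize = 1 + T::MAX_WIRE_SIZE;
}

// Fixed-size arrays carry no length prefix.
impl<T: MaxWireSize, const N: usize> MaxWireSize for [T; N] {
    const MAX_WIRE_SIZE: usize = N * T::MAX_WIRE_SIZE;
}

/// Hypothetical message type; a derive macro would generate the impl below.
pub struct Reading {
    pub sensor_id: u8,
    pub timestamp: u64,
    pub value: Option<u32>,
}

// A struct is the sum of its fields.
impl MaxWireSize for Reading {
    const MAX_WIRE_SIZE: usize =
        u8::MAX_WIRE_SIZE + u64::MAX_WIRE_SIZE + <Option<u32>>::MAX_WIRE_SIZE;
}
```

Because the result is a constant, it can size a buffer at compile time, for example `let buf = [0u8; Reading::MAX_WIRE_SIZE];`, which here is 17 bytes.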

The second is a Schema derive macro and trait, which renders a description of the schema of the message, including all data contained by any given type. This is intended to be used for two main purposes: producing a human readable format, suitable for inclusion in documentation, and producing a machine readable format (such as JSON), which can be used to generate serialization, deserialization, or parsing code, in order to better support other languages and verification efforts.

These features are behind the experimental-derive feature, and do NOT yet have stability guarantees behind them. That being said, I expect them to land in the next releases of postcard with only minor improvements and tweaks. Give them a try, and let me know what you think!

Wrapping up

Thank you all for using postcard over the years! It started as a way for me to learn and scratch my own itches, and it has been great to hear how it has helped others now too.

Enjoy the release!

🍾

James Munns