Internal mechanics

Internal workings of libcbor are mostly derived from the specification. The purpose of this document is to describe technical choices made during design & implementation and to explicate the reasoning behind those choices.

Terminology

MTB

Major Type Byte

https://www.rfc-editor.org/rfc/rfc8949.html#section-3.1

DST

Dynamically Sized Type

Type whose storage requirements cannot be determined during compilation (originated in the Rust community)

Conventions

API symbols start with the cbor_ or CBOR_ prefix; internal symbols have the _cbor_ or _CBOR_ prefix.

General notes on the API design

The API design has two main driving principles:

  1. Let the client manage the memory as much as possible

  2. Behave exactly as specified by the standard

Combining these two principles in practice turns out to be quite difficult. Indefinite-length strings, arrays, and maps require the client to handle every fixed-size chunk explicitly in order to

  • ensure the client never runs out of memory due to libcbor

  • use realloc() sparsely and predictably [1]

    • provide strong guarantees about its usage (to prevent latency spikes)

    • provide APIs to avoid realloc() altogether

  • allow proper handling of (streamed) data bigger than available memory
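The chunk-by-chunk contract described above can be pictured with a small, self-contained sketch. It does not use libcbor; `chunk_stats`, `on_chunk()` and `consume_in_chunks()` are hypothetical names chosen for illustration. The point is only that a client which sees one bounded chunk at a time can process a payload of any size without ever holding it in memory as a whole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side state; not a libcbor type. */
struct chunk_stats {
    size_t chunks;      /* number of chunks seen so far */
    size_t total_bytes; /* sum of all chunk lengths */
    uint32_t checksum;  /* simple additive checksum over the payload */
};

/* Hypothetical handler invoked once per chunk. It only touches `len`
 * bytes, so memory use is bounded by the chunk size. */
void on_chunk(struct chunk_stats *st, const unsigned char *chunk, size_t len)
{
    assert(st != NULL);
    st->chunks++;
    st->total_bytes += len;
    for (size_t i = 0; i < len; i++)
        st->checksum += chunk[i];
}

/* Walk a stream in pieces of at most `chunk_size` bytes. */
struct chunk_stats consume_in_chunks(const unsigned char *stream,
                                     size_t stream_len, size_t chunk_size)
{
    struct chunk_stats st = {0, 0, 0};
    size_t off = 0;

    if (chunk_size == 0)
        return st; /* nothing sensible to do with zero-sized chunks */

    while (off < stream_len) {
        size_t len = stream_len - off;
        if (len > chunk_size)
            len = chunk_size;
        on_chunk(&st, stream + off, len);
        off += len;
    }
    return st;
}
```

Calling `consume_in_chunks(data, 8, 3)` on an 8-byte buffer invokes the handler three times (3 + 3 + 2 bytes); nothing is reallocated or copied along the way.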

Coding style

This code loosely follows the Linux kernel coding style. Tabs are tabs, and they are 4 characters wide.

Memory layout

CBOR is very dynamic in the sense that it contains many data elements of variable length, sometimes even indefinite length. This section describes the internal representation of all CBOR data types.

Generally speaking, data items consist of three parts: a generic handle, the associated metadata, and the actual data.

type cbor_item_t

Represents the item. Used as an opaque type

cbor_type type

Type discriminator

size_t refcount

Reference counter. Used by cbor_decref(), cbor_incref()

union cbor_item_metadata metadata

Union discriminated by type. Contains type-specific metadata

unsigned char *data

Pointer to the actual data. Small, fixed-size items (Types 0 & 1 – Positive and negative integers, Type 6 – Semantic tags, Type 7 – Floats & control tokens) are allocated as a single memory block.

Consider the following snippet

cbor_item_t * item = cbor_new_int8();

then the memory is laid out as follows

+-----------+---------------+---------------+-----------------------------------++-----------+
|           |               |               |                                   ||           |
|   type    |   refcount    |   metadata    |              data                 ||  uint8_t  |
|           |               |               |   (= item + sizeof(cbor_item_t))  ||           |
+-----------+---------------+---------------+-----------------------------------++-----------+
^                                                                                ^
|                                                                                |
+--- item                                                                        +--- item->data
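The single-block layout in the diagram can be approximated by the sketch below. It is a simplified model, not libcbor's actual definition: `struct toy_item` and `toy_new_int8()` are hypothetical stand-ins, the metadata union is reduced to a placeholder, and the initial reference count of 1 is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for cbor_item_t. The field order follows the text;
 * the metadata union is a placeholder. */
struct toy_item {
    int type;
    size_t refcount;
    union { int width_bits; } metadata;
    unsigned char *data;
};

/* Allocate the handle and its one-byte payload as one block, so that
 * item->data == (unsigned char *)item + sizeof(struct toy_item). */
struct toy_item *toy_new_int8(void)
{
    struct toy_item *item = malloc(sizeof(*item) + sizeof(uint8_t));

    if (item == NULL)
        return NULL;
    item->type = 0;                /* Type 0: unsigned integer */
    item->refcount = 1;            /* assumption: the creator holds one reference */
    item->metadata.width_bits = 8;
    item->data = (unsigned char *)item + sizeof(*item);
    *item->data = 0;
    return item;
}
```

Because handle and payload share one allocation, a single `free(item)` releases both.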

Dynamically sized types (Type 2 – Byte strings, Type 3 – UTF-8 strings, Type 4 – Arrays, Type 5 – Maps) may store the handle and the data in separate locations. This enables creating large items (e.g. byte strings) without realloc() or copying large blocks of memory. One simply attaches the correct pointer to the handle.
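A minimal sketch of the "attach the pointer to the handle" idea follows. `struct toy_bytestring`, `toy_new_bytestring()` and `toy_bytestring_set_handle()` are hypothetical names for this illustration and carry far less state than a real item; the sketch only shows that the handle is small and that the payload buffer, wherever it was allocated, is adopted without being copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle for a dynamically sized item. */
struct toy_bytestring {
    size_t length;
    unsigned char *data; /* may point anywhere, not just past the handle */
};

/* Create an empty handle; no payload storage is allocated here. */
struct toy_bytestring *toy_new_bytestring(void)
{
    return calloc(1, sizeof(struct toy_bytestring));
}

/* Attach an existing buffer to the handle. Nothing is copied or resized. */
void toy_bytestring_set_handle(struct toy_bytestring *bs,
                               unsigned char *data, size_t length)
{
    assert(bs != NULL);
    bs->data = data;
    bs->length = length;
}
```

A client that already holds, say, a 1 KiB buffer can wrap it with `toy_bytestring_set_handle(bs, buf, 1024)`; the cost is two assignments regardless of the buffer size.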

type cbor_item_metadata

Union type of the following members, based on the item type:

struct _cbor_int_metadata int_metadata

Used by both Types 0 & 1 – Positive and negative integers

struct _cbor_bytestring_metadata bytestring_metadata
struct _cbor_string_metadata string_metadata
struct _cbor_array_metadata array_metadata
struct _cbor_map_metadata map_metadata
struct _cbor_tag_metadata tag_metadata
struct _cbor_float_ctrl_metadata float_ctrl_metadata
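How a union discriminated by the item type is typically read can be sketched as below. The member structs and their fields are invented for illustration (the real `_cbor_*_metadata` structs are not reproduced in this section), and only three of the types are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the type discriminator. */
enum meta_type { META_UINT, META_BYTESTRING, META_ARRAY };

/* Invented per-type metadata; the field names are assumptions. */
struct meta_int { int width_bits; };
struct meta_bytestring { size_t length; };
struct meta_array { size_t element_count; };

union meta_union {
    struct meta_int int_metadata;
    struct meta_bytestring bytestring_metadata;
    struct meta_array array_metadata;
};

struct meta_item {
    enum meta_type type;     /* tells which union member is valid */
    union meta_union metadata;
};

/* Return a type-appropriate length: bytes for integers and byte strings,
 * elements for arrays. Only the member selected by `type` is read. */
size_t meta_item_length(const struct meta_item *item)
{
    assert(item != NULL);
    switch (item->type) {
    case META_UINT:
        return (size_t)item->metadata.int_metadata.width_bits / 8;
    case META_BYTESTRING:
        return item->metadata.bytestring_metadata.length;
    case META_ARRAY:
        return item->metadata.array_metadata.element_count;
    }
    return 0;
}
```

The union keeps the handle the same size for every item type, at the price that code must consult `type` before touching `metadata`.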

Decoding

As outlined in API, the decoding is based on the streaming decoder. Essentially, the decoder is a custom set of callbacks for the streaming decoder.
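The "decoder as a set of callbacks" structure can be illustrated with a toy model. Everything named `toy_*` below is hypothetical and only loosely mirrors the shape of a callback-driven streaming API. The toy streaming step understands just three encodings: unsigned integers 0..23 packed into the MTB, unsigned integers with a one-byte argument, and definite byte strings shorter than 24 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table handed to the streaming step. */
struct toy_callbacks {
    void (*uint8)(void *ctx, uint8_t value);
    void (*byte_string)(void *ctx, const unsigned char *data, size_t len);
};

/* Decode one item; return bytes consumed, or 0 if the input is
 * truncated or uses an encoding this toy does not support. */
size_t toy_stream_decode(const unsigned char *buf, size_t len,
                         const struct toy_callbacks *cb, void *ctx)
{
    if (len == 0)
        return 0;
    unsigned major = buf[0] >> 5;   /* top 3 bits of the MTB */
    unsigned info = buf[0] & 0x1F;  /* low 5 bits: additional information */

    if (major == 0 && info < 24) {
        cb->uint8(ctx, (uint8_t)info);
        return 1;
    }
    if (major == 0 && info == 24) {
        if (len < 2)
            return 0;
        cb->uint8(ctx, buf[1]);
        return 2;
    }
    if (major == 2 && info < 24) {
        if (len < 1 + (size_t)info)
            return 0;
        cb->byte_string(ctx, buf + 1, info);
        return 1 + (size_t)info;
    }
    return 0;
}

/* A "decoder" is then just a particular set of callbacks plus a context. */
struct toy_totals { unsigned long uint_sum; size_t string_bytes; };

void toy_sum_uint8(void *ctx, uint8_t value)
{
    ((struct toy_totals *)ctx)->uint_sum += value;
}

void toy_count_bytes(void *ctx, const unsigned char *data, size_t len)
{
    (void)data; /* payload is not inspected in this sketch */
    ((struct toy_totals *)ctx)->string_bytes += len;
}

struct toy_totals toy_decode_all(const unsigned char *buf, size_t len)
{
    const struct toy_callbacks cb = { toy_sum_uint8, toy_count_bytes };
    struct toy_totals totals = {0, 0};
    size_t off = 0;

    while (off < len) {
        size_t used = toy_stream_decode(buf + off, len - off, &cb, &totals);
        if (used == 0)
            break; /* truncated or unsupported input */
        off += used;
    }
    return totals;
}
```

Swapping in a different callback table yields a different consumer of the same byte stream without touching the streaming step.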