Internal mechanics
The internal workings of libcbor are mostly derived from the specification. This document describes the technical choices made during design and implementation and explains the reasoning behind them.
Terminology
MTB | Major Type Byte |
DST | Dynamically Sized Type | Type whose storage requirements cannot be determined during compilation (originated in the Rust community)
Conventions
API symbols start with the cbor_ or CBOR_ prefix; internal symbols have the _cbor_ or _CBOR_ prefix.
General notes on the API design
The API design has two main driving principles:
- Let the client manage the memory as much as possible
- Behave exactly as specified by the standard
Combining these two principles in practice turns out to be quite difficult. Indefinite-length strings, arrays, and maps require the client to handle every fixed-size chunk explicitly in order to
- ensure the client never runs out of memory due to libcbor
- use realloc() sparingly and predictably [1]
  - provide strong guarantees about its usage (to prevent latency spikes)
  - provide APIs to avoid realloc() altogether
- allow proper handling of (streamed) data bigger than available memory

[1] Reasonable handling of DSTs requires reallocation if the API is to remain sane.
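The idea of handling every fixed-size chunk explicitly can be illustrated with a small, self-contained sketch. It is not libcbor code: `for_each_chunk`, `chunk_callback`, and `count_bytes` are hypothetical names invented for this example. The sketch assumes the client owns a fixed buffer and wants to process input larger than that buffer without any allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type (not part of libcbor): called once per chunk. */
typedef void (*chunk_callback)(const unsigned char *chunk, size_t length,
                               void *context);

/*
 * Walk `total` bytes of `source` in chunks of at most `capacity` bytes.
 * Each chunk is copied into the client-owned `buffer` and handed to
 * `callback`. Nothing is allocated or reallocated here, so memory use is
 * bounded by `capacity` regardless of `total`.
 * Returns the number of chunks delivered.
 */
size_t for_each_chunk(const unsigned char *source, size_t total,
                      unsigned char *buffer, size_t capacity,
                      chunk_callback callback, void *context) {
    assert(buffer != NULL && capacity > 0);
    size_t chunks = 0;
    for (size_t offset = 0; offset < total; offset += capacity) {
        size_t remaining = total - offset;
        size_t length = remaining < capacity ? remaining : capacity;
        memcpy(buffer, source + offset, length);
        callback(buffer, length, context);
        chunks++;
    }
    return chunks;
}

/* Example callback: accumulates the number of bytes seen so far. */
void count_bytes(const unsigned char *chunk, size_t length, void *context) {
    (void)chunk; /* the payload itself is not inspected in this example */
    *(size_t *)context += length;
}
```

With a 10-byte input and a 4-byte buffer, the callback would be invoked for chunks of 4, 4, and 2 bytes. The cost of this style is visible in the signature: the client has to supply the buffer and the per-chunk logic itself.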
Coding style
This code loosely follows the Linux kernel coding style. Tabs are tabs, and they are 4 characters wide.
Memory layout
CBOR is very dynamic in the sense that it contains many data elements of variable length, sometimes even of indefinite length. This section describes the internal representation of all CBOR data types.
Generally speaking, data items consist of three parts: a generic handle (cbor_item_t), the associated metadata (cbor_item_metadata), and the actual data.
- type cbor_item_t: Represents the item. Used as an opaque type.
  - size_t refcount: Reference counter. Used by cbor_decref(), cbor_incref().
  - union cbor_item_metadata metadata: Union discriminated by type. Contains type-specific metadata.
  - unsigned char * data: Contains pointer to the actual data. Small, fixed-size items (Types 0 & 1 – Positive and negative integers, Type 6 – Semantic tags, Type 7 – Floats & control tokens) are allocated as a single memory block.
Consider the following snippet:

    cbor_item_t * item = cbor_new_int8();

The memory is then laid out as follows:

    +-----------+---------------+---------------+-----------------------------------++-----------+
    |           |               |               |                                   ||           |
    |   type    |   refcount    |   metadata    |              data                 ||  uint8_t  |
    |           |               |               |   (= item + sizeof(cbor_item_t))  ||           |
    +-----------+---------------+---------------+-----------------------------------++-----------+
    ^                                                                                ^
    |                                                                                |
    +--- item                                                                        +--- item->data
Dynamically sized types (Type 2 – Byte strings, Type 3 – UTF-8 strings, Type 4 – Arrays, Type 5 – Maps) may store the handle and the data in separate locations. This enables creating large items (e.g. byte strings) without realloc() or copying large blocks of memory. One simply attaches the correct pointer to the handle.
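The two layouts described above can be contrasted in a toy model. The sketch below is not libcbor's implementation: `toy_item` and the `toy_*` functions are hypothetical stand-ins, and the `length` field is a simplified substitute for the real type-specific metadata. It assumes only what the text states: small items live in a single block with `data` pointing just past the handle, while dynamically sized items get a separately stored buffer attached to the handle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy handle; NOT libcbor's actual cbor_item_t definition. */
typedef struct toy_item {
    size_t refcount;     /* reference counter */
    size_t length;       /* simplified stand-in for type-specific metadata */
    unsigned char *data; /* pointer to the actual data */
} toy_item;

/* Small fixed-size item: handle and payload share one allocation. */
toy_item *toy_new_int8(uint8_t value) {
    toy_item *item = malloc(sizeof(toy_item) + sizeof(uint8_t));
    if (item == NULL) return NULL;
    item->refcount = 1;
    item->length = sizeof(uint8_t);
    /* The payload starts right after the handle, as in the diagram. */
    item->data = (unsigned char *)item + sizeof(toy_item);
    item->data[0] = value;
    return item;
}

/* Dynamically sized item: only the handle is allocated up front. */
toy_item *toy_new_bytestring(void) {
    toy_item *item = malloc(sizeof(toy_item));
    if (item == NULL) return NULL;
    item->refcount = 1;
    item->length = 0;
    item->data = NULL;
    return item;
}

/* Attach an existing buffer to the handle: no copy, no realloc().
 * In this toy model the caller keeps ownership of `buffer`. */
void toy_bytestring_set_handle(toy_item *item, unsigned char *buffer,
                               size_t length) {
    assert(item != NULL);
    item->data = buffer;
    item->length = length;
}
```

In this model a single free() releases a small item, whereas an attached buffer has its own lifetime. Who owns the attached buffer is a design decision; the toy model leaves it with the caller purely to keep the sketch short.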
- type cbor_item_metadata: Union type of the following members, based on the item type:
  - struct _cbor_int_metadata int_metadata: Used by both Types 0 & 1 – Positive and negative integers
  - struct _cbor_bytestring_metadata bytestring_metadata
  - struct _cbor_string_metadata string_metadata
  - struct _cbor_array_metadata array_metadata
  - struct _cbor_map_metadata map_metadata
  - struct _cbor_tag_metadata tag_metadata
  - struct _cbor_float_ctrl_metadata float_ctrl_metadata
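A union discriminated by a type field can be sketched as follows. This is an illustration of the pattern only: the `toy_*` names and the fields inside each metadata struct are hypothetical and do not reflect the contents of libcbor's `_cbor_*_metadata` structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical discriminator; libcbor's own type enum is not shown here. */
typedef enum { TOY_TYPE_INT, TOY_TYPE_BYTESTRING, TOY_TYPE_ARRAY } toy_type;

/* Hypothetical per-type metadata; field names are assumptions. */
struct toy_int_metadata { int width_bytes; };
struct toy_bytestring_metadata { size_t length; };
struct toy_array_metadata { size_t size; };

/* Only one member is meaningful at a time, selected by the type field. */
union toy_metadata {
    struct toy_int_metadata int_metadata;
    struct toy_bytestring_metadata bytestring_metadata;
    struct toy_array_metadata array_metadata;
};

typedef struct toy_handle {
    toy_type type;               /* discriminator */
    union toy_metadata metadata; /* interpreted according to `type` */
} toy_handle;

/* Read only the union member that matches the discriminator. */
size_t toy_payload_size(const toy_handle *handle) {
    assert(handle != NULL);
    switch (handle->type) {
        case TOY_TYPE_INT:
            return (size_t)handle->metadata.int_metadata.width_bytes;
        case TOY_TYPE_BYTESTRING:
            return handle->metadata.bytestring_metadata.length;
        case TOY_TYPE_ARRAY:
            return handle->metadata.array_metadata.size;
    }
    return 0; /* unknown discriminator */
}
```

Because all members share the same storage, the handle stays the same size for every item type; the price is that code must always check the discriminator before touching the union.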
-
struct _cbor_int_metadata