Recap Type Spec

  • Version 0.1.1

Introduction

This document defines Recap’s types. It is intended to be the authoritative specification. Recap type implementations must adhere to this document.

This spec uses YAML to provide examples, but Recap’s types are agnostic to the serialization format. Recap types may be defined in YAML, TOML, JSON, XML, or any other compatible format.

What is Recap’s Type Spec?

Recap’s type spec describes a data model that can represent relational database schemas and RPC IDLs with minimal type coercion. It’s similar to Apache Arrow’s Schema.fbs or Apache Kafka’s Schema.java.

Why Does Recap Type Spec Exist?

Data passes through web services, databases, message brokers, and object stores. Each system describes its data differently. Developers have historically written conversion logic and tooling for each system. Repeatedly building custom logic and tooling is inefficient and error-prone. Recap’s type system describes schemas in a standard data model so a single set of converters and tools can be built to deal with data as it moves between systems.

Types

Recap supports the following types:

  • null
  • bool
  • int
  • float
  • string
  • bytes
  • list
  • map
  • struct
  • enum
  • union

This section defines each type.

NOTE: Attribute types in the spec define what type a Recap implementation should use when storing the attribute. Attribute types are not the same as Recap types unless noted.

null

A null value.

bool

A boolean value.

int

An integer value with a fixed bit length.

Attributes

| Name | Description | Type | Required | Default |
| bits | Number of bits. | 32-bit signed integer | YES | |
| signed | False means the integer is unsigned. | Boolean | NO | true |

Examples

# A 32-bit signed integer
type: int
bits: 32
signed: true
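
As a further sketch of the attributes above, the following shows an unsigned integer and an integer that leaves out signed and so falls back to the default of true. The bit widths are arbitrary choices for illustration.

```yaml
# A 16-bit unsigned integer
type: int
bits: 16
signed: false
---
# `signed` is omitted, so it defaults to true: a 64-bit signed integer
type: int
bits: 64
```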

float

An IEEE 754 encoded floating point value with a fixed bit length.

Attributes

| Name | Description | Type | Required | Default |
| bits | Number of bits. | 32-bit signed integer | YES | |

Examples

# A 32-bit IEEE 754 encoded float
type: float
bits: 32

string

A UTF-8 encoded Unicode string with a maximum byte length.

Attributes

| Name | Description | Type | Required | Default |
| bytes | The maximum number of bytes to store the string. | 64-bit signed integer | NO | 65536 |
| variable | If true, the string is variable-length (<= bytes). | Boolean | NO | true |

Examples

# A VARCHAR(255)
type: string
bytes: 255
variable: true
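
The variable attribute can also describe fixed-length strings. The sketch below assumes a column along the lines of CHAR(10) holding single-byte characters, since Recap counts bytes rather than characters. The second document relies only on the defaults listed in the table above.

```yaml
# Roughly a CHAR(10): a fixed-length string of 10 bytes
type: string
bytes: 10
variable: false
---
# No attributes set: a variable-length string of at most 65536 bytes
type: string
```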

bytes

A byte array value.

Attributes

| Name | Description | Type | Required | Default |
| bytes | The maximum number of bytes that can be stored in the byte array. | 64-bit signed integer | NO | 65536 |
| variable | If true, the byte array is variable-length (<= bytes). | Boolean | NO | true |

Examples

# A VARBINARY(255)
type: bytes
bytes: 255
variable: true

list

A list of values all sharing the same type.

Attributes

| Name | Description | Type | Required | Default |
| values | Type for all items in the list. | Recap type object | YES | |
| length | The maximum length of the list. If unset, the list size is unbounded. | 64-bit signed integer or null | NO | null |
| variable | If true, the list is variable-length (<= length if length is set). If false, length must be set. | Boolean | NO | true |

Examples

# A list of unsigned 64-bit integers
type: list
values:
  type: int
  bits: 64
  signed: false
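
The length and variable attributes are not used in the example above. A minimal sketch of how they might combine, with element types chosen purely for illustration:

```yaml
# A fixed-length list of exactly three 32-bit floats
type: list
length: 3
variable: false
values:
  type: float
  bits: 32
---
# A variable-length list of at most 10 strings
type: list
length: 10
values:
  type: string
  bytes: 255
```

In the first document length must be set because variable is false. In the second, variable keeps its default of true, so length acts as an upper bound.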

map

A map of key/value pairs where each key is the same type and each value is the same type.

Attributes

| Name | Description | Type | Required | Default |
| keys | Type for all items in the key set. | Recap type object | YES | |
| values | Type for all value items. | Recap type object | YES | |

Examples

# A map from strings of up to 2_147_483_647 bytes to boolean values
type: map
keys:
  type: string
  bytes: 2_147_483_647
values:
  type: bool

struct

An ordered collection of Recap types. Table schemas are typically represented as Recap structs, though the two are not the same. Recap structs support additional features like nested structs (like a Protobuf message) and unnamed fields (like a CSV file with no header).

struct Attributes

| Name | Description | Type | Required | Default |
| name | The struct’s name. | String | NO | |
| fields | An ordered list of Recap types. | List of Recap type objects | NO | [] |

field Attributes

Recap types in the fields attribute can have two extra attributes set: name and default.

| Name | Description | Type | Required | Default |
| name | The field’s name. | String | NO | |
| default | The default value for a reader if the field is not set in the struct. | Literal of any type | NO | |

An unset default is differentiated from a default with a null value. An unset default is treated as “no default”, while a default that’s been set to null is treated as a null default.

NOTE: Database defaults often appear as strings like nextval('"public".some_id_seq'::regclass) or 'USD'::character varying. Such defaults are left to the developer to interpret based on the database they’re using.
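
The three default cases described above (unset, null, and a database-style string) might look like this in a single struct. The field names are hypothetical, and the string default is carried over verbatim without interpretation.

```yaml
type: struct
fields:
  # No `default` attribute: the field has no default
  - name: id
    type: int
    bits: 64
  # `default` explicitly set to null: the default is null
  - name: nickname
    type: union
    types: ["null", "string32"]
    default: null
  # A database default kept as an uninterpreted string
  - name: currency
    type: string
    bytes: 3
    default: "'USD'::character varying"
```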

Examples

# A struct with two required fields: a signed 32-bit integer called "id" and a string called "email"
type: struct
fields:
  - name: id
    type: int
    bits: 32
  - name: email
    type: string
    bytes: 255

Optional fields are expressed as a union with a null type and a null default (similar to Avro’s fields).

# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: union
    types: ["null", "string32"]
    default: null
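
The struct description also mentions nested structs and unnamed fields, which the examples above do not show. A sketch of both, with hypothetical field names:

```yaml
# A struct with a nested struct field, similar to a nested Protobuf message
type: struct
fields:
  - name: id
    type: int
    bits: 64
  - name: address
    type: struct
    fields:
      - name: city
        type: string
        bytes: 255
      - name: postal_code
        type: string
        bytes: 10
---
# A struct with unnamed fields, such as a two-column CSV file with no header
type: struct
fields:
  - type: string
    bytes: 255
  - type: int
    bits: 32
```

Because fields is an ordered list, unnamed fields can still be identified by position.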

enum

An enumeration of string symbols.

Attributes

| Name | Description | Type | Required | Default |
| symbols | An ordered list of string symbols. | List of strings | YES | |

Examples

# An enum with RGB symbols
type: enum
symbols: ["RED", "GREEN", "BLUE"]

union

A value that can be one of several types. A value may match more than one of the types in the union.

Attributes

| Name | Description | Type | Required | Default |
| types | A list of types the value can be. | List of Recap type objects | YES | |

Examples

# A union type of null or a 32-bit signed int
type: union
types:
  - type: "null"
  - type: int
    bits: 32

Unions can also be defined as a list of types:

# A union type of null or a boolean
type: ["null", "bool"]
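
As an illustration of a value matching more than one member type, consider the following sketch. The choice of member types is an assumption made for the example.

```yaml
# A union of a 32-bit and a 64-bit signed integer.
# A small value such as 1 fits both member types, which is allowed.
type: union
types:
  - type: int
    bits: 32
  - type: int
    bits: 64
```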

Documentation

All types support a doc attribute, which allows developers to document types.

| Name | Description | Type | Required | Default |
| doc | A documentation string for the type. | String or null | NO | null |

Examples

type: union
doc: A union type of null or a 32-bit signed int
types:
  - type: "null"
  - type: int
    bits: 32

Logical Types

Logical types annotate one of the 11 Recap types listed in the Types section at the top of the spec. Logical types add additional context to Recap types that help converters. For example, a decimal logical type can be defined as:

type: bytes
logical: build.recap.Decimal
precision: 6
scale: 3
bytes: 16
variable: false

Developers may define their own logical types.

Logical types may require additional attributes such as the precision and scale attributes shown above. Recap converters may use logical annotations to convert schemas to more accurate types such as SQL’s DECIMAL type (rather than BYTES).

Logical type names are globally unique, so they must include a unique dotted namespace prefix.
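
A developer-defined logical type might look like the sketch below. The name com.mycorp.logical.IPv4Address is hypothetical and not part of Recap. It only illustrates the dotted namespace prefix on a custom logical type.

```yaml
# Hypothetical logical type: an IPv4 address stored as a 32-bit unsigned integer
type: int
logical: com.mycorp.logical.IPv4Address
bits: 32
signed: false
```

A converter that does not recognize the logical name can still fall back to the underlying int type.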

Built-in Logical Types

Recap comes with a collection of built-in logical types:

  • build.recap.Date: An integer representing a length of time since the UNIX epoch without timezones and leap seconds.
  • build.recap.Decimal: A byte array representing an arbitrary-precision decimal number.
  • build.recap.Duration: An integer representing a length of time and time unit without timezones and leap seconds.
  • build.recap.Interval: A byte array representing an interval of time on a calendar measured in months, days, and a duration of intra-day time with a time unit.
  • build.recap.Time: An integer representing a length of time since midnight without timezones and leap seconds.
  • build.recap.Timestamp: An integer representing a length of time (in a time unit) since a specific epoch.
  • build.recap.UUID: A string representing a UUID in 8-4-4-4-12 format as defined in RFC 4122.

build.recap.Date

Elapsed time since the UNIX epoch without timezones and leap seconds.

Annotates

build.recap.Date logical types must annotate int types.

Attributes

| Name | Description | Type | Required | Default |
| unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES | |

Examples

type: int
logical: build.recap.Date
bits: 32
unit: day

build.recap.Decimal

An arbitrary-precision decimal number. This type is the same as Avro’s Decimal.

Annotates

build.recap.Decimal logical types must annotate bytes types.

Attributes

| Name | Description | Type | Required | Default |
| precision | Total number of digits. 123.456 has a precision of 6. | 32-bit signed integer | YES | |
| scale | Digits to the right of the decimal point. 123.456 has a scale of 3. | 32-bit signed integer | YES | |

Examples

type: bytes
logical: build.recap.Decimal
precision: 6
scale: 3
bytes: 16
variable: false
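
Since this type is described as the same as Avro’s Decimal, the stored value can be read the way Avro describes it: the bytes hold an unscaled integer, and scale shifts the decimal point. The sketch below assumes that encoding carries over unchanged. The monetary value is only an illustration.

```yaml
# A DECIMAL(10, 2), for example a monetary amount such as 1234.56:
#   unscaled integer = 123456
#   value = unscaled * 10^(-scale) = 123456 * 10^(-2) = 1234.56
# Following Avro's Decimal, the bytes would hold the unscaled integer
# in big-endian two's-complement form.
type: bytes
logical: build.recap.Decimal
precision: 10
scale: 2
bytes: 16
variable: false
```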

build.recap.Duration

A length of time without timezones and leap seconds. This type is the same as Arrow’s Duration but with a superset of time units.

Annotates

build.recap.Duration logical types must annotate int types.

Attributes

| Name | Description | Type | Required | Default |
| unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES | |

Examples

type: int
logical: build.recap.Duration
bits: 64
unit: millisecond

build.recap.Interval

An interval of time on a calendar. Measuring time this way avoids having to account for leap seconds, leap years, and time changes. Years, quarters, hours, and minutes can be expressed using this type.

Intervals are measured in months, days, and an intra-day time measurement. Months and days are each 32-bit signed integers (4 bytes each). The remainder is a 64-bit signed integer (8 bytes) measured in a certain time unit, for a total of 16 bytes. Leap seconds are ignored.

Annotates

build.recap.Interval logical types must annotate bytes types with the variable attribute set to false and the bytes attribute set to 16.

build.recap.Interval is the same as Avro’s Duration but with a superset of time units and 16 bytes instead of 12 bytes.

Attributes

| Name | Description | Type | Required | Default |
| unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES | |

Examples

type: bytes
logical: build.recap.Interval
bytes: 16
variable: false
unit: millisecond

build.recap.Time

Elapsed time since midnight without timezones and leap seconds.

Annotates

build.recap.Time logical types must annotate int types.

Attributes

| Name | Description | Type | Required | Default |
| unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES | |

Examples

type: int
logical: build.recap.Time
bits: 32
unit: millisecond

build.recap.Timestamp

Time elapsed since a specific epoch.

A timestamp with no timezone is a DATETIME in database parlance: a date and time as you would see it on a wristwatch.

A timestamp with a timezone represents the amount of time elapsed since the 1970-01-01 00:00:00 epoch in the UTC time zone (regardless of the timezone that’s specified). Readers must translate the UTC timestamp to a timestamp value for the specified timezone. See Apache Arrow’s Schema.fbs documentation for more details.

This type is the same as Arrow’s timestamp but with a superset of time units and an arbitrary bit length.

Annotates

build.recap.Timestamp logical types must annotate int types.

Attributes

| Name | Description | Type | Required | Default |
| unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES | |
| timezone | An optional Olson timezone database string. | String or null | NO | null |

Examples

type: int
logical: build.recap.Timestamp
bits: 64
unit: millisecond
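
The two timestamp flavors described above (with and without a timezone) might be written as follows. The units and the timezone are arbitrary choices for illustration.

```yaml
# No timezone: a DATETIME-style timestamp in microseconds
type: int
logical: build.recap.Timestamp
bits: 64
unit: microsecond
---
# With a timezone: milliseconds since 1970-01-01 00:00:00 UTC,
# to be translated by readers into Los Angeles local time
type: int
logical: build.recap.Timestamp
bits: 64
unit: millisecond
timezone: America/Los_Angeles
```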

build.recap.UUID

A string representing a UUID in 8-4-4-4-12 format as defined in RFC 4122.

Annotates

build.recap.UUID logical types must annotate string types with a bytes attribute greater than or equal to 36.

Attributes

build.recap.UUID logical types have no additional attributes.

Examples

type: string
logical: build.recap.UUID
bytes: 36
variable: false

Aliases

All types support an alias attribute. An alias must reference one of the 11 Recap types defined above. Aliases are useful in larger data structures where complex types are repeatedly used.

Aliases are globally unique, so they must include a unique dotted namespace prefix. Naked aliases (aliases with no dotted namespace) are reserved for Recap’s built-in aliases (see the next section).

Attributes

| Name | Description | Type | Required | Default |
| alias | An alias for the type. | String or null | NO | null |

Examples

Aliases are referenced using the type field:

type: struct
doc: A book with pages
fields:
  - name: previous
    alias: com.mycorp.models.Page
    type: int
    bits: 32
    signed: false
  - name: next
    type: com.mycorp.models.Page

Recap will treat this struct the same as:

type: struct
doc: A book with pages
fields:
  - name: previous
    alias: com.mycorp.models.Page
    type: int
    bits: 32
    signed: false
  # The `next` field has the same types and attributes
  # as the `previous` one; they're both pages.
  - name: next
    type: int
    bits: 32
    signed: false

Recap also allows cyclic references:

alias: com.mycorp.models.LinkedListUint32
type: struct
doc: A linked list of unsigned 32-bit integers
fields:
  - name: value
    type: int
    bits: 32
    signed: false
  - name: next
    type: com.mycorp.models.LinkedListUint32
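
In practice a recursive structure usually needs a way to end. One possible sketch, combining the cyclic reference with the optional-field pattern from the struct section, makes next a union with null. This is an illustration rather than something the spec prescribes.

```yaml
alias: com.mycorp.models.LinkedListUint32
type: struct
doc: A linked list of unsigned 32-bit integers
fields:
  - name: value
    type: int
    bits: 32
    signed: false
  # A null `next` marks the end of the list
  - name: next
    type: union
    types: ["null", "com.mycorp.models.LinkedListUint32"]
    default: null
```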

And attribute overrides:

type: struct
fields:
  - name: id
    alias: com.mycorp.models.Uint24
    type: int
    bits: 24
    signed: false
  - name: signed_id
    type: com.mycorp.models.Uint24
    # Let's make this a signed int24
    signed: true

But alias inheritance is not allowed:

type: struct
doc: All fields have the same type
fields:
  - name: field1
    alias: "com.mycorp.models.Field"
    type: int
    bits: 32
    signed: false
  - name: field2
    type: com.mycorp.models.Field
    # An alias of an alias isn't allowed.
    alias: com.mycorp.models.FieldAlias
  - name: field3
    type: com.mycorp.models.FieldAlias
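
One way to stay within this rule, assuming all three fields are meant to share the same type, is to have each field reference the original alias directly:

```yaml
type: struct
doc: All fields have the same type
fields:
  - name: field1
    alias: com.mycorp.models.Field
    type: int
    bits: 32
    signed: false
  - name: field2
    type: com.mycorp.models.Field
  # References the original alias rather than an alias of an alias
  - name: field3
    type: com.mycorp.models.Field
```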

Built-in Aliases

Recap comes with a collection of built-in aliases:

  • int8: An 8-bit signed integer.
  • uint8: An 8-bit unsigned integer.
  • int16: A 16-bit signed integer.
  • uint16: A 16-bit unsigned integer.
  • int32: A 32-bit signed integer.
  • uint32: A 32-bit unsigned integer.
  • int64: A 64-bit signed integer.
  • uint64: A 64-bit unsigned integer.
  • float16: A 16-bit IEEE 754 encoded floating point number.
  • float32: A 32-bit IEEE 754 encoded floating point number.
  • float64: A 64-bit IEEE 754 encoded floating point number.
  • string32: A variable-length UTF-8 encoded Unicode string with a maximum length of 2_147_483_647 bytes.
  • string64: A variable-length UTF-8 encoded Unicode string with a maximum length of 9_223_372_036_854_775_807 bytes.
  • bytes32: A variable-length byte array with a maximum length of 2_147_483_647 bytes.
  • bytes64: A variable-length byte array with a maximum length of 9_223_372_036_854_775_807 bytes.
  • uuid: A fixed-length 36 byte string in UUID 8-4-4-4-12 string format as defined in RFC 4122.
  • decimal128: An arbitrary-precision decimal number stored in a fixed-length 128-bit byte array.
  • decimal256: An arbitrary-precision decimal number stored in a fixed-length 256-bit byte array.
  • duration64: A length of time and time unit without timezones and leap seconds.
  • interval128: An interval of time on a calendar measured in months, days, and a duration of intra-day time with a time unit.
  • time32: Time since midnight without timezones and leap seconds in a 32-bit signed integer.
  • time64: Time since midnight without timezones and leap seconds in a 64-bit signed integer.
  • timestamp64: Time elapsed (in a time unit) since a specific epoch.
  • date32: Date since the UNIX epoch without timezones and leap seconds in a 32-bit integer.
  • date64: Date since the UNIX epoch without timezones and leap seconds in a 64-bit integer.
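
Built-in aliases are referenced through the type field like any other alias. The sketch below shows a struct written with built-in aliases, followed by what it would expand to based on the alias descriptions above and the build.recap.UUID example. The field names are hypothetical.

```yaml
# Using built-in aliases
type: struct
fields:
  - name: id
    type: uuid
  - name: visits
    type: uint32
---
# The same struct with the aliases written out
type: struct
fields:
  - name: id
    type: string
    logical: build.recap.UUID
    bytes: 36
    variable: false
  - name: visits
    type: int
    bits: 32
    signed: false
```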