Recap Type Spec
- Version 0.3.0
Introduction
This document defines Recap’s types. It is intended to be the authoritative specification. Recap type implementations must adhere to this document.
This spec uses YAML to provide examples, but Recap’s types are agnostic to the serialized format. Recap types may be defined in YAML, TOML, JSON, XML, or any other compatible language.
Recap’s type spec is also available as a JSON schema, which you can use to validate your type definitions.
What is Recap’s Type Spec?
Recap’s type spec describes a data model that can represent relational database schemas and RPC IDLs with minimal type coercion. It’s similar to Apache Arrow’s Schema.fbs or Apache Kafka’s Schema.java.
Why Does Recap Type Spec Exist?
Data passes through web services, databases, message brokers, and object stores. Each system describes its data differently. Developers have historically written conversion logic and tooling for each system. Repeatedly building custom logic and tooling is inefficient and error-prone. Recap’s type system describes schemas in a standard data model so a single set of converters and tools can be built to deal with data as it moves between systems.
Types
Recap supports the following types:
null
bool
int
float
string
bytes
list
map
struct
enum
union
This section defines each type.
NOTE: Attribute types in the spec define what type a Recap implementation should use when storing the attribute. Attribute types are not the same as Recap types unless noted.
null
A null value.
bool
A boolean value.
int
An integer value with a fixed bit length.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
bits | Number of bits. | 32-bit signed integer | YES | |
signed | False means the integer is unsigned. | Boolean | NO | true |
Examples
# A 32-bit signed integer
type: int
bits: 32
signed: true
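As a rough illustration of what the bits and signed attributes imply, the following Python sketch computes the range of values an int type can hold. The function name int_range is hypothetical, and the sketch assumes signed integers use two's complement, which this spec does not state.

```python
def int_range(bits: int, signed: bool = True) -> tuple[int, int]:
    """Return the inclusive (min, max) values for an int type.

    Assumption: signed integers are two's complement.
    `signed` defaults to True, mirroring the attribute table above.
    """
    if bits < 1:
        raise ValueError("bits must be a positive integer")
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1
```

For the example above (bits: 32, signed: true), int_range(32) gives the usual 32-bit signed range.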
float
An IEEE 754 encoded floating point value with a fixed bit length.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
bits | Number of bits. | 32-bit signed integer | YES |
Examples
# A 32-bit IEEE 754 encoded float
type: float
bits: 32
string
A UTF-8 encoded Unicode string with a maximum byte length.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
bytes | The maximum number of bytes to store the string. Must be >= 1 if set. If unset, the string length is unbounded. | 64-bit signed integer or null | NO | null |
variable | If true, the string is variable-length (<= bytes). If false, bytes must be set. | Boolean | NO | true |
Examples
# A VARCHAR(255)
type: string
bytes: 255
variable: true
bytes
A byte array value.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
bytes | The maximum number of bytes that can be stored in the byte array. Must be >= 1 if set. If unset, the byte length is unbounded. | 64-bit signed integer or null | NO | null |
variable | If true, the byte array is variable-length (<= bytes). If false, bytes must be set. | Boolean | NO | true |
Examples
# A VARBINARY(255)
type: bytes
bytes: 255
variable: true
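The bytes and variable rules shared by the string and bytes types can be checked mechanically. This is a minimal Python sketch, not part of any Recap library; check_length_attrs is a hypothetical helper that takes a type definition already parsed into a dict.

```python
def check_length_attrs(type_def: dict) -> None:
    """Raise ValueError if a string or bytes definition breaks the rules above."""
    max_bytes = type_def.get("bytes")          # default: null (no maximum)
    variable = type_def.get("variable", True)  # default: true
    if max_bytes is not None and max_bytes < 1:
        raise ValueError("bytes must be >= 1 if set")
    if not variable and max_bytes is None:
        raise ValueError("a fixed-length type must set bytes")
```

A definition such as a VARBINARY(255) passes silently, while a fixed-length string without bytes is rejected.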
list
A list of values all sharing the same type.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
values | Type for all items in the list. | Recap type object | YES | |
length | The maximum length of the list. Must be >= 1 if set. If unset, the list size is unbounded. | 64-bit signed integer or null | NO | null |
variable | If true, the list is variable-length (<= length if length is set). If false, length must be set. | Boolean | NO | true |
Examples
# A list of unsigned 64-bit integers
type: list
values:
  type: int
  bits: 64
  signed: false
map
A map of key/value pairs where each key is the same type and each value is the same type.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
keys | Type for all items in the key set. | Recap type object | YES | |
values | Type for all value items. | Recap type object | YES |
Examples
# A map from 32-bit strings to boolean values
type: map
keys:
  type: string
  bytes: 2_147_483_647
values:
  type: bool
struct
An ordered collection of Recap types. Table schemas are typically represented as Recap structs, though the two are not the same. Recap structs support additional features such as nested structs (like a Protobuf Message) and unnamed fields (like a CSV file with no header).
struct Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
name | The struct’s name. | String | NO | |
fields | An ordered list of Recap types. | List of Recap type objects | NO | [] |
field Attributes
Recap types in the fields attribute can have an extra name attribute to set a field’s name.
Name | Description | Type | Required | Default |
---|---|---|---|---|
name | The field’s name. | String | NO |
Examples
# A struct with a required signed 32-bit integer field called "id"
type: struct
fields:
  - name: id
    type: int
    bits: 32
  - name: email
    type: string
    bytes: 255
enum
An enumeration of string symbols.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
symbols | An ordered list of string symbols. | List of strings | YES |
Examples
# An enum with RGB symbols
type: enum
symbols: ["RED", "GREEN", "BLUE"]
union
A value that can be one of several types. It is acceptable for a value to be more than one of the types in the union.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
types | A list of types the value can be. | List of Recap type objects | YES |
Examples
# A union type of null or a 32-bit signed int
type: union
types:
  - type: "null"
  - type: int
    bits: 32
Unions can also be defined as a list of types:
# A union type of null or a boolean
type: ["null", "bool"]
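One way to handle the list shorthand is to rewrite it into the explicit union form before further processing. The Python sketch below assumes a type definition already parsed into a dict; normalize_union is a hypothetical name, not a Recap API.

```python
def normalize_union(type_def: dict) -> dict:
    """Rewrite the `type: [...]` shorthand into an explicit union definition."""
    declared = type_def.get("type")
    if not isinstance(declared, list):
        return dict(type_def)  # already explicit; return a shallow copy
    # Bare type names become type objects; nested definitions are kept as-is.
    members = [{"type": m} if isinstance(m, str) else m for m in declared]
    rest = {k: v for k, v in type_def.items() if k != "type"}
    return {"type": "union", "types": members, **rest}
```

Applied to the example above, this yields a union whose types are a null type object and a bool type object.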
Documentation
All types support a doc attribute, which allows developers to document types.
Name | Description | Type | Required | Default |
---|---|---|---|---|
doc | A documentation string for the type. | String or null | NO | null |
type: union
doc: A union type of null or a 32-bit signed int
types:
  - type: "null"
  - type: int
    bits: 32
Defaults
Recap types can have a default attribute with a literal value. The value is used when the field is not set in a struct.
Name | Description | Type | Required | Default |
---|---|---|---|---|
default | The default value for a reader if the field is not set in the struct. | Literal of any type | NO |
# A struct with a field called "id" that defaults to 0
type: struct
fields:
  - name: id
    type: int
    bits: 32
    default: 0
A missing default is differentiated from a default with a null value. An unset default is treated as “no default”, while a default that’s been set to null is treated as a null default.
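In an implementation, this distinction usually means checking whether the default key is present rather than whether its value is null. A minimal Python sketch, using a hypothetical describe_default helper and a sentinel object:

```python
_MISSING = object()  # sentinel: distinguishes "no default" from a null default

def describe_default(field: dict) -> str:
    """Report which of the three default states a parsed field is in."""
    default = field.get("default", _MISSING)
    if default is _MISSING:
        return "no default"
    if default is None:
        return "null default"
    return f"default {default!r}"
```

Using dict.get without a sentinel would collapse the first two cases into one, which is exactly what the paragraph above rules out.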
NOTE: Database defaults often appear as strings like nextval('\"public\".some_id_seq'::regclass) or 'USD'::character varying. Such defaults are left to the developer to interpret based on the database they’re using.
Optional Types
Optional types are expressed as a union with a null type and a null default (similar to Avro’s fields).
# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: union
    types: ["null", "string32"]
    default: null
Unions can be cumbersome, so Recap supports a shorthand syntax for types using optional:
# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: secondary_phone
    type: string32
    optional: true
The above shorthand is equivalent to the union example.
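The expansion implied by the shorthand can be sketched in Python. This is an illustration under assumptions, not the reference behavior: expand_optional is a hypothetical helper, it keeps only name on the outer definition, and it moves every other attribute into the non-null union member.

```python
def expand_optional(type_def: dict) -> dict:
    """Expand `optional: true` into a union with null and a null default."""
    if not type_def.get("optional"):
        return dict(type_def)
    inner = {k: v for k, v in type_def.items() if k not in ("name", "optional")}
    if inner.get("type") == "union":
        members = list(inner["types"])  # an optional union just gains a null member
    else:
        # A bare `type: string32` stays a plain name, as in the union example.
        members = [inner["type"] if len(inner) == 1 else inner]
    if "null" not in members and {"type": "null"} not in members:
        members.insert(0, "null")
    expanded = {"type": "union", "types": members, "default": None}
    if "name" in type_def:
        expanded["name"] = type_def["name"]
    return expanded
```

For the secondary_phone field, the result is a union of "null" and "string32" with a null default, matching the longhand form.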
Optional types also work for unions:
# An optional union of a 32-bit signed int or a 32-bit float
type: union
types:
  - type: int32
  - type: float32
optional: true
Optionals also work with aliases:
# A struct with an optional string field called "secondary_phone"
type: struct
fields:
  - name: phone
    alias: phone_type
    type: string32
  - name: secondary_phone
    type: phone_type
    optional: true
Optionality is not inherited in aliases:
type: struct
fields:
  - name: phone
    alias: phone_type
    type: string32
    optional: true
  - name: secondary_phone
    type: phone_type
Logical Types
Logical types annotate one of the 11 Recap types listed in the Types section at the top of the spec. Logical types add context to Recap types that helps converters. For example, a decimal logical type can be defined as:
type: bytes
logical: build.recap.Decimal
precision: 6
scale: 3
bytes: 16
variable: false
Developers may define their own logical types.
Logical types may require additional attributes such as the precision and scale attributes shown above. Recap converters may use logical annotations to convert schemas to more accurate types such as SQL’s DECIMAL type (rather than BYTES).
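To make the converter behavior concrete, here is a small Python sketch of a converter that consults the logical annotation before falling back to the underlying type. The function to_sql_type and the SQL type spellings are illustrative assumptions; real converters will target a specific database dialect.

```python
def to_sql_type(type_def: dict) -> str:
    """Map a Recap type definition to a SQL column type (illustrative only)."""
    # Prefer the logical annotation when the converter understands it.
    if type_def.get("logical") == "build.recap.Decimal":
        return f"DECIMAL({type_def['precision']}, {type_def['scale']})"
    # Otherwise fall back to the annotated Recap type.
    if type_def.get("type") == "bytes":
        return "BYTES"
    raise NotImplementedError(f"no mapping for {type_def.get('type')!r}")
```

A converter that does not recognize a logical type can still fall back to the annotated type, which is why logical types must always annotate one of the core Recap types.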
Logical type names are globally unique, so they must include a unique dotted namespace prefix.
Built-in Logical Types
Recap comes with a collection of built-in logical types:
build.recap.Date: An integer representing a length of time since the UNIX epoch without timezones and leap seconds.
build.recap.Decimal: A byte array representing an arbitrary-precision decimal number.
build.recap.Duration: An integer representing a length of time and time unit without timezones and leap seconds.
build.recap.Interval: A byte array representing an interval of time on a calendar measured in months, days, and a duration of intra-day time with a time unit.
build.recap.Time: An integer representing a length of time since midnight without timezones and leap seconds.
build.recap.Timestamp: An integer representing a length of time (in a time unit) since a specific epoch.
build.recap.UUID: A string representing a UUID in 8-4-4-4-12 format as defined in RFC 4122.
build.recap.Date
Elapsed time since the UNIX epoch without timezones and leap seconds.
Annotates
build.recap.Date logical types must annotate int types.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES |
Examples
type: int
logical: build.recap.Date
bits: 32
unit: day
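A reader might turn such a value into a calendar date as follows. This Python sketch only handles unit: day, and decode_date is a hypothetical name.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)

def decode_date(value: int, unit: str = "day") -> date:
    """Convert an integer count of days since the UNIX epoch into a date."""
    if unit != "day":
        raise NotImplementedError("this sketch only handles unit: day")
    return UNIX_EPOCH + timedelta(days=value)
```

Negative values fall before the epoch, so decode_date(-1) is the last day of 1969.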
build.recap.Decimal
An arbitrary-precision decimal number. This type is the same as Avro’s Decimal.
Annotates
build.recap.Decimal logical types must annotate bytes types.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
precision | Total number of digits. 123.456 has a precision of 6. | 32-bit signed integer | YES | |
scale | Digits to the right of the decimal point. 123.456 has a scale of 3. | 32-bit signed integer | YES |
Examples
type: bytes
logical: build.recap.Decimal
precision: 6
scale: 3
bytes: 16
variable: false
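Because this type is described as the same as Avro's Decimal, a value can be stored the way Avro stores it: as the two's-complement, big-endian bytes of the unscaled integer. The Python sketch below assumes that encoding and a fixed 16-byte array, as in the example above; encode_decimal and decode_decimal are hypothetical names, and the sketch does not enforce precision.

```python
from decimal import Decimal

def encode_decimal(value: Decimal, scale: int, size: int = 16) -> bytes:
    """Encode a decimal as a fixed-size, big-endian two's-complement integer."""
    unscaled = value.scaleb(scale)  # 123.456 with scale 3 becomes 123456
    if unscaled != unscaled.to_integral_value():
        raise ValueError("value has more fractional digits than scale allows")
    return int(unscaled).to_bytes(size, byteorder="big", signed=True)

def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Reverse encode_decimal using the scale stored in the type definition."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)
```

The scale lives in the type definition rather than in the stored bytes, so a reader needs the schema to interpret the value.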
build.recap.Duration
A length of time without timezones and leap seconds. This type is the same as Arrow’s Duration but with a superset of time units.
Annotates
build.recap.Duration logical types must annotate int types.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES |
Examples
type: int
logical: build.recap.Duration
bits: 64
unit: millisecond
build.recap.Interval
An interval of time on a calendar. This measurement allows you to measure time without worrying about leap seconds, leap years, and time changes. Years, quarters, hours, and minutes can be expressed using this type.
Intervals are measured in months, days, and an intra-day time measurement. Months and days are each 32-bit signed integers. The remainder is a 64-bit signed integer measured in a certain time unit. Leap seconds are ignored.
Annotates
build.recap.Interval logical types must annotate bytes types with the variable attribute set to false and the bytes attribute set to 16.
build.recap.Interval is the same as Avro’s Duration but with a superset of time units and 16 bytes instead of 12 bytes.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES |
Examples
type: bytes
logical: build.recap.Interval
bytes: 16
variable: false
unit: millisecond
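The 16-byte layout (two 32-bit signed integers followed by one 64-bit signed integer) can be sketched with Python's struct module. The field order of months, days, remainder follows the description above, but the little-endian byte order is an assumption borrowed from Avro's Duration; this spec does not state a byte order. The function names are hypothetical.

```python
import struct

# Assumption: little-endian months (int32), days (int32), remainder (int64).
_LAYOUT = "<iiq"

def pack_interval(months: int, days: int, remainder: int) -> bytes:
    """Pack an interval into the fixed 16-byte representation."""
    return struct.pack(_LAYOUT, months, days, remainder)

def unpack_interval(raw: bytes) -> tuple[int, int, int]:
    """Unpack 16 bytes into (months, days, remainder)."""
    return struct.unpack(_LAYOUT, raw)
```

With unit: millisecond, an interval of 14 months, 3 days, and 90 minutes would be packed as pack_interval(14, 3, 5_400_000).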
build.recap.Time
Elapsed time since midnight without timezones and leap seconds.
Annotates
build.recap.Time logical types must annotate int types.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES |
Examples
type: int
logical: build.recap.Time
bits: 32
unit: millisecond
build.recap.Timestamp
Time elapsed since a specific epoch.
A timestamp with no timezone is a DATETIME in database parlance: a date and time as you would see it on a wristwatch.
A timestamp with a timezone represents the amount of time elapsed since the 1970-01-01 00:00:00 epoch in UTC time zone (regardless of the timezone that’s specified). Readers must translate the UTC timestamp to a timestamp value for the specified timezone. See Apache Arrow’s Schema.fbs documentation for more details.
This type is the same as Arrow’s timestamp but with a superset of time units and an arbitrary bit length.
Annotates
build.recap.Timestamp logical types must annotate int types.
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
unit | A string time unit. | String literal of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond, or picosecond. | YES | |
timezone | An optional Olson timezone database string. | String or null | NO | null |
Examples
type: int
logical: build.recap.Timestamp
bits: 64
unit: millisecond
build.recap.UUID
A string representing a UUID in 8-4-4-4-12 format as defined in RFC 4122.
Annotates
build.recap.UUID logical types must annotate string types with a bytes attribute greater than or equal to 36.
Attributes
build.recap.UUID logical types have no additional attributes.
Examples
type: string
logical: build.recap.UUID
bytes: 36
variable: false
Aliases
All types support an alias attribute. An alias must reference one of the 11 Recap types defined above. Aliases are useful in larger data structures where complex types are repeatedly used.
Aliases are globally unique, so they must include a unique dotted namespace prefix. Naked aliases (aliases with no dotted namespace) are reserved for Recap’s built-in aliases (see the next section).
Attributes
Name | Description | Type | Required | Default |
---|---|---|---|---|
alias | An alias for the type. | String or null | NO | null |
Examples
Aliases are referenced using the type field:
type: struct
doc: A book with pages
fields:
  - name: previous
    alias: com.mycorp.models.Page
    type: int
    bits: 32
    signed: false
  - name: next
    type: com.mycorp.models.Page
Recap will treat this struct the same as:
type: struct
doc: A book with pages
fields:
  - name: previous
    alias: com.mycorp.models.Page
    type: int
    bits: 32
    signed: false
  # The `next` field has the same types and attributes
  # as the `previous` one; they're both pages.
  - name: next
    type: int
    bits: 32
    signed: false
Recap also allows cyclic references:
alias: com.mycorp.models.LinkedListUint32
type: struct
doc: A linked list of unsigned 32-bit integers
fields:
  - name: value
    type: int
    bits: 32
    signed: false
  - name: next
    type: com.mycorp.models.LinkedListUint32
And attribute overrides:
type: struct
fields:
  - name: id
    alias: com.mycorp.models.Uint24
    type: int
    bits: 24
    signed: false
  - name: signed_id
    type: com.mycorp.models.Uint24
    # Let's make this a signed int24
    signed: true
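Alias references and attribute overrides can be resolved by copying the aliased definition and then applying the referencing field's own attributes on top. The Python sketch below is a simplified illustration: inline_aliases is a hypothetical helper, it only handles aliases declared earlier in the same struct's field list, and it does not handle nested types or cyclic references (which cannot be inlined).

```python
def inline_aliases(struct_def: dict) -> dict:
    """Replace alias references in a struct's fields with the aliased type."""
    registry: dict[str, dict] = {}
    fields = []
    for field in struct_def.get("fields", []):
        field = dict(field)
        ref = field.get("type")
        if ref in registry:
            if "alias" in field:
                raise ValueError("an alias of an alias isn't allowed")
            # Start from the aliased definition, then apply this field's
            # own attributes (name, signed, ...) as overrides.
            overrides = {k: v for k, v in field.items() if k != "type"}
            field = {**registry[ref], **overrides}
        if "alias" in field:
            # Register the definition without its field-specific keys.
            registry[field["alias"]] = {
                k: v for k, v in field.items() if k not in ("name", "alias")
            }
        fields.append(field)
    return {**struct_def, "fields": fields}
```

Run on the override example above, signed_id resolves to a 24-bit int with signed set to true, while id keeps signed set to false.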
But alias inheritance is not allowed:
type: struct
doc: All fields have the same type
fields:
  - name: field1
    alias: "com.mycorp.models.Field"
    type: int
    bits: 32
    signed: false
  - name: field2
    type: com.mycorp.models.Field
    # An alias of an alias isn't allowed.
    alias: com.mycorp.models.FieldAlias
  - name: field3
    type: com.mycorp.models.FieldAlias
Built-in Aliases
Recap comes with a collection of built-in aliases:
int8: An 8-bit signed integer.
uint8: An 8-bit unsigned integer.
int16: A 16-bit signed integer.
uint16: A 16-bit unsigned integer.
int32: A 32-bit signed integer.
uint32: A 32-bit unsigned integer.
int64: A 64-bit signed integer.
uint64: A 64-bit unsigned integer.
float16: A 16-bit IEEE 754 encoded floating point number.
float32: A 32-bit IEEE 754 encoded floating point number.
float64: A 64-bit IEEE 754 encoded floating point number.
string32: A variable-length UTF-8 encoded Unicode string with a maximum length of 2_147_483_647 bytes.
string64: A variable-length UTF-8 encoded Unicode string with a maximum length of 9_223_372_036_854_775_807 bytes.
bytes32: A variable-length byte array with a maximum length of 2_147_483_647 bytes.
bytes64: A variable-length byte array with a maximum length of 9_223_372_036_854_775_807 bytes.
uuid: A fixed-length 36-byte string in UUID 8-4-4-4-12 string format as defined in RFC 4122.
decimal128: An arbitrary-precision decimal number stored in a fixed-length 128-bit byte array.
decimal256: An arbitrary-precision decimal number stored in a fixed-length 256-bit byte array.
duration64: A length of time and time unit without timezones and leap seconds.
interval128: An interval of time on a calendar measured in months, days, and a duration of intra-day time with a time unit.
time32: Time since midnight without timezones and leap seconds in a 32-bit signed integer.
time64: Time since midnight without timezones and leap seconds in a 64-bit signed integer.
timestamp64: Time elapsed (in a time unit) since a specific epoch.
date32: Date since the UNIX epoch without timezones and leap seconds in a 32-bit integer.
date64: Date since the UNIX epoch without timezones and leap seconds in a 64-bit integer.
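Several of these built-in aliases follow directly from the type and logical type definitions earlier in this spec. As a hedged illustration, the Python sketch below builds the expanded definitions for a subset of them; builtin_aliases is a hypothetical function, and the decimal entries omit precision and scale, which a user would still have to supply.

```python
def builtin_aliases() -> dict[str, dict]:
    """Expanded definitions for a subset of the built-in aliases."""
    aliases: dict[str, dict] = {}
    for bits in (8, 16, 32, 64):
        aliases[f"int{bits}"] = {"type": "int", "bits": bits, "signed": True}
        aliases[f"uint{bits}"] = {"type": "int", "bits": bits, "signed": False}
    for bits in (16, 32, 64):
        aliases[f"float{bits}"] = {"type": "float", "bits": bits}
    aliases["uuid"] = {
        "type": "string",
        "logical": "build.recap.UUID",
        "bytes": 36,
        "variable": False,
    }
    for bits in (128, 256):
        aliases[f"decimal{bits}"] = {
            "type": "bytes",
            "logical": "build.recap.Decimal",
            "bytes": bits // 8,  # 128 bits = 16 bytes, 256 bits = 32 bytes
            "variable": False,
        }
    return aliases
```

A field declared as type: uint16 would then be treated the same as a 16-bit int with signed set to false.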