2 changes: 1 addition & 1 deletion extensions/extension_types.yaml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
urn: extension:io.substrait:extension_types
urn: urn:substrait:extension:io.substrait:extension_types
types:
- name: point
structure:
2 changes: 1 addition & 1 deletion extensions/functions_aggregate_approx.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_aggregate_approx
urn: urn:substrait:extension:io.substrait:functions_aggregate_approx
aggregate_functions:
- name: "approx_count_distinct"
description: >-
2 changes: 1 addition & 1 deletion extensions/functions_aggregate_decimal_output.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_aggregate_decimal_output
urn: urn:substrait:extension:io.substrait:functions_aggregate_decimal_output
aggregate_functions:
- name: "count"
description: Count a set of values. Result is returned as a decimal instead of i64.
2 changes: 1 addition & 1 deletion extensions/functions_aggregate_generic.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_aggregate_generic
urn: urn:substrait:extension:io.substrait:functions_aggregate_generic
aggregate_functions:
- name: "count"
description: Count a set of values
2 changes: 1 addition & 1 deletion extensions/functions_arithmetic.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_arithmetic
urn: urn:substrait:extension:io.substrait:functions_arithmetic
scalar_functions:
-
name: "add"
2 changes: 1 addition & 1 deletion extensions/functions_arithmetic_decimal.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_arithmetic_decimal
urn: urn:substrait:extension:io.substrait:functions_arithmetic_decimal
scalar_functions:
-
name: "add"
2 changes: 1 addition & 1 deletion extensions/functions_boolean.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_boolean
urn: urn:substrait:extension:io.substrait:functions_boolean
scalar_functions:
-
name: or
2 changes: 1 addition & 1 deletion extensions/functions_comparison.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_comparison
urn: urn:substrait:extension:io.substrait:functions_comparison
scalar_functions:
-
name: "not_equal"
2 changes: 1 addition & 1 deletion extensions/functions_datetime.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_datetime
urn: urn:substrait:extension:io.substrait:functions_datetime
scalar_functions:
-
name: extract
2 changes: 1 addition & 1 deletion extensions/functions_geometry.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_geometry
urn: urn:substrait:extension:io.substrait:functions_geometry
types:
- name: geometry
structure: "BINARY"
2 changes: 1 addition & 1 deletion extensions/functions_list.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_list
urn: urn:substrait:extension:io.substrait:functions_list
scalar_functions:
- name: "transform"
description: >-
2 changes: 1 addition & 1 deletion extensions/functions_logarithmic.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_logarithmic
urn: urn:substrait:extension:io.substrait:functions_logarithmic
scalar_functions:
-
name: "ln"
2 changes: 1 addition & 1 deletion extensions/functions_rounding.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_rounding
urn: urn:substrait:extension:io.substrait:functions_rounding
scalar_functions:
-
name: "ceil"
2 changes: 1 addition & 1 deletion extensions/functions_rounding_decimal.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_rounding_decimal
urn: urn:substrait:extension:io.substrait:functions_rounding_decimal
scalar_functions:
-
name: "ceil"
2 changes: 1 addition & 1 deletion extensions/functions_set.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_set
urn: urn:substrait:extension:io.substrait:functions_set
scalar_functions:
-
name: "index_in"
2 changes: 1 addition & 1 deletion extensions/functions_string.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_string
urn: urn:substrait:extension:io.substrait:functions_string
scalar_functions:
-
name: concat
2 changes: 1 addition & 1 deletion extensions/type_variations.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:type_variations
urn: urn:substrait:extension:io.substrait:type_variations
type_variations:
- parent: string
name: dict4
2 changes: 1 addition & 1 deletion extensions/unknown.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:unknown
urn: urn:substrait:extension:io.substrait:unknown
types:
- name: unknown
scalar_functions:
7 changes: 5 additions & 2 deletions proto/substrait/extensions/extensions.proto
@@ -16,8 +16,11 @@ message SimpleExtensionURN {
// 0 is a valid anchor/reference, but prefer non-zero values for ergonomics.
uint32 extension_urn_anchor = 1;

// The extension URN that uniquely identifies this extension. This must follow the
// format extension:<OWNER>:<ID> and serves as the "namespace" of this extension.
// The extension URN that uniquely identifies this extension. The canonical format
// is urn:substrait:extension:<NAMESPACE>:<ID>. For backwards compatibility, URNs
// beginning with "extension:" have "urn:substrait:" prepended before validation.
// The canonical form must be a valid RFC 8141 URN and conform to the regex:
// ^urn:substrait:extension:[a-z0-9_.-]+:[a-z0-9_.-]+$
string urn = 2;
}

10 changes: 7 additions & 3 deletions site/docs/extensions/index.md
@@ -13,13 +13,17 @@ Some kinds of primitives are so frequently extended that Substrait defines a sta
* Window Functions
* Table Functions

To extend these items, developers can create one or more YAML files that describe the properties of each of these extensions. Each YAML file must include a required `urn` field that uniquely identifies the extension. While these identifiers are URN-like but not technically URNs (they lack the `urn:` prefix), they will be referred to as `extension URNs` for clarity.
To extend these items, users can create one or more YAML files that describe the properties of each of these extensions. Each YAML file must include a required `urn` field that uniquely identifies the extension.

This extension URN uses the format `extension:<OWNER>:<ID>`, where:
The canonical form of an extension URN is `urn:substrait:extension:<NAMESPACE>:<ID>`, where:

- `OWNER` represents the organization or entity providing the extension and should follow [reverse domain name convention](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) (e.g., `io.substrait`, `com.example`, `org.apache.arrow`) to prevent name collisions
- `NAMESPACE` represents the organization or entity providing the extension and should follow [reverse domain name convention](https://en.wikipedia.org/wiki/Reverse_domain_name_notation) (e.g., `io.substrait`, `com.example`, `org.apache.arrow`) to prevent name collisions
- `ID` is the specific identifier for the extension (e.g., `functions_arithmetic`, `custom_types`)

For backwards compatibility, if a URN begins with `extension:` instead of `urn:substrait:extension:`, the prefix `urn:substrait:` is prepended before validation. This means `extension:io.substrait:functions_list` is equivalent to `urn:substrait:extension:io.substrait:functions_list`.

Extension URNs in canonical form must be valid [RFC 8141](https://www.rfc-editor.org/rfc/rfc8141.html) URNs (with NID `substrait`) and must match the regex `^urn:substrait:extension:[a-z0-9_.-]+:[a-z0-9_.-]+$`.
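
As a rough illustration of the normalization and validation rules above, the following Python sketch prepends `urn:substrait:` to legacy `extension:` URNs and then checks the result against the documented regex. The function name `normalize_extension_urn` and the constant names are hypothetical and not part of any Substrait library. The sketch checks only the regex, not full RFC 8141 conformance.

```python
import re

# Hypothetical names for illustration only; not taken from a Substrait library.
CANONICAL_PREFIX = "urn:substrait:"
LEGACY_PREFIX = "extension:"

# Regex copied from the text above.
CANONICAL_RE = re.compile(r"^urn:substrait:extension:[a-z0-9_.-]+:[a-z0-9_.-]+$")


def normalize_extension_urn(urn: str) -> str:
    """Return the canonical form of an extension URN, or raise ValueError."""
    # Backwards compatibility: legacy URNs start with "extension:".
    if urn.startswith(LEGACY_PREFIX):
        urn = CANONICAL_PREFIX + urn
    # Validation is applied to the canonical form only.
    if CANONICAL_RE.fullmatch(urn) is None:
        raise ValueError(f"invalid extension URN: {urn!r}")
    return urn
```

Under these assumptions, `normalize_extension_urn("extension:io.substrait:functions_list")` and `normalize_extension_urn("urn:substrait:extension:io.substrait:functions_list")` both return the canonical string, while an input containing uppercase letters in the namespace raises `ValueError`.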

The YAML file is constructed according to the [YAML Schema](https://github.com/substrait-io/substrait/blob/main/text/simple_extensions_schema.yaml). Each definition in the file corresponds to the YAML-based serialization of the relevant data structure. If a user only wants to extend one of these types of objects (e.g. types), a developer does not have to provide definitions for the other extension points.

A Substrait plan can reference one or more YAML files via their extension URN. In the places where these entities are referenced, they will be referenced using an extension URN + name reference. Each extension entity (type, type variation, or function) is assigned an anchor value, which is a non-negative integer starting from 0. The anchor value 0 is valid and can be used to reference extension entities, but prefer non-zero values for ergonomics. The name scheme per type works as follows:
2 changes: 1 addition & 1 deletion site/docs/serialization/binary_serialization.md
@@ -22,7 +22,7 @@ For simple extensions, a plan references the extension URNs associated with the

Simple extensions within a plan are split into three components: an extension URN, an extension declaration and a number of references.

* **Extension URN**: A unique identifier for the extension following the format `extension:<OWNER>:<ID>` that identifies a YAML document specifying one or more specific extensions. Declares an anchor that can be used in extension declarations.
* **Extension URN**: A unique identifier for the extension in the canonical format `urn:substrait:extension:<NAMESPACE>:<ID>` that identifies a YAML document specifying one or more specific extensions. Declares an anchor that can be used in extension declarations. For backwards compatibility, URNs beginning with `extension:` have `urn:substrait:` prepended before validation. See [Simple Extensions](../extensions/index.md#simple-extensions) for details.
* **Extension Declaration**: A specific extension within a single YAML document. The declaration combines a reference to the associated extension URN along with a unique key identifying the specific item within that YAML document (see [Function Signature](../extensions/index.md#function-signature)). It also defines a declaration anchor. The anchor is a plan-specific unique value that the producer creates as a key to be referenced elsewhere.
* **Extension Reference**: A specific instance or use of an extension declaration within the plan body.

2 changes: 1 addition & 1 deletion site/examples/extensions/any1_type_function.yaml
@@ -1,5 +1,5 @@
# Example showing the 'any1' type - arguments must be of the same type
urn: extension:example:any1_type
urn: urn:substrait:extension:example:any1_type
scalar_functions:
- name: bar
impls:
2 changes: 1 addition & 1 deletion site/examples/extensions/any_type_function.yaml
@@ -1,5 +1,5 @@
# Example showing the 'any' type - arguments can be of any type
urn: extension:example:any_type
urn: urn:substrait:extension:example:any_type
scalar_functions:
- name: foo
impls:
2 changes: 1 addition & 1 deletion site/examples/extensions/distance_functions.yaml
@@ -1,4 +1,4 @@
urn: extension:example:distance_functions
urn: urn:substrait:extension:example:distance_functions
dependencies:
ext: extension:io.substrait:extension_types
scalar_functions:
2 changes: 1 addition & 1 deletion site/examples/extensions/double_function.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:example:double_function
urn: urn:substrait:extension:example:double_function
scalar_functions:
-
name: "double"
2 changes: 1 addition & 1 deletion site/examples/extensions/lambda_function_example.yaml
@@ -1,6 +1,6 @@
%YAML 1.2
---
urn: extension:io.substrait:functions_list
urn: urn:substrait:extension:io.substrait:functions_list
scalar_functions:
- name: "transform"
description: >-
2 changes: 1 addition & 1 deletion site/examples/extensions/metadata_example.yaml
@@ -1,5 +1,5 @@
# Example showing the metadata field at multiple levels
urn: extension:io.substrait:metadata_examples
urn: urn:substrait:extension:io.substrait:metadata_examples
metadata:
version: 2.0
maintainer: example-team
2 changes: 1 addition & 1 deletion site/examples/types/point_with_datatype_param.yaml
@@ -1,5 +1,5 @@
# Compound user-defined type with a data type parameter
urn: extension:example:point_parameterized
urn: urn:substrait:extension:example:point_parameterized
types:
- name: point
parameters:
2 changes: 1 addition & 1 deletion site/examples/types/point_with_enum_param.yaml
@@ -1,5 +1,5 @@
# Compound user-defined type with an enumeration parameter
urn: extension:example:point_enum_param
urn: urn:substrait:extension:example:point_enum_param
types:
- name: point
parameters:
2 changes: 1 addition & 1 deletion site/examples/types/point_with_nstruct.yaml
@@ -1,5 +1,5 @@
# Alternative way to define point structure using NSTRUCT syntax
urn: extension:example:point_nstruct
urn: urn:substrait:extension:example:point_nstruct
types:
- name: point
structure: "NSTRUCT<longitude: i32, latitude: i32>"
2 changes: 1 addition & 1 deletion site/examples/types/point_with_structure.yaml
@@ -1,5 +1,5 @@
# User-defined point type with structure information
urn: extension:example:point_with_structure
urn: urn:substrait:extension:example:point_with_structure
types:
- name: point
structure:
2 changes: 1 addition & 1 deletion site/examples/types/point_with_two_params.yaml
@@ -1,5 +1,5 @@
# Compound user-defined type with two data type parameters
urn: extension:example:point_two_params
urn: urn:substrait:extension:example:point_two_params
types:
- name: point
parameters:
2 changes: 1 addition & 1 deletion site/examples/types/tuple_optional_variadic.yaml
@@ -1,5 +1,5 @@
# Tuple type with optional variadic parameter (zero or more types)
urn: extension:example:tuple_variadic
urn: urn:substrait:extension:example:tuple_variadic
types:
- name: tuple
parameters:
2 changes: 1 addition & 1 deletion site/examples/types/union_variadic.yaml
@@ -1,5 +1,5 @@
# Union type with variadic parameter (one or more types)
urn: extension:example:union_variadic
urn: urn:substrait:extension:example:union_variadic
types:
- name: union
parameters:
2 changes: 1 addition & 1 deletion site/examples/types/user_defined_point.yaml
@@ -1,5 +1,5 @@
# User-defined type example: a point type with two scalar functions
urn: extension:example:point_type
urn: urn:substrait:extension:example:point_type
types:
- name: "point"

2 changes: 1 addition & 1 deletion site/examples/types/vector_with_constraints.yaml
@@ -1,5 +1,5 @@
# Vector type with integer parameter constrained to 2 or 3 dimensions
urn: extension:example:vector_constrained
urn: urn:substrait:extension:example:vector_constrained
types:
- name: vector
parameters:
1 change: 1 addition & 0 deletions text/simple_extensions_schema.yaml
@@ -7,6 +7,7 @@ required: [urn]
properties:
urn:
type: string
pattern: "^urn:substrait:extension:[a-z0-9_.-]+:[a-z0-9_.-]+$"

@benbellick benbellick Mar 18, 2026

Member Author

This is technically a breaking change, as it would force extensions to update their URNs to the new format. However, we can have all of the implementing libraries (go, python, rs, java) do a proper migration, where they accept both old and new, and emit old.

Later we emit new, and then finally we can consider dropping the old if we would like.

Contributor

I am probably being paranoid and overdoing it here, but if we do this, I suggest we register substrait with IANA ASAP. We don't have to make this a breaking change right now (simply make the regex treat the urn:substrait: prefix as optional), but I don't have a strong opinion on this, as I do not know how many things will actually break because of this change...

Member Author

I have no idea what the process looks like or how long it would take to register with IANA, but sounds good to me :)

Member Author

As for making it optional, it will break people's YAML files, but that would be an easy fix. Then I would imagine that in all of the libraries, we make it so that they can accept both:

  • urn:substrait:extension:io.substrait:functions_list, and
  • extension:io.substrait:functions_list

And then they always emit

  • extension:io.substrait:functions_list

Then one day we can switch them all to emitting

  • urn:substrait:extension:io.substrait:functions_list

And then we can consider dropping support for the old URN.
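
A minimal sketch of that accept-both, emit-one migration, in Python. The names here (`to_canonical`, `emit_urn`, the `emit_canonical` flag) are made up for illustration and don't come from any of the existing libraries. The idea is that each library flips one switch when it is time to start emitting the new form.

```python
# Hypothetical helper names; not taken from substrait-go/python/rs/java.
CANONICAL_PREFIX = "urn:substrait:"
LEGACY_PREFIX = "extension:"


def to_canonical(urn: str) -> str:
    """Accept either spelling and return the urn:substrait:extension:... form."""
    if urn.startswith(LEGACY_PREFIX):
        return CANONICAL_PREFIX + urn
    return urn


def emit_urn(urn: str, emit_canonical: bool = False) -> str:
    """Emit the legacy form by default; flip emit_canonical in a later migration phase."""
    canonical = to_canonical(urn)
    # Only a prefix check here; full regex validation is left out of this sketch.
    if not canonical.startswith(CANONICAL_PREFIX + LEGACY_PREFIX):
        raise ValueError(f"not an extension URN: {urn!r}")
    if emit_canonical:
        return canonical
    # Strip "urn:substrait:" to recover the legacy spelling.
    return canonical[len(CANONICAL_PREFIX):]
```

With the default, both spellings of the functions_list URN come out as extension:io.substrait:functions_list. With `emit_canonical=True`, both come out with the urn:substrait: prefix.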

dependencies:
# For reusing type classes and type variations from other extension files.
# The keys are namespace identifiers that you can then use as dot-separated