tc39/proposal-extractors

Extractors for ECMAScript

A proposal to introduce Extractors (a.k.a. "Extractor Objects") to ECMAScript.

Extractors would augment the syntax for BindingPattern and AssignmentPattern to allow for new destructuring forms, as in the following example:

// binding patterns
const Foo(y) = x;           // instance-array destructuring
const Foo([y]) = x;         // nested array destructuring
const Foo({y}) = x;         // nested object destructuring
const [Foo(y)] = x;         // nesting
const { z: Foo(y) } = x;    // ..
const Foo(Bar(y)) = x;      // ..
const X.Foo(y) = x;         // qualified names (i.e., a.b.c)

// assignment patterns
Foo(y) = x;                 // instance-array destructuring
Foo([y]) = x;               // nested array destructuring
Foo({y}) = x;               // nested object destructuring
[Foo(y)] = x;               // nesting
({ z: Foo(y) } = x);        // ..
Foo(Bar(y)) = x;            // ..
X.Foo(y) = x;               // qualified names (i.e., a.b.c)

In addition, this would leverage the new Symbol.customMatcher built-in symbol added by the Pattern Matching proposal. When destructuring using the new form, the Symbol.customMatcher method would be called and its result would be destructured instead.

Status

Stage: 1
Champion: Ron Buckton (@rbuckton)

For more information see the TC39 proposal process.

Authors

  • Ron Buckton (@rbuckton)

Motivations

ECMAScript currently has no mechanism for executing user-defined logic during destructuring, which means that operations related to data validation and transformation may require multiple statements:

function toInstant(value) {
  if (value instanceof Temporal.Instant) {
    return value;
  } else if (value instanceof Date) {
    return Temporal.Instant.fromEpochMilliseconds(+value);
  } else if (typeof value === "string") {
    return Temporal.Instant.from(value);
  } else {
    throw new TypeError();
  }
}

class Book {
  constructor({
    isbn,
    title,
    createdAt = Temporal.Now.instant(),
    modifiedAt = createdAt
  }) {
    this.isbn = isbn;
    this.title = title;
    this.createdAt = toInstant(createdAt);
    // some effort duplicated if `modifiedAt` was `undefined`
    this.modifiedAt = toInstant(modifiedAt);
  }
}

new Book({ isbn: "...", title: "...", createdAt: Temporal.Instant.from("...") });
new Book({ isbn: "...", title: "...", createdAt: new Date() });
new Book({ isbn: "...", title: "...", createdAt: "..." });

With Extractors, such validation and transformation logic can be encapsulated and reused inside of the binding pattern:

const InstantExtractor = {
  [Symbol.customMatcher](value) {
    if (value instanceof Temporal.Instant) {
      return [value];
    } else if (value instanceof Date) {
      return [Temporal.Instant.fromEpochMilliseconds(+value)];
    } else if (typeof value === "string") {
      return [Temporal.Instant.from(value)];
    }
  }
};

class Book {
  constructor({
    isbn,
    title,
    // Extract `createdAt` as an Instant
    createdAt: InstantExtractor(createdAt) = Temporal.Now.instant(),
    modifiedAt: InstantExtractor(modifiedAt) = createdAt
  }) {
    this.isbn = isbn;
    this.title = title;
    this.createdAt = createdAt;
    this.modifiedAt = modifiedAt;
  }
}

new Book({ isbn: "...", title: "...", createdAt: Temporal.Instant.from("...") });
new Book({ isbn: "...", title: "...", createdAt: new Date() });
new Book({ isbn: "...", title: "...", createdAt: "..." });
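
For readers who want to experiment today, the following is a rough sketch of how the constructor above might behave once desugared. It is not the proposal's specified semantics. It assumes a stand-in symbol (`Symbol.for("customMatcher")`, since `Symbol.customMatcher` is not available in current engines), a hypothetical `extract` helper, and a hypothetical `DateExtractor` that uses `Date` in place of `Temporal.Instant` so the code runs without Temporal support:

```javascript
// Assumption: stand-in for the proposed Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

// Hypothetical Date-based analogue of InstantExtractor.
const DateExtractor = {
  [customMatcher](value) {
    if (value instanceof Date) return [value];
    if (typeof value === "string" || typeof value === "number") {
      const date = new Date(value);
      if (!Number.isNaN(date.getTime())) return [date];
    }
    // Falls through and returns undefined, which signals a failed match.
  }
};

// Hypothetical helper: invoke the matcher and throw if the match fails.
function extract(extractor, subject) {
  const result = extractor[customMatcher](subject, "list", undefined);
  if (typeof result !== "object" || result === null) {
    throw new TypeError("match failed");
  }
  return result;
}

class Book {
  // Approximates a constructor whose parameter pattern is
  //   { isbn, title,
  //     createdAt: DateExtractor(createdAt) = new Date(),
  //     modifiedAt: DateExtractor(modifiedAt) = createdAt }
  constructor({ isbn, title, createdAt: rawCreatedAt = new Date(), modifiedAt: rawModifiedAt }) {
    const [createdAt] = extract(DateExtractor, rawCreatedAt);
    // The default for `modifiedAt` is the already-extracted `createdAt`.
    const [modifiedAt] = extract(
      DateExtractor,
      rawModifiedAt === undefined ? createdAt : rawModifiedAt
    );
    this.isbn = isbn;
    this.title = title;
    this.createdAt = createdAt;
    this.modifiedAt = modifiedAt;
  }
}

const book = new Book({ isbn: "...", title: "...", createdAt: "2020-01-01T00:00:00Z" });
```

In this sketch, a value the extractor does not recognize (for example `createdAt: {}`) causes the constructor to throw a TypeError, which mirrors the validation role extractors are meant to play.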

This would also be extremely useful when paired with a forthcoming enum proposal with support for Algebraic Data Types (ADT):

// Rust-like enum of algebraic data types:
enum Option of ADT {
  Some(value),
  None
}

// construction
const x = Option.Some(1);

// destructuring
const Option.Some(y) = x;
y; // 1

// pattern matching
match (x) {
  when Option.Some(y): console.log(y); // 1
  when Option.None: console.log("none");
}

// Another ADT enum example:
enum Message of ADT {
  Quit,
  Move({x, y}),
  Write(message),
  ChangeColor(r, g, b),
}

// construction
const msg1 = Message.Move({ x: 10, y: 10 });
const msg2 = Message.Write("Hello");
const msg3 = Message.ChangeColor(0x00, 0xff, 0xff);

// destructuring
const Message.Move({ x, y }) = msg1;    // x: 10, y: 10
const Message.Write(message) = msg2;    // message: "Hello"
const Message.ChangeColor(r, g, b) = msg3;     // r: 0, g: 255, b: 255

// pattern matching
match (msg) {
  when Message.Move({ x, y }): ...;
  when Message.Write(message): ...;
  when Message.ChangeColor(r, g, b): ...;
  when Message.Quit: ...;
}

Proposed Solution

Extractors are loosely based on Scala's Extractor Objects and Rust's Pattern Matching. Extractors extend the syntax for BindingPattern and AssignmentPattern to allow for the evaluation of user-defined logic for validation and transformation.

Extractors perform array destructuring on a successful match result, starting with a reference to a value in scope using an ExtractorMemberExpression, which is essentially an IdentifierReference (e.g., Point or InstantExtractor) or a dotted name (e.g., Option.Some or Message.Move).

When an Extractor is evaluated during destructuring, its ExtractorMemberExpression is evaluated and the resulting object's [Symbol.customMatcher]() method is invoked with the value currently being destructured. A successful match returns an iterable object containing the extracted elements, which are then used for further destructuring. For the purpose of destructuring, any other return value produces a TypeError. In the case of pattern matching, true and false are also valid return values.

An Extractor consists of an ExtractorMemberExpression followed by a parenthesized list of additional destructuring patterns:

// binding pattern
let Foo(a, { b }, [c]) = ...;

// assignment pattern
Foo(a, { b }, [c]) = ...;

Parentheses (()) are used instead of square brackets ([]) for several reasons:

  • Avoids collisions with an ElementAccessExpression when the extractor is part of an assignment pattern:
    Option.Some[value] = opt; // already an element access expression
  • Ensures a consistent syntax between binding and assignment patterns:
    let Option.Some(value) = opt;
    Option.Some(value) = opt;
  • Allows destructuring (and pattern matching) to mirror construction/application:
    let opt = Option.Some(x);
    let Option.Some(y) = opt;
    
    opt = Option.Some(x);
    Option.Some(y) = opt;
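
The first point can be checked in any current engine. In the minimal sketch below, the names `Option`, `value`, and `opt` are placeholders chosen for illustration and are not part of the proposal. The bracketed form is already an ordinary assignment to a computed property:

```javascript
const Option = { Some: {} };
const value = "payload";
const opt = 42;

// Parsed as an assignment to the computed property Option.Some["payload"],
// not as any form of destructuring.
Option.Some[value] = opt;
```

After this runs, `Option.Some.payload` holds `42` and no new binding named `value` has been introduced, which is why square brackets could not be repurposed for extractors in assignment patterns.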

Object Destructuring using Extractors

Extractors do not introduce a novel syntax for object extraction. Instead, an Extractor can return a single-element match result containing the object to be further destructured:

// binding pattern
const Message.Move({ x, y }) = ...;

// assignment pattern
(Message.Move({ x, y }) = ...);

Iterable Wrapper Overhead

It's important to note that this has the overhead of allocating an iterable wrapper object to perform further destructuring. If necessary, this overhead could be removed by adopting an alternative representation for a match result that marks its value as the sole extracted value, e.g.:

const Message = {
  Move: class Move {
    #x;
    #y;
    ...
    static [Symbol.customMatcher](value) {
      if (value instanceof Message.Move) {
        // 'match: "unary"' indicates that 'value' is the sole extracted value
        return { match: "unary", value: { x: value.#x, y: value.#y } };
      }
      return false;
    }
  }
};

const Message.Move({ x, y }) = ...;
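
As a rough illustration only, a consumer of such a result might look like the sketch below. The `extractSole` helper is hypothetical, and the `{ match: "unary", value }` shape is the speculative representation described above, not settled design:

```javascript
// Hypothetical: obtain the single extracted value from either result shape.
function extractSole(result) {
  if (typeof result !== "object" || result === null) {
    throw new TypeError("match failed");
  }
  if (result.match === "unary") {
    // The value is used directly; no iteration is needed.
    return result.value;
  }
  // Ordinary iterable match result: take the first element.
  const [first] = result;
  return first;
}

const fromUnary = extractSole({ match: "unary", value: { x: 10, y: 20 } });
const fromList = extractSole([{ x: 10, y: 20 }]);
```

Both calls yield an object with `x` and `y`, but only the second has to go through the iterator protocol.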

Future Object Extractor Syntax

It is possible that a future property-literal construction syntax might be used by Algebraic Data Types or other constructors, e.g.:

const enum Message of ADT {
  Move{ x, y },
  Write(text),
  Quit
}

// ADT construction
const msg = Message.Move{ x: 10, y: 20 };

// property-literal construction potentially used by fixed-shape objects (i.e., "struct"),
// or a user-defined construction mechanism similar to tagged templates or via a built-in-symbol named method:
struct Point { x, y };
const pt = Point{ x: 10, y: 20 };

However, such a syntax is currently out of scope for this proposal. In addition, introducing an Identifier { syntax for extractors has a high likelihood of carving off too much syntax space that could be used by other proposals. As a result, an "Object Extractor" syntax such as const Point{ x, y } = p is not being considered as part of this proposal at this time.

Prior Art

Related Proposals

Examples

The examples in this section use a desugaring to explain the underlying semantics, given the following helper:

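function %InvokeCustomMatcherOrThrow%(extractor, subject, receiver) {
  // The extractor must be an Object. This includes functions, so that classes can act as extractors.
  if ((typeof extractor !== "object" && typeof extractor !== "function") || extractor === null) {
    throw new TypeError();
  }
  const f = extractor[Symbol.customMatcher];
  if (typeof f !== "function") {
    throw new TypeError();
  }
  // Destructuring always passes a hint of "list".
  const result = f.apply(extractor, [subject, "list", receiver]);
  // A failed match (any non-object result) throws.
  if (typeof result !== "object" || result === null) {
    throw new TypeError();
  }
  return result;
}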
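
Because `%InvokeCustomMatcherOrThrow%` is not a valid identifier and `Symbol.customMatcher` does not exist in current engines, the helper cannot be run as written. The following sketch is a runnable approximation under two assumptions: the helper is renamed `invokeCustomMatcherOrThrow`, and `Symbol.for("customMatcher")` stands in for the proposed symbol. The `Point` class is a hypothetical example, and this version accepts functions as well as plain objects so that a class can serve as the extractor:

```javascript
// Assumption: stand-in for the proposed Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

function invokeCustomMatcherOrThrow(extractor, subject, receiver) {
  const isObject =
    (typeof extractor === "object" && extractor !== null) ||
    typeof extractor === "function";
  if (!isObject) throw new TypeError("extractor is not an object");
  const f = extractor[customMatcher];
  if (typeof f !== "function") throw new TypeError("extractor has no custom matcher");
  const result = f.call(extractor, subject, "list", receiver);
  if (typeof result !== "object" || result === null) throw new TypeError("match failed");
  return result;
}

// Hypothetical extractor class with a static custom matcher.
class Point {
  #x;
  #y;
  constructor(x, y) {
    this.#x = x;
    this.#y = y;
  }
  static [customMatcher](subject) {
    return #x in subject && [subject.#x, subject.#y];
  }
}

// Approximates: const Point(x, y) = new Point(1, 2);
const [x, y] = invokeCustomMatcherOrThrow(Point, new Point(1, 2), undefined);

// Approximates: const Point(a) = {};
// The match fails, so a TypeError is thrown instead of binding anything.
let failure;
try {
  invokeCustomMatcherOrThrow(Point, {}, undefined);
} catch (e) {
  failure = e;
}
```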

const C(x) = subject

Given,

class C {
  #data;
  constructor(data) {
    this.#data = data;
  }
  static [Symbol.customMatcher](subject) {
    return #data in subject && [subject.#data];
  }
}

const subject = new C("data");

const C(x) = subject;
x; // "data"

The statement

const C(x) = subject;

is approximately the same as the transposed representation

const [x] = %InvokeCustomMatcherOrThrow%(C, subject, undefined);

such that x results in the value "data".

const C(x, y) = subject

Given,

class C {
  #first;
  #second;
  constructor(first, second) {
    this.#first = first;
    this.#second = second;
  }
  static [Symbol.customMatcher](subject) {
    return #first in subject && [subject.#first, subject.#second];
  }
}

const subject = new C(1, 2);

const C(x, y) = subject;
x; // 1
y; // 2

The statement

const C(x, y) = subject;

is approximately the same as the transposed representation

const [x, y] = %InvokeCustomMatcherOrThrow%(C, subject, undefined);

such that x and y result in the values 1 and 2, respectively.

const C(x, ...y) = subject

Given,

class C {
  #first;
  #second;
  #third;
  constructor(first, second, third) {
    this.#first = first;
    this.#second = second;
    this.#third = third;
  }
  static [Symbol.customMatcher](subject) {
    return #first in subject && [subject.#first, subject.#second, subject.#third];
  }
}

const subject = new C(1, 2, 3);

const C(x, ...y) = subject;
x; // 1
y; // [2, 3]

The statement

const C(x, ...y) = subject;

is approximately the same as the transposed representation

const [x, ...y] = %InvokeCustomMatcherOrThrow%(C, subject, undefined);

such that x and y result in the values 1 and [2, 3], respectively.

const C(x = -1, y) = subject

Given,

class C {
  #first;
  #second;
  constructor(first, second) {
    this.#first = first;
    this.#second = second;
  }
  static [Symbol.customMatcher](subject) {
    return #first in subject && [subject.#first, subject.#second];
  }
}

const subject = new C(undefined, 2);

const C(x = -1, y) = subject;
x; // -1
y; // 2

The statement

const C(x = -1, y) = subject;

is approximately the same as the transposed representation

const [x = -1, y] = %InvokeCustomMatcherOrThrow%(C, subject, undefined);

such that x and y result in the values -1 and 2, respectively.

const C({ x, y }) = subject

Given,

class C {
  #data;
  constructor(data) {
    this.#data = data;
  }
  static [Symbol.customMatcher](subject) {
    return #data in subject && [subject.#data];
  }
}

const subject = new C({ x: 1, y: 2 });

const C({ x, y }) = subject;
x; // 1
y; // 2

The statement

const C({ x, y }) = subject;

is approximately the same as the transposed representation

const [{ x, y }] = %InvokeCustomMatcherOrThrow%(C, subject, undefined);

such that x and y have the values 1 and 2, respectively.

const C(D(x)) = subject

Given,

class C {
  #data1;
  constructor(data1) {
    this.#data1 = data1;
  }
  static [Symbol.customMatcher](subject) {
    return #data1 in subject && [subject.#data1];
  }
}

class D {
  #data2;
  constructor(data2) {
    this.#data2 = data2;
  }
  static [Symbol.customMatcher](subject) {
    return #data2 in subject && [subject.#data2];
  }
}

const subject = new C(new D("data"));

const C(D(x)) = subject;
x; // "data"

The statement

const C(D(x)) = subject;

is approximately the same as the transposed representation

const [_a] = %InvokeCustomMatcherOrThrow%(C, subject, undefined);
const [x] = %InvokeCustomMatcherOrThrow%(D, _a, undefined);

such that x results in the value "data".

Custom Logic During Destructuring

Given,

const MapExtractor = {
  [Symbol.customMatcher](map) {
    const obj = {};
    for (const [key, value] of map) {
      obj[typeof key === "symbol" ? key : `${key}`] = value;
    }
    return [obj];
  }
};

const obj = {
  map: new Map([["a", 1], ["b", 2]])
};

const { map: MapExtractor({ a, b }) } = obj;
a; // 1
b; // 2

The statement

const { map: MapExtractor({ a, b }) } = obj;

is approximately the same as the transposed representation

const { map: _temp } = obj;
const [{ a, b }] = %InvokeCustomMatcherOrThrow%(MapExtractor, _temp, undefined);

such that a and b result in the values 1 and 2, respectively.

Regular Expressions

// potentially built-in as part of Pattern Matching
RegExp.prototype[Symbol.customMatcher] = function (value) {
  const match = this.exec(value);
  return !!match && [match];
};

const IsoDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const IsoTime = /^(?<hours>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})$/;
const IsoDateTime = /^(?<date>[^TZ]+)T(?<time>[^Z]+)Z/;

// match `input`, extract, and destructure (or throw if match fails) using...

// ...a nested-object extractor
const IsoDate({ groups: { year, month, day } }) = input;

// ...a nested-array extractor
const IsoDate([, year, month, day]) = input;

// concise, multi-step extraction using nested destructuring:
const IsoDateTime({
  groups: {
    date: IsoDate({ groups: { year, month, day } }),
    time: IsoTime({ groups: { hours, minutes, seconds }})
  }
}) = input;
// 1. Matches `input` via `IsoDateTime` RegExp and extracts `date` and `time`
// 2. Matches `date` via `IsoDate` RegExp and extracts `year`, `month`, and `day` as lexical bindings
// 3. Matches `time` via `IsoTime` RegExp and extracts `hours`, `minutes`, and `seconds` as lexical bindings
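
The three steps listed above can be tried in current engines with a sketch like the following. It assumes a stand-in symbol (`Symbol.for("customMatcher")`) installed on `RegExp.prototype`, a hypothetical `extract` helper that throws on a failed match, and a sample `input` string chosen for illustration:

```javascript
// Assumption: stand-in for the proposed Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

// Installed under the stand-in symbol for demonstration purposes only.
RegExp.prototype[customMatcher] = function (value) {
  const match = this.exec(value);
  return !!match && [match];
};

// Hypothetical helper: invoke the matcher and throw if the match fails.
function extract(extractor, subject) {
  const result = extractor[customMatcher](subject, "list", undefined);
  if (typeof result !== "object" || result === null) {
    throw new TypeError("match failed");
  }
  return result;
}

const IsoDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const IsoTime = /^(?<hours>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})$/;
const IsoDateTime = /^(?<date>[^TZ]+)T(?<time>[^Z]+)Z/;

const input = "2024-02-29T12:34:56Z";

// Step 1: match `input` and pull out the `date` and `time` groups.
const [{ groups: { date, time } }] = extract(IsoDateTime, input);
// Step 2: match `date` and bind `year`, `month`, and `day`.
const [{ groups: { year, month, day } }] = extract(IsoDate, date);
// Step 3: match `time` and bind `hours`, `minutes`, and `seconds`.
const [{ groups: { hours, minutes, seconds } }] = extract(IsoTime, time);
```

All extracted values are strings (for example, `year` is `"2024"`). With extractors, the three statements would collapse into the single nested pattern shown earlier.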

Receivers: const obj.extractor(x) = subject

When the ExtractorMemberExpression results in a Reference, the receiver is preserved and passed on to the custom matcher:

Given,

class C {
  #f;
  constructor(f) {
    this.#f = f;
  }
  extractor = {
    [Symbol.customMatcher](subject, _kind, receiver) {
      return [receiver.#f(subject)];
    }
  };
}

const obj = new C(data => data.toUpperCase());
const subject = "data";

const obj.extractor(x) = subject;
x; // "DATA"

The statement

const obj.extractor(x) = subject;

is approximately the same as the transposed representation

const _receiver = obj;
const [x] = %InvokeCustomMatcherOrThrow%(_receiver.extractor, subject, _receiver);

such that x results in the value "DATA".
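
To see why the receiver has to be passed explicitly, consider this runnable approximation. It assumes the stand-in symbol `Symbol.for("customMatcher")`, and the second array element is added purely for illustration. Inside the matcher, `this` is the plain `extractor` object, which carries no `#f`, so the private state is reachable only through `receiver`:

```javascript
// Assumption: stand-in for the proposed Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

class C {
  #f;
  constructor(f) {
    this.#f = f;
  }
  extractor = {
    [customMatcher](subject, _hint, receiver) {
      // `#f in this` is false here: `this` is the object literal, not the C instance.
      return [receiver.#f(subject), #f in this];
    }
  };
}

const obj = new C(data => data.toUpperCase());

// Approximates: const obj.extractor(x, thisHasPrivateState) = "data";
const receiver = obj;
const matcher = receiver.extractor;
const [x, thisHasPrivateState] = matcher[customMatcher]("data", "list", receiver);
```

Here `x` is `"DATA"` and `thisHasPrivateState` is `false`.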

Potential Grammar

For the proposed grammar, see the specification text.

Potential Semantics

For the proposed semantics, see the specification text.

API

This proposal would adopt (and continue to align with) the behavior of Custom Matchers from the Pattern Matching proposal:

  • A Custom Matcher is a regular ECMAScript Object value with a [Symbol.customMatcher] method that accepts three arguments: subject (the value to match against), hint (either "boolean" or "list"), and receiver, and returns either a Boolean or an Iterable, depending on the value of hint.
  • When hint is "boolean", the return value will be coerced to a Boolean via the ToBoolean() abstract operation. When hint is "list", the return value must either be an Iterable object or a falsy value (e.g., false, 0, null, or undefined).
    • In Pattern Matching, a hint of "boolean" can be used to avoid the costly allocation of an Iterable object when the return value won't be used for further matching (such as with x is C), while a hint of "list" indicates further matching will be performed on the result (such as with x is C(1, 2)).
    • For destructuring and binding patterns (i.e., this proposal), hint will always be "list".
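
A custom matcher that honors the hint might look like the sketch below. This is only an illustration of the contract described above: it assumes the stand-in symbol `Symbol.for("customMatcher")`, and the `Pair` class is hypothetical.

```javascript
// Assumption: stand-in for the proposed Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

class Pair {
  #first;
  #second;
  constructor(first, second) {
    this.#first = first;
    this.#second = second;
  }
  static [customMatcher](subject, hint) {
    if (!(#first in subject)) return false;
    // With a "boolean" hint the caller only needs a yes/no answer,
    // so the array allocation can be skipped.
    if (hint === "boolean") return true;
    return [subject.#first, subject.#second];
  }
}

const pair = new Pair(1, 2);

// Roughly what a boolean-only check in Pattern Matching might do:
const isPair = Boolean(Pair[customMatcher](pair, "boolean", undefined));

// Roughly what `const Pair(a, b) = pair` would do (hint is always "list"):
const [a, b] = Pair[customMatcher](pair, "list", undefined);
```

Note that `#first in subject` itself throws a TypeError when `subject` is not an object, so a production matcher may want to guard against primitives first.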

Relation to Pattern Matching

We believe that Extractors would also be extremely valuable as part of the Pattern Matching Proposal, and intend to discuss adoption with the champions should this proposal be adopted.

Extractors could easily be added to MatchPattern using the same syntax as proposed for destructuring, which would allow for more concise and potentially more readily understood code:

match (opt) {
  // without extractors
  when (${Option.Some} with [value]): ...;

  // with extractors
  when (Option.Some(value)): ...;
}

match (msg) {
  // without extractors
  when (${Message.Move} with { x, y }): ...;

  // with extractors
  when (Message.Move({ x, y })): ...;
}

This is even more evident with respect to complex, nested patterns:

match (opt) {
  // without extractors
  when (${Option.Some} with [${Message.Move} with { x, y }]): ...;
  when (${Option.Some} with [${Message.Write} with [text]]): ...;

  // with extractors
  when (Option.Some(Message.Move({ x, y }))): ...;
  when (Option.Some(Message.Write(text))): ...;
}

Relation to Enums and Algebraic Data Types

We strongly believe that ECMAScript will eventually adopt some form of the current enum proposal, given the particular value that Algebraic Data Types could provide. The Enum proposal would strongly favor consistent and coherent syntax between declaration, construction, destructuring, and pattern matching, as in the following example:

enum Message of ADT {
  Quit,
  Move({x, y}),
  Write(message),
  ChangeColor(r, g, b),
}

// construction
const msg1 = Message.Move({ x: 10, y: 10 });
const msg2 = Message.Write("Hello");
const msg3 = Message.ChangeColor(0x00, 0xff, 0xff);

// destructuring
const Message.Move({x, y}) = msg1;          // x: 10, y: 10
const Message.Write(message) = msg2;        // message: "Hello"
const Message.ChangeColor(r, g, b) = msg3;  // r: 0, g: 255, b: 255

// pattern matching
match (msg) {
  when Message.Move({x, y}): ...;
  when Message.Write(message): ...;
  when Message.ChangeColor(r, g, b): ...;
  when Message.Quit: ...;
}

Here, declaration, construction, destructuring, and pattern matching are consistent for ADT enum members and values:

enum Message of ADT { Move({ x, y }) }      // declaration
const msg =   Message.Move({ x, y });       // construction
const         Message.Move({ x, y }) = msg; // destructuring
match (msg) {
  when        Message.Move({ x, y }): ...;  // pattern matching
}

enum Message of ADT { Write(message) }      // declaration
const msg =   Message.Write(message);       // construction
const         Message.Write(message) = msg; // destructuring
match (msg) {
  when        Message.Write(message): ...;  // pattern matching
}

As noted before, it is possible that a future property-literal construction syntax might be used by Algebraic Data Types or other constructors. Adding a matching syntax for object extraction would be the responsibility of that proposal and is out of scope for this proposal.

TODO

The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:

Stage 1 Entrance Criteria

  • Identified a "champion" who will advance the addition.
  • Prose outlining the problem or need and the general shape of a solution.
  • Illustrative examples of usage.
  • High-level API.

Stage 2 Entrance Criteria

Stage 3 Entrance Criteria

Stage 4 Entrance Criteria

  • Test262 acceptance tests have been written for mainline usage scenarios and merged.
  • Two compatible implementations which pass the acceptance tests: [1], [2].
  • A pull request has been sent to tc39/ecma262 with the integrated spec text.
  • The ECMAScript editor has signed off on the pull request.