Speeding up the JavaScript ecosystem - Rust and JavaScript Plugins
📖 tl;dr: Up until recently, supporting JavaScript in Rust-based tools has been deemed not worth it. The main concern is the overhead of the de-/serialization cost when sending data back and forth. But there is a little-known way to get rid of the deserialization cost entirely.
- Part 1: PostCSS, SVGO and many more
- Part 2: Module resolution
- Part 3: Linting with eslint
- Part 4: npm scripts
- Part 5: draft-js emoji plugin
- Part 6: Polyfills gone rogue
- Part 7: The barrel file debacle
- Part 8: Tailwind CSS
- Part 9: Server Side JSX
- Part 10: Isolated Declarations
- Part 11: Rust and JavaScript Plugins
Over the past year (2024) there has been a strong movement to rewrite JavaScript tools in Rust to make them faster. Rust is well suited for this as it runs much closer to hardware and doesn't rely on garbage collection. This makes it an ideal candidate for computationally intensive tasks. Linting in its basic form is such a task, as it involves parsing and traversing lots of source code.
But there is a big elephant in the room: Plugins.
The Rust plugin dilemma
To me, the main reason the JavaScript ecosystem has flourished in the past years is the pluggability of our tools. You want to explore a custom template syntax for your framework? Just write your own Babel plugin. You want to extend ESLint with custom rules? Just write a plugin. The list goes on and on.
Rust, on the other hand, is a different language and therefore less accessible to JavaScript developers. There is a shared concern that by interfacing with JavaScript you'd lose the performance gains of going with Rust in the first place, due to the overhead of serializing and deserializing messages between the two languages. And for good reason: that part is costly.
For this reason there is a strong desire to stay in Rust-land as much as possible. Some tooling developers even go so far as ditching JavaScript entirely and favouring some form of query language like GritQL as a way to extend their tools. But this in turn creates new problems: No JavaScript developer knows that query language. How do you debug it? How do you test it? etc.
Is there any chance JavaScript plugins can ever be made fast together with Rust?
Serialization and deserialization costs
Solving this exact problem became my main task at the end of 2024. We wanted to add support for JavaScript plugins in Deno's built-in linter. The goal was to have a plugin system that is compatible with ESLint, which means the AST format we're going to use is TSESTree, which itself is a superset of ESTree with support for TypeScript and JSX.
The first prototype relied on plain JSON as the message format. It's well supported and pretty much every language ships with an optimized JSON parser out of the box. As a test case we picked the `checker.ts` file in TypeScript's code base. It contains `53036` lines of code and ends up creating `610k` AST nodes in total. Enough to stress test the serialization and deserialization aspect.
| Task | Time (in s) |
| --- | --- |
| `serde-json` + `JSON.parse()` | 2.91s |
And yeah, as expected, it's slow. However, this gives us a good reference point to compare future optimizations against. Whilst there are various strategies to reduce the serialization overhead by going with a custom format, we need something that's multiple orders of magnitude faster.
What if we could get rid of the deserialization overhead entirely? What if the JavaScript side could traverse the AST by walking the received buffer directly, without ever having to deserialize it? Instead of creating `610k` AST nodes, we would create zero nodes on the JavaScript side.
Flattening the AST
The tricky bit is that ASTs are a tree structure. They're composed of objects pointing to other objects, and we somehow need to represent them as a flat list. That flat list needs to be expressible as numbers in a buffer, like a `Uint8Array`. What's worse is that every node has a different number of object properties and few share a similar shape.
// Input: if (condition) { foo() }
const ast = {
type: "IfStatement",
test: {
type: "Identifier",
name: "condition",
},
consequent: {
type: "BlockStatement",
body: [
{
type: "ExpressionStatement",
expression: {
type: "CallExpression",
// ...
},
},
],
},
alternate: null,
};
When you flatten a recursive structure, you typically have to either insert start and end markers in your message or pre-calculate offsets to the relevant nodes in your buffer. This is necessary to know where a node begins and ends. It's doable, but that bookkeeping overhead quickly becomes annoying.
A much more elegant approach to this problem is to store every AST node in an array and use its index in that array as the node's `id`. Instead of having nodes point to other nodes directly, a node points to other nodes via their `id`.
const ast = [
// First node at index 0 is a placeholder as we'll use the
// index 0 as an empty value
{ type: "" },
{
type: "IfStatement",
test: 2, // Points to index 2
consequent: 3, // Points to index 3
alternate: 0, // Empty, points to placeholder node at index 0
},
{
type: "Identifier",
name: "condition",
},
{
type: "BlockStatement",
body: [4],
},
// ...etc
];
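To make the id-based flattening concrete, here is a minimal sketch of a pass that turns a nested AST into such an array. The function name `flattenAst` and the property handling are illustrative, not the actual Deno implementation:

```javascript
// Sketch of an id-based flattening pass (illustrative, not Deno's
// actual implementation). Object-valued properties are replaced by
// the array index of the flattened child; `null` becomes 0, which
// points at the placeholder node.
function flattenAst(root) {
  const out = [{ type: "" }]; // index 0 is the empty placeholder

  const visit = (node) => {
    const id = out.length;
    const flat = { type: node.type };
    out.push(flat); // reserve the id before visiting children

    for (const [key, value] of Object.entries(node)) {
      if (key === "type") continue;
      if (Array.isArray(value)) {
        flat[key] = value.map(visit); // arrays become arrays of ids
      } else if (value !== null && typeof value === "object") {
        flat[key] = visit(value); // nested nodes become ids
      } else if (value === null) {
        flat[key] = 0; // empty: points at the placeholder
      } else {
        flat[key] = value; // plain values like `name` stay as-is
      }
    }
    return id;
  };

  visit(root);
  return out;
}

const flat = flattenAst({
  type: "IfStatement",
  test: { type: "Identifier", name: "condition" },
  consequent: { type: "BlockStatement", body: [] },
  alternate: null,
});
// flat[1] is { type: "IfStatement", test: 2, consequent: 3, alternate: 0 }
```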
This flattens the tree so that we don't have to deal with the recursion anymore. We still need to deal with the variable number of properties though. Consequently, we need a way to make every node the same shape.
For our use case of linting the AST tree, we can safely assume that walking the AST tree is the most common operation. We need to visit parent nodes first and then all their children in order. But we don't really care under which property a child node can be found. What matters is that we visit all child nodes in the correct order.
We can exploit that aspect by extracting all properties into a separate data structure. The nodes themselves will only keep a `type`, a `child`, a `next` and a `parent` pointer. Theoretically, you could drop the `parent` pointer, but it's going to make our life easier, because lint rules in ESLint can traverse upwards at any time. The node itself only contains information on how to traverse it, without any knowledge of its properties.
const ast = {
properties: [
{},
{ test: 2, consequent: 3, alternate: 0 },
{ name: "condition" },
{ body: [4] },
],
nodes: [
// First node at index 0 is a placeholder as we'll use the
// index 0 as an empty value
{ type: "", child: 0, next: 0, parent: 0 },
{
type: "IfStatement",
child: 2, // Points to index 2
next: 0,
parent: 0,
},
{
type: "Identifier",
child: 0,
next: 3, // Point to next child of IfStatement
parent: 1, // Parent node is IfStatement
},
{
type: "BlockStatement",
child: 4, // Points to the ExpressionStatement at index 4
next: 0,
parent: 1, // Parent node is IfStatement
},
// ...etc
],
};
With the new shape, we can traverse a node and its children by walking the `child` pointers, followed by iterating over the `next` pointers. Traversal becomes very simple and can be expressed in very few lines of code.
function traverse(buf, idx) {
// Check if a lint rule matches
if (matches(buf, idx)) {
visit(buf, idx);
}
// Traverse over children
const childIdx = readChild(buf, idx);
if (childIdx > 0) {
traverse(buf, childIdx);
}
// Traverse next siblings
const nextIdx = readNext(buf, idx);
if (nextIdx > 0) {
traverse(buf, nextIdx);
}
}
The reason traversal has become so simple is that the node's data structure only ever contains traversable properties like `child` and `next`. We never have to check whether a property is traversable anymore. This seems like a minor detail, but it reduces the number of checks during traversal and typically makes the code simpler.
The only part of variable length left is the `type` property. We can get rid of that variability by storing strings in a string table and referring to a string by its index into that table instead. This has the neat benefit of also compressing our message, because we only ever store a particular string once.
const ast = {
stringTable: ["", "IfStatement", "Identifier", "BlockStatement"],
properties: [
{},
{ test: 2, consequent: 3, alternate: 0 },
{ name: "condition" },
{ body: [4] },
],
nodes: [
// First node at index 0 is a placeholder as we'll use the
// index 0 as an empty value
{ type: 0, child: 0, next: 0, parent: 0 },
{
type: 1, // StringTable index 1
child: 2, // Points to index 2
next: 0,
parent: 0,
},
{
type: 2, // StringTable index 2
child: 0,
next: 3, // Point to next child of IfStatement
parent: 1, // Parent node is IfStatement
},
{
type: 3, // StringTable index 3
child: 4,
next: 0,
parent: 1, // Parent node is IfStatement
},
// ...etc
],
};
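Building such a string table boils down to classic string interning: look a string up, and only append it when it hasn't been seen before. A minimal sketch (the name `createStringTable` is illustrative):

```javascript
// Minimal string interning sketch: every unique string is stored
// exactly once and referenced by its index from then on.
function createStringTable() {
  const strings = [""]; // index 0 doubles as the empty string
  const indices = new Map([["", 0]]);
  return {
    strings,
    intern(value) {
      let id = indices.get(value);
      if (id === undefined) {
        id = strings.length;
        strings.push(value);
        indices.set(value, id);
      }
      return id;
    },
  };
}

const table = createStringTable();
table.intern("IfStatement"); // 1
table.intern("Identifier"); // 2
table.intern("IfStatement"); // still 1, stored only once
```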
Effectively, our new format allows us to represent a node with just 4 numbers. We can drop the object property names and flatten the data even further by storing all numbers in the same array. Each node has the same length which allows us to easily calculate the offset of each node on the fly. We don't need to keep track of offsets anymore as the offset and the fixed size of a node gives us enough information to know where a node starts and ends.
[
// Empty placeholder node
0, // type
0, // child
0, // next
0, // parent
// IfStatement
1, // type
2, // child
0, // next
0, // parent
// Identifier
2, // type
0, // child
3, // next
1, // parent
// BlockStatement
3, // type
4, // child
0, // next
1, // parent
];
To get a node's offset in the buffer, we'll use this simple formula: `index * 4`. Doing this calculation on the fly costs practically nothing.
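The read helpers used by the traversal earlier can then be little more than that multiplication plus a field offset. A sketch, assuming the 4-numbers-per-node layout (the buffer mirrors the running example, with an empty block for brevity):

```javascript
const NODE_SIZE = 4; // type, child, next, parent
const TYPE = 0, CHILD = 1, NEXT = 2, PARENT = 3;

// Each helper computes the node's offset on the fly: index * 4.
const readType = (buf, idx) => buf[idx * NODE_SIZE + TYPE];
const readChild = (buf, idx) => buf[idx * NODE_SIZE + CHILD];
const readNext = (buf, idx) => buf[idx * NODE_SIZE + NEXT];
const readParent = (buf, idx) => buf[idx * NODE_SIZE + PARENT];

// placeholder, IfStatement, Identifier, BlockStatement (empty block)
const buf = new Uint32Array([
  0, 0, 0, 0,
  1, 2, 0, 0,
  2, 0, 3, 1,
  3, 0, 0, 1,
]);

readChild(buf, 1); // 2 -> the Identifier
readNext(buf, 2); // 3 -> the BlockStatement
readParent(buf, 3); // 1 -> back to the IfStatement
```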
Lazy deserialization
So far we've only focused on the traversal aspect, but when a lint rule matches we still need to materialize an actual object for the lint rule to consume. We don't want to expose the buffer directly to users; in fact, they should be completely unaware of it.
However, we want to be really careful to only materialize nodes that are actually needed. If we materialized a node and all its children upfront, we'd nullify a huge portion of the optimization work we've put in so far.
A solution to this problem is a facade class that looks like a node but has a bunch of getters that surf over the underlying buffer instead. When a lint rule accesses a getter that points to another node, we materialize that node on demand. We only ever materialize the nodes the user asked for, never more.
class NodeFacade {
#buffer;
#stringTable;
#propertyTable;
#index;
constructor(buffer, stringTable, propertyTable, index) {
this.#buffer = buffer;
this.#stringTable = stringTable;
this.#propertyTable = propertyTable;
this.#index = index;
}
get type() {
const typeNum = this.#buffer[this.#index];
return this.#stringTable[typeNum];
}
get test() {
const properties = this.#propertyTable[this.#index];
const property = properties["test"];
if (isNodeIndex(property)) {
return new NodeFacade(
this.#buffer,
this.#stringTable,
this.#propertyTable,
property
);
} else if (isString(property)) {
// ...
}
}
// ... more getters here
}
Since there are far fewer than 255 unique properties across all AST nodes (JavaScript + TypeScript + JSX) combined, we can go even one step further and simply create getters for every property upfront (thanks to @jviide for that tip!). This might seem like an odd choice, because not every node has every property, but we can pretend each node only has its specific properties by typing it differently in TypeScript.
interface Visitor {
Identifier(node: { type: "Identifier"; name: string }): void;
IfStatement(node: {
type: "IfStatement";
test: Expression;
consequent: Statement;
alternate: Statement;
}): void;
BlockStatement(node: { type: "BlockStatement"; body: Statement[] }): void;
}
From a typing perspective the lint rule always gets the correct object, but in reality the object is an instance of our `NodeFacade` class and has many more properties. The genius part is that our code gets a lot shorter: we don't need to declare 170+ different classes for every AST node type, we just need a single one. We've completely hidden the fact that we're just surfing over a buffer behind the scenes.
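One way to sketch that upfront getter generation is to define one getter per known property on a shared prototype. The property list below is a tiny illustrative subset, and a real implementation would return nested facades instead of raw values:

```javascript
// Sketch: define one getter per known AST property on a single
// prototype. ALL_PROPERTIES is a tiny illustrative subset; the real
// list covers every property across JS, TS and JSX nodes.
const ALL_PROPERTIES = ["test", "consequent", "alternate", "name", "body"];

class FlatNode {
  constructor(propertyTable, index) {
    this.propertyTable = propertyTable;
    this.index = index;
  }
}

for (const prop of ALL_PROPERTIES) {
  Object.defineProperty(FlatNode.prototype, prop, {
    get() {
      // A real implementation would wrap node ids in new facades
      // here; this sketch returns the raw stored value.
      return this.propertyTable[this.index][prop];
    },
  });
}

const node = new FlatNode([{}, { test: 2, consequent: 3 }], 1);
node.test; // 2
node.name; // undefined: the getter exists, the property doesn't
```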
const eslintPlugin = {
name: "my-plugin",
rules: {
"my-rule": {
create(ctx) {
return {
IfStatement(node) {
// Just an example
if (node.test.type === "Identifier" && node.test.name === "foo") {
ctx.report({
node: node.test,
message: "Don't pass 'foo' to an IfStatement",
});
}
},
};
},
},
},
};
With all optimizations in place the numbers look much better. Sure, there is still some overhead compared to not running plugins at all, but it's gotten fast enough to be usable.
| Task | Time (in s) | Speedup |
| --- | --- | --- |
| `serde-json` + `JSON.parse()` | 2.91s | - |
| custom | 0.62s | ~4.7x |
The whole traversal and selector matching code only takes `20.4ms` for `610k` AST nodes. Most files have far fewer nodes, around ~6000 or fewer. Nearly all remaining time is consumed by the serialization step. There are some optimizations left to explore that I haven't gotten around to, but for now it's fast enough.
Bonus: Supporting all the languages
With a generic `NodeFacade` class, no aspect of our data format is tied to the TSESTree or ESTree format anymore. The data format has no knowledge of what it contains; it just knows that it's a bunch of objects with a specific hierarchy encoded in them. It's completely independent of any language. We could pass in a CSS AST, a markdown AST or an AST from any other language and the plugin system would support it out of the box.
The key thing that enabled us to do that was to encode tree traversal in a generic way in our data format.
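As a hypothetical illustration, the same 4-number layout could encode a CSS-like tree without changing a single line of traversal logic (the node types here are made up):

```javascript
// Nothing in the format knows about JavaScript: the same layout can
// hold a CSS-like tree. Node types here are made up for illustration.
const stringTable = ["", "Rule", "Declaration"];
const buf = new Uint32Array([
  0, 0, 0, 0, // placeholder
  1, 2, 0, 0, // Rule: first child is node 2
  2, 0, 3, 1, // Declaration: next sibling is node 3
  2, 0, 0, 1, // Declaration
]);

// The exact same child/next walk as before, collecting type names.
function collectTypes(buf, idx, out = []) {
  out.push(stringTable[buf[idx * 4]]);
  const child = buf[idx * 4 + 1];
  if (child > 0) collectTypes(buf, child, out);
  const next = buf[idx * 4 + 2];
  if (next > 0) collectTypes(buf, next, out);
  return out;
}

collectTypes(buf, 1); // ["Rule", "Declaration", "Declaration"]
```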
Conclusion
So there you have it. Rust can work together with JavaScript plugins in a performant way. I think this is the future of JavaScript tooling. Keeping pluggability alive is very important to me and, I believe, for the ecosystem at large. All the work described here was done in the span of two months and shipped in Deno 2.2. There is still some work left to do to support ESLint's config format, but you can already run (or write) raw ESLint plugins with `deno lint`. Bartek even wired everything through the rest of Deno so that you can apply fixes and see suggestions from your lint plugins right in your editor.
All in all, I think Rust is here to stay in our JavaScript tools. I'm excited to see where the future leads us and how we can make both Rust and JavaScript work even better together.