WebGPU How It Works

Let’s try to explain WebGPU by implementing something similar to what the GPU does with vertex shaders and fragment shaders but in JavaScript. Hopefully this will give you an intuitive feeling about what’s really going on.

If you’re familiar with Array.map, if you squint real hard you can get some idea of how these 2 different kinds of shader functions work. With Array.map you provide a function to transform a value.

Example:

const shader = v => v * 2;  // double the input
const input = [1, 2, 3, 4];
const output = input.map(shader);   // result [2, 4, 6, 8]

Above our “shader” for array.map is just a function that given a number, returns its double. That’s probably the closest analogy in JavaScript to what “shader” means. It’s a function that returns or generates values. You don’t call it directly. Instead, you specify it and then the system calls it for you.

For a GPU vertex shader you don’t map over an input array. Instead, you just specify a count of how many times you want the function to be called.

function draw(count, vertexShaderFn) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    internalBuffer[i] = vertexShaderFn(i);
  }
  console.log(JSON.stringify(internalBuffer));
}

One consequence is that unlike Array.map, we no longer need a source array to do something.

const shader = v => v * 2;
const count = 4;
draw(count, shader);
// outputs [0, 2, 4, 6]

The thing that makes GPU work complicated is that these functions run on a separate system in your computer, the GPU. This means all the data you create and reference has to somehow be sent to the GPU and you then need to communicate to the shader where you put that data and how to access it.

Vertex And Fragment shaders can take data in 6 ways. Uniforms, Attributes, Buffers, Textures, Inter-Stage Variables, Constants.

Uniforms

Uniforms are values that are the same for each iteration of the shader. Think of them as constant global variables. You can set them before a shader is run but, while the shader is being used, they remain constant, or to put it another way, they remain uniform.

Let’s change draw to pass uniforms to a shader. To do this we’ll make an array called bindings and use it to pass in the uniforms.
```
*function draw(count, vertexShaderFn, bindings) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
*    internalBuffer[i] = vertexShaderFn(i, bindings);
  }
  console.log(JSON.stringify(internalBuffer));
}
```
And then let’s change our shader to use the uniforms
```
const vertexShader = (v, bindings) => {
  const uniforms = bindings[0];
  return v * uniforms.multiplier;
};
const count = 4;
const uniforms1 = {multiplier: 3};
const uniforms2 = {multiplier: 5};
const bindings1 = [uniforms1];
const bindings2 = [uniforms2];
draw(count, vertexShader, bindings1);
// outputs [0, 3, 6, 9]
draw(count, vertexShader, bindings2);
// outputs [0, 5, 10, 15]
```
So, the concept of uniforms hopefully seems pretty straight forward. The indirection through bindings is there because this is “similar” to how things are done in WebGPU. Like was mentioned above, we access the things, in this case the uniforms, by location/index. Here they are found in bindings[0].

Attributes (vertex shaders only)

Attributes provide per shader iteration data. In Array.map above, the value v was pulled from input and automatically provided to the function. This is very similar to an attribute in a shader.

The difference is, we are not mapping over the input, instead, because we are just counting, we need to tell WebGPU about these inputs and how to get data out of them.

Imagine we updated draw like this.

*function draw(count, vertexShaderFn, bindings, attribsSpec) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
*    const attribs = getAttribs(attribsSpec, i);
*    internalBuffer[i] = vertexShaderFn(i, bindings, attribs);
  }
  console.log(JSON.stringify(internalBuffer));
}

+function getAttribs(attribs, ndx) {
+  return attribs.map(({source, offset, stride}) => source[ndx * stride + offset]);
+}

Then we could call it like this.

const buffer1 = [0, 1, 2, 3, 4, 5, 6, 7];
const buffer2 = [11, 22, 33, 44];
const attribsSpec = [
  { source: buffer1, offset: 0, stride: 2, },
  { source: buffer1, offset: 1, stride: 2, },
  { source: buffer2, offset: 0, stride: 1, },
];
const vertexShader = (v, bindings, attribs) => (attribs[0] + attribs[1]) * attribs[2];
const bindings = [];
const count = 4;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [11, 110, 297, 572]

As you can see above, getAttribs uses offset, and stride to compute indices into the corresponding source buffer and pulls out values. The pulled out values are then sent to the shader. On each iteration attribs will be different.

 iteration |  attribs
 ----------+-------------
     0     | [0, 1, 11]
     1     | [2, 3, 22]
     2     | [4, 5, 33]
     3     | [6, 7, 44]

Raw Buffers

Buffers are effectively arrays, again for our analogy let’s make version of draw that uses buffers. We’ll pass these buffers via bindings like we did with uniforms.
```
const buffer1 = [0, 1, 2, 3, 4, 5, 6, 7];
const buffer2 = [11, 22, 33, 44];
const attribsSpec = [];
const bindings = [
  buffer1,
  buffer2,
];
const vertexShader = (ndx, bindings, attribs) => 
    (bindings[0][ndx * 2] + bindings[0][ndx * 2 + 1]) * bindings[1][ndx];
const count = 4;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [11, 110, 297, 572]
```
Here we got the same result as we did with attributes except this time, instead of the system pulling the values out of the buffers for us, we calculated our own indices into the bound buffers. This is more flexible than attributes since we basically have random access to the arrays. But, it’s potentially slower for that same reason. Given the way attributes worked the GPU knows the values will be accessed in order which it can use to optimize. For example, in order access is usually cache friendly. When we calculate our own indices the GPU has no idea which part of a buffer we’re going to access until we actually try to access it.
Textures

Textures are 1d, 2d, or 3d arrays of data. Of course, we could implement our own 2d or 3d arrays using buffers. What’s special about textures is they can be sampled. Sampling means that we can ask the GPU to compute a value between the values we supply. We’ll cover that this means in the article on textures. For now, let’s make a JavaScript analogy again.

First we’ll create a function textureSample that samples an array between values.
```
function textureSample(texture, ndx) {
  const startNdx = ndx | 0;  // round down to an int
  const fraction = ndx % 1;  // get the fractional part between indices
  const start = texture[startNdx];
  const end = texture[startNdx + 1];
  return start + (end - start) * fraction;  // compute value between start and end
}
```
A function something like that already exists on the GPU.

Now let’s use that in a shader.
```
const texture = [10, 20, 30, 40, 50, 60, 70, 80];
const attribsSpec = [];
const bindings = [
  texture,
];
const vertexShader = (ndx, bindings, attribs) =>
    textureSample(bindings[0], ndx * 1.75);
const count = 4;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [10, 27.5, 45, 62.5]
```
When ndx is 3 we’ll pass in 3 * 1.75 or 5.25 into textureSample. That will compute a startNdx of 5. So we’ll pull out indices 5 and 6 which are 60 and 70. fraction becomes 0.25, so we’ll get 60 + (70 - 60) * 0.25 which is 62.5.

Looking at the code above we could write textureSample ourselves in our shader function. We could manually pull out the 2 values and interpolate between them. The reason the GPU has this special functionality is it can do it much faster and, depending on the settings, it may read as many as sixteen 4-float values to produce one 4-float value for us. That would be a lot of work to do manually.

Inter-Stage Variables (fragment shaders only)

Inter-Stage Variables are outputs from a vertex shader to a fragment shader. As was mentioned above, a vertex shader outputs positions that are used to draw/rasterize points, lines, and triangles.

Let’s imagine we’re drawing a line. Let’s say our vertex shader was run twice, the first time it output the equivalent of 5,0 and the second time the equivalent of 25,4. Given those 2 points the GPU will draw a line from 5,0 to 25,4 exclusive. To do this it will call our fragment shader 20 times, once for each of the pixels on that line. Each time it calls our fragment shader it’s up to us to decide what color to return.

Let’s assume we have pair of functions that help us draw a line between 2 points. The first function computes how many pixel’s we need to draw and some values to help draw them. The second takes that info plus a pixel number and gives us a pixel position. Example:

const line = calcLine([10, 10], [13, 13]);
for (let i = 0; i < line.numPixels; ++i) {
  const p = calcLinePoint(line, i);
  console.log(p);
}
// prints
// 10,10
// 11,11
// 12,12

Note: How calcLine and calcLinePoint work are unimportant, what’s important is that they do work and let the loop above provide the pixel positions for a line. Though if you’re curious, see the live code example near the bottom of the article.

So, let’s change our vertex shader so it outputs 2 values per iteration. We could do that in many ways. Here’s one.

const buffer1 = [5, 0, 25, 4];
const attribsSpec = [
  {source: buffer1, offset: 0, stride: 2},
  {source: buffer1, offset: 1, stride: 2},
];
const bindings = [];
const dest = new Array(2);
const vertexShader = (ndx, bindings, attribs) => [attribs[0], attribs[1]];
const count = 2;
draw(count, vertexShader, bindings, attribsSpec);
// outputs [[5, 0], [25, 4]]

Now let’s write some code that loops over points 2 at a time and calls rasterizeLines to rasterize a line.

function rasterizeLines(dest, destWidth, inputs, fragShaderFn, bindings) {
  for (let ndx = 0; ndx < inputs.length - 1; ndx += 2) {
    const p0 = inputs[ndx    ];
    const p1 = inputs[ndx + 1];
    const line = calcLine(p0, p1);
    for (let i = 0; i < line.numPixels; ++i) {
      const p = calcLinePoint(line, i);
      const offset = p[1] * destWidth + p[0];  // y * width + x
      dest[offset] = fragShaderFn(bindings);
    }
  }
}

We can update draw to use that code like this

-function draw(count, vertexShaderFn, bindings, attribsSpec) {
+function draw(dest, destWidth,
+              count, vertexShaderFn, fragmentShaderFn,
+              bindings, attribsSpec,
+) {
  const internalBuffer = [];
  for (let i = 0; i < count; ++i) {
    const attribs = getAttribs(attribsSpec, i);
    internalBuffer[i] = vertexShaderFn(i, bindings, attribs);
  }
-  console.log(JSON.stringify(internalBuffer));
+  rasterizeLines(dest, destWidth, internalBuffer,
+                 fragmentShaderFn, bindings);
}

Now we’re actually using internalBuffer 😃!

Let’s update the code that calls draw.

const buffer1 = [5, 0, 25, 4];
const attribsSpec = [
  {source: buffer1, offset: 0, stride: 2},
  {source: buffer1, offset: 1, stride: 2},
];
const bindings = [];
const vertexShader = (ndx, bindings, attribs) => [attribs[0], attribs[1]];
const count = 2;
-draw(count, vertexShader, bindings, attribsSpec);

+const width = 30;
+const height = 5;
+const pixels = new Array(width * height).fill(0);
+const fragShader = (bindings) => 6;

*draw(
*   pixels, width,
*   count, vertexShader, fragShader,
*   bindings, attribsSpec);

If we print pixels as a rectangle where 0 becomes . we’d get this

.....666......................
........66666.................
.............66666............
..................66666.......
.......................66.....

Unfortunately, our fragment shader gets no input that changes each iteration so there is no way to output anything different for each pixel. This is where inter-stage variables come in. Let’s change our first shader to output an extra value.

const buffer1 = [5, 0, 25, 4];
+const buffer2 = [9, 3];
const attribsSpec = [
  {source: buffer1, offset: 0, stride: 2},
  {source: buffer1, offset: 1, stride: 2},
+  {source: buffer2, offset: 0, stride: 1},
];
const bindings = [];
const dest = new Array(2);
const vertexShader = (ndx, bindings, attribs) => 
-    [attribs[0], attribs[1]];
+    [[attribs[0], attribs[1]], [attribs[2]]];

...

If we changed nothing else, after the loop inside draw, internalBuffer would have these values

 [ 
   [[ 5, 0], [9]],
   [[25, 4], [3]],
 ]

We can easily compute a value from 0.0 to 1.0 that represents how far along the line we are. We can use this to interpolate the extra value we just added.

function rasterizeLines(dest, destWidth, inputs, fragShaderFn, bindings) {
  for(let ndx = 0; ndx < inputs.length - 1; ndx += 2) {
-    const p0 = inputs[ndx    ];
-    const p1 = inputs[ndx + 1];
+    const p0 = inputs[ndx    ][0];
+    const p1 = inputs[ndx + 1][0];
+    const v0 = inputs[ndx    ].slice(1);  // everything but the first value
+    const v1 = inputs[ndx + 1].slice(1);
    const line = calcLine(p0, p1);
    for (let i = 0; i < line.numPixels; ++i) {
      const p = calcLinePoint(line, i);
+      const t = i / line.numPixels;
+      const interStageVariables = interpolateArrays(v0, v1, t);
      const offset = p[1] * destWidth + p[0];  // y * width + x
-      dest[offset] = fragShaderFn(bindings);
+      dest[offset] = fragShaderFn(bindings, interStageVariables);
    }
  }
}

+// interpolateArrays([[1,2]], [[3,4]], 0.25) => [[1.5, 2.5]]
+function interpolateArrays(v0, v1, t) {
+  return v0.map((array0, ndx) => {
+    const array1 = v1[ndx];
+    return interpolateValues(array0, array1, t);
+  });
+}

+// interpolateValues([1,2], [3,4], 0.25) => [1.5, 2.5]
+function interpolateValues(array0, array1, t) {
+  return array0.map((a, ndx) => {
+    const b = array1[ndx];
+    return a + (b - a) * t;
+  });
+}

Now we can use those inter-stage variables in our fragment shader

-const fragShader = (bindings) => 6;
+const fragShader = (bindings, interStageVariables) => 
+    interStageVariables[0] | 0; // convert to int

If we ran it now we’d see results like this

.....988......................
........87776.................
.............66655............
..................54443.......
.......................33.....

The first iteration of the vertex shader output [[5,0], [9]] and the 2nd iteration output [[25,4], [3]] and you can see, as the fragment shader was called, the 2nd value of each of those was interpolated between the two values.

We could make another function mapTriangle that given 3 points rasterized a triangle calling the fragment shader function for each point inside the triangle. It would interpolate the inter-stage variables from 3 points instead of 2.

Here are all the examples above running live in case you find it useful to play around with them to understand them.

click here to open in a separate window

What happens in the JavaScript above is an analogy. The details of how inter-stage variables are actually interpolated, how lines are drawn, how buffers are accessed, how textures are sampled, uniforms, attributes specified, etc… are different in WebGPU, but the concepts are very similar so I hope this JavaScript analogy provided some help in getting a mental model of what’s happening.

Why is it this way? Well, if you look at draw and rasterizeLines you might notice that each iteration is entirely independent of the other iterations. Another way to say this, you could process each iteration in any order. Instead of 0, 1, 2, 3, 4 you could process them 3, 1, 4, 0, 2 and you’d get the exact same result. The fact that they are independent means each iteration can be run in parallel by a different processor. Modern 2021 top end GPUs have 10000 or more processors. That means up to 10000 things can be run in parallel. That is where the power of using the GPU comes from. By following these patterns the system can massively parallelize the work.

The biggest limitations are:

A shader function can only reference its inputs (attributes, buffers, textures, uniforms, inter-stage variables).
A shader can not allocate memory.
A shader has to be careful if it references things it writes to, the thing it’s generating values for.

When you think about it this makes sense. Imagine fragShader above tried to reference dest directly. That would mean when trying to parallelize things it would be impossible to coordinate. Which iteration would go first? If the 3rd iteration referenced dest[0] then the 0th iteration would need to run first but if the 0th iteration referenced dest[3] then the 3rd iteration would need to run first.

Designing around this limitation also happens with CPUs and multiple thread or processes but in GPU land, with up to 10000 processors running at once, it takes special coordination. We’ll try to cover some of the techniques in other articles.

Questions? Ask on stackoverflow.

Suggestion? Request? Issue? Bug?

webgpufundamentals.org

WebGPU How It Works