Navigating Hyperspace
- In higher dimensional spaces, vectors sampled from typical random distributions form a hyperspherical shell. slerp, a.k.a. ‘spherical linear interpolation’, is often recommended for traversing these spaces instead of linear interpolation (lerp).
- I visualize the formation of these hypershells, and provide some explanations for why they form.
- I present slerp2, a modification of slerp which performs better in lower dimensions and for pessimized cases in higher dimensions.
- I experiment with each interpolation scheme in StyleGAN2’s latent ‘Z’-space. In this case, the choice of interpolation scheme makes no difference to the results.
Slerp2 and Higher-Dimensional Space, Projected⌗
Why explore hyperspace?⌗
Hyperspace might seem exotic - and it is - but learning better ways to explore it can be both useful and (dare I say) fun.
Spaces with dimensionality higher than 3 or 4 might be outside of our lived experience, but that doesn’t mean they don’t exist, or that they can’t be useful. For example, hyperspaces frequently show up in neural networks. By learning to explore hyperspace, we can try to gain a better understanding of how these networks work.
The motivating example for this article comes from the sub-field of Generative Adversarial Networks.
The following portraits were generated by StyleGAN2 (Karras et al.):
To dramatically oversimplify, these neural networks take an input vector, do some transformations on it, and produce an image. The input vector is typically random noise, and the vector is often quite long - hundreds or thousands of elements long.
These vectors make up the latent space of the model. Because these vectors are long, the latent space is high dimensional. Manipulating the outputs of these models relies on us being able to chart paths through their latent space.
As an example: if we want to smoothly blend from the first painting above to the second, we need a way to traverse from the vector representing one image to the vector representing the other.
The obvious answer (is wrong)⌗
To get from point A to point B, the obvious answer is to go in as straight a line as possible. The simplest answer here is also the shortest. Indeed, when working with latent spaces, going in a straight line does generally work, with varying degrees of success. This is known as linear interpolation (lerp), and would be written something like this:
def lerp(fraction, start_vec, stop_vec):
    return start_vec + fraction * (stop_vec - start_vec)
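As a quick sanity check (the vectors here are purely illustrative), lerp at fraction=0.5 gives the midpoint of the two vectors:
import numpy as np

a = np.array([0.0, 0.0, 0.0])
b = np.array([1.0, 2.0, 3.0])
print(lerp(0.5, a, b))  # -> [0.5, 1.0, 1.5], the midpoint of a and b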
In 3D-space, this looks like:
Note: Most of the plots on this page are interactive! Have a play!
While exploring this topic, I came across a befuddling thread which suggested that the best path was not, in fact, a straight line. Rather, a function called slerp, or “spherical linear interpolation”, was suggested. This has the rather complicated functional form:
import numpy as np

def slerp(fraction, start_vec, stop_vec):
    omega = np.arccos(np.clip(np.dot(start_vec / np.linalg.norm(start_vec), stop_vec / np.linalg.norm(stop_vec)), -1, 1))
    so = np.sin(omega)
    # Fall back to lerp when sin(omega) == 0, i.e. the vectors are 0 or pi radians apart
    if so == 0:
        return (1.0 - fraction) * start_vec + fraction * stop_vec  # L'Hopital's rule/LERP
    return np.sin((1.0 - fraction) * omega) / so * start_vec + np.sin(fraction * omega) / so * stop_vec
Even more confusingly, when plotted in 3D space, this function gives a path that looks like this:
When thinking about what a “good” interpolation path might look like, a few different ideas come to mind. We want it to be:
- Smooth - in my experience, jagged, jerky paths do not work well for blending between two latent vectors.
- Relatively short, since we care about capturing the changes between two specific points, rather than going sightseeing to irrelevant destinations
- Well-trodden - We’ve trained our neural net on a limited set of data. If our interpolation takes us far outside anything the neural net has ever seen, it’s unlikely to perform well.
Looking at slerp, we can see:
- It is smooth
- It doesn’t look particularly short. In fact, it’s much longer than our straight-line path.
- It doesn’t seem to stick particularly closely to the data we’ve trained the network on. In fact, it sometimes goes outside of our (-1, 1) domain entirely!
In 2D and 3D space, linear interpolation simply doesn’t have the issues that slerp does. Yet, slerp is consistently recommended.
Clearly, something about hyperspace behaves very non-intuitively!
Strap in… because it’s time to go exploring.
Windows into Hyperspace⌗
Let’s start with the concept of vectors.
A vector is just a collection of numbers, arranged in a single column or row, like so:
\[ \begin{align} \vec{v}_{3} = \begin{bmatrix} 1.9 \\ 4.7 \\ -3.1 \\ \end{bmatrix} \end{align} \]An n-dimensional vector is \(n\) items long:
\[ \begin{align} \vec{v}_{n} = \begin{bmatrix} x_{1} \\ \vdots \\ x_{n} \end{bmatrix} \end{align} \]Vectors can be used to represent all sorts of things, but here we’re going to use them to represent cartesian coordinates.
1-space⌗
If we had only 1 spatial dimension to play with, we could represent every possible position with a 1-dimensional vector:
\[ \begin{align} \vec{v}_{1} = \begin{bmatrix} x_{1} \\ \end{bmatrix} \end{align} \]If we were to fill our space with lots of random points, uniformly distributed from -1 to 1, it would look like this:
Hopefully, this result is pretty unsurprising.
2-space⌗
If we extend our vectors into two dimensions, and perform the same exercise, we’ll get something like this:
For every possible location in this space, we can define an exact point through something like:
\[ \begin{align} \vec{v}_{2} = \begin{bmatrix} -0.85 \\ 0.24 \\ \end{bmatrix} \end{align} \]
3-space⌗
Extending up to 3D is quite straightforward, where we now have 3-long vectors like this:
Let’s again scatter some points uniformly between -1 and 1, this time in 3 dimensions:
We’re very used to looking at 3D space through these kinds of visualizations, where our brain can reconstruct a series of 2D images into a 3D representation.
Flattening Space⌗
What if we wanted to look at this 4D vector inside its vector space:
\[ \begin{align} \vec{v}_{4} = \begin{bmatrix} 0.93 \\ -0.43 \\ 0.67 \\ 0.12 \\ \end{bmatrix} \end{align} \]We could try using time as an extra dimension, but we’ve already run out of spatial dimensions.
Of course, we want to go far beyond a mere four dimensions. Even if we used time, how would we visualize something like this?
\[ \begin{align} \vec{v}_{1000} = \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{1000} \\ \end{bmatrix} \end{align} \]
Projecting⌗
To glimpse higher dimensions, we’re necessarily going to need to make compromises. With up to 3 dimensions to play with, any given viewport will need to choose what information to show and what to hide.
A natural way to project higher dimensions is to just take the first \(n\) dimensions we can display, and ignore the rest.
We can visualize what this looks like by creating a vector space in three dimensions, and visualizing it with two.
If we want to display the vector:
\[ \begin{align} \vec{v} = \begin{bmatrix} 0.21 \\ -0.85 \\ -0.32 \\ \end{bmatrix} \end{align} \]We can display the first 2 elements, i.e.:
\[ \begin{align} \vec{c}_2 = \begin{bmatrix} 0.21 \\ -0.85 \\ \end{bmatrix} \end{align} \]Where \(\vec{c}_2\) represents a cartesian projection down to 2 dimensions.
We can write this as an equation:
\[ \vec{v}_3 \mapsto \vec{c}_2 \]Where the arrow \(\mapsto\) means “maps to”.
Visualized, it looks like so:
We can pick any 2 elements to display, of course. Representing our 3-space in 2 dimensions could be done equally validly by picking two different elements, such as the last element \(x_{3}\) and the second element \(x_{2}\):
\[ \begin{align} \vec{c}_2 = \begin{bmatrix} -0.32 \\ -0.85 \\ \end{bmatrix} \end{align} \]What does our 2D projection tell us about the 3D space? Well, we effectively get the same view as if we rotated our 3D view until we were just looking at one face.
If we’re plotting, say, \(x_{1}\) and \(x_{2}\), we get a perfect understanding of how our points are distributed in those two dimensions.
Should we want to know what portion of points have \(x_{1}\) > 0 and \(x_{2}\) < 0, we can look at the 2D chart and easily see the answer is ~25%.
However, we get absolutely no information about the rest of our vector. It wouldn’t matter if we were plotting a vector of length 3 or a vector of length 3000 - from this viewpoint, they all look the same.
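As a concrete sketch of this naive cartesian projection (the sample count and dimensionalities here are arbitrary), the projection is just a slice that keeps the first two elements of every vector:
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly in the cube, then keep only the first two elements
# of each vector and throw the rest away.
points_3d = rng.uniform(-1, 1, size=(500, 3))
projection_2d = points_3d[:, :2]  # keep x1 and x2, discard x3

# The same slice works for any vector length - a 1000-dimensional point cloud
# looks identical from this viewpoint.
points_1000d = rng.uniform(-1, 1, size=(500, 1000))
projection_2d_from_1000d = points_1000d[:, :2]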
Different Projections⌗
So far, we’ve been exploring space with cartesian coordinates.
Without fully justifying it yet, I’m going to introduce a completely different coordinate system - spherical coordinates.
Most people are used to cartesian coordinates. In the following image, it seems natural to define the position of the red cross based on two distances, which we typically call x and y.
We could represent this point as a vector:
\[ \begin{align} \vec{v}_2 = \begin{bmatrix} x \\ y \\ \end{bmatrix} \end{align} \]In higher dimensions, we can add more directions, provided they are perpendicular to all the other directions. Hence, for 3d, we might use (x, y, z).
In a spherical coordinate system, however, a point in space is defined not by \(n\) orthogonal coordinates (e.g. x, y, and z), but rather as a radial distance \(r\), and then a series of angles.
To fully describe any point in 2D-space, we need two coordinates. Since we already have one (the distance from the origin \(r\)), we need one more. Hence, a 2D spherical coordinate system would have one angle, \(\theta_1\).
We can also represent this point as a vector:
\[ \begin{align} \vec{s}_2 = \begin{bmatrix} r \\ \theta_{1} \\ \end{bmatrix} \end{align} \]Notice that both \(\vec{v}_2\) and \(\vec{s}_2\) refer to the exact same point in space. The actual numbers inside the vectors, and the coordinate system used are very different, but the point in space is the same.
Adding Dimensions⌗
In 3-space, we need a third coordinate. For cartesian coordinates, we add z to our existing x and y. For spherical coordinates, we add another angle \(\theta_2\).
These two vectors represent the same position:
\[ \begin{align} \vec{v} = \begin{bmatrix} 0.54 \\ -0.87 \\ 0.26 \\ \end{bmatrix}_{[x,y,z]} = \vec{s} = \begin{bmatrix} 1.06 \\ -1.02 \\ 1.32 \\ \end{bmatrix}_{[r, \theta_1, \theta_2]} \end{align} \]
Why bother with spherical coordinates?⌗
How does this help us? After all, you still need an n-length vector to represent a point in n-space.
What’s interesting, however, is when you start looking at higher dimensions. Since the length \(r\) takes into account the entire vector, plotting the first 2 or 3 elements in the spherical vector gives us a different view on higher dimensions.
Importantly, we always keep the magnitude of the full vector when using spherical coordinates.
We then get to select 1 angle (for a 2D plot) or 2 angles (for a 3D plot). These angles represent the relative positioning between some, but not all, elements of the vector.
Earlier, when we projected higher-dimensional space into 2D and 3D cartesian plots, we picked 2 or 3 elements from our larger vector and threw away the rest. We make a similar compromise in spherical coordinates, but because the magnitude is always kept, what we give up are angles: we pick one angle (for a 2D plot) or two angles (for a 3D plot) and discard the others.
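As a sketch of what this conversion looks like in 3D (the angle convention below is my assumption - it happens to reproduce the worked example from the “Adding Dimensions” section, but the plotting code for this page may order the angles differently):
import numpy as np

def to_spherical_3d(v):
    x, y, z = v
    r = np.linalg.norm(v)        # magnitude of the full vector - always kept
    theta_1 = np.arctan2(y, x)   # angle in the x-y plane (assumed convention)
    theta_2 = np.arccos(z / r)   # angle from the z-axis (assumed convention)
    return np.array([r, theta_1, theta_2])

print(to_spherical_3d(np.array([0.54, -0.87, 0.26])))
# -> approximately [1.06, -1.02, 1.32], matching the example above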
Below, you can increase the dimensionality of the space being visualized.
Before you do, make a guess about what you think will happen as the number of dimensions increases.
Remember, we’re keeping the vector magnitude, but can only keep one angle (for the 2D plot) or 2 angles (for the 3D plot).
How many dimensions do you think we can plot before the spherical projection will start to look different to the cartesian projection?
You can also change the noise distribution used to generate the points.
What’s going on?⌗
Our projection has shown us an unintuitive, but true, fact about hyperspace - as the dimensionality increases, our points converge to a hyperspherical shell. The radius of this shell scales with the square root of our initial distribution’s variance, \(\sqrt{\sigma^2} = \sigma\), and with the square root of our dimensionality, \(\sqrt{n}\).
The exact formula for the radius varies depending on the type of noise used (results for a uniform distribution and this great post with results for normal distributions).
For both uniform and normal distributions, the hyperspherical shell has a relatively constant thickness as the dimensionality increases, leading to an increasingly shell-like distribution of points.
What does this mean?⌗
In lower dimensional spaces (2D, 3D, etc.) the radius of our hypershell is of the same order as the standard deviation of the distribution. This means that, in general, there isn’t much of a “hole” at the origin. However, even in 3D (using our 2D plot), we start to see a gap open up near the origin.
Below are two different ways to interpret the existence of a hyperspherical shell.
Geometric Interpretation⌗
As explained in John D. Cook’s post, volume grows faster in higher dimensions. For our uniform distribution, our probability density is constant between its bounds of (-1, 1), and so we can pretty much ignore it.
Volume, however, is proportional to \(r^n\), where \(r\) is the distance from the origin and \(n\) is the dimensionality of our space. If \(n = 1000\) dimensions, the difference between a sphere of radius 0.99 and radius 1.000 is
\[ 1.000^{1000} - 0.99^{1000} \approx 0.9999 \]In other words, >99% of all of our volume is contained in an outer shell, with a thickness of 1% of the radius of our space. The reason this colossal growth of volume with radius is not intuitive is that, in 3D, the same calculation would give around 3% of the volume in the outermost shell of our sphere:
\[ 1.000^3 - 0.99^3 \approx 0.03 \]Hence, even though our probability density function is constant in space, when we go to higher dimensions, the amount of space near the origin is astronomically low, and the amount at the outer perimeter is astronomically high.
Because so much space is so far out, our points will inevitably “cluster” there.
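To put rough numbers on this, here is the fraction of a unit n-ball’s volume that sits in its outer 1% shell, for a few dimensionalities:
# Volume scales as r^n, so the fraction of volume in the outer 1% shell
# of a unit n-ball is 1 - 0.99**n.
for n in (3, 10, 100, 1000):
    print(n, 1 - 0.99 ** n)
# 3    -> ~0.03
# 10   -> ~0.10
# 100  -> ~0.63
# 1000 -> ~0.99996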
Statistical Interpretation⌗
We can also think about this result statistically.
All the elements in our vectors are independent and identically distributed. The more elements we have, the more we will expect to see strong statistical trends in the overall properties of our vector, even while individual elements remain random.
Let’s imagine we’re rolling a fair die, with sides labelled 0, 1, 2, 3, 4, and 5. The expected value of our roll is 2.5, but we wouldn’t be surprised with a 0 or a 5.
If we now roll 2 dice, make a graph, and plot our first roll on the x axis and our second on the y axis, we again get a fairly even distribution.
However, if we instead added the total of our two dice together, we would be looking at a score between 0 and 10, with 5 being our expected value. Already, our sum is starting to cluster, with 5 much more likely than either 0 or 10.
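A quick simulation (the sample count is arbitrary) shows how strongly the sum already clusters with just two dice:
import numpy as np

rng = np.random.default_rng(0)

# Two dice with faces 0-5: each individual roll stays uniform, but their sum clusters.
rolls = rng.integers(0, 6, size=(100_000, 2))
sums = rolls.sum(axis=1)
for total in range(11):
    print(total, round((sums == total).mean(), 3))
# Totals of 0 or 10 each appear ~2.8% of the time; a total of 5 appears ~16.7%.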
The more dice we roll:
- The bigger we expect our total score to be, and
- The less and less likely we are to have a sum near 0 (or near the absolute highest possible score of \(5 \times n\) for \(n\) rolls).
The same process, roughly, is going on with the magnitude of our vectors. Instead of just summing our rolls, we’re squaring each roll, summing the squares, and then taking the square root. These functions warp and compress space a bit, but our intuition should generally still hold.
We should intuitively expect that the more dice we roll,
- The bigger our square-root sum of squares is, and
- The less and less likely we are to have a point near the origin (or in the corners of our hypercube). The quick simulation below checks this numerically.
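Here it is (the sample counts are arbitrary). As the dimensionality grows, the magnitudes of uniformly sampled vectors concentrate around \(\sigma\sqrt{n} = \sqrt{n/3}\), while the spread of those magnitudes stays roughly constant - the “constant shell thickness” mentioned earlier:
import numpy as np

rng = np.random.default_rng(0)

# For uniform(-1, 1) noise, sigma = 1/sqrt(3), so magnitudes should cluster
# around sqrt(n / 3), with a roughly constant spread (the shell thickness).
for n in (1, 3, 30, 300, 3000):
    points = rng.uniform(-1, 1, size=(10_000, n))
    norms = np.linalg.norm(points, axis=1)
    print(f"n={n:5d}  mean magnitude={norms.mean():6.2f}  "
          f"std={norms.std():.3f}  sqrt(n/3)={np.sqrt(n / 3):6.2f}")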
Tracing Lines Through Hyperspace⌗
Hopefully, you now have a solid grip on the spherical projections we’ll be using from this point onwards. Remember, the distance from a point to the origin in each plot represents the vector magnitude of the full vector.
Under this lens, what does linear interpolation (our lerp function from earlier) look like?
At low dimensions, lerp behaves exactly how we expect it to. But by the time we reach around 20 dimensions, there’s a clear problem. Our linear path is well outside the bounds of all the points in our vector space.
As we increase the dimensionality of our space, the problem gets worse. At 1000 dimensions, lerp spends almost the entirety of its path completely outside of the hyperspherical shell that makes up the points in our vector space.
In a machine learning context, this would mean that the interpolation is feeding in data well outside the bounds of anything the model has been trained on.
Why Does Lerp Behave Like This in Higher Dimensional Spaces?⌗
To understand why lerp diverges from our hyperspherical shell in higher dimensions, we have to think about what it’s doing. For each element \(x_i\) in \(\vec{v}^1\) and \(y_i\) in \(\vec{v}^2\), the output of lerp can never be larger than \(\max(x_i, y_i)\) and can never be smaller than \(\min(x_i, y_i)\). Unless \(x_i\) and \(y_i\) both happen to fall at exact opposite ends of their distributions, lerp will necessarily be operating in a smaller domain.
Right at the midpoint, where fraction=0.5, lerp will give a vector that is the average of \(\vec{v}^1\) and \(\vec{v}^2\). The average is the point where all values will be the most “smoothed out”. Because a particularly large element is equally likely to appear in \(\vec{v}^1\) or \(\vec{v}^2\), averaging the two vectors should give the point in the interpolation with minimum variance. Because the distributions are centred about 0, minimum variance corresponds to minimum distance from the origin.
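A quick numeric check of this (assuming the lerp function defined earlier is in scope; the dimensionality is arbitrary): for independent high-dimensional vectors, the midpoint’s magnitude drops to roughly \(1/\sqrt{2}\) of the endpoints’ magnitudes.
import numpy as np

rng = np.random.default_rng(0)

n = 1000
a = rng.uniform(-1, 1, n)
b = rng.uniform(-1, 1, n)
mid = lerp(0.5, a, b)  # lerp as defined earlier in the article

# Endpoints sit near the shell radius sqrt(n/3) ~ 18.3; the midpoint drops
# to roughly 18.3 / sqrt(2) ~ 12.9.
print(np.linalg.norm(a), np.linalg.norm(b), np.linalg.norm(mid))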
If you want to try to visualize this, change the lerp visualization here to have 3 dimensions. Imagine connecting any two points in the space with a straight line. For every pair of points, the midpoint of a straight line connecting them is closer to the origin than either point.
Interestingly enough, the fact that lerp averages between the two vectors means that
the distribution of the vector becomes less and less uniform the closer to fraction=0.5
we get. But that’s a topic for another post.
Slerp⌗
Now that you’ve seen lerp in a spherical projection, it’s only fair to show slerp.
What do you think will happen as the dimensionality increases?
Was your intuition right?
What Is Slerp Actually Doing?⌗
Remember the definition of the slerp function from earlier? Let’s break down where it came from, and what it’s actually doing.
By looking at the projections above, you hopefully have a good intuition for what slerp is doing. The hints are in its name - spherical linear interpolation. The function works by rotating about the origin. Instead of translating from point A to point B, slerp rotates between the two points, and scales the magnitude of the vector while doing so.
As our spherical coordinate projections above showed, higher dimensional spaces converge to a hyperspherical shell. To stay in this shell, we want to orbit around the origin, keeping the vector magnitude (approximately) constant. This is what slerp does, or tries to do.
To actually get to the code, we have two parts:
- Rotating in hyperspace.
- Scaling between the two vectors’ magnitudes.
Part 1) is provided to us by Ken Shoemake, in the paper Animating Rotation with Quaternion Curves. In it, a rotation between two quaternions, \(q_1\) and \(q_2\), is given by the formula:
\[ \text{Slerp}(q_1, q_2; u) = \frac{\sin( (1 - u)\omega)}{\sin \omega} q_1 + \frac{\sin u\omega}{\sin \omega} q_2 \]Where \(u\) is the fraction
parameter between 0 and 1, and \(q_1 \cdot q_2 = \cos \omega\).
It turns out that this equation generalizes to n-dimensional vectors. Hence, we have part 1): a function to rotate from one vector to another. I’m unclear how “optimal” this rotation is, since there are many ways to rotate between two vectors in hyperspace. However, rotations - or “distance-preserving linear maps” - are very complicated and dimension-specific, so sticking with a general formula that works is a good plan.
If we recall the definition of the dot product of two vectors:
\[ a \cdot b = \vert \vert a \vert \vert \space \vert \vert b \vert \vert \cos \omega \]Where \(\omega\) is the angle between the two vectors. We need \(\omega\) for our slerp formula. So, we can rewrite the formula as:
\[ \omega = \arccos( \frac{a}{\vert\vert a \vert \vert} \cdot \frac{b}{\vert \vert b \vert \vert} ) \]Another way of thinking about \(\frac{a}{\vert \vert a \vert \vert}\) is that it is \(\hat{a}\) (pronounced a-hat) - the unit (length 1) vector which represents only the direction of \(a\).
This is indeed what the Python slerp function does:
omega = np.arccos(np.clip(np.dot(start_vec/np.linalg.norm(start_vec), stop_vec/np.linalg.norm(stop_vec)), -1, 1))
Now that we have \(\omega\), we can plug it into Shoemake’s Slerp, to get:
so = np.sin(omega)
start_hat = start_vec / np.linalg.norm(start_vec)
stop_hat = stop_vec / np.linalg.norm(stop_vec)
new_direction = np.sin((1.0-fraction)*omega) / so * start_hat + np.sin(fraction*omega) / so * stop_hat
This isn’t what the slerp code from the DCGAN thread does. Rather, this is the actual code:
so = np.sin(omega)
return np.sin((1.0-fraction)*omega) / so * start_vec + np.sin(fraction*omega) / so * stop_vec
There’s a subtle difference - the vectors used in slerp, labelled \(q_1\) and \(q_2\) above, are not normalized. And, rather than just calculating the angle, this line also does part 2), the vector magnitude scaling.
What this means is that slerp
is only performing a pure rotation
when \(\vert \vert a \vert \vert \approx \vert \vert b \vert \vert\). In that case,
it’s effectively doing this:
def slerp(fraction, start_vec, stop_vec):
    start_mag = np.linalg.norm(start_vec)  # ||a||
    stop_mag = np.linalg.norm(stop_vec)  # ||b||, approx. ||a||
    start_hat = start_vec / start_mag  # a_hat
    stop_hat = stop_vec / stop_mag  # b_hat
    omega = np.arccos(np.clip(np.dot(start_hat, stop_hat), -1, 1))
    so = np.sin(omega)
    angle = (
        np.sin((1.0 - fraction) * omega) / so * start_hat
        + np.sin(fraction * omega) / so * stop_hat
    )
    return start_mag * angle
When the magnitudes of the two vectors are not particularly close, such as in lower dimensions, this formula can give quite strange results. This is exacerbated where the vectors are (almost) pi radians apart from one another. (Since the vector spaces on this page are randomly generated, if you refresh a few times, you’re bound to see some strange slerp results in lower dimensions.)
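A contrived 2D example of this failure mode (assuming the slerp function from the thread, as defined above, is in scope; the numbers are purely illustrative): two vectors that are nearly pi radians apart and have different magnitudes send the midpoint far outside either endpoint.
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([-2.0, 0.01])  # nearly opposite direction, twice the magnitude

# The 1/sin(omega) terms become huge when omega is close to pi, so the
# interpolated midpoint blows up to a magnitude of roughly 200 here,
# versus ||a|| = 1 and ||b|| ~ 2.
print(np.linalg.norm(slerp(0.5, a, b)))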
A slight improvement can be had by explicitly treating the interpolation between the two vectors’ magnitudes, and normalizing the vectors before performing the slerp. This results in visually smoother paths, and tends to overshoot the data bounds less when there are large changes in vector magnitude.
We treat the vector magnitude explicitly, and just linearly interpolate (lerp) between the magnitude of the first vector and the magnitude of the second vector. For the purposes of this article, and at risk of being conceited, I’ll call this function slerp2.
It is moderately more complicated than slerp. The implementation here is formatted for readability, not performance.
def slerp2(fraction, start_vec, stop_vec):
    start_mag = np.linalg.norm(start_vec)  # ||a||
    stop_mag = np.linalg.norm(stop_vec)  # ||b||
    start_hat = start_vec / start_mag  # a_hat
    stop_hat = stop_vec / stop_mag  # b_hat
    omega = np.arccos(np.clip(np.dot(start_hat, stop_hat), -1, 1))
    so = np.sin(omega)
    magnitude = start_mag + (stop_mag - start_mag) * fraction  # lerp between the two magnitudes
    angle = (
        np.sin((1.0 - fraction) * omega) / so * start_hat
        + np.sin(fraction * omega) / so * stop_hat
    )
    return magnitude * angle
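To see the difference numerically (assuming the lerp, slerp, and slerp2 functions defined above are in scope; the dimensionality is arbitrary), we can track the magnitude of the interpolated vector along each path:
import numpy as np

rng = np.random.default_rng(0)

n = 1000
a = rng.uniform(-1, 1, n)
b = rng.uniform(-1, 1, n)

# For uniform(-1, 1) noise the hypershell radius is ~sqrt(n/3) ~ 18.3.
# lerp dips to ~sqrt(n/6) ~ 12.9 at the midpoint, while slerp and slerp2
# stay near the shell.
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{f:.2f}  lerp={np.linalg.norm(lerp(f, a, b)):6.2f}  "
          f"slerp={np.linalg.norm(slerp(f, a, b)):6.2f}  "
          f"slerp2={np.linalg.norm(slerp2(f, a, b)):6.2f}")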
Slerp2⌗
Using it in practice⌗
It’s all well and good talking abstractly of hyperspheres, but
you’re probably wondering: does slerp
actually perform better
in real-world applications?
At the start of this post, I wrote about StyleGAN. I picked it because it is a more modern network building on the principles of other GANs like DCGAN. DCGAN, you might remember, is where the thread on slerp originated that kicked off this whole journey.
I didn’t want to just explore DCGAN, where there are known dead zones near the origin of the latent space. Instead, I wanted to explore whether this is still the case with newer networks.
StyleGAN
actually has two latent spaces to explore:
- Z-space, which is sampled from a normal distribution with 512 dimensions.
- W-space, which is internal to the network itself and is the result of learned transformations on the Z-space vectors.
I’ll probably explore W-space in the future, but for now, Z-space matches the kind of latent spaces we’ve been looking at in this article, so that’s what we’ll use.
Degenerate Case 1: (Almost) passing through the origin⌗
I have a secret to confess: the two portraits I presented at the start of this article were not exactly randomly picked. The second portrait is an (almost) exact opposite of the first - that is, the vector \(\vec{v}^2\) used to generate the second image was calculated as:
\[ \vec{v}^2 = - \vec{v}^1 + \epsilon \]Where \(\epsilon\) was a very small offset necessary to prevent slerp and slerp2 from blowing up with a \(\frac{1}{\sin(0)}\) term. \(\vec{v}^1\) was a random vector from the correct distribution.
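A sketch of how such a pair could be constructed (the exact size and form of \(\epsilon\) used for this article aren’t specified; the small random offset below is just illustrative):
import numpy as np

rng = np.random.default_rng(0)

# StyleGAN2's Z-space is 512-dimensional, sampled from a standard normal.
z1 = rng.standard_normal(512)
epsilon = 1e-3 * rng.standard_normal(512)  # tiny offset, illustrative value
z2 = -z1 + epsilon                         # (almost) exactly opposite z1

# Without epsilon, omega = pi and sin(omega) = 0, which breaks slerp and slerp2.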
This means our lerp
from the first portrait to the second is a straight
line (nearly) through the origin.
This is a degenerate case which should show whether vector magnitude actually matters. Play around with the slider below to see the results for yourself.
You can see that the image generated by the lerp-ed vector doesn’t meaningfully change until passing the origin, where it immediately changes to the output image. This is interesting, and might suggest that for StyleGAN2:
- Vector magnitude is unimportant, in comparison to vector direction.
- The learned transformations between Z-space and W-space might discard magnitude information, or else are only sensitive to magnitude within the tight range for which all input data (in Z-space) was provided.
Degenerate Case 2: (Almost) passing through the origin with elevation changes⌗
This second degenerate case is similar to the first, except for a slight change in vector magnitude between \(\vec{v}^1\) and \(\vec{v}^2\). In this case, I multiplied \(\vec{v}^1\) by 0.9999 and \(\vec{v}^2\) by 1.0001. This was to see if vector magnitude had an impact on the images produced, and whether slerp or slerp2 handled it better.
This case exacerbates an odd quirk of slerp - when vector magnitudes are relatively different (and when angles are relatively aligned), the interpolation can produce truly bizarre paths. This is a little unfair on regular slerp, though, since this case will statistically never be seen when interpolating between two random vectors.
slerp2 performs best here, presumably because it most evenly sweeps between angles from start to finish. But all three interpolators are producing meaningful results. This is really a testament to the robustness of StyleGAN.
Comparing the ’lerps with real vectors⌗
It’s only fair to compare interpolation between two actually random vectors, and that’s exactly what’s going on here.
You’ll note that even though lerp
shows its characteristically out-of-family
vector magnitude, the results produced by all 3 interpolators are functionally
identical.
Conclusion⌗
As Pedro Domingos famously put it, Intuition Fails in High Dimensions. Before setting out on this journey, I didn’t expect higher dimensional vectors to form hyperspherical shells, nor did I expect spherical interpolation to be a better way of interpolating between two such vectors.
When all is said and done, though, how one navigates hyperspace may not matter much, and certainly depends on the actual system being studied.
Sometimes, the best solution really is the simplest one.