Burst Compiled Noise Map Generation

Learn how to use burst compilation for simple noise maps to increase performance up to 80 times.

TL;DR. Using Unity's Job System together with burst compilation improved the performance of noise map generation for a 1000x1000 map from 7500 ms to 90 ms (that's roughly 80x faster!) on a MacBook Pro M1 Max.

I guess a lot of you have already watched Sebastian Lague's awesome series about procedural landmass generation on YouTube. I recently started watching it too, and after coloring the map (episode 4), I thought this would be a nice sample for trying out burst compilation, which I had not used before.

Sebastian open-sourced the code for his video here and the piece we're going to look at is this one here:

using UnityEngine;
using System.Collections;

public static class Noise
{
  public static float[,] GenerateNoiseMap(int mapWidth, int mapHeight, int seed, float scale, int octaves, float persistance, float lacunarity, Vector2 offset)
  {
    float[,] noiseMap = new float[mapWidth, mapHeight];

    System.Random prng = new System.Random(seed);
    Vector2[] octaveOffsets = new Vector2[octaves];
    for (int i = 0; i < octaves; i++)
    {
      float offsetX = prng.Next(-100000, 100000) + offset.x;
      float offsetY = prng.Next(-100000, 100000) + offset.y;
      octaveOffsets[i] = new Vector2(offsetX, offsetY);
    }

    if (scale <= 0)
    {
      scale = 0.0001f;
    }

    float maxNoiseHeight = float.MinValue;
    float minNoiseHeight = float.MaxValue;

    float halfWidth = mapWidth / 2f;
    float halfHeight = mapHeight / 2f;


    for (int y = 0; y < mapHeight; y++)
    {
      for (int x = 0; x < mapWidth; x++)
      {
        float amplitude = 1;
        float frequency = 1;
        float noiseHeight = 0;

        for (int i = 0; i < octaves; i++)
        {
          float sampleX = (x - halfWidth) / scale * frequency + octaveOffsets[i].x;
          float sampleY = (y - halfHeight) / scale * frequency + octaveOffsets[i].y;

          float perlinValue = Mathf.PerlinNoise(sampleX, sampleY) * 2 - 1;
          noiseHeight += perlinValue * amplitude;

          amplitude *= persistance;
          frequency *= lacunarity;
        }

        if (noiseHeight > maxNoiseHeight)
        {
          maxNoiseHeight = noiseHeight;
        }
        else if (noiseHeight < minNoiseHeight)
        {
          minNoiseHeight = noiseHeight;
        }

        noiseMap[x, y] = noiseHeight;
      }
    }

    for (int y = 0; y < mapHeight; y++)
    {
      for (int x = 0; x < mapWidth; x++)
      {
        noiseMap[x, y] = Mathf.InverseLerp(minNoiseHeight, maxNoiseHeight, noiseMap[x, y]);
      }
    }

    return noiseMap;
  }
}

Depending on your map size, creating the noise map can take anywhere from a few milliseconds to several seconds. As a benchmark machine, I have a MacBook Pro M1 Max with 64 GB RAM.

Creating a 1000x1000 noise map takes around 7500 ms, a 100x100 takes around 77 ms. All timings are in-editor times, so I guess that the performance is better when the game is built.

Given the fact that you normally create terrain based on chunks (which Sebastian will also do later in the series), a 100x100 noise map size feels pretty good and is fast to generate.

However, generating terrain could also mean that you initially need to generate many chunks, so the 77 ms add up quickly.

Also, since I wanted to try out the job system and burst compilation, I thought it was a good sample to try out some optimizations. :)

Jobify noise generation

I ended up with the following script to jobify the noise generation and enable burst compilation:

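using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using Random = Unity.Mathematics.Random; // avoids ambiguity with other Random types

public static class Noise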
{
  [BurstCompile]
  public struct MapGenerationJob : IJobParallelFor
  {
    public int2 Dimensions;
    public float Scale;
    public int Octaves;
    public float Persistance;
    public float Lacunarity;

    [ReadOnly]
    public NativeArray<float2> OctaveOffsets;

    [WriteOnly]
    public NativeArray<float> Result;

    public void Execute(int index)
    {
      // Float division, as in the original code (integer division would truncate for odd sizes).
      var halfWidth = Dimensions.x / 2f;
      var halfHeight = Dimensions.y / 2f;

      var amplitude = 1f;
      var frequency = 1f;
      var noiseHeight = 0f;

      var x = index % Dimensions.x;
      var y = index / Dimensions.x;

      for (var i = 0; i < Octaves; i++)
      {
        var sampleX = (x - halfWidth) / Scale * frequency + OctaveOffsets[i].x;
        var sampleY = (y - halfHeight) / Scale * frequency + OctaveOffsets[i].y;

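        // cnoise already returns values in roughly [-1, 1], so the "* 2 - 1" remap
        // that Mathf.PerlinNoise (range roughly [0, 1]) needed is not required here.
        var perlinValue = noise.cnoise(new float2(sampleX, sampleY));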

        noiseHeight += perlinValue * amplitude;

        amplitude *= Persistance;
        frequency *= Lacunarity;
      }
      
      Result[index] = noiseHeight;
    }
  }

  public static float[,] GenerateNoiseMap(int2 dimensions, uint seed, float scale, int octaves, float persistance, float lacunarity, float2 offset)
  {
    if (scale <= 0)
    {
      scale = 0.0001f;
    }
    
    // Unity.Mathematics.Random requires a non-zero seed.
    var random = new Random(seed);
    
    using var jobResult = new NativeArray<float>(dimensions.x * dimensions.y, Allocator.TempJob);
    using var octaveOffsets = new NativeArray<float2>(octaves, Allocator.TempJob);

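    // 'using var' locals are read-only, so the elements are written through a copy of the
    // NativeArray struct; the copy refers to the same native memory.
    var nativeOctaveOffsets = octaveOffsets;

    for (var i = 0; i < octaves; i++)
    {
      var offsetX = random.NextInt(-100000, 100000) + offset.x;
      var offsetY = random.NextInt(-100000, 100000) + offset.y;
      nativeOctaveOffsets[i] = new float2(offsetX, offsetY);
    }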

    var job = new MapGenerationJob()
    {
      Dimensions = dimensions,
      Lacunarity = lacunarity,
      Octaves = octaves,
      OctaveOffsets = octaveOffsets,
      Persistance = persistance,
      Result = jobResult,
      Scale = scale,
    };

    var handle = job.Schedule(jobResult.Length, 32);
    handle.Complete();
    
    return SmoothNoiseMap(dimensions, jobResult);
  }

  private static float[,] SmoothNoiseMap(int2 dimensions, NativeArray<float> jobResult)
  {
    var result = new float[dimensions.x, dimensions.y];
    
    var maxNoiseHeight = float.MinValue;
    var minNoiseHeight = float.MaxValue;

    for (var y = 0; y < dimensions.y; y++)
    {
      for (var x = 0; x < dimensions.x; x++)
      {
        var noiseHeight = jobResult[y * dimensions.x + x];

        if (noiseHeight > maxNoiseHeight)
        {
          maxNoiseHeight = noiseHeight;
        }
        else if (noiseHeight < minNoiseHeight)
        {
          minNoiseHeight = noiseHeight;
        }

        result[x, y] = noiseHeight;
      }
    }

    for (var y = 0; y < dimensions.y; y++)
    {
      for (var x = 0; x < dimensions.x; x++)
      {
        result[x, y] = math.unlerp(minNoiseHeight, maxNoiseHeight, result[x, y]);
      }
    }

    return result;
  }
}

IJobParallelFor and [BurstCompile]

First, I declare a struct MapGenerationJob that implements the interface IJobParallelFor and is marked with the [BurstCompile] attribute. The interface, IJobParallelFor, does two things. First, it allows us to use the Unity Job System. Second, it allows us to perform an independent operation on each element of a native container (e.g. NativeArray<T>). That means that the job system can execute the code on multiple threads and thus parallelize the workload. The [BurstCompile] attribute tells Unity to compile the job with the Burst compiler into optimized native code. To know which item our code runs on, we get an index in our Execute method:

public void Execute(int index) { }

For now, that is an issue for us, because Execute only receives a single index while the original code iterates over x and y. Take a look at the original code: the most time-consuming work is done in the triple nested loop. It loops over height, then width, and then octaves. A 1000x1000 map with 10 octaves results in 1000 x 1000 x 10 = 10,000,000 iterations. That's a lot!

Linearize a nested loop

To parallelize it, we need to get rid of the height-width loop and flatten it. This is done in GenerateNoiseMap where we create a jobResult = new NativeArray<float>(dimensions.x * dimensions.y, ...). Here, we create a one-dimensional array that can be used for our job. Now the int index points to exactly one item in the array.

However, within the job, we still need an x and y position for our Perlin noise to work. Luckily, we can calculate it with:

var x = index % Dimensions.x;
var y = index / Dimensions.x;

Note that % is the remainder (modulo) operator and that / is an integer division here, because both operands are integers.

With those two lines, we get back what our old code delivered in the height-width loop. Nice!
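To make the mapping concrete, here is a minimal sketch in plain C# without any Unity dependencies. The class GridIndex and its method names are made up for this illustration and are not part of the project; the sketch assumes the row-major layout used above, where rows of length width are stored one after another.

```csharp
using System;

// Hypothetical helper for illustration only, not part of the original project.
public static class GridIndex
{
    // Row-major layout: rows of length `width` are stored one after another.
    public static int ToIndex(int x, int y, int width) => y * width + x;

    // Inverse mapping: the remainder is the column, the integer division is the row.
    // Both operations use the width; the height is not needed.
    public static (int x, int y) ToCoordinates(int index, int width) =>
        (index % width, index / width);
}
```

For a map that is 4 wide, GridIndex.ToCoordinates(6, 4) yields (2, 1), and GridIndex.ToIndex(2, 1, 4) yields 6 again.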

We can also precalculate OctaveOffsets because they won't change during the job execution. This part is also done in GenerateNoiseMap.

After that, we can set all fields of the job and hand it over to the job system with job.Schedule(jobResult.Length, 32). The second argument is the batch size: the job system splits the work into batches of 32 indices and distributes those batches across its worker threads. Why 32? There's no particular reason; I just started with that value and the timings are pretty good, so I stuck with it. With an octave count of 10, each batch runs 32 x 10 = 320 iterations of the inner loop.
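As a rough sketch of what a batch size of 32 means for the map sizes in this post, the following plain C# snippet computes how many batches of work exist. BatchMath is a hypothetical helper name, and the snippet only shows the arithmetic; how Unity actually hands batches to worker threads is up to the job system.

```csharp
using System;

// Hypothetical helper for illustration only; it does not call any Unity API.
public static class BatchMath
{
    // Number of batches needed to cover `length` items, rounding up
    // so that a final partial batch is counted as well.
    public static int BatchCount(int length, int batchSize) =>
        (length + batchSize - 1) / batchSize;
}
```

With a batch size of 32, a 1000x1000 map (1,000,000 items) is split into 31,250 batches, and a 100x100 map (10,000 items) into 313 batches, the last of which is only partially filled.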

At the end, there is one thing we still need: getting the min/max height and then applying an inverse lerp (called unlerp in Unity.Mathematics) to get the desired results. Take a look at SmoothNoiseMap where this is done.
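The normalization step can be sketched in plain C# as follows. Normalizer, Unlerp and ToUnitRange are names invented for this illustration; Unlerp mirrors the formula (value - min) / (max - min), and the sketch assumes that the input contains at least two different values, since otherwise the division is by zero.

```csharp
using System;

// Hypothetical helper for illustration only, not part of the original project.
public static class Normalizer
{
    // Inverse lerp: where does `value` lie between `min` (0) and `max` (1)?
    public static float Unlerp(float min, float max, float value) =>
        (value - min) / (max - min);

    // Finds the smallest and largest value, then rescales everything to [0, 1].
    public static float[] ToUnitRange(float[] values)
    {
        var min = float.MaxValue;
        var max = float.MinValue;
        foreach (var v in values)
        {
            min = Math.Min(min, v);
            max = Math.Max(max, v);
        }

        var result = new float[values.Length];
        for (var i = 0; i < values.Length; i++)
        {
            result[i] = Unlerp(min, max, values[i]);
        }
        return result;
    }
}
```

For the input { -2f, 0f, 2f }, ToUnitRange returns { 0f, 0.5f, 1f }.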

It would be possible to jobify the second nested loop in this method as well. However, on my machine, that would slow down the overall process again because the management overhead is bigger than the time we could save through parallel execution.

Also note that I've swapped Mathf.PerlinNoise for its counterpart in Unity.Mathematics, noise.cnoise (cnoise = classic Perlin noise), and that all vectors are now int2 or float2. The two functions are not identical: Mathf.PerlinNoise returns values in roughly [0, 1], while cnoise returns values in roughly [-1, 1], so the generated maps will not match the original ones exactly.

The timings now look like this:

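| Map size  | Old variant | Jobified | Speed increase   |
|-----------|-------------|----------|------------------|
| 100x100   | 77 ms       | 3 ms     | ~2466 % (~26x)   |
| 1000x1000 | 7500 ms     | 90 ms    | ~8200 % (~83x)   |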

As you can see, the jobified code is much, much faster, and the changes we had to make are easy peasy. :)

Cheers!