Implementing Efficient Lindenmayer Systems in Unity

Overview of an optimized L-system implementation allowing for rapid updates

GitHub

In my work on Seeb Defender I needed a way to simulate the growth of plants, especially plants whose behavior can be reconfigured at runtime. After doing some research I reached for L-systems, and what follows is an overview of how I started and then optimized my implementation.

Examples

Some examples of what is possible with an optimized L-system; these are all recorded in real time.

a fractal plant can grow large in only a handful of steps

What is an L-system?

An L-system is defined by a set of replacement rules. At each step, every character in the current string is replaced by its rule's replacement string, all at once, producing a new output string.

A basic example is a model of algal growth. In this example there are two symbols, A and B, representing a mature algae cell and an immature cell respectively. By configuring the rules of the L-system, we can simulate exponential growth:

#axiom A // Seed the system with a single mature algae cell
A -> AB  // A mature cell is replaced with itself, and an immature cell
B -> A   // An immature cell matures

Over successive steps, the length of the string produced by this system follows the Fibonacci sequence, growing exponentially. The first 6 steps:

A
AB
ABA
ABAAB
ABAABABA
ABAABABAABAAB

For more examples of L-systems, reference Wikipedia, my L-system library documentation, or The Algorithmic Beauty of Plants.

This is an example of a basic L-system, but its grammar can be extended in several ways (see the example rules after this list):

  • Stochastic
    • stochastic rules are randomly picked, based on a certain probability factor
  • Parametric
    • parametric l-systems attach numeric parameters to the symbols of its state
    • these parameters can be "captured" when replacing a symbol, and used to determine the parametric state of the next symbols
  • Context-sensitive
    • context sensitive rules inspect the symbols adjacent to the one being replaced
    • advanced usage will allow matching whole branching structures
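
As a flavor of what these extensions look like, here are some illustrative rules. The notation below loosely follows The Algorithmic Beauty of Plants and is meant only to convey the ideas; the exact syntax my library accepts is covered in its documentation.

A -> AB : 0.5               // stochastic: this replacement is chosen 50% of the time
A -> A  : 0.5               // alternative replacement; the probabilities should sum to 1

A(age) -> A(age + 1)        // parametric: capture the parameter and pass it along, incremented
A(age) : age > 3 -> B A(0)  // conditional: only replace once the captured parameter passes a threshold

B < A -> AB                 // context-sensitive: A is replaced only when immediately preceded by B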

Implementation overview

The full system is composed of three parts: the rule parsing system, the symbolic update engine, and the turtle rendering system.

The rule parsing system translates all the replacement data into NativeArrays after parsing. Since it only runs once per L-system, optimizing this system is a lower priority and won't be discussed in depth here. Suffice it to say the output is a set of native data structures accessible from Burst-compiled jobs. For more details on rule syntax, see the documentation.

The symbolic update engine generates next states from input states. A naive implementation of this engine iterates over every character in the input state, appending its replacement to the end of an output string builder. Our implementation will take advantage of the fact that all of the replacements happen independently of one another, which lends itself to parallelization.
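
For reference, a minimal sketch of that naive, single-threaded approach might look like this. It assumes plain C# strings and a rule dictionary rather than the native collections used later, and it copies unmatched symbols through unchanged (the usual identity rule):

using System.Collections.Generic;
using System.Text;

static class NaiveLSystem {
  // Replace every character with its rule's replacement, building the next state
  // on a single thread. Symbols without a rule are copied through unchanged.
  public static string StepSystem(string input, Dictionary<char, string> rules) {
    var output = new StringBuilder(input.Length * 2);
    foreach (var symbol in input) {
      if (rules.TryGetValue(symbol, out var replacement)) {
        output.Append(replacement);
      } else {
        output.Append(symbol);
      }
    }
    return output.ToString();
  }
}

Calling StepSystem repeatedly with the algae rules from above ('A' -> "AB", 'B' -> "A") reproduces the sequence of strings shown earlier.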

The turtle rendering system will actually render the l-system states. Its name comes from "Turtle graphics" in which a single "turtle" will walk through a space following a series of commands such as move forward 1 unit, turn right, or decrease scale. In order to render branches, the symbols '[' and ']' are used to tell the turtle to push its current position and orientation onto a stack, or return to a previous state. By arranging movement commands interspersed with commands placing a mesh in 3D space, a turtle can trace out a whole plant from one string of characters.
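
To make the idea concrete, here is a minimal, single-threaded turtle sketch that only tracks transforms. The command set, default angle, and types are illustrative assumptions rather than my library's actual API; a real turtle would also carry scale, thickness, and organ parameters.

using System.Collections.Generic;
using UnityEngine;

class SimpleTurtle {
  struct TurtleState {
    public Vector3 position;
    public Quaternion rotation;
  }

  // Walks the symbol string and returns one transform per placed organ.
  public List<Matrix4x4> Interpret(string symbols, float stepLength = 1f, float turnAngle = 25f) {
    var state = new TurtleState { position = Vector3.zero, rotation = Quaternion.identity };
    var branchStack = new Stack<TurtleState>();
    var organPlacements = new List<Matrix4x4>();

    foreach (var symbol in symbols) {
      switch (symbol) {
        case 'F': // place an organ at the current state, then step forward
          organPlacements.Add(Matrix4x4.TRS(state.position, state.rotation, Vector3.one));
          state.position += state.rotation * Vector3.up * stepLength;
          break;
        case '+': // turn one way by the configured angle
          state.rotation *= Quaternion.Euler(0, 0, turnAngle);
          break;
        case '-': // turn the other way
          state.rotation *= Quaternion.Euler(0, 0, -turnAngle);
          break;
        case '[': // remember the current state so a branch can start here
          branchStack.Push(state);
          break;
        case ']': // return to the state saved at the matching '['
          state = branchStack.Pop();
          break;
      }
    }
    return organPlacements;
  }
}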

Symbolic Update Engine

Before digging in too deep, let's first describe the data we're dealing with.

  • Input String
    • a NativeArray<char>. This represents the previous state as a string of characters
  • Output String
    • also a NativeArray<char>. This is generated from the input string as part of the update engine
  • Rule Data
    • a set of native collections containing all of the replacement rules
    • every replacement rule has these properties:
      • a matching character, which this rule will replace
      • a string of characters which will replace the matching character in the output string

To optimize the update engine, we can follow two strategies. First, we will parallelize the process as much as possible. Second, we will adapt the process to run closer to the metal of the processor by converting all of the work into Unity's Burst-compiled jobs. Designing Burst-compatible jobs is not too complex for basic L-systems; the real complexity is encountered when supporting contextual and parametric grammars. We won't discuss many details of Burst here, instead focusing on what it takes to run in parallel jobs.

Parallelizing the basic L-System seems almost trivial. Every character in the input string can determine its own replacement independent of all the other characters' replacements. Even in all of the L-system variant extensions, replacing each character will only require inspecting adjacent characters in the input string.

However, there is one complication we will run into. Some characters can be replaced with nothing, and others can be replaced with more than one character. Thus, we do not know ahead of time how long the output string will be, nor where each individual character's replacement should be copied into the output string. We can resolve this by separating the process into three jobs, returning to the main thread once in the middle to perform an allocation.

In the first job, we match every character against all of the replacement rules and cache the ID of the matching rule, if any. Since we are only storing an ID, a NativeArray parallel to the input string is enough to hold this information. The process runs once per character in the input string and the output space is well-defined, so this job can run as an IJobParallelFor!

Second, we iterate over the matched IDs and add up the replacement rules' lengths, effectively determining how to pack all of the replacement data into the output NativeArray. This isn't parallelized, but it is a very quick iteration. Once we know exactly how the replacements will be packed, we return to the main thread and allocate an output array which fits them exactly. Then a second parallel job copies each replacement into its slot in the newly allocated output string.

The whole process can be represented as pseudocode:

Symbol Update Engine code
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

struct JaggedIndexing {
  public int index;
  public int length;
}

struct Replacement {
  public JaggedIndexing replacementString;
  public int indexInOutput;
  public static Replacement ZERO => default;
}

class SymbolicUpdateEngine {
  // persistently allocated
  NativeHashMap<char, Replacement> ruleReplacements;
  NativeArray<char> replacementCharacterData;

  public NativeArray<char> StepSystem(NativeArray<char> input) {
    var replacementMapping = new NativeArray<Replacement>(input.Length, Allocator.TempJob);
    var dep = new EvaluateReplacementsJob {
      input = input,
      ruleReplacements = ruleReplacements,
      stringReplacements = replacementMapping
    }.Schedule(input.Length, 64); // 64 = inner loop batch size

    dep = new CountReplacementsJob {
      stringReplacements = replacementMapping
    }.Schedule(dep);
    dep.Complete();

    // the last replacement's offset plus its length is the total output size
    var lastReplacement = replacementMapping[replacementMapping.Length - 1];
    var output = new NativeArray<char>(
      lastReplacement.indexInOutput + lastReplacement.replacementString.length,
      Allocator.Persistent); // the caller owns this array and must dispose it
    dep = new CopyReplacementJob {
      stringReplacements = replacementMapping,
      replacementCharacterData = replacementCharacterData,
      output = output
    }.Schedule(input.Length, 64, dep);
    dep = replacementMapping.Dispose(dep);
    dep.Complete();
    return output;
  }
}

[BurstCompile]
struct EvaluateReplacementsJob : IJobParallelFor {
  [ReadOnly]
  public NativeArray<char> input;
  [ReadOnly]
  public NativeHashMap<char, Replacement> ruleReplacements;
  public NativeArray<Replacement> stringReplacements;
  public void Execute(int index) {
    if (ruleReplacements.TryGetValue(input[index], out var replace)) {
      stringReplacements[index] = replace;
    } else {
      // no rule matched: ZERO has length 0, so this symbol contributes no output
      stringReplacements[index] = Replacement.ZERO;
    }
  }
}

[BurstCompile]
struct CountReplacementsJob : IJob {
  public NativeArray<Replacement> stringReplacements;
  public void Execute() {
    // prefix sum over replacement lengths: each entry learns where it starts in the output
    var indexInReplacement = 0;
    for (int i = 0; i < stringReplacements.Length; i++) {
      var replacement = stringReplacements[i];
      replacement.indexInOutput = indexInReplacement;
      stringReplacements[i] = replacement;
      indexInReplacement += replacement.replacementString.length;
    }
  }
}

[BurstCompile]
struct CopyReplacementJob : IJobParallelFor {
  [ReadOnly]
  public NativeArray<Replacement> stringReplacements;
  [ReadOnly]
  public NativeArray<char> replacementCharacterData;
  [NativeDisableParallelForRestriction]
  public NativeArray<char> output;
  public void Execute(int index) {
    // copy this symbol's replacement string into its pre-computed slice of the output
    var replacement = stringReplacements[index];
    for (int i = 0; i < replacement.replacementString.length; i++) {
      output[i + replacement.indexInOutput] = replacementCharacterData[i + replacement.replacementString.index];
    }
  }
}

One of the drawbacks of this method is that it joins back to the main thread once to perform an allocation, as opposed to using something like a NativeList to dynamically allocate during the job. But as long as we're ok with the update spanning a couple of frames, this is a worthwhile tradeoff. We can kick the job off in early update, then complete it either next frame or in late update to perform the allocation.
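
As a sketch of that scheduling pattern, here is a hypothetical driver MonoBehaviour; ScheduleMatchAndCountJobs and AllocateOutputAndScheduleCopy are stand-ins for the two phases shown in the pseudocode above, not real library calls.

using Unity.Jobs;
using UnityEngine;

// Hypothetical driver: phase one (match + count) is scheduled in Update, and the
// main-thread allocation plus phase two (the copy job) happen in LateUpdate,
// or on a later frame if the jobs have not finished yet.
public class LSystemStepper : MonoBehaviour {
  JobHandle pendingCountJobs;
  bool stepInProgress;

  void Update() {
    if (!stepInProgress) {
      pendingCountJobs = ScheduleMatchAndCountJobs(); // do not Complete() yet
      stepInProgress = true;
    }
  }

  void LateUpdate() {
    if (stepInProgress && pendingCountJobs.IsCompleted) {
      pendingCountJobs.Complete();     // safe and cheap: the jobs have already finished
      AllocateOutputAndScheduleCopy(); // the main-thread allocation, then phase two
      stepInProgress = false;
    }
  }

  // Stand-ins for the scheduling code shown in the pseudocode above.
  JobHandle ScheduleMatchAndCountJobs() { return default; }
  void AllocateOutputAndScheduleCopy() { }
}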

Turtle Rendering System

At a high level, the turtle system will be responsible for placing component Organs into a final combined mesh. The following animations illustrate how the turtle walks along a path, leaving behind a trail. In our implementation, the trail will in fact be made of copies of 3D meshes.

a turtle follows simple commands to trace out a trail

It's less clear how to parallelize the turtle system. Its nature is inherently single-threaded: the turtle "walks" along the input string, modifying its own state as it goes. That part of the system has to stay single-threaded. Our goal here is to offload as much work as possible into a more parallelizable form, so that the only work done in that first job is the work which cannot be done in parallel.

The biggest piece of work we can offload is building the 3D mesh out of component organ instances. If we configure the first turtle job to output a list of placements for each mesh type to copy, we can employ the same technique used in the symbolic update engine to pack the vertices into the output mesh! Once we know which meshes need to be copied into the output mesh and how to position them, we allocate all of the mesh space and copy the component organ meshes into the output mesh in parallel.
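
Here is a rough sketch of that parallel copy step. OrganInstance and the buffer layout are hypothetical stand-ins for my library's actual types (JaggedIndexing is reused from the update engine sketch above): the turtle walk emits one instance per placed organ, a prefix sum over vertex counts fills in each instance's vertexOffset, and this job copies and transforms the vertices in parallel. Triangles and normals would be handled the same way.

using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// One mesh placement emitted by the single-threaded turtle walk.
struct OrganInstance {
  public int organMeshIndex;   // which source organ mesh to copy
  public float4x4 transform;   // where the turtle placed it
  public int vertexOffset;     // filled in by a prefix sum over vertex counts
}

[BurstCompile]
struct CopyOrganMeshesJob : IJobParallelFor {
  [ReadOnly]
  public NativeArray<OrganInstance> instances;
  [ReadOnly]
  public NativeArray<float3> sourceVertices; // every organ mesh's vertices packed into one array
  [ReadOnly]
  public NativeArray<JaggedIndexing> organVertexRanges; // where each organ's vertices live

  [NativeDisableParallelForRestriction]
  public NativeArray<float3> outputVertices;

  public void Execute(int index) {
    var instance = instances[index];
    var range = organVertexRanges[instance.organMeshIndex];
    for (int i = 0; i < range.length; i++) {
      // transform each source vertex into its placed position in the combined mesh
      var vertex = sourceVertices[range.index + i];
      outputVertices[instance.vertexOffset + i] = math.transform(instance.transform, vertex);
    }
  }
}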

Future improvements

After converting most of the processing into parallelized jobs, most of the remaining optimizations involve moving work onto hardware which specializes in highly parallel workloads: the GPU! An obvious target for this would be the last step of the turtle job. Instead of uploading a whole mesh to the GPU, we could upload data about the component organs as a sort of point cloud. A particle shader on the GPU could do the same work that's being done on the CPU, turning those points into triangles to render.

In theory parts of the Symbolic Update Engine could also be run on the GPU. All of the heavy lifting is done in parallel jobs, which could be translated into compute shaders, with the re-allocation happening between frames as it currently does. One of the main difficulties with this will be the complexity of the system, due to supporting context-sensitive and parametric grammars in the replacement rules.
