So I continue my optimization on engine and recently have worked on vertex builder a bit. Results are again brilliant! Vertex builder is now 5x to 7x faster!

Test Parameters:
* View range: 5 chunks.
* Total chunks in view: 121 chunks.

              Old #1        |           Old #2          |           New Technique
-------------------------------------------------------------------------------------
 #1          4497 ms        -          3237 ms          -              728 ms
 #2          4419 ms        -          3066 ms          -              623 ms
 #3          4520 ms        -          3089 ms          -              627 ms
 #4          4576 ms        -          3340 ms          -              679 ms
 #5          4413 ms        -          2832 ms          -              613 ms
 #6          4215 ms        -          3022 ms          -              597 ms
 #7          4451 ms        -          3463 ms          -              631 ms
 #8          4790 ms        -          2989 ms          -              627 ms
 #9          5117 ms        -          3442 ms          -              599 ms
#10          4624 ms        -          3191 ms          -              640 ms
-------------------------------------------------------------------------------------
AVG.         4562 ms        |          3167 ms           |             636 ms

* Old #1 = Old vertex building technique with block arrays per chunk.
* Old #2 = Old vertex building technique with single huge block array.
* New Technique = New vertex building technique optimized for single huge block array.
It's ~7 times faster then old #1 and ~5 times faster then old #2 technique.

Even better there are still more rooms for optimization which I’ll be working on them until I get a perfect point. Here’s a speed demonstration video. Note that in video there exists nearly ~960 chunks and ~30 million blocks.

Bonus Screenshot

In one of my previous DevLog entries I had mentioned that I was switching to a single huge block array technique instead of array per chunk which eventually simplifies access to block data. Now I’m done with upgrading my lighting code to take advantage of new block array technique and results are great!

With the latest updates, ligthing is now 3.2 times faster! Wow, that’s a really great optimization which worth every single seconds I’ve worked for it! Here’s test results;

Test Parameters:
* View range: 5 chunks.
* Total chunks in view: 121 chunks.

     Old Lighting Technique - New Lighting Technique
 #1          1316 ms        -          498 ms
 #2          1291 ms        -          384 ms
 #3          1338 ms        -          384 ms
 #4          1293 ms        -          389 ms
 #5          1278 ms        -          387 ms
 #6          1324 ms        -          386 ms
 #7          1284 ms        -          377 ms
 #8          1281 ms        -          380 ms
 #9          1288 ms        -          389 ms
#10          1299 ms        -          381 ms
-----------------------------------------------------------
AVG.         1299 ms        -          395 ms ~ 3.2 faster!

So what changed?

First, the old technique was optimized for old block array per chunk storage. So within this to access a neighboring block, it had to access using chunks – as a neighbor block could be in another chunk. The below propagation functions first need to resolve the chunk that owns the asked block (as block arrays are per chunk) and then need to progress on.

PropagateSunLight(chunk, (byte) (x - 1), y, z, light);
PropagateSunLight(chunk, (byte) (x + 1), y, z, light);
PropagateSunLight(chunk, x, (byte) (y - 1), z, light);
PropagateSunLight(chunk, x, (byte) (y + 1), z, light);
PropagateSunLight(chunk, x, y, (byte) (z - 1), light);
PropagateSunLight(chunk, x, y, (byte) (z + 1), light);

Where as the new lighting technique really doesn’t care about which chunk does the block resides in and can directly access the block data from our single huge block array!

PropagateSunLight(blockIndex + BlockStorage.XStep, propagatedLight); // propagate light to block in east.
PropagateSunLight(blockIndex - BlockStorage.XStep, propagatedLight); // propagate light to block in west.
PropagateSunLight(blockIndex + BlockStorage.ZStep, propagatedLight); // propagate light to block in north.
PropagateSunLight(blockIndex - BlockStorage.ZStep, propagatedLight); // propagate light to block in south.
// DO NOT repropagete to upper block which we don't need to do so and may cause loops!
PropagateSunLight(blockIndex - 1, propagatedLight);   // propagate light to block down.

So What’s Next?
I’ll be also optimizing terrain generation & vertex-building code similarly.

Bonus Screenshot
Nothing new & fancy

So initial re-factoring is done. Voxeliq engine now uses a single huge array of blocks instead of block arrays per chunk. I can say initially that this improved the performance to some extend though there’re still pieces of code that’s optimized for old technique (especially lighting one). As I cover them all I guess we’ll see more performance improvements over time.

Here is the initial tests;

  • View range: 5 chunks.
  • Total chunks in view: 121 chunks.
  Block Array/Chunk          Single Huge Block Array
  Gen  Light  Build   -  -        Gen  Light  Build
 #1  906  1545  4497  -  -   #1  1287  1558  3237
 #2  933  1524  4419  -  -   #2  1283  1520  3066
 #3  933  1567  4520  -  -   #3  1319  1448  3089
 #4  959  1593  4576  -  -   #4  1420  1803  3340
 #5  912  1538  4413  -  -   #5  1256  1470  2832
 #6  903  1512  4215  -  -   #6  1405  1534  3022
 #7  897  1552  4451  -  -   #7  1386  1517  3463
 #8  912  1573  4790  -  -   #8  1362  1524  2989
 #9  935  1580  5117  -  -   #9  1367  1455  3442
#10  920  1606  4624  -  -  #10  1403  1688  3391
ST:  9210  15590  45622      ST:  13488  15517  31871
GT:      70422               GT:      60876

* All values are in msec.
  •  Clearly as you can see, vertex building took advantage of new technique a lot.
  • Lighting code performs nearly the same though it’s not optimized for new technique yet.
  • Terrain generation got slowed a bit though didn’t have time to look for it yet.

I’ll be providing more in-detail info as I progress through.

Bonus Content
A screenshot from ingame-chunk debugger

So basically these days I’m optimizing the engine aiming the best performance available. Recently I’ve seen a great idea by Slaihne over his game BlokWorld’s forums. He basically suggests using a single huge array for blocks and wrapping the array. So I decided to give it a try – and although I’m not done with re-factoring completely, it seems to work great!

Basically until now Voxeliq was using a double-indexed dictionary to cache chunks within player’s region and then storing a single-dimension block array per each chunk. This works to some extend though there are a few problems. First current technique’s speed is not that bad as you can see from my previous videos, though Slaihne’s one seems to be faster. I’ll be explaining below in details;

  • Memory-wise; Voxeliq’s current technique loads new chunks / removes them as player moves – which basically allocs/deallocs memory continuously – given that .NET GC’s in-deterministic nature this is not really good. On the other side slaihne’s method always uses a pre-determined amount of memory for block/chunk caches. Even more hundreds of chunk instances is another memory sink in current method (it’s already known that in .net object instances have quite noticeable overhead).
  • Speed-wise; Especially in the case of lighting the current method needs each chunk to have pointers to neighboring chunks and have extensive checks. The new method completely simplifies the stuff.
  • Recaching; The current technique extensively re-caches chunks as player moves around (allocs/deallocs chunks). Within the new method yet again this will be really simplified a lot thanks to array wrapping.

So slaihne mentions he uses array wrapping but in one point he also mentions about his array being a single dimensional one. This was already a technique I was using in chunk’s block arrays, where I was flattening a 3 dimensional array to a single dimension one (as single dimensional arrays are lot faster in .net compared to multi-dimension ones).

I basically implemented a wrapping array with additional flattening support;

        public Block this[int x, int y, int z]
        {
            get
            {
                var wrapX = x%CacheWidthInBlocks;
                var wrapZ = z%CacheLenghtInBlocks;
                var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;

                return this.Blocks[flattenIndex];
            }
            set
            {
                var wrapX = x % CacheWidthInBlocks;
                var wrapZ = z % CacheLenghtInBlocks;
                var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;

                this.Blocks[flattenIndex] = value;
            }
        }

So initially it seemed all good but I’ve to re-factor more parts to let the engine take advantage of this completely. I’ll be posting another update once I’m done with a result video!

Array Wrapping

Oh and this shows how array wrapping works (kudos goes to Slaihne for the mockup!);

Flatten Arrays

For the interested ones here’s array tests for multidimensional, jagged and flattened arrays;

Test Environment: 1 physical cpus, 2 cores, 2 logical cpus.
________________________________________________________________________________

Array size: 256*256*256
________________________________________________________________________________

Itr.    Multi.  Jagged  Flatten (Sequental)
________________________________________________________________________________

#1      00.187s 00.116s 00.093s
#2      00.186s 00.112s 00.095s
#3      00.189s 00.112s 00.094s
#4      00.187s 00.115s 00.098s
#5      00.187s 00.113s 00.094s
#6      00.186s 00.115s 00.094s
#7      00.188s 00.117s 00.094s
#8      00.187s 00.112s 00.094s
#9      00.187s 00.114s 00.095s
#10     00.191s 00.117s 00.097s
~Avg    00.188s 00.115s 00.095s
________________________________________________________________________________

Itr.    Multi.  Jagged  Flatten (Random)
________________________________________________________________________________

#1      00.238s 00.158s 00.126s
#2      00.226s 00.160s 00.123s
#3      00.226s 00.155s 00.122s
#4      00.225s 00.159s 00.123s
#5      00.225s 00.168s 00.136s
#6      00.233s 00.164s 00.149s
#7      00.237s 00.187s 00.126s
#8      00.237s 00.163s 00.128s
#9      00.239s 00.158s 00.125s
#10     00.227s 00.156s 00.125s
~Avg    00.232s 00.163s 00.129s

As you can see flatten arrays in .net 4.0 is 2x times faster then conventional multi-dimensional arrays.You can find my test code over here; https://github.com/raistlinthewiz/dotnet-array-perf-tests

Right now I’m working on major fixes for the engine, especially for the parts I’m not happy with. Here’s a quick change-log.

  • Fixed terrain generator interface, generators do now accept a Biome structure and apply it after initial terrain generation.
  • Started implementing a chunk-cache. Before all chunk management was done within World which harder to maintain. Chunk-cache will be the interface between world and actual chunk-storage and will be also responsible for re-caching.
  • Fixed a major mouse elevation & rotation input bug. yay
  • Added a cool fps-graph widget, will be adding more.

Upcoming fixes

  • I’ll be further improve chunk processors, queues and so. Will be trying to fix that chunk-recache lag.
  • Lightning needs far more work, it’s quite slow and buggy.

  • Improved infinitive terrain implementation
  • Better & optimized memory usage
  • Ingame chunks: 1089 ~ 35 million blocks.
  • Basic lightning – though kinda buggy & incomplete.
  • Shiny textures
  • Skydome (with generated clouds), fog, flying support.
  • Basic shovel ~ block build/crack support.