So, in this kind of blog post, I’ll also be commenting on the language and framework, not just talking about my own stuff. With the latest huge-array technique and its flattened-array support, I now use block pointer functions (helpers that compute a block’s index in the huge array). Here’s one:

        /// <summary>
        /// Returns the block index for a block, given its relative position within its chunk.
        /// </summary>
        /// <param name="chunk">The chunk the block belongs to.</param>
        /// <param name="x">The block's relative x position in the chunk.</param>
        /// <param name="y">The block's y position in the chunk.</param>
        /// <param name="z">The block's relative z position in the chunk.</param>
        /// <returns>The block's index in the flattened block cache.</returns>
        public static int BlockIndexByRelativePosition(Chunk chunk, byte x, byte y, byte z)
        {
            // Convert the chunk-relative position to a world position.
            var xIndex = chunk.WorldPosition.X + x;
            var zIndex = chunk.WorldPosition.Z + z;

            // Wrap the world position into the fixed-size cache.
            var wrapX = xIndex % CacheWidthInBlocks;
            var wrapZ = zIndex % CacheLenghtInBlocks;

            // Flatten (x, z, y) into a single-dimensional index.
            var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;
            return flattenIndex;
        }

All code in the engine that accesses block data has to use one of these functions, and repeating the code all over the source is really not a good idea: any future change to the functions would then take lots of time and make the code harder to maintain and debug. Although today’s compilers are quite intelligent, I was still eager for some forced-inlining functionality, given that these functions get called millions of times. Statement lambdas are a possibility, but technically they’re not what I’m looking for: as the IL below shows, a lambda stored in a static Func field has to be loaded (ldsfld) and invoked through the delegate (callvirt ... Invoke) on every call, instead of being a plain call.

Conventional way:
        L_0000: nop
        L_0001: ldarg.0
        L_0002: ldarg.1
        L_0003: ldarg.2
        L_0004: ldarg.3
        L_0005: call int32 VolumetricStudios.VoxeliqGame.Chunks.BlockCache::BlockIndexByRelativePosition(class VolumetricStudios.VoxeliqGame.Chunks.Chunk, uint8, uint8, uint8)

Statement Lambdas:
        L_0000: nop
        L_0001: ldsfld class [mscorlib]System.Func`5<class VolumetricStudios.VoxeliqGame.Chunks.Chunk, uint8, uint8, uint8, int32> VolumetricStudios.VoxeliqGame.Chunks.BlockCache::BlockIndexByRelativePosition3
        L_0006: ldarg.0
        L_0007: ldarg.1
        L_0008: ldarg.2
        L_0009: ldarg.3
        L_000a: callvirt instance !4 [mscorlib]System.Func`5<class VolumetricStudios.VoxeliqGame.Chunks.Chunk, uint8, uint8, uint8, int32>::Invoke(!0, !1, !2, !3)
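For reference, the two IL listings correspond roughly to C# source like the sketch below. It’s a hypothetical reduction, not the engine’s code: this `Chunk` only carries world X/Z coordinates, and `BlockIndexVariants`, `BlockIndexConventional`, `BlockIndexLambda` and the cache dimensions are made-up names and values. Both members compute the same index; only the way they’re called differs.

```csharp
using System;

// Hypothetical stand-in for the engine's Chunk: just its world position.
public sealed class Chunk
{
    public int WorldX;
    public int WorldZ;
}

public static class BlockIndexVariants
{
    // Hypothetical cache dimensions, for illustration only.
    public const int HeightInBlocks = 128;
    public const int CacheWidthInBlocks = 176;
    public const int CacheLengthInBlocks = 176;
    public const int FlattenOffset = CacheLengthInBlocks * HeightInBlocks;

    // Conventional static method: callers emit a plain `call`.
    public static int BlockIndexConventional(Chunk chunk, byte x, byte y, byte z)
    {
        var wrapX = (chunk.WorldX + x) % CacheWidthInBlocks;
        var wrapZ = (chunk.WorldZ + z) % CacheLengthInBlocks;
        return wrapX * FlattenOffset + wrapZ * HeightInBlocks + y;
    }

    // Statement lambda in a static field: callers emit `ldsfld` to load the
    // delegate, then `callvirt Func<...>.Invoke` on every call.
    public static readonly Func<Chunk, byte, byte, byte, int> BlockIndexLambda =
        (chunk, x, y, z) =>
        {
            var wrapX = (chunk.WorldX + x) % CacheWidthInBlocks;
            var wrapZ = (chunk.WorldZ + z) % CacheLengthInBlocks;
            return wrapX * FlattenOffset + wrapZ * HeightInBlocks + y;
        };
}
```

The delegate type here is `Func<Chunk, byte, byte, byte, int>`, i.e. ``Func`5`` in IL, the same shape as in the second listing.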

So, at last, .NET 4.5 will come with a new method implementation option: MethodImplOptions.AggressiveInlining. MSDN describes it as follows:

  • AggressiveInlining: The method should be inlined if possible.

It’ll be nice to be able to get regular functions inlined with 4.5, although the option is a hint to the JIT rather than a hard guarantee.
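Applied to an index helper, the attribute looks like this. A minimal sketch: `BlockIndexing`, `BlockIndex` and the cache dimensions are hypothetical; `MethodImplAttribute` and `MethodImplOptions.AggressiveInlining` (in `System.Runtime.CompilerServices`) are the real .NET 4.5+ API.

```csharp
using System.Runtime.CompilerServices;

public static class BlockIndexing
{
    // Hypothetical cache dimensions, for illustration only.
    public const int HeightInBlocks = 128;
    public const int CacheWidthInBlocks = 176;
    public const int CacheLengthInBlocks = 176;
    public const int FlattenOffset = CacheLengthInBlocks * HeightInBlocks;

    // Asks the JIT to inline this method at its call sites when possible.
    // It's a hint, not a guarantee: the JIT can still decline (e.g. virtual calls).
    // Assumes non-negative x and z, since % keeps the sign of negative values.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int BlockIndex(int x, int y, int z)
    {
        var wrapX = x % CacheWidthInBlocks;
        var wrapZ = z % CacheLengthInBlocks;
        return wrapX * FlattenOffset + wrapZ * HeightInBlocks + y;
    }
}
```

Call sites stay ordinary method calls, so the single shared implementation remains the one place to change.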

Update: you can find a nice article on it over here.

So the initial refactoring is done: the Voxeliq engine now uses a single huge array of blocks instead of a block array per chunk. I can already say that this improved performance to some extent, though some pieces of code are still optimized for the old technique (especially the lighting code). As I go through them all, I expect we’ll see more performance improvements over time.

Here are the initial test results:

  • View range: 5 chunks.
  • Total chunks in view: 121 chunks.
        Block array per chunk     Single huge block array
  Run      Gen   Light   Build       Gen   Light   Build
  #1       906    1545    4497      1287    1558    3237
  #2       933    1524    4419      1283    1520    3066
  #3       933    1567    4520      1319    1448    3089
  #4       959    1593    4576      1420    1803    3340
  #5       912    1538    4413      1256    1470    2832
  #6       903    1512    4215      1405    1534    3022
  #7       897    1552    4451      1386    1517    3463
  #8       912    1573    4790      1362    1524    2989
  #9       935    1580    5117      1367    1455    3442
  #10      920    1606    4624      1403    1688    3391
  Sum     9210   15590   45622     13488   15517   31871
  Total          70422                     60876

All values are in milliseconds. Sum is each column’s total over the ten runs; Total is Gen + Light + Build.
  • Vertex building clearly benefits the most from the new technique: its total dropped from 45622 ms to 31871 ms, about 30% less.
  • Lighting performs nearly the same (15590 ms vs. 15517 ms), even though it’s not optimized for the new technique yet.
  • Terrain generation got slower (9210 ms vs. 13488 ms); I haven’t had time to look into why yet.

I’ll provide more detailed info as I progress.

Bonus content
[Screenshot: the in-game chunk debugger]

These days I’m optimizing the engine, aiming for the best performance possible. Recently I came across a great idea by Slaihne on the forums of his game, BlokWorld: he suggests using a single huge array for blocks and wrapping that array. So I decided to give it a try, and although I’m not completely done with the refactoring, it seems to work great!

Until now, Voxeliq has been using a double-indexed dictionary to cache the chunks within the player’s region, storing a single-dimensional block array per chunk. This works to some extent, but there are a few problems. First, the current technique’s speed isn’t that bad, as you can see from my previous videos, but Slaihne’s seems to be faster. I’ll explain in detail below:

  • Memory-wise: Voxeliq’s current technique loads and removes chunks as the player moves, which means allocating and deallocating memory continuously; given the non-deterministic nature of the .NET GC, this isn’t great. Slaihne’s method, on the other hand, always uses a predetermined amount of memory for the block/chunk caches. On top of that, the hundreds of chunk instances in the current method are another memory sink, since every object instance in .NET carries a noticeable overhead.
  • Speed-wise: especially for lighting, the current method needs each chunk to hold pointers to its neighboring chunks and to do extensive checks. The new method simplifies this completely, since a neighboring block is found the same way whether or not it lives in the same chunk.
  • Recaching: the current technique extensively re-caches chunks (allocating and deallocating them) as the player moves around. With the new method, this again gets greatly simplified thanks to array wrapping.
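To illustrate the lighting point, here’s a sketch of a neighbor lookup in a single wrapped array, assuming (hypothetically) 16-block-wide, 128-block-tall chunks and an 11 × 11 chunk cache. Because blocks are addressed by world position, the block next door is found the same way whether or not it belongs to another chunk, so no per-chunk neighbor pointers are needed. `NeighborLookup`, `BlockIndex` and `EastNeighbor` are illustrative names, not Voxeliq’s.

```csharp
public static class NeighborLookup
{
    // Hypothetical dimensions: 16-block-wide chunks, 11 x 11 chunks cached,
    // 128-block-tall columns.
    public const int ChunkWidthInBlocks = 16;
    public const int HeightInBlocks = 128;
    public const int CacheWidthInBlocks = 11 * ChunkWidthInBlocks;
    public const int CacheLengthInBlocks = 11 * ChunkWidthInBlocks;
    public const int FlattenOffset = CacheLengthInBlocks * HeightInBlocks;

    // One array for every cached block; no per-chunk arrays or neighbor pointers.
    public static readonly byte[] Blocks = new byte[CacheWidthInBlocks * FlattenOffset];

    // Maps any coordinate, including negatives, into [0, size).
    private static int Wrap(int value, int size) => ((value % size) + size) % size;

    // y is assumed to be in [0, HeightInBlocks); there is no bounds check here.
    public static int BlockIndex(int x, int y, int z) =>
        Wrap(x, CacheWidthInBlocks) * FlattenOffset + Wrap(z, CacheLengthInBlocks) * HeightInBlocks + y;

    // The east neighbor is simply x + 1, whether it sits in the same chunk
    // or the next one, so there is no chunk-edge special case.
    public static byte EastNeighbor(int x, int y, int z) => Blocks[BlockIndex(x + 1, y, z)];
}
```

Here x = 15 is the last block column of one chunk and x = 16 the first of the next, yet `EastNeighbor(15, y, z)` needs no extra logic.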

Slaihne mentions that he uses array wrapping, and at one point he also mentions that his array is single-dimensional. I was already using that technique in the chunks’ block arrays, flattening a three-dimensional array into a single-dimensional one (single-dimensional arrays are a lot faster in .NET than multidimensional ones).
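Flattening boils down to computing a single index from (x, y, z). The sketch below uses the same ordering as the engine’s formula (x outermost, then z, then y, so each vertical column is contiguous) and assumes `FlattenOffset` equals length × height, which that formula implies. `Flatten3D`, `ToIndex` and `FromIndex` are hypothetical helper names.

```csharp
public static class Flatten3D
{
    // (x, y, z) -> single index: x outermost, then z, then y.
    // length is the z extent and height the y extent; length * height plays
    // the role of FlattenOffset.
    public static int ToIndex(int x, int y, int z, int length, int height) =>
        x * (length * height) + z * height + y;

    // Inverse mapping: single index -> (x, y, z).
    public static (int X, int Y, int Z) FromIndex(int index, int length, int height)
    {
        var plane = length * height;
        var x = index / plane;
        var rest = index % plane;
        return (x, rest % height, rest / height);
    }
}
```

In use, `blocks[Flatten3D.ToIndex(x, y, z, length, height)]` on a `byte[width * length * height]` stands in for indexing a three-dimensional array.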

So I implemented a wrapping array with additional flattening support:

        public Block this[int x, int y, int z]
        {
            get
            {
                // Wrap world coordinates into the cache, then flatten to a 1D index.
                var wrapX = x % CacheWidthInBlocks;
                var wrapZ = z % CacheLenghtInBlocks;
                var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;

                return this.Blocks[flattenIndex];
            }
            set
            {
                var wrapX = x % CacheWidthInBlocks;
                var wrapZ = z % CacheLenghtInBlocks;
                var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;

                this.Blocks[flattenIndex] = value;
            }
        }
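Here’s a self-contained version of such a wrapping indexer to experiment with. It’s a sketch under assumptions: `Block`, `WrappingBlockCache` and the dimensions are hypothetical. It also swaps in one deliberate change: C#’s `%` keeps the sign of the dividend (`-1 % 176 == -1`), so with negative world coordinates a plain `%` yields a negative index; the `Wrap` helper below maps every coordinate into `[0, size)` instead.

```csharp
// Hypothetical block payload.
public struct Block
{
    public byte Type;
}

public sealed class WrappingBlockCache
{
    // Hypothetical cache dimensions.
    public const int HeightInBlocks = 128;
    public const int CacheWidthInBlocks = 176;
    public const int CacheLengthInBlocks = 176;
    public const int FlattenOffset = CacheLengthInBlocks * HeightInBlocks;

    // One pre-allocated array for every block in the cache.
    public readonly Block[] Blocks = new Block[CacheWidthInBlocks * FlattenOffset];

    // Maps any integer, including negatives, into [0, size).
    private static int Wrap(int value, int size) => ((value % size) + size) % size;

    // Wrap x and z into the cache, then flatten (x, z, y) into one index.
    public static int Index(int x, int y, int z) =>
        Wrap(x, CacheWidthInBlocks) * FlattenOffset + Wrap(z, CacheLengthInBlocks) * HeightInBlocks + y;

    public Block this[int x, int y, int z]
    {
        get => Blocks[Index(x, y, z)];
        set => Blocks[Index(x, y, z)] = value;
    }
}
```

World positions that differ by exactly the cache width or length land in the same slot, which is what lets the cache be reused as the player moves instead of reallocating chunks.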

So far it all looks good, but I have to refactor more parts before the engine can take full advantage of this. I’ll post another update with a video of the results once I’m done!

Array Wrapping

Oh, and this mockup shows how array wrapping works (kudos to Slaihne for it!):
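At chunk granularity, wrapping means that when the player crosses into the next chunk, the column of chunks that drops out of view and the column that comes into view map to the same cache slots, so only that one column has to be regenerated in place. A minimal sketch, assuming a view range of 5 chunks (the 11 × 11 = 121 chunks in the test setup above); `ChunkWrapping`, `SlotX` and `StepEast` are hypothetical names.

```csharp
public static class ChunkWrapping
{
    // Hypothetical view range in chunks; the cache is (2 * ViewRange + 1) chunks wide.
    public const int ViewRange = 5;
    public const int CacheWidthInChunks = 2 * ViewRange + 1;

    // Maps any chunk coordinate, including negatives, into [0, CacheWidthInChunks).
    public static int SlotX(int chunkX) =>
        ((chunkX % CacheWidthInChunks) + CacheWidthInChunks) % CacheWidthInChunks;

    // Player moves from chunk column `center` to `center + 1`: column
    // center - ViewRange leaves the view and column center + ViewRange + 1
    // enters. They are exactly CacheWidthInChunks apart, so they share a slot.
    public static (int Leaving, int Entering) StepEast(int center) =>
        (center - ViewRange, center + ViewRange + 1);
}
```

So on each step, the caller would regenerate just the chunks in slot `SlotX(entering)`, overwriting the ones that left, without allocating anything. The same reasoning applies along Z.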

Flattened Arrays

For those interested, here are array benchmarks for multidimensional, jagged and flattened arrays:

Test environment: 1 physical CPU, 2 cores, 2 logical CPUs.

Array size: 256 × 256 × 256 (16,777,216 elements)

Sequential access (times in seconds):

Run     Multidim.  Jagged  Flattened
#1      0.187      0.116   0.093
#2      0.186      0.112   0.095
#3      0.189      0.112   0.094
#4      0.187      0.115   0.098
#5      0.187      0.113   0.094
#6      0.186      0.115   0.094
#7      0.188      0.117   0.094
#8      0.187      0.112   0.094
#9      0.187      0.114   0.095
#10     0.191      0.117   0.097
~Avg    0.188      0.115   0.095

Random access (times in seconds):

Run     Multidim.  Jagged  Flattened
#1      0.238      0.158   0.126
#2      0.226      0.160   0.123
#3      0.226      0.155   0.122
#4      0.225      0.159   0.123
#5      0.225      0.168   0.136
#6      0.233      0.164   0.149
#7      0.237      0.187   0.126
#8      0.237      0.163   0.128
#9      0.239      0.158   0.125
#10     0.227      0.156   0.125
~Avg    0.232      0.163   0.129

As you can see, flattened arrays in .NET 4.0 are about 2x faster than conventional multidimensional arrays for sequential access (roughly 1.8x for random access). You can find my test code over here.
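As a rough, hypothetical stand-in for that test code, here’s a minimal sequential-access benchmark of the three layouts (the random-access variant is left out). `ArrayBenchmark` and its methods are illustrative names; only `System.Diagnostics.Stopwatch` is a real API here.

```csharp
using System.Diagnostics;

public static class ArrayBenchmark
{
    // Each method writes and reads every element of an n*n*n array in
    // sequential order and returns a checksum so the loops can't be dropped.
    // elapsedMs covers the loop only, not the allocation.
    public static long Multi(int n, out long elapsedMs)
    {
        var a = new int[n, n, n];
        var sw = Stopwatch.StartNew();
        long sum = 0;
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++)
                {
                    a[x, y, z] = x + y + z;
                    sum += a[x, y, z];
                }
        elapsedMs = sw.ElapsedMilliseconds;
        return sum;
    }

    public static long Jagged(int n, out long elapsedMs)
    {
        var a = new int[n][][];
        for (int x = 0; x < n; x++)
        {
            a[x] = new int[n][];
            for (int y = 0; y < n; y++) a[x][y] = new int[n];
        }
        var sw = Stopwatch.StartNew();
        long sum = 0;
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++)
                {
                    a[x][y][z] = x + y + z;
                    sum += a[x][y][z];
                }
        elapsedMs = sw.ElapsedMilliseconds;
        return sum;
    }

    public static long Flattened(int n, out long elapsedMs)
    {
        var a = new int[n * n * n];
        var sw = Stopwatch.StartNew();
        long sum = 0;
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++)
                {
                    var i = (x * n + y) * n + z; // flattened index
                    a[i] = x + y + z;
                    sum += a[i];
                }
        elapsedMs = sw.ElapsedMilliseconds;
        return sum;
    }
}
```

Calling each method with `n = 256` mirrors the 256 × 256 × 256 setup above; absolute timings will depend on the machine, the runtime version and whether you run an optimized Release build without a debugger attached.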

Right now I’m working on major fixes to the engine, especially for the parts I’m not happy with. Here’s a quick changelog:

  • Fixed the terrain generator interface; generators now accept a Biome structure and apply it after the initial terrain generation.
  • Started implementing a chunk cache. Previously, all chunk management was done within World, which made it harder to maintain. The chunk cache will be the interface between the world and the actual chunk storage, and will also be responsible for re-caching.
  • Fixed a major mouse elevation and rotation input bug. Yay!
  • Added a cool FPS-graph widget; more widgets to come.

Upcoming fixes

  • I’ll further improve the chunk processors, queues and so on, and will try to fix that chunk-recache lag.
  • Lighting needs far more work; it’s quite slow and buggy.

  • Improved infinite terrain implementation
  • Better, optimized memory usage
  • In-game chunks: 1089 (~35 million blocks)
  • Basic lighting, though kinda buggy and incomplete
  • Shiny textures
  • Skydome (with generated clouds), fog, flying support
  • Basic shovel: block building/cracking support
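A quick sanity check on the chunk and block counts, assuming the in-game chunks form a square around the player and that each chunk is 16 × 16 × 128 blocks (the chunk size isn’t stated here, so that part is an assumption):

```latex
\begin{aligned}
N_{\text{chunks}} &= 1089 = 33^2 = (2 \cdot 16 + 1)^2 \\
N_{\text{blocks}} &= 1089 \times (16 \times 16 \times 128) = 1089 \times 32\,768 = 35\,684\,352 \approx 3.57 \times 10^{7}
\end{aligned}
```

That is consistent with a view range of 16 chunks and with the ~35 million blocks listed; with a different chunk height, the block total scales linearly.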