In one of my previous DevLog entries I had mentioned that I was switching to a single huge block array technique instead of array per chunk which eventually simplifies access to block data. Now I’m done with upgrading my lighting code to take advantage of new block array technique and results are great!

With the latest updates, ligthing is now 3.2 times faster! Wow, that’s a really great optimization which worth every single seconds I’ve worked for it! Here’s test results;

Test Parameters:
* View range: 5 chunks.
* Total chunks in view: 121 chunks.

     Old Lighting Technique - New Lighting Technique
 #1          1316 ms        -          498 ms
 #2          1291 ms        -          384 ms
 #3          1338 ms        -          384 ms
 #4          1293 ms        -          389 ms
 #5          1278 ms        -          387 ms
 #6          1324 ms        -          386 ms
 #7          1284 ms        -          377 ms
 #8          1281 ms        -          380 ms
 #9          1288 ms        -          389 ms
#10          1299 ms        -          381 ms
-----------------------------------------------------------
AVG.         1299 ms        -          395 ms ~ 3.2 faster!

So what changed?

First, the old technique was optimized for old block array per chunk storage. So within this to access a neighboring block, it had to access using chunks – as a neighbor block could be in another chunk. The below propagation functions first need to resolve the chunk that owns the asked block (as block arrays are per chunk) and then need to progress on.

PropagateSunLight(chunk, (byte) (x - 1), y, z, light);
PropagateSunLight(chunk, (byte) (x + 1), y, z, light);
PropagateSunLight(chunk, x, (byte) (y - 1), z, light);
PropagateSunLight(chunk, x, (byte) (y + 1), z, light);
PropagateSunLight(chunk, x, y, (byte) (z - 1), light);
PropagateSunLight(chunk, x, y, (byte) (z + 1), light);

Where as the new lighting technique really doesn’t care about which chunk does the block resides in and can directly access the block data from our single huge block array!

PropagateSunLight(blockIndex + BlockStorage.XStep, propagatedLight); // propagate light to block in east.
PropagateSunLight(blockIndex - BlockStorage.XStep, propagatedLight); // propagate light to block in west.
PropagateSunLight(blockIndex + BlockStorage.ZStep, propagatedLight); // propagate light to block in north.
PropagateSunLight(blockIndex - BlockStorage.ZStep, propagatedLight); // propagate light to block in south.
// DO NOT repropagete to upper block which we don't need to do so and may cause loops!
PropagateSunLight(blockIndex - 1, propagatedLight);   // propagate light to block down.

So What’s Next?
I’ll be also optimizing terrain generation & vertex-building code similarly.

Bonus Screenshot
Nothing new & fancy

So basically, Microsoft doesn’t give a single word on future of XNA especially when you think about it within upcoming Windows 8’s context. There are two different views over people’s blogs where few of them claims that XNA reached end of it’s life where others claim that next big XNA update will be coming with new Xbox.

So I dived further in. First I’ve installed Visual Studio 11 Beta, just to see it does not support XNA projects anymore. So at least for now forget about upgrading to VS 11 Beta if you’re still developing for XNA – and I’m really not sure if Visual Studio 2010 will support targeting .Net 4.5 but I hope so.

There is a missing project subtype.
Subtype: ‘{6D335F3A-9D43-41b4-9D22-F6F17C4BE596}’ is unsupported by this installation.

After a few hours of reading, I’ve found these regarding XNA;

  • XNA apps will run perfectly in Windows 8 but only in Desktop mode.
  • You can still have an icon in Metro view to launch your app.
  • Only partial Windows 8 marketplace support, i.e. Marketplace will only provide a web link to an installer that you must host yourself and (as yet) there are many unknowns about handling transactions and security.
  • XNA apps will run on x86 and x64 but not ARM processors.
  • You can reuse your code between XBox360, Windows 7/8 and Windows Phone 7
  • Unknown whether XNA apps will run on Windows Phone Tango/Apollo

Voxeliq’s future
So basically Voxeliq owns a huge codebase which I don’t have any willing/time to port native C++/DirectX any soon. So I’ll be staying within lands of mysterious C# game development. Options available are;

  • Stay within XNA as is.
  • Port to an XNA alternative like ANX (which isn’t any complete yet)
  • Switch to a managed DirectX wrapper – ie, sharpdx or slimdx.

I’ve previously played with slimdx a lot but the bad news is that it doesn’t support upcoming DirectX 11.1 or Windows 8 Platform and it really lacks a sample game framework which I had to implement on my own (too bad I deleted the sources thinking I would never need it!).

SharpDX on the other hand is ready for DirectX 11.1 and Windows 8 platform already. Even better, there’s a good game framework sample itself – the ANX framework which is actually based on SharpDX.

So I guess I’ll be taking SharpDX way though will not be starting to port my code any near before we can get more detailed info on what’s going with XNA.

So, in this kinda blog page, I’ll be also trying to comment on language & framework other than talking about my stuff. With the latest huge array technique with flatten array support, I do now use block pointer functions and here’s one;

        /// <summary>
        /// Returns block index by relative position of block in chunk.
        /// </summary>
        /// <param name="chunk">The chunk block belongs to.</param>
        /// <param name="x">Blocks relative x position in chunk.</param>
        /// <param name="y">Blocks y position in chunk.</param>
        /// <param name="z">Blocks relative x position in chunk.</param>
        /// <returns></returns>
        public static int BlockIndexByRelativePosition(Chunk chunk, byte x, byte y, byte z)
        {
            var xIndex = chunk.WorldPosition.X + x;
            var zIndex = chunk.WorldPosition.Z + z;

            var wrapX = xIndex % CacheWidthInBlocks;
            var wrapZ = zIndex % CacheLenghtInBlocks;

            var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;
            return flattenIndex;
        }

All code in the engine that access block data has to use one of these functions and it’s really not a good idea to repeat the code all over the source (given that a possible future update on functions will require lots of time and will make it harder to maintain and hunt for bugs). Although today’s modern compilers are quite intelligent, still I was eager for some forced inlining functionality given that those functions get calls millions of times. Statement lambdas are a possibility but they’re technically not what I’m looking for;

Conventional way:
-----------------
L_0000: nop 
L_0001: ldarg.0 
L_0002: ldarg.1 
L_0003: ldarg.2 
L_0004: ldarg.3 
L_0005: call int32 VolumetricStudios.VoxeliqGame.Chunks.BlockCache::BlockIndexByRelativePosition(class VolumetricStudios.VoxeliqGame.Chunks.Chunk,
 uint8, uint8, uint8)

Statement Lambdas:
------------------
L_0000: nop 
L_0001: ldsfld class [mscorlib]System.Func5<class VolumetricStudios.VoxeliqGame.Chunks.Chunk, uint8, uint8, uint8,
 int32> VolumetricStudios.VoxeliqGame.Chunks.BlockCache::BlockIndexByRelativePosition3
L_0006: ldarg.0 
L_0007: ldarg.1 
L_0008: ldarg.2 
L_0009: ldarg.3 
L_000a: callvirt instance !4 [mscorlib]System.Func`5<class VolumetricStudios.VoxeliqGame.Chunks.Chunk, uint8, uint8, uint8, int32>::Invoke(!0, !1, !2, !3)

So at last .Net 4.5 will be coming with a new Method Implementation Option called AggressiveInlining. MSDN has the following explanation;

  • AggressiveInlining: The method should be inlined if possible. It’ll be nice to force regular functions to get inlined with 4.5.

Update: You can find a nice reading on it over here.

So initial re-factoring is done. Voxeliq engine now uses a single huge array of blocks instead of block arrays per chunk. I can say initially that this improved the performance to some extend though there’re still pieces of code that’s optimized for old technique (especially lighting one). As I cover them all I guess we’ll see more performance improvements over time.

Here is the initial tests;

  • View range: 5 chunks.
  • Total chunks in view: 121 chunks.
  Block Array/Chunk          Single Huge Block Array
  Gen  Light  Build   -  -        Gen  Light  Build
 #1  906  1545  4497  -  -   #1  1287  1558  3237
 #2  933  1524  4419  -  -   #2  1283  1520  3066
 #3  933  1567  4520  -  -   #3  1319  1448  3089
 #4  959  1593  4576  -  -   #4  1420  1803  3340
 #5  912  1538  4413  -  -   #5  1256  1470  2832
 #6  903  1512  4215  -  -   #6  1405  1534  3022
 #7  897  1552  4451  -  -   #7  1386  1517  3463
 #8  912  1573  4790  -  -   #8  1362  1524  2989
 #9  935  1580  5117  -  -   #9  1367  1455  3442
#10  920  1606  4624  -  -  #10  1403  1688  3391
ST:  9210  15590  45622      ST:  13488  15517  31871
GT:      70422               GT:      60876

* All values are in msec.
  •  Clearly as you can see, vertex building took advantage of new technique a lot.
  • Lighting code performs nearly the same though it’s not optimized for new technique yet.
  • Terrain generation got slowed a bit though didn’t have time to look for it yet.

I’ll be providing more in-detail info as I progress through.

Bonus Content
A screenshot from ingame-chunk debugger

So basically these days I’m optimizing the engine aiming the best performance available. Recently I’ve seen a great idea by Slaihne over his game BlokWorld’s forums. He basically suggests using a single huge array for blocks and wrapping the array. So I decided to give it a try – and although I’m not done with re-factoring completely, it seems to work great!

Basically until now Voxeliq was using a double-indexed dictionary to cache chunks within player’s region and then storing a single-dimension block array per each chunk. This works to some extend though there are a few problems. First current technique’s speed is not that bad as you can see from my previous videos, though Slaihne’s one seems to be faster. I’ll be explaining below in details;

  • Memory-wise; Voxeliq’s current technique loads new chunks / removes them as player moves – which basically allocs/deallocs memory continuously – given that .NET GC’s in-deterministic nature this is not really good. On the other side slaihne’s method always uses a pre-determined amount of memory for block/chunk caches. Even more hundreds of chunk instances is another memory sink in current method (it’s already known that in .net object instances have quite noticeable overhead).
  • Speed-wise; Especially in the case of lighting the current method needs each chunk to have pointers to neighboring chunks and have extensive checks. The new method completely simplifies the stuff.
  • Recaching; The current technique extensively re-caches chunks as player moves around (allocs/deallocs chunks). Within the new method yet again this will be really simplified a lot thanks to array wrapping.

So slaihne mentions he uses array wrapping but in one point he also mentions about his array being a single dimensional one. This was already a technique I was using in chunk’s block arrays, where I was flattening a 3 dimensional array to a single dimension one (as single dimensional arrays are lot faster in .net compared to multi-dimension ones).

I basically implemented a wrapping array with additional flattening support;

        public Block this[int x, int y, int z]
        {
            get
            {
                var wrapX = x%CacheWidthInBlocks;
                var wrapZ = z%CacheLenghtInBlocks;
                var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;

                return this.Blocks[flattenIndex];
            }
            set
            {
                var wrapX = x % CacheWidthInBlocks;
                var wrapZ = z % CacheLenghtInBlocks;
                var flattenIndex = wrapX * FlattenOffset + wrapZ * Chunk.HeightInBlocks + y;

                this.Blocks[flattenIndex] = value;
            }
        }

So initially it seemed all good but I’ve to re-factor more parts to let the engine take advantage of this completely. I’ll be posting another update once I’m done with a result video!

Array Wrapping

Oh and this shows how array wrapping works (kudos goes to Slaihne for the mockup!);

Flatten Arrays

For the interested ones here’s array tests for multidimensional, jagged and flattened arrays;

Test Environment: 1 physical cpus, 2 cores, 2 logical cpus.
________________________________________________________________________________

Array size: 256*256*256
________________________________________________________________________________

Itr.    Multi.  Jagged  Flatten (Sequental)
________________________________________________________________________________

#1      00.187s 00.116s 00.093s
#2      00.186s 00.112s 00.095s
#3      00.189s 00.112s 00.094s
#4      00.187s 00.115s 00.098s
#5      00.187s 00.113s 00.094s
#6      00.186s 00.115s 00.094s
#7      00.188s 00.117s 00.094s
#8      00.187s 00.112s 00.094s
#9      00.187s 00.114s 00.095s
#10     00.191s 00.117s 00.097s
~Avg    00.188s 00.115s 00.095s
________________________________________________________________________________

Itr.    Multi.  Jagged  Flatten (Random)
________________________________________________________________________________

#1      00.238s 00.158s 00.126s
#2      00.226s 00.160s 00.123s
#3      00.226s 00.155s 00.122s
#4      00.225s 00.159s 00.123s
#5      00.225s 00.168s 00.136s
#6      00.233s 00.164s 00.149s
#7      00.237s 00.187s 00.126s
#8      00.237s 00.163s 00.128s
#9      00.239s 00.158s 00.125s
#10     00.227s 00.156s 00.125s
~Avg    00.232s 00.163s 00.129s

As you can see flatten arrays in .net 4.0 is 2x times faster then conventional multi-dimensional arrays.You can find my test code over here; https://github.com/raistlinthewiz/dotnet-array-perf-tests

Right now I’m working on major fixes for the engine, especially for the parts I’m not happy with. Here’s a quick change-log.

  • Fixed terrain generator interface, generators do now accept a Biome structure and apply it after initial terrain generation.
  • Started implementing a chunk-cache. Before all chunk management was done within World which harder to maintain. Chunk-cache will be the interface between world and actual chunk-storage and will be also responsible for re-caching.
  • Fixed a major mouse elevation & rotation input bug. yay
  • Added a cool fps-graph widget, will be adding more.

Upcoming fixes

  • I’ll be further improve chunk processors, queues and so. Will be trying to fix that chunk-recache lag.
  • Lightning needs far more work, it’s quite slow and buggy.

  • Improved infinitive terrain implementation
  • Better & optimized memory usage
  • Ingame chunks: 1089 ~ 35 million blocks.
  • Basic lightning – though kinda buggy & incomplete.
  • Shiny textures
  • Skydome (with generated clouds), fog, flying support.
  • Basic shovel ~ block build/crack support.