After experimenting with neural networks for style transfer and other kinds of image generation, I have come to the following conclusion:
The future of fast rendering for immersive environments lies in training a network to take blocky voxel renders (handy because the underlying model is simple) and do the time-consuming work, such as ambient occlusion and god rays, in one big post-processing sweep.
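As a minimal sketch of that idea, the following shows an image-to-image network shape: a blocky render goes in, a refined frame of the same resolution comes out of one forward pass. The weights here are random placeholders (an assumption, purely for illustration); in practice the network would be trained on pairs of (voxel render, fully shaded render).

```python
import numpy as np

def conv2d(x, w, b):
    # naive "same" convolution: x is (H, W, Cin), w is (k, k, Cin, Cout), b is (Cout,)
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.empty((H, W, w.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)

# hypothetical untrained weights: 3 -> 8 -> 3 channels, 3x3 kernels
w1 = rng.normal(0, 0.1, (3, 3, 3, 8)); b1 = np.zeros(8)
w2 = rng.normal(0, 0.1, (3, 3, 8, 3)); b2 = np.zeros(3)

def refine(render):
    # one "sweep" over the whole frame: the learned stand-in for
    # ambient occlusion, god rays, and similar effects
    return conv2d(relu(conv2d(render, w1, b1)), w2, b2)

blocky = rng.random((16, 16, 3))   # stand-in for a blocky voxel render
refined = refine(blocky)
print(refined.shape)               # same resolution in and out
```

A real version would be much deeper (e.g. an encoder-decoder) and trained with a reconstruction or adversarial loss, but the interface is the point: the expensive lighting work is folded into a single learned pass over the frame.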
Whereas traditional post-processing does not change shapes or add detail, neural networks are very capable of doing both.
Objects can also be detected, so that different regions of the frame can be treated differently.
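The per-region idea can be sketched as a masked blend: a segmentation mask (here hard-coded as an assumption, standing in for a detector's output) routes each pixel through a different processing path.

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.random((8, 8, 3))          # stand-in for a rendered frame

# hypothetical segmentation mask: 1.0 where a detected object is, 0.0 elsewhere
mask = np.zeros((8, 8, 1))
mask[2:6, 2:6] = 1.0

def stylise_object(x):
    # placeholder for object-specific treatment (e.g. added surface detail)
    return x * 0.5

def stylise_background(x):
    # placeholder for background treatment (e.g. atmospheric effects)
    return x ** 2

# blend the two treatments according to the mask
out = mask * stylise_object(frame) + (1.0 - mask) * stylise_background(frame)
print(out.shape)
```

In a trained system the mask and both treatments would themselves be learned, but the routing structure stays the same.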
Finally, this will probably also mean a (very welcome) break from the effort of hand-modelling ever finer detail.