I think a lot of these come down to the lack of HDR adjustment, and mainly to how the white balance is modeled. The problem is that OT uses physical values when integrating light scattering and computing the attenuation, so the values you get should be reasonably close to the real light (there are some simplifications, though). However, the human eye and brain do a hell of a job adapting, as anyone can check by turning off white balance on a digital camera and viewing the captured light levels on a display.
What the brain does with regard to white balance (perceiving a sheet of paper as white even during sunset or the blue hour), or to enhance the contrast, must be approximated algorithmically in a post-process pass, because the brain will see the real values as fake when the display is just a part of the viewing area.
The balancing is somewhat subjective, so in theory it could be adjusted manually or the coefficients assembled into a lookup table, but it would be good to eliminate everything that can be factored out, to reduce the lookup table size or to turn it into a few curve coefficients.
What the WB code currently does is take a white sheet, compute how it's lit, and renormalize all the colors so that this color is perceived as white, or close to it, to mimic what the brain does.
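To make the idea concrete, here is a rough sketch of that kind of white-sheet normalization. It is not OT's actual code: the function names, the Rec. 709 luminance weights, the `strength` knob (standing in for the "or close to it" part) and the sample sunset color are all assumptions made up for illustration.

```python
# Hypothetical sketch of white-sheet normalization; not OT's actual code.

REC709_LUMA = (0.2126, 0.7152, 0.0722)  # luminance weights for linear RGB


def white_balance_gains(sheet_rgb, strength=1.0):
    """Return per-channel gains that pull the lit white sheet toward neutral.

    sheet_rgb: linear RGB of a white sheet under the current lighting.
    strength:  1.0 maps the sheet to pure gray, 0.0 leaves colors untouched
               (assumed knob, blends the gains linearly).
    """
    luminance = sum(w * c for w, c in zip(REC709_LUMA, sheet_rgb))
    gains = []
    for c in sheet_rgb:
        full = luminance / c if c > 0.0 else 1.0  # avoid dividing by zero
        gains.append(1.0 + strength * (full - 1.0))
    return tuple(gains)


def apply_gains(rgb, gains):
    """Scale a linear RGB color by the white balance gains."""
    return tuple(c * g for c, g in zip(rgb, gains))


sunset_sheet = (1.0, 0.6, 0.3)               # made-up warm light on the sheet
gains = white_balance_gains(sunset_sheet)
balanced = apply_gains(sunset_sheet, gains)  # channels come out roughly equal
```

The same gains would then be applied to every pixel of the frame. Keeping the sheet's luminance fixed means the pass only shifts the color and leaves the overall brightness to the HDR step.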
Note that this does not take into account that people routinely mix up different lighting conditions and try to adapt OT to render something they want to see, but with the sun positioned so that it would not be possible in real life. For example, with a clear sky and a low sun, the sun rays pass a long way through the atmosphere, resulting in a lot of what is perceived as haze (actually light in-scattered into the viewing ray). With the sun high up you get a lot less haze and much clearer views, but it's not as nice. It's different when clouds partly obstruct the low sun and the ground is illuminated to a greater extent by the sky. Again, white balance and HDR will mess it up.
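To put a rough number on how much longer the path through the air gets with a low sun, here is a simple geometric sketch. It assumes a spherical planet with a uniform-density air layer of about 8 km, which is a crude stand-in for the real exponential atmosphere; the constants and function names are illustrative and have nothing to do with OT's internals.

```python
import math

PLANET_RADIUS_KM = 6371.0  # Earth mean radius
ATMOSPHERE_KM = 8.0        # assumed thickness of an equivalent uniform layer


def sun_path_length_km(elevation_deg):
    """Length of a ray from the ground to the top of the air layer.

    Solves the ray/sphere intersection for an observer on the surface
    looking toward a sun at the given elevation above the horizon.
    """
    s = PLANET_RADIUS_KM * math.sin(math.radians(elevation_deg))
    h = ATMOSPHERE_KM
    return math.sqrt(s * s + 2.0 * PLANET_RADIUS_KM * h + h * h) - s


def relative_air_mass(elevation_deg):
    """Path length relative to the sun being straight overhead."""
    return sun_path_length_km(elevation_deg) / sun_path_length_km(90.0)


for elevation in (90.0, 30.0, 5.0, 0.0):
    print(elevation, relative_air_mass(elevation))
```

In this simplified model the path at the horizon is tens of times longer than at the zenith, which is why a low sun brings so much more attenuation and in-scattered light along the way.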
I'm afraid that people will try to achieve something they have seen under different conditions, and it will never work consistently, especially if it's a post-process hack and not something physical.