This needs a longer explanation. The process is actually a bit complicated, and the algorithms are still in development, so their exact form is still being refined.
Apart from the Blue Marble data there's also the
MODIS dataset that contains land (and other) classification in a discrete form. As I understand it, it was created by the same process as Blue Marble, using several more infrared bands of the collected data. The part that interests us most is the dataset that divides land type into some 14 classes. I think I have already shown it somewhere ... dunno where, reuploading:
This is available at 500m resolution, just like the BMNG color data.
This representation is more suitable for the engine, as it can serve data to the generators of vegetation and terrain textures. But if we used it directly, the world would consist of flat biomes with sharp transitions between them, which is not what we want.
What we want is this:
- the colors should match the Blue Marble when viewed from above (+ atmospheric effects)
- transitions between the biomes should be natural, fractal-like: you don't get grass that's 30% desert color and 70% savanna color at some point in the transition, you get 30% grass coverage there instead
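To illustrate the coverage idea with a rough Python sketch (the land types and the 30% ratio are just example values, not engine code): instead of lerping colors by the transition ratio, each sample compares the ratio against a fractal-noise value and resolves to a single discrete cover, so the ratio ends up as an area fraction:

```python
import random

def resolve_cover(ratio, noise):
    """Resolve one sample to a discrete cover instead of blending colors.

    ratio -- transition parameter, 0.0 = pure grass, 1.0 = pure desert
    noise -- fractal-noise value in [0, 1) at this sample point
             (plain uniform noise stands in for it here)
    """
    return "desert" if noise < ratio else "grass"

# At ratio 0.3, roughly 30% of the samples resolve to desert cover --
# the transition shows patches of desert, not a uniform 30/70 color mix.
rng = random.Random(42)
samples = [resolve_cover(0.3, rng.random()) for _ in range(10000)]
desert_fraction = samples.count("desert") / len(samples)
```

With real fractal noise the patches are spatially coherent rather than per-sample random, but the area statistics work out the same way.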
The fractal mixer is already driven by the ratio parameter when generating textures and terrain data, as can be seen in this older screenshot of how the transitions are created:
The biome mixer works on the same principle, but on a larger scale. It requires data prepared in a special way, basically land types with probability weights that are then fed into the fractal mixer, which resolves the land type together with its colorization.
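A minimal sketch of that resolution step, assuming the prepared data is just a map of per-type probability weights (the types and weights below are made up): the noise sample walks the cumulative weights and picks a single land type, which then gets its colorization through the usual texture path:

```python
def resolve_land_type(weights, noise):
    """Resolve a single land type from probability weights.

    weights -- {land_type: weight} prepared for this location
    noise   -- fractal-noise sample in [0, 1)
    """
    total = sum(weights.values())
    threshold = noise * total
    acc = 0.0
    # walk the cumulative weights until the noise threshold falls inside
    for land_type, weight in sorted(weights.items()):
        acc += weight
        if threshold < acc:
            return land_type
    # numerical safety net for noise values right at the top end
    return max(weights, key=weights.get)

# Example: a spot prepared as 30% desert / 70% savanna.
example = {"desert": 0.3, "savanna": 0.7}
```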
How to obtain this data by transforming the source land-type and color input maps is the major problem. The algorithm basically tries to find an optimal mix of input land-type ingredients that can produce the observed color, while maximizing the probability that the given composition can occur at the given place, constrained by other parameters such as min/avg temperature, precipitation etc.
It's a multidimensional best-fit problem.
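As a toy illustration of that best fit in Python (the ingredient colors are invented, and only the color term is shown -- the climate constraints would add further penalty terms): enumerate weight combinations on the simplex and keep the mix whose weighted color is closest to the observed one:

```python
from itertools import product

def best_fit_mix(observed, ingredients, steps=20):
    """Exhaustive search over the weight simplex for the ingredient mix
    whose weighted color best matches the observed color.

    observed    -- (r, g, b) observed color
    ingredients -- {name: (r, g, b)} base land-type colors
    Returns (weights dict, squared color error).
    """
    names = list(ingredients)
    best = None
    # enumerate weights w_i = k_i / steps with sum(k_i) == steps
    for ks in product(range(steps + 1), repeat=len(names) - 1):
        if sum(ks) > steps:
            continue
        ks = list(ks) + [steps - sum(ks)]
        w = [k / steps for k in ks]
        mix = tuple(sum(w[i] * ingredients[n][c] for i, n in enumerate(names))
                    for c in range(3))
        err = sum((m - o) ** 2 for m, o in zip(mix, observed))
        if best is None or err < best[1]:
            best = (dict(zip(names, w)), err)
    return best
```

A real solver would use constrained least squares instead of brute force, but the shape of the problem is the same.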
Seasons are an integral part of it. Once you have the base types defined over the seasons (e.g. birch cover defined as a vector of colors over the seasons), you try to find whether birch would explain the observed variation of colors at a particular place over the whole year, and what part of it it would cover.
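The seasonal part of the fit can be sketched the same way (again with made-up colors): the error is summed over the whole yearly color trajectory, so a candidate cover only scores well if it explains every season, not just one snapshot:

```python
def seasonal_fit_error(weights, observed_seasons, ingredient_seasons):
    """Squared color error of a candidate mix, summed over all seasons.

    weights            -- {land_type: coverage weight}
    observed_seasons   -- [(r, g, b), ...], one observed color per season
    ingredient_seasons -- {land_type: [(r, g, b), ...]} seasonal color vectors
    """
    err = 0.0
    for season, observed in enumerate(observed_seasons):
        mix = tuple(sum(w * ingredient_seasons[name][season][c]
                        for name, w in weights.items())
                    for c in range(3))
        err += sum((m - o) ** 2 for m, o in zip(mix, observed))
    return err
```

A birch cover with a green summer and a yellow autumn entry would then be rejected at a place whose color stays constant all year, even if it matches the summer snapshot perfectly.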
Obviously, interpreting the color data with the help of auxiliary datasets (major land type, temperature averages and extremes, precipitation, sunshine % etc.) is quite demanding, both algorithmically and in processing power, and it also depends a lot on the quality of the input base types it's given to work with. It must be able to provide quality feedback that tells how well it was able to match the data, ideally suggesting the parameters of possibly missing land types if the deviation is too high. It also has to avoid urban areas while analyzing, but not when generating the output dataset.
There's also an alternative way that treats the data differently, in a somewhat simpler manner. Unlike the first method, it primarily considers the discrete land-type dataset, and computes the weighted land types needed by the biome mixer by analyzing the surrounding terrain types and deriving the composition from them. Additionally, it takes the color data and encodes it as a color difference against the computed composition color. This brings back the color variation that exists naturally (for example, the changing color of deserts) but that would have to be achieved by defining multiple types of desert material under the first approach, increasing its complexity even more (though the result would probably be nicer).
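A rough Python sketch of this second approach (the grid layout, neighborhood radius and colors are all invented for illustration): the mixing weights come from counting the discrete classes around a cell, and the color map is then stored only as a delta against the color composed from those weights:

```python
from collections import Counter

def neighborhood_weights(grid, x, y, radius=1):
    """Derive biome-mixer weights from the discrete land-type classes
    surrounding a cell; wraps at the borders for simplicity."""
    counts = Counter()
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            counts[grid[(y + dy) % len(grid)][(x + dx) % len(grid[0])]] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def color_delta(observed, weights, base_colors):
    """Encode the observed color as a difference against the composition
    color, preserving natural variation (e.g. shifting desert hues)."""
    composed = tuple(sum(w * base_colors[t][c] for t, w in weights.items())
                     for c in range(3))
    return tuple(o - m for o, m in zip(observed, composed))
```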
I think ultimately it's going to be a combination of the two. While the first one allows matching real color data where available, the second one can also be used with hand-drawn climate maps or generated climate models, making it usable by virtual world creators and for scientific purposes as well. Combining them means using the second approach to quickly restrict the set of potential land cover candidates, and then refining it with the first method locally.
Last (but not least), it should also be efficient at run time ;)
Not sure about the water coloring, but intuitively it could use the color part, decomposing it again into the light-scattering parameters of water. We aren't coloring the water directly; it gets its color from light scattering, so that it works correctly at all angles and underwater. To match a given color you'd have to find the root parameters again.
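As a toy example of what "finding the root parameters" could mean (a deliberately oversimplified per-channel Beer-Lambert attenuation model, not the engine's actual scattering): given an incident light color and a reference depth, the absorption coefficients that would reproduce an observed water color follow directly from inverting the exponential:

```python
import math

def fit_absorption(observed, incident, depth):
    """Invert a per-channel Beer-Lambert attenuation model:

        observed = incident * exp(-a * depth)
        =>  a = -ln(observed / incident) / depth

    Real water coloring involves scattering too, so this only sketches
    the 'solve for the root parameters' direction, not a physical fit.
    """
    return tuple(-math.log(max(obs, 1e-6) / inc) / depth
                 for obs, inc in zip(observed, incident))
```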
Also, while the processes above can match the seasonal data and produce our biome dataset, the implementation of actual seasons is a relatively independent thing, involving rendering of tree and vegetation states, a dynamic snow layer, etc.