Based on what you have written, the following publication and the datasets might be of great value for you guys:
Article: http://onlinelibrary.wiley.com/doi/10.1029/2012GC004370/abstract
Gridded test dataset (0.5 degrees; low resolution): http://doi.pangaea.de/10.1594/PANGAEA.788537
The main author is Prof Hartmann of the University of Hamburg: http://www.geo.uni-hamburg.de/de/geologie/personen/hartmann_jens.html
You can contact him and ask about the availability of the data etc. You could, for example, make the case that Outerra could serve as a visualisation tool for lithology research.
If not, I could contact him and ask, although I do not know him personally, only some colleagues of his.
Thanks, so that's GLiM. Relatively low res, but usable, provided we are allowed to use it. We certainly plan for Outerra to be usable and useful for scientific purposes, but there are always some issues with getting the data or being able to use them for all purposes.
I wonder if we could get access to the source or intermediate data that were used to make this; that would be better suited for mapping to probabilities of occurrence. Although we can also process these types of color coded map data, a lot of information is lost in the whole process, especially since the deduction of component probabilities is completely heuristic. I will contact Prof Hartmann to see what the possibilities are (I can list you in CC if you want). I hope I'll be able to express myself clearly, without mutilating the terminology or anything.
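To make the "heuristic deduction of component probabilities" a bit more concrete, here is a minimal Python sketch of what such a mapping could look like. The class names, material names and weights are all made up for illustration; they come neither from GLiM nor from Outerra.

```python
# Hypothetical heuristic weights: how strongly each mapped lithology class
# suggests each surface material. All names and numbers are illustrative.
HEURISTIC_WEIGHTS = {
    "carbonate_sedimentary": {"limestone": 8.0, "sandstone": 1.0, "soil": 1.0},
    "siliciclastic_sedimentary": {"sandstone": 6.0, "limestone": 1.0, "soil": 3.0},
    "unconsolidated_sediments": {"sand": 5.0, "soil": 5.0},
}


def component_probabilities(lithology_class):
    """Normalise the heuristic weights of one class into probabilities."""
    weights = HEURISTIC_WEIGHTS.get(lithology_class)
    if not weights:
        return {}  # unknown class: no information
    total = sum(weights.values())
    return {material: weight / total for material, weight in weights.items()}
```

With source or intermediate data, the hand-picked weights could in principle be replaced by measured proportions, which is exactly the information a color coded map no longer carries.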
As far as I can see, the data on Pangaea is Creative Commons licensed. Hence you can use it for every purpose you like. I am familiar with Pangaea since I have published stuff on there as well. As you said, even the raw dataset can be a great way to determine the general look of Outerra for a specific region. You can list me in the CC in your email to Prof. Hartmann if you want to, no problem (sent you a message with my email address).
Thinking further: soil and vegetation in the real world tend to resemble the colour of the rock they grow on. The thicker the soil cover, the less of the original colour is retained and usually the more organic material is incorporated (the blacker, the more organic stuff). Using the rocks as some sort of base, the soil textures could in theory be adjusted to resemble the rock colour. Also, vegetation can grow directly on the rock and doesn't always require soil. Specific rock types also have specific vegetation that grows on them, for example sandstone/sandy soil = usually pine trees, or limestone = wine and short scrub. This also varies depending on the biogeographic region of Europe (see data here:
http://www.eea.europa.eu/data-and-maps/data/biogeographical-regions-europe-1).
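The soil colour idea above could be sketched roughly like this in Python. It is only an illustration: the generic soil colour, the exponential fade with soil depth and the darkening factor are assumptions, not measured values.

```python
import math

# Assumed colour of a generic soil that has lost all trace of its parent rock.
GENERIC_SOIL_RGB = (120, 100, 80)


def soil_colour(rock_rgb, soil_depth_m, organic_fraction, fade_depth_m=0.5):
    """Blend the rock colour into the soil colour (illustrative constants)."""
    # Share of the rock colour retained; decays as the soil cover thickens.
    retained = math.exp(-max(soil_depth_m, 0.0) / fade_depth_m)
    # Organic material darkens the result (1.0 means almost black).
    organic = min(max(organic_fraction, 0.0), 1.0)
    darkening = 1.0 - 0.8 * organic
    return tuple(
        round((soil + (rock - soil) * retained) * darkening)
        for rock, soil in zip(rock_rgb, GENERIC_SOIL_RGB)
    )
```

With no soil cover and no organic material the function returns the rock colour unchanged; a thick, organic-rich cover comes out much darker.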
Second thought is that it might work to make slope profiles for major groups of rocks, such as sandstone or limestone. Hence, if the elevation data contains a steep cliff and the lithological map indicates limestone or sandstone one would, in general, expect a specific pattern for each of those rocks (such as distinct layering for sandstone, and massive + layering for limestone). If the elevation data indicates a flat area with sandstone, one could expect a specific rock texture. You guys probably thought about this already and I am just thinking out loud.
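As a rough Python sketch of the slope profile idea: combine the slope from the elevation data with the rock type from the lithological map and look up a pattern. The slope thresholds and the fallback name are assumptions; the three patterns are the ones mentioned above.

```python
import math

CLIFF_SLOPE_DEG = 45.0  # assumed threshold for a "steep cliff"
FLAT_SLOPE_DEG = 5.0    # assumed threshold for a "flat area"

# Patterns from the post; anything not listed falls back to a generic rock.
ROCK_PATTERNS = {
    ("sandstone", "cliff"): "distinct layering",
    ("limestone", "cliff"): "massive with layering",
    ("sandstone", "flat"): "flat sandstone texture",
}


def slope_degrees(height_diff_m, distance_m):
    """Slope angle between two elevation samples, in degrees."""
    return math.degrees(math.atan2(abs(height_diff_m), distance_m))


def terrain_form(slope_deg):
    """Classify a slope angle as cliff, flat, or ordinary slope."""
    if slope_deg >= CLIFF_SLOPE_DEG:
        return "cliff"
    if slope_deg <= FLAT_SLOPE_DEG:
        return "flat"
    return "slope"


def rock_pattern(lithology, height_diff_m, distance_m):
    """Pick a pattern from the rock type and the local slope."""
    form = terrain_form(slope_degrees(height_diff_m, distance_m))
    return ROCK_PATTERNS.get((lithology, form), "generic rock")
```

For example, `rock_pattern("sandstone", 30, 10)` describes a 30 m rise over 10 m of horizontal distance, which counts as a cliff under these thresholds.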
Also, don't forget reefs and atolls. There is a free but unfortunately non-commercial database of reef bodies in the world's oceans:
http://reefgis.reefbase.org/. Since reefs are always within more or less -10 to -30 meters of water depth (at least, satellite images show reef areas down to -30 m), such a database could be used to create reefs all over the planet. Unfortunately, I don't know of any other source (yet).
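A minimal Python sketch of how the depth rule could be combined with such a database: a cell becomes a reef candidate only if it lies inside a known reef footprint and its elevation is between -30 and -10 m. The grid layout and the function name are assumptions; the footprint grid would have to be rasterised from the reef data first.

```python
REEF_MIN_ELEVATION_M = -30.0  # deepest reef level mentioned above
REEF_MAX_ELEVATION_M = -10.0  # shallowest reef level mentioned above


def reef_mask(elevation_m, reef_footprint):
    """Mark grid cells that are inside a reef footprint and in the depth band.

    elevation_m:    rows of elevations in metres (negative = below sea level)
    reef_footprint: rows of booleans, True where the database lists a reef
    """
    return [
        [
            bool(flag) and REEF_MIN_ELEVATION_M <= z <= REEF_MAX_ELEVATION_M
            for z, flag in zip(z_row, flag_row)
        ]
        for z_row, flag_row in zip(elevation_m, reef_footprint)
    ]
```

Usage on a tiny 2x2 grid: `reef_mask([[-5.0, -15.0], [-25.0, -40.0]], [[True, True], [False, True]])` keeps only the cell at -15 m, since the others are too shallow, outside a footprint, or too deep.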
Cheers,
Andreas
Edit: I might even create a map of the major reefs that you could use with the Landsat images. Good training for me and useful for you ;-)