Sorry for the delay, but your long post deserved a proper answer, and that takes more than five minutes.
Before anything else, two things:
First, I'd prefer that you provide us with the original link for your article, because even if it's tagged "sample", I am not sure you have the right to redistribute it.
Second, please don't bump your own posts too much; even if what you say is interesting, the edit function is enough until you get answers.
So you like maths; so do I. But my theoretical physics background also taught me to always keep a global view of any phenomenon.
Mathematics is a great tool, but it can also lead us to miss some parameters when we try to describe everything through functions and formulas.
All you describe here is true; even though I am sure Google's formula is more complex, it is more or less the PageRank theory.
I think Google's formula is more complex because there are many known filters applied on top of PageRank, such as the sandbox, blacklisting, keyword density analysis and, I am quite certain, deep linguistic analysis as well.
From what I have observed, I really think we can talk about linguistic analysis. First, Google can figure out which language a web site uses even when there is no lang meta tag.
Then, another thing makes me think this way; let me explain.
It's about the well-known underscore "_" problem: the (correct) advice that we'd better use hyphens "-" rather than underscores to separate keywords in our URLs.
The experiment, run several times, was to post two random strings separated by an underscore, something like kjsdfhpkohfd_kkdkhdfkh, and two others separated by a hyphen, let's say reoaeoay-poiypoy.
After the page was crawled, the conclusion was that only the hyphen-separated strings could be searched for individually (e.g. searching for reoaeoay or poiypoy alone). The underscore-separated strings were not searchable individually.
So many SEOers claimed that only the hyphen is a separator. I think that reading misses half the experiment.
Because at the same time, I was running a site map for a web site of mine, located in a folder called site_map/, and guess what: I could perform a search query like "name_of_my_site" plus "map" (and "map" was not part of the title) and find my site map, with "map" highlighted in the result URL.
So half the experiment was clearly missed. We can conclude that Google performs quite a deep language analysis, because in the underscore case it can still recognize existing words.
Now you'll ask: why the difference between hyphen and underscore?
Because the hyphen is actually used as a separator in many languages, and the underscore is not. So when Google found two random strings joined by a hyphen, it still analysed the hyphen as a text separator, and even though it could not find any dictionary entry for the two random strings, it treated them as separable.
The underscore, by contrast, is not used as a separator in any natural language, so Google treated the two unintelligible random strings as a single token, something like a symbol or a script file name.
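To make the idea concrete, here is a toy sketch of a tokenizer behaving the way the experiment suggests. This is my own illustration, not Google's actual logic, and the tiny dictionary is obviously a stand-in for a real lexicon: hyphens always split, underscores only split when every resulting piece is a known word.

```python
def searchable_tokens(url_segment):
    """Split a URL segment: hyphens always act as separators;
    underscores split only when every resulting part is a known word."""
    # Tiny stand-in dictionary -- a real engine would use a full lexicon.
    dictionary = {"site", "map", "name", "of", "my"}

    tokens = []
    # Hyphens are separators in natural language, so always split on them.
    for part in url_segment.split("-"):
        sub_parts = part.split("_")
        if all(p.lower() in dictionary for p in sub_parts):
            # Every piece is a real word: treat the underscore as a separator.
            tokens.extend(sub_parts)
        else:
            # Unrecognized pieces: keep the underscored string as one symbol.
            tokens.append(part)
    return tokens

print(searchable_tokens("reoaeoay-poiypoy"))        # -> ['reoaeoay', 'poiypoy']
print(searchable_tokens("kjsdfhpkohfd_kkdkhdfkh"))  # -> ['kjsdfhpkohfd_kkdkhdfkh']
print(searchable_tokens("site_map"))                # -> ['site', 'map']
```

This reproduces both halves of the experiment: the hyphenated gibberish is searchable as two tokens, the underscored gibberish stays as one, and site_map still yields "map" because both pieces are real words.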
So this shows Google does many, many things when analysing content, and I am sure PageRank also depends on this analysis, which will only become more important as Google's tools improve. Remember, the goal is to find the best content.
Then, if you add the fact that Google performs the same deep analysis on the web sites linking to you, you understand there is no way to be as accurate as the example you present us.
I think the whole process cannot be far from chaotic.
Chaos theory tells us that almost all dynamic equilibria (self-regulated systems) are chaotic: population growth, blood sugar regulation, the Earth's orbit around the Sun, etc. And this is far from meaning a total mess; in fact, it can be shown that the Earth would have left the solar system long ago if its path were not chaotic.
Chaos means we cannot accurately predict anything, since small changes in the system can lead to major ones later on, but it also allows quite stable regulation of self-regulated dynamical systems. The Earth's trajectory has lasted about 4,000,000,000 years so far.
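You can see this sensitivity to initial conditions in a few lines with the logistic map, the textbook example of a chaotic system (my illustration, nothing to do with Google's code): two starting points differing by one millionth end up with completely different trajectories.

```python
def logistic(x, r=4.0):
    # Logistic map in its fully chaotic regime (r = 4).
    return r * x * (1 - x)

a, b = 0.300000, 0.300001  # two almost identical starting points
max_gap = 0.0
for step in range(50):
    a, b = logistic(a), logistic(b)
    max_gap = max(max_gap, abs(a - b))

# The initial gap of 0.000001 grows until the trajectories decorrelate.
print(max_gap)
```

That is the point about prediction: the rule itself is simple and perfectly deterministic, yet any tiny measurement error makes long-term forecasts worthless.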
Google indexing is highly dynamic (time-dependent) and depends on many parameters, with many filtering and regulation algorithms; this can be said without looking at its code, it's stating the obvious.
Then, since PageRank is computed recursively, your PageRank depends on the PageRank of every web site linked, through some chain, to yours, which in many cases I am sure can reach every listed web site. For example, a link from DMOZ gives you a PageRank based on DMOZ's PageRank, which is based on the PageRank of all web sites linking to it, which depends on the PageRank of the web sites linking to them, and so on.
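The recursion above is exactly what the published PageRank algorithm resolves by iteration. Here is a minimal sketch on a hypothetical four-page web (the page names and link graph are made up for illustration); each page's rank is repeatedly recomputed from the ranks of the pages linking to it until the values settle:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank for a tiny link graph.
    links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            # Each page shares its damped rank equally among its outlinks.
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical web: "dmoz" links out, and every page is eventually
# reachable from every other, so all ranks depend on all others.
graph = {
    "dmoz": ["mysite", "other"],
    "mysite": ["sitemap"],
    "sitemap": ["mysite"],
    "other": ["dmoz"],
}
ranks = pagerank(graph)
for page in graph:
    print(page, round(ranks[page], 3))
```

Even on four pages you can see the loop: change one link anywhere and every rank shifts, which is why the full web behaves like the coupled dynamical system I describe.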
So talking about chaos theory in such a matter is far from crazy.
This means we can follow basic, general principles to optimize our web sites for search engines, but we cannot make predictions of the kind you suggest; things are far more complex, and in a way simpler too.
All you said could be summed up as:
The more backlinks and deep backlinks from the highest-ranked, most related pages with the best content, the better the PageRank. Simple, isn't it?
Then we should keep in mind that Google must (or soon will) use some kind of categories of its own to sort web sites, for example based on the language analysis performed on them, their backlinks, their outgoing links, etc.
Same story as always, a never-ending loop between all parameters: PageRank depends on the number, type and quality of backlinks; on your content and the way it is analysed (thus rated); on the number of pages your web site has and the way it is internally linked; on the number of duplicates; on the PageRank of your web site's pages (thus your web site's PageRank); and on the PageRank of the web sites you are linked from (thus on the PageRank of the web sites linking to them)...
All we get in the end, when a PageRank is set for a given page, is a snapshot of this never-ending process, started the day Google first computed PageRank.
So you cannot say that 3,000 posts will give you a given PageRank, but you can easily work on optimizing it.
You can also read this post, in which I showed, among other things, that you can have a PR 6 with only 44 backlinks.