So here it is: we will soon have a beta of the new Google Sitemaps solution for phpBB forums.
I say new because I think I'll rename it to something like Ultimate Google Sitemaps & RSS, because yes, there are RSS feeds, and also, though it's less important, a nice Yahoo! urllist.txt.
Here it goes.
Cache :
A complete cache system configurable from ACP.
All the maps (sitemaps, rss and urllist.txt) are entirely saved in a folder.
When a cached file is up to date and available, the module will just send it as is to the browser without further processing, making the output very, very fast, comparable to a direct physical file access.
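To make the idea concrete, here is a minimal sketch of a "serve from cache if fresh, otherwise rebuild" step. It is written in Python purely for illustration (the module itself is PHP), and the function name, the `build` callback and the age check based on file modification time are my assumptions, not the module's actual code.

```python
import os
import tempfile
import time


def get_cached_or_build(cache_path, max_age_seconds, build):
    """Return (content, from_cache).

    `build` is a hypothetical callable that generates the full map as bytes;
    it is only called when no fresh cache file exists.
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age <= max_age_seconds:
            # Fresh cache: return the file as is, no further processing.
            with open(cache_path, "rb") as f:
                return f.read(), True
    except OSError:
        pass  # no cache file yet (or unreadable): fall through and rebuild

    content = build()
    # Write to a temporary file first, then rename, so that a concurrent
    # request never reads a half-written cache file.
    directory = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    os.replace(tmp_path, cache_path)
    return content, False
```

The first call pays for the build, every later call within `max_age_seconds` is just a file read.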
Let's talk about time a bit
- First page load : Cache is being built.
<!-- URL list generated in 5.41892 s - 25 sql - 11834 URLs listed -->
<!-- Output started from cache after 5.42756 s - sql -->
<!-- Output from cache ended up after 6.93087 s - sql -->
This means that the module builds a list of 11834 URLs in 5.41892 s, and that the cache file (2 119 631 bytes) is saved in 0.00864 s, the file being sent to the browser right after saving.
The output ended 6.93087s after it was requested.
- Second load, and the following ones until the cache expires (configurable in ACP)
And it's more interesting :
<!-- URL list generated in 5.41892 s - 25 sql - 11834 URLs listed -->
<!-- Output started from cache after 0.00256 s - sql -->
<!-- Output from cache ended up after 1.57475 s - sql -->
The first line is itself cached, as a reminder of how hard it was to build such a long list before sending it. Compare that with the 0.00256 s needed here to start the output.
The file transfer is then relatively long, but the file is 2 MB, and if we take into account the large number of URLs and all the work we're asking of the browser (because we are, and I'll talk about this right after), it's pretty fast.
Because of this, gzip compression is very powerful. The module is able to both save and output gzipped data. Our 2 MB becomes 48 KB here.
As the cached file is again sent as is to the browser, outputting a Google sitemap listing 11834 URLs becomes really fast: more or less as fast as sending a 48 KB GIF file.
Unfortunately, there are no stats available for this output; the function used to read and send a gzipped file seems to make that impossible. But it is certainly a lot faster, even if the browser is asked to work even more, since it is the one that uncompresses the file, which could make the page show up a bit later after it has been fully sent.
Each type of page output (sitemaps, RSS 2.0 feeds and urllist.txt) has its own cache time limit, configurable in ACP.
URL rewriting :
You can switch the mod rewrite type in ACP. So far you can choose between the three phpBB SEO mod rewrites, but the code is set up to allow many more URL standards. It will auto-detect which phpBB SEO mod rewrite is in use once they get updated.
Note that title injection does add some weight to the sitemaps. If we continue with our previous example and inject topic titles, we get :
First request : building cache.
<!-- URL list generated in 7.27516 s - 25 sql - 11834 URLs listed -->
<!-- Output started from cache after 7.28377 s - sql -->
<!-- Output from cache ended up after 8.92257 s - sql -->
Generation time is longer: injecting and censoring 11834 topic titles in the same number of URLs costs 1.85 s.
This can look long, but the list is huge and gets cached. Scaled back to the size of a forum page, that is 0.00784 s for 50 injections, which is after all a very good result.
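For anyone who wants to check the arithmetic, here is the calculation spelled out, using only the timings quoted above (Python is used just as a calculator here, the variable names are mine):

```python
# Timings taken from the two "URL list generated" lines above.
build_without_titles = 5.41892  # seconds, 11834 URLs, no title injection
build_with_titles = 7.27516     # seconds, same URLs, titles injected
urls = 11834

extra = build_with_titles - build_without_titles  # cost of title injection
per_title = extra / urls                          # cost of one injection
per_50 = per_title * 50                           # cost of 50 injections

print(f"{extra:.5f} s extra, {per_50:.5f} s per 50 titles")
```

That is where the 1.85 s and the 0.00784 s come from.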
The second request follows the main idea : be fast
<!-- URL list generated in 7.52122 s - 25 sql - 11834 URLs listed -->
<!-- Output started from cache after 0.00248 s - sql -->
<!-- Output from cache ended up after 1.52267 s - sql -->
Very fast for a 2.4 MB file (the weight of this many titles). This goes down to 285 KB if gzipped.
By the way, we can see here that gzip compression has a harder time with topic titles. That's because every one of them is unique; there is no "viewtopic.php?" or "topic" repeated 11834 times.
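The effect is easy to reproduce with made-up data. The sketch below compresses two hypothetical URL lists with Python's standard `gzip` module: one with the same "viewtopic.php?t=" pattern on every line, one with a random "title" injected in each URL. The URLs, the title generator and the `-vt` suffix are invented for the demonstration; only the general behaviour (unique text compresses worse than repeated text) is the point.

```python
import gzip
import random
import string


def gzip_ratio(lines):
    """Compressed size divided by raw size (lower means better compression)."""
    raw = "\n".join(lines).encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


def fake_title(rng):
    """Hypothetical stand-in for a topic title: four random lowercase words."""
    words = []
    for _ in range(4):
        length = rng.randint(3, 8)
        words.append("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return "-".join(words)


rng = random.Random(0)  # fixed seed so the demo is repeatable

# Same pattern on every line, only the topic id changes.
plain_urls = [f"http://www.example.com/viewtopic.php?t={i}" for i in range(1, 2001)]
# A unique "title" in every URL.
titled_urls = [f"http://www.example.com/{fake_title(rng)}-vt{i}.html" for i in range(1, 2001)]

plain_ratio = gzip_ratio(plain_urls)
titled_ratio = gzip_ratio(titled_urls)
print(f"plain: {plain_ratio:.1%} of original size, with titles: {titled_ratio:.1%}")
```

Random letters are a worst case, so real topic titles should compress somewhat better than this demo suggests.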
RSS 2.0 feeds
This is what made development take a little longer than first expected: a lot of feeds, with a lot of options.
Let's proceed with examples.
This is our occasion to talk about what we ask of the browser. All the RSS feeds and the Google sitemaps have their own XSL transformation. This allows compatible browsers to build an HTML page out of the XML sent. The server just sends an additional stylesheet, and it's the browser that does all the presentation.
And it goes like this: http://www.phpbb-seo.com/sitemaps.xml for the Google SitemapIndex.
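For those who have never seen how an XSL stylesheet gets attached to an XML file: it is a single `xml-stylesheet` processing instruction at the top of the document. Here is a small Python sketch that builds a sitemap this way. The function name and the `/sitemap.xsl` path are hypothetical, and I use the sitemaps.org 0.9 namespace here as an assumption, which may differ from what the module actually outputs.

```python
from xml.sax.saxutils import escape, quoteattr

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # assumed namespace


def build_sitemap(urls, xsl_href):
    """Return a sitemap as a string, with an XSL stylesheet attached.

    The browser fetches `xsl_href` and does the HTML rendering itself;
    the server only sends this XML.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f"<?xml-stylesheet type=\"text/xsl\" href={quoteattr(xsl_href)}?>",
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for url in urls:
        # escape() turns & into &amp; so the URL stays valid XML.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


print(build_sitemap(["http://www.example.com/viewtopic.php?t=1&start=15"], "/sitemap.xsl"))
```

Bots simply ignore the stylesheet line and read the URLs.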
For the RSS feeds: first, I still need to mod my module (lol) to properly handle both forums here, so for now you'll see French posts from the French-speaking forums, but you'll get the idea.
Then, as I implemented quite some, I also implemented a special channel, listing all available channels on the same page : http://www.phpbb-seo.com/rss-channels.xml
From there you can explore all RSS feeds. More a cosmetic feature than really an SEO enhancement, but quite handy.
For the types of RSS feeds :
- http://www.phpbb-seo.com/rss.xml, which lists the last messages from all available sources (so far it's only the forums, but as for the previous one I'll add KB support and more);
- http://www.phpbb-seo.com/rss-boards.xml, which does the same thing as the previous one but from the forums only (the same type of feed will be available for KB, etc.);
- http://www.phpbb-seo.com/rss-board.xml, which outputs a nice list of all forums;
- One RSS feed per forum, with forum title injection in the URL in the mixed and advanced mods. Example : http://www.phpbb-seo.com/le-forum-phpbb-rf28.xml
For all feeds, it's also possible to add parameters, so far (I'll change these before release) : -l, -s and -m.
The first two are used to ask for a longer or shorter list. The last one asks for feeds with the message content (which can be summarized, configurable in ACP).
You can add one of the first two to any RSS feed URL, and/or the -m one, and play with combinations. It's also possible to output only the last post of each topic and/or to keep the first one.
and it goes like this :
http://www.phpbb-seo.com/rss-m.xml
http://www.phpbb-seo.com/rss-l-m.xml
Etc., with the -l (long) or -s (short) placed before the -m.
Here we are talking about the rewritten parameters; without URL rewriting these would look like &l, &s and &m.
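To illustrate how such rewritten feed names can be decoded, here is a small Python sketch. It is only my guess at the logic, not the module's code: the function name, the returned tuple and the regular expression are all assumptions, based on the examples above (an optional -l or -s, then an optional -m, right before .xml).

```python
import re

# base name, then optional -l / -s, then optional -m, then .xml
FEED_NAME = re.compile(r"(?P<base>.+?)(?:-(?P<size>[ls]))?(?P<msg>-m)?\.xml")


def parse_feed_name(filename):
    """Return (base, length, with_messages), or None if it is not a feed name.

    length is "long", "short" or "default".
    """
    match = FEED_NAME.fullmatch(filename)
    if match is None:
        return None
    length = {"l": "long", "s": "short", None: "default"}[match.group("size")]
    return match.group("base"), length, match.group("msg") is not None


print(parse_feed_name("rss-l-m.xml"))
print(parse_feed_name("le-forum-phpbb-rf28.xml"))
```

Note that a sketch like this cannot tell a feed option from a forum title that happens to end in "-s" or "-m", so a real implementation needs a stricter naming rule.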
The message content output does not break BBCode and parses smilies.
Since XML does not allow raw HTML special characters, like the < and > needed to activate links in posts, there is a little bit of JavaScript to make it work with Firefox. For once, IE is the easier one to work with.
Yahoo! urllist.txt :
Before I forget : http://www.phpbb-seo.com/urllist.txt
This was not really required as Yahoo! deals very well with RSS 2.0 (here you start to understand the interest of long RSS lists with links only) but it's still an extra option, and cached.
For now the list grabs the last x posts from each of the public forums, configurable in ACP.
Small limitation :
If gzip is activated in phpBB, it will be in the module as well; it is however possible to use gzip compression in the module even if it's turned off in phpBB.
With gzip, the rewritten URLs take an extra .gz extension. The module is able to check whether gzip is supported by the browser or bot; if it isn't, it will uncompress the file or re-cache an uncompressed copy (configurable in ACP) and send it to the browser after an HTTP "307 Temporary Redirect". I think it's a good way to tell that the real one is the other one, but it's very easy to switch to a 301 if required in such cases; we'll see.
This is to tell you that you'll be redirected if you follow these links.
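Here is a rough Python sketch of the decision described above, under my own assumptions: the client's support is read from the `Accept-Encoding` request header, and a client that asks for a .gz URL without supporting gzip is redirected with a 307 to the same URL without the extension. The function names and the returned (status, path) pair are hypothetical, and the header parsing is deliberately simplified.

```python
def accepts_gzip(accept_encoding):
    """True if the Accept-Encoding header value lists gzip with a non-zero q."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() not in ("gzip", "x-gzip"):
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0  # "gzip;q=0" means "do not send gzip"
            except ValueError:
                return False
        return True
    return False


def plan_response(path, accept_encoding):
    """Return (status, path): serve the file, or redirect to the uncompressed URL."""
    if path.endswith(".gz") and not accepts_gzip(accept_encoding):
        # Temporary redirect: the .gz URL stays the "real" one.
        return 307, path[: -len(".gz")]
    return 200, path


print(plan_response("/sitemaps.xml.gz", "gzip, deflate"))
print(plan_response("/sitemaps.xml.gz", ""))
```

Switching to a permanent redirect would just mean returning 301 instead of 307 in `plan_response`.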
For now, duplicates are only removed when building the cache; I'll work on a full solution.
Anyway, the sitemap index was submitted and has been used by Google with great satisfaction since yesterday.
So here is, after all, a very nice solution to build Google sitemaps with 10 000 URLs in each.
It should be possible to go beyond this, but I was too lazy to post more topics to test further.
And this explains why it took some time to develop. It's simple: it's all rewritten from scratch, all OO, and as I benchmarked it with 10 000 URLs, I was able to do some optimization in the script ... where small changes make a great difference.
As for going further, I was thinking about the new Yahoo! API tools, like the update notification; it could be nice to notify Yahoo! upon every RSS feed cache update.
++



