Biggest sitemap challenge: Beat the Record! [35,001 URLs]

The GYM Sitemaps & RSS module for phpBB. Sitemaps and RSS feeds for Google, Yahoo! and MSN Live: support, add-ons, etc.



Postby dcz » Fri Jul 18, 2008 3:34 pm

Hello,

I have always wondered how many URLs GYM Sitemaps & RSS could handle in a single sitemap.

I did some local testing with up to 2,396 topic URLs in a single forum (gunzip off):

Cache generation :
Code:
<!-- URL list generated in  0.26343 s  - 13 sql - 2396 URLs listed -->
<!--  Output started from cache after 0.26768 s - 14 sql -->
<!--  Output from cache ended up after 0.48564 s - 14 sql -->


Output from cache :
Code:
<!-- URL list generated in  0.26343 s  - 13 sql - 2396 URLs listed -->
<!--  Output started from cache after 0.00310 s - 2 sql -->
<!--  Output from cache ended up after 0.14326 s - 2 sql -->


0.268 s is very good for generating and caching a 515 KB file, but 0.003 s to start output from the cache is very, very fast.

With the phpBB2 version, I tested over 16,000 URLs in a decent time (on a different computer, so the comparison does not mean much, but the sitemap was generated and cached in less than 10 s).

So, since I obviously don't have any online forum containing 50,000 topics (the sitemap standard limit), and since I'm too lazy to set up a test forum with 50,000 topics in it, I thought it would be nice to start this "Biggest sitemap challenge".
So who will be the first to reach 50,000, the maximum number of URLs the sitemap standard allows?
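A quick way to check how close a sitemap gets to that cap is to count its `<url>` entries. The following is a minimal Python sketch, assuming the file follows the sitemaps.org 0.9 schema (namespace `http://www.sitemaps.org/schemas/sitemap/0.9`); `count_sitemap_urls`, `within_limit` and the example.com URLs are illustrative names, not part of GYM.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Maximum number of <url> entries allowed in a single sitemap file.
MAX_URLS = 50_000

# Tiny illustrative sitemap; the example.com URLs are placeholders.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/topic-t1.html</loc></url>
  <url><loc>http://www.example.com/topic-t2.html</loc></url>
</urlset>
<!-- URL list generated in  0.01 s  - 2 sql - 2 URLs listed -->
"""


def count_sitemap_urls(xml_text: str) -> int:
    """Count the <url> children of the sitemap's root <urlset> element."""
    # Encode first so the parser reads the declared UTF-8 bytes.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return len(root.findall(f"{SITEMAP_NS}url"))


def within_limit(xml_text: str) -> bool:
    """True if the sitemap stays within the 50,000-URL cap."""
    return count_sitemap_urls(xml_text) <= MAX_URLS


if __name__ == "__main__":
    n = count_sitemap_urls(SAMPLE_SITEMAP)
    print(f"{n} URLs, {MAX_URLS - n} left before the limit")
```

The trailing stats comment is ignored by the parser, since XML allows comments after the root element.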


To participate, just use the simple template below to post the URL of your sitemap. The biggest sitemap will be displayed in the second post of this thread, just below this one:

Code:
[quote][b]Site name :[/b]
[b]URL :[/b]
[b]Date :[/b] mm/dd/yy
[b]Number of urls total :[/b]
[b]Number of topics in the biggest forum :[/b]
[b]Biggest forum URL :[/b]
[b]Biggest sitemap URL :[/b]
[b]Generation time :[/b]
[code] [/code]
[b]Generation time once cached :[/b]
[code] [/code]
[b]Gunzip :[/b][/quote]


The generation times can be found in the XML source code of the generated sitemap, at the very bottom.

For the phpBB SEO forum, this would be:

Site name : phpBB SEO Forum
URL : http://www.phpbb-seo.com/boards/
Date : 07/18/08
Number of urls total : 296
Number of topics in the biggest forum : 273
Biggest forum URL : http://www.phpbb-seo.com/boards/advanced-seo-url-vf54/
Biggest sitemap URL : http://www.phpbb-seo.com/boards/advance ... l-gf54.xml
Generation time :
Code:
<!-- URL list generated in  0.03281 s  - 8 sql - 296 URLs listed -->
<!--  Output started from cache after 0.03339 s -  sql -->
<!--  Output from cache ended up after 0.03378 s -  sql -->

Generation time once cached :
Code:
<!-- URL list generated in  0.03281 s  - 8 sql - 296 URLs listed -->
<!--  Output started from cache after 0.00216 s -  sql -->
<!--  Output from cache ended up after 0.00261 s -  sql -->

Gunzip : off
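Reading these figures by hand works, but they can also be pulled out of the sitemap source with a few regular expressions. This is a minimal Python sketch, assuming the comment wording stays exactly as in the examples posted in this thread (other GYM versions or settings may word them differently); `parse_gym_stats` and `GymStats` are hypothetical helpers, not part of the module. The sample is the cached phpBB SEO output shown just above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Whitespace is matched loosely because the comments are padded with extra spaces.
GEN_RE = re.compile(
    r"URL list generated in\s+([\d.]+)\s*s.*?-\s*(\d+)\s*sql\s*-\s*(\d+)\s*URLs listed"
)
START_RE = re.compile(r"Output started from cache after\s+([\d.]+)\s*s")
END_RE = re.compile(r"Output from cache ended up after\s+([\d.]+)\s*s")

SAMPLE_STATS = """<!-- URL list generated in  0.03281 s  - 8 sql - 296 URLs listed -->
<!--  Output started from cache after 0.00216 s -  sql -->
<!--  Output from cache ended up after 0.00261 s -  sql -->"""


@dataclass
class GymStats:
    generation_s: float  # time spent building the URL list
    sql_queries: int     # SQL queries used to build it
    urls: int            # number of URLs listed
    start_s: float       # elapsed time when output started
    end_s: float         # elapsed time when output finished

    @property
    def send_s(self) -> float:
        """Time spent sending the file once output had started."""
        return self.end_s - self.start_s


def parse_gym_stats(xml_tail: str) -> Optional[GymStats]:
    """Extract the debug stats, or return None if any comment is missing."""
    gen = GEN_RE.search(xml_tail)
    start = START_RE.search(xml_tail)
    end = END_RE.search(xml_tail)
    if not (gen and start and end):
        return None  # stats disabled, or a different comment format
    return GymStats(
        generation_s=float(gen.group(1)),
        sql_queries=int(gen.group(2)),
        urls=int(gen.group(3)),
        start_s=float(start.group(1)),
        end_s=float(end.group(1)),
    )


if __name__ == "__main__":
    s = parse_gym_stats(SAMPLE_STATS)
    print(f"{s.urls} URLs, {s.sql_queries} queries, sent in {s.send_s:.5f} s")
```

The lazy `.*?` in the first pattern also lets it skip the optional "( Mem Usage : ... )" part that some versions add to that line.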


As you can see, everybody will be able to check the results.

So let's play, and get the most out of GYM 2.0!

The challenge is also on in the French forum: The French-speaking thread.

++
Last edited by dcz on Sun Oct 26, 2008 9:40 am, edited 5 times in total.
Useful links :
SEO Forum || SEO Directory || SEO phpBB || Search
dcz
Admin
Posts: 21391
Joined: Fri Apr 28, 2006 9:03 pm


Postby dcz » Tue Jul 22, 2008 3:04 pm

New record: 35,001 URLs :shock: !!

Site Name: Dogs4Sale.net
URL : http://www.Dogs4Sale.net
Date : 10/26/2008
Number of urls total : 569718
Number of topics in the biggest forum : 38455
Biggest forum URL : http://www.dogs4sale.net/dallas-fort-worth-f29.html
Biggest sitemap URL : http://www.Dogs4Sale.net/dallas-fort-worth-gf29.xml
Generation time :
We're missing the uncached case, but the cached stats tell us it should take roughly 18 s longer in total (generation + file sending).
Generation time once cached :
Code:
<!-- URL list generated in  18.20959 s ( Mem Usage : 5.98 MB ) - 42 sql - 35001 URLs listed -->
<!--  Output started from cache after 0.00177 s -  sql -->
<!--  Output from cache ended up after 9.07569 s -  sql -->

Gunzip : off
Last edited by dcz on Sun Oct 26, 2008 9:39 am, edited 1 time in total.

Postby Professional » Tue Jul 22, 2008 3:59 pm

I think I can beat the record if my sitemap works :lol:
Specialized forum for the new generation of mobile phones - Apple Portal
My Handwritings: Professional Dreams
Every Thing That U Feel,Is Every Thing That I Feel.
Professional
PR5
Posts: 550
Joined: Mon Apr 07, 2008 5:41 am
Location: 1/2 of the World

Postby dcz » Tue Jul 22, 2008 4:23 pm

Well, it's working at least well enough to list over 12,000 URLs ;)

Postby trefle » Wed Jul 23, 2008 10:43 am

Hi,

Which forum on your board is supposed to have more than 12,000 URLs?

Trefle.
trefle
PR6
Posts: 676
Joined: Tue Jun 03, 2008 5:46 pm

Postby frold » Tue Jul 29, 2008 4:56 pm

How do I get these details for my forum?
frold
PR0
Posts: 95
Joined: Thu Apr 17, 2008 7:26 pm

Postby dannygsam » Tue Jul 29, 2008 7:34 pm

I really wish I could take part in the GYM challenge, but my site doesn't have that many pages to index. :P One day my site will be well ahead of all of yours :twisted:
dannygsam
Posts: 11
Joined: Mon Jul 21, 2008 9:40 am

Postby dcz » Wed Jul 30, 2008 8:10 am

frold wrote: how do I get these datails for my forum?


On the board index, you have the number of topics and posts for each forum ;)

Re: Biggest sitemap challenge: Beat the Record!

Postby Orbits » Thu Oct 16, 2008 11:29 pm

dcz wrote: Hello,

The generation time are to be found in the xml source code of the generated sitemap, at the very bottom.
++


Hmm....

I have over 500k posts on my site:

49 Forums:

http://www.dogs4sale.net/sitemaps.xml

Each with 5k Posts
http://www.dogs4sale.net/los-angeles-gf18.xml

However, at the bottom of my sitemaps, I'm not getting the generation stats....

Looks like I'm running your latest code. Thoughts?

UPDATE

Never mind... I'm running phpBB2 (finishing the phpBB3 upgrade now).
Orbits
Posts: 45
Joined: Sat Jun 16, 2007 3:57 am

Postby dcz » Sat Oct 18, 2008 9:32 am

You can still get the generation stats on phpBB2 if you activate them in the ACP.

For the record, you can actually beat it, since you have forums with more than 38,455 topics (http://www.dogs4sale.net/dallas-fort-worth-f29.html), but you'll need a bigger URL limit. You seem to be using the default 5,000 URL limit for now, so we cannot see a sitemap with more links than that (below the record, then ;)).

++

Largest phpbb-seo Competition - Dogs4Sale.net

Postby Orbits » Sat Oct 18, 2008 10:30 pm

Here are the figures for Dogs4Sale.net, hot off the press. I had to limit the sitemap to 35,000 posts because with anything larger, my hosting provider's MySQL server was killing the script (script execution limits).

I've since moved the sitemap size back down to 20,000 posts, where I'll probably leave it.

Enjoy!!


Site Name: Dogs4Sale.net
URL : http://www.Dogs4Sale.net
Date : 10/18/2008
Number of urls total : 569718
Number of topics in the biggest forum : 38455
Biggest forum URL : http://www.dogs4sale.net/dallas-fort-worth-f29.html
Biggest sitemap URL : http://www.Dogs4Sale.net/dallas-fort-worth-gf29.xml
Generation time :
Code:
<!-- URL list generated in  82.36701 s ( Mem Usage : 5.98 MB ) - 182 sql - 35001 URLs listed -->
<!--  Output started from cache after 82.39642 s -  sql -->
<!--  Output from cache ended up after 97.40002 s -  sql -->

Generation time once cached :
Code:
<!-- URL list generated in  82.26161 s ( Mem Usage : 5.98 MB ) - 186 sql - 35001 URLs listed -->
<!-- Output started from cache after 82.29127 s -  sql -->
<!-- Output from cache ended up after 90.69520 s -  sql  -->

Gunzip : off


So!! What do I win!!!! (besides bragging rights) :)

Postby dcz » Sat Oct 25, 2008 7:54 am

hehe nice indeed :D

Even though 80 s is way too long for sitemap generation, you could still deactivate automatic cache regeneration and update the cache manually, say once a day, so that no one besides you ever has to wait that long.
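With automatic regeneration off, something still has to rebuild the cache on schedule. The sketch below shows the idea in Python under stated assumptions: the cache is a single file at a path you choose, and `build` is a hypothetical callable standing in for whatever actually generates the sitemap. GYM's own cache layout and ACP options are not modelled here. Run once a day (by hand or from a scheduled job), it rebuilds only when the cached file is more than a day old.

```python
import os
import tempfile
import time
from typing import Callable, Optional

ONE_DAY_S = 24 * 60 * 60  # maximum cache age before a rebuild, in seconds


def cache_is_stale(path: str, max_age_s: float = ONE_DAY_S,
                   now: Optional[float] = None) -> bool:
    """A missing cache file, or one older than max_age_s, counts as stale."""
    if not os.path.exists(path):
        return True
    current = time.time() if now is None else now
    return current - os.path.getmtime(path) > max_age_s


def refresh_if_stale(path: str, build: Callable[[], bytes],
                     max_age_s: float = ONE_DAY_S,
                     now: Optional[float] = None) -> bool:
    """Rebuild the cache file only when it is stale; return True if rebuilt."""
    if not cache_is_stale(path, max_age_s, now):
        return False
    data = build()  # hypothetical stand-in for the real sitemap generation
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        fh.write(data)
    # Swap the new file in atomically so readers never see a partial file.
    os.replace(tmp_path, path)
    return True


def demo() -> tuple:
    """Show the three cases: no cache, fresh cache, cache two days old."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sitemap-cache.xml")

        def build() -> bytes:
            return b"<urlset/>"

        first = refresh_if_stale(path, build)
        second = refresh_if_stale(path, build)
        third = refresh_if_stale(path, build, now=time.time() + 2 * ONE_DAY_S)
        return first, second, third


if __name__ == "__main__":
    print(demo())
```

Writing to a temporary file and then calling `os.replace` means a visitor (or Google) requesting the sitemap mid-rebuild still gets the previous complete file.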

I'm wondering whether your stats for the cached case are the right ones, because the time before output starts should drop sharply once the cache exists, even if building it takes a long time.

Did you refresh the page between the two loads you took the stats from?

Also, I cannot check your example (even though I believe it), since right now the limit is 10,000 URLs and gunzip seems to be activated. I get:
Code:
<!-- URL list generated in  16.60613 s ( Mem Usage : 1.72 MB ) - 57 sql - 10001 URLs listed -->

That is a better cache generation time, but we have no data for when the file is already cached.

Anyway, I'm pretty sure we can tune the mod better in your case: you should try increasing the SQL limit a bit.
So far you seem to be querying about 175 items at a time (10,000 URLs / 57 queries ~= 35,000 URLs / 182 queries); you should try going to at least 500, or even 1,000, at a time.
I'm pretty sure it would be faster for this many URLs.
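The estimate above is simply URLs listed divided by SQL queries, using the figures from the stats comments in this thread. A quick Python check follows; it ignores the few queries that do not fetch URLs, so the result is only approximate, and `urls_per_query` is just an illustrative helper.

```python
def urls_per_query(urls_listed: int, sql_queries: int) -> float:
    """Average URLs fetched per SQL query (fixed overhead queries not subtracted)."""
    return urls_listed / sql_queries


# (URLs listed, SQL queries) taken from the stats comments quoted in this thread.
OBSERVED = {
    "10,000 URL limit": (10_001, 57),
    "35,000 URL limit": (35_001, 182),
}

if __name__ == "__main__":
    for label, (urls, queries) in OBSERVED.items():
        print(f"{label}: ~{urls_per_query(urls, queries):.0f} URLs per query")
```

Both runs come out a little under 200 URLs per query (about 175 and 192), which is why the two ratios look roughly equal.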

As for what you win, this advice on optimizing the mod is already something ;), but as soon as I actually see the sitemap in action beating the record (yes, this means you must keep it online at least until I view it), your site will be the one presented in the second post of this thread.

Not much more than fame, but still, a good backlink ;)

So let's see up to where your server can go, and we'll update the record :D

++

Postby Orbits » Sat Oct 25, 2008 6:05 pm

ok here ya go:

Code:
<!-- URL list generated in  18.20959 s ( Mem Usage : 5.98 MB ) - 42 sql - 35001 URLs listed -->
<!--  Output started from cache after 0.00177 s -  sql -->
<!--  Output from cache ended up after 9.07569 s -  sql -->


I increased the SQL query max from 200 to 1,000, and performance improved quite a bit.

I've left the settings in place so you can take a look:

http://www.dogs4sale.net/dallas-fort-worth-gf29.xml

Now that we have the generation times way down, I think I'll leave it at 35,000 max.

I just hope Google doesn't try to download it and time out for some reason. In Google Webmaster Tools, although it took a few days, they now show that my sitemaps have over 300k posts, which Google is working through. It will be funny if this latest change actually gets all my URLs indexed hehehe...
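Raising the per-query limit from 200 to 1,000 cuts the number of round trips roughly fivefold. The sketch below is illustrative only: it uses an in-memory SQLite table with a hypothetical `topics(topic_id)` schema and keyset pagination (`WHERE topic_id > ? ORDER BY topic_id LIMIT ?`), which may not be how GYM actually builds its queries, and it counts how many queries are needed to list 35,001 rows at each batch size.

```python
import sqlite3
from typing import Tuple


def make_demo_db(n_topics: int) -> sqlite3.Connection:
    """In-memory stand-in for a forum's topic table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topics (topic_id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO topics (topic_id) VALUES (?)",
        ((i,) for i in range(1, n_topics + 1)),
    )
    return conn


def list_in_batches(conn: sqlite3.Connection, batch_size: int) -> Tuple[int, int]:
    """Walk the table with keyset pagination; return (rows listed, queries run)."""
    last_id, listed, queries = 0, 0, 0
    while True:
        rows = conn.execute(
            "SELECT topic_id FROM topics WHERE topic_id > ? "
            "ORDER BY topic_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        queries += 1
        listed += len(rows)
        # A short (or empty) batch means the end of the table was reached.
        if len(rows) < batch_size:
            return listed, queries
        last_id = rows[-1][0]


if __name__ == "__main__":
    conn = make_demo_db(35_001)
    for size in (200, 500, 1000):
        listed, queries = list_in_batches(conn, size)
        print(f"batch of {size}: {listed} rows in {queries} queries")
```

In this model, 200 per query takes 176 list queries and 1,000 per query takes 36. The stats posted in this thread show 182 and 42 queries, both exactly 6 more, and the 10,001-URL run (51 batches of 200) shows 57. That would fit a fixed handful of other queries per request, but it is an inference from the numbers, not something documented for the module.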

Postby dcz » Sun Oct 26, 2008 9:33 am

Congrats :D
I just got:
Code:
<!-- URL list generated in  18.86002 s ( Mem Usage : 5.98 MB ) - 42 sql - 35001 URLs listed -->
<!--  Output started from cache after 0.00165 s -  sql -->

<!--  Output from cache ended up after 36.89638 s -  sql -->


For a file of a little over 6 MB :lol:
And note the efficiency of the cache: 0.00165 s is all it takes to start outputting the file, which is about what you would get if your sitemaps were not dynamically generated at all. All the rest of the time in the cached case is spent sending the file.
The generation time is a bit long, but still under 20 s, so here is what I suggest in your case: turn gunzip on, and you'll most likely have the sitemap generated and sent in less than 30 s, which makes it really usable by Google.
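To get a feel for how much turning gunzip on saves in transfer size, here is a small Python sketch using the standard `gzip` module on a synthetic sitemap of 35,001 URLs. The example.com URLs and the `build_sitemap` and `gzip_sitemap` helpers are made up for illustration; this is not GYM's own gunzip code, and real topic URLs will compress somewhat differently.

```python
import gzip


def build_sitemap(n_urls: int, base: str = "http://www.example.com/topic") -> bytes:
    """Synthetic sitemap with n_urls placeholder topic URLs."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>\n',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n',
    ]
    parts += [f"  <url><loc>{base}-t{i}.html</loc></url>\n" for i in range(1, n_urls + 1)]
    parts.append("</urlset>\n")
    return "".join(parts).encode("utf-8")


def gzip_sitemap(xml_bytes: bytes, level: int = 6) -> bytes:
    """Compress the sitemap body with gzip (level 1 = fastest, 9 = smallest)."""
    return gzip.compress(xml_bytes, compresslevel=level)


if __name__ == "__main__":
    raw = build_sitemap(35_001)
    packed = gzip_sitemap(raw)
    print(f"plain: {len(raw):,} bytes, gzip: {len(packed):,} bytes "
          f"({len(packed) / len(raw):.1%} of the original)")
```

Sitemap XML is highly repetitive, so it compresses very well, and the sitemap protocol explicitly allows gzip-compressed sitemap files.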

Anyway, this will be our new record; so far, GYM 1.x is beating GYM 2.x :D

With a slightly more powerful server, I'm sure we'll reach 50,000 URLs ;)

Postby Orbits » Thu Oct 30, 2008 5:36 pm

dcz wrote: Congrats :D
I just got :
Code:
<!-- URL list generated in  18.86002 s ( Mem Usage : 5.98 MB ) - 42 sql - 35001 URLs listed -->
<!--  Output started from cache after 0.00165 s -  sql -->

<!--  Output from cache ended up after 36.89638 s -  sql -->



Yahoo ! :D
