Gallery2 and SEO - Tips and suggestions

netscan

Joined: 2005-07-16
Posts: 39
Posted: Tue, 2005-09-20 00:08

Updated Feb-05-2007

I don't know if all of this has been covered (can't get the new search to work for me), but I have some general suggestions for the SEO (search engine optimizing) of Gallery2.

In no particular order:

1) Title bar

The title of each album should be in the title bar along with the photo title (when viewing photos). This can be accomplished by modifying the <title> tag in the appropriate tpl page.
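For example, in a Matrix-style theme it might look like this - a sketch only, since the exact template file and variable names depend on your theme ($theme.item and $theme.parents are the same objects the stock templates use):

	{* Build an "Album - Photo" title; assumes $theme.parents lists the
	   ancestor albums root-first -- verify against your own theme *}
	<title>{foreach from=$theme.parents item=parent}{$parent.title|markup} - {/foreach}{$theme.item.title|markup}</title>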

2) Mod_Rewrite

Use dashes (-) instead of underscores (_); this way older search engines (and some new ones) will treat the url as separate words.

3) index.php

Apply the mod to use index.php instead of main.php. This is very important! Redirects kill rankings (especially 302's). Also, without this your site will never get assigned the appropriate PageRank from Google. Update: This mod doesn't work anymore, so if you are able to (which you most likely are), modify httpd.conf to add main.php before index.php and all should be well with the world once again.
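If you have access to httpd.conf, the change is a single directive - list main.php first and Apache serves it directly, with no redirect (the file list here is just an example ordering):

	DirectoryIndex main.php index.php index.html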

4) Correct error pages

The default error page generates an HTTP 200 response code (OK); the correct codes should be used or the spiders will just keep indexing the error page. Update: This was fixed in the Gallery code, I believe; correct me if I'm wrong.
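If your install predates the fix, the essence of the change is to send a real status code before the error page renders; a minimal PHP sketch:

	<?php
	// Send a genuine 404 so spiders drop the URL,
	// instead of the default "200 OK".
	header('HTTP/1.0 404 Not Found');
	?>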

5) Duplicate content

Having multiple sizes of an image is great, but to a spider it looks like copies of the same page. It's best to modify the tpl's to load the full size images in a new window, without the template includes. Duplicate content penalties are common and VERY hard to fix.

6) Album title when viewing photos

Having the album title on the page with the photo title will GREATLY increase keyword density and weighting.
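One rough way to do it in your theme's photo template - a sketch that assumes $theme.parents lists the ancestor albums root-first and that each entry carries a title, so verify against your theme:

	{* Grab the immediate parent album's title, then print it beside the photo title *}
	{foreach from=$theme.parents item=parent}
	  {assign var="albumTitle" value=$parent.title}
	{/foreach}
	<p>{$albumTitle|markup} - {$theme.item.title|markup}</p>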

7) <h1> and <h2> tags

Adding the <h1> tag and its formatting to the CSS, and using it for the album and picture titles when viewing albums and photos, will add to keyword weight. May help, may not, but it's worth a shot.
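Something along these lines - a sketch that reuses the giTitle class the Matrix templates already have:

	{* In the album/photo template: promote the title to a real heading *}
	<h1 class="giTitle">{$theme.item.title|markup}</h1>

and in the theme's CSS, size it back down so it doesn't look like a billboard:

	h1.giTitle { font-size: 1.2em; margin: 0; }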

8) Slideshows and other "features"

See duplicate content above.

9) File and album naming

Prepare a naming convention and stick to it throughout your gallery. Photo and album titles should be clear and concise. "Won't someone please think of the spiders?"
Example: Album name: Album-One; Album title: Album One; Photo name: Photo-001; etc.

10) Preparation is better than perspiration

If you're going to do any of this, do it BEFORE you make your site available to the outside world. Once the search engines latch on to something, it's difficult to make them let go, especially if it's wrong. This is important so that you don't end up with a bunch of non-existent pages in the search engines after changing the names 20 times.

There is probably more that I'm missing here, but that covers the most important that I can think of and have done to my site to aid the little spiders in moving around my gallery.

If you have any to add or disagree with any of the above, please post and explain your position.

Added:

I've tried to follow this thread for the last year or so, but for some reason it only emails me updates when it feels like it heh, so here goes some Q&A

Question: Index.php always showing instead of mydomain.com/
Answer: There was a change to the Gallery code that ruled out the hack mentioned in the first iteration of this post; now your best bet is to modify httpd.conf to make main.php appear before index.php. This will not affect the way any other pages load, just the directories that contain a main.php.

Question: Do you have a list of changes, or a patch of some sort to implement?
Answer: No, you have to do what is best for your site. Most of the changes are made to individual .tpl files and are quite trivial. But I will add some examples to the end of this post to help out. (And no, Coppermine is not better).

Question: Do bots pick up a session ID when browsing my site?
Answer: Yes and no. There is a built-in function to prevent bots from being issued session IDs, but it fails as soon as certain links are followed (register and login come to mind). Prevent bots from parsing these types of links. Browse around your site as if you were a guest, make note of any links that cause the address to get information appended to the end, and then track down what you can do to prevent it (disable registering, or at least hard link it with a nofollow).
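The hard link itself is just plain HTML with a nofollow; the view name below is what a stock install uses for the login page, but copy the real URL from your own site to be safe:

	<a href="main.php?g2_view=core.UserAdmin" rel="nofollow">Login</a>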

Question: Is mysite.com/mygallery/picture1.jpg.html?highlightId=3295 the same as mysite.com/mygallery/picture1.jpg.html?
Answer: No. Google will see them as two separate addresses pointing to the same content and label one as duplicate, perhaps both.

Question: If Google picked up some pages that I don't want there anymore (mysite.com/mygallery/picture1.jpg.html?highlightId=3295), can I use the removal tool to get Google to drop the listing?
Answer: Yes, but it may take the rest of the site with it. And the tool only works for 180 days (exactly); then the page will re-appear. Your best bet is to 404 that page until Google forgets about it.
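One way to serve that 404 (strictly a 410 Gone, which works just as well for getting a URL dropped) is a mod_rewrite rule keyed on the query string. The parameter is g2_highlightId on a stock install, so adjust to match your own URLs:

	RewriteEngine On
	# Any request carrying a highlightId query string gets "410 Gone"
	RewriteCond %{QUERY_STRING} (^|&)g2_highlightId= [NC]
	RewriteRule . - [G]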

Some quick examples:

And for crying out loud ---- MAKE A BACKUP!!----

==HighlightId==

Open modules/core/templates/blocks/BreadCrumb.tpl

Remove arg3="highlightId=`$theme.parents[parent.index_next].id`"

and

Remove "arg3="highlightId=`$theme.item.id`"

==Picture name in the resize link==

Open modules/core/templates/blocks/PhotoSizes.tpl

Find this block

      <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$theme.item.id`"
	arg3="imageViewsIndex=`$theme.sourceImageViewIndex`"}">
	 {$smarty.capture.fullSize}
       </a>

Add {$theme.item.title|markup} thusly:

      <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$theme.item.id`"
	arg3="imageViewsIndex=`$theme.sourceImageViewIndex`"}">{$theme.item.title|markup}
	 {$smarty.capture.fullSize}
       </a>

Add some text to make it even spiffier, and make it open in a new window without a template like this:

        <a href="{g->url arg1="view=core.DownloadItem" arg2="itemId=`$theme.item.id`"
        }" target="_blank">View {$theme.item.title|markup} full size
        {$smarty.capture.fullSize}
       </a>

==Make links out of thumbnail titles==

(This is for Matrix, poke around other themes to see how they handle titles)

Open album.tpl

Find


		{if !empty($child.title)}
		<p class="giTitle">
		  {if $child.canContainChildren}
		  {g->text text="Album: %s" arg1=$child.title|markup}
		  {else}
		  {$child.title|markup}
		  {/if}
		</p>
		{/if}

Change to:


		{if !empty($child.title)}
		<p class="giTitle">
		  {if $child.canContainChildren}
		  <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$child.id`"}">
		  {g->text text="Album: %s" arg1=$child.title|markup}</a>
		  {else}
		  <a href="{g->url arg1="view=core.ShowItem" arg2="itemId=`$child.id`"}">
		  {$child.title|markup}</a>
		  {/if}
		</p>
		{/if}

Just your standard <a href=""></a> and the ol' {g->url}; pretty easy to make it work with other themes.

---Hint: take out "Album: " and you'll feel better about yourself.

==Peer List==

One thing that annoys me is seeing Snowypicture#13.jpg reduced to sno...13.jpg, so change it!

Open modules/core/templates/blocks/PeerList.tpl

Find entitytruncate:14 (there are 2)

Change the 14 to however many characters you think your file names will grow to.

========

Again, if you see any mistakes, please point them out. I'll continue to keep an eye on the thread and update as necessary. Good luck!

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-09-20 00:27

Great, thanks for the contribution. Added it to the how to's page:
http://codex.gallery2.org/index.php/Gallery2:How_Tos

maybe we can get some of the optimizations into G2. but certainly not all, since there are conflicting goals: features vs. duplicate content.
@index.php:
now that we have released g2.0, it will be difficult to change from main.php to index.php without screwing up a lot of links (of course we'd leave main.php as a redirect to index.php), but still.
users can change their webserver configuration to use main.php as the default file, then there won't be an index.php -> main.php redirect. and it's hard to believe that there's a huge penalty for this single redirect.

 
netscan

Joined: 2005-07-16
Posts: 39
Posted: Tue, 2005-09-20 00:49

Redirects are death in Google. It sees the 302 as spam, even if it's just a naming thing. Their algo just doesn't know how to distinguish, yet anyway. It would be extremely beneficial to make index.php the default, especially for less experienced webmasters. PageRank is also url dependent, which means if all your inbound links point to http://yourdomain.com and it redirects to http://yourdomain.com/main.php, the PR won't fully transfer, if at all.

And thanks for the kudos, always happy to help, especially when you guys have given so much!

 
bmsstore

Joined: 2005-10-04
Posts: 3
Posted: Tue, 2005-10-18 03:55

you have the worst technical support ever! no instructions on how to do any of this - it's a joke! no wonder copper is better!

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-10-18 08:45

bmsstore
we love you too. i'm sure you'll be proud of your rants when you find them in a few years with google :)

@topic:
most of the above SEO recommendations could be hacked into G2. a diff / detailed instructions / a patch would speed up getting changes into G2. so if someone has implemented some of those recommendations, why not share?
else: feel free to file a feature request...

 
RwD
RwD's picture

Joined: 2005-01-09
Posts: 383
Posted: Tue, 2005-10-18 09:14

I'm not getting it. Where does it say coppermine is better? A lot of work went into coppermine, I am sure. And a lot of work went into gallery2, but I haven't seen a comparison myself yet.

no wonder copper sucks :P

On topic. I am trying to keep my website out of the hands of google's search. Using robots.txt I am doing fine, but some safeguards won't hurt. So leave the bad ranking features in ;)
______________________
I made a theme for G2, try it :)

 
stephen_3

Joined: 2005-10-18
Posts: 4
Posted: Tue, 2005-10-18 09:26

I really like G2 - it looks really good integrated into my wordpress install - but search engine friendly it's not. All of the points in the first post are valid and easily changed, but as long as the URL contains the g2_GALLERYSID part it's never going to be very search engine friendly. I've spent the best part of today trying to find a way to remove it, but my lack of php ability really let me down.

If anyone has a way to remove the g2_GALLERYSID from the URL - please post it, I don't mind an ugly hack or even a suggestion of where to start.

I know this post sounds a bit down, but really it's not. I think G2 is great - it's just that I'm facing having to move back to a sub-standard solution if I can't get this sorted.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-10-18 09:42

stephen_3
search engines don't get the g2_GALLERYSID part in the URLs; G2 detects google, yahoo etc.
besides: you get the g2_GALLERYSID in your URL only on your first page if you don't already have an open session in g2, or on all pages if you have disabled cookies. and again: yes, even if search engines don't support cookies, they don't get the g2_GALLERYSID part in the urls.

 
RwD
RwD's picture

Joined: 2005-01-09
Posts: 383
Posted: Tue, 2005-10-18 10:17

valiant,
Wasn't google one of those that gets kinda upset when you treat their bot differently from other users?

stephen_3,
Some people seem to have cookies disabled. The standard session id method in PHP is that the session id variable is empty when cookies are enabled. Getting them means you should allow cookies to get rid of them. (valiant's reply in different words)
______________________
I made a theme for G2, try it :)

 
stephen_3

Joined: 2005-10-18
Posts: 4
Posted: Tue, 2005-10-18 10:25

Valiant, thank you for your reply - although I'm not sure it's correct. Try the search allinurl:g2_GALLERYSID in google and it finds 729,000 pages that have been spidered with the session id in them - and not just front pages, but whole galleries that have been spidered with the session id.
Do you know whether these are from a previous release or a random bug?
I also checked this site and found that your pages have been spidered without the session ids, so I'll take your word for it. :)

I'm watching my logs; one of the major crawlers should hit the new galleries in the next few days and I'll post back on whether it's crawling with session ids or not.

Thanks for the help.

Stephen.

 
stephen_3

Joined: 2005-10-18
Posts: 4
Posted: Tue, 2005-10-18 10:32

Thanks RwD. I have cookies enabled so that I don't get the session id; the problem is that I don't believe spiders allow you to set cookies, and therefore they get the session id. I have no problem with my users or myself getting sessions, but I don't want google spidering the same pages multiple times with different session ids.

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Tue, 2005-10-18 10:53

stephen_3
the allinurl:g2_GALLERYSID search is interesting.
there are even 2 pages of my own website that match this criterion.
on the other hand, google finds a lot more gallery pages of my website if i remove the allinurl criterion.

i don't know yet what could have caused this. seeing your server access logs will be interesting.

 
robert070612

Joined: 2003-08-05
Posts: 565
Posted: Tue, 2005-10-18 19:09

Rwd---- apropos your comments about keeping robots out... use dynamic <robots.txt> files.

* The static form of the <robots.txt> file is often the wrong flavour; it works the wrong way around. It automatically and indiscriminately lets in every visiting spyder - without fail - every time - all the time. If you've done your homework you will have coded your <robots.txt> file to include all the places that are inappropriate for indexation. So not only does the visiting spyder know that you're letting absolutely everybody and everything into your site, but you've also automatically informed them as to the whereabouts of stuff you'd prefer not to be broadcast. It's all a bit dissatisfying. Wandering through your access logs, as you do, can often leave you with a bad taste in the mouth when you check up on the wherewithal of the latest visiting spyder;~/ This then usually means chasing through a myriad of sub-directories rewriting all the .htaccess files - as appropriate. There's NOTHING like KNOWING that you're destined for sessions of repetitive stable door closing to increase your self-esteem!

* After some overlong experiences of spyder hunting/bashing/blocking I turned things around. I coded my site to work the logic the other way. My sites automatically reject new spyders - without fail - every time - all the time. They do this without revealing all the sub-directories that should not be spydered even by white-listed spyders. White-listed spyders are provided with an appropriate list of disallowed sub-directories. Thus these robots.txt files can be considered as being dynamic.

* Briefly - use PHP to pretend to run a 'normal' robots.txt but whose content is dynamically rendered.

* In detail - amend your existing .htaccess and robots.txt files thus:

[.htaccess]
<Files robots.txt>
     ForceType application/x-httpd-php
</Files>
[robots.txt]
<?php
	// Serve robots.txt dynamically: whitelisted spyders get the real
	// disallow list, everything else is told to stay out entirely.
	header('Content-type: text/plain');

	$disallow = array("/noindex", "/html", "/assets", "/private", "/cgi-bin", "/pdf-noindex");
	$robots = array("Yahoo! Slurp", "ConveraCrawler", "Googlebot", "Googlebot-Image",
		"http://www.almaden.ibm.com/cs/crawler", "ia_archiver", "IRLbot", "JemmaTheTourist",
		"Knowledge.com", "MJ12bot", "MojeekBot", "Mozdex", "msnbot", "NutchOrg",
		"pipeLiner", "psbot", "searchengineworld", "snap.com", "Speedy Spider",
		"SurveyBot", "Teoma");

	// Does the user agent string contain any whitelisted signature?
	$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
	$whitelisted = FALSE;
	foreach ($robots as $rob) {
		if (substr_count($userAgent, $rob)) {
			$whitelisted = TRUE;
			break;
		}
	}

	echo "User-agent: *\n";
	if ($whitelisted) {
		// Known spyder: hand over the normal list of no-go areas.
		foreach ($disallow as $dis) {
			echo "Disallow: $dis\n";
		}
	} else {
		// Unknown spyder: disallow everything.
		echo "Disallow: /\n";
	}
?>

The above <robots.txt> 'file' whitelists a sample bunch of fairly well known spyders. It provides them with the expected list of disallowed sub-directories. All other spyders get a very short file that disallows everything. The onus is on you to check the likely provenance of newcomers for possible white-listing. BTW they ALWAYS come back for another sniff around no matter what...

----best wishes, Robert

 
RwD
RwD's picture

Joined: 2005-01-09
Posts: 383
Posted: Tue, 2005-10-18 20:42

I was about to go to sleep; I've caught a flu. Not really comprehending what you mean...

In any case, this is my robots.txt

User-agent: *
Disallow: /

I don't want anything to show up in searches. I do not reveal any info, even though not every piece has to stay de-indexed. Is this a bad method?? Everything I want to keep from the spiders is actually indeed within a subdir :P

I just want to keep the stuff from the search, if I wanted it to be unfindable I would not put it online ;)
______________________
I made a theme for G2, try it :)

 
robert070612

Joined: 2003-08-05
Posts: 565
Posted: Tue, 2005-10-18 20:57

Your file says "to all robots, disallow everything in the root and beyond (except for robots.txt of course)". Properly programmed robots will abide by this and not index anything on your site. Good robots will even de-index everything previously indexed... I have accidentally discovered this to my cost with Google, after innocently 'protecting' a single blogging page with noindex/nofollow on the actual page. Google promptly delisted my entire site and page rating. Oh and FWIW stay away from migratory birds and chickens!!!

----best wishes, Robert

 
robert070612

Joined: 2003-08-05
Posts: 565
Posted: Tue, 2005-10-18 21:04

My flavour of programming is for sites that want particular sections of their site to be properly listed in some nominated search engines but to portray an uncompromising 'go away' attitude to all other and any brand new visiting spyders (until investigated), without incidentally highlighting your nominated areas of interest. It means you don't have to be on perpetual lookout for dubious spyders. It is NOT a strategy for all site administrators;~)

----best wishes, Robert

 
robert070612

Joined: 2003-08-05
Posts: 565
Posted: Tue, 2005-10-18 21:24

Maybe this will clarify things...

1) whitelisted spyder example
googlebot visits and requests the robots.txt file
my site recognises its (whitelisted) signature and
dynamically responds with encouraging content
to index everything but the following sub-directories...
User-agent: *
Disallow: /noindex
Disallow: /html
Disallow: /assets
Disallow: /private
Disallow: /cgi-bin
Disallow: /pdf-noindex

2) unlisted/blacklisted spyder example
evilharvestor visits and requests the robots.txt file
my site does not recognise this spyder and
dynamically responds with discouraging content
to index nothing at all...
User-agent: *
Disallow: /

That's it... automatically;~)

----best wishes, Robert

 
RwD
RwD's picture

Joined: 2005-01-09
Posts: 383
Posted: Wed, 2005-10-19 11:11
icpix wrote:
Oh and FWIW stay away from migratory birds and chickens!!!

Better a normal flu than this bird version. I heard it is pretty deadly. I just hope it passes over. Negative projections say hundreds of millions of people will die from it within a short time. Luckily those are the negative projections :P

 
stephen_3

Joined: 2005-10-18
Posts: 4
Posted: Mon, 2005-10-24 00:02

Valiant,
Google just spidered my gallery and, as you said, it wasn't served session ids - which is nice. Why do some sites have session ids in their indexed URLs? I'm wondering how g2 identifies spiders - if it's by ip then it's feasible that google has added an extra range; if it's by user agent, could it be that google has spidered without it?
Now I'm waiting for Yahoo and msn to spider to see how they're handled.

 
jik

Joined: 2005-10-20
Posts: 3
Posted: Mon, 2005-10-24 12:07

Hi, looks like you are a real SEO guru - can you suggest how to solve the most visible SEO problems for G2 v 2.0.1?

Maybe some mini-HOWTO or patch to help?
I would really appreciate having it as good as my static pages, but with G2 running behind. Can you help?

 
valiant

Joined: 2003-01-04
Posts: 32509
Posted: Mon, 2005-10-24 12:28

stephen_3:
from modules/core/classes/GallerySession.class


    /**
     * Return the id of the search engine currently crawling the site by
     * analyzing the current request.
     *
     * @return string the crawler id, or null if it's a regular user
     */
    function identifySearchEngine() {
	if (!isset($_SERVER['HTTP_USER_AGENT'])) {
	    return null;
	}
	$userAgent = $_SERVER['HTTP_USER_AGENT'];
	if (strstr($userAgent, 'Google')) {
	    return 'google';
	} else if (strstr($userAgent, 'Yahoo')) {
	    return 'yahoo';
	} else if (strstr($userAgent, 'Ask Jeeves')) {
	    return 'askjeeves';
	} else if (strstr($userAgent, 'msnbot')) {
	    return 'microsoft';
	}

	return null;
    }

so no, we don't identify by IP, we parse the userAgent string for google, yahoo, etc.

 
Vassa

Joined: 2005-10-25
Posts: 1
Posted: Tue, 2005-10-25 00:37

Hi guys, G2 is an absolutely great app, the best piece of software on the net. Lots of kudos to you all.

SEO question: is there ANY simple way to use index.html instead of silly main.php?
Also, the nice URL Rewriter seems to work only if you look at the same page more than once, or if you refresh it...
Very sad. Is it a bug to be fixed soon or did I do something wrong???

 
nivekiam
nivekiam's picture

Joined: 2002-12-10
Posts: 16504
Posted: Tue, 2005-10-25 13:31

Yes there is: http://gallery.menalto.com/node/32349?highlight=index%2Cmain :)

For your URL Rewrite issue, please start a new topic and include the information requested (you'll see that when you start a new topic). You shouldn't even need to look at a page for the URL Rewrite module to work. If it's an album you see an error with and you are manually entering the URL, you need to put a trailing slash on the URL (www.example.com/v/G2album/, not www.example.com/v/G2album).

I'm not a rewrite rule expert, but if you start a new topic with your problem, the URL Rewrite module developer will probably be able to help you. Be sure to be clear in your subject about what your problem is - "Gallery doesn't work" doesn't help :)
____________________________________________
Like Gallery? Like the support? Donate now!!! See G2 live here

 
ryooki

Joined: 2005-11-03
Posts: 15
Posted: Sat, 2005-11-19 20:51

I have a question about 5): I can see the duplicate content (the large images), but no one else is allowed to. Will that still count as duplicate content? On the same note, do spiders only crawl public / anonymous user content? Can they even attempt to index content that's restricted?

 
nivekiam
nivekiam's picture

Joined: 2002-12-10
Posts: 16504
Posted: Sat, 2005-11-19 21:07

Spiders cannot crawl content they cannot access. So if they don't have the username and password to log in and crawl the site unrestricted or as that user, then they can't see the content. For example, if you make it so users have to log in to see any content, a search spider will never know of any of the content you have in your gallery.
____________________________________________
Like Gallery? Like the support? Donate now!!! See G2 live here

 
netscan

Joined: 2005-07-16
Posts: 39
Posted: Sat, 2005-11-19 21:20
ryooki wrote:
I have a question about 5): I can see the duplicate content (the large images), but no one else is allowed to. Will that still count as duplicate content? On the same note, do spiders only crawl public / anonymous user content? Can they even attempt to index content that's restricted?

No, only publicly accessible content can be spidered. If you are the only one able to access the full image then the spider will not be able to crawl and index it.

 
ryooki

Joined: 2005-11-03
Posts: 15
Posted: Sat, 2005-11-19 21:55

Thanks for answering so quickly. :)

 
Continental

Joined: 2004-06-14
Posts: 243
Posted: Mon, 2005-12-05 18:06

@index.php
solved. thanx for define

@slideshow
solved. in robots.txt

@fullsize
/v/architecture/London_High_Building.html?g2_imageViewsIndex=1
this construction I don't know to to solve. Can't put in Robots.txt in Disallow.
I don't want full size image to be spidered, but it should be available for guests.
What would you recommend?

 
suydam

Joined: 2005-12-27
Posts: 10
Posted: Tue, 2005-12-27 14:41

Continental: How did you solve the slideshow problem in your robots.txt file?

 
robert070612

Joined: 2003-08-05
Posts: 565
Posted: Tue, 2005-12-27 15:26

By increments.

1) amended my dynamic robots.txt file to include...
Disallow: slideshow.html

That should've done it but Googlebot and Googlebot-Image are still parsing shedloads of slideshow URLs.

2) amended /modules/slideshow/templates/local/Header.tpl to show...
</title>
{* -------------------------------------------------- *}
{literal}<META NAME="ROBOTS" CONTENT="NOINDEX">{/literal}
{* -------------------------------------------------- *}
<script type="text/JavaScript">
... but they still keep coming;~/

3) The above robots.txt stuff should've been interpreted as any URL including the character string slideshow.html, but the google family has steadfastly been ignoring this and, indeed, the per-iteration NOINDEX declaration. Mystifying. So, by way of experiment, I have amended my dynamic robots.txt file to show...
Disallow: /slideshow.html
...despite this apparently meaning "don't index any files called slideshow.html in the root".

Yes, I could use * wildcards and end-of-line markers ($) as per here, but I cannot rely on other, less involved search engines to cooperate. So, for a few more days, I will experiment with the above amendment. I will update this thread with my results.
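For reference, the wildcard form I mean would look like this - the google family understands it, but I can't vouch for anyone else:

	User-agent: Googlebot
	Disallow: /*slideshow.html$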

 
dmolavi
dmolavi's picture

Joined: 2002-12-05
Posts: 573
Posted: Wed, 2006-03-29 20:26
valiant wrote:
bmsstore
we love you too. i'm sure you'll be proud of your rants when you find them in a few years with google :)

Done and done :)
http://72.14.203.104/search?q=cache:-D2X2R6si2kJ:gallery.menalto.com/node/36854+bmsstore+site:menalto.com&hl=en&gl=us&ct=clnk&cd=1

 
eosguy

Joined: 2005-07-27
Posts: 38
Posted: Mon, 2006-04-03 01:00

I've been reading this thread and still don't quite know what is the best way to get Google to index all the pages in a Gallery powered site. Does anyone have a properly indexed Gallery site? Smugmug users seem to have all their pages indexed by Google, though it's a completely different gallery engine.
-----
Photography gallery at http://www.kennyyeoh.com

 
dmolavi
dmolavi's picture

Joined: 2002-12-05
Posts: 573
Posted: Sun, 2006-04-09 23:57
 
eosguy

Joined: 2005-07-27
Posts: 38
Posted: Wed, 2006-04-12 01:47

Google has only indexed my old pages. My site has been revamped and some pages are now gone, and there are many more new pages. Any way to force Googlebot to reindex my site? :(
-----
Photography gallery at http://www.kennyyeoh.com

 
netscan

Joined: 2005-07-16
Posts: 39
Posted: Wed, 2006-04-12 21:07
eosguy wrote:
Google has only indexed my old pages. My site has been revamped and some pages are now gone, and there are many more new pages. Any way to force Googlebot to reindex my site? :(
-----

This is a good reason to restrict access to your site until you are completely done with it. Once Google has it, it's pretty much stuck in there; the only thing you can do is 301 to the new pages and hope the bot takes the bait.

Also make sure the pages that aren't there anymore are returning 404's instead of 200's, otherwise GoogleBot will think the pages are still there.

 
forumposters

Joined: 2006-08-12
Posts: 7
Posted: Sat, 2006-09-23 21:14

I don't see how that robots.txt file stops Google from trying to index the slideshow.

 
robert070612

Joined: 2003-08-05
Posts: 565
Posted: Sat, 2006-09-23 21:32

forumposters-----
What or which file? Whatever, there's a better way to
stop any robot. Use the URL rewrite module and switch
the position of the 'slideshow' element of the path
from the end to the front...
slideshow/%path%/slideshowapplet.html
...then the robots' exclusion always works:
Disallow: /slideshow
----best wishes, Robert

 
hollyonline

Joined: 2006-09-13
Posts: 59
Posted: Tue, 2006-09-26 00:01

When you delete an album or some pictures which Google has already picked up on and indexed, it (google) will keep the page linked and gallery returns an error when you go to it. Is there any way of overcoming this?

---
Robert Hollingworth
www.roberthollingworth.co.uk

 
hollyonline

Joined: 2006-09-13
Posts: 59
Posted: Tue, 2006-09-26 00:10

Oh, another thought: I guess from what has been said here that Google won't exactly love permalinks from the point of view of duplication???

---
Robert Hollingworth
www.roberthollingworth.co.uk

 
ichthyous

Joined: 2006-06-16
Posts: 324
Posted: Mon, 2007-02-12 23:55
Quote:
When you delete an album or some pictures which Google has already picked up on and indexed, it (google) will keep the page linked and gallery returns an error when you go to it. Is there any way of overcoming this?

Google retains all copies of pages it has indexed for quite a while... sometimes up to a year. The next time Googlebot comes to spider your pages it will get the 404 (Page not found) error from the Gallery app or from your server, and then the page will go supplemental (it will have the word "supplemental" in green under the listing). Supplemental pages are out-of-date pages, or pages Google considers to be duplicate content and perhaps suspect as SPAM. You'll have to wait until Google finds the new page and indexes it, and then it will appear in Google's index of your site. You can help things along by issuing a 301 redirect for every page that's been moved; that will tell all the search engines where to find the new page immediately. 301 redirects require specific knowledge of Apache mod_alias and mod_rewrite rules, though.
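For a single moved page mod_alias is enough, and mod_rewrite handles whole patterns. The paths below are made up, so substitute your own:

	# One page moved (mod_alias):
	Redirect permanent /old-album/photo1.jpg.html http://www.example.com/v/new-album/photo1.jpg.html

	# A whole set of pages moved (mod_rewrite):
	RewriteEngine On
	RewriteRule ^old-album/(.*)$ /v/new-album/$1 [R=301,L]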

Andrew
New York Photography
Washington DC Photography

 
ichthyous

Joined: 2006-06-16
Posts: 324
Posted: Sat, 2006-10-07 13:47
Quote:
Oh, another thought: I guess from what has been said here that Google won't exactly love permalinks from the point of view of duplication???

You are correct. Google will index both sets of urls and find duplicate pages, then one or both sets are likely to go supplemental. I turned permalinks off for my site and use custom 301 redirects when content moves. The whole time my site was being built google was indexing it, even though I had a noindex tag in robots.txt. When google updated its index there were hundreds of broken urls, as I had changed the url structure on my site several times. It took a lot of time to clear up, as custom 301 redirects had to be written to match every broken URL pattern and rewrite them to the current URL pattern.

Andrew
New York Photography

 
Dayo

Joined: 2005-11-04
Posts: 1642
Posted: Sat, 2006-10-07 15:05
hollyonline wrote:
When you delete an album or some pictures which Google has already picked up on and indexed, it (google) will keep the page linked and gallery returns an error when you go to it. Is there any way of overcoming this?

I think the webmaster tools on google have a facility somewhere to remove content.

.
Gallery Version: 2.1.2
Gallery Theme: PGTheme 1.1.0 (RC01)
Web Site: http://dakanji.com

 
netscan

Joined: 2005-07-16
Posts: 39
Posted: Sat, 2006-10-07 15:38
Dayo wrote:
hollyonline wrote:
When you delete an album or some pictures which Google has already picked up on and indexed, it (google) will keep the page linked and gallery returns an error when you go to it. Is there any way of overcoming this?

I think the webmaster tools on google have a facility somewhere to remove content.


Yeah, but it doesn't actually remove it. It just hides it for 180 days, then it pops back up.

You can hose the entire site with that, so your best bet is to just wait for googlebot to figure it out.

 
ichthyous

Joined: 2006-06-16
Posts: 324
Posted: Fri, 2008-01-25 23:15

I have made quite a few modifications to my site to help search engine compatibility. Here is a short rundown of the changes I made:

1) The biggest one... use the rewrite module to create keyword-rich URLs
2) Removed ?g2_enter=0,1 from all album thumbnails
3) Removed highlight IDs from the breadcrumb links... this reduces usability, as the user will be returned to the first page of the gallery and not the last page viewed
4) Banned bots from all admin functions such as login links, contact owner links, the search module, etc. You can do this by adding an exclusion line in your robots.txt file (see the sketch after this list)
5) Added linkable album titles under each album thumb
6) Added meta tag keywords to the front end of each page
7) Removed all slideshows - not everyone will like that, so if you need a slideshow then just make sure to exclude slideshow urls in your robots.txt file
8) Removed all image resizes
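A robots.txt sketch for number 4 - the g2_view values here are only examples, so browse your own site and copy the ones you actually want blocked (the * wildcard is honored by the major bots but not guaranteed everywhere):

	User-agent: *
	# Keep bots off login/admin and search views
	Disallow: /*g2_view=core.UserAdmin
	Disallow: /*g2_view=search.SearchScan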

This short list of changes will help greatly with the SEO of your pages. I have yet to figure out how to implement one final change which would also help greatly, namely changing the base URI in gallery to just the domain name and not domain name + index.php or main.php. You can see the various customizations at my website below.

Andrew
Photos of New York City
Photos of Times Square, NYC
Central Park photos
Photos of Washington DC

 
blacksburgpoker

Joined: 2005-11-26
Posts: 108
Posted: Thu, 2006-11-16 00:20

ichthyous,

What do you mean by your change number "2"?

==================================
Website Design and SEO | Poker Picture Gallery | Free Desktop Wallpapers

 
joe7rocks
joe7rocks's picture

Joined: 2004-10-07
Posts: 560
Posted: Wed, 2006-12-20 10:51

index.php won't be shown if that's the default apache is looking for.
Or are you talking about the link in the breadcrumb?

My Gallery 2: http://gallery.site.hu

 
ichthyous

Joined: 2006-06-16
Posts: 324
Posted: Mon, 2007-02-05 23:25

I have index.php set as the default for my site, but that's not really what I meant. The Gallery app itself uses index.php when it generates urls; I had to change the actual code to remove that. It's not inherently bad, but since all my incoming links pointed to my domain, I didn't want all the internal links pointing to index.php. I had posted about duplicate content issues with Google and slideshows... the new exlide flash slideshow module should take care of that... if you can get it to work.

My Photos of Spain Gallery
My Photos of Italy Gallery
My Photos of Greece Gallery

 
joe7rocks
joe7rocks's picture

Joined: 2004-10-07
Posts: 560
Posted: Mon, 2007-02-05 23:41

Do you really think that there is an issue with duplicate content & slideshows?

1. I don't think so; google should not be that dumb.
2. You can still disallow slideshows for spiders in robots.txt, as they're a cpu-intensive task anyway. (there is a topic about this, and it's also mentioned under the performance hints)

linkfelhő | My gallery with dogs

 
netscan

Joined: 2005-07-16
Posts: 39
Posted: Tue, 2007-02-06 00:13

Yes, GoogleBot is that dumb.

 
netscan

Joined: 2005-07-16
Posts: 39
Posted: Tue, 2007-02-06 00:22

Here's a bit of an update for the index.php bit...

Set apache to look for main.php before index.php - poof, no more redirects.

i.e.:

httpd.conf:

DirectoryIndex main.php index.php index.html blah.html and so on

 
ichthyous

Joined: 2006-06-16
Posts: 324
Posted: Fri, 2008-01-25 23:16
Quote:
Do you really think that there is an issue with duplicate content & slideshows?

Yes, these days Google is very touchy about dupe content, so why risk it? Flash slideshows are much prettier and have no dupe content issues at all. Once you start seeing a good number of pages going supplemental, it's very hard and time-consuming to track down and turn around.

Netscan, I'm not sure I really follow your last post. I don't have any main.php/index.php redirect, as I have swapped the files. I did a lot of editing in one of the class files to remove index.php entirely from my links. All of my internal links point to just the domain name.