Module for telling search engines to go away...

rWatcher

Joined: 2005-09-06
Posts: 722
Posted: Wed, 2009-06-17 22:47

If anyone is interested, I've created a Gallery 3 module that adds a few HTML META tags telling search engines not to index or archive your web site.
[img]http://codex.gallery2.org/images/6/67/Nobots.jpg[/img]

It's useful if you're running a personal site and want to keep it from turning up in random search results. The code is attached to this post. Documentation can be found at http://codex.gallery2.org/Gallery3:Modules:nobots
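For reference, here is a rough Python sketch (not part of the module) of how a crawler-style script could read robots-style META tags out of a page and decide whether indexing is blocked. The set of tag names is an assumption: "msnbot" and "teoma" are the ones quoted from the module's output later in this thread, while "googlebot" is added only for illustration.

```python
from html.parser import HTMLParser

# Meta tag names treated as crawler directives. "robots" is the generic name;
# "msnbot" and "teoma" appear in the nobots output quoted in this thread;
# "googlebot" is an assumption added for illustration.
ROBOT_META_NAMES = {"robots", "googlebot", "msnbot", "teoma"}


class RobotsMetaParser(HTMLParser):
    """Collects directives like noindex/noarchive from robots-style <meta> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    # HTMLParser also routes self-closing <meta ... /> tags through this method.
    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in ROBOT_META_NAMES:
            content = attrs.get("content") or ""
            self.directives[name] = {d.strip().lower() for d in content.split(",") if d.strip()}


def robots_directives(html):
    """Map each robots-style meta name found in html to its set of directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def blocks_indexing(html):
    """True if any robots-style meta tag in html asks crawlers not to index."""
    return any("noindex" in d for d in robots_directives(html).values())
```

Only tags whose name is in the set are counted, so an ordinary description META tag that happens to contain the word "noindex" is ignored.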

Edit:
It is also now on my GitHub account:
http://github.com/rWatcher/gallery3-contrib/tree/master

---
Report Problems/Suggestions Here | Get latest version | Documentation | Coffee Fund | My Library | My GitHub

Attachment: nobots.zip (1.56 KB)
 
nivekiam

Joined: 2002-12-10
Posts: 16504
Posted: Thu, 2009-06-18 05:34

Why not just add a robots.txt file to the root of your site?
____________________________________________
Like Gallery? Like the support? Donate now!!! See G2 live here

 
rWatcher

Joined: 2005-09-06
Posts: 722
Posted: Thu, 2009-06-18 07:40

I've got a (cheap) shared hosting account on a larger site, so I don't have access to the root domain folder. I suppose I could put a robots.txt file in my account's root folder, but I'm not sure how effective a domain.com/users/USERNAME/robots.txt file would be. Plus, it would tell anyone (or anything) that thought to look for a robots file where the various sub-websites within my account are located. I've also got phpnuke and mediawiki installed in different sub-folders within my account with those same META tags. As far as Google is concerned, neither of those folders exists, so it seems just as effective as a robots.txt file would be.

 
nivekiam

Joined: 2002-12-10
Posts: 16504
Posted: Thu, 2009-06-18 12:44

robots.txt is only read when it's at the root of the website: www.example.com/robots.txt or photos.example.com/robots.txt, never www.example.com/~username/robots.txt or any other sub-directory.
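To make the root-only rule concrete, a small Python sketch using the standard library's urllib.robotparser: the first function derives the one robots.txt URL a crawler checks for any page (always the host root), and the second shows that a rule placed in that root file can still exclude a sub-directory. The /~username/gallery3/ path is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url):
    """Crawlers only ever fetch robots.txt from the root of the host."""
    parts = urlsplit(page_url)
    # Keep scheme and host (plus port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_allowed(robots_txt_lines, user_agent, url):
    """Evaluate a robots.txt body (already split into lines) for one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt_lines)
    return parser.can_fetch(user_agent, url)


# A root-level file can still fence off one user's sub-directory
# (the /~username/gallery3/ path is hypothetical).
ROOT_RULES = [
    "User-agent: *",
    "Disallow: /~username/gallery3/",
]
```

The catch on shared hosting is that only whoever controls the host root can publish ROOT_RULES, which is why per-page META tags are the fallback.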


 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Fri, 2009-06-19 01:10

Awesome. I've pulled this into the gallery3-contrib repo!
---
Problems: Check gallery3/var/logs first!
file a bug or feature request | upgrade to the latest code | use git

 
rWatcher

Joined: 2005-09-06
Posts: 722
Posted: Fri, 2009-06-19 04:46
nivekiam wrote:
robots.txt is only read when it's at the root of the website: www.example.com/robots.txt or photos.example.com/robots.txt, never www.example.com/~username/robots.txt or any other sub-directory.

I thought that might be the case, but I wasn't sure. Thanks for the info!

bharat wrote:
Awesome. I've pulled this into the gallery3-contrib repo!

Cool, thanks!

 
sproonz

Joined: 2010-05-23
Posts: 11
Posted: Sun, 2010-05-30 17:36
<meta name="msnbot" content="noindex, nofollow, noarchive, nosnippet, noodp">
<meta name="teoma" content="noindex, nofollow, noarchive">

These two lines lack the self-closing '/>' that XHTML 1.0 Transitional requires.
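A quick way to see the difference is to run the tags through an XML parser, since XHTML has to be well-formed XML. This Python sketch only checks well-formedness; it does not validate against the XHTML 1.0 Transitional DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True if markup parses as XML, which XHTML pages are required to do."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The tag as originally shipped, and the same tag with the self-closing slash.
OLD_TAG = '<meta name="teoma" content="noindex, nofollow, noarchive">'
FIXED_TAG = '<meta name="teoma" content="noindex, nofollow, noarchive" />'
```

Browsers tolerate OLD_TAG, but an XML parser rejects it because the element is never closed.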

 
rWatcher

Joined: 2005-09-06
Posts: 722
Posted: Mon, 2010-06-14 22:39
sproonz wrote:
<meta name="msnbot" content="noindex, nofollow, noarchive, nosnippet, noodp">
<meta name="teoma" content="noindex, nofollow, noarchive">

These two lines lack the self-closing '/>' that XHTML 1.0 Transitional requires.

Fixed, thanks :)

 
pinn8

Joined: 2009-02-19
Posts: 74
Posted: Tue, 2010-08-10 23:50

My G3 installation always has these meta tags, even though I am not using the nobots module. My G3 is up to date -- I pulled from git last night. What can I do to get rid of these entries so that Google and other bots will crawl my site?

http://pinnacle8.com/galleries

 
nivekiam

Joined: 2002-12-10
Posts: 16504
Posted: Wed, 2010-08-11 04:17

Take a look at your theme and see if they are in there.

 
bharat

Joined: 2002-05-21
Posts: 7994
Posted: Wed, 2010-08-11 05:10

Those tags are baked into the page.html.php file in GreyDragon...
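If you're not sure where such tags come from, a short Python script can search a theme's templates for hard-coded noindex META tags. This is a sketch: the regex is a rough line-by-line match rather than an HTML parser, and any theme path you pass in (for example themes/greydragon) is an assumption about where the theme lives in your install.

```python
import re
from pathlib import Path

# A <meta> tag with "noindex" anywhere inside it; case-insensitive, attribute-order agnostic.
NOINDEX_META = re.compile(r"<meta\b[^>]*\bnoindex\b[^>]*>", re.IGNORECASE)


def noindex_lines(text):
    """Return (line_number, stripped_line) for each line holding a noindex <meta> tag."""
    return [
        (n, line.strip())
        for n, line in enumerate(text.splitlines(), start=1)
        if NOINDEX_META.search(line)
    ]


def scan_theme(theme_dir):
    """Check every .php template under theme_dir; maps relative path -> matching lines."""
    root = Path(theme_dir)
    results = {}
    for path in sorted(root.rglob("*.php")):
        hits = noindex_lines(path.read_text(errors="replace"))
        if hits:
            results[str(path.relative_to(root))] = hits
    return results
```

Calling scan_theme("themes/greydragon") would return a dict mapping each template that contains such a tag (e.g. a views/page.html.php) to the offending line numbers.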

 
pinn8

Joined: 2009-02-19
Posts: 74
Posted: Wed, 2010-08-11 10:52

Thank you. That was it.

 
vudo

Joined: 2011-03-21
Posts: 2
Posted: Mon, 2011-03-21 16:53
rWatcher wrote:
If anyone is interested, I've created a Gallery 3 module that adds a few HTML META tags telling search engines not to index or archive your web site.
[img]http://codex.gallery2.org/images/6/67/Nobots.jpg[/img]

It's useful if you're running a personal site and want to keep it from turning up in random search results. The code is attached to this post. Documentation can be found at http://codex.gallery2.org/Gallery3:Modules:nobots

I've used this module and it works great. Now I want search engines to index my website. How can I do that? Thank you very much!

 
undagiga

Joined: 2010-11-26
Posts: 693
Posted: Mon, 2011-03-21 23:57

You could try the XML sitemap module. But note that it takes time for Google to start indexing a site that it was previously excluded from. As I understand it, Google doesn't want to swamp or overload a site, so it will index in stages. Not sure about the other crawlers.
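For context, an XML sitemap is just a list of page URLs in the sitemaps.org format. This Python sketch builds a minimal one with only the required loc entries; what the Gallery 3 XML sitemap module actually emits (last-modified dates, priorities, and so on) is not covered here, and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemaps.org XML document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> is the only required child; ElementTree escapes & and < for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


example = build_sitemap([
    "http://www.example.com/gallery3/",
    "http://www.example.com/gallery3/index.php/albums?page=1&sort=date",
])
```

A sitemap only tells crawlers what exists; it will not override noindex tags that are still in the page.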

Note also that you need to correct some missing "/" characters in the module. Details are in the relevant forum thread.

U-G

 
vudo

Joined: 2011-03-21
Posts: 2
Posted: Tue, 2011-03-22 02:08

Thank you. I'll try it :)

 
rWatcher

Joined: 2005-09-06
Posts: 722
Posted: Tue, 2011-03-22 04:23

The metadescription module might help with that too:
http://codex.gallery2.org/Gallery3:Modules:metadescription
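A description META tag is a single line in the page head. This Python sketch shows roughly how one could be generated from an item's text; it is not the metadescription module's code, and the 160-character cap is an assumed rule of thumb, not a standard.

```python
from html import escape


def meta_description(text, max_len=160):
    """Build an XHTML-style <meta name="description"> tag from free text."""
    summary = " ".join(text.split())  # collapse runs of spaces and newlines
    if len(summary) > max_len:
        summary = summary[: max_len - 3].rstrip() + "..."
    # Escape after truncating so an entity like &amp; is never cut in half.
    return f'<meta name="description" content="{escape(summary, quote=True)}" />'
```

For example, meta_description('Holiday  photos\nfrom "Paris" & Rome') yields a tag whose content attribute is the whitespace-collapsed text with the quotes and ampersand escaped.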

 
undagiga

Joined: 2010-11-26
Posts: 693
Posted: Tue, 2011-03-22 06:30

While I use the metadescription module, I doubt that it has any impact. My understanding is that the major search engines stopped using these fields years ago, especially keywords, because of misuse and abuse. IIRC the description is occasionally used, but not in a way that affects ranking.

U-G