SEO - Blocking Search Engines From Pages
A discussion of how webmasters can block search engines from certain pages so that those pages are not indexed, catalogued as searchable pages, or counted in the search engine rankings.
Contrary to popular belief, the spiders sent out by the major search engines do not have to crawl everything on a site. You can keep a spider away from a page by instructing it, through a robots meta tag or a robots.txt file, not to come near the page.
Webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a robots meta tag. If for some reason you do not want a search engine spider to crawl a page, you have the means to stop it.
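As a rough illustration of both mechanisms, the sketch below uses Python's standard library to check a URL against a set of robots.txt rules and to read the directives from a robots meta tag. The robots.txt content, the paths, the example.com URLs, and the helper names (is_crawl_allowed, robots_meta_directives) are assumptions made up for this example, not something taken from a real site or a particular search engine.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""

# Hypothetical page that opts out of indexing with a robots meta tag.
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Example</title>
</head><body>Hello</body></html>
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_meta_directives(html: str) -> set:
    """Return the set of robots meta directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(is_crawl_allowed(ROBOTS_TXT, "*", "https://example.com/private/account.html"))
    print(is_crawl_allowed(ROBOTS_TXT, "*", "https://example.com/index.html"))
    print(sorted(robots_meta_directives(PAGE_HTML)))
```

The robots.txt rules apply to whole files or directories, while the meta tag sits in the head of a single page, so the two are often used together.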
When a search engine visits a site, the robots.txt file located in the root folder is the first file it requests. The robots.txt file is then parsed, and only pages that are not disallowed will be crawled. However, this is not always foolproof. Search engine spiders have a habit of leaving a site and then coming back to look at its pages a second time later. Because a crawler may keep a cached copy of robots.txt, it may on occasion crawl pages that a webmaster does not want crawled.
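The caching problem described above can be sketched in a few lines. This is a simplified model, not how any particular search engine works: the CachedRobots class, its max_age_seconds parameter, the injected clock, and the paths are all hypothetical names chosen for the illustration.

```python
import time
from urllib.robotparser import RobotFileParser


class CachedRobots:
    """Hypothetical crawler-side cache of one site's robots.txt."""

    def __init__(self, fetch, max_age_seconds, clock=time.time):
        self._fetch = fetch              # callable returning robots.txt text
        self._max_age = max_age_seconds  # how long a cached copy is trusted
        self._clock = clock              # injectable so the demo can fake time
        self._parser = None
        self._fetched_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._parser is None or now - self._fetched_at >= self._max_age:
            parser = RobotFileParser()
            parser.parse(self._fetch().splitlines())
            self._parser = parser
            self._fetched_at = now

    def can_fetch(self, user_agent, url):
        """Answer from the cached rules, re-fetching only when they are stale."""
        self._refresh_if_stale()
        return self._parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    site = {"robots": "User-agent: *\nDisallow: /admin/\n"}
    now = [0.0]
    cache = CachedRobots(lambda: site["robots"], 3600, clock=lambda: now[0])
    url = "https://example.com/cart/"

    print(cache.can_fetch("*", url))   # rules fetched; /cart/ is not disallowed

    # The webmaster now blocks /cart/, but the crawler still holds the old copy.
    site["robots"] = "User-agent: *\nDisallow: /cart/\n"
    now[0] = 60.0
    print(cache.can_fetch("*", url))   # answered from the stale cache

    now[0] = 3600.0
    print(cache.can_fetch("*", url))   # cache expired, new rules are applied
```

Until the cached copy expires, the crawler in this model keeps following the old rules, which is why a newly disallowed page can still be crawled for a while.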
Pages that most webmasters prefer not to have crawled include login-specific pages such as shopping carts and user-specific content such as results from internal searches. Other pages that you might not want crawled, depending on the content, include a guest book that you expect to fill with spam or a feedback system that is not very flattering to you. It is also a good idea to instruct the spiders not to crawl a page with a lot of animation or Flash on it, as a spider can mistakenly read this as a malfunctioning site.
Chris Angus is an SEO and website promoter. He can be contacted at sales(at)brilliantseo.com
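For the kinds of pages listed above, a robots.txt file can be generated from a simple list of paths. The sketch below is one possible way to do that; the paths in EXCLUDED_PATHS and the build_robots_txt helper are hypothetical and would need to match the real layout of your site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical locations of the page types discussed above.
EXCLUDED_PATHS = [
    "/cart/",        # shopping cart (login-specific)
    "/search/",      # internal search results
    "/guestbook/",   # guest book likely to attract spam
    "/feedback/",    # feedback system
]


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text that disallows each path for the given user agent."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    robots_txt = build_robots_txt(EXCLUDED_PATHS)
    print(robots_txt)

    # Sanity check the generated rules with the standard library parser.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    print(parser.can_fetch("*", "https://example.com/cart/checkout"))
    print(parser.can_fetch("*", "https://example.com/products/"))
```

The generated text would be saved as robots.txt in the root directory of the domain, where spiders look for it.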





