How Do You Define Duplicate Content And Does It Really Matter?
People have been arguing over duplicate content for a considerable time now, but what exactly counts as duplicate content, and does having it really matter?
The argument over what duplicate content is, and whether it is a problem, has been going on for some time, and there is little sign that it will go away. So how exactly do we define duplicate content, and does it matter?
It is widely thought that duplicate content is an important matter. Although one highly respected search engine optimization expert recently wrote an article opposing this view, even a quick look at the mountain of material published on the topic shows that this is a minority opinion.
If we agree that duplicate content does matter, how should we define it? For example, if I write an article for one article directory and then alter that same article for submission to a second directory, how will the search engines compare the two articles and decide whether they contain duplicate content? The simple truth is that we don't know; however, here are one publisher's thoughts.
When the search engines first began checking for duplicate content, they simply compared one web page as a whole with another; there was no attempt to divide the two pages up and compare individual page elements. Back then you could reuse identical content, add an introduction and a conclusion to one of the two pages, and that would be enough to fly under the duplicate content radar. Sadly for many webmasters, those days are long gone.
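To see why that trick worked, here is a minimal Python sketch of whole-page comparison, assuming the check hashed each page's entire text after light normalization (lower-casing and collapsing whitespace). The function names, the choice of SHA-256 and the sample pages are illustrative assumptions, not a description of how any search engine actually worked.

```python
import hashlib
import re


def page_fingerprint(text: str) -> str:
    """Hash the whole page after collapsing whitespace and lower-casing.

    Assumption: two pages count as duplicates only if their entire
    normalized text is identical.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_whole_page_duplicate(page_a: str, page_b: str) -> bool:
    return page_fingerprint(page_a) == page_fingerprint(page_b)


body = "Ferrets are playful animals. They need a secure cage and daily exercise."
page_one = body
# Same body, wrapped in a new introduction and conclusion.
page_two = "Welcome to my ferret guide. " + body + " Thanks for reading!"

print(is_whole_page_duplicate(page_one, page_one.upper()))  # True
print(is_whole_page_duplicate(page_one, page_two))          # False
```

Because any change anywhere on the page alters the hash, the added introduction and conclusion are enough to make the two pages look unrelated under this kind of check.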
The search engines now divide pages up so that they can examine individual elements, and it is this which lies at the core of today's argument. It is generally agreed that their attention is largely restricted to the central content of a page rather than its structure. A large number of webmasters use templates which define the structure of each page, including headers, footers and navigation menus. This repeated template material is widely believed to be acceptable, and the search engines do not class it as duplicate content. What concerns the search engines is the main content within the body of the page. But just how do they go about examining it?
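One simple way to picture "ignoring the template" is to treat any line that appears on every page of a site as template material and keep only what is left. The sketch below assumes exactly that heuristic; `main_content` and the sample pages are hypothetical, and real search engines almost certainly use more sophisticated methods.

```python
from collections import Counter


def main_content(pages: list[str]) -> list[list[str]]:
    """Strip lines that appear on every page (assumed to be template:
    headers, footers, navigation) and return the remaining lines of
    each page as its 'main content'."""
    # Distinct non-blank lines per page, so a line repeated within one
    # page is only counted once for that page.
    line_sets = [{ln.strip() for ln in page.splitlines() if ln.strip()} for page in pages]
    counts = Counter(ln for lines in line_sets for ln in lines)
    template = {ln for ln, n in counts.items() if n == len(pages)}
    return [
        [ln.strip() for ln in page.splitlines() if ln.strip() and ln.strip() not in template]
        for page in pages
    ]


header = "My Pet Site\nHome | Articles | Contact"
footer = "Copyright My Pet Site"
page_a = header + "\nFerrets need daily exercise.\n" + footer
page_b = header + "\nGoldfish need clean water.\n" + footer

print(main_content([page_a, page_b]))
# [['Ferrets need daily exercise.'], ['Goldfish need clean water.']]
```

With only one page the heuristic breaks down, since every line then counts as "shared", so it needs several pages from the same site.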
Some people argue that this checking is done at "block" level (in other words, at the level of individual paragraphs or sentences), while others believe that the filters look for phrases or even individual words. No one really knows, of course, but it seems reasonable to assume that the most likely basis of examination is either sentence or phrase matching.
Sentence matching is quite clear-cut: it simply means cutting both pages up into chunks determined by their punctuation. For example, take a look at this sentence:
It is relatively simple to get a good deal on a hi-fi, providing you know what to look for.
This would be viewed either as one single sentence or as two, depending on whether you use the traditional definition of a full stop as marking the end of a sentence or adopt a more flexible approach that also uses other punctuation marks, such as commas.
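The two readings can be sketched in Python. This is only an illustrative assumption about how punctuation-based splitting might work, using a hypothetical `split_sentences` helper: strict mode breaks only at . ! ?, while flexible mode also breaks at commas, semicolons and colons.

```python
import re

SENTENCE = "It is relatively simple to get a good deal on a hi-fi, providing you know what to look for."


def split_sentences(text: str, flexible: bool = False) -> list[str]:
    """Split text into chunks at punctuation.

    strict (default): only . ! ? end a chunk (traditional full stops).
    flexible: commas, semicolons and colons also end a chunk.
    """
    pattern = r"[.!?,;:]" if flexible else r"[.!?]"
    # Drop the empty chunk left after the final full stop.
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]


print(split_sentences(SENTENCE))
# ['It is relatively simple to get a good deal on a hi-fi, providing you know what to look for']
print(split_sentences(SENTENCE, flexible=True))
# ['It is relatively simple to get a good deal on a hi-fi', 'providing you know what to look for']
```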
Matching at the phrase level is a bit more complex. What is a phrase? Should a phrase consist of 2 words or 3 words or 4 words or…?
Let's assume for now that a phrase is defined as 3 words. If this were the case, the following phrases would be viewed as duplicate content if they appeared on two pages being compared:
In those days
The answer is
You can get
Day to day
Take a look
One way to
Did you know
At that time
In the end
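A short sketch shows how easily such 3-word phrases turn up on unrelated pages. It assumes a phrase is any run of three consecutive words after lower-casing and dropping punctuation; the `phrases` helper and the two sample texts are made up for illustration.

```python
import re


def phrases(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return every run of n consecutive words (lower-cased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


ferrets = "Take a look at the cage. In the end, you can get a good ferret hutch cheaply."
swimming = "In the end, most adults can learn. Take a look at the local pool first."

shared = phrases(ferrets) & phrases(swimming)
print(sorted(" ".join(p) for p in shared))
# ['a look at', 'in the end', 'look at the', 'take a look']
```

Setting `n` to 2 or 4 gives a different answer to the "how long is a phrase?" question: the shorter the phrase, the more coincidental matches you would generally expect.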
These are all ordinary, everyday phrases which could be used on pages about breeding ferrets, learning to swim, pay per click advertising or anything else you can think of. Yet some people would say that the search engines do examine pages down to this level. For example, when I asked the support staff of one particular duplicate checker (DupeCop) on what basis it examined duplicate content, they said:
"DupeCop compares both individual words and 3-word phrases. It also ignores all punctuation and scans across sentences"
It was not a surprise, therefore, that when I … Whether the search engines also work down to this level, your guess would be as good as mine.
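DupeCop's actual algorithm is not described beyond the support staff's quote, so the following is only a hypothetical sketch of a checker that behaves as that quote describes: it compares individual words and 3-word phrases, ignores punctuation and lets phrases run across sentence boundaries. Scoring each comparison with Jaccard similarity (shared items divided by all distinct items) and rounding to two places are assumptions added for illustration.

```python
import re


def tokens(text: str) -> list[str]:
    # Ignore all punctuation: keep only lower-cased words.
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a: set, b: set) -> float:
    """Shared items divided by all distinct items (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare(page_a: str, page_b: str) -> dict[str, float]:
    wa, wb = tokens(page_a), tokens(page_b)
    # Phrases are built from the flat word stream, so a phrase can span
    # the end of one sentence and the start of the next.
    pa = {tuple(wa[i:i + 3]) for i in range(len(wa) - 2)}
    pb = {tuple(wb[i:i + 3]) for i in range(len(wb) - 2)}
    return {
        "words": round(jaccard(set(wa), set(wb)), 2),
        "phrases": round(jaccard(pa, pb), 2),
    }


original = "Ferrets love to play. Give them toys every day."
rewrite = "Ferrets love to play! Give them toys, every single day."
print(compare(original, rewrite))
# {'words': 0.9, 'phrases': 0.67}
```

Changing the punctuation makes no difference to either score, and the phrase "to play give" spans two sentences, which is what "scans across sentences" implies. Inserting the one extra word "single" costs noticeably more on the phrase score than on the word score.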
Over the years I have written hundreds of articles and watched the results closely for duplicate content penalties, as far as it is possible for anyone to do so. Based on my own experience, I am satisfied that filtering is not carried out down to the level of 3- or 4-word phrases but almost certainly stops at the sentence level. So, provided you rewrite articles down to this level, you should have no problem escaping the duplicate filters. In fact, even if a couple of sentences are duplicated, you ought to be okay.
WebMarketingCentre.com provides information on article writing and article submission. It is also an article directory where you can pick up free articles for your website or ezine and submit your own articles on a wide variety of topics, including article writing and much more.
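The author's conclusion can be expressed as a sketch of a sentence-level check. Everything here is an assumption for illustration: sentences are split at . ! ?, compared after lower-casing and dropping punctuation, and the `max_shared` threshold of two identical sentences is a hypothetical stand-in for "a couple of sentences", not a known search-engine limit.

```python
import re


def sentences(text: str) -> set[str]:
    """Split text at . ! ? and normalize each sentence for comparison."""
    parts = re.split(r"[.!?]+", text)
    # Lower-case and drop punctuation so trivial edits don't hide a copy.
    normalized = (" ".join(re.findall(r"[a-z0-9']+", p.lower())) for p in parts)
    return {s for s in normalized if s}


def duplicated_sentences(original: str, rewrite: str) -> int:
    """Count sentences that appear (after normalization) in both texts."""
    return len(sentences(original) & sentences(rewrite))


def looks_duplicate(original: str, rewrite: str, max_shared: int = 2) -> bool:
    # HYPOTHETICAL threshold: tolerate up to max_shared identical sentences.
    return duplicated_sentences(original, rewrite) > max_shared


original = ("Ferrets are playful. They need a large cage. "
            "Feed them twice a day. Clean the cage weekly.")
rewrite = ("Ferrets are playful. Give them plenty of room to roam. "
           "Feed them twice a day. A weekly clean keeps the cage fresh.")

print(duplicated_sentences(original, rewrite))  # 2
print(looks_duplicate(original, rewrite))       # False
print(looks_duplicate(original, original))      # True
```

A phrase-level checker would still see overlap in this rewrite (for example "ferrets are playful"), which is exactly where the two schools of thought part company.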