Search engines dislike duplicate content for a few reasons. One is that major search engines such as Google, Yahoo, MSN, and Ask aim to provide searchers with a diverse cross-section of unique content, and duplicate content often results in duplicate listings that impair the searcher’s experience. Another reason is that search engines don’t want to spend the resources (bandwidth) on indexing pages that are very similar.
In some instances, pages containing duplicate content are filtered at the time search engine results are sorted, so there is no guarantee as to which version of a page will appear in results and which won’t. Duplicate content may even hinder some sites and web pages from getting indexed by search engines, and there are some cases in which a search engine crawler will stop indexing all of the pages of a site because it finds too many copies of the same pages under different URLs.
While content duplication is sometimes used in an attempt to manipulate search engine rankings and garner more website traffic, in most cases it occurs without ill intent on the part of the site owner or webmaster. The following is a list of duplicate content scenarios that could be burdening your site.
Scenario #1: Ecommerce sites that include product descriptions from manufacturers, producers, and publishers
Product distribution websites often use text from the manufacturer or producer of the product as a description for the item on their own pages. With the addition of the product name, creator, manufacturer, writer, or recording artist appearing on the page, there is a considerable amount of duplicate content on pages that don’t originate from the same website. Here are some examples:
Scenario #2: Printer-friendly pages
Many sites offer “printer friendly” versions of their content on different pages. Without the application of robots.txt disallow statements or meta “noindex” tags on these pages to keep search engines from indexing them, they may be indexed as duplicate content. See these samples:
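To check that a robots.txt disallow rule actually covers your printer-friendly pages, a minimal Python sketch along the following lines may help. It uses the standard library's robots.txt parser. The `/print/` directory, the example.com URLs, and the `is_crawlable` helper are assumptions made for illustration, not details of any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: printer-friendly copies are assumed to live under /print/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these assumed rules, `is_crawlable("http://www.example.com/print/article-1.html")` should come back false while the regular article URL stays crawlable. If your printer-friendly pages do not share a common directory, a meta "noindex" tag on each page may be the simpler option.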
Scenario #3: Websites that create session IDs
A session ID lets you create customized applications for a more personalized user experience, thus increasing the appeal of your website. A visitor to your site would be assigned a unique session ID which is either stored in a cookie on the user side or is propagated in the URL.
Websites with session IDs serve information in their URLs to track visitors as they go through the pages of that site. When search engine crawlers detect this tracking information they may index the same page several times under different URLs. A good example of this is www.staples.com.
Search engine guidelines advise you to allow bots or spiders to crawl your sites without session IDs that track their path through the site. While this technique is great for tracking individual user behavior, the access pattern of bots is entirely different. Since bots cannot always decipher URLs that look different but point to the same page, the use of session IDs may result in incomplete indexing of your site.
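One way to see how session IDs multiply URLs is to strip them and compare what is left. The Python sketch below does that for session IDs carried in the query string. The parameter names in `SESSION_PARAMS` and the `strip_session_id` helper are hypothetical; substitute whatever names your own platform emits.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names, compared case-insensitively.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Return url with any session-tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URL with only the parameters that identify the page itself.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two URLs that differ only in their session ID reduce to the same address under this function, which is the version you would want a crawler to see.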
Scenario #4: URLs that include multiple data variables
When a URL contains multiple data variables, bots may crawl and index the same page under different URLs. Here are some examples of sites that show different data variables in their URLs.
It is difficult for a search engine bot or spider to crawl URLs like these. If this scenario applies to your website, you may want to use URL rewriting on your server, such as Apache's mod_rewrite module, to present simpler, consistent URLs.
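The idea behind URL rewriting can be sketched in a few lines of Python. This is only an analogue of what a server rewrite rule does, not actual server configuration. The `/products/<category>/<item>` pattern, the `product.php` script, and its `cat` and `id` variables are all invented for the example.

```python
import re

# Hypothetical clean-URL pattern: /products/<category>/<item>
CLEAN_URL = re.compile(r"^/products/(?P<category>\d+)/(?P<item>\d+)/?$")


def rewrite(path):
    """Map a clean path to the internal query-string URL, or return None."""
    match = CLEAN_URL.match(path)
    if match is None:
        return None  # not a product URL, so leave it untouched
    return "/product.php?cat={category}&id={item}".format(**match.groupdict())
```

Visitors and bots see a single clean address such as `/products/5/12`, while the server quietly translates it into the variable-laden URL the application expects.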
Scenario #5: Pages sharing similar elements
Some websites have elements that are very common from one page to another, such as title, meta descriptions, headings, navigation, and text that is shared sitewide. This can be a problem since bots might consider it to be duplicate content. Beware of this scenario if you own an ecommerce site that includes your brand name and information about that brand in every title on every page of your site. In addition, the use of content management systems that do not allow for distinct meta description tags to be placed on each page of a website can cause a similar dilemma.
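If you can export a list of your pages and their titles, a short script can flag the titles that repeat. The Python sketch below assumes you already have that list as a mapping from URL to title; the `find_shared_titles` helper and the sample data in the usage note are hypothetical.

```python
from collections import defaultdict


def find_shared_titles(pages):
    """Map each title used by more than one URL to the sorted URLs using it."""
    urls_by_title = defaultdict(list)
    for url, title in pages.items():
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        normalized = " ".join(title.lower().split())
        urls_by_title[normalized].append(url)
    return {
        title: sorted(urls)
        for title, urls in urls_by_title.items()
        if len(urls) > 1
    }
```

For example, passing `{"/a": "Acme Widgets", "/b": "acme  widgets", "/c": "Blue Widget | Acme"}` would report that `/a` and `/b` share a title. The same approach could be applied to meta descriptions.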
Here are two well-known websites that use their brand names on every page:
These five scenarios represent situations in which search engine crawlers may perceive your website to have duplicate content. Although it is probably inadvertent on your part, you should take steps to resolve these issues to ensure that all of your web pages are properly indexed on the search engines.
David Montalvo is the CEO of UnReal Web Marketing LLC. He has achieved over 150,000 top 10 positions for Fortune 500 companies since 1997. UnReal Web Marketing is an Internet firm specializing in website design and development, search engine optimization (SEO), pay-per-click management (PPC), e-commerce solutions, and website analytics. In addition, we provide our clients with keyword research, SEO copywriting, link building strategies and email marketing.
The diverse talent at UnReal Web Marketing has over 50 years of combined experience in Internet marketing and web design, and has been instrumental in creating and optimizing more than 1,350 websites generating close to $82 million in sales for small to mid-sized companies throughout the United States.