While often very complex in their calculations and data processing, the critical operations the major search engines perform in order to rank websites aren't as lengthy as one might think. The process they use to provide relevant results for a web search can best be described in the following four steps.
- Send out the Web Crawlers
Search engines use invisible "bots" or "spiders" (really just programs or automated scripts) that browse, or "crawl," the World Wide Web in a methodical, automated manner. Search engines use spidering as a means of providing up-to-date data. This type of technology is necessary because the rate at which people create new Internet documents greatly exceeds any manual indexing capacity. In fact, an estimated 20 billion web pages exist, and search engines have crawled about half of them.
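The link-following behavior described above can be sketched in a few lines. This is only a toy illustration, not any real engine's crawler: it extracts the hyperlinks from a hard-coded, hypothetical page body, which is the first thing a spider does after fetching a page over HTTP.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag -- the hyperlinks a
    spider would follow to discover new pages."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A hypothetical page body; a real spider would fetch this over HTTP.
page = '<html><body><a href="/about.html">About</a> <a href="/contact.html">Contact</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/about.html', '/contact.html']
```

A real crawler repeats this in a loop: fetch a page, extract its links, add the unvisited ones to a queue, and continue, which is why plain HTML links matter so much for discoverability.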
- Index the Pages
After a spider crawls a web page, the search engine makes a copy of the page and adds it to its database. This process is known as indexing. With so many search queries submitted each minute, search engines must manage their indexes efficiently so that they can search and sort billions of documents in fractions of a second.
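The classic data structure behind fast lookup over billions of documents is an inverted index: instead of scanning every page for a word, the engine stores, for each word, the set of pages that contain it. A minimal sketch, using a made-up three-page corpus:

```python
from collections import defaultdict

# Toy corpus standing in for copies of crawled pages.
pages = {
    "page1.html": "fresh apple pie recipe",
    "page2.html": "apple orchard tours",
    "page3.html": "pie crust tips",
}

# Inverted index: each word maps to the set of pages containing it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

print(sorted(index["apple"]))  # ['page1.html', 'page2.html']
```

With this structure, answering "which pages mention apple?" is a single dictionary lookup rather than a scan of the whole database.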
- Process Queries
Search engines process hundreds of millions of search queries every day. When someone keys in a search term and clicks "Search," the engine retrieves from its index all of the documents that match the query. A document matches when it contains the terms or phrase entered into the search bar. Entering a multi-word phrase by itself can return literally millions of results, but entering that same phrase in quotes can greatly narrow the results, giving the user a more accurate listing of websites that relate to their particular search.
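The difference between an unquoted and a quoted query can be shown with a toy matcher (hypothetical pages and query; real engines are far more sophisticated):

```python
pages = {
    "a.html": "cheap flights to new york city",
    "b.html": "new blog about york city history",  # has the words, not the phrase
}

query = "new york city"
terms = query.split()

# Unquoted search: any page containing all of the terms, in any order.
all_terms = [u for u, t in pages.items() if all(w in t.split() for w in terms)]

# Quoted search: only pages containing the exact phrase.
exact = [u for u, t in pages.items() if query in t]

print(all_terms)  # ['a.html', 'b.html']
print(exact)      # ['a.html']
```

Both pages contain the words "new," "york," and "city," so both match the unquoted query; only the page with the exact phrase matches the quoted one, which is why quotes narrow results so sharply.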
- Rank Pages
Each search engine uses a very closely guarded set of rules and calculations, called an algorithm, to determine how to sort and rank search query results. This algorithm allows the engine to rank the most relevant web pages first, and the rest in descending order of importance to the user.
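The real ranking algorithms are secret and combine hundreds of signals, but the basic idea of scoring and sorting can be sketched with a deliberately naive scorer that just counts query-term occurrences:

```python
def score(text, terms):
    """Toy relevance score: how many times the query terms appear.
    Real engines combine hundreds of secret signals, not just this."""
    words = text.split()
    return sum(words.count(t) for t in terms)

# Hypothetical pages returned from the index for the query "apple".
pages = {
    "a.html": "apple pie apple tart",
    "b.html": "apple juice",
    "c.html": "banana bread",
}
query = ["apple"]

ranked = sorted(pages, key=lambda u: score(pages[u], query), reverse=True)
print(ranked)  # ['a.html', 'b.html', 'c.html']
```

The page mentioning "apple" twice sorts first, the one mentioning it once second, and the irrelevant page last; production algorithms replace the scoring function with something vastly more elaborate, but the sort-by-score shape is the same.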
What You Can Do for Your Website: Avoid Speed Bumps & Walls
You may not know it, but you could be hindering or preventing your website from being crawled by search engine spiders. As spiders crawl the web, they rely on the architecture of hyperlinks to find new web pages and revisit those that may have changed. Complex links and deep site structures with little unique content may act as “speed bumps” in the process by slowing down the spiders. Even worse, data that cannot be accessed by web crawlers are really like “walls” in that they completely prevent your web pages from being ranked.
Beware of the Following “Speed Bumps”:
- URLs with 2+ dynamic parameters, e.g. http://www.url.com/page.php?id=4&CK=34rr&User=%Tom% (spiders may be reluctant to crawl complex URLs like this because they often produce errors for non-human visitors)
- Pages with more than 100 unique links to other pages on the site (spiders may not follow each one)
- Pages buried more than 3 clicks/links from the home page of a website (unless there are many other external links pointing to the site, spiders will often ignore deep pages)
- Pages requiring a “Session ID” or Cookie to enable navigation (spiders may not be able to retain these elements as a browser user can)
- Pages split into "frames" (these can hinder crawling and cause confusion about which page to rank in the results)
Beware of the Following “Walls”:
- Pages accessible only via a select form and submit button
- Pages accessible only through a drop-down menu (an HTML form element)
- Documents accessible only via a search box
- Documents blocked purposefully (via a robot meta tag or robots.txt file)
- Pages requiring a login
- Pages that redirect before showing content (search engines call this "cloaking" or "bait-and-switch" and may actually ban sites that use this tactic)
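The "blocked purposefully" wall above is worth a closer look, because it is the one wall you control directly through a robots.txt file. Python's standard library can check what a given robots.txt allows; here is a sketch using a hypothetical file that blocks one directory:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that purposefully blocks one directory.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "http://www.url.com/private/report.html"))  # False
print(rp.can_fetch("*", "http://www.url.com/index.html"))           # True
```

Running a check like this against your own robots.txt is a quick way to confirm you haven't accidentally walled off pages you actually want indexed.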
In order to avoid the above pitfalls and ensure that your website's contents are fully crawlable, be sure to provide direct HTML links to each page you want the search engine spiders to index. Remember to make every page of your site accessible from the home page, since the home page is usually the place spiders begin their crawl. It's also a good idea to add a sitemap to your website to make it easier to navigate.
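A sitemap for search engines is typically an XML file listing the URLs you want crawled. A minimal sketch of generating one with Python's standard library, using made-up URLs (the `urlset` format follows the public sitemaps.org protocol):

```python
import xml.etree.ElementTree as ET

# Hypothetical list of pages you want spiders to discover.
urls = ["http://www.url.com/", "http://www.url.com/about.html"]

# Root element per the sitemaps.org protocol.
urlset = ET.Element("urlset",
                    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for u in urls:
    entry = ET.SubElement(urlset, "url")
    loc = ET.SubElement(entry, "loc")
    loc.text = u

sitemap = ET.tostring(urlset, encoding="unicode")
print(sitemap)
```

Saving the output as sitemap.xml at the root of the site gives spiders a single, direct list of every page, no matter how deep it sits in the click structure.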
This article was written by David Montalvo. David Montalvo is the CEO of UnReal Web Marketing LLC. He has achieved over 250,000 top 10 positions for Fortune 500 companies since 1997.