How Google page crawling works
How Search Works
Crawling and indexing lay the foundation of Search: they're how Google gathers and organizes information on the web so it can return the most useful results. According to Google, the index is well over 100,000,000 gigabytes and took more than one million computing hours to build.
Finding information by crawling
Google uses software known as "web crawlers" to discover publicly available webpages. The best-known crawler is called "Googlebot."
Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google's servers.
The crawl process begins with a list of web addresses from past crawls and sitemaps provided by website owners.
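For illustration, here is a small Python sketch, not Google's actual code, showing how a seed list could be built from a standard sitemap.xml file. The sitemap address in the example is a placeholder.

```python
# Illustrative sketch only: builds a crawl seed list from an XML sitemap.
# The sitemap URL below is a placeholder, not anything Google actually uses.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def seed_urls_from_sitemap(sitemap_url):
    """Fetch a sitemap.xml file and return the <loc> URLs it lists."""
    with urllib.request.urlopen(sitemap_url) as response:
        tree = ET.parse(response)
    # Each <loc> entry names one page the site owner wants crawlers to know about.
    return [loc.text.strip() for loc in tree.iter(SITEMAP_NS + "loc") if loc.text]

if __name__ == "__main__":
    # Hypothetical sitemap address used only for this example.
    for url in seed_urls_from_sitemap("https://example.com/sitemap.xml"):
        print(url)
```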
As Google's crawlers visit these websites, they look for links to other pages to visit. The software pays special attention to new sites, changes to existing sites, and dead links.
Computer programs determine which sites to crawl, how often, and how many pages to fetch from each site.
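Google doesn't publish that scheduling logic, but a toy version makes the idea easier to picture. The sketch below is only an assumption-based illustration: it enforces a per-host page budget and a minimum delay between visits to the same host, which is one simple way "how often" and "how many pages" could be controlled.

```python
# Toy crawl scheduler: per-host page budgets and revisit delays.
# An illustrative sketch, not how Google actually schedules crawls.
import time
from collections import defaultdict, deque
from urllib.parse import urlparse

class CrawlScheduler:
    def __init__(self, max_pages_per_host=100, min_revisit_seconds=3600):
        self.max_pages_per_host = max_pages_per_host
        self.min_revisit_seconds = min_revisit_seconds
        self.queue = deque()                   # URLs waiting to be crawled
        self.pages_fetched = defaultdict(int)  # pages already taken from each host
        self.last_fetch_time = {}              # when each host was last visited

    def add(self, url):
        self.queue.append(url)

    def next_url(self):
        """Return the next URL whose host is under budget and past its revisit delay."""
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            host = urlparse(url).netloc
            under_budget = self.pages_fetched[host] < self.max_pages_per_host
            cooled_down = time.time() - self.last_fetch_time.get(host, 0) >= self.min_revisit_seconds
            if under_budget and cooled_down:
                self.pages_fetched[host] += 1
                self.last_fetch_time[host] = time.time()
                return url
            # Not eligible yet: push it to the back and try another URL.
            self.queue.append(url)
        return None
```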
Google doesn't accept payment to crawl a site more frequently for its web search results. It cares more about having the best possible results because, in the long run, that's what's best for users and, therefore, for its business.
Googlebot
Googlebot is Google's web crawling bot. Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index.
Google uses a huge set of computers to fetch billions of pages on the web. Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. As Googlebot visits each of these websites, it detects links (SRC and HREF) on each page and adds them to its list of pages to crawl.
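To make the HREF/SRC step concrete, the sketch below uses Python's built-in html.parser to collect href and src values from a page and resolve them against the page's address. It illustrates link discovery in general, not Googlebot's actual parser.

```python
# Illustrative link extractor: collects href and src attribute values from HTML.
# A sketch of the general idea, not Googlebot's actual parsing code.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Both href (links) and src (embedded resources) point at URLs to crawl.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative URLs against the page they were found on.
                self.links.append(urljoin(self.base_url, value))

extractor = LinkExtractor("https://example.com/")
extractor.feed('<a href="/about">About</a> <img src="logo.png">')
print(extractor.links)  # ['https://example.com/about', 'https://example.com/logo.png']
```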
Source: Google Inside Search, How Search Works