The Surface Web (also known as the visible Web or the indexable
Web) is the portion of the World Wide Web that conventional search engines can
index. The part of the Web that cannot be reached this way is called the Deep
Web. Search engines build their index of the Web using programs called spiders
or Web crawlers, which begin with a list of known Web pages. The spider fetches
a copy of each page and indexes it, storing the information needed to retrieve
the page quickly later. Any hyperlinks to new pages are added to the list of
pages to be crawled. Eventually all reachable pages are indexed, unless the
crawler runs out of time or storage. The collection of pages reachable in this
way constitutes the Surface Web.
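
As a concrete illustration, the following minimal sketch implements the crawl
loop described above: a frontier of URLs waiting to be visited, a record of
pages already indexed, and link extraction that feeds newly discovered pages
back into the frontier. The seed URLs, the page limit, and the in-memory
"index" are illustrative assumptions, not features of any particular search
engine.

    import urllib.request
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkExtractor(HTMLParser):
        """Collects the href targets of anchor tags on a page."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        # Resolve relative links against the current page.
                        self.links.append(urljoin(self.base_url, value))

    def crawl(seed_urls, max_pages=100):
        """Breadth-first crawl: fetch, index, enqueue newly found links."""
        frontier = deque(seed_urls)   # pages waiting to be crawled
        indexed = {}                  # url -> page text (stand-in for a real index)
        while frontier and len(indexed) < max_pages:
            url = frontier.popleft()
            if url in indexed:
                continue              # already crawled this page
            try:
                with urllib.request.urlopen(url, timeout=10) as response:
                    html = response.read().decode("utf-8", errors="replace")
            except OSError:
                continue              # unreachable pages are simply skipped
            indexed[url] = html       # "index" the page
            extractor = LinkExtractor(url)
            extractor.feed(html)
            for link in extractor.links:
                if link not in indexed:
                    frontier.append(link)
        return indexed

The set of pages such a loop can ever reach from its seeds is, in the sense
defined above, the Surface Web.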
For various reasons (e.g., the Robots Exclusion Standard,
links generated by JavaScript and Flash, password protection), some pages
cannot be reached by the spider. These 'invisible' pages are referred to as
the Deep Web.
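
The Robots Exclusion Standard mentioned above is a convention rather than an
enforcement mechanism: a well-behaved crawler fetches a site's robots.txt file
and checks each URL against its rules before downloading. Python's standard
urllib.robotparser module performs this check; the domain and user-agent
string below are illustrative.

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    # Load the site's crawl rules (the domain here is an example).
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A polite spider consults the rules before fetching each page.
    if rp.can_fetch("ExampleCrawler/1.0", "https://example.com/private/page.html"):
        print("Allowed to crawl this page")
    else:
        print("Disallowed; the page stays in the Deep Web for this crawler")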
A 2005 study queried the Google, MSN, Yahoo!, and Ask Jeeves search engines
with search terms from 75 different languages and determined that there were
over 11.5 billion Web pages in the publicly indexable Web as of January 2005.
As of June 2008, the indexed Web contained at least 63 billion pages.