Google searches 30 trillion web pages, 100 billion times a month

Here's basically how it works.

How big is the Web? One way to look at it is this: there are roughly 4,348 pages out there for each of the 6.9 billion people on the planet, or, to put it another way, 30 trillion pages. And there will be a lot more tomorrow. Given the immensity of the Web, it seems nothing short of magic that Google’s search, imperfect as it is, still indexes it as well as it does. The Google search blog is often very interesting, and a recent post gives us some insight into how the engine works and how it identifies and kills spam.

As you may know, Google sends out “robots,” little programs that crawl across the Web, following links from page to page, sorting pages by content and other factors, and adding information to an index. That index is immense, taking up over 100 million gigabytes. Even so, not every page on the Web is indexed. When the robots arrive at a site, they look for a file called robots.txt, which can tell them to stay away from some or all of the site’s pages. If the file is there and contains instructions placed by the site’s authorized Webmaster, the Google robot will respect them and not crawl those pages.

When you type something in a search box, formulas called algorithms evaluate your query and pull relevant pages from the index. Exactly how those pages are ranked is a closely guarded secret, but Google does say that it uses over 200 factors to do so. Results are typically served up in about one-eighth of a second.

Humans, of course, do not enter the picture when results are served, but Google uses a corps of trained people who test searches to evaluate their accuracy. In a typical year, the company says, it runs over 40,000 such evaluations.

Most spam removal is automatic, but some questionable pages are examined by hand. Google looks for quite a few factors that indicate spam. Hidden text and “keyword stuffing” are clues that a page is bogus, as is user-generated spam that appears on forum or guestbook pages or in user profiles.
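The crawl-then-index pipeline described above can be illustrated with a toy model. This is a minimal sketch assuming a hypothetical four-page, in-memory “Web”; the `WEB` dictionary and the `crawl`, `build_index` and `search` helpers are illustrative names, not anything Google has published. It does a breadth-first crawl from a seed page, builds an inverted index mapping each word to the pages that contain it, and ranks matches by a simple term count, which merely stands in for Google’s 200-plus undisclosed ranking factors.

```python
from collections import Counter, defaultdict, deque

# A hypothetical four-page "Web": each URL maps to (page text, outgoing links).
WEB = {
    "a.html": ("google search crawls the web", ["b.html", "c.html"]),
    "b.html": ("robots follow links from page to page", ["c.html"]),
    "c.html": ("the index maps words to pages on the web", ["a.html"]),
    "d.html": ("an orphan page that nothing links to", []),
}


def crawl(seed):
    """Breadth-first crawl: visit every page reachable from seed exactly once."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        _text, links = WEB[url]
        for link in links:
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls):
    """Inverted index: word -> Counter mapping each URL to how often the word appears."""
    index = defaultdict(Counter)
    for url in urls:
        text, _links = WEB[url]
        for word in text.split():
            index[word][url] += 1
    return index


def search(index, query):
    """Return crawled pages containing every query word, highest total term count first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], ()))
    for word in words[1:]:
        matches &= set(index.get(word, ()))
    # Toy ranking: total occurrences of the query words; ties broken alphabetically.
    return sorted(matches, key=lambda url: (-sum(index[w][url] for w in words), url))


pages = crawl("a.html")
index = build_index(pages)
print(pages)                     # ['a.html', 'b.html', 'c.html']
print(search(index, "the web"))  # ['c.html', 'a.html']
```

Starting from a.html, the crawler reaches a, b and c but never finds d.html, because nothing links to it. That is one simple way a page can end up missing from an index.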
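The robots.txt check can be tried with Python’s standard-library `urllib.robotparser`. Strictly speaking, robots.txt governs crawling rather than indexing: it tells a well-behaved robot which URLs to stay away from. This sketch parses a made-up robots.txt (the rules and the `example.com` URLs are assumptions for illustration) instead of fetching a real one over the network.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every robot ("*") is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url, user_agent="Googlebot"):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a live crawler would call set_url() and read() instead.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/public/index.html"))    # True
print(allowed("https://example.com/private/report.html"))  # False
```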
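Keyword stuffing is the easiest of these spam signals to illustrate. Google does not publish how it detects it, so the following is only a hypothetical heuristic: it measures how much of a page’s non-trivial vocabulary is taken up by its single most repeated word and flags the page when that share crosses an arbitrary threshold. The `threshold=0.2`, `min_words=10` and the small stop-word list are assumptions, not Google’s values.

```python
import re
from collections import Counter

# Hypothetical stop-word list: very common words that say little about a page's topic.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "from", "that", "in", "on", "is"}


def content_words(text):
    """Lower-case the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def keyword_density(text):
    """Return (most repeated content word, its share of all content words)."""
    words = content_words(text)
    if not words:
        return "", 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)


def looks_stuffed(text, threshold=0.2, min_words=10):
    """Flag text whose top keyword exceeds `threshold` of its content words.

    Very short texts are never flagged, since a handful of words
    cannot give a meaningful density.
    """
    if len(content_words(text)) < min_words:
        return False
    _word, density = keyword_density(text)
    return density > threshold


normal = ("Google sends out robots that crawl the Web, following links "
          "from page to page and adding information to an index.")
stuffed = ("cheap watches cheap watches buy cheap watches best "
           "cheap watches cheap watches online")
print(looks_stuffed(normal))   # False
print(looks_stuffed(stuffed))  # True
```

Real spam detection weighs many signals together; a single density threshold like this one would misfire on, say, a short glossary entry that legitimately repeats its term.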
Last year, Google launched an update to its anti-spam algorithm called Penguin, which lowers the rankings of sites that use what it calls Webspam tactics. When Google is going to take action against a site, it attempts to find and notify the owners and gives them a chance to fix the problem. The number of those notices varies quite a bit, but in one particularly busy month last year, Google sent out more than 650,000 of them.

As important as search results are to users, they can be a matter of life and death to a commercial Web site. They have a huge impact on how much traffic a site gets, and that, in turn, affects ad revenue. Anyone who runs a commercial site (including this one) spends a good deal of time trying to figure out ways to rank high in searches, or, in the case of news sites (like this one), how to be included in results on Google News.