When you work for a search engine, however, you get a different point of view. Because you are literally crawling and analyzing the entire internet constantly (well, most of it anyway), you get a unique top-down view of what's out there: the good, the bad and the ugly. Unfortunately, the good seems to be getting crowded out by the bad and the ugly.
More specifically, what we're seeing at blekko is a non-stop firehose of web spam: millions of pages generated every day solely for the purpose of getting indexed by the major search engines and siphoning traffic from them. Of course, the goal of this traffic is not to inform users but to monetize them through a variety of ad networks.
Unless you live this every day, it's hard to communicate the size and breadth of this problem. In these instances, a graphic is helpful. Thus, we created the spam clock. As you can see, based on our calculations, spam is growing at a rate of 1 million pages per hour.
Think about that: 1 million pages an hour. Wikipedia has 3.5 million articles. Every 3.5 hours, a new volume of text the size of Wikipedia is unleashed on the internet and its unsuspecting users. Every 3.5 hours. That's roughly seven Wikipedia-sized corpuses being created every day. Ugh.
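For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The two constants are the figures quoted above; the function names are ours and purely illustrative, and treating one spam page as equivalent to one Wikipedia article is a rough simplification, not a measurement.

```python
# Figures quoted in the post (estimates, not exact measurements).
SPAM_PAGES_PER_HOUR = 1_000_000
WIKIPEDIA_ARTICLES = 3_500_000


def hours_per_wikipedia(pages_per_hour, wikipedia_articles):
    """Hours needed to generate as many spam pages as Wikipedia has articles."""
    return wikipedia_articles / pages_per_hour


def wikipedias_per_day(pages_per_hour, wikipedia_articles):
    """Number of Wikipedia-sized batches of spam pages produced in 24 hours."""
    return 24 * pages_per_hour / wikipedia_articles


hours = hours_per_wikipedia(SPAM_PAGES_PER_HOUR, WIKIPEDIA_ARTICLES)
per_day = wikipedias_per_day(SPAM_PAGES_PER_HOUR, WIKIPEDIA_ARTICLES)

print(f"{hours:.1f} hours per Wikipedia-sized batch")
print(f"{per_day:.2f} Wikipedia-sized batches per day")
```

The second figure comes out just under seven, which is where the "roughly seven a day" number comes from.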
Lest you think otherwise, the war on spam is far from over and the enemy is hardly backing down.