Automated scripts inflating IE's market share?
Las Vegas (NV) - A new Web browser market share study released this morning by Janco Associates claims the advance of the Mozilla Firefox Web browser may be subsiding. But the report’s release comes on the heels of new studies by leading Internet engineers that call into question how browser usage is being counted. In fact, Firefox market share may be much higher than some current studies claim.
The new numbers from Janco show Microsoft Internet Explorer to be responsible for 85.07 percent of the traffic among the businesses the company tests, with Firefox claiming 8.83 percent. But as the report states, and as Tom’s Hardware Guide confirmed this afternoon, Janco’s study is restricted to a total of 11 domain names, eight of which are sites Janco Associates operates, and the other three of which belong to Janco clients. Today, M. Victor Janulaitis, the company’s CEO, told us that his study accounts for 371,000 page views per month from Janco-related traffic, plus an unspecified number from his other clients.
Janulaitis told us his Web sites generate primarily business-to-business and commercial traffic. He has not performed any analysis of what percentage of overall usage his traffic represents, he told us, though he said he receives quite a bit of traffic from Kuwait, Saudi Arabia, and Eastern Europe.
Data from a personal experiment published last week by Tim Bray, Sun Microsystems’ Director of Web Technologies - heralded by many as the co-creator of XML - indicate that many Web browser usage studies may give too little credit to Mozilla Firefox. As reported on Bray’s personal blog, "ongoing", his study shows that by restricting traffic samples to referrals that may reliably be attributed to human beings, as opposed to automated browsing scripts, administrators may get a vastly altered perspective of their users’ browser geography.
Last April, on the advice of one of his readers, Bray devised a script which examines his own site’s referrer logs and filters out traffic from so-called "referrer spambots" - automated browsers that pretend to have followed a link to the site when no such link actually exists. The script extracts from his referrer logs only those referrals that appear to have come from search engines. Such referrals contain the search terms that living, breathing people use to link to a Web site. Bray estimates that, as of 7 July, 70 percent of his site’s legitimate referrals come from users of Internet Explorer, a drop of 8 percent from figures recompiled from 12 months ago. Meanwhile, referrals from Firefox users - which tend not to be spambots anyway - rose in the same period from 14 percent to 23 percent.
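Bray’s actual script is not reproduced here, but the idea of keeping only search-engine referrals that carry search terms can be sketched roughly as follows. This is a minimal Python illustration, not his implementation: the list of search-engine hosts, the query parameter names, and the crude User-Agent classification are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed mapping of search-engine host fragments to the query parameter
# that carries the search terms. Illustrative only, not Bray's list.
SEARCH_PARAMS = {"google.": "q", "search.yahoo.": "p", "search.msn.": "q"}


def search_terms(referrer):
    """Return the search terms in a search-engine referrer, or None."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for fragment, param in SEARCH_PARAMS.items():
        if fragment in host:
            values = parse_qs(parsed.query).get(param)
            if values and values[0].strip():
                return values[0].strip()
    return None


def browser_family(user_agent):
    """Very rough User-Agent classification (hypothetical helper)."""
    if "Firefox" in user_agent:
        return "Firefox"
    if "MSIE" in user_agent:
        return "IE"
    return "Other"


def browser_share(hits):
    """hits: iterable of (referrer, user_agent) pairs from a log.

    Only hits whose referrer looks like a search with terms are counted.
    Returns a dict mapping browser family to its percentage of those hits.
    """
    counts = Counter(
        browser_family(agent) for referrer, agent in hits if search_terms(referrer)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * n / total for family, n in counts.items()}
```

A referral claiming to come from a poker site has no recognizable search terms, so it is dropped before any browser is counted, whatever User-Agent it reports.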
Critics of Bray’s study point out that his figures are not representative of the broader Internet, which Bray concedes. However, with the release of new data by Janco Associates this morning, Bray’s study could quite possibly be on a par with Janco’s, and could be construed as at least as representative of the Internet at large as the Janco study.
"There are a huge number of spambot referrers out there now," Bray told Tom’s Hardware Guide. "If you go look at your log files, you’ll find lots of bogus visitors to your site apparently coming from Viagra vendors and online poker sites. For some reason, all of those bogus visitors claim to be Internet Explorer. So it’s fair to assume that some people who are not aware of that may be counting IE a little high."
Each month, the Web developer site W3Schools.com publishes an analysis of its own Web traffic, and reports that for the month of June, 71.8 percent of its traffic came from IE browsers, with Firefox responsible for 20.7 percent. The site explicitly states that its studies are only representative of its own user community, which it admits is composed largely of the developers and enthusiasts that it attracts. However, that didn’t stop at least one leading financial research firm and holdings company last November from citing W3Schools’ report as an official Web browser market share survey.
For its part, the Janco report includes a footnote stating, "To the best of our knowledge, excluded from this data are all ’non-human’ interactions such as spiders and ’bots’ which we were able to identify." In addition, Janulaitis told us, his study excludes traffic from IP addresses which make too many redundant requests within a given cycle, as well as from RSS readers such as Pluck. But Pluck is an RSS-enabled plug-in, originally for Internet Explorer, though a beta edition for Firefox was recently made available. Conceivably, RSS traffic could come from any number of aggregators, including Pluck, NewzCrawler - which uses IE as its renderer by default - and Sage, a plug-in for Firefox. Some of Janco’s sites do include RSS feeds.
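Janulaitis did not specify how many requests count as "too many" or how long a "cycle" is, so the following Python sketch only illustrates the general idea of excluding addresses that repeat the same request excessively. The threshold value and the function names are hypothetical.

```python
from collections import Counter


def noisy_ips(requests, max_repeats=20):
    """requests: iterable of (ip, url) pairs from one reporting cycle.

    Returns the set of IPs that requested any single URL more than
    max_repeats times. The threshold is an assumed, tunable value.
    """
    repeats = Counter(requests)
    return {ip for (ip, url), n in repeats.items() if n > max_repeats}


def drop_noisy(requests, max_repeats=20):
    """Return the requests with all traffic from noisy IPs removed."""
    requests = list(requests)
    banned = noisy_ips(requests, max_repeats)
    return [(ip, url) for ip, url in requests if ip not in banned]
```

Note that a filter like this removes every request from a flagged address, including any legitimate page views that happened to share it.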
The most quoted Web analysts today include WebSideStory, which on 29 April projected IE’s US market share at 88.86 percent, with Firefox at 6.75 percent. Meanwhile, last May, NetApplications.com reported IE’s worldwide market share at 87.23 percent, with Firefox at 8.06 percent. Representatives for both firms told Tom’s Hardware Guide today that they aggressively filter out non-human usage. Erik Bratt, marketing communications manager for WebSideStory, told us that his company’s methodology only counts page accesses where the complete page is loaded. Spambots and other non-human sources typically access only the URL’s main page, skipping embedded images. Some Web engineers - including Bray’s reader - confirm this, and have created measurement tricks such as "image traps" that only non-co-opted browsers will load - in effect, embedded images that human users would never see, although their browsers download them anyway. By downloading them, those browsers confirm the validity of their own traffic.
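Neither WebSideStory’s methodology nor the "image trap" implementations are described in detail, but the underlying check (count a page view only if the same client also fetched embedded images) can be approximated from a request log. The Python sketch below is an assumption-laden illustration: the client identifier, the image suffix list, and the function names are all hypothetical.

```python
# Assumed set of file suffixes treated as embedded images.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def is_image(path):
    """True if the requested path looks like an embedded image."""
    return path.lower().endswith(IMAGE_SUFFIXES)


def full_load_page_views(requests):
    """requests: iterable of (client_id, path) pairs.

    Keeps page requests only from clients that also fetched at least
    one image, as a rough stand-in for "the complete page was loaded".
    """
    requests = list(requests)
    image_loaders = {client for client, path in requests if is_image(path)}
    return [
        (client, path)
        for client, path in requests
        if client in image_loaders and not is_image(path)
    ]
```

A client that requests only the main page and never an image contributes nothing to the resulting page-view count.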
Dan Shapero, Chief Operating Officer for NetApplications.com, told us his company’s methodology excludes any usage coming from browsers that do not accept the analytics software’s cookies. But Tim Bray told us today this might not be the wisest choice. Referring to WebSideStory’s and NetApplications’ techniques, Bray said, "I think both the image and cookie techniques are valuable, but not complete solutions. It’s easy enough to program a robot to load images and accept cookies, so if someone wants to game the system they can. On the other hand, their numbers are going to be better than raw numbers."
According to Tom’s Hardware Guide’s most recent logs, 50.4 percent of our current traffic reports itself as Internet Explorer, with Firefox garnering 28.8 percent. Our numbers are not filtered for spambots, however; if most spambots do represent themselves as IE, our results may be even more surprising for Firefox. The rate of increase for Firefox users in our logs hovers at around 1 percent per month - a rate which corresponds to numbers Bray has seen for his site and others, both before and after spambot filtration is applied. Tom’s Hardware Guide currently generates over 50 million page views per month for its more than three million readers.
Why would such things as referrer spambots even exist? In May 2001, at about the time when their existence was first detected, a contributor to the popular Kuro5hin technology blog presented the theory that devious marketers may be writing scripts that intentionally plant their own URLs in sites’ referrer logs to make it appear that they are actually sending those sites legitimate traffic, in order to make advertising on those URLs seem like a more attractive prospect.