The Solution Requires A Pocket Calculator
So how should Web sites and service providers cope with these increasing discrepancies? In the near term, Forrester's Chatham suggests, customers should migrate to analytics applications that generate first-party cookies - data files that attribute themselves to the originating Web site, thereby distinguishing themselves from adware. Coremetrics' announcement yesterday advised its customers to do the same, citing the next version of its LIVEmark measurement system as one way to accomplish this. Peterson added WebTrends to this list, noting that version 7.5 of its software, the newest, "improves companies' ability to migrate from third- to first-party cookies."
But as the techniques with which analytics software migrates to a first-party approach become more widely distributed, spyware and adware could conceivably follow suit, leading to a future situation where Web users reject all or most cookies, or refuse selected cookies from sources they're not certain they trust. Both Internet Explorer and Firefox already provide users with the tools they would need to accomplish precisely that. So the first-party approach may be a bandage for the problem, but not a remedy.
"So what could we do, if we can't track everyone?" asked Peterson. The question was not rhetorical; he proposed this solution: college statistics. "We could just figure out what the sample size is, and just do the stats... If [Web sites] knew the answer to that single question, then they would know how much of a sample they had, how statistically relevant that sample was, and what the margin of error was." This margin of error, he suggested, could then be incorporated in sites' reports to their directors of marketing.
A sampling error level, Peterson stated, could be estimated by evaluating the number of users who log into a site that requires explicit logins, measured against the data reported by cookies accessed through those same users. The difference between the cookie data and the actual number of logins, he said, could provide a reliable sampling error level. Provided the usage sample represented a fairly random slice of Internet usage, he explained, that same level could be applied to any other sample obtained by any other means. The result would be a corrected estimate that, like a Gallup poll, would come equipped with a reasonable margin of error.
"If somebody - [be it] NetRatings or Jupiter Research - were to publish data that said, 'In the publishing vertical, in the technology publishing sub-vertical, cookie deletion seems to be trending around 18% month-over-month,' you could take your [unique users] number, adjust it downward by 18%, and know pretty accurately, based on a statistical sample, how big your audience was."
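The arithmetic Peterson describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the login and cookie counts are invented for the example, and the 18% figure is the hypothetical rate from his quote rather than published data.

```python
def cookie_inflation_rate(logged_in_users: int, cookies_seen: int) -> float:
    """Estimate the share of cookie-based 'uniques' that are really repeats.

    Assumes each login maps to one real person and that the extra cookies
    come from those same people deleting or rejecting cookies.
    """
    if logged_in_users <= 0 or cookies_seen <= 0:
        raise ValueError("counts must be positive")
    if logged_in_users > cookies_seen:
        raise ValueError("expected at least one cookie per logged-in user")
    return 1 - logged_in_users / cookies_seen


def adjusted_uniques(reported_uniques: int, inflation_rate: float) -> int:
    """Adjust a cookie-based unique-visitor count downward by the given rate."""
    if not 0 <= inflation_rate < 1:
        raise ValueError("inflation_rate must be in [0, 1)")
    return round(reported_uniques * (1 - inflation_rate))


# Hypothetical panel: 8,200 logged-in accounts produced 10,000 distinct cookies.
rate = cookie_inflation_rate(logged_in_users=8_200, cookies_seen=10_000)

# Hypothetical site reporting 500,000 cookie-based unique users.
estimate = adjusted_uniques(500_000, 0.18)
print(f"inflation rate ~ {rate:.0%}, adjusted audience ~ {estimate:,}")
```

The sketch only holds if the logged-in panel behaves like the rest of the audience, which is the "fairly random slice" condition Peterson attached to the idea.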
If the long-term solution is as simple as applying college statistics, what resistance could there possibly be to implementing it? Unfortunately, there may be two major obstacles. The first, which Peterson pointed out, is that it involves restating audience numbers. Since cookie rejection has been increasing rapidly - doubling, by some estimates, over the past 12 months - such restatements could conceivably cancel out reported audience gains, said Peterson, and perhaps result in restatements that reflect audience losses.
If companies were compelled to suddenly account for discrepancies in usage traffic, the amount of the one-time adjustments they would have to make could eclipse the level of declining usage projections that triggered the first great Internet fallout in 2000. Advertising revenues could decline significantly and, as occurred five years ago, some Web publishers could find themselves exiting the business.
The second obstacle is this: If these discrepancy intervals were made public, they could apply as well to analytics vendors' data as to the category of log file analysis data those vendors originally sought to replace. One of the key value advantages of ASP analytics could conceivably be eliminated with a trick of college math. And with just a bit more college math applied, smaller usage samples - such as the 371,000 page views that Janco Associates admitted to measuring in the report we covered last week - could actually become more statistically relevant.
"If it's truly random, and truly representative," admitted StatMarket's Johnston, "the Central Limit Theorem says you can use a fairly small number - the low thousands." Loosely, the theorem says that the average of a truly random sample is approximately normally distributed around the true value, with a spread that shrinks as the sample grows; the margin of error therefore depends on the size of the sample rather than on the size of the population it is drawn from, and is already small at a few thousand observations. However, Johnston pointed out, "we have, by far, the largest sample," tracking as many as 40 million unique browser sessions per day, by WebSideStory's estimates.
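To see why "the low thousands" can be enough, here is a minimal sketch of the standard 95% margin-of-error formula for a sampled proportion, z * sqrt(p(1 - p) / n). The function name is hypothetical, p = 0.5 is the usual worst-case assumption, and treating each observation as an independent random draw is an assumption that page views or browser sessions may not actually satisfy.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a random sample.

    Uses the normal approximation; z = 1.96 corresponds to 95% confidence.
    Assumes independent, truly random observations.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Sample sizes chosen for illustration: a "low thousands" sample, the
# 371,000 page views cited above, and 40 million sessions.
for n in (2_000, 371_000, 40_000_000):
    print(f"n = {n:>10,}: +/- {margin_of_error(n):.3%}")
```

Because the margin shrinks only with the square root of the sample size, each additional order of magnitude buys less precision than the last; whether the sample is random and representative matters more than how large it is.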