'Cookie stuffing': A data scientist takes on seamy side of online ads

Combatting cookie stuffing, smoking out dubious URLs, designing for behavior change and selling the company culture: The Data Mill reports.

Ever feel like it's getting harder to tell the machine from the human? For Claudia Perlich, distinguishing between the two is just part of the job. She's the chief scientist for Dstillery, an advertising technology company that aims to match online customer interest with specific brands and products. So, when Perlich's predictive models nearly doubled in performance within the span of two weeks, a red flag went up.

"One rule for data science: If it looks too good to be true, it probably is too good to be true," Perlich said during a webinar preview for the upcoming Strata + Hadoop World 2013, where she'll be presenting.

In fact, Perlich was pretty convinced the spike in performance she saw was the work of bots -- Web robots that simulate human behavior and are known to defraud display advertisers. In some cases, bots are sent to hit fake websites, artificially inflating traffic numbers to make the ad space seem more valuable. In other cases, scammers and bots are even more insidious, affecting commercial websites with something called cookie stuffing.

Cookie stuffing is a technique used by affiliate marketers who promote brands such as eBay or Amazon to artificially inflate their customer referral numbers. Cookie stuffers add a bit of code to advertisements that automatically fakes clicks on advertisers' links and plants a third-party cookie on a visitor's computer. The third-party cookie contains the affiliate's unique ID so that if the visitor then stops by eBay or Amazon and makes a purchase within the next month or so, the affiliate is credited with the sale and is paid a commission. Cookie stuffing drives up demand for and the price of advertisements based on fake interest, and it can also hurt the accuracy of Perlich's models.
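To see why a stuffed cookie pays off, it helps to look at the merchant's side of the transaction. The sketch below is an illustration only: the cookie layout, the function name `credited_affiliate` and the 30-day window (a stand-in for "the next month or so") are assumptions, not any real affiliate program's rules.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "a month or so" is modeled as a fixed 30-day window.
ATTRIBUTION_WINDOW = timedelta(days=30)


def credited_affiliate(cookie: Optional[dict], purchase_time: datetime) -> Optional[str]:
    """Return the affiliate ID to credit for a purchase, or None.

    `cookie` is a hypothetical record with keys "affiliate_id" and "set_at".
    """
    if cookie is None:
        return None
    age = purchase_time - cookie["set_at"]
    # Credit the affiliate only if the cookie was set before the purchase
    # and is still inside the attribution window.
    if timedelta(0) <= age <= ATTRIBUTION_WINDOW:
        return cookie["affiliate_id"]
    return None


# A cookie planted by a faked click looks the same as one from a real click.
stuffed = {"affiliate_id": "aff-123", "set_at": datetime(2013, 10, 1)}
in_window = credited_affiliate(stuffed, datetime(2013, 10, 15))
too_late = credited_affiliate(stuffed, datetime(2013, 12, 1))
no_cookie = credited_affiliate(None, datetime(2013, 10, 15))
```

Nothing in the check asks whether a person actually clicked, which is the gap cookie stuffers exploit.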

That's where "the penalty box" comes in. Perlich pays attention to visitors' browsing history and if she sees something that's off -- say, visits to sites she has tagged as dubious -- those visitors are placed in the penalty box. That means any data they generate or action they take is completely ignored for up to a minute. The tactic works, Perlich said. Within weeks of implementing the penalty box, Perlich's predictive models started to perform more like they used to.
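A penalty box along these lines could be sketched as follows. This is a minimal illustration, not Dstillery's implementation: the class name `PenaltyBox`, the site list and the event format are all hypothetical, and the 60-second default simply mirrors the "up to a minute" described above.

```python
from dataclasses import dataclass, field

# Hypothetical sites tagged as dubious; placeholders, not real data.
DUBIOUS_SITES = {"arcade-for-moms.example", "free-clicks.example"}


@dataclass
class PenaltyBox:
    penalty_seconds: float = 60.0
    # Maps visitor ID to the time at which that visitor's penalty ends.
    _release_at: dict = field(default_factory=dict)

    def observe(self, visitor_id: str, site: str, timestamp: float) -> bool:
        """Return True if the event should be kept, False if ignored."""
        if site in DUBIOUS_SITES:
            # A dubious visit starts (or restarts) the visitor's penalty.
            self._release_at[visitor_id] = timestamp + self.penalty_seconds
            return False
        # Keep the event only once any penalty has run out.
        return timestamp >= self._release_at.get(visitor_id, float("-inf"))


box = PenaltyBox(penalty_seconds=60.0)
events = [
    ("v1", "news.example", 0.0),
    ("v1", "free-clicks.example", 10.0),  # puts v1 in the box until t=70
    ("v1", "shop.example", 30.0),         # ignored: v1 is still boxed
    ("v1", "shop.example", 80.0),         # kept: penalty has expired
    ("v2", "shop.example", 30.0),         # kept: v2 was never boxed
]
kept = [e for e in events if box.observe(*e)]
```

Only the events in `kept` would be passed on to a model; everything a boxed visitor does in the meantime is dropped.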

Fun fact about bad URLs

How can you spot a dubious URL? Perlich said: If the word "mom" is in the URL stream, it's three times more likely to be fraudulent; if the word "arcade" is in the URL stream, it's five times more likely to be fraudulent. "And we found plenty of URLs posing as arcade rooms for moms," she said.
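A figure like "three times more likely" can be read as a ratio of fraud rates. The sketch below assumes it compares URLs that contain a keyword with URLs that don't; Perlich did not spell out the baseline, so that choice, the function name `fraud_lift` and the toy labels are all assumptions made for illustration.

```python
def fraud_lift(labeled_urls, keyword):
    """Fraud rate among URLs containing `keyword`, divided by the
    fraud rate among URLs that do not contain it."""
    with_kw = [fraud for url, fraud in labeled_urls if keyword in url.lower()]
    without_kw = [fraud for url, fraud in labeled_urls if keyword not in url.lower()]
    if not with_kw or not without_kw:
        raise ValueError("need URLs both with and without the keyword")
    rate_without = sum(without_kw) / len(without_kw)
    if rate_without == 0:
        raise ValueError("baseline fraud rate is zero; lift is undefined")
    return (sum(with_kw) / len(with_kw)) / rate_without


# Made-up toy data: (url, was_flagged_as_fraud). Not Dstillery's numbers.
toy = [
    ("games-for-mom.example/play", True),
    ("mom-deals.example", True),
    ("supermom.example/win", True),
    ("mom-blog.example/recipes", False),
    ("news.example/a", False),
    ("shop.example/cart", False),
    ("weather.example", False),
    ("sports.example/scores", False),
    ("mail.example/inbox", False),
    ("travel.example", False),
    ("clickfarm.example/x", True),
    ("adspin.example/y", True),
]
lift = fraud_lift(toy, "mom")  # 3 of 4 with "mom" vs. 2 of 8 without
```

In this toy set the rate is 75% with the keyword and 25% without, so the lift works out to 3.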

Behavior change

A slew of businesses today are building products that hope to change customer behavior in some way. That's according to Stephen Wendel, principal scientist for HelloWallet, an application designed to provide financial guidance to its customers. During an O'Reilly webcast, Wendel pointed to giants like Twitter and Facebook that have changed how people interact, to more mundane products such as Mailbox that could change how you organize email, and to Speek, an app that could influence how you make and participate in conference calls.

According to Wendel, products aimed at changing customer behavior should be developed in four phases: understanding how the mind makes decisions; discovering the right behavior to change; designing the product or feature around that behavior; and measuring its impact and refining the product over time. This cycle should be layered on top of the current product development cycle rather than replacing it, he said. It will work with Agile, a combination of Lean and Agile, and even with a waterfall development process -- all of which contain stages of understanding, discovering, designing and refining.

"And you can slot in these four phases as appropriate for that methodology," said Wendel, who is also the author of the new book, Designing for Behavior Change.

Survey says

Big data investments continue to rise, according to new survey data from Gartner Research Inc. The results indicate that 64% of organizations are investing or planning to invest in big data technology in 2013, up from 58% in 2012. Even so, big data still has a way to go: Fewer than 8% of the more than 700 survey respondents have actually deployed big data technology.

Hiring on limited resources

Can't compete with the Apples and Googles of the world when hiring engineers? Not to worry. According to a workshop hosted by the Massachusetts Technology Leadership Council, building a solid company culture can be an appealing incentive for new employees.

"We sell culture a lot," said Scott Ward, senior vice president at the Boston-based startup Nanigans. "Having a work environment people like going to every single day -- that goes a very long way."

And it won't break the bank.

Welcome to The Data Mill, a weekly column devoted to all things data. Heard something newsy (or gossipy)? Email me or find me on Twitter at @TT_Nicole.
