At the January 24, 2019, ARF Leadership Lab, speaker Rick Bruner, CEO/Co-Founder of Central Control and US Vice Chair of I-COM, introduced critical topics such as:
• The most important step you can take to ensure good campaign measurement
• How experimental design fits into Multi-Touch Attribution and other ad ROI models
• What incrementality is, and how it relates to campaign measurement
• The role of randomized control approaches in campaign measurement
• Steps to creating a culture that enhances optimal measurement
• The impact of ghost ads on campaign effectiveness
This half-day workshop featured speakers and case studies from Netflix, Pandora, Wayfair, Walmart, and more, helping attendees translate best practices into everyday techniques.
Introduction to “Incrementality”
Paul Donato, the ARF’s Chief Research Officer, opened the January 24 Leadership Lab by explaining the foundational concepts of causal inference.
- The only way to truly measure causality is to use a true experimental design.
- Without counterfactuals (contrary-to-fact conditionals) or test and control groups, the data can lead to spurious correlations.
- Correlation does not equal causation.
Paul Donato – Chief Research Officer, ARF
Access presentation »
Dare to Know: Incrementality, A Primer
Rick Bruner – CEO/Co-Founder, Central Control; US Vice Chair, I-COM
Rick Bruner, the CEO/Co-Founder of Central Control and US Vice Chair of I-COM, expanded on the vital need for Randomized Controlled Trials (RCT) in campaign measurement, and why incrementality is not a mere buzzword of the moment, but serves as a rallying cry for the industry.
- One of the biggest problems in advertising is that no one believes their results. In contrast to the “hard sciences,” like medicine or rocket science, it has traditionally been very difficult to know the causal effects of advertising. This has led the industry to pretend that correlation is causation when allocating advertising dollars. The only way to reduce this uncertainty and waste is through true experiments.
- Correlation is not causation. For instance, statistics from the U.S. Department of Agriculture and the Centers for Disease Control and Prevention show a 94.71% correlation between deaths by bedsheet entanglement and per capita cheese consumption.
- Randomized Controlled Trials (RCTs) = ROI. The secret formula for advertising success = Trial + Error. The only method advertisers should trust for ROI is the scientific method.
- Incrementality may save your job. Companies that are rapidly gaining market share (e.g., Netflix, Dollar Shave Club) use RCTs because they do not carry the baggage of the old methodologies.
- Embracing incrementality requires a change in advertising culture: dare to know!
Rick Bruner – CEO/Co-Founder, Central Control; US Vice Chair, I-COM
Access presentation »
View Videos:
Improving on the Status Quo for Advertising ROI Using Randomized Control Trials
The Value of Scientific Certainty
Bill Harvey – Chairman, RMT
Bill Harvey, Chairman of RMT, reinforced why reducing uncertainty through controlled experimentation is a must for marketers.
- The marketing industry has been satisfied with a mild degree of certainty, possibly misled by 95% confidence ranges. Confidence intervals only account for sampling error, not for biases (the toy simulation after this list makes this concrete).
- The industry mostly uses epidemiological (MMM, MTA, single-source) or observational research methods, which only allow for correlational conclusions. These are not true experimental methods, as they control for differences among groups mostly post hoc.
- The point of controlled experimentation is to eliminate uncertainty, to show that differences observed could only have been caused by one specific variable.
- Addressable TV and Digital seem to lend themselves well to A/B testing. Bill Harvey’s research with TRA (2011) also found that Zone Targeting seems to show some ability to do A/B testing. A/B testing in addressable TV should be greatly improved in 2020 when the ATSC 3.0 standard is rolled out.
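To make the sampling-error-versus-bias point concrete, here is a toy simulation (not from the presentation; all rates are invented) in which self-selected exposure produces a razor-thin 95% interval around the wrong answer:

```python
# Toy illustration: a confidence interval shrinks with sample size but never
# covers the truth when the comparison itself is biased. Invented numbers.
import numpy as np

rng = np.random.default_rng(0)
TRUE_LIFT = 0.02
n = 1_000_000

# "Exposed" users self-select: a +0.03 selection effect sits on top of the
# true +0.02 ad lift, so the naive exposed-vs-unexposed gap is ~0.05.
exposed_rate = rng.binomial(1, 0.05 + 0.03 + TRUE_LIFT, size=n).mean()
unexposed_rate = rng.binomial(1, 0.05, size=n).mean()
estimate = exposed_rate - unexposed_rate

se = np.sqrt(exposed_rate * (1 - exposed_rate) / n
             + unexposed_rate * (1 - unexposed_rate) / n)
print(f"estimate = {estimate:.4f} +/- {1.96 * se:.4f} (true lift = {TRUE_LIFT})")
# A million users per group gives a tiny interval that confidently brackets
# the wrong value (~0.05): precision quantifies sampling error, not bias.
```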
Bill Harvey – Chairman, RMT
Access presentation »
Through the Looking Glass
Kiril Tsemekhman – VP, Data Science & Engineering, BounceX
Kiril Tsemekhman, VP of Data Science & Engineering at BounceX, explained why using the wrong measurement is so dangerous for advertisers and the industry.
- When we have the wrong incentives, the system breaks down. Wrong incentives lead to proxy outcomes, consistently wrong decisions, wasted budgets, and criminal activity (e.g., fraud). In the healthcare industry, for example, success is not measured by how well people are treated but by proxy outcomes (i.e., the number of medical procedures), which drives the wrong incentives, which leads to rising premiums.
- Honest efforts to change the paradigm in the industry include: thinking about deficiencies, incentives, and consequences; honestly admitting that the ideal solution may not be accessible; pursuing incremental improvements (rather than breakthroughs); and sometimes compromising by giving up some purity.
- When compromising: beware of the limitations of any measurement, whether in walled gardens or other channels (e.g., cookies are not people); combine RCTs with tested observational methods; and beware of potential conflicts of interest.
Kiril Tsemekhman – VP, Data Science & Engineering, BounceX
Access presentation »
View Video:
Experiment Methodologies in Practice: Placebos, Ghost Ads, Intent to Treat
Ghost Ads: Improving the Economics of Measuring Online Ad Effectiveness
Garrett Johnson – Assistant Professor, Boston University
Garrett Johnson, Assistant Professor at Boston University, introduced ghost ads as an improvement on the existing Public Service Announcement (PSA) and intent-to-treat A/B methods for measuring online ad effectiveness.
- To measure the effect of advertising, marketers must know how consumers would behave had they not seen the ads, but to date it has been too difficult and costly to conduct a true RCT (randomized controlled trial) in online advertising. Buying ads for the control treatment in a competitive marketplace unbalances the test/control symmetry while also being too expensive.
- Ghost ads are a parallel accounting system, or database, that records the ads that would have been shown to users in a control group. A ghost ad impression is a log entry that flags when an experimental ad would have been served to a user in the control group (see the logging sketch after this list).
- Ghost ads combine the best of the PSA and intent-to-treat (A/B) methods, allowing for scientifically rigorous RCTs at large scale and at very little cost.
- Many advertisers are already using ghost ads at scale. Google was the first to put a version of ghost ads into production and currently offers two services: Brand Lift and Conversion Lift. Around 100 advertisers use these services at Google each month, and to date over 20 billion user experiments have been conducted. Other ad platforms and advertisers using versions of ghost ads include AdRoll, dataxu, Netflix, and Walmart.
- Ghost ads are the future of advertising because they change accountability. If marketing only brings vanity metrics to the table, it will be seen as a cost center within the organization. By using scientific methods to show true ROI from advertising, the stature of marketing within the firm will rise commensurately.
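A minimal sketch of the ghost-ad logging idea, assuming a simplified ad server (names such as assign_group and serve are illustrative, not from any production system described in the talk):

```python
# Sketch: run the same ad selection for everyone; for control users, log the
# would-be ("ghost") impression instead of serving it, preserving a clean
# counterfactual without buying placebo ads.
import hashlib
from dataclasses import dataclass

EXPERIMENT_SALT = "campaign-123"   # hypothetical experiment identifier
CONTROL_FRACTION = 0.5             # 50/50 test/control split

@dataclass
class Impression:
    user_id: str
    ad_id: str
    ghost: bool                    # True = logged but never served

def assign_group(user_id):
    """Deterministically hash each user into test or control."""
    digest = hashlib.sha256(f"{EXPERIMENT_SALT}:{user_id}".encode()).hexdigest()
    return "control" if int(digest, 16) % 100 < CONTROL_FRACTION * 100 else "test"

def serve(user_id, winning_ad_id, log):
    """Withhold (but log) the winning ad for the control group."""
    is_ghost = assign_group(user_id) == "control"
    log.append(Impression(user_id, winning_ad_id, ghost=is_ghost))
    return None if is_ghost else winning_ad_id

log = []
serve("user-42", "ad-A", log)      # served or ghost-logged depending on hash
print(log[0])
```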
Garrett Johnson – Assistant Professor, Boston University
Access presentation »
View Video:
Case Studies and Best Practices
Incrementality at Netflix
Randall Lewis – Director, Economics, Netflix
Randall Lewis, Director of Economics at Netflix, explained how incrementality is used to measure and optimize ad effectiveness at Netflix.
- Netflix spent approximately $2 billion on external marketing in 2018. Incrementality is used for:
- Channel Mix – to create a model to determine which channel mix drives sign-ups.
- Title (product/creative) selection for ads – to determine which title to feature on its home page.
- Incrementality Bidding – to find the counterfactual for a given ad.
- Don’t create a correlation that isn’t causal: allocating advertising based on expected effectiveness invalidates standard “MMM” approaches to ad-effect inference (the toy simulation after this list shows why).
- Experiment to determine the appropriate spending: only by ignoring expected effectiveness when allocating can channel effectiveness be evaluated (locally).
- Incrementality bidding is the “holy grail” of ad effectiveness measurement: It requires systems that allow marketers to determine the value of the ad.
- Incrementality improves decision-making in many business applications where human judgment or machine learning confound measurement.
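A toy simulation (numbers invented, not from the Netflix talk) illustrates the allocation point: when spend is targeted toward units where demand is already high, an MMM-style regression badly inflates the ad effect, while randomized spend recovers it.

```python
# Toy simulation: targeted allocation creates endogeneity that breaks
# regression-based (MMM-style) inference; randomized spend does not.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
demand = rng.normal(size=n)                  # unobserved demand shock
TRUE_LIFT = 0.2

# Spend more where demand is already high (allocation by expected effectiveness)
targeted_spend = 1.0 + 0.8 * demand + rng.normal(scale=0.3, size=n)
# Spend ignoring expected effectiveness (randomized)
random_spend = 1.0 + rng.normal(scale=0.3, size=n)

sales_t = demand + TRUE_LIFT * targeted_spend + rng.normal(size=n)
sales_r = demand + TRUE_LIFT * random_spend + rng.normal(size=n)

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x)

print("true lift:           ", TRUE_LIFT)
print("targeted estimate:   ", round(ols_slope(targeted_spend, sales_t), 3))  # inflated
print("randomized estimate: ", round(ols_slope(random_spend, sales_r), 3))    # ~0.2
```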
Randall Lewis – Director, Economics, Netflix
Access presentation »
View Video:
Google’s Randomized Experiments
Ying Liu – Quantitative Analyst, Google
Ying Liu, Quantitative Analyst at Google, discussed Google’s solutions for running randomized experiments at scale.
- Randomized experiments set up the foundation for valid measurements for incrementality.
- Google Survey Lift studies measure incremental brand effectiveness for video ads (ad recall, brand awareness, consideration, favorability and purchase intent).
- Challenges with surveys include solicitation bias (e.g., users who are more active online have a greater chance of being solicited) and response bias (e.g., users who know the brand better are more likely to respond).
- Google’s solution: a two-step logistic regression model for bias correction (a generic sketch of this kind of correction follows this list).
- Other Google products that leverage randomized experiments include Conversion Lift (randomized experiments measure the impact on conversions) and Unskippable Labs (randomized experiments explore how ads affect consumers).
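The presentation does not detail the model, so the following is only a generic sketch of a two-step correction in that spirit, on synthetic data: fit a response-propensity model, then reweight the outcome regression by inverse propensities. This is an illustrative pattern, not Google’s actual Survey Lift implementation.

```python
# Generic two-step sketch: (1) model survey-response propensity, (2) fit the
# exposure -> awareness model on responders, weighted by inverse propensity.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000
activity = rng.normal(size=n)              # online-activity proxy
exposed = rng.integers(0, 2, size=n)       # randomized ad exposure

# Synthetic truth: active users respond more AND are more brand-aware,
# so raw responders are a biased sample.
responded = rng.random(n) < 1 / (1 + np.exp(-(-1.0 + 1.5 * activity)))
aware = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + 0.8 * activity
                                           + 0.4 * exposed)))).astype(int)

# Step 1: probability of responding given observed covariates.
prop = LogisticRegression().fit(activity.reshape(-1, 1), responded)
p_respond = prop.predict_proba(activity.reshape(-1, 1))[:, 1]

# Step 2: outcome model on responders, reweighted toward the full population.
X = exposed[responded].reshape(-1, 1)
w = 1.0 / p_respond[responded]
outcome = LogisticRegression().fit(X, aware[responded], sample_weight=w)
print("exposure coefficient (log-odds lift):", outcome.coef_[0][0])
```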
Ying Liu – Quantitative Analyst, Google
Access presentation »
View Video:
Incrementality at AdRoll Group: Best Practices & Learnings
Larissa Licha – Head of Product Strategy, AdRoll
Larissa Licha, Head of Product Strategy at AdRoll, discussed best practices for incrementality:
- Always Validate: Companies need the ability to properly validate the mechanisms and results internally. Mechanisms that allow validation include A/B testing, balance analysis, and identity matching (a minimal balance check is sketched after this list). Make sure demonstrated results are valid (e.g., statistical significance, confidence intervals) and allow access to that data.
- Enforce Transparency: It is crucial to understand and have the ability to communicate the what and why behind results. Offer accessible, transparent, and actionable reporting to the end user. Move outside the black box and let the end users clearly understand what drives and doesn’t drive performance.
- Measure tangible success metrics: Focus on the broader impact. How does it all tie to the overall goals that matter to the end user? For example, lift is a good baseline but doesn’t tell the full story. Also consider volume (understand the amount of incremental sales driven and tie it back to business growth) and cost efficiency (identify at what cost the measured value was driven and tie it back to business profits).
- One size DOES NOT fit all: What works and what doesn’t depends heavily on the company, its customer set, and its data.
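As a minimal illustration of what a balance analysis can look like (column names and data are hypothetical, not AdRoll’s tooling), compare pre-treatment covariates across the split:

```python
# Minimal balance check: standardized mean difference (SMD) per covariate;
# |SMD| > 0.1 is a common rule-of-thumb flag that randomization may be broken.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": rng.choice(["test", "control"], size=5_000),
    "past_purchases": rng.poisson(2.0, size=5_000),
    "days_since_visit": rng.exponential(10.0, size=5_000),
})

for col in ["past_purchases", "days_since_visit"]:
    t = df.loc[df.group == "test", col]
    c = df.loc[df.group == "control", col]
    smd = (t.mean() - c.mean()) / np.sqrt((t.var() + c.var()) / 2)
    print(f"{col}: SMD = {smd:+.3f}")
```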
Larissa Licha – Product Lead, Measurement, AdRoll
Access presentation »
View Video:
Measuring Causal Impact from the Mar Tech Perspective: View from the Trenches
Vadim Tsemekhman – Director of Product, Walmart Labs
Vadim Tsemekhman, Director of Product at Walmart Labs, described the challenge of measuring incrementality across all channels.
- Walmart can leverage first-party and third-party data to build a unique identity graph. Every day, Walmart sees 25 million in-store transactions and 500K online transactions that can be used for analytics and optimization (80% of U.S. consumers shop at Walmart).
- Key challenges include the mostly offline nature of customer knowledge, the limited scope of the first-party graph, and the limited quality of third-party graph solutions. How do you measure the effectiveness of digital campaigns both in store and online when the overwhelming majority of your customers shop at brick-and-mortar stores? How do you smooth the transition between a physical in-store shopper and their online identity? How do you measure all channels with the same metrics?
- Walmart’s top priorities include finding ways to conduct scaled randomized controlled testing across every marketing channel and cooperating to build an independent identity resolution consortium.
Vadim Tsemekhman – Director of Product, Walmart Labs
Access presentation »
View Video:
Marketing Science & Communications: Marketing Incrementality at Booking
Dane Aronsen – Experimental Design & Measurement, Booking.com
Laura Oliveria – Marketing Analyst, Booking.com
Laura Oliveria, Marketing Analyst at Booking.com, shared how incrementality is used at Booking.com.
- Once you start to measure incrementality, there are many things to consider, including sample size, test/control split, and the metric to be measured (a back-of-envelope sizing sketch follows this list).
- Embrace a culture of experimentation: Let the data guide you; ask the right questions; prepare for all outcomes. For every experiment, Booking.com records the hypothesis, the plan if the hypothesis is supported, the data supporting the hypothesis (following the hierarchy of evidence), and the plan if the hypothesis is not supported.
- Don’t reinvent the wheel! Leverage statistics that already exist in the field.
- A chain is no stronger than its weakest link. Quality control is important at all phases – recognize the importance of housekeeping.
- The results at Booking.com: hypotheses are more data driven; outcomes and next steps are clear to everyone; “success” is less ambiguous; there is no such thing as a failed experiment.
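For the sample-size consideration, a back-of-envelope sizing for a conversion-rate test might look like the following (the baseline and lift are assumed for illustration, not Booking.com’s figures):

```python
# Rough sample size per group to detect a +10% relative lift on a 2% baseline
# conversion rate with a two-proportion test (alpha=0.05, power=0.8).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.020                     # assumed control conversion rate
lifted = 0.022                       # assumed detectable treated rate
effect = proportion_effectsize(lifted, baseline)

n_per_group = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided")
print(f"~{int(round(n_per_group)):,} users per group")
```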
Dane Aronsen – Experimental Design & Measurement – Booking.com
Laura Oliveria – Marketing Analyst, Booking.com
Access presentation »
View Video:
Panel Discussion
Larissa Licha – Product Lead, Measurement, AdRoll
Vadim Tsemekhman – Director of Product, Walmart Labs
Laura Oliveria – Marketing Analyst, Booking.com
Dane Aronsen – Experimental Design & Measurement, Booking.com
Ying Liu – Quantitative Analyst, Google
Randall Lewis – Director, Economics, Netflix
Panel Moderator: Jon Gibs – Chief Data Officer, Gartner/L2
This first panel discussion touched upon how digital ad effectiveness research has transformed over the past 20 years, the panelists’ approaches to validation, and best practices for marketers with long purchase cycles.
- Simple methods from 20 years ago don’t work as well now with the overall complexity of today’s ad tech targeting and optimization systems.
- However, randomized controlled trials are still the gold standard of accuracy. Randomization checks, or balance tests, are needed in order to have confidence in the system being built. Incrementality gives researchers the power to run these tests.
- As things become more complex, there’s more demand from the industry to hold ad vendors accountable. Better methods have to be applied to remove noise in experimentation in order to inform machine learning and shift the focus towards incrementality. At AdRoll, the end customer can run their own balance analysis and validate the information provided through identity matching. At Google, EDA (exploratory data analysis) reports are run on the back end to verify that treatment and control groups are similar.
- For marketers with long purchase cycles (e.g., high consideration, high price), look for signals that people are moving down the purchase funnel. For example, Booking.com pays attention to incremental sign-ups. AdRoll uses surrogates as early indicators of conversion.
- The “devil is in the details”: Be careful about how you’re defining your metrics and joining your data.
View Video:
Case Studies and Best Practices Con’t
Causal Measurement: Not for the Casual Scientist
Steve Geinitz – Ads Research Manager, Facebook
Adoption of causal measurement by advertisers is hindered by several challenges. Steve Geinitz, Ads Research Manager at Facebook, identified the major barriers to adoption, as well as solutions to address these hurdles.
- The biggest barrier to adoption of causal measurement is that experiments are implemented differently on different platforms. One solution is to hire experts in causal measurement who know how to set up, validate, and verify. Longer term, it will be necessary to standardize principles for causal measurement implementation within the industry.
- Another challenge for advertisers is that platforms have specific setup requirements, which vary based on the campaign. As a solution, companies can develop clear processes that incentivize executional excellence. Longer term, the causal measurement setup needs to be standardized across the industry.
- Causal measurement isn’t always on, but decision-making is. This is one reason behind advertisers’ reliance on MMM and MTA. One option is to integrate causal measurement results into always-on solutions. Longer term, technology, primarily machine learning, can be used to predict lift in real time.
- What you can do today: Hire experts in causal measurement. Develop clear processes that incentivize executional excellence. Calibrate always-on measurement methods with lift (a toy calibration follows this list).
- What tomorrow will look like: Industry standards around causal measurement solutions, and technology and processes to predict lift at scale.
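The calibration idea can be sketched in a few lines (numbers are invented; this is the general pattern of scaling always-on credit by lift-test results, not Facebook’s internal method):

```python
# Calibrate always-on (e.g., MTA) conversion credit using a periodic lift test.
mta_conversions = 10_000          # conversions credited by always-on MTA
lift_incremental = 4_200          # incremental conversions measured by an RCT
lift_mta_credited = 9_800         # MTA credit over the same test period

calibration = lift_incremental / lift_mta_credited
print(f"calibration factor: {calibration:.2f}")
print(f"calibrated incremental estimate: {mta_conversions * calibration:,.0f}")
```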
Steve Geinitz – Quantitative Research Manager, Facebook
Access presentation »
View Video:
Spend Time Wisely: Time & Incrementality Measurement
Margaret Hung – VP, Product – Activation, Integral Ad Science
Margaret Hung, VP of Product – Activation at Integral Ad Science, discussed best practices for correctly measuring time in research designs.
- Account for opportunity to see in the research design. On average, 65% of consumers are served one or more viewable impressions within a campaign, meaning that 35% of consumers are served an ad but never actually have an opportunity to see it.
- Depending on the objective, you also want to be able to identify who within the test group actually saw a viewable ad. When trying to understand the incremental impact of the campaign as it ran, include everyone in the test group. When looking at the incremental impact of the creative asset on the audience that had an opportunity to see it, focus on those who actually saw the ad (the sketch after this list contrasts the two readouts).
- When randomized controlled experiments aren’t feasible, consider natural experiments; however, watch out for potential biases. Sampling techniques need to be applied to ensure that the only difference between the two groups is that one was served a viewable ad and one was not. The most important potential bias to account for in this process is the consumer’s browser history.
- Measure exposure time to maximize “Opportunity to Influence.” Research results previously released by IPG Media Lab and IAS show that ads with exposure above the MRC standard can produce much higher conversion rates than those that only met the standard; the difference varies by campaign. For example, LifeLock examined the impact of overall exposure time on landing-page visits and used this to determine the optimal time in view for a single impression, which then guided allocation toward placements closer to that optimum.
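A toy sketch contrasting the two readouts described above (DataFrame columns and values are invented):

```python
# Intent-to-treat (everyone assigned to test) vs. viewable-exposed-only lift.
import pandas as pd

df = pd.DataFrame({
    "group":     ["test"] * 6 + ["control"] * 6,
    "viewable":  [1, 1, 1, 1, 0, 0] + [0] * 6,  # had an opportunity to see
    "converted": [1, 0, 1, 0, 0, 0] + [1, 0, 0, 0, 0, 0],
})

control_rate = df.loc[df.group == "control", "converted"].mean()

# Campaign impact as it ran: include everyone assigned to the test group.
itt_lift = df.loc[df.group == "test", "converted"].mean() - control_rate

# Creative impact on those with an opportunity to see. A fair version needs a
# control-side counterpart (e.g., placebo/ghost viewability), since viewable
# users may differ systematically from the rest.
exposed = df[(df.group == "test") & (df.viewable == 1)]
exposed_lift = exposed["converted"].mean() - control_rate
print(f"ITT lift: {itt_lift:.3f}, viewable-exposed lift: {exposed_lift:.3f}")
```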
Margaret Hung – Head of Activation, Integral Ad Science
Access presentation »
View Video:
The Path to Successful Omni-Channel Measurement Across All Marketing Touch Points
Markus Dmytrzak – Director, Decision Sciences, Sam’s Club
Markus Dmytrzak, Director of Decision Sciences at Sam’s Club, presented on some of the work Sam’s Club is doing related to marketing measurement and randomized control experiments, both in traditional channels like direct mail as well as digital channels.
- Measurement and optimization of traditional marketing channels is complicated but in some ways can be accomplished more easily than with digital channels. With direct mail, for example, there is more control in setting up test and control groups, capturing contact and response history, and building uplift models.
- Digital experiment setup and measurement can become complex. Among the challenges are: drop-off rates after matching, speed and scalability, reliance on safe haven providers, and the need to find the right technology partner.
- While test and control experiments have value, there are limitations, particularly in measuring the impact of brand building activities where there isn’t an immediate conversion. To shift from isolated channel measurement to customer journey optimization, the solution will need to address this alongside incrementality.
- Digital marketing measurement needs to address the following challenges: isolated measurement approaches are becoming outdated, and a unified measurement approach is needed; MMM currently lacks optimization capabilities at a granular level; MTA alone is insufficient, as it does not provide incrementality insights; and the solution needs to support scalability and speed. These challenges come alongside organizational barriers and considerations around privacy laws.
Markus Dmytrzak – Director, Marketing Analytics & Decision Sciences, Sam’s Club
Access presentation »
View Video:
Leveraging Randomized Experiments for Uplift Models
Anvesh Sati – Director, Data Science, Wayfair
Anvesh Sati, Director of Data Science at Wayfair, shared how Wayfair uses data from randomized experiments to build uplift models.
- Wayfair has seen clear evidence of impact from internally developed data science and ad tech capabilities, reducing ad cost as a percentage of net revenue. Algorithmic tools, machine-learned models, and a scalable tech stack allocate ad spend across all major marketing channels.
- For digital, direct mail, and email, Wayfair is using uplift models where possible. The goal is to utilize uplift modeling to maximize incremental revenue while minimizing ad cost as a percent of incremental revenue.
- However, uplift modeling has its own challenges: it requires control/holdout group data, which may not be available depending on the platform; it is difficult to model without good, clean data (and real-time data is messy!); and low-intent customers have weak first-party signals.
- Commonly used response modeling can work, but may target the wrong customers and waste marketing dollars.
- Uplift modeling can improve marketing efficiency and drive higher incremental revenue. Transformed outcome trees provide a simple, modular way to do this (see the sketch after this list).
- Use a Qini plot to evaluate and select the best uplift model; determine your targeting threshold based on the marketing budget and efficiency goal.
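A minimal sketch of the transformed-outcome approach on synthetic data (not Wayfair’s models): with a randomized 50/50 split, the label Y* = Y·(W − p)/(p(1 − p)) has expectation equal to the individual treatment effect, so an ordinary regression tree fit on Y* estimates uplift directly.

```python
# Transformed-outcome uplift tree on synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
n, p = 50_000, 0.5
X = rng.normal(size=(n, 2))                 # customer features
W = rng.binomial(1, p, size=n)              # randomized treatment flag
uplift = 0.10 * (X[:, 0] > 0)               # true uplift only when X0 > 0
Y = rng.binomial(1, 0.05 + uplift * W)      # conversion outcome

y_star = Y * (W - p) / (p * (1 - p))        # transformed outcome
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=2_000).fit(X, y_star)

# Predicted uplift for a responsive-looking vs. unresponsive-looking customer:
print(tree.predict([[1.0, 0.0], [-1.0, 0.0]]))   # roughly [0.10, 0.00]
```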
Anvesh Sati – Director, Data Science, Wayfair
Access presentation »
View Video:
The Power of Audio Advertising: A Field Experiment on Pandora Internet Radio
David Reiley – Principal Scientist, Pandora
David Reiley, Principal Scientist at Pandora, shared key findings from Pandora’s field experiment measuring the impact of audio ads.
- Pandora found that while attribution models usually overestimate the effects of display ads, they appear to underestimate the effects of audio. The study found that audio ads actually caused 3x as many incremental trial-start conversions as would have been measured using the marketing team’s usual attribution model. Audio ads have long-lasting effects, which is in part why attribution wasn’t working: a 20-minute window isn’t long enough.
- Results demonstrated that simple voice audio ads increased conversions substantially. Sound-design audio ads increased them even more. The study found that display added little impact to the effect of audio ads.
- Creatives with sound design produced 30% more incremental conversions than the narration-only audio creatives.
- Approximately a quarter of the incremental conversions take place in the three weeks after the four-week campaign.
David Reiley – Principal Scientist, Pandora
Access presentation »
View Video:
Incrementality
Amy Nodalo – Director, Partner Analytics, Viant
Amy Nodalo, Director of Partner Analytics at Viant, presented some of the work being done with randomized controls at Viant.
- Viant is able to measure incrementality in terms of offline sales through credit card transactions, and TV tune-in through ACR on smart TVs. Their approach: every ad call runs a randomizing algorithm on the IP address; the algorithm assigns the IP to the test or control group; targeting for the test and control segments is identical; the control group is served a placebo ad; outcomes are based on TV tune-in lift and online/offline credit card sales (a sketch of the lift readout follows this list).
- The team also experimented with multivariate tests to see if changes in targeting made an impact on sales. In general, the more niche the target, the more sense it made to pay for that target.
- There were several challenges encountered with multivariate tests. Ad spend needs to be substantial to support multiple control segments. There were also operational resource constraints in setting up and managing multivariate experiments.
- Retail sales benchmarks: a ROAS of $12 was seen for department stores, while poorer results were observed in the casual dining category.
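A sketch of the lift readout for such a test/control design (conversion counts are hypothetical):

```python
# Difference in conversion rates with a two-proportion z-test and 95% CI.
from statsmodels.stats.proportion import (confint_proportions_2indep,
                                          proportions_ztest)

conversions = [1_450, 1_200]     # [test, control] (hypothetical counts)
users = [100_000, 100_000]       # users per group (hypothetical)

lift = conversions[0] / users[0] - conversions[1] / users[1]
z, pval = proportions_ztest(conversions, users)
low, high = confint_proportions_2indep(conversions[0], users[0],
                                       conversions[1], users[1])
print(f"absolute lift = {lift:.4%}, p = {pval:.4f}, "
      f"95% CI = ({low:.4%}, {high:.4%})")
```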
Amy Nodalo – Director, Partner Analytics, Viant
Access presentation »
View Video:
Panel Discussion 2
Steve Geinitz – Quantitative Research Manager, Facebook
Margaret Hung – Head of Activation, Integral Ad Science
Anvesh Sati – Director, Data Science, Wayfair
David Reiley – Principal Scientist, Pandora
Amy Nodalo – Director, Partner Analytics, Viant
Markus Dmytrzak – Director, Marketing Analytics & Decision Sciences, Sam’s Club
Panel Moderator: Jim Nail – Principal Analyst, B2C Marketing, Forrester
This second panel discussion explored cultural and organizational considerations for implementing incrementality along with additional insights and best practices.
- Heightened sensitivity around privacy (GDPR, ePrivacy) can make it more challenging to implement experimental designs, in terms of tying data to outcomes at the user level. It is important to have a data governance function in the organization and to think carefully about the privacy implications of what you’re doing.
- Trust is very important. Prove the concept first and build from there. A strong leadership vision upfront can help secure the resources necessary to do the incrementality work. Start with the lowest hanging fruit and share results with the business to gain their interest. Then present a roadmap of experiments that will have a big impact, and go from there.
- If the company already has an embedded test-and-control culture, take care when implementing other methodologies like MTA, because they give a different read. It is a challenge, and it requires different departments working together and being on the same page.
- Negative lifts are possible. With emails or push notifications, for example, if users receive too many messages, they may unsubscribe, leading to a negative value. It can be related to frequency; sometimes you can’t get people into a store more often. It may also be because the marketing activity simply doesn’t work.
- The industry is still working on reconciling this gold standard approach with those available for other channels. In the meantime, use multi-touch attribution to capture the effect of multiple touchpoints. Also, transparency is key. Share what you are actually able to analyze based on the data you have access to, and offer different types of analysis, such as MTA and cross-channel.
- Time in view differs for every campaign and category. It is typically within the range of 5s to 45s, depending on involvement with the category and how entertaining the ad is. On Facebook, a best practice is to show the brand name within the first three seconds.
- Determining how long to run experiments depends on a variety of factors. There is a bias-variance trade-off: while some long-term effects of the campaign may be missed by limiting the evaluation period, overly long periods may introduce too much noise. AdWorks experiments have shown that long-term effects of advertising often carry into the second year.
- Implementing incrementality is likely to differ for larger vs. smaller brands. For larger brands, incrementality may be more apparent for transactional campaigns than brand ads as the brand is already well known. For smaller brands, incrementality is important for both types of ads. Smaller brands will also have less room to spend budgets inefficiently, whereas big organizations may have more leeway.
- Regardless of whether a brand is large or small, don’t look at incrementality tests narrowly. The impact on the brand can be measured in other ways, and teams should use these other data points as well.
View Video:
Key Takeaways and Q&A
Paul Donato – Chief Research Officer, ARF
Rick Bruner – CEO/Co-Founder, Central Control; US Vice Chair, I-COM
Paul and Rick wrapped up the workshop with the final key takeaways:
- The industry should embrace incrementality in a big way.
- Advertising culture change is required. Buy-in is key.
- It’s hard. Details matter. It’s worth doing right.
View Video:
Closing Remarks
Rick Bruner – CEO/Co-Founder, Central Control; US Vice Chair, I-COM