Data Collaboration Archives - AppsFlyer
https://www.appsflyer.com/blog/topic/data-collaboration/

What Black Friday and Cyber Monday taught us about smarter signals for 2026
https://www.appsflyer.com/blog/measurement-analytics/smarter-signals-black-friday/ | Wed, 10 Dec 2025


TL;DR

BFCM looked great on the surface, but the numbers told a different story. High engagement didn’t always lead to revenue, and many teams realized they were optimizing toward activity, not intent. The real shift comes from using signals that actually reflect purchase behavior. With Signal Hub, brands can build smarter, higher-value audiences based on real spend and activate them across major channels, turning strong-looking campaigns into ones that truly drive business outcomes.


Another Black Friday and Cyber Monday is in the books.
Your dashboards are full, your spreadsheets overflowing, and your media team is somewhere between proud and exhausted.

Some campaigns crushed it. Others… not so much.

Every year, the same question comes up once the dust settles: what actually worked?

When success on paper doesn’t add up

If you judged your campaigns by click-through rates or installs, this season probably looked great. Engagement was high, creative was on point, and spend hit the targets.

But dig a little deeper, and many marketers are seeing the same thing: conversions that looked promising on paper didn’t translate to revenue. The signals were strong, but they weren’t the right ones. Or as some marketing leads told us after the season wrapped: “We reached everyone we wanted. Just not the ones who actually bought.”

That’s the paradox of modern performance marketing. You can have more data than ever, but if it’s all surface-level engagement, you’re still flying blind.

Picture this: your team spends weeks preparing a big November campaign. Budgets are locked, creatives polished, targeting refined. Launch day hits, and the metrics look great. But when you look at the business impact, something’s off. The audiences that engaged weren’t necessarily the ones who converted. You optimized toward activity, not intent, because your data stopped at surface-level digital signals: clicks, installs, and in-app events, none of which reveal real purchase intent.

Building smarter audiences with the signals that matter

Now imagine building next year’s plan differently. Instead of targeting users who clicked your ad or watched your video, you focus on people who actually spend, verified through real-world purchase signals.

That’s the idea behind Signal Hub, a new privacy-safe signal marketplace inside AppsFlyer, built for marketers. It enables brands to go beyond digital behavior and reach audiences based on their actual purchase transactions.

Here’s what that looks like in action.

Finding new high-value subscribers

A leading entertainment brand is launching a new streaming service and wants to acquire users with proven willingness to pay for content. Using Signal Hub’s purchase signals, the team identifies consumers who have recently spent on digital entertainment such as video, gaming, and music subscriptions across multiple merchants.

No first-party data is required. These are new potential users identified purely through anonymized, transaction-based spend data. Once built, the audience is activated across major ad platforms through AppsFlyer integrations, improving conversion rates and ROAS from the very first campaign.
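To make that audience-building step concrete, here is a minimal Python sketch. The record shape, field names, and category labels are illustrative assumptions for demonstration, not Signal Hub’s actual schema or API:

```python
from datetime import date

# Hypothetical anonymized transaction records: (hashed_user_id, category, merchant, spend_date).
# Field order, names, and categories are illustrative, not an actual Signal Hub schema.
transactions = [
    ("u1_hash", "video_subscription", "merchant_a", date(2025, 11, 20)),
    ("u2_hash", "gaming", "merchant_b", date(2025, 6, 1)),
    ("u3_hash", "music_subscription", "merchant_c", date(2025, 11, 28)),
    ("u4_hash", "groceries", "merchant_d", date(2025, 11, 25)),
]

ENTERTAINMENT = {"video_subscription", "gaming", "music_subscription"}

def build_audience(txns, categories, since):
    """Users with at least one spend in the given categories since the cutoff date."""
    return {uid for uid, cat, _, d in txns if cat in categories and d >= since}

audience = build_audience(transactions, ENTERTAINMENT, since=date(2025, 9, 1))
print(sorted(audience))  # ['u1_hash', 'u3_hash']
```

The resulting set of hashed IDs is what would then be pushed to ad platforms for activation.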


Engaging high-value players

A mobile gaming publisher wants to re-engage existing players ahead of a new release.
Using first-party app data, the team identifies users who have installed or recently played one of their titles. Through Signal Hub’s financial data, they find which of those same users have also made recent app store purchases, indicating active, paying behavior in the wider mobile ecosystem.

By targeting that intersection, the publisher builds a segment of high-value, payment-active gamers and reaches them across channels with early-access or upgrade offers. The result: improved Day 7 ROAS and higher monetization from returning users.
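The “intersection” the publisher targets boils down to a plain set operation. In this sketch, the identifiers and the payment-active set are hypothetical stand-ins for first-party app data and Signal Hub’s financial signals:

```python
# First-party app data: users who installed or recently played one of the publisher's titles.
recent_players = {"p1", "p2", "p3", "p4"}

# Hypothetical enrichment signal: users with recent app store purchases
# (an illustrative stand-in for transaction-based financial data, not an actual API).
payment_active = {"p2", "p4", "p5"}

# The high-value segment is the intersection of the two sets.
high_value_segment = recent_players & payment_active
print(sorted(high_value_segment))  # ['p2', 'p4']
```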


From campaign reports to business outcomes

Signal Hub allows brands to enrich their first-party data with anonymized, transaction-based purchase signals securely and without moving or exposing any raw data.

Fully integrated into AppsFlyer’s ecosystem, Signal Hub lets teams activate and measure these enriched audiences across every major media channel without complex setup or engineering.

It’s a quiet but powerful shift: moving from campaign optimization to business optimization.

If you want to see how this works in practice and explore the full range of available signals, head to the Signal Hub page.

Key takeaways

Stepping back from the season, the pattern is pretty clear. The campaigns that truly performed weren’t just the ones that drove the most clicks or installs, but the ones built on signals that actually pointed to revenue. When you look at everything through that lens, a few things stand out:

  • High engagement doesn’t guarantee revenue. Many campaigns performed well on paper but didn’t deliver business impact.
  • Digital behavior isn’t enough. To find users who actually buy, you need signals rooted in real spending.
  • Signal Hub makes that possible by using anonymized, transaction-based purchase data to build smarter, higher-value audiences.
  • You can activate these enriched audiences across major media channels and measure real outcomes, not just clicks or installs.

Solving retail media measurement – your ultimate guide to earning advertiser trust
https://www.appsflyer.com/blog/measurement-analytics/retail-media-measurement-guide/ | Wed, 05 Nov 2025


Imagine setting sail without a compass — exciting, but fraught with uncertainty. That’s exactly how retail media networks (RMNs) can feel without the right measurement tools in place. In a fast-evolving space where every advertiser demands proof of ROI and actionable insights, accurate measurement isn’t just a bonus. It’s the anchor that keeps campaigns steady and successful.

Retail media is booming as brands are eager to capitalize on its unparalleled access to first-party data. eMarketer even argued that it is the “most important worldwide growth driver right now”, and is set to attract 22% of all digital ad spend in 2025! 

But navigating this ecosystem isn’t straightforward. From fragmented methodologies to the walled garden effect, RMNs face unique challenges that can complicate their growth and credibility. That’s where this guide comes in.

Whether you’re just launching your retail media platform or scaling an established network, this guide will serve as your roadmap. Packed with practical insights, industry best practices, and real-world success stories, it’s designed to help you build trust with advertisers, optimize campaigns, and deliver measurable results.

So, let’s begin by tackling the most crucial questions: why does measurement matter in retail media, why is it broken, and how can it be fixed?

WHY RMN MEASUREMENT MATTERS 

Measurement is at the heart of retail media success. It’s how RMNs validate ad spend, build trust with advertisers, and unlock growth. With 68% of advertisers ranking higher ROI than other channels as the top reason to increase budgets, proving performance isn’t optional—it’s critical.

However, delivering on measurement isn’t simple. The IAB reports that 41% of advertisers feel RMNs lag behind other channels in measurement capabilities. Inconsistent standards, fragmented methodologies, and limited transparency in “walled garden” ecosystems make it harder for brands to compare performance and fully trust the data they receive.

So, what do advertisers need to feel confident? Here’s the checklist of essentials every RMN should deliver.

Must-haves for effective RMN measurement

Use case coverage
1) Omnichannel coverage & cross-platform attribution
Seamless measurement of user journeys across onsite, offsite (Meta, Google), and in-store touchpoints, plus conversion attribution across mobile and web.

Precision

2) Deduplicated attribution
Ensure a unified dashboard for onsite and offsite campaigns with accurate, cross-channel deduplication.

3) SKU-level attribution
The ability to link ad spend directly to individual product sales for precise performance insights.

4) Lift measurement
Prove campaign-driven value by isolating incremental sales and engagement. In retail media, where purchase intent is high, lift measurement prevents overestimating performance and ensures accurate optimization.


Simplicity

5) Easy-to-access reports
Visualized dashboards that provide actionable insights in a simple format—no technical expertise or SQL knowledge required. Include the option of a simplified export to the brand’s or RMN’s own BI tools.

6) Timely insights
Always fresh reporting to allow for real-time optimization, rather than post-campaign summaries.

7) Wide and deep reporting
Comprehensive coverage across top, mid, and bottom-funnel metrics to show the full impact of campaigns. 

⬇Download our cheat sheet on essential retail media KPIs here to ensure you’re measuring the right metrics at every stage.

Bonus point: Flexible attribution logic for tailored insights
Every campaign is unique, and so are its measurement needs. Offering adjustable attribution logic allows advertisers to align metrics with their specific goals—whether it’s customizing attribution windows, tracking cross-channel performance, or isolating incremental lift. This flexibility ensures brands can measure what matters most to them while gaining a clearer, more accurate picture of campaign success.

SETTING YOUR RMN CAMPAIGNS UP FOR MEASURABLE SUCCESS

Setting the stage for a successful retail media campaign requires careful preparation and a clear understanding of your specific use case. This section will walk you through the key steps to ensure your RMN campaigns are ready to deliver measurable success—from defining your use case to consuming actionable insights.

Understand your use case

Before launching a campaign, it’s crucial to identify your use case. This depends on several dimensions:

  • Activation type: Onsite (owned media) or offsite (third-party media like Google, Meta, DSPs, etc.).
  • Management type: Managed (activated by RMN) or self-service activation (activated by brand).
  • Conversion platform: Channel specific or cross-platform conversions.
  • Conversion location: Conversion occurs on the RMN side or the brand side.

These dimensions determine the level of measurement detail possible. For instance, onsite campaigns with conversions on the RMN side enable user-level attribution, while offsite campaigns, when not approached correctly, often rely on aggregated insights provided by media platforms.

In such cases, the level of precision depends on the capabilities of the data collaboration platform used for measurement. Here’s a snapshot of how these combinations create 12 unique use cases:

  • Onsite / Conversion on mobile app / Brand side: A brand placed an ad on the RMN mobile app. Users clicked on the ad, which redirected them to the brand’s app, where they completed their purchases.
  • Onsite / Conversion on mobile app / RMN side: A brand placed an ad on the RMN mobile app. Users clicked on the ad, which led them to the product page within the RMN app, where they completed their purchases.
  • Onsite / Cross-platform conversion / Brand side: A brand placed an ad on the RMN mobile app. Users clicked on the ad, which redirected them to the brand’s website, where they completed their purchases.
  • Onsite / Cross-platform conversion / RMN side: A brand placed an ad on the RMN website and app. Users clicked on the ad on the website, which redirected them to the brand’s website to complete their purchases.
  • Offsite / Self-serve / Conversion on mobile app / Brand side: A mobile app utilized RMN data to create segments and launch campaigns on Meta. Users on Meta clicked the ads and were directed to specific locations within the brand’s app, where they completed the target action, such as registration.
  • Offsite / Self-serve / Conversion on mobile app / RMN side: A brand leveraged RMN data to target segments and launch campaigns on Meta. Users on Meta clicked the ads and were directed to specific locations within the RMN’s app, where they completed the purchase of the brand’s product.
  • Offsite / Self-serve / Cross-platform conversion / Brand side: A brand leveraged RMN data to create segments and launch campaigns on Meta. Users on Meta clicked the ads and were directed to the brand’s website to complete the purchase.
  • Offsite / Self-serve / Cross-platform conversion / RMN side: A brand leveraged RMN data to create targeted segments and launch campaigns on Meta. Users on Meta clicked the ads and were directed to the RMN’s website to complete the purchase of the brand’s product.
  • Offsite / Managed / Conversion on mobile app / Brand side: An RMN activated a campaign on Meta to drive conversions on the brand’s mobile app, such as installs.
  • Offsite / Managed / Conversion on mobile app / RMN side: An RMN activated a campaign on Meta to drive sales of the brand’s products on the RMN’s app.
  • Offsite / Managed / Cross-platform conversion / Brand side: An RMN activated a campaign on Meta to drive sales on the brand’s website.
  • Offsite / Managed / Cross-platform conversion / RMN side: An RMN activated a campaign on Meta to drive sales of the brand’s products on the RMN’s website.

Select a measurement partner

Understanding your use case helps you ask the right questions when choosing a measurement partner—typically a data collaboration platform. The key factor is whether your use case enables user-level data insights, which ensure precision, unbiased attribution, and deduplication.

Why you need a data collaboration platform for measurement:

  • Connecting data sets owned by different parties
    Engagement data comes from the ad platform, while conversion and mapping data belongs to the party where the conversion happens (e.g., the RMN or the brand). A data collaboration platform bridges these datasets, allowing the attribution process to run while ensuring business privacy.
  • Enabling user-level precision
    User-level data ensures precise, unbiased, and deduplicated measurement. Most platforms struggle to access this data in walled gardens—except AppsFlyer. As both a measurement provider and a data collaboration platform, AppsFlyer has unique access to user-level insights from walled gardens, guaranteeing the highest measurement accuracy.

Prepare your data

Think of retail media measurement as a well-balanced tripod—supported by three essential components: engagement data, conversion data, and campaign mapping. Each plays a critical role in delivering accurate, actionable insights.

  • Engagement data. Engagement data comes from the platform where the campaign is activated — be it onsite or offsite on channels like Google, Meta, TikTok, or DSPs, or both.
  • Conversion data. Conversion data is sourced from the party responsible for facilitating the conversions—whether it’s the retailer itself or the brand. This data provides the other half of the story, capturing the results of campaign activity.

  • Campaign mapping. This dataset links the campaign_id to the brand it belongs to and its conversion targets, typically SKUs.
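A toy sketch of how the three datasets come together for SKU-level attribution; all schemas, names, and values here are illustrative assumptions, not a real pipeline:

```python
# Illustrative rows for the three datasets described above (schemas are assumptions).
engagement = [  # from the ad platform: who clicked which campaign, and when
    {"user": "u1", "campaign_id": "c100", "click_ts": 1},
    {"user": "u2", "campaign_id": "c200", "click_ts": 2},
]
conversions = [  # from the party where the purchase happened
    {"user": "u1", "sku": "SKU-A", "revenue": 30.0, "conv_ts": 5},
    {"user": "u2", "sku": "SKU-Z", "revenue": 10.0, "conv_ts": 6},
]
campaign_mapping = {  # campaign_id -> brand and the SKUs it promotes
    "c100": {"brand": "BrandX", "skus": {"SKU-A"}},
    "c200": {"brand": "BrandY", "skus": {"SKU-B"}},
}

def attribute(engagement, conversions, mapping):
    """Credit a conversion to a campaign only when the user clicked it before
    converting and the purchased SKU is one the campaign actually promotes."""
    clicks = {e["user"]: e for e in engagement}
    results = []
    for c in conversions:
        e = clicks.get(c["user"])
        if not e or c["conv_ts"] < e["click_ts"]:
            continue
        m = mapping[e["campaign_id"]]
        if c["sku"] in m["skus"]:
            results.append((e["campaign_id"], m["brand"], c["sku"], c["revenue"]))
    return results

print(attribute(engagement, conversions, campaign_mapping))
# [('c100', 'BrandX', 'SKU-A', 30.0)]
```

Note how the second conversion drops out: the user clicked a campaign, but bought a SKU that campaign doesn’t promote. Without the mapping dataset, that sale would be wrongly credited.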

With your data in place, it’s time to configure the measurement platform and start getting the insights. 

Share insights with your brand partners

The final step in retail media measurement is all about making insights actionable and accessible — not just for you, but for your brand partners. Reports should be easy to understand, visually clear, and flexible enough to meet different needs.

Here’s what makes a great reporting experience:

  • Visualized simplicity. Clean, easy-to-read dashboards help marketers quickly grasp key metrics, without wading through complex data tables or relying on data science teams. Visual insights save time and make decision-making faster.
  • Export options. Allowing governed report exports in multiple formats ensures partners can integrate the data into their own systems or workflows, making collaboration seamless.
  • Deep-dive insights. While surface-level dashboards are great for quick overviews, the ability to dig deeper through customizable queries lets advanced users uncover specific trends or address unique questions.

By sharing clear, flexible, and actionable reports with your brand partners, you strengthen the relationship and drive smarter campaign optimizations together. After all, the best insights are the ones everyone can use.

BEST PRACTICES THAT BUILD TRUST AND DRIVE AD REVENUE

To ensure successful campaign measurement and maximize trust with your brand partners, follow these best practices:

  1. Set clear expectations upfront
    Before launching a campaign, align with your brand partners on their goals and the KPIs that will best represent them. Confirm with your measurement partner that these KPIs are measurable within the platform to avoid surprises later.
  2. Rely on user-level insights
    User-level data is essential for precise measurement. Unlike aggregated data, it ensures accurate attribution, unbiased insights, and deduplication across both onsite and offsite campaigns.
  3. Leverage SKU-level measurement
    When relevant, connect ad spend to individual product sales with SKU-level attribution. This granularity not only provides detailed performance insights but also builds confidence in campaign ROI.
  4. Run lift analysis for incremental uplifts
    Prove your campaign’s unique value with lift analysis. In onsite retail media, users are already motivated to buy, making it crucial to separate organic from ad-driven conversions. In offsite media, where intent is lower, lift analysis reveals the true impact of your ads.

  5. Adhere to industry KPI standards
    Ensure consistency in how metrics like conversion rate are calculated, following standard industry practices to make reports clear and comparable.
  6. Offer data verification
    Trust is critical. With onsite activation, RMNs control engagement data, and brands often question the accuracy of impressions calculations. This lack of transparency can erode confidence. To address this, offer your brand partners the option to use third-party tools, such as the AppsFlyer SDK, which has access to engagement data in mobile app environments. These tools enable independent verification of engagement metrics, ensuring accuracy and building trust between RMNs and advertisers.
  7. Simplify the brand experience
    Make data and insights easy to consume. Provide visualized dashboards that require no technical expertise or SQL skills, and ensure reports are delivered without delays for timely decision-making.
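The lift analysis in point 4 ultimately compares the exposed group’s conversion rate against a holdout’s. A minimal sketch with made-up numbers (a real methodology would also account for sample sizes and statistical significance):

```python
def incremental_lift(exposed_conv, exposed_n, holdout_conv, holdout_n):
    """Relative lift of the exposed group's conversion rate over the holdout's."""
    exposed_rate = exposed_conv / exposed_n
    holdout_rate = holdout_conv / holdout_n
    return (exposed_rate - holdout_rate) / holdout_rate

# Exposed: 600 converters out of 10,000; holdout: 500 out of 10,000.
lift = incremental_lift(600, 10_000, 500, 10_000)
print(f"{lift:.0%}")  # 20%
```

Here 6% of exposed users converted versus 5% of the holdout, so the campaign drove a 20% relative lift; the other 5 points would have happened organically.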

By implementing these best practices, you’ll foster trust with your brand partners, improve campaign performance, and set a strong foundation for long-term collaboration.

HOW WOLT ADS ACHIEVED MEASUREMENT EXCELLENCE

Wolt Ads, part of the global food and retail delivery leader DoorDash, supports brand advertisers with retail media solutions across 29 markets. With over 42 million registered users, Wolt Ads set out to establish itself as a go-to platform for impactful, data-driven campaigns.

To meet advertiser demands and scale its retail media solutions, Wolt Ads needed to:

  • Deliver comprehensive reports that link ad spend directly to sales.
  • Streamline operations with tools that reduce dependency on technical resources.
  • Provide a privacy-centric solution that ensures data accuracy and trust.

By leveraging AppsFlyer’s Data Collaboration Platform (DCP), Wolt Ads unlocked critical capabilities:

  1. Efficiency & scale
    With onsite activation and conversions happening directly on Wolt, data operations were simplified. As an existing AppsFlyer customer, Wolt benefited from seamless access to all required data for measurement, enabling quick and effortless campaign execution.
  2. Accurate measurement
    AppsFlyer’s DCP provided end-to-end attribution, connecting ad spend to sales with SKU-level granularity. This level of precision ensured advertisers could see exactly how campaigns drove results.

The results of the partnership were impressive:

  • 32% Revenue Uplift: Campaigns using AppsFlyer’s DCP outperformed those without by a significant margin.
  • 4x ROAS: Return on ad spend far exceeded expectations.
  • 159% Penetration Rate Increase: The campaign expanded the ice cream brand’s audience reach.
  • 148% Category Share Uplift: Enhanced targeting and personalization led to category dominance.

“We had a really easy way to offer custom, bespoke audiences, and we could measure everything easily within the same platform. That end-to-end experience, along with the results, proves that this truly delivers value for our partners.”

Catalina Salazar, Head of Wolt Ads

With these results, Wolt Ads is now equipped to scale its retail media offering across all 29 markets, driving measurable outcomes for its partners and strengthening its position as a trusted retail media network.

Key Takeaways

Retail media measurement is the foundation for building trust with advertisers and driving campaign success. Here are the key points to keep in mind:

  1. Measurement drives ad revenue
    With 68% of advertisers ranking ROI as their top priority, proving performance is critical. Transparent and accurate measurement validates ad spend and strengthens advertiser relationships.
  2. User-level data is essential
    Campaigns fueled by user-level data provide precise attribution, unbiased insights, and deduplication—key for both onsite and offsite activations. This level of detail ensures confidence in campaign outcomes.
  3. Flexibility and granularity matter
    Flexible attribution logic and granular insights, such as SKU-level measurement, empower brands to tailor campaigns and focus on metrics that truly reflect their goals.
  4. Simplify the experience to meet brand expectations
    RMNs, while complementing traditional media, also compete with it. Brand-side marketers are accustomed to the seamless, self-service reporting provided by major media players, making retail media’s often resource-intensive nature a challenge. To remain competitive, RMNs must simplify the experience for brands by offering intuitive, easy-to-use reporting tools and reducing operational complexity. This ensures that retail media remains a highly valuable yet accessible strategy for advertisers.
  5. Preparation sets the stage for success
    Defining your campaign use case and aligning data—engagement, conversions, and mapping—ensures your campaigns are set up for measurable success.

By implementing these practices, retail media networks can not only build trust with advertisers but also demonstrate their value, ensuring sustained growth and long-term partnerships.

Beyond walled gardens: Maximize retail media performance in 2026
https://www.appsflyer.com/blog/measurement-analytics/retail-media-performance/ | Sun, 24 Aug 2025


Retail media networks (RMNs) have always been comfortable playing on their home turf: leveraging owned channels, enjoying reliable visibility, and delivering consistent outcomes. 

But expanding their reach to offsite platforms has often felt like venturing into uncertain terrain filled with unknowns. Fragmented measurement, unclear attribution, and limited visibility into performance are real barriers that marketers frequently face.

Yet, offsite is where the biggest growth opportunities lie. Brands want broader reach, precise targeting, and measurable results, which offsite platforms clearly offer. So why haven’t more RMNs jumped in?

The problem isn’t ambition, it’s clarity in measurement

Marketers face several measurement challenges when dealing with multiple platforms. Data often arrives fragmented, leading to manual consolidation that’s tricky and error-prone. It can be difficult to pinpoint exactly which engagement triggered a conversion, resulting in uncertainty. There’s also the risk of double-counting conversions across channels, inflating numbers and causing confusion. 

Here’s a scenario many retail media networks face:

A leading retail marketplace offers brands a managed media package that includes onsite ads, social activation via Meta, and campaigns on the open web. About 60% of conversions happen on mobile and 40% on the web. Brands want a clear view of their campaign performance to optimize spending effectively, but reality is different.

Currently, each media source provides separate reports, forcing brands to dedicate significant engineering resources to merge fragmented data. Without a unified view, optimization relies heavily on guesswork.

A new way forward: Consolidation and clarity

To address these challenges, AppsFlyer developed a measurement capability within its Data Collaboration Platform (DCP), integrating both mobile and web performance data from Meta and Google. This provides a cohesive dashboard that merges onsite and offsite campaign data.

A key part of this unified view is deduplication. AppsFlyer uses its access to user-level engagement data, from walled gardens through MMP integrations and from onsite environments, to deliver last-touch mobile conversion attribution that’s already deduplicated across onsite and offsite channels.

On top of that, DCP brings in web conversions and applies a simple rule: only share them with Meta or Google if the click ID is tied to an active campaign, and if its timestamp is the closest one to the conversion. This validation happens across both onsite and offsite sources, so each conversion is shared just once. That way, even measurement coming back from the platforms themselves is clean and deduplicated.
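That validation rule can be sketched roughly as follows; the function, field shapes, and campaign IDs are illustrative assumptions, not AppsFlyer’s implementation:

```python
def share_conversion(conversion_ts, clicks, active_campaigns):
    """Pick at most one click to credit: it must belong to an active campaign
    and be the click closest in time to (and not after) the conversion.

    `clicks` is a list of (click_id, campaign_id, click_ts) tuples.
    """
    eligible = [(cid, camp, ts) for cid, camp, ts in clicks
                if camp in active_campaigns and ts <= conversion_ts]
    if not eligible:
        return None
    # The click with the timestamp closest to the conversion wins.
    return max(eligible, key=lambda c: c[2])[0]

clicks = [
    ("click_meta", "meta_c1", 100),
    ("click_google", "goog_c9", 140),   # campaign no longer active: excluded
    ("click_onsite", "onsite_c3", 120),
]
winner = share_conversion(150, clicks, active_campaigns={"meta_c1", "onsite_c3"})
print(winner)  # click_onsite
```

Because only the single winning click is ever credited, each conversion is shared back once, which is what keeps the platform-side reporting deduplicated.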

DCP then brings everything together into a single view — across channels and platforms.


Confidence to expand offsite

With clear insights, RMNs have a straightforward view of their campaign performance, making optimization and decision-making simpler. Better measurement means retail media networks can confidently expand their offsite marketing efforts. Clear visibility across mobile and web channels, accurate attribution, and user-friendly dashboards remove uncertainty, providing marketers with simple, actionable insights.

Stepping into offsite territory no longer means uncertainty. Clear measurement is finally within reach, helping retail media networks fully realize the potential of offsite advertising.

Key takeaways

Looking to expand into offsite while keeping visibility and performance top of mind? Start here:

  • Give your partners one version of the truth
    A unified view across onsite and offsite, mobile and web, helps eliminate reporting gaps and confusion.
  • Make duplication a thing of the past
    Smart deduplication logic ensures each conversion is counted once—so everyone’s working with clean, trusted data.
  • Simplify what used to take engineering hours
    No more stitching together reports from Meta, Google, and your own platform. Let your team focus on insights, not integration.
  • Measure what matters, wherever it happens
    Whether a conversion happens onsite or offsite, on web or in-app, you’ll have the full picture—ready to act on.


Inside the data handshake powering the new wave of retail media
https://www.appsflyer.com/blog/mobile-marketing/unlock-retail-media-growth-with-smarter-data-collaboration/ | Wed, 21 May 2025


Back in 2021, when industry analyst Eric Seufert said, “everything is an ad network,” it sounded pretty bold. At the time, it was mostly the Walmarts and Amazons of the world playing in this space — not the tidal wave of specialty retailers, grocers, airlines, and banks we’re seeing today.

Fast forward a few years, and Eric’s call is looking spot-on.

Today, companies across every sector are spinning up ad networks, fueled by consumer data. It’s not just about selling products anymore — it’s about building new revenue streams and unlocking bigger business opportunities. Off-site retail media ad spending is projected to grow 163.6% between 2024 and 2028, reaching $28.05 billion by 2028 (eMarketer). 

But none of this works without trust.

At the core of every ad network lies an unspoken deal — consumers share their data, and in return, they expect value. That value exchange depends on two key, seemingly contradictory ingredients: privacy and personalization. And the secret sauce behind it all? Data collaboration.

Everything is [indeed] an ad network

Retail media networks might have kicked things off, but the ad network model isn’t just for big box stores anymore. Today, you name it, they have it: fitness apps, airlines, banks, and more. Even finance giants like PayPal, Chase, Revolut, and Mastercard have jumped on the media network bandwagon in a big way over the last 18 months.

The retail media surge is creating massive opportunities — not just for brands building their own networks, but for industries that have traditionally been relatively data-poor, like consumer packaged goods (CPG). With access to richer datasets, these brands can tap into new revenue streams and create much sharper and smarter customer experiences.

The value exchange: privacy, personalization, and consent

Collecting data isn’t enough anymore.

Consumers are smarter — they know their data has value, and they expect something meaningful in return. Consent is step one. But personalization? That’s the real payoff.

People don’t hate ads — they hate bad, irrelevant ads.

When companies like United Airlines or PayPal use data thoughtfully, they’re not just pushing ads — they’re delivering experiences that feel helpful and personal.

Imagine landing in a new city and getting a curated list of restaurants you’ll actually want to visit. That’s personalization done right. It builds trust because consumers see that their data is making their lives easier, not just fueling sales.

Personalization starts with collaboration

No single company has a full picture of their customers anymore.

To deliver real personalization, brands need to work together — securely and responsibly.

That’s why data collaboration is becoming a must-have.

For example, a bank might partner with a retailer to deliver smarter credit card offers based on shopping habits. Or an airline could suggest local events based on a customer’s travel plans. CPG brands can finally break out of broad, scattershot marketing by partnering with platforms that have richer, real-time customer insights.

These collaborations create a win-win-win:

  • Consumers get more relevant experiences.
  • Brands unlock better targeting and new revenue.
  • Networks deepen their value and extend their reach.

At the end of the day, it’s about meeting consumers where they are — with the right message, at the right moment.

Privacy tech and standards are the backbone

Collaboration only works if it’s built on trust, and that trust must be underpinned by real privacy solutions and clear standards. Last November, the FTC clearly stated that the ecosystem needs transparent, enforceable definitions, not just marketing slogans.

Privacy-preserving technologies like data clean rooms, differential privacy, and secure multi-party computation are reshaping the way brands can collaborate without exposing consumer identities. But without consistent standards, even the best technology can’t protect the fragile trust being built.
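To make the idea concrete, here is a minimal sketch of one of these techniques: the Laplace mechanism behind differential privacy, where an aggregate count is released only after calibrated noise is added, so no single consumer’s presence can be confidently inferred. The function names and parameters below are illustrative, not any vendor’s API.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random()
    while u == 0.0:          # avoid log(0) at the distribution's edge
        u = random.random()
    u -= 0.5                 # u is now in (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one user changes a count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon suffices.
    """
    return true_count + laplace_noise(1.0 / epsilon)

# Example: report how many matched users bought in-category,
# without letting any one user's presence be inferred.
noisy = private_count(true_count=1842, epsilon=0.5)
```

The smaller the epsilon, the noisier (and more private) the released count; clean room vendors tune this trade-off and typically layer it with query controls and minimum audience thresholds.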

Measurement: proving value in a privacy-first world

As brands embrace collaboration and privacy-driven innovation, measurement becomes the next critical pillar. Accurately measuring the effectiveness of retail media networks, partnerships, and personalization efforts is essential for sustaining trust and demonstrating real business impact.

Measurement can be challenging, especially when attributing conversions accurately across different platforms, which often leads to double-counting and skewed results. Similarly, accurately linking ad spend to specific product sales, especially in complex retail environments, poses another significant challenge. For strategies on navigating these complexities, check out AppsFlyer’s Retail Media Measurement Guide.

Key takeaways


Navigating the new world of ad networks and data collaboration can be complex—but focusing on these core elements will help your brand build stronger consumer relationships and unlock lasting business growth.

  • Build trust first: If you want consumer data, offer real value in exchange. Privacy isn’t a nice-to-have; it’s the foundation.
  • Team up to personalize: You can’t win alone. Collaborate securely with partners to gain deeper insights and deliver experiences your customers truly value.
  • Measure everything: If you can’t show results, you can’t keep trust. Clear, transparent measurement helps everyone understand what’s working—and why.
  • Tech and standards matter: Solid technology and clear privacy standards aren’t optional. They’re essential tools for protecting your relationships with customers and partners.

The post Inside the data handshake powering the new wave of retail media appeared first on AppsFlyer.

DCR out, DCP in: Why retail media can’t deliver on its promise with data clean rooms alone https://www.appsflyer.com/blog/trends-insights/dcr-out-dcp-in/ Thu, 30 Jan 2025 10:33:54 +0000 https://www.appsflyer.com/?p=452503

In the dynamic world of data-driven marketing, the standalone data clean room (DCR) is being overshadowed by a new star (and acronym – always industry favorites) —the data collaboration platform (DCP).

Once known as the ultimate solution for addressing siloed data, privacy regulations, and diminishing identifiers, standalone clean rooms promised secure and private data sharing between partners. Yet, their limitations—high costs, engineer-heavy operations, and limited scalability—have exposed their shortcomings.

Today, clean rooms are finding their true potential not as independent entities but as integrated features within broader platforms. These platforms are redefining how marketers collaborate, unlocking scalable insights, and enabling actionable strategies to turn overlapping data into meaningful outcomes. This shift signals a pivotal moment for the industry: a move from isolated solutions to interconnected ecosystems that drive real value.

In this blog, we’ll review some of the key characteristics that differentiate data collaboration platforms from data clean rooms. We’ll also explain why, although both are important for advertisers and commerce media networks, a DCR on its own is not a sufficient solution in today’s landscape.

Data Clean Room → Data Collaboration Platform

A data clean room (DCR) provides businesses with a secure environment for seamless data collaboration. Within these protected spaces, multiple entities can effectively merge sensitive data without undermining privacy or security.

However, simply combining data within a data clean room lacks substantial value for marketers without additional applications. In fact, DCR technology has been largely commoditized and has transitioned into a more evolved state, marked by its transformation into a Data Collaboration Platform (DCP).

Unlike the traditional DCR setup, a DCP not only facilitates the secure merging of 1st party data (1PD) but also acts as a centralized hub for data collaboration processes. This includes functionalities such as audience creation, campaign activation across diverse channels supporting both endemic and non-endemic scenarios, and most significantly, the ability to measure campaign effectiveness.

Why measurement still falls short

In recent months, we have engaged with over 200 advertisers and commerce media networks (CMNs). According to Nielsen, while all advertisers recognize the potential of personalized campaigns tailored to well-segmented audiences, a significant 84% of brands prioritize measurement as their top consideration when contemplating a major shift in budget allocation from existing channels to CMNs.

Although the necessity for measurement is straightforward, it’s important to dive deeper into the relevant challenges. The CMN attribution model resembles a three-legged stool, necessitating engagement data (views), conversion data (transactions), and a mapping mechanism to connect various touchpoints.
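As a toy illustration, the three legs of that stool can be wired together with nothing more than a shared match key and a lookback window. The hashed email used as the key here is a deliberate simplification of real identity resolution, and all records and names are hypothetical:

```python
import hashlib
from datetime import datetime, timedelta

def hashed_id(email: str) -> str:
    """Stand-in for a privacy-safe match key (real systems use far more)."""
    return hashlib.sha256(email.lower().encode()).hexdigest()

# Leg 1: engagement data owned by the media network (views)
views = [
    {"user": hashed_id("ana@example.com"), "sku": "SKU-123", "ts": datetime(2025, 1, 10)},
    {"user": hashed_id("ben@example.com"), "sku": "SKU-123", "ts": datetime(2025, 1, 11)},
]

# Leg 2: conversion data owned by the advertiser (transactions)
transactions = [
    {"user": hashed_id("ana@example.com"), "sku": "SKU-123", "ts": datetime(2025, 1, 12)},
    {"user": hashed_id("cam@example.com"), "sku": "SKU-123", "ts": datetime(2025, 1, 12)},
]

# Leg 3: the mapping mechanism, here SKU-level attribution of a
# transaction to a prior view within a 7-day window.
def attribute(views, transactions, window=timedelta(days=7)):
    attributed = []
    for txn in transactions:
        matches = [v for v in views
                   if v["user"] == txn["user"]
                   and v["sku"] == txn["sku"]
                   and timedelta(0) <= txn["ts"] - v["ts"] <= window]
        if matches:
            attributed.append(txn)
    return attributed

print(len(attribute(views, transactions)))  # prints 1: Ana's purchase is attributed; Cam's has no matching view
```

Real-world mapping mechanisms are far more involved (multiple identifiers, consent state, cross-device graphs), but this join is the heart of every attribution model: remove any one leg and nothing can be matched.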

It’s also worth highlighting that brands often seek specific event or SKU-level attributions rather than broader attributions.

Adding another layer of complexity, campaigns may operate across multiple platforms including mobile, web, open web, CTV, etc., and can run either on-site (utilizing CMN digital assets) or off-site. This multi-platform landscape introduces another challenge, as off-site campaigns can run on either the advertiser or the CMN account, which makes measuring campaign results even more complex.

Against this backdrop, most DCPs fall short in three areas:

  1. Total Cost of Ownership – Most DCPs require their customers to manually upload data to provide engagement or conversion insights, increasing operational complexity and costs.
  2. Fragmented Use Cases – There are at least 12 distinct use cases where engagement and conversion data are owned by different parties across various platforms. Most DCPs fail to offer a comprehensive solution that addresses all these scenarios.
  3. Tamper-Proof Data – Since advertisers pay CMNs based on performance, ensuring data integrity is crucial. However, most DCPs cannot provide a tamper-proof mechanism for attribution data through automated methods like SDKs or APIs. Instead, they rely on file exchanges, which are prone to tampering, human error, or inaccuracies.

Self-service audience building is the secret sauce

Despite holding exclusive and highly valuable data on consumer purchasing preferences and behaviors, CMNs encounter various obstacles when attempting to provide this information to advertisers in the form of pre-built audiences:

  • Advertisers’ Business Logic – Each advertiser possesses specific industry knowledge and tailors campaigns according to distinct objectives. For instance, a financial services company may not understand how to create an audience for a consumer packaged goods (CPG) brand, just as US retailers may not grasp what European fashion brands find valuable within their 1PD.
  • Data Consumption – All businesses collect data to serve their own business needs. CMNs are no different in this respect. However, when a CMN needs to make this data accessible to advertisers, it runs into a number of challenges:
    • Privacy restrictions: CMNs don’t want to expose one brand to the purchasing data of another brand.
    • Lacking or no UI: Advertisers must be able to segment this data in a simple user interface without any coding.
    • Hard-to-use data: CMNs must be able to group the data in a useful manner. For example, a CMN has data about how much customers spend in a relevant category, while the advertiser wants to build more granular audiences, e.g. for consumers who spend more than a certain amount a year and have purchased in the last 3 months.
  • Scale and Maintenance – Developing and maintaining specialized audiences for different advertisers, continuously updating these datasets without proper infrastructure, is an arduous and costly process for CMNs. Customizing these audiences to optimize campaign performance for each advertiser further compounds the challenge.
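At its core, a self-service audience builder lets the advertiser express a filter like the one above over aggregates the CMN is willing to expose. A minimal sketch, where all field names and thresholds are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical CMN-exposed purchase summaries, already aggregated per user
# so no raw transaction detail leaves the network.
profiles = [
    {"user_id": "u1", "category_spend_12m": 620.0, "last_purchase": datetime(2025, 4, 20)},
    {"user_id": "u2", "category_spend_12m": 80.0,  "last_purchase": datetime(2025, 4, 25)},
    {"user_id": "u3", "category_spend_12m": 540.0, "last_purchase": datetime(2024, 9, 1)},
]

def build_audience(profiles, min_spend, recency_days, today):
    """Advertiser-defined logic: high spenders with a recent purchase."""
    cutoff = today - timedelta(days=recency_days)
    return [p["user_id"] for p in profiles
            if p["category_spend_12m"] >= min_spend
            and p["last_purchase"] >= cutoff]

audience = build_audience(profiles, min_spend=500, recency_days=90,
                          today=datetime(2025, 5, 1))
print(audience)  # prints ['u1']: u2 spends too little, u3 purchased too long ago
```

The design point is that the CMN uploads its data once, while each advertiser supplies its own predicate, so the CMN never has to build and maintain bespoke segments per advertiser.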

A unified solution delivers a 1.5x uplift

The silver lining is that a unified solution addresses these problems, allowing CMNs to upload data once to help brands independently construct their audiences. Implementing this approach not only resolves existing challenges but also boosts campaign results by 1.5 times compared to audiences generated solely by the CMN, according to an analysis we’ve performed for a number of AppsFlyer clients.

Innovating towards an Interoperable Data Clean Room

When discussing the evolution of DCRs, it is vital to recognize that not all are created equal. This discrepancy largely stems from the fact that various clean rooms were tailored to different user profiles.

Clean rooms established by major cloud providers were designed for developers who sought to build their applications on top of the technology. Conversely, other solutions embedded within a DCP were crafted as a component of a comprehensive solution primarily aimed at marketers, who were looking for an end-to-end solution for leveraging 1st/3rd PD.

With the commoditization of clean room technology, we now observe isolated data silos maintained by both CMNs and advertisers. Due to the reluctance of all parties to transfer data between multiple DCRs—driven by risks, high costs, and privacy concerns—interoperability becomes essential. This entails seamless collaboration across diverse data types, sizes, formats, cloud providers, and data warehouses, enabling DCRs to address the key challenges faced by advertisers effectively.

As DCR vendors grasp this challenge, we’re now witnessing the emergence of early versions of interoperability between clean rooms. The primary hurdles lie in enhancing the customer experience and ensuring the provision of essential functionalities like campaign measurement across clean rooms.

Key takeaways

From Clean Rooms to Collaboration Platforms:

Standalone data clean rooms (DCRs) have evolved into data collaboration platforms (DCPs), offering enhanced scalability, audience creation, campaign activation, and measurement, making them more effective for marketers.

Measurement Remains a Challenge:

Retail media still struggles with attribution complexity across platforms and across engagement and conversion locations, which makes it hard for CMNs to prove the quality of their data and inventory to advertisers.

Self-Service Audience Building is Key:

Empowering advertisers with user-friendly, no-code/no-SQL tools for audience creation, applying their own business logic, boosts campaign performance by 1.5x, showcasing the value of simplifying data collaboration.

The post DCR out, DCP in: Why retail media can’t deliver on its promise with data clean rooms alone appeared first on AppsFlyer.

Unlocking ROI: The power of collaboration in creative optimization https://www.appsflyer.com/blog/tips-strategy/collaboration-creative-optimization/ Tue, 11 Jun 2024 11:57:58 +0000 https://www.appsflyer.com/?p=428463

Dani, a UA manager at a mobile gaming company, logs into the dashboard to check the performance of the holiday campaign. She notices that the numbers are lower than expected. With the holiday season well under way, there’s no time to waste. She needs fresh creatives to boost engagement and conversions. Meanwhile, across the office, the creative team is deep into brainstorming for another campaign. They’re unaware of the issues with the holiday campaign’s creatives. Their focus is on their current project, unaware of the urgent need for adjustments.

This scenario highlights a common challenge in the company’s workflow. The lack of communication between the UA team and the creative team leads to missed opportunities and wasted resources. If only there was a smoother collaboration process, they could address issues promptly and optimize campaigns effectively.

The critical gap lies in the inadequate collaboration among User Acquisition (UA) managers, creative strategists, and Business Intelligence (BI) engineers. When these key stakeholders fail to work together effectively, the result is significant inefficiency, diminished campaign performance, and wasted resources. The synergy between these roles is not just beneficial—it’s imperative. A recent McKinsey study revealed a remarkable 25% uptick in overall marketing campaign effectiveness among companies that champion cross-functional collaboration.

This substantial uplift not only boosts ROI but also underscores the critical need for cohesive operations among UA managers, creative strategists, and BI engineers. We’ll look into the challenges and inefficiencies that arise from a lack of collaboration as well as the dynamic potential of collaborative optimization. We’ll also explore how advanced tools, including AI and report aggregation platforms, can do much more than just bridge gaps. They actively enhance and accelerate the creative and analytical processes, propelling marketing teams toward more effective and efficient campaign executions. Through this exploration, the transformative power of strategic unity and technological aids becomes clear.

The impact of siloed work on campaign performance

Disconnected teams often result in redundant efforts, conflicting strategies, and ultimately, missed opportunities for optimizing campaign performance. This disconnection manifests in various impactful ways:

The “Hunch”

Creative strategists frequently develop new marketing creatives based on instincts and qualitative feedback, rather than relying on real-time, data-driven insights that could lead to more targeted and effective content. On the other side, UA managers, who handle critical performance data, face substantial barriers in sharing this valuable information effectively and quickly with the creative team. The root of this problem often lies in the lack of a user-friendly platform for data sharing.

Out-of-sync data

We’ve also seen cases in which UA managers extract data into cumbersome Excel sheets to build their presentations. This is not only time-consuming but also occurs infrequently—sometimes only every few weeks or months—due to the involved labor and time. Creatives are then often out of sync with the latest data insights, leading to misalignment.

Divergent objectives

UA managers and creative strategists often clash in their primary objectives. UA managers may focus on quantitative metrics such as conversions and cost per acquisition, while creative strategists emphasize visual appeal and alignment with brand messaging. This lack of unity not only undermines the effectiveness of individual campaigns but can also impact the overall marketing strategy, leading to missed opportunities and decreased campaign performance.

This conflict sometimes goes even further. Top-performing ads can be off-brand, causing friction when UA managers push to run these high-performing ads. Meanwhile, creative strategists may advocate for on-brand ads that don’t perform as well. This classic industry conflict highlights the challenge of balancing performance metrics with brand integrity.

Delayed, inconsistent data

BI engineers are critical. They primarily focus on “plugging the pipes” (that is, connecting the data to the right systems) and make vast amounts of data available, enabling others to perform analysis and derive insights. Yet when engineers work independently, without effective collaboration with UA managers and creative strategists, significant issues arise in communication and data use. BI engineers often look at data and insights differently from how creative strategists and UA managers might. Without a common framework for data interpretation and decision-making, these differing perspectives can lead to conflicting interpretations of the same campaign performance metrics.

That’s why regular and structured data-sharing sessions are needed, as well as a unified analytics framework that all team members can understand and apply.

Compromised feedback loops

The absence of open communication channels can severely compromise the effectiveness of feedback loops. When communication stalls, feedback on campaign performance, creative content, and strategic direction becomes delayed or even unactionable. This is particularly detrimental when quick iterations on creatives and rapid deployment of A/B testing are necessary to continuously refine and optimize campaigns.

The lack of timely, effective feedback leads to slower campaign optimization, resulting in missed opportunities. For teams to act as a well-oiled “machine” or creative production engine, a robust system must be in place to facilitate swift and ongoing communication.

An often-overlooked creative challenge is the integration of concept with placement. Strategists may at times conceive ideas in isolation, crafting concepts that may excel in creativity but falter in context-specific suitability—be it as a rewarded video, within a social media feed, or on a platform like TikTok. Such a misalignment can force a concept to fit where it doesn’t naturally belong—much like forcing a square peg into a round hole. Emphasizing collaboration early in the creative process ensures that concepts are tailored for optimal placement from the start, preventing abstract ideation and aligning creative vision with practical execution.

Real-world consequences

Real-world examples show the tangible consequences of poor teamwork in creative optimization and overall campaign success. In one common scenario, the creative team creates a set of creatives based on their understanding of campaign needs, only for the UA manager to find them unusable as they don’t match the required specifications or marketing objectives. This not only wastes time as the creative team goes back to the drawing board, but also delays the entire campaign rollout, resulting in lost opportunities and resources.

We also see loss in flexibility. Often, previously successful ad creatives suddenly stop resonating. Ideally, a nimble, collaborative team would quickly analyze performance data, understand the drop in engagement, and iterate on new creatives. But in a siloed environment, this critical information might not reach the creative team in time, or the feedback loop is so delayed that by the time the issue is identified, the campaign has already suffered significant setbacks. What could have been a quick pivot turns into a prolonged, inefficient effort.

This communication challenge is compounded further in companies who manage multiple apps or brands, with each potentially having different data analysts assigned. When these analysts work in isolation, responsible only for their specific apps without a unified communication strategy, the lack of cohesion can lead to inconsistent interpretations and strategies across the board. 

Five Benefits of Collaborative Optimization

Bad communication can be devastating to a campaign. But when collaboration is effective, the positive impacts on creative optimization and overall marketing campaign performance are profound and multifaceted. This synergy not only enhances the efficiency of operations but also drives more impactful and targeted campaign outcomes. Here are 5 benefits:

1. Integrating data-driven insights with creative strategy

When creative teams incorporate data-driven insights provided by BI engineers into their creative process, the outcome is far more impactful. This integration ensures that each creative decision is informed by real-time data about audience behavior and campaign performance.

2. A comprehensive approach to campaign optimization

Collaboration allows campaign optimization to be holistic. UA managers, with their focus on performance metrics, can guide the creative team in understanding which elements of the campaign are driving ROI and where improvements are needed. BI engineers can then analyze vast amounts of data to provide actionable insights that help refine these strategies further. Every aspect of the campaign is then fine-tuned for maximum impact.

3. Better, more agile decision-making

Armed with these comprehensive insights, teams can then make decisions based on a complete understanding of the campaign dynamics. This means faster iterations and A/B testing, as well as more timely recalibrations. Collaborative efforts ensure that the marketing team stays agile, adapting quickly to new information while continuously optimizing strategies.

4. Cross-disciplinary knowledge sharing

Each team member brings a unique set of skills and expertise to the table. UA managers understand the nuances of campaign optimization strategies, creative strategists contribute with innovative design and content ideas, while BI engineers bring critical data analytics capabilities. These diverse perspectives and skills are shared through collaboration, enriching approach and execution. This fosters innovation and builds a more robust understanding of what works best, promoting continuous improvement in strategies.

5. Aligned goals

Aligning goals, strategies, and resources leads to the creation of optimized creatives and targeted messaging backed by data-informed decisions. This unity not only streamlines the marketing process but also enhances the effectiveness of the campaigns, driving better performance and tangible results. Critically, all parties should also agree on which sources and metrics to use consistently, to avoid confusion and inefficiencies. This alignment fosters a cohesive team environment with fewer conflicts and a better culture, demonstrating that the team works well together. This approach not only delivers superior results but also strengthens team dynamics and workplace satisfaction.

Collaboration works

And this is more than just theory. Collaborative optimization delivers tangible, research-backed improvements in ROI, conversion rates and customer acquisition. Deloitte’s collaboration with the MIT Sloan Management Review, for instance, found that 53% of businesses leveraging cross-functional teams witnessed significant performance enhancements. Moreover, HubSpot reveals that 28% of marketing leaders recognize “collaborating across teams when planning marketing activities” as a top method for increasing visibility within the company.

The evidence is clear. Teamwork not only brings isolated project benefits but also contributes to enduring organizational success. Effective creative optimization demands not just the unity of efforts but a strategic alignment of goals across all key stakeholders. Fostering a culture that prioritizes shared insights and collaborative engagement unlocks the full potential of marketing campaigns, ensuring both immediate impact and sustained growth.

Best Practices

Optimizing creatives is also about embracing a set of best practices that are about integrating the right tools and enhancing human collaboration. Businesses must therefore combine the right tools, policies and practices.

Choosing the right “Tech stack”

Sophisticated tools are key to boosting synergy across cross-functional marketing teams. Ensuring your “tech stack” is equipped with robust tools like Adverity, Datorama, and AppsFlyer Creative Optimization is crucial. These platforms offer advanced features that enable data-driven decision-making, efficient creative testing, and streamlined campaign optimization. This fosters a collaborative environment that’s faster, easier, and more transparent.

For example, AppsFlyer’s Creative Optimization solution enhances teamwork by allowing for rapid testing of creatives and quick adjustments based on real-time feedback. This ensures that all stakeholders can make informed decisions promptly, boosting both the agility and effectiveness of marketing campaigns.

Meanwhile, tools such as A/B testing platforms, creative analytics software, and project management tools are also key for improving collaboration between teams. A/B testing platforms enable teams to quickly test and refine creative strategies based on real-time data. Creative Analytics software offers detailed insights into creative performance, helping teams tailor content and optimize marketing strategies effectively. Project management tools (e.g., Asana or Slack) are also crucial for maintaining clear communication and coordination, ensuring everyone is aligned on project goals and progress. Collectively, these tools significantly improve data sharing, communication, and the decision-making process across marketing teams.
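For intuition on what an A/B testing platform computes under the hood, here is a minimal two-proportion z-test comparing the conversion rates of two creatives. This is a simplified sketch with made-up numbers; production platforms add corrections for sequential peeking, multiple variants, and novelty effects.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-score for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Creative A: 200 conversions / 10,000 impressions (2.0% CVR)
# Creative B: 260 conversions / 10,000 impressions (2.6% CVR)
z = two_proportion_z(200, 10_000, 260, 10_000)
print(round(z, 2))  # prints 2.83
```

With these illustrative numbers, z ≈ 2.83 is comfortably above the 1.96 threshold for 95% confidence, so creative B’s lift is unlikely to be noise; with fewer impressions the same lift might not clear the bar, which is why teams need shared, consistent metrics before declaring a winner.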

Appointing a Creative Strategist

But marketing campaigns also require an overall manager. Enter the creative strategist, a role that’s become crucial for enhancing cross-functional collaboration and driving effective creative optimization. As growth consultant and former creative strategist Marcus Burke emphasizes, creative strategy involves refining the often chaotic process of creative ideation into a structured, repeatable workflow that not just spawns innovative ideas but also ensures continual improvement over time.

“… Make sure someone in the team owns this process,” explains Burke. “Otherwise you tend to default to this hit or miss approach where ads are created randomly because someone saw a competitor ad that was interesting or had a good idea. This isn’t predictable and you won’t be able to scale.”

Creative strategists play a pivotal role in breaking down the silos between design, analytics, and user acquisition teams. They act as the linchpin that ensures all teams are aligned, sharing insights across the board to optimize creative outputs. For example, by incorporating early performance metrics into the creative development process, creative strategists can help preemptively adjust strategies, saving time and resources while enhancing campaign effectiveness.

Case studies

Many businesses have found success through the above practices. Ace Games had difficulty identifying effective creative elements and consolidating metrics from different networks. Through the effective use of creative optimization and aggregated reporting tools, the company upped the percentage of successful creatives from 55% to 80% and boosted its click-through rate (CTR) by 52%. This transformation was not just about improving numbers but also about enhancing cross-functional collaboration within the company. The new tools and data insights allowed various teams, from artists to UA managers, to gain a clearer understanding of performance metrics and focus areas, fostering a more inclusive and informed creative process.

Smartmove JSC, a prominent player in the APAC mobile gaming market, also dramatically enhanced its user acquisition (UA) efficiency and boosted ROI through advanced creative optimization. These tools gave the UA team detailed, creative-level ROAS and retention metrics, enabling them to quickly identify and eliminate underperforming creatives while scaling up successful ones. This ability to make data-driven decisions significantly improved their overall campaign performance.

AI tools not only reduced the time spent on manual classification of creatives but also allowed for more nuanced testing and optimization strategies. For example, the team could tailor video content to specific age demographics by analyzing how different colors resonated with various groups, enhancing the appeal and effectiveness of their ads.

Burke also highlights the potential of AI-driven tools in accelerating creative optimization processes: “When it comes to headline and body texts on Meta, I barely ever have the time to test them because I’m already creating so many different video iterations. This is a job I hand off to Meta’s automation, report on the results on a monthly basis and then iterate. These micro-variations are something I feel AI can really help support me with.”

Beyond siloes

The evidence is clear. Supercharging creative optimization requires a commitment to best practices in technology integration, strategic alignment, and most importantly, fostering a culture of collaboration and continuous improvement. These elements are crucial for turning theoretical benefits into real-world successes, enabling businesses to thrive in the ever-evolving landscape of digital marketing.

The post Unlocking ROI: The power of collaboration in creative optimization appeared first on AppsFlyer.

Data collaboration: The privacy-compliant key to understanding your audience and making data-based decisions https://www.appsflyer.com/blog/tips-strategy/data-collaboration/ Mon, 10 Jun 2024 12:18:24 +0000 https://www.appsflyer.com/?p=427262

83% of CEOs want their companies to be data-driven, yet only half say they have an enterprise strategy for managing and extracting value from data. 

This disconnect stems from internal and external silos that keep you from getting the whole picture of your data. Internally, most companies lack platforms and guidelines for storing and sharing data across teams. 

Externally, your data is stored in walled gardens that don’t talk to each other. What’s more, changing privacy regulations mean that companies have more data blind spots than ever before and can no longer rely on third-party cookies in the future. 

That’s where data collaboration comes into play. When you collaborate effectively, you empower your team to make data-backed decisions that grow your business.

What is data collaboration? 

Data collaboration is an approach to working with data from various sources in a controlled, connected environment. 

Many people use “data collaboration” and “data sharing” interchangeably, but there’s a difference between the terms. Data sharing means giving access or a copy of a data set to another business, while data collaboration is a strategy and process for cleaning, integrating, and managing access to data sets. 

Data collaboration, when done correctly, can ensure that businesses can work together on a dataset, without either side becoming exposed to sensitive or first-party details.

Data collaboration examples

Here are several ways that marketers use data collaboration to optimize advertising efforts, reach the right audience, attribute campaign performance, and improve marketing effectiveness.

  • Data clean rooms. Marketers can use a data clean room to collaborate on data analysis while maintaining privacy and compliance. For instance, multiple brands can add their first-party data to a data collaboration platform that’s powered by data clean room technology. This enables them to analyze cross-platform audience overlap or measure campaign performance, without exposing user-level details.
  • Audience segmentation. This is when advertisers work with networks, such as retail or commerce media networks, to share audience insights and segment specific demographics or characteristics. For example, an eCommerce retail media network may partner with a gaming brand to target users who have shown interest in similar products.
  • Data co-ops. Advertisers participate in data cooperatives, or co-ops, where they pool their first-party data with other advertisers or partners to gain collective insights or enhance targeting capabilities. This collaboration means advertisers can use shared data resources to improve their advertising strategies and achieve better results.

Real use case

A leading cosmetics brand needed to grow their sales and was looking for effective ways to reach high-value customers. However, they had maximized the potential of their limited first-party data on their paid advertising campaigns. They now needed a targeted and structured solution to further their reach. 

The cosmetics brand decided to explore advertising with commerce media networks, and partnered with a leading pharmacy chain to use their onsite (owned) and offsite (third-party) channels for growth. 

While both brands recognized the value of this collaboration, concerns over data sharing meant they struggled to advance audience segmentation. As well as regulatory rules on data sharing, exposing first-party data can have crippling consequences if it falls into the hands of competitors. These factors were limiting the cosmetics brand’s growth potential, as they were unable to build the right audience for their campaigns.

Solution

To meet this challenge, the cosmetics brand and pharmacy chain partnered with the AppsFlyer Data Collaboration Platform. AppsFlyer’s focus is to facilitate collaboration that powers growth, and this platform enables precision segmentation. 

AppsFlyer’s platform is powered by a data clean room that has received industry recognition from the IDC and IAB — giving both parties confidence that all first-party information would be kept private and secure. This was particularly important for the cosmetics brand, which has limited data, when working with the pharmacy chain, which has over 50 million records of active customers on file. 

Given this new trusted environment for collaboration, the companies focused on the challenge of segmenting the potential buyers who are critical to the cosmetics brand’s key metrics. The result was a six-fold increase in conversions.

What are the benefits of data collaboration? 

Data collaboration means better campaign segmentation, attribution, customer experience, and data-based decision-making.

1. More accurate data and attribution

Most marketing teams lack depth and breadth of data, giving them an incomplete picture of their demographics and customer behavior. With data collaboration, marketers can analyze their audience’s behavior from first touch to conversion. 

2. Better segmentation and customer experience

When you have an omnichannel picture of your customers, you can serve them content that meets their needs and desires. For example, when a customer has already viewed your website or completed a demo, you can serve them conversion-focused retargeting ads because you know they’re at the bottom of the funnel instead of the top. 

This also leads to a better experience for potential customers – targeted ads mean more relevant content.

3. Improved decision-making

Getting the full picture of your data means having the evidence you need to make data-based business decisions. A study by Tableau found that data-driven companies enjoy higher customer acquisition, better employee retention, and faster time to market. 

In marketing, you can use past performance data and predictive analytics to develop the budgets and creatives that will drive the most impact. 

4. Better compliance

Changing privacy restrictions are a major concern for companies. Penalties are steep for violating GDPR and similar laws. By setting standards and data governance for collaboration, you lower your liability and safeguard your business. 

5. Deeper partnerships

When companies collaborate, they help each other gain macroeconomic insights, like the size of their market and the characteristics of current and potential customers. One example could be brands and retail media networks participating in data collaboration. 

At enterprise companies, partnerships between departments can deepen working relationships and data.

What is a data collaboration platform?

A data collaboration platform is a solution that helps advertisers, publishers, ad networks, and other stakeholders to provide, integrate, and analyze both first-party and third-party data. These platforms provide an environment where participants can collaborate on data-driven initiatives while maintaining privacy, data security, and regulatory compliance.

This differs slightly from a data clean room, a highly secure and privacy-compliant environment for sharing, enriching, and analyzing sensitive first-party data. A data collaboration platform may use a data clean room to ensure data is provided securely, without exposing user-level details. However, platforms offer a broader range of use cases, including audience activation and closed-loop measurement.

What to look for in a data collaboration platform

The best data collaboration platforms prioritize cybersecurity, privacy compliance, and access control to keep your data safe. When considering a data collaboration platform, look for one that:

  • Runs on award-winning, privacy-enhancing technologies such as k-anonymity and a data clean room
  • Is compatible with the tools and platforms you already use 
  • Uses rich audience-building tools to segment audiences
  • Is scalable for your business needs, while offering detailed permissions
  • Provides closed-loop, omnichannel measurement

Best practices for data collaboration between two businesses

Collaborating with another company brings a fresh set of challenges and opportunities: for instance, a brand establishing a data collaboration with a commerce media network. Follow these best practices to safeguard your customers’ data and get the most out of your data collaboration.

1. Trusted and neutral environment for controlled collaboration

Creating a neutral, safe environment for collaboration is crucial. Your collaboration platform should facilitate a data clean room with top-of-the-line PETs (privacy-enhancing technologies). PETs include technology for data encryption, data minimization, and access control. 

This way, both parties can rest assured that they’re not exposing proprietary business information, and that they’re protecting customer data in line with international data regulations.

2. Seamless integration and cloud-agnostic for flexibility

Your data sources come from all different platforms and data types – add in another company, and you add even more data formats to the mix. You need a cloud-agnostic collaboration platform to reformat and integrate your data without the headache of doing it yourself.

3. Rich audience-building tools for advanced segmentation, according to business goals

One of the biggest benefits of data collaboration is the ability to build and segment different audiences. For example, you could focus one ad campaign on new users and another one on customers who have already visited your website. You could create a third segment for customers who have bought a similar app or product online (thanks to data from external partners). 

With AppsFlyer’s data collaboration capabilities, you can segment dynamic audiences, build remarketing campaigns, and measure campaign results for each audience with unparalleled flexibility.

4. Agile audience activation

Agile audience activation is another important practice when collaborating with external data sources. You can gain a competitive edge by activating audiences on-site and off-site based on real-time insights, serving up the right message at the right time for the biggest impact.

5. Built-in and advanced measurement capabilities

Above all, data collaboration unlocks the insights you need to make better business decisions. With these insights, you can cut costs, boost conversions, and grow your marketing ROI. 

Choose a platform that can measure results across all the platforms you use – including walled gardens, such as Meta, Google Ads, or Apple, as well as across commerce media networks. Measure multi-touch attribution across customer journeys with advanced analytics, SEO attribution, and predictive analytics to get the full picture of your impact. 

Best practices for internal data collaboration

Ready to get started with data collaboration? Convincing people to collaborate across silos isn’t always easy. Neither is configuring the technical aspects of data collaboration, but there are many resources to help.

Follow these steps and best practices to launch or improve your data collaboration initiative. 

1. Set clear goals

Before you start, define clear objectives and key performance indicators (KPIs) that align with your organization’s overall advertising strategy. 

Whether it’s improving audience targeting, enhancing campaign performance, or optimizing ad spend, setting specific goals will guide your data collaboration efforts forward. These are the levers you can pull to convince others to share their data, and the touchstones you can point back to for analysis. 

2. Manage stakeholders

Managing stakeholders should be your number-one task to make your data collaboration a success. 

First, recruit a senior leader to champion your data collaboration initiative across silos. Engage stakeholders from across departments, including marketing, sales, IT, and data analytics, to ensure buy-in and alignment with your plans. Gather stakeholder input and feedback, and involve them throughout the implementation process to foster collaboration and ownership.

3. Consider data security and other factors

Next, choose a data collaboration tool that fits your needs. Consider factors such as data security, scalability, ease of use, and interoperability with existing systems. Look for platforms that offer robust features for data integration, analysis, and collaboration, while prioritizing privacy and regulatory compliance.

4. Deliver value

Data collaboration isn’t knowledge just for the sake of knowledge. To maintain collaboration, make sure you’re delivering on KPIs and tracking the value of the insights you glean. For instance, have you saved admin time by creating a centralized database? Track and communicate your impact over time.

Future trends for data collaboration

Data collaboration in advertising is an evolving field and has several significant changes on the horizon. Here are five trends to keep an eye on that are in development – or are already here. 

1. AI for data analysis

AI and machine learning technologies will play a huge role in optimizing data collaboration processes. They can automate data matching, cleansing, and analysis tasks, and enable more efficient collaboration and faster insight generation. Additionally, AI-powered algorithms can identify patterns and trends in large datasets, delivering powerful insights. 

Effective reach and frequency reporting are crucial in optimizing advertising campaigns to ensure that the right message reaches the right audience with the appropriate frequency. This process provides detailed insights that help brands understand how many people are viewing their ads and how often.

AppsFlyer harnesses the power of AI to elevate data exploration in its platform. For instance, users can ask the chat tool for insights like the number of customers converted from a specific campaign on a particular day. AppsFlyer translates that free-form question into code to pull the data requested. These AI-powered features deliver valuable insights into campaign performance and customer engagement.

2. Google’s Privacy Sandbox

The online advertising industry is facing significant challenges with Google phasing out third-party cookies for advertising. Google’s response to this has been to create the Chrome Privacy Sandbox. However, experts worry about Sandbox’s effectiveness and business impact.

Many essential use cases, such as lookalike modeling, competitive separation, and audience creation either aren’t supported or have been degraded, making data collaboration and analysis difficult. While Google works on improvements to Privacy Sandbox, advertisers are in a period of adjustment to the new privacy-centric mode. 

3. Federated data collaboration

Federated data collaboration allows organizations to collaborate on data without centrally pooling or sharing it. Instead, platforms perform local analysis on each participant’s data, sharing only aggregated results that preserve users’ data privacy and security. This option is attractive for industries with strict regulatory requirements, such as healthcare and finance.

4. Privacy-preserving technologies

As privacy regulations evolve and consumers become more conscious of their data privacy, demand is growing for privacy-enhancing technologies in data collaboration processes. 

Companies like AppsFlyer are developing privacy-preserving technologies that let users protect the privacy of the personally identifiable information they share with service providers or apps, while allowing marketers to maintain the functionality of data-driven systems.

5. Data lakes

A data lake is a centralized hub created to manage, process, and safeguard extensive volumes of structured, semi-structured, and unstructured data. Data lakes retain raw data in its original format, overcoming challenges such as data compatibility and size. 

Key takeaways

  • Industry-wide changes pose obstacles to getting the full picture of your customers. Internal barriers include the lack of platforms and guidelines for data sharing across teams. Meanwhile, external challenges arise from disparate data sources that don’t communicate with each other.
  • Two-way data collaboration is a solution to bridge these gaps, empowering teams in making data-backed decisions that drive business growth.
  • The benefits of data collaboration include better campaign attribution, targeting, customer experience, and decision-making. It also helps businesses comply with privacy regulations and foster deeper partnerships among stakeholders.
  • A data collaboration platform is a secure environment where partners can provide, integrate, and analyze customer data in a compliant, privacy-safe way. The platform may use a data clean room to provide the data safely and without exposing user details. 
  • Best practices for data collaboration between companies include using a trusted and neutral environment with seamless integration. You should also look for rich audience-building tools, agile audience activation, and advanced measurement capabilities.  
  • To introduce data collaboration at your company, set clear goals, manage stakeholders, and choose the best data collaboration platform for your needs. And don’t forget to track and communicate your impact. 
  • Future trends in data collaboration have the power to drive value for businesses, including AI-generated analysis and insights, federated data collaboration, data lakes, and privacy-preserving technologies. 

The post Data collaboration: The privacy-compliant key to understanding your audience and making data-based decisions appeared first on AppsFlyer.

What are PETs and why are they data clean room power multipliers? https://www.appsflyer.com/blog/trends-insights/pet-dcr-power-multipliers/ Thu, 23 Feb 2023 14:00:29 +0000 https://www.appsflyer.com/?p=276135

Although we are (most likely) not part of the Matrix, we most definitely live in a digital world. 

As consumers and organizations, protecting our data and privacy is critical; however, as users, we still want to maximize our experience with our online activity. 

This is known as the User Experience Paradox: we expect a personalized experience but are not really keen on sharing personally identifiable information (PII). For example, you’d appreciate your friendly neighborhood barista remembering your name and coffee order, but would get creeped out if they knew where you lived.

Ethical, business, user experience, and regulatory drivers have significantly increased the importance of privacy-enhancing technologies (PETs) in addressing the paradox.

In this article we’ll explore this brave new world and understand how data clean rooms — which have been a hot topic in the digital marketing space — come into the picture to allow marketers to run efficient campaigns in a privacy-compliant and secure way.  

So what exactly are PETs?

In the simplest terms, a privacy-enhancing technology is a technology used to protect the personal data of individuals and organizations in the digital world. What does this mean in practice? Here are some examples of what these technologies do: 

  • Data encryption: A process that scrambles data so it is unreadable to anyone who does not have the encryption key (a string of information that once processed, can encrypt or decrypt data). This is one of the most commonly-known PETs available, as it ensures that any personal data stored in a system is secure and unreadable to anyone who does not have the key. 
  • Access control: Technology that allows organizations to define, monitor and control different levels of access to certain types of data so that individuals can only access the data they are authorized to view. For example, many platforms offer admin, user and viewer access levels to differentiate permissions among user types.
  • Data minimization: This PET ensures that only the necessary data is collected. It helps organizations limit the amount of data they collect and store, and can reduce the risk of data breaches. 
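As an illustration, data minimization can be as simple as filtering every record against an explicit allow-list before it leaves your system. A minimal Python sketch (the field names and allow-list here are hypothetical, not from any particular product):

```python
# Hypothetical allow-list: the only fields this analysis actually needs.
ALLOWED_FIELDS = {"country", "app_version", "event_name"}

def minimize(record: dict) -> dict:
    """Drop every field not on the allow-list before the record is shared."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

event = {
    "event_name": "purchase",
    "country": "US",
    "app_version": "2.4.1",
    "email": "user@example.com",   # sensitive: never leaves the system
    "device_id": "a1b2-c3d4",      # sensitive: never leaves the system
}
shared = minimize(event)
```

The point is that minimization happens at collection or export time, so sensitive fields are never stored or transmitted in the first place.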

Why do we need PETs? 

With personal information collected, shared, and stored online, concerns about privacy have grown, and rightfully so. Local and regional regulations (such as GDPR, CCPA, and others) have been a main area of interest in the last few years, as has the general population’s concern and awareness about data privacy.

Enter PETs, that provide ways to protect personal information from being accessed or misused by others.

Indeed, the internet is home to a wide range of threats to privacy, from hackers, through identity thieves, to online predators and PII snatchers. PETs help to mitigate these threats by providing ways to secure personal information and prevent it from being accessed by unauthorized parties. PETs can also help businesses comply with legal and regulatory requirements for protecting the privacy of their customers and clients.

Protecting the privacy of individuals is not just a legal or regulatory requirement, it is also an ethical obligation. PETs can help organizations to uphold their ethical obligations by providing ways to respect the privacy of individuals and protect their personal information.

Where do data clean rooms come into the picture? 

One of the most critical PETs on the market today is data clean rooms. A data clean room is a secure environment where sensitive data is processed and managed in a way that ensures it is used in a privacy-compliant way.

DCRs have gained significant momentum in the past year, especially in the realm of marketing measurement and optimization. Between Apple’s game-changing App Tracking Transparency (ATT) framework announcement, Meta’s decision to only send user-level data to Mobile Measurement Partners (MMPs) and not advertisers, and the upcoming demise of Google’s 3rd-party cookies and device advertising IDs in 2024, data sharing is becoming increasingly limited. As a result, campaign measurement and optimization are more challenging than ever before for advertisers. 

Data clean rooms, PETs in their own right, can also utilize other PETs in practice. In fact, doing just that can turn a good data clean room into a great one by providing an additional layer of data privacy protection. It also supports a zero-trust policy among all parties using the DCR and reduces privacy vulnerabilities. 

So how can PETs be used for marketing purposes within a DCR? 

DCRs enable marketers to gain visibility to aggregated campaign insights and make data-driven decisions while preserving their end-users’ privacy.

It’s important to note that choosing the right PET depends on the marketing use case, the data types being used, and the relevant privacy requirements.
Let’s look at a few examples:

Multi-Party Computation (MPC)

A cryptographic technique that enables multiple parties to perform joint computations on private data, without revealing anything beyond what the computation result reveals. In other words, an MPC allows multiple parties to work together on a computation while keeping their inputs private.
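A common MPC building block is additive secret sharing. The toy sketch below (illustrative only, not a production protocol) splits each party’s input into random shares that sum to the value modulo a prime, so a joint total can be computed without any single party seeing another’s raw input:

```python
import secrets

Q = 2**61 - 1  # all shares live in the integers modulo this prime

def share(value: int, n_parties: int) -> list[int]:
    """Split value into n random shares that sum to value mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

def mpc_sum(inputs: list[int]) -> int:
    """Each party distributes shares; party i sums the i-th share of every input."""
    n = len(inputs)
    all_shares = [share(v, n) for v in inputs]
    partials = [sum(s[i] for s in all_shares) % Q for i in range(n)]
    # Combining the partial sums reveals only the total, never the inputs.
    return sum(partials) % Q

total = mpc_sum([120, 45, 300])  # e.g. three parties' private spend figures
```

Each individual share is uniformly random, so it carries no information about the value it came from; only the reconstructed total is meaningful.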

Homomorphic encryption

An encryption form that enables computations to be performed directly on encrypted data, without first having to decrypt it. This allows sensitive information to be processed without exposing it to the parties performing the computation. The result of the computation can then be encrypted again and used by all relevant parties.
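The Paillier cryptosystem is a classic additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. A toy Python sketch with deliberately tiny, insecure demo primes (real deployments use keys of 2048 bits or more):

```python
import math
import random

def keygen(p: int = 1789, q: int = 1867):
    """Tiny demo primes: insecure, for illustration only."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because we fix the generator g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    n2 = n * n
    l = (pow(c, lam, n2) - 1) // n  # the standard Paillier L function
    return (l * mu) % n

n, priv = keygen()
c1, c2 = encrypt(n, 123), encrypt(n, 456)
# Multiplying ciphertexts adds the underlying plaintexts.
total = decrypt(n, priv, (c1 * c2) % (n * n))
```

Here neither ciphertext ever needs to be decrypted individually: the party doing the aggregation works only on encrypted values.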

Private set intersection (PSI)

PSI is a cryptographic protocol that enables two parties to find the intersection of their private sets of data, without revealing any information about the individual elements in the sets except the ones that appear in the intersection.
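One classic construction is Diffie-Hellman-based PSI, which relies on blinding being commutative: H(x)^ab equals H(x)^ba. The sketch below runs both "parties" in a single process for illustration; a real deployment would also shuffle the double-blinded values so each side learns nothing beyond the intersection, and the modulus and hash-to-group step here are demo-grade, not production crypto:

```python
import hashlib
import secrets

P = 2**127 - 1  # Mersenne prime used as a demo modulus

def h2g(item: str) -> int:
    """Hash an item into the group (demo-grade hash-to-group)."""
    return int(hashlib.sha256(item.encode()).hexdigest(), 16) % P

def psi(set_a: set[str], set_b: set[str]) -> set[str]:
    a = secrets.randbelow(P - 3) + 2  # A's secret exponent
    b = secrets.randbelow(P - 3) + 2  # B's secret exponent
    # A -> B: A's items, hashed and blinded with a
    a_blinded = {x: pow(h2g(x), a, P) for x in set_a}
    # B -> A: A's blinded items raised to b ...
    ab = {x: pow(v, b, P) for x, v in a_blinded.items()}
    # ... plus B's own items blinded with b
    b_blinded = [pow(h2g(y), b, P) for y in set_b]
    # A raises B's values to a; equal double-blinded values mark the intersection
    ba = {pow(v, a, P) for v in b_blinded}
    return {x for x, v in ab.items() if v in ba}

common = psi({"alice", "bob", "carol"}, {"bob", "carol", "dave"})
```

Neither side ever sees the other’s raw elements — only blinded group elements are exchanged.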

K-anonymity

K-anonymity involves the obscuring of user identities so they are indistinguishable from the other individuals within a size “k” group. The purpose of k-anonymity is to protect the privacy of individuals by preventing the identification of individual records in a dataset.
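A minimal way to enforce k-anonymity is to group records by their quasi-identifiers and suppress any group smaller than k. The Python sketch below uses hypothetical field names; real systems usually also generalize values (e.g. coarsening exact ages into bands) rather than relying on suppression alone:

```python
from collections import defaultdict

def enforce_k_anonymity(records, quasi_ids, k):
    """Keep only records whose quasi-identifier combination appears in >= k records."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_ids)].append(r)
    return [r for group in groups.values() if len(group) >= k for r in group]

records = [
    {"zip": "10001", "age_band": "30-39", "spend": 120},
    {"zip": "10001", "age_band": "30-39", "spend": 80},
    {"zip": "94105", "age_band": "20-29", "spend": 45},  # unique combination: suppressed
]
safe = enforce_k_anonymity(records, ["zip", "age_band"], k=2)
```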

K-anonymity
Obscuring user identities with K-anonymity

Federated learning

Companies can train machine learning models on decentralized data, enabling them to make use of data from multiple sources while preserving privacy.
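The idea can be sketched with federated averaging: each client takes gradient steps on its own private data, and only the resulting model parameters travel to the server, which averages them. A toy one-parameter regression example with made-up data (illustrative, not a real federated framework):

```python
def local_step(w: float, data, lr: float = 0.01) -> float:
    """One gradient-descent step on MSE for the model y = w * x, using local data only."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def federated_average(clients, rounds: int = 200) -> float:
    w = 0.0  # global model parameter
    for _ in range(rounds):
        local_models = [local_step(w, data) for data in clients]  # raw data stays put
        w = sum(local_models) / len(local_models)  # server sees only parameters
    return w

# Two clients whose private datasets both follow y = 2x
clients = [[(1, 2), (2, 4)], [(3, 6), (4, 8)]]
w = federated_average(clients)
```

The server converges toward the shared slope without ever receiving a single (x, y) pair from either client.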

Anonymization

PETs can anonymize personal data, making it difficult or impossible to identify individual people based on the data. This can be done by removing or obscuring personal identifiers such as names, addresses, telephone numbers and persistent device identifiers.
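For example, direct identifiers can be dropped outright while stable identifiers are replaced with a salted one-way hash, so records can still be joined without exposing the original value. A sketch with hypothetical field names (in practice the salt must be kept secret and rotated):

```python
import hashlib

SALT = b"keep-me-secret"  # hypothetical secret salt

def pseudonymize(record: dict) -> dict:
    out = dict(record)
    # Remove direct identifiers entirely
    for field in ("name", "address", "phone"):
        out.pop(field, None)
    # Replace the device identifier with a salted, one-way hash
    if "device_id" in out:
        out["device_id"] = hashlib.sha256(SALT + out["device_id"].encode()).hexdigest()
    return out

record = {"name": "Ada", "device_id": "a1b2-c3d4", "event": "install"}
anon = pseudonymize(record)
```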

Differential privacy

By adding random noise to sensitive data, companies can hide identifiable characteristics of individuals, ensuring that the privacy of personal information is protected, and yet, it’s small enough to not materially impact the accuracy of the derived insights.
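Concretely, the Laplace mechanism adds noise with scale sensitivity/epsilon to an aggregate before releasing it. A sketch, using the fact that the difference of two independent exponentials is Laplace-distributed (the count and epsilon below are made-up demo values):

```python
import random

def dp_release(true_value: float, epsilon: float, sensitivity: float = 1.0,
               rng: random.Random = random) -> float:
    """Release true_value with Laplace noise calibrated to sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # Difference of two iid Exp(1/scale) draws ~ Laplace(0, scale)
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_value + noise

rng = random.Random(0)
# Many noisy releases of a true count of 42: each one hides the individual
# contribution, while the average stays close to the truth.
samples = [dp_release(42, epsilon=1.0, rng=rng) for _ in range(10_000)]
avg = sum(samples) / len(samples)
```

Smaller epsilon means larger noise and stronger privacy; the analyst trades a little accuracy per query for a provable privacy guarantee.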

The challenges of PETs

The value of PETs is clear: increased data security and enhanced user privacy are key pillars in today’s and tomorrow’s digital environment. They improve trust among users and offer greater transparency. 

But it’s also important to address challenges that PETs surface, namely:

  • Increased complexity: PETs can be complex and difficult to implement, which can be costly and time consuming. 
  • Lower efficiency: PETs can sometimes be less efficient than traditional solutions, as they require more processing power and thus can be slower. 
  • Higher cost: PETs can be more expensive to implement than traditional solutions, due to the extra processing power and associated costs.

All of these challenges can be handled properly with the right PET partner. As with anything, having the right tools and services on your side can set you on a path to success.

Key takeaways

I know this wasn’t an easy read, with lots of multisyllabic words that make privacy technology seem highly technical and out of reach. But if you have stayed with me, I’d like to leave you with 3 key takeaways:

  1. In the marketing world, DCRs are the future of measurement, activation, collaboration and more. As more DCRs emerge, you should consider wisely the ones that will suit your needs and remain relevant in the ever-changing privacy landscape. Choose a vendor that specializes in DCRs and PETs but also offers the relevant marketing background. 
  2. PETs are power multipliers when implemented within your DCR. Security and privacy are tied to all of the main DCR use cases, so I’d suggest you don’t compromise on these necessary precautions.
  3. In order to achieve a sufficient level of user privacy protection as well as a better commercial advantage for your business, you will probably need some combination of all of the PETs mentioned above. Don’t be discouraged by this: it’s within reach.

The post What are PETs and why are they data clean room power multipliers? appeared first on AppsFlyer.

The bleeding edge: Spark, Parquet, and S3 https://www.appsflyer.com/blog/mobile-marketing/spark-parquet-s3/ https://www.appsflyer.com/blog/mobile-marketing/spark-parquet-s3/#respond Mon, 10 Aug 2015 09:00:00 +0000 https://www.appsflyer.com/?p=23163

Spark is shaping up as the leading alternative to Map/Reduce for several reasons including the wide adoption by the different Hadoop distributions, combining both batch and streaming on a single platform and a growing library of machine-learning integration (both in terms of included algorithms and the integration with machine learning languages namely R and Python).

At AppsFlyer, we’ve been using Spark for a while now as the main framework for ETL (Extract, Transform & Load) and analytics. A recent example is the new version of our retention report, which utilized Spark to crunch several data streams (> 1TB a day) with ETL (mainly data cleansing) and analytics (a stepping stone towards full click-fraud detection) to produce the report.

One of the main changes we introduced in this report is the move from building on Sequence files to using Parquet files.

Parquet is a columnar data format, which is probably the best option today for storing long-term big data for analytics purposes (unless you are heavily invested in Hive, where ORC is the more suitable format). The advantages of Parquet vs. Sequence files are performance and compression, without losing the benefit of wide support by big-data tools (Spark, Hive, Drill, Tajo, Presto, etc.).

One relatively unique aspect of our infrastructure for big data is that we do not use Hadoop (perhaps that’s a topic for a separate post). We are using Mesos as a resource manager instead of YARN and we use Amazon S3 instead of HDFS as a distributed storage solution. HDFS has several advantages over S3, however, the cost/benefit for running long running HDFS clusters on AWS  vs. using S3 are overwhelming in favor of S3.

That said, the combination of Spark, Parquet and S3 posed several challenges for us and this post will list the major ones and the solutions we came up with to cope with them.

Parquet and Spark

Parquet and Spark seem to have been in a love-hate relationship for a while now.

On the one hand, the Spark documentation touts  Parquet as one of the best formats for analytics of big data (it is) and on the other hand the support for Parquet in Spark is incomplete and annoying to use. Things are surely moving in the right direction but there are still a few quirks and pitfalls to watch out for.

To start on a positive note, Spark and Parquet integration has come a long way in the past few months. Previously, one had to jump through hoops just to be able to convert existing data to Parquet.

The introduction of DataFrames to Spark made this process much, much simpler. When the input format is supported by the DataFrame API, e.g. JSON (built-in) or Avro (not yet built into Spark, but readable via a library), converting to Parquet is just a matter of reading the input format on one side and persisting it as Parquet on the other. Consider, for example, the following snippet in Scala:

// Read JSON input and persist it as Parquet
val inputPath = "../data/json"
val outputPath = "../data/parquet"
val data = sqlContext.read.json(inputPath)
data.write.parquet(outputPath)

Even when you are handling a format where the schema isn’t part of the data, the conversion process is quite simple as Spark lets you specify the schema programmatically.

The Spark documentation is pretty straightforward and contains examples in Scala, Java and Python.
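For instance, a minimal Scala sketch of building a schema programmatically might look like this (the field names are illustrative, echoing the event example that follows):

```scala
import org.apache.spark.sql.types._

// Columns are strings in our records unless a type is given explicitly
val schema = StructType(Array(
  StructField("operator", StringType, true),
  StructField("model", StringType, true),
  StructField("launch_counter", LongType, true)
))
```

The schema is then applied with sqlContext.createDataFrame(rowRdd, schema) and the result persisted with .write.parquet(path).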

Furthermore, it isn’t too complicated to define schemas in other languages. For instance, here at AppsFlyer we use Clojure as our main development language, so we developed a couple of helper functions to do just that.

The sample code below provides the details. The first step is to extract the data from whatever structure we have and specify the schema we want. The code below takes an event record and extracts various data points from it into a vector of the form [:column_name value optional_data_type]. The data type is optional, since a string is assumed unless otherwise specified.

(defn record-builder
  [event-record]
  (let [..
        raw-device-params (extract event-record "raw_device_params")
        result [...
                [:operator (get raw-device-params "operator")]
                [:model (get raw-device-params "model")]
                ...
                [:launch_counter counter DataTypes/LongType]]]
  result))

The next step is to use the above-mentioned structure both to extract the schema and to convert the records to DataFrame Rows:

(defn extract-dataframe-schema
  [rec]
  (let [fields (reduce (fn [lst schema-line]
                         (let [k (first schema-line)
                               t (if (= (count schema-line) 3) (last schema-line) DataTypes/StringType) ]
                           (conj lst (DataTypes/createStructField (name k) t NULLABLE)))) [] rec)
        arr (ArrayList. fields)]
    (DataTypes/createStructType arr)))

(defn as-rows
  [rec]
  (let [values (object-array (reduce (fn [lst v] (conj lst v)) [] rec))]
    (RowFactory/create values)))

Finally, we apply these functions to an RDD, convert it to a DataFrame and save it as Parquet:

(let [..
     schema (trans/extract-dataframe-schema (record-builder nil))
     ..
     rdd (spark/map record-builder some-rdd-we-have)
     rows (spark/map trans/as-rows rdd)
     dataframe (spark/create-data-frame sql-context rows schema)
    ]
(spark/save-parquet dataframe output-path :overwrite))

As mentioned above, things are on the up and up for Parquet and Spark but the road is not clear yet.

Some of the problems we encountered include:

  • A critical bug in the 1.4 release, where a race condition when writing Parquet files caused significant data loss on jobs (this bug is fixed in 1.4.1 – so if you are using Spark 1.4 and Parquet, upgrade yesterday!)
  • Filter pushdown optimization, which is turned off by default because Spark still uses Parquet 1.6.0rc3 – even though 1.6.0 has been out for a while (it seems Spark 1.5 will use Parquet 1.7.0, so the problem will be solved)
  • Parquet is not “natively” supported in Spark; instead, Spark relies on Hadoop support for the Parquet format – this is not a problem in itself, but for us it caused major performance issues when we tried to use Spark and Parquet with S3 – more on that in the next section

Parquet, Spark, and S3

Amazon S3 (Simple Storage Service) is an object storage solution that is relatively cheap to use.

It does have a few disadvantages vs. a “real” file system; the major one is eventual consistency i.e. changes made by one process are not immediately visible to other applications. (If you are using Amazon’s EMR you can use EMRFS “consistent view” to overcome this.) However, if you understand this limitation, S3 is still a viable input and output source, at least for batch jobs.

As mentioned above, Spark doesn’t have a native S3 implementation and relies on Hadoop classes to abstract the data access to Parquet.

Hadoop provides 3 file system clients to S3:

  • S3 block file system (URI scheme of the form “s3://..”), which doesn’t seem to work with Spark
  • S3 native file system (“s3n://..” URIs) – download a Spark distribution that supports Hadoop 2.* and up if you want to use this (tl;dr – you don’t)
  • s3a – a replacement for s3n that removes some of s3n’s limitations and problems; download Spark with Hadoop 2.6 and up to use this one

When we used Spark 1.3, we encountered many problems when we tried to use S3, so we started out with s3n. It worked for the most part, i.e. we got jobs running and completing, but a lot of them failed with various read timeout and host unknown exceptions.

Looking at the tasks within the jobs, the picture was even grimmer, with high percentages of failures that pushed us to increase timeouts and retries to ridiculous levels. When we moved to Spark 1.4.1, we took another stab at s3a.

This time around we got it to work.

The first thing we had to do was to set both spark.executor.extraClassPath and spark.driver.extraClassPath to point at the aws-java-sdk and hadoop-aws jars, since apparently both are missing from the “Spark with Hadoop 2.6” build. Naturally we used the 2.6 versions of these jars, but then we were hit by this little problem: the Hadoop 2.6 AWS implementation has a bug which causes it to split S3 files in unexpected ways (e.g. a job over 400 files ran with 18 million tasks). Luckily, replacing the Hadoop AWS jar with version 2.7.0 solved this problem, and using s3a prefixes works without hitches (and provides better performance than s3n).
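As a sketch, the classpath settings and an s3a read might look like this (jar locations, bucket name, and credentials are placeholders; hadoop-aws 2.7.0 pairs with aws-java-sdk 1.7.4). Note that spark.driver.extraClassPath only takes effect if set before the driver JVM starts, so in practice these usually go in spark-defaults.conf or on the spark-submit command line:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Jar paths are placeholders; hadoop-aws 2.7.0 depends on aws-java-sdk 1.7.4
val awsJars = "/opt/jars/aws-java-sdk-1.7.4.jar:/opt/jars/hadoop-aws-2.7.0.jar"

val conf = new SparkConf()
  .setAppName("s3a-parquet-job")
  // Both the executors and the driver need the AWS classes,
  // since the "Spark with Hadoop 2.6" build ships without them
  .set("spark.executor.extraClassPath", awsJars)
  .set("spark.driver.extraClassPath", awsJars)

val sc = new SparkContext(conf)

// s3a credentials live on the Hadoop configuration, not the Spark one
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

// Note the s3a:// prefix -- s3n:// would route through the older client
val df = new SQLContext(sc).read.parquet("s3a://your-bucket/events/")
```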

Finding the right S3 Hadoop library contributed to the stability of our jobs, but regardless of the S3 library (s3n or s3a), the performance of Spark jobs that use Parquet files was still abysmal.

When looking at the Spark UI, the actual work of handling the data seemed quite reasonable, but Spark spent a huge amount of time before actually starting the work, and again after the job was “completed” before it actually terminated. We like to call this phenomenon the “Parquet Tax.”

Obviously we couldn’t live with the “Parquet Tax” so we delved into the log files of our jobs and discovered several issues.

The first one has to do with the startup times of Parquet jobs. The people who built Spark understood that schemas can evolve over time, so DataFrames provide a nice feature called “schema merging.” If you look at the schema of a big data lake/reservoir (or whatever it is called today), you can definitely expect it to evolve over time.

However, if you look at a directory that is the result of a single job, there is no difference in the schema between files… It turns out that when Spark initializes a job, it reads the footers of all the Parquet files to perform the schema merging.

All this work is done on the driver before any tasks are allocated to the executors, and it can take long minutes, even hours (e.g. we have jobs that look back at half a year of install data). It isn’t documented, but looking at the Spark code you can override this behavior by specifying mergeSchema as false:

In Scala:

val file = sqlContext.read.option("mergeSchema", "false").parquet(path)

and in Clojure:

(-> ^SQLContext sqtx
    (.read)
    (.format "parquet")
    (.options (java.util.HashMap. {"mergeSchema" "false" "path" path}))
    (.load))

Note that this doesn’t work in Spark 1.3. In Spark 1.4 it works as expected, and in Spark 1.4.1 it causes Spark to look only at the _common_metadata file, which is not the end of the world, since it is a small file and there’s only one of these per directory. However, this brings us to another aspect of the “Parquet Tax” – the “end of job” delays.

Turning off schema merging and controlling the schema used by Spark helped cut down the job startup times but, as mentioned, we still suffered from long delays at the end of jobs. We already knew of one Hadoop<->S3 related problem when using text files: Hadoop, which treats its output as immutable, first writes files to a temp directory and then copies them over. On a real file system that’s not a problem, but on S3 the copy operation is very, very expensive.

For text files, Databricks created the DirectOutputCommitter (probably for their Spark SaaS offering). Replacing the output committer for text files is fairly easy – you just need to set “spark.hadoop.mapred.output.committer.class” on the Spark configuration, e.g.:

(spark-conf/set "spark.hadoop.mapred.output.committer.class" "com.appsflyer.spark.DirectOutputCommitter")

A similar solution exists for Parquet and unlike the solution for text files it is even part of the Spark distribution.

However, to make things complicated, you have to configure it on the Hadoop configuration, not the Spark configuration. To get the Hadoop configuration, you first need to create a Spark context from the Spark configuration, call hadoopConfiguration on it, and then set “spark.sql.parquet.output.committer.class”, as in:

(let [ctx (spark/spark-context conf)
      hadoop-conf (.hadoopConfiguration ^JavaSparkContext ctx)]
     (.set hadoop-conf "spark.sql.parquet.output.committer.class" "org.apache.spark.sql.parquet.DirectParquetOutputCommitter"))

Using the DirectParquetOutputCommitter provided a significant reduction in the “Parquet Tax” but we still found that some jobs were taking a very long time to complete.

Again, the file system assumptions Spark and Hadoop hold were the culprits. Remember the “_common_metadata” file Spark looks at at the onset of a job? Well, Spark spends a lot of time at the end of the job creating both this file and an additional metadata file with additional info from the files in the directory. Again, this is all done from one place (the driver) rather than being handled by the executors.

When the job results in small files (even a couple of thousand of them), the process takes a reasonable amount of time. However, when the job results in larger files (e.g. when we ingest a full day of application launches), this takes upwards of an hour. As with mergeSchema, the solution is to manage the metadata manually, so we set “parquet.enable.summary-metadata” to false (again, on the Hadoop configuration) and generate the _common_metadata file ourselves (for the large jobs).
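As a sketch, the setting itself is one line on the Hadoop configuration (shown here in Scala; a Clojure version would be analogous to the committer snippet above):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("no-summary-files"))

// Stop the driver from writing _metadata/_common_metadata at the end of the job;
// for large jobs we generate _common_metadata ourselves instead
sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
```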

To sum up, Parquet and especially Spark are works in progress – making cutting-edge technologies work for you can be a challenge and requires a lot of digging.

The documentation is far from perfect at times, but luckily all the relevant technologies are open source (even the Amazon SDK), so you can always dive into the bug reports, code, etc. to understand how things actually work and find the solutions you need. Also, from time to time you can find articles and blog posts that explain how to overcome common issues in the technologies you are using. I hope this post clears up some of the complications of integrating Spark, Parquet and S3, which are, at the end of the day, all great technologies with a lot of potential.

Morri Feldman and Michael Spector contributed to this post.

The post The bleeding edge: Spark, Parquet, and S3 appeared first on AppsFlyer.
