- Learning Apache Apex
- Thomas Weise, Munagala V. Ramanath, David Yan, Kenneth Knowles
- 2021-07-02 22:38:35
Real-time insights for Advertising Tech (PubMatic)
Companies in the advertising technology (AdTech) industry must handle data volumes that are growing at breakneck speed, while their customers demand faster insights and analytical reporting.
PubMatic is a leading AdTech company providing marketing automation for publishers and is driven by data at a massive scale. On a daily basis, the company processes over 350 billion bids, serves over 40 billion ad impressions, and processes over 50 terabytes of data. Through real-time analytics, yield management, and workflow automation, PubMatic enables publishers to make smarter inventory decisions and improve revenue performance. PubMatic uses Apex for real-time reporting and for its allocation engine.
In PubMatic's legacy batch-processing system, updated data for its key metrics (revenues, impressions, and clicks) could take five hours to become available, and auction log data could take nine hours.
PubMatic decided to pursue a real-time streaming solution so that it could provide publishers, demand side platforms (DSPs), and agencies with actionable insights as close to the time of event generation as possible. PubMatic's streaming implementation had to achieve the following:
- Ingest and analyze a high volume of clicks and views (200,000 events/sec) to help its advertising customers improve revenues
- Utilize auction and client log data (22 TB/day) to report critical metrics for campaign monetization
- Handle rapidly increasing network traffic with efficient utilization of resources
- Provide a feedback loop to the ad server for making efficient ad-serving decisions
This high-volume data had to be processed in real time to derive actionable insights, such as campaign decisions and audience targeting.
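To put these targets in perspective, they can be converted into sustained rates. The calculation below assumes decimal units (1 TB = 10^12 bytes) and, as a simplification, traffic spread evenly over the day; real ad traffic is peaky, so peak rates would be higher than these averages.

```latex
\begin{aligned}
\frac{22\ \text{TB/day}}{86{,}400\ \text{s/day}}
  &= \frac{22 \times 10^{12}\ \text{B}}{8.64 \times 10^{4}\ \text{s}}
   \approx 2.55 \times 10^{8}\ \text{B/s} \approx 255\ \text{MB/s} \\
200{,}000\ \text{events/s} \times 86{,}400\ \text{s/day}
  &= 1.728 \times 10^{10}\ \text{events/day} \approx 17.3\ \text{billion events/day}
\end{aligned}
```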
PubMatic decided to implement its real-time streaming solution with Apex based on the following factors:
- Time to value: the solution could be implemented within a short time frame
- The Apex applications could run on PubMatic's existing Hadoop infrastructure
- Apex had important connectors (files, Apache Kafka, and so on) available out of the box
- Apex supported event-time dimensional aggregations with real-time query capability
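The idea behind event-time dimensional aggregation can be illustrated with a minimal plain-Java sketch. It does not use Apex's own operators or APIs; the class name, the event fields (publisher, ad unit, impressions, clicks, revenue in micros), the `"*"` wildcard and the one-minute bucket size are all hypothetical choices for illustration. Each event is assigned to a time bucket using its own timestamp rather than its arrival time, and its metrics are added to every configured dimension combination, here (publisher, ad unit) plus a roll-up over all ad units, so totals are queryable as soon as events arrive.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of event-time dimensional aggregation in plain Java.
 * Not the Apex API; names and fields are illustrative assumptions.
 */
public class EventTimeDimensionAggregator {

    /** Hypothetical ad event carrying its own event (generation) timestamp. */
    public record AdEvent(long eventTimeMillis, String publisher, String adUnit,
                          long impressions, long clicks, long revenueMicros) {}

    /** Aggregation key: event-time bucket start plus dimension values. */
    public record Key(long bucketStartMillis, String publisher, String adUnit) {}

    /** Running sums for one key. */
    public static final class Totals {
        public long impressions;
        public long clicks;
        public long revenueMicros;
    }

    /** Hypothetical wildcard meaning "all ad units" in the roll-up combination. */
    public static final String ALL = "*";

    private final long bucketMillis;
    private final Map<Key, Totals> table = new HashMap<>();

    public EventTimeDimensionAggregator(long bucketMillis) {
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucketMillis must be positive");
        }
        this.bucketMillis = bucketMillis;
    }

    /** Start of the bucket containing t; floorMod keeps pre-epoch times correct. */
    private long bucketStart(long t) {
        return t - Math.floorMod(t, bucketMillis);
    }

    /**
     * Adds one event. The bucket is derived from the event's own timestamp,
     * not from when it arrived, so a late event still lands in its own bucket.
     */
    public void add(AdEvent e) {
        long bucket = bucketStart(e.eventTimeMillis());
        // Two dimension combinations: (publisher, adUnit) and (publisher, all units).
        for (Key k : List.of(new Key(bucket, e.publisher(), e.adUnit()),
                             new Key(bucket, e.publisher(), ALL))) {
            Totals t = table.computeIfAbsent(k, unused -> new Totals());
            t.impressions += e.impressions();
            t.clicks += e.clicks();
            t.revenueMicros += e.revenueMicros();
        }
    }

    /** Point query; returns null if nothing has been aggregated for that key. */
    public Totals query(long eventTimeMillis, String publisher, String adUnit) {
        return table.get(new Key(bucketStart(eventTimeMillis), publisher, adUnit));
    }

    public static void main(String[] args) {
        // One-minute buckets (an assumption for this example).
        EventTimeDimensionAggregator agg = new EventTimeDimensionAggregator(60_000);
        agg.add(new AdEvent(120_500, "pubA", "banner", 1, 0, 2_000));
        agg.add(new AdEvent(179_999, "pubA", "video", 1, 1, 5_000));
        agg.add(new AdEvent(180_000, "pubA", "banner", 1, 0, 2_000)); // falls in the next minute
        Totals minute = agg.query(150_000, "pubA", ALL);
        System.out.println(minute.impressions + " impressions, " + minute.clicks + " clicks");
    }
}
```

In a production streaming deployment, such a table would typically be partitioned across parallel operator instances, and old buckets would eventually be finalized and evicted; the sketch keeps everything in a single in-memory map for clarity.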
With the Apex-based solution, deployed to production in 2014, PubMatic's end-to-end latency to obtain updated data and metrics for its two use cases fell from hours to seconds. This enabled real-time visibility into the successes and shortcomings of its campaigns, and timely tuning of models to maximize successful auctions.
Additional Resources
- Video: PubMatic presents High Performance AdTech Use Cases with Apache Apex at https://www.youtube.com/watch?v=JSXpgfQFcU8
- Slides: https://www.slideshare.net/ashishtadose1/realtime-adtech-reporting-targeting-with-apache-apex