Augmented Two-Stage Bandit Framework: Practical Approaches for Improved Online Ad Selection

Paper

Seowon Han, Ryan Lakritz and Hanxiao Wu

In online advertising, maximizing user engagement and advertiser performance hinges on effective ad selection algorithms. Algorithms for multi-armed bandit problems, such as Thompson Sampling, excel at exploration, but they make limited use of contextual information. Conversely, contextual bandit approaches personalize ad selection by leveraging user- and ad-specific features, but they perform poorly when data is scarce and often suffer from cold start problems for new ad groups. To address this trade-off, we propose a novel bandit framework that combines context-free and context-aware rewards and is augmented with historical predicted performance, for which we use predicted click-through rate (pCTR) scores. We refer to this framework as the Augmented Two-Stage Bandit Framework.
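To make the pCTR augmentation concrete, here is a minimal sketch of one plausible mechanism; the abstract does not give the exact one. It converts an ad group's historical pCTR score into pseudo-counts for the Beta prior used by Thompson Sampling. The `pctr_seeded_prior` helper and its `prior_strength` knob are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pctr_seeded_prior(pctr: float, prior_strength: float = 50.0) -> tuple[float, float]:
    """Turn a historical pCTR score into Beta(alpha, beta) pseudo-counts.

    `prior_strength` is a hypothetical knob (not from the paper): the number
    of pseudo-impressions the pCTR prior is worth.
    """
    alpha = 1.0 + pctr * prior_strength          # pseudo-clicks
    beta = 1.0 + (1.0 - pctr) * prior_strength   # pseudo-non-clicks
    return alpha, beta

# A new ad group with pCTR 0.03 starts exploration near its predicted CTR
# instead of at a flat Beta(1, 1) prior, softening the cold start.
rng = np.random.default_rng(0)
a, b = pctr_seeded_prior(0.03)
sampled_ctr = rng.beta(a, b)
```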

Our bandit framework comprises two stages. In the first stage, the framework applies context-free Thompson Sampling, augmented by historical pCTR scores, for initial exploration. The non-contextual bandit algorithm, together with the generalized patterns captured by our pCTR model, effectively mitigates the cold start problem. In the second stage, the framework shifts to a contextual bandit algorithm for refined exploration and exploitation.
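The abstract does not specify the transition criterion between stages or which contextual algorithm the second stage uses, so the sketch below makes two labeled assumptions: a fixed per-ad-group impression budget (`stage1_budget`) triggers the switch, and LinUCB stands in for the second-stage contextual bandit.

```python
import numpy as np

rng = np.random.default_rng(0)

class TwoStageBandit:
    """Sketch of the two-stage flow. The transition rule and the LinUCB
    second stage are our assumptions, not details confirmed by the abstract."""

    def __init__(self, pctr_scores: dict[str, float], dim: int,
                 prior_strength: float = 50.0,  # hypothetical pCTR-prior weight
                 stage1_budget: int = 1000,     # hypothetical switch threshold
                 ucb_alpha: float = 1.0):
        # Stage 1 state: pCTR-seeded Beta posteriors for Thompson Sampling.
        self.alpha = {a: 1 + p * prior_strength for a, p in pctr_scores.items()}
        self.beta = {a: 1 + (1 - p) * prior_strength for a, p in pctr_scores.items()}
        self.pulls = {a: 0 for a in pctr_scores}
        self.stage1_budget = stage1_budget
        # Stage 2 state: one ridge-regression (LinUCB) model per ad group.
        self.A = {a: np.eye(dim) for a in pctr_scores}
        self.b = {a: np.zeros(dim) for a in pctr_scores}
        self.ucb_alpha = ucb_alpha

    def in_stage1(self) -> bool:
        return min(self.pulls.values()) < self.stage1_budget

    def select(self, context: np.ndarray) -> str:
        if self.in_stage1():
            # Stage 1: Thompson Sampling on pCTR-seeded Beta posteriors.
            samples = {a: rng.beta(self.alpha[a], self.beta[a]) for a in self.alpha}
            return max(samples, key=samples.get)

        # Stage 2: LinUCB score = point estimate + exploration bonus.
        def score(a: str) -> float:
            theta = np.linalg.solve(self.A[a], self.b[a])
            bonus = np.sqrt(context @ np.linalg.solve(self.A[a], context))
            return theta @ context + self.ucb_alpha * bonus

        return max(self.A, key=score)

    def update(self, ad: str, context: np.ndarray, clicked: int) -> None:
        # Update both stages' statistics on every impression.
        self.alpha[ad] += clicked
        self.beta[ad] += 1 - clicked
        self.pulls[ad] += 1
        self.A[ad] += np.outer(context, context)
        self.b[ad] += clicked * context
```

Updating both stages' statistics on every impression is a deliberate choice in this sketch: the contextual models accumulate data throughout stage 1, so the handoff to stage 2 starts warm rather than cold.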

We demonstrate the efficacy of our proposed method through extensive simulations and experiments conducted on Reddit's real-world ads marketplace. Compared to traditional bandit algorithms, our pCTR-augmented Two-Stage Bandit Framework achieves significant improvements in click-through rate. These findings underscore the ability of the Augmented Two-Stage Bandit Framework to enhance online ad selection and improve key performance metrics.