MNL-bandit: a dynamic learning approach to assortment selection
From MaRDI portal (Publication: 5129205)
Abstract: We consider a dynamic assortment selection problem, where in every round the retailer offers a subset (assortment) of substitutable products to a consumer, who selects one of these products according to a multinomial logit (MNL) choice model. The retailer observes this choice, and the objective is to dynamically learn the model parameters while optimizing cumulative revenues over a selling horizon of length \(T\). We refer to this exploration-exploitation formulation as the MNL-Bandit problem. Existing methods for this problem follow an "explore-then-exploit" approach: they estimate the parameters to a desired accuracy and then, treating these estimates as the correct parameter values, offer the optimal assortment based on them. These approaches require certain a priori knowledge of "separability", determined by the true parameters of the underlying MNL model, which in turn is critical in determining the length of the exploration period. (Separability refers to the distinguishability of the true optimal assortment from the other sub-optimal alternatives.) In this paper, we give an efficient algorithm that simultaneously explores and exploits, achieving performance independent of the underlying parameters. The algorithm can be implemented in a fully online manner, without knowledge of the horizon length \(T\). Furthermore, the algorithm is adaptive in the sense that its performance is near-optimal both in the "well separated" case and in the general parameter setting where this separation need not hold.
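To make the choice model in the abstract concrete, the following is a minimal sketch of MNL choice probabilities: each product \(i\) carries an attraction weight \(v_i\), the no-purchase option has weight 1 by convention, and the probability of choosing \(i\) from an offered assortment \(S\) is \(v_i / (1 + \sum_{j \in S} v_j)\). The product weights below are illustrative assumptions, not parameters from the paper.

```python
import random

def mnl_choice_probabilities(assortment, v):
    """Choice probabilities under a multinomial logit (MNL) model.

    assortment: set of offered product ids
    v: dict mapping product id -> attraction weight (illustrative values)
    The no-purchase option (key None) has weight 1 by convention.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom  # None stands for "no purchase"
    return probs

def simulate_choice(assortment, v, rng=None):
    """Sample one consumer choice, as the retailer would observe each round."""
    rng = rng or random.Random()
    probs = mnl_choice_probabilities(assortment, v)
    items = list(probs)
    return rng.choices(items, weights=[probs[i] for i in items], k=1)[0]

# Example with hypothetical weights: offering {1, 2} gives
# denominator 1 + 0.5 + 1.0 = 2.5, so P(1) = 0.2, P(2) = 0.4, P(none) = 0.4.
v = {1: 0.5, 2: 1.0, 3: 0.25}
probs = mnl_choice_probabilities({1, 2}, v)
```

In the MNL-Bandit setting, the weights \(v_i\) are unknown; the algorithm must learn them from such sampled choices while keeping cumulative revenue high.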
Recommendations
- Optimal policy for dynamic assortment planning under multinomial logit models
- Dynamic assortment optimization with a multinomial logit choice model and capacity constraint
- Greedy-like algorithms for dynamic assortment planning under multinomial logit preferences
- Dynamic assortment personalization in high dimensions
- Learning consumer tastes through dynamic assortments
Cites work
- scientific article (zbMATH DE number 3152611)
- scientific article (zbMATH DE number 3919522)
- scientific article (zbMATH DE number 5485582)
- scientific article (zbMATH DE number 6276176)
- DOI 10.1162/153244303321897663
- A Markov chain approximation to choice modeling
- A general attraction model and sales-based linear program for network revenue management under customer choice
- A note on a tight lower bound for capacitated MNL-bandit assortment selection models
- Assortment optimization under variants of the nested logit model
- Asymptotically efficient adaptive allocation rules
- Demand Estimation and Assortment Optimization Under Substitution: Methodology and Application
- Discrete Choice Methods with Simulation
- Dynamic assortment optimization with a multinomial logit choice model and capacity constraint
- Dynamic assortment with demand learning for seasonal consumer goods
- Finite-time analysis of the multiarmed bandit problem
- Linearly parameterized bandits
- Near-optimal regret bounds for Thompson sampling
- On the tightness of an LP relaxation for rational optimization and its applications
- Probability and Computing
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Revenue Management Under a General Discrete Choice Model of Consumer Behavior
- Some aspects of the sequential design of experiments
- The \(d\)-level nested logit model: assortment and price optimization problems
Cited in (22)
- Game of thrones: fully distributed learning for multiplayer bandits
- Stochastic approximation for uncapacitated assortment optimization under the multinomial logit model
- A tractable online learning algorithm for the multinomial logit contextual bandit
- A regret lower bound for assortment optimization under the capacitated MNL model with arbitrary revenue parameters
- A Learning Approach for Interactive Marketing to a Customer Segment
- Dynamic assortment personalization in high dimensions
- scientific article (zbMATH DE number 7626733)
- Dynamic assortment with demand learning for seasonal consumer goods
- Smoothness-Adaptive Contextual Bandits
- Robust Learning of Consumer Preferences
- scientific article (zbMATH DE number 7306904)
- Optimal policy for dynamic assortment planning under multinomial logit models
- Technical note -- The multinomial logit model with sequential offerings: algorithmic frameworks for product recommendation displays
- Optimal pricing of online products based on customer anchoring‐adjustment psychology
- A note on a tight lower bound for capacitated MNL-bandit assortment selection models
- Greedy-like algorithms for dynamic assortment planning under multinomial logit preferences
- Customer choice models vs. machine learning: finding optimal product displays on Alibaba
- Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
- Transfer learning for contextual multi-armed bandits
- Assortment optimization: a systematic literature review
- Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
- Learning consumer tastes through dynamic assortments