Publications
Research publications by Romesh Malinga Perera on database systems, self-tuning databases, and applied machine learning.
2024
Warm-Starting Contextual Bandits Under Latent Reward Scaling
2024 IEEE International Conference on Data Mining (ICDM), 2024
pp. 360-369
Abstract
This paper studies warm-starting contextual bandits when rewards in pre-training and deployment share the same structure but differ by an unknown scale. It proposes a probabilistic model and a simple maximum a posteriori estimator for recalibrating warm-started bandits, gives a regret bound, and evaluates the method on synthetic, benchmark, and database index-selection tasks.
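The recalibration idea admits a compact illustration. Below is a minimal sketch, not the paper's exact estimator: it assumes deployment rewards equal a pre-trained linear model's predictions multiplied by an unknown scalar `c` with Gaussian noise, and places a Gaussian prior on `c`; all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-trained linear reward model, assumed correct up to an unknown scale c.
d = 5
theta_pre = rng.normal(size=d)

# Deployment rewards share the structure but are scaled by c_true.
c_true, sigma = 3.0, 0.5          # latent scale and reward-noise std
tau, c0 = 10.0, 1.0               # Gaussian prior on c: N(c0, tau^2)
X = rng.normal(size=(200, d))
y = c_true * X @ theta_pre + rng.normal(scale=sigma, size=200)

# Closed-form MAP estimate of c (posterior mode of a 1-D Gaussian model).
m = X @ theta_pre                 # pre-trained model's predictions
c_map = (tau**2 * (m @ y) + sigma**2 * c0) / (tau**2 * (m @ m) + sigma**2)
print(round(c_map, 2))            # recovers a value close to c_true
```

With the scale recovered, a warm-started bandit's pre-trained parameters can be rescaled before deployment rather than discarded.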
BibTeX
@inproceedings{10884226,
author = {Oetomo, Bastian and Perera, R. Malinga and Borovica-Gajic, Renata and Rubinstein, Benjamin I.P.},
booktitle = {2024 IEEE International Conference on Data Mining (ICDM)},
title = {Warm-Starting Contextual Bandits Under Latent Reward Scaling},
year = {2024},
pages = {360-369},
keywords = {Heuristic algorithms;Transfer learning;Neural networks;Probabilistic logic;Data models;Indexes;Servers;Data mining;Standards;Plugs;multi-armed bandits;warm-start;pretraining},
abstract = {This paper studies warm-starting contextual bandits when rewards in pre-training and deployment share the same structure but differ by an unknown scale. It proposes a probabilistic model and a simple maximum a posteriori estimator for recalibrating warm-started bandits, gives a regret bound, and evaluates the method on synthetic, benchmark, and database index-selection tasks.},
doi = {10.1109/ICDM59182.2024.00043},
url = {https://ieeexplore.ieee.org/document/10884226},
pdf = {https://ai-db-uom.github.io/data/2024_icdm.pdf}
}
2023
Cutting to the chase with warm-start contextual bandits
Knowledge and Information Systems, 2023
Vol. 65, No. 9 | Sep | pp. 3533-3565
Abstract
Multi-armed bandits achieve excellent long-term performance in practice and sublinear cumulative regret in theory. However, a real-world limitation of bandit learning is poor performance in early rounds due to the need for exploration, a phenomenon known as the cold-start problem. While this limitation may be necessary in the general classical stochastic setting, in practice where "pre-training" data or knowledge is available, it is natural to attempt to "warm-start" bandit learners. This paper provides a theoretical treatment of warm-start contextual bandit learning, adopting Linear Thompson Sampling as a principled framework for flexibly transferring domain knowledge as might be captured by bandit learning in a prior related task, a supervised pre-trained Bayesian posterior, or domain expert knowledge. Under standard conditions, we prove a general regret bound. We then apply our warm-start algorithmic technique to other common bandit learners: the epsilon-greedy and upper-confidence bound contextual learners. An upper regret bound is then provided for LinUCB. Our suite of warm-start learners is evaluated in experiments with both artificial and real-world datasets, including a motivating task of tuning a commercial database. A comprehensive range of experimental results is presented, highlighting the effect of different hyperparameters and quantities of pre-training data.
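As a rough illustration of the warm-start idea, the sketch below seeds Linear Thompson Sampling's Gaussian posterior with a pre-trained mean instead of the usual uninformative zero-mean prior. It is a simplification under assumed parameters, not the paper's algorithm, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, sigma = 4, 500, 0.1
theta_star = rng.normal(size=d)              # true (unknown) reward parameter

# Warm start: seed the posterior with imperfect pre-trained knowledge
# rather than the cold-start prior mu = 0.
mu_pre = theta_star + rng.normal(scale=0.3, size=d)
B = np.eye(d)        # posterior precision (ridge form)
f = B @ mu_pre       # precision-weighted mean

for t in range(T):
    arms = rng.normal(size=(10, d))          # 10 candidate contexts per round
    Sigma = np.linalg.inv(B)
    Sigma = (Sigma + Sigma.T) / 2            # symmetrise for sampling
    theta_tilde = rng.multivariate_normal(Sigma @ f, sigma**2 * Sigma)
    x = arms[np.argmax(arms @ theta_tilde)]  # Thompson-sampled arm choice
    r = x @ theta_star + rng.normal(scale=sigma)  # observe noisy reward
    B += np.outer(x, x)                      # Bayesian linear update
    f += r * x

err = np.linalg.norm(np.linalg.solve(B, f) - theta_star)
print(f"estimation error: {err:.3f}")        # small: posterior has converged
```

The only change from cold-start LinTS is the initialisation of `f`; the same trick carries over to epsilon-greedy and UCB-style learners by seeding their initial estimates.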
BibTeX
@article{oetomo2023cutting,
title = {Cutting to the chase with warm-start contextual bandits},
author = {Oetomo, Bastian and Perera, R. Malinga and Borovica-Gajic, Renata and Rubinstein, Benjamin I. P.},
journal = {Knowledge and Information Systems},
volume = {65},
number = {9},
pages = {3533-3565},
year = {2023},
month = sep,
abstract = {Multi-armed bandits achieve excellent long-term performance in practice and sublinear cumulative regret in theory. However, a real-world limitation of bandit learning is poor performance in early rounds due to the need for exploration, a phenomenon known as the cold-start problem. While this limitation may be necessary in the general classical stochastic setting, in practice where "pre-training" data or knowledge is available, it is natural to attempt to "warm-start" bandit learners. This paper provides a theoretical treatment of warm-start contextual bandit learning, adopting Linear Thompson Sampling as a principled framework for flexibly transferring domain knowledge as might be captured by bandit learning in a prior related task, a supervised pre-trained Bayesian posterior, or domain expert knowledge. Under standard conditions, we prove a general regret bound. We then apply our warm-start algorithmic technique to other common bandit learners: the epsilon-greedy and upper-confidence bound contextual learners. An upper regret bound is then provided for LinUCB. Our suite of warm-start learners is evaluated in experiments with both artificial and real-world datasets, including a motivating task of tuning a commercial database. A comprehensive range of experimental results is presented, highlighting the effect of different hyperparameters and quantities of pre-training data.},
doi = {10.1007/s10115-023-01861-2},
url = {https://link.springer.com/article/10.1007/s10115-023-01861-2},
publisher = {Springer},
}
No DBA? No Regret! Multi-Armed Bandits for Index Tuning of Analytical and HTAP Workloads With Provable Guarantees
IEEE Transactions on Knowledge and Data Engineering, 2023
Vol. 35, No. 12 | pp. 12855-12872
Abstract
Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today's commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs). This status quo is untenable: identifying representative static workloads is no longer realistic; and physical design tools remain susceptible to the query optimiser's cost misestimates. Furthermore, modern application environments like hybrid transactional and analytical processing (HTAP) systems render analytical modelling next to impossible. We propose a self-driving approach to online index selection that does not depend on the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation. We view the problem as one of sequential decision making under uncertainty, specifically within the bandit learning setting. Multi-armed bandits balance exploration and exploitation to provably guarantee average performance that converges to policies that are optimal with perfect hindsight. Our comprehensive empirical evaluation against a state-of-the-art commercial tuning tool demonstrates up to 75% speed-up in analytical processing environments and 59% speed-up in HTAP environments. Lastly, our bandit framework outperforms a Monte Carlo tree search (MCTS)-based database optimiser, providing up to 24% speed-up.
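The core loop, which treats candidate indices as bandit arms and rewards directly observed speed-ups rather than optimiser cost estimates, can be sketched with a simple UCB learner. Index names and benefit figures below are hypothetical; the paper's framework is considerably richer (contextual, combinatorial, and creation-cost aware).

```python
import math, random

random.seed(42)

# Hypothetical candidate indices and their unknown true benefit (secs saved).
true_benefit = {"idx_orders_date": 1.8, "idx_cust_region": 0.4, "idx_item_part": 1.1}
counts = {a: 0 for a in true_benefit}
means = {a: 0.0 for a in true_benefit}

def ucb(arm, t):
    # Optimism in the face of uncertainty: untried indices look attractive.
    if counts[arm] == 0:
        return float("inf")
    return means[arm] + math.sqrt(2 * math.log(t + 1) / counts[arm])

for t in range(300):
    a = max(true_benefit, key=lambda arm: ucb(arm, t))
    # Reward is the directly observed speed-up, not an optimiser cost estimate.
    r = random.gauss(true_benefit[a], 0.3)
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]

print(max(means, key=means.get))  # the genuinely most beneficial index wins out
```

Because rewards come from measured performance, the learner is immune to the optimiser misestimates that the abstract identifies as a key weakness of offline tools.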
BibTeX
@article{10113193,
author = {Perera, R. Malinga and Oetomo, Bastian and Rubinstein, Benjamin I. P. and Borovica-Gajic, Renata},
journal = {IEEE Transactions on Knowledge and Data Engineering},
title = {No DBA? No Regret! Multi-Armed Bandits for Index Tuning of Analytical and HTAP Workloads With Provable Guarantees},
year = {2023},
volume = {35},
number = {12},
pages = {12855-12872},
abstract = {Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today's commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs). This status quo is untenable: identifying representative static workloads is no longer realistic; and physical design tools remain susceptible to the query optimiser's cost misestimates. Furthermore, modern application environments like hybrid transactional and analytical processing (HTAP) systems render analytical modelling next to impossible. We propose a self-driving approach to online index selection that does not depend on the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation. We view the problem as one of sequential decision making under uncertainty, specifically within the bandit learning setting. Multi-armed bandits balance exploration and exploitation to provably guarantee average performance that converges to policies that are optimal with perfect hindsight. Our comprehensive empirical evaluation against a state-of-the-art commercial tuning tool demonstrates up to 75% speed-up in analytical processing environments and 59% speed-up in HTAP environments. Lastly, our bandit framework outperforms a Monte Carlo tree search (MCTS)-based database optimiser, providing up to 24% speed-up.},
doi = {10.1109/TKDE.2023.3271664}
}
2022
HMAB: Self-Driving Hierarchy of Bandits for Integrated Physical Database Design Tuning
Proc. VLDB Endow., 2022
Vol. 16, No. 2 | Oct | pp. 216-229
Abstract
Effective physical database design tuning requires selection of several physical design structures (PDS), such as indices and materialised views, whose combination influences overall system performance in a non-linear manner. While the simplicity of combining the results of iterative searches for individual PDSs may be appealing, such a greedy approach may yield vastly suboptimal results compared to an integrated search. We propose a new self-driving approach (HMAB) based on hierarchical multi-armed bandit learners, which can work in an integrated space of multiple PDSs while avoiding the full cost of combinatorial search. HMAB eschews the optimiser's cost misestimates in favour of direct performance observations gathered through strategic exploration, while carefully leveraging its knowledge to prune less useful exploration paths. As an added advantage, HMAB comes with a provable guarantee on its expected performance. To the best of our knowledge, this is the first learned system to tune both indices and materialised views in an integrated manner. We find that our solution enjoys superior empirical performance relative to state-of-the-art commercial physical database design tools that search over the integrated space of materialised views and indices. Specifically, HMAB achieves up to 96% performance gain over a state-of-the-art commercial physical database design tool when running industrial benchmarks.
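A toy sketch of the hierarchical idea: a top-level bandit chooses a structure family (index vs. materialised view) and a per-family bandit chooses the concrete structure, with each observed reward updating both levels. This is illustrative only, with hypothetical structure names and benefit values, not HMAB itself.

```python
import math, random

random.seed(7)

# Hypothetical two-level hierarchy of physical design structures.
families = {
    "index": {"idx_a": 0.9, "idx_b": 0.5},
    "mat_view": {"mv_join": 1.4, "mv_agg": 0.7},
}

fam_stats = {f: {"n": 0, "mean": 0.0} for f in families}
leaf_stats = {f: {a: {"n": 0, "mean": 0.0} for a in families[f]} for f in families}

def pick(options, t):
    # Shared UCB rule used at both levels of the hierarchy.
    def score(name):
        s = options[name]
        if s["n"] == 0:
            return float("inf")
        return s["mean"] + math.sqrt(2 * math.log(t + 1) / s["n"])
    return max(options, key=score)

for t in range(400):
    fam = pick(fam_stats, t)           # level 1: choose a structure family
    arm = pick(leaf_stats[fam], t)     # level 2: choose a structure within it
    r = random.gauss(families[fam][arm], 0.3)         # directly observed benefit
    for s in (fam_stats[fam], leaf_stats[fam][arm]):  # reward updates both levels
        s["n"] += 1
        s["mean"] += (r - s["mean"]) / s["n"]

best_fam = max(fam_stats, key=lambda f: fam_stats[f]["mean"])
best_arm = max(leaf_stats[best_fam], key=lambda a: leaf_stats[best_fam][a]["mean"])
print(best_fam, best_arm)
```

The hierarchy prunes the search: families that look unpromising at the top level receive few pulls, so their combinatorially many concrete structures are never exhaustively explored.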
BibTeX
@article{10.14778/3565816.3565824,
author = {Perera, R. Malinga and Oetomo, Bastian and Rubinstein, Benjamin I. P. and Borovica-Gajic, Renata},
title = {HMAB: Self-Driving Hierarchy of Bandits for Integrated Physical Database Design Tuning},
year = {2022},
issue_date = {October 2022},
publisher = {VLDB Endowment},
volume = {16},
number = {2},
issn = {2150-8097},
url = {https://doi.org/10.14778/3565816.3565824},
doi = {10.14778/3565816.3565824},
abstract = {Effective physical database design tuning requires selection of several physical design structures (PDS), such as indices and materialised views, whose combination influences overall system performance in a non-linear manner. While the simplicity of combining the results of iterative searches for individual PDSs may be appealing, such a greedy approach may yield vastly suboptimal results compared to an integrated search. We propose a new self-driving approach (HMAB) based on hierarchical multi-armed bandit learners, which can work in an integrated space of multiple PDSs while avoiding the full cost of combinatorial search. HMAB eschews the optimiser's cost misestimates in favour of direct performance observations gathered through strategic exploration, while carefully leveraging its knowledge to prune less useful exploration paths. As an added advantage, HMAB comes with a provable guarantee on its expected performance. To the best of our knowledge, this is the first learned system to tune both indices and materialised views in an integrated manner. We find that our solution enjoys superior empirical performance relative to state-of-the-art commercial physical database design tools that search over the integrated space of materialised views and indices. Specifically, HMAB achieves up to 96% performance gain over a state-of-the-art commercial physical database design tool when running industrial benchmarks.},
journal = {Proc. VLDB Endow.},
month = oct,
pages = {216-229},
numpages = {14},
}
2021
Cutting to the Chase with Warm-Start Contextual Bandits
2021 IEEE International Conference on Data Mining (ICDM), 2021
pp. 459-468
Abstract
Multi-armed bandits achieve excellent long-term performance in practice and sublinear cumulative regret in theory. However, a real-world limitation of bandit learning is poor performance in early rounds due to the need for exploration, a phenomenon known as the cold-start problem. While this limitation may be necessary in the classical stochastic setting, in practice where "pre-training" data or knowledge is available, it is natural to attempt to "warm start" bandit learners. This paper provides a theoretical treatment of warm-start contextual bandit learning, adopting Linear Thompson Sampling as a principled framework for flexibly transferring domain knowledge as might be captured by bandit learning in a prior related task, a supervised pre-trained Bayesian posterior, or domain expert knowledge. Under standard conditions, we prove a general regret bound. We then apply our warm-start algorithmic technique to other common bandit learners: the epsilon-greedy and upper-confidence bound contextual learners. Our suite of warm-start learners is evaluated in experiments with both artificial and real-world datasets, including a motivating task of tuning a commercial database.
BibTeX
@inproceedings{9679034,
author = {Oetomo, Bastian and Perera, R. Malinga and Borovica-Gajic, Renata and Rubinstein, Benjamin I. P.},
booktitle = {2021 IEEE International Conference on Data Mining (ICDM)},
title = {Cutting to the Chase with Warm-Start Contextual Bandits},
year = {2021},
pages = {459-468},
abstract = {Multi-armed bandits achieve excellent long-term performance in practice and sublinear cumulative regret in theory. However, a real-world limitation of bandit learning is poor performance in early rounds due to the need for exploration, a phenomenon known as the cold-start problem. While this limitation may be necessary in the classical stochastic setting, in practice where "pre-training" data or knowledge is available, it is natural to attempt to "warm start" bandit learners. This paper provides a theoretical treatment of warm-start contextual bandit learning, adopting Linear Thompson Sampling as a principled framework for flexibly transferring domain knowledge as might be captured by bandit learning in a prior related task, a supervised pre-trained Bayesian posterior, or domain expert knowledge. Under standard conditions, we prove a general regret bound. We then apply our warm-start algorithmic technique to other common bandit learners: the epsilon-greedy and upper-confidence bound contextual learners. Our suite of warm-start learners is evaluated in experiments with both artificial and real-world datasets, including a motivating task of tuning a commercial database.},
doi = {10.1109/ICDM51629.2021.00057}
}
DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees
2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021
pp. 600-611
Abstract
Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today's commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs) who are expected to identify and supply representative training workloads. Even the latest advancements like query stores provide only limited support for dynamic environments. This status quo is untenable: identifying representative static workloads is no longer realistic; and physical design tools remain susceptible to the query optimiser's cost misestimates. We propose a self-driving approach to online index selection that eschews the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation. We view the problem as one of sequential decision making under uncertainty, specifically within the bandit learning setting. Multi-armed bandits balance exploration and exploitation to provably guarantee average performance that converges to policies that are optimal with perfect hindsight. Our simplified bandit framework outperforms deep reinforcement learning (RL) in terms of convergence speed and performance volatility. Comprehensive empirical results demonstrate up to 75% speed-up on shifting and ad-hoc workloads and 28% speed-up on static workloads compared against a state-of-the-art commercial tuning tool and up to 58% speed-up against the deep RL alternatives.
BibTeX
@inproceedings{9458699,
author = {Perera, R. Malinga and Oetomo, Bastian and Rubinstein, Benjamin I. P. and Borovica-Gajic, Renata},
booktitle = {2021 IEEE 37th International Conference on Data Engineering (ICDE)},
title = {DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees},
year = {2021},
pages = {600-611},
abstract = {Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today's commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs) who are expected to identify and supply representative training workloads. Even the latest advancements like query stores provide only limited support for dynamic environments. This status quo is untenable: identifying representative static workloads is no longer realistic; and physical design tools remain susceptible to the query optimiser's cost misestimates. We propose a self-driving approach to online index selection that eschews the DBA and query optimiser, and instead learns the benefits of viable structures through strategic exploration and direct performance observation. We view the problem as one of sequential decision making under uncertainty, specifically within the bandit learning setting. Multi-armed bandits balance exploration and exploitation to provably guarantee average performance that converges to policies that are optimal with perfect hindsight. Our simplified bandit framework outperforms deep reinforcement learning (RL) in terms of convergence speed and performance volatility. Comprehensive empirical results demonstrate up to 75% speed-up on shifting and ad-hoc workloads and 28% speed-up on static workloads compared against a state-of-the-art commercial tuning tool and up to 58% speed-up against the deep RL alternatives.},
doi = {10.1109/ICDE51399.2021.00058}
}
2019
A Note on Bounding Regret of the C2UCB Contextual Combinatorial Bandit
CoRR, 2019
Vol. abs/1902.07500
BibTeX
@article{DBLP:journals/corr/abs-1902-07500,
author = {Oetomo, Bastian and Perera, Malinga and Borovica{-}Gajic, Renata and Rubinstein, Benjamin I. P.},
title = {A Note on Bounding Regret of the C2UCB Contextual Combinatorial Bandit},
journal = {CoRR},
volume = {abs/1902.07500},
year = {2019},
url = {http://arxiv.org/abs/1902.07500},
eprinttype = {arXiv},
eprint = {1902.07500},
timestamp = {Tue, 21 May 2019 18:03:39 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1902-07500.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}