Csaba Szepesvári

From MaRDI portal



List of research outcomes

This list is incomplete; at the moment it includes only items from zbMATH Open and arXiv. We are working on additional sources, so please check back soon.

Publication   Date of Publication   Type
Sample-based planning and learning with function approximation
Statistical Science
2026-02-10Paper
Exponential lower bounds for planning in MDPs with linearly-realizable optimal action-value functions
2025-02-11Paper
Cleaning up the neighborhood: a full classification for adversarial partial monitoring
2025-01-31Paper
An exponential Efron-Stein inequality for L_q stable learning rules
2025-01-31Paper
Optimistic MLE: a generic model-based algorithm for partially observable sequential decision making
2024-05-08Paper
scientific article; zbMATH DE number 7626742
(available as arXiv preprint)
2022-12-06Paper
Gradient descent for sparse rank-one matrix completion for crowd-sourced aggregation of sparsely interacting workers
(available as arXiv preprint)
2020-10-05Paper
Bandit algorithms
2020-05-11Paper
A modular analysis of adaptive (non-)convex optimization: optimism, composite objectives, variance reduction, and variational bounds
Theoretical Computer Science
2020-01-29Paper
Mixing time estimation in reversible Markov chains from a single sample path
The Annals of Applied Probability
2019-10-22Paper
A modular analysis of adaptive (non-)convex optimization: optimism, composite objectives, and variational bounds
(available as arXiv preprint)
2019-01-10Paper
Stochastic Optimization in a Cumulative Prospect Theory Framework
IEEE Transactions on Automatic Control
2018-09-18Paper
A Linearly Relaxed Approximate Linear Program for Markov Decision Processes
IEEE Transactions on Automatic Control
2018-06-27Paper
Following the leader and fast rates in online linear prediction: curved constraint sets and other regularities
(available as arXiv preprint)
2018-04-17Paper
Online Markov Decision Processes Under Bandit Feedback
IEEE Transactions on Automatic Control
2017-05-16Paper
Regularized policy iteration with nonparametric function spaces
Journal of Machine Learning Research (JMLR)
2016-11-22Paper
Partial monitoring -- classification, regret bounds, and algorithms
Mathematics of Operations Research
2015-04-24Paper
On Learning the Optimal Waiting Time
Lecture Notes in Computer Science
2015-01-14Paper
Alignment based kernel learning with a continuous set of base kernels
Machine Learning
2014-08-20Paper
\(X\)-armed bandits
2014-02-03Paper
Toward a classification of finite partial-monitoring games
Theoretical Computer Science
2013-03-04Paper
Partial monitoring with side information
Lecture Notes in Computer Science
2012-10-16Paper
Model selection in reinforcement learning
Machine Learning
2012-05-08Paper
Regularized least-squares regression: learning from a sequence
Journal of Statistical Planning and Inference
2011-11-10Paper
Finite-time bounds for fitted value iteration
2011-11-08Paper
Training parsers by inverse reinforcement learning
Machine Learning
2010-10-07Paper
Toward a classification of finite partial-monitoring games
Lecture Notes in Computer Science
2010-10-01Paper
Algorithms for reinforcement learning.
Synthesis Lectures on Artificial Intelligence and Machine Learning
2010-09-10Paper
Active learning in heteroscedastic noise
Theoretical Computer Science
2010-07-07Paper
Models of active learning in group-structured state spaces
Information and Computation
2010-04-08Paper
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
Theoretical Computer Science
2009-05-12Paper
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Machine Learning
2009-03-31Paper
Active Learning in Multi-armed Bandits
Lecture Notes in Computer Science
2008-10-14Paper
Active Learning of Group-Structured Environments
Lecture Notes in Computer Science
2008-10-14Paper
Tuning Bandit Algorithms in Stochastic Environments
Lecture Notes in Computer Science
2008-08-19Paper
Machine Learning: ECML 2004
Lecture Notes in Computer Science
2008-03-14Paper
Improved Rates for the Stochastic Continuum-Armed Bandit Problem
Learning Theory
2008-01-03Paper
Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
Learning Theory
2007-09-14Paper
Computer Vision - ECCV 2004
Lecture Notes in Computer Science
2005-12-27Paper
Efficient approximate planning in continuous space Markovian decision problems
AI Communications
2002-05-02Paper
An asynchronous stochastic approximation theorem and some applications
Alkalmazott Matematikai Lapok. A Magyar Tudományos Akadémia. Matematikai és Fizikai Tudományok Osztályának Közleményei
2001-04-01Paper
scientific article; zbMATH DE number 1528670
2000-11-13Paper
Convergence results for single-step on-policy reinforcement-learning algorithms
Machine Learning
2000-06-21Paper
scientific article; zbMATH DE number 1336305
1999-09-14Paper
Module-based reinforcement learning: Experiments with a real robot
Machine Learning
1998-10-13Paper
An integrated architecture for motion-control and path-planning
1998-06-08Paper
Robust control using inverse dynamics neurocontrollers
Nonlinear Analysis: Theory, Methods & Applications
1998-05-11Paper
Approximate geometry representations and sensory fusion
Neurocomputing
1997-03-31Paper



