Learning in Dynamic Systems with Unknown Models
⏲ Duration: 37 min 75 sec ✓ Published: 12-Oct-2011
Description: Qing Zhao, University of California, Davis
Host: Richard La
Abstract: Since the first multi-armed bandit (MAB) problem posed by Thompson in 1933 for the application of clinical trials, MAB has developed into an important branch in stochastic optimization and machine learning and has found a wide range of applications in economics and finance, medicine, and industrial engineering. It has recently received increasing attention from the communications and networking research community for formulatin