flash-36/DOPL

Direct Online Preference Learning

This is the official codebase for the paper:
Direct Online Preference Learning for Restless Bandits with Preference Feedback


Setup Instructions

Install Dependencies

  1. Using pip:

    pip install -r requirements.txt

  2. Using conda:

    conda env create -f environment.yaml

Additional Requirements for Linear Solver

The Linear Program Solver requires a Gurobi license.
Academic users can obtain one through Gurobi's official academic licensing program:
Academic Program and Licenses - Gurobi


Setting Up the PREF-RMAB Environments

Environments are completely characterized by their transition kernels (transition probability matrices) and reward functions. To generate these files for predefined environments, run the following commands:

cd dopl/RMAB_env_instances
python create_cpap.py
python create_armman.py
python create_app_marketing.py
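
As an illustration of what a transition kernel must satisfy, the following minimal sketch checks that every row of a kernel is a probability distribution over next states. The function name `is_valid_kernel`, the `[action, state, next_state]` index order, and the example numbers are assumptions made for this illustration; they are not taken from the repository's scripts.

```python
import numpy as np

def is_valid_kernel(P, atol=1e-8):
    """Return True if each slice P[..., s, :] is a distribution over next states."""
    P = np.asarray(P, dtype=float)
    nonnegative = bool(np.all(P >= 0.0))
    # Summing over the last axis (next state) must give 1 everywhere.
    rows_sum_to_one = bool(np.allclose(P.sum(axis=-1), 1.0, atol=atol))
    return nonnegative and rows_sum_to_one

# Hypothetical single-arm kernel with 2 actions and 2 states,
# indexed as [action, state, next_state] (assumed layout).
P = np.array([
    [[0.9, 0.1], [0.4, 0.6]],  # action 0 (e.g. passive)
    [[0.7, 0.3], [0.2, 0.8]],  # action 1 (e.g. active)
])
print(is_valid_kernel(P))
```

A check like this can be run on any array loaded with `np.load` before starting a run, whatever index order the generated files actually use, as long as the next-state axis is last.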

Creating Your Own Environment

You can create a custom environment by writing a create_<your_env>.py script that generates the transition kernel and reward .npy files.

  • Modify the env_config.arm parameter in the configuration file to use <your_env> in a run.
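
A custom generator script could look roughly like the sketch below. It is only a starting point under stated assumptions: the array shapes, the output file names (`<name>_transitions.npy`, `<name>_rewards.npy`), and the function names `make_toy_env` and `save_env` are hypothetical. Consult the existing `create_*.py` scripts for the layout and naming that the code actually expects.

```python
from pathlib import Path

import numpy as np

def make_toy_env(n_arms=3, n_states=4, n_actions=2, seed=0):
    """Build a random transition kernel and a simple reward table (assumed shapes)."""
    rng = np.random.default_rng(seed)
    # Dirichlet samples are nonnegative and sum to 1 over the last axis,
    # so each [arm, state, action] slice is a valid next-state distribution.
    # Resulting shape: (n_arms, n_states, n_actions, n_states).
    transitions = rng.dirichlet(np.ones(n_states), size=(n_arms, n_states, n_actions))
    # Reward increases linearly with the state index; shape: (n_arms, n_states).
    rewards = np.tile(np.linspace(0.0, 1.0, n_states), (n_arms, 1))
    return transitions, rewards

def save_env(out_dir, name, transitions, rewards):
    """Write both arrays as .npy files; the file names here are hypothetical."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    t_path = out_dir / f"{name}_transitions.npy"
    r_path = out_dir / f"{name}_rewards.npy"
    np.save(t_path, transitions)
    np.save(r_path, rewards)
    return t_path, r_path

if __name__ == "__main__":
    T, R = make_toy_env()
    save_env("toy_env_files", "toy", T, R)
```

Using a fixed seed keeps the generated environment reproducible across runs.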

Running the Code

Run the code for a specific environment using the following command:

python run.py --config-name=<env_name>

Example:

To start a run for the CPAP environment:

python run.py --config-name=cpap

See the configuration files in the conf folder for the available settings.

The output will be stored in a timestamped directory within the outputs folder.
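
To locate the most recent results programmatically, a small helper along these lines may be convenient. It assumes each run directory is a direct child of the outputs folder. If the timestamped directories are nested one level deeper (for example by date), point `outputs_root` at that level instead. The function name `latest_run_dir` is hypothetical.

```python
from pathlib import Path

def latest_run_dir(outputs_root="outputs"):
    """Return the most recently modified subdirectory of outputs_root, or None."""
    root = Path(outputs_root)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    # Pick by modification time rather than by parsing the directory name,
    # since the exact timestamp format is not assumed here.
    return max(run_dirs, key=lambda p: p.stat().st_mtime)

if __name__ == "__main__":
    print(latest_run_dir())
```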
