This is the official codebase for the paper:
Direct Online Preference Learning for Restless Bandits with Preference Feedback
- Using pip:
pip install -r requirements.txt
- Using conda:
conda env create -f environment.yaml
To use the linear program solver, a Gurobi license is required.
For academic use, refer to Gurobi's official academic licensing program (Academic Program and Licenses - Gurobi).
Environments are completely characterized by their transition kernels (transition probability matrices) and reward functions. To generate these files for predefined environments, run the following commands:
cd dopl/RMAB_env_instances
python create_cpap.py
python create_armman.py
python create_app_marketing.py
- You can create a custom environment by writing a create_<your_env>.py script that generates the transition kernel and reward .npy files.
- Modify the env_config.arm parameter in the configuration file to use <your_env> in a run.
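As a rough illustration of what a custom create_<your_env>.py script might look like, the sketch below generates a random transition kernel and reward array and saves them as .npy files. The array layouts (transitions indexed as [arm, state, action, next_state], rewards as [arm, state]), the function names, and the output file names are all assumptions made for this example. Check the predefined create_*.py scripts for the shapes and file names the code actually expects.

```python
# create_my_env.py: hypothetical sketch of a custom environment generator.
# Array layouts and file names are assumptions, not the repository's format.
import numpy as np


def make_environment(n_arms=5, n_states=3, n_actions=2, seed=0):
    """Return a random (transitions, rewards) pair for illustration."""
    rng = np.random.default_rng(seed)
    # Assumed layout: transitions[arm, state, action, next_state].
    # Each Dirichlet draw is a probability vector over next states.
    transitions = rng.dirichlet(
        np.ones(n_states), size=(n_arms, n_states, n_actions)
    )
    # Assumed layout: rewards[arm, state], drawn uniformly from [0, 1).
    rewards = rng.uniform(0.0, 1.0, size=(n_arms, n_states))
    return transitions, rewards


def is_valid_kernel(transitions, atol=1e-8):
    """Check that entries are non-negative and sum to 1 over next states."""
    nonnegative = np.all(transitions >= 0)
    normalized = np.allclose(transitions.sum(axis=-1), 1.0, atol=atol)
    return bool(nonnegative and normalized)


if __name__ == "__main__":
    transitions, rewards = make_environment()
    assert is_valid_kernel(transitions)
    # Hypothetical file names; match them to what the configuration expects.
    np.save("my_env_transitions.npy", transitions)
    np.save("my_env_rewards.npy", rewards)
```

Validating the kernel before saving catches rows that do not sum to 1, which would otherwise only surface later as solver or sampling errors.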
Run the code for a specific environment using the following command:
python run.py --config-name=<env_name>
To start a run for the CPAP environment:
python run.py --config-name=cpap
Check out the configuration files in the conf folder.
The output will be stored in a timestamped directory within the outputs folder.
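To locate the most recent run programmatically, a small helper along these lines may be useful. It is only a sketch: the helper name latest_run_dir is hypothetical, and the depth parameter reflects an assumption about how deeply the timestamped directories are nested under outputs (1 means directly inside it). Adjust it to the layout you actually see.

```python
from pathlib import Path


def latest_run_dir(outputs_root="outputs", depth=1):
    """Return the most recently modified run directory, or None if absent.

    depth is the assumed nesting level of run directories below
    outputs_root (1 = immediate subdirectories).
    """
    root = Path(outputs_root)
    if not root.is_dir():
        return None
    # Build a glob pattern such as "*" (depth 1) or "*/*" (depth 2).
    pattern = "/".join(["*"] * depth)
    runs = [p for p in root.glob(pattern) if p.is_dir()]
    # Pick the directory with the newest modification time.
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    print(latest_run_dir())
```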