# DVRPSR_PPO

**Repository Path**: yeezusback/DVRPSR_PPO

## Basic Information

- **Project Name**: DVRPSR_PPO
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-17
- **Last Updated**: 2025-03-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DVRPSR_PPO
Deep reinforcement learning (DRL) for the dynamic vehicle routing problem with stochastic customer requests (DVRPSR).

# Abstract

The vehicle routing problem (VRP) is a well-known combinatorial optimization problem in operations research and computer science. It concerns determining a set of routes for a fleet of vehicles that travel from a central location (the depot) to serve a set of customers. In real-world operations, not all information is available when the routes start: routes can change and new customer requests can arrive dynamically. Decision-makers (DMs) learn this information sequentially over multiple periods. This gives rise to a class of VRPs known as stochastic dynamic vehicle routing problems (SDVRPs).

In this thesis, we consider the dynamic VRP with stochastic requests (DVRPSR), in which vehicles must be routed dynamically to maximize the number of customers served. We formulate the DVRPSR as a Markov decision process (MDP) that captures the evolution of the stochastic, dynamic system, so that a solution can be viewed as a sequence of decisions. To solve this problem, we propose a reinforcement learning (RL) framework based on the attention mechanism. One challenge in RL is finding a state representation that is compact yet sufficiently comprehensive, which is crucial for producing feasible solutions in real time. To tackle this challenge, we develop an attention-based RL model with an encoder-decoder structure. The encoder uses graph embedding to represent the customer information at every step, while the decoder selects the next customer to visit based on attention values. We train the model's parameters with a policy-gradient algorithm. The trained model produces sequential decisions in real time, including on new instances. Finally, we demonstrate the effectiveness of the trained model on real-world street data and compare its performance with promising methods from the literature.
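To make the decoder step described above more concrete, here is a minimal sketch of how a next customer could be chosen from attention values. It is not taken from this repository: the function names (`attention_probs`, `greedy_step`), the use of plain scaled dot-product scores, and the masking rule for already-served customers are all assumptions made for illustration, and a real model would use learned embeddings in a deep learning framework rather than hand-written lists.

```python
import math


def attention_probs(query, keys, mask):
    """Masked scaled dot-product attention over candidate customers.

    query: context vector, e.g. an embedding of the current vehicle state
    keys:  one embedding per customer (hypothetical encoder output)
    mask:  True where a customer may NOT be selected (e.g. already served)
    """
    if all(mask):
        raise ValueError("no feasible customer to select")
    scale = math.sqrt(len(query))
    # Blocked customers get a score of -inf so their probability becomes 0.
    scores = [
        -math.inf if blocked else sum(q * k for q, k in zip(query, key)) / scale
        for key, blocked in zip(keys, mask)
    ]
    top = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_step(query, keys, mask):
    """Pick the most probable customer and return its log-probability.

    The log-probability is the quantity a policy-gradient method would
    weight by the observed return when updating the parameters.
    """
    probs = attention_probs(query, keys, mask)
    choice = max(range(len(probs)), key=probs.__getitem__)
    return choice, math.log(probs[choice])


# Toy example: three customers, the third one has already been served.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.5, 1.0], [3.0, 0.0]]
mask = [False, False, True]
probs = attention_probs(query, keys, mask)
choice, log_prob = greedy_step(query, keys, mask)
```

In this toy example the third customer has the highest raw score but is masked out, so the choice falls on the first customer. During training, sampling from `probs` instead of taking the maximum is the usual way to keep the policy exploring.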