CleverShuttle's ride pooling algorithm currently works by assigning points for various characteristics of a ride and then selecting the shuttles with the highest number of points. This approach is simple to design but makes it difficult to compare the performance of various candidate objective functions and thus define an "optimal" one. However, what we can do is retrospectively identify the correct ride pooling decisions. With this information we were able to define an algorithm that searches for the optimal objective function for the ride pooling problem and uses the correct past decisions to guide this search.

Ride pooling allows customers that have similar destinations to travel together (for at least part of the journey), which thereby aims to reduce the overall level of traffic in the city. This is of course similar to how traditional modes of public transport such as bus and metro operate; the difference here is that ride pooling vehicles follow neither a set route nor a time table. Thus, the goal of ride pooling is not to replace traditional public transport, but to complement it by offering an extra form of mobility that is both on demand and flexible, while still achieving the logistical efficiency of shared transport.

This is particularly advantageous in cities that are not well covered by existing transport services. However, it is also useful in filling in the gaps in cities with strong transport networks that nonetheless cannot cover the whole city. So, by simultaneously servicing numerous customers with one (electric) vehicle, who would otherwise have travelled in individual vehicles, companies such as CleverShuttle are able to not only transport people around the city more efficiently, but also help the environment by reducing CO2 emissions.

In order to realise the efficiency that ride pooling can offer, the "best" shuttle must be matched to the customer at the time of booking. The difficulty here lies in defining what "best" means in this context. To do this, we need to somehow understand how the choice of shuttle affects a variety of factors including the customer experience, the interplay with other customers' rides, the efficiency of this ride and the future state of the entire system.

Let us consider the following (simplified) example. Our customer in Berlin would like to travel from Checkpoint Charlie to Alexanderplatz and book a ride via the app. We now have to decide which shuttle to offer the customer. In this example, we have two vehicles available.

The first shuttle is currently free at Bahnhof Zoo and can get to the customer in 10 minutes.

**Waiting time:** 10 minutes**Detours:** None

The second shuttle is currently on its way to Bergmannkiez to pick up another customer who will then be dropped off at Gendarmenmarkt

**Waiting time:** 5 minutes**Detours:** 5 minutes for each customer

The question is now - which of these shuttles is better for our customers as well as for Clevershuttle? And how can this be determined algorithmically? Given the scale of our operations, these decisions must also be made quickly - within a few milliseconds.

• *c _{w}* ,

• *u _{c}* is the detour for each customer on this tour

• *w* is the waiting time for the booking customer

The coefficients should be selected such that the objective function always selects the perfect solution.

However, in addition to the detours and waiting time, there are naturally numerous further criteria according to which the optimal vehicle could be selected, such as:

• the probability of the customer accepting or cancelling a ride

• the demand throughout the city

• driver shift times and labour regulations (whereby some vehicles need to be back at the depot by a certain time)

We have also assumed a linear objective function here. But is the effect of each factor actually linear?

We are just starting to scratch the surface of the complexity of our problem and it's quite apparent that an attempt to find an optimal objective function without any assistance would be rather difficult. So, what is the solution here?

Let us now assume that the best solution for a certain set of bookings were known. In our hypothetical situation, for the bookings *b _{1}* , . . . ,

All that’s missing now is a way to search for this function. One group of methods that are particularly well suited to this task are evolutionary algorithms, which can be used to iterably improve the objective function.

Evolutionary algorithms are a family of metaheuristic based optimisation methods that use mechanisms inspired by biological evolution. The specific subset of evolutionary algorithms that we'll focus on here are the natural selection inspired genetic algorithms. A genetic algorithm starts with a random population of candidate solutions and with each generation, the strongest are selected for crossover (which carries over their characteristics to the next generation), and the weakest are removed.

The strength or weakness of a candidate solution is evaluated through a concept called fitness. Fitness can be calculated in various ways, the simplest way is to define fitness as a function of the components of the candidate solution, e.g. a sum of all the components. However, this is of course not the only way. In our hypothetical situation where we know the optimal shuttle for a ride, we were able to judge the performance of a potential objective function by the number of correct decisions it makes. In the context of an evolutionary algorithm, this would become the fitness evaluation. Thus, with each candidate objective function, we could use this fitness to guide our search towards an objective function that maximises the number of optimally matched shuttles.

If you are familiar with machine learning, you may notice that this approach is reminiscent of many supervised learning algorithms. The difference here is that instead of using the correct answers to recognise underlying patterns in the data as in machine learning, this "supervised" approach to fitness uses the correct answers as a heuristic to help inform our optimisation search.

The structure of the objective function should naturally be relatively free, i.e. not restricted to a linear function. Each candidate function is comprised of a sum of terms that are defined as follows:

The variables to be optimised by the evolutionary algorithm are in this case are exponents *e _{1}*,...,

Another advantage of evolutionary algorithms is that they can be extended to solve multiobjective optimisation problems, where two or more criteria must be optimised simultaneously. This is especially useful in our case, where the interests of multiple parties (the customer that booked, other customers already in shuttles, the logistical efficiency of the entire system, etc.) need to be taken into account. A multiobjective algorithm doesn't deliver a single optimal result, but rather a set of results called a Pareto frontier, in which no result is strictly better than any other according to all criteria; this is known as Pareto efficiency.

The evolutionary algorithm that we used was a multiobjective algorithm called Strength Pareto Evolutionary Algorithm 2 (SPEA2) (Zitzler et al. 2001). Using this we optimised on two criteria:

**1. customer friendliness**, defined only from the perspective of the customer that is booking; and

**2. overall efficiency**, which accounts for the needs of all others customers as well as the state of the whole system

The first criterion is relatively easy to evaluate; as the most customer friendly shuttle we selected the one that gets the customer to its destination quickest, i.e. the shuttle that minimises the sum of waiting time and travel time. Identifying the optimal shuttle for the second criterion is a much more ambiguous problem and depends on a lot of factors that are not known at the time of booking. However, by retrospectively analysing a set of rides, we are able to incorporate information about these factors (i.e. events that occurred shortly after the booking of the ride in question). With this knowledge we can understand how different decisions would be affected by subsequent events and what other options would have been available. This allowed us to fairly accurately determine what would have been the best solution for each booking. This knowledge is of course unknown when a customer books, however, it can be approximated using various models such as geospatial demand forecasts.

Finding the best solutions was a fairly labour intensive process to begin with, that involved visualising each ride and manually deciding which shuttle would have been best in each situation based off the visual information and numerous other data points describing each of the options. But eventually, much of the process was able to be automated by deducing rules and heuristics to make the correct decisions for us. This gave us the optimal decisions for the fitness calculations.

Our candidate objective function included terms for various characteristics of the ride, such as acceptance probability, measures of efficiency, detours, waiting times, demand forecasts and levels, measures of fleet availability, etc. In addition to this, we defined interaction terms for characteristics that have joint relevance, e.g. efficiency and fleet availability (the fewer shuttles available, the more important efficiency becomes). Thus, the variables to be optimised were:

• coefficients for each term

• exponents for each term

• arithmetic operators for each interaction term

In each generation, new variables were generated (from either a numeric interval or a discrete set depending on the variable) using random number generators and various distributions. Fitness was evaluated using the retrospectively identified optimal decisions and the candidate objective functions were iteratively improved using crossover, elitism and mutation to effectively explore the solution space and keep the strongest solutions. The result is a set of Pareto efficient objective functions.

Developed by Juri Alexander, Alexander Hoen, Tristan Williams. Designed by Maxime Letellier and Sebastian Wiege.