Optimize Restaurant Staff Levels

Garrett Bushnell
7 min read · Aug 9, 2020



It's rare to find data science write-ups that aren't built on iris or titanic. Which is why, for my first-ever Medium post, I'm going to walk through a project from start to finish. Actual data. Actual problem. Actual solution that saves a business money.

QUESTION:

A restaurant would like to optimize its staff levels. It would like to ensure that it’s not paying for unneeded staff, while also having enough staff to serve customers. It also would like to take its staff into consideration by ensuring they’re not overworked. How can we achieve this?

DATA:

Data comes from the restaurant's Point-of-Sale (POS) system.

MODEL:

An M/M/c/K queuing model: Poisson arrivals, exponentially distributed service times, c servers, and a maximum system capacity of K.

CRITERIA:

· Staff not under-worked (utilization rate of at least 80%)

· Staff not overworked (utilization rate of at most 90%)

· The minimum staff level that satisfies both

We can first load the data downloaded directly from the POS system.

#load the libraries we'll use throughout
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(ggthemes)
library(queueing)

#read in data
original_data <- read.csv('receipt_data.csv', sep = ',', na = c("", "NA"))

#clean data
data <- data.frame(original_data$Date, original_data$Time) %>% drop_na()

#recast our data columns as Dates, Times, & Timestamps
colnames(data) <- c('Date', 'Time')
data$Date <- as.Date(data$Date, format = '%m/%d/%y')
data$Timestamp <- as.POSIXct(paste(data$Date, data$Time), format = '%Y-%m-%d %I:%M %p')
data$Month <- month(data$Timestamp)
str(data)

## 'data.frame': 7510 obs. of 4 variables:
## $ Date      : Date, format: "2019-09-26" "2019-09-26" ...
## $ Time      : Factor w/ 665 levels "1:00 pm","1:01 pm",..: 557 430 558 527 592 524 557 523 496 473 ...
## $ Timestamp : POSIXct, format: "2019-09-26 20:07:00" "2019-09-26 18:00:00" ...
## $ Month     : num 9 9 9 9 9 9 9 9 9 9 ...

The data is read in, cleaned, and we’ve added some additional columns to help with our analysis. We can see we have 7,510 receipts to analyze.

Now let’s plot our distribution of receipts to see if anything is abnormal.

ggplot(data = data) +
  geom_bar(mapping = aes(x = Month)) +
  theme_economist_white() +
  ggtitle('Receipts per Month')

It appears that the first and last months of our data may be incomplete, since their receipt counts are much lower than the other months'. Let's check.

summary(data$Date)

##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
## "2019-03-28" "2019-05-13" "2019-06-28" "2019-06-26" "2019-08-07" "2019-09-26"

We were correct. Our data starts almost at the end of March and ends slightly before the last day of September. We'll remove both partial months so they don't skew the monthly counts.

#remove March and September (partial months)
data <- filter(data, Month > 3, Month < 9)
summary(data$Date)

##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
## "2019-04-01" "2019-05-10" "2019-06-18" "2019-06-16" "2019-07-23" "2019-08-31"

ggplot(data = data) +
  geom_bar(mapping = aes(x = Month)) +
  theme_economist_white() +
  ggtitle('Receipts per Month')

Looks much better. We now have roughly 1,100 to 1,500 receipts each month.

Now let’s determine if our customers begin eating lunch and dinner at the same time each month. It may be that these change throughout the year. If there are significant changes, we may need to model each period separately.

data$Hour <- hour(data$Timestamp)

#group data by month and hour
hour_grouped_by_month <- data %>%
  group_by(Month, Hour) %>%
  summarize(count_of_receipts = n())

ggplot(data = data) +
  geom_bar(mapping = aes(x = Hour)) +
  facet_wrap(~Month, nrow = 5, ncol = 1) +
  theme_economist_white() +
  ggtitle('Receipts per Month per Hour')

Perfect. Above, we can see that lunch always runs 11am to 1pm (which makes sense, since those are normal lunch hours). Secondly, dinner tends to start at the same time each month (4pm) and generally ends around 8pm, though in July it runs a little later, until 9pm. We can likely get away with a single model, but may want to go back and run a separate model for July if the client sees that as a significant problem.

We now know our times:

1. Lunch: 11:00–13:00
2. Dinner: 16:00–20:00

Let’s create a variable so we can easily pick out receipts in these two periods. We can ignore all others.

data <- data %>% filter((Hour >= 11 & Hour <= 14) | (Hour >= 16 & Hour <= 20))
hist(data$Hour)
data$Meal <- ifelse((data$Hour >= 11 & data$Hour <= 14), 'lunch', 'dinner')

To use a queuing model, we're interested in how often customers arrive, as well as how long they typically spend in the system (the restaurant). We don't know exactly when customers arrive, but we can use the timestamp on their receipt as a proxy. Since the time spent eating is most likely normally distributed, the distribution of arrival gaps should closely match the distribution of departure (receipt) gaps.
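
To see why that's reasonable, here's a tiny simulation (purely illustrative, with made-up rates, not part of the original analysis): if customers arrive roughly every 8 minutes and meal lengths are roughly normal, the gaps between departures end up distributed almost identically to the gaps between arrivals.

#simulate Poisson arrivals and roughly normal meal lengths (minutes)
set.seed(1)
arrivals <- cumsum(rexp(5000, rate = 1/8))          #a customer every ~8 minutes
departures <- sort(arrivals + rnorm(5000, 59, 10))  #add ~59-minute meals

#inter-arrival and inter-departure gaps look nearly identical
summary(diff(arrivals))
summary(diff(departures))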

We can approximate the inter-arrival times by lagging our timestamps. Of course, if two receipts straddle days or meal periods, the lagged difference will be much greater than an hour. We will omit these.

#create lags
data <- data %>% arrange(desc(Timestamp))
data$lag <- lag(data$Timestamp, n = 1)
data$inter_arrival <- as.numeric(data$lag - data$Timestamp, units = "mins")
data <- data %>% filter(inter_arrival < 60)
head(data)

Now we can begin modeling. We’ll model the Lunch and Dinner crowds separately, so we can obtain better estimates of the optimal staff levels.

#separate the data since we'll model each meal separately
lunch_data <- data %>% filter(Meal == 'lunch')
dinner_data <- data %>% filter(Meal == 'dinner')

hist(lunch_data$inter_arrival, breaks = 60, col = 'pink')
hist(dinner_data$inter_arrival, breaks = 60, col = 'lightblue')

Both distributions look very similar. Let's see what the average inter-arrival time is for each.

#average inter-arrival times
avg_lunch_arrival <- mean(lunch_data$inter_arrival)
avg_dinner_arrival <- mean(dinner_data$inter_arrival)

'Average Inter-Arrival Times'
## [1] "Average Inter-Arrival Times"
paste('Lunch: ', round(avg_lunch_arrival, 2))
## [1] "Lunch: 8.28"
paste('Dinner: ', round(avg_dinner_arrival, 2))
## [1] "Dinner: 8"

So, on average, a customer arrives every 8.3 minutes at lunch and every 8.0 minutes at dinner.
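
One thing worth checking before we rely on an M/M/c/K model: it assumes exponentially distributed inter-arrival times. A quick overlay (a sketch I'm adding here, not part of the original analysis) compares the lunch histogram against an exponential fit:

#overlay an exponential density on the lunch inter-arrival histogram
hist(lunch_data$inter_arrival, breaks = 60, freq = FALSE, col = 'pink',
     main = 'Lunch inter-arrival times vs. exponential fit')
curve(dexp(x, rate = 1 / mean(lunch_data$inter_arrival)), add = TRUE, lwd = 2)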

For our model, we'll also need to know the average length of a meal. A good Cornell study finds that the average meal in North America takes 59.0 minutes.

Finally, we need to know our capacity. For this problem, it's the number of tables we have: 10. We will then pick the optimal number of servers that keeps our utilization rate between 80% and 90%.
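
As a quick sketch of where these rates come from, the model expects hourly rates, so we invert the per-minute averages (the small gap between 60/8.28 ≈ 7.25 and the 7.23 used below likely reflects rounding of the displayed mean):

#convert average times (in minutes) into the hourly rates the model expects
avg_meal_minutes <- 59.0
lambda_lunch <- 60 / avg_lunch_arrival    #~7.2 customers/hr (8.28-minute gaps)
lambda_dinner <- 60 / avg_dinner_arrival  #7.5 customers/hr (8-minute gaps)
mu <- 60 / avg_meal_minutes               #~1.02 customers/hr served per table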

Thus, for our lunch model we’ll have the following parameters:

· lambda = 7.23 customers/hr

· mu = 1.02 customers/hr

· k=10

Time to finally model. We'll start with lunch.

lambda = 7.23  #arrival rate
mu = 1.02      #service rate (time to eat)
k = 10         #capacity (tables)
c = seq(1, 10) #server counts to test

l_results <- list(0)
for (c_i in c) {
  inputs <- NewInput.MMCK(lambda = lambda, mu = mu, c = c_i, k = k)
  model <- QueueingModel.i_MMCK(inputs)
  l_results[c_i] <- model$RO
}

l_final_data <- do.call(rbind, Map(data.frame,
                                   'Number of Servers' = sort(c, decreasing = FALSE),
                                   'Utilization' = l_results))

ggplot(data = l_final_data) +
  geom_col(mapping = aes(x = Number.of.Servers, y = Utilization)) +
  scale_x_continuous(breaks = c(seq(0, 10, by = 1)), labels = c(seq(0, 10, by = 1))) +
  geom_hline(yintercept = .8, color = 'red') +
  geom_hline(yintercept = .9, color = 'red') +
  theme_economist_white() +
  ggtitle('Server Utilization (Lunch)')

As we can see above, we need 7 servers: that's the staffing level that keeps utilization between the two red lines, clearing our 80% floor without crossing the 90% overwork threshold.
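
Utilization isn't the only output worth a look. As a sanity check (my addition, using the queueing package's standard accessor functions), the fitted 7-server model also reports expected waits and queue lengths:

#sanity-check the 7-server lunch model with other queueing outputs
inputs_7 <- NewInput.MMCK(lambda = 7.23, mu = 1.02, c = 7, k = 10)
model_7 <- QueueingModel(inputs_7)
RO(model_7)  #server utilization
Wq(model_7)  #mean wait before being seated, in hours
Lq(model_7)  #mean number of parties waiting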

Thus, for our dinner model we'll have the following parameters:

· lambda = 7.50 customers/hr

· mu = 1.02 customers/hr

· k = 10

lambda = 7.50  #arrival rate
mu = 1.02      #service rate (time to eat)
k = 10         #capacity (tables)
c = seq(1, 10) #server counts to test

d_results <- list(0)
for (c_i in c) {
  inputs <- NewInput.MMCK(lambda = lambda, mu = mu, c = c_i, k = k)
  model <- QueueingModel.i_MMCK(inputs)
  d_results[c_i] <- model$RO
}

d_final_data <- do.call(rbind, Map(data.frame,
                                   'Number of Servers' = sort(c, decreasing = FALSE),
                                   'Utilization' = d_results))
d_final_data

ggplot(data = d_final_data) +
  geom_col(mapping = aes(x = Number.of.Servers, y = Utilization)) +
  scale_x_continuous(breaks = c(seq(0, 10, by = 1)), labels = c(seq(0, 10, by = 1))) +
  geom_hline(yintercept = .8, color = 'red') +
  geom_hline(yintercept = .9, color = 'red') +
  theme_economist_white() +
  ggtitle('Server Utilization (Dinner)')

Similar to lunch, our optimal server level for dinner is 7, given that we'd like a utilization rate of at least 80% with as few staff as possible.

We can now safely say that our restaurant should staff 7 servers for both the lunch and dinner shifts (it turns out we didn't really need to model each period individually). This ensures each employee stays busy (with a utilization of at least 80%) without being overworked, and that the business operates with minimal payroll expense.
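
As a closing note on the code itself: since lunch and dinner differ only in lambda, the two modeling loops could be collapsed into one small helper. A sketch:

#optional refactor: one helper sweeps server counts for any meal period
utilization_by_servers <- function(lambda, mu = 1.02, k = 10, servers = 1:10) {
  sapply(servers, function(c_i) {
    QueueingModel(NewInput.MMCK(lambda = lambda, mu = mu, c = c_i, k = k))$RO
  })
}
utilization_by_servers(7.23)  #lunch
utilization_by_servers(7.50)  #dinner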
