Finally, a clustering model you won't have to spend time explaining

Image by Author

Any good data scientist is (or at least should be) adept at taking complex mathematical and statistical models and explaining them in a simple and concise manner. In the end, our job is to create value for our company or client. Even if we have a model with 99.9999999% accuracy, management is unlikely to use it to make decisions unless they understand (at the very least) the basics of the model.

Our Problem

A large part of any business is built around understanding the company's clients and ensuring their needs and wants are being satisfied. This helps us ensure our clients are actually using the products we're creating or providing, and that we're spending our own resources in the most valuable business segments. A common approach to understanding our clients is to segment them into distinct groups. Instead of trying to understand and develop products for hundreds of thousands of individual people or companies, we can focus our efforts on a few distinct groups that represent our underlying clients. This allows us to make more informed, targeted decisions that will have a greater impact. …


What are the chances of rolling a die and getting a 5? Of course, 1/6. What are the chances a randomly selected number between 1 and 100 is 32? 1/100.

Suppose you download your bank transactions for 2020. What are the chances that a random transaction's amount begins with 3? Since there are 9 possible leading digits (0 can't be the first digit), you'd logically guess 1/9, or about 11%. Surprisingly, for data like this that guess is usually wrong: the probability is actually around 12.5%. And the probability that the first digit is a 1 is, amazingly, just over 30%.
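These figures come from Benford's Law, which gives the probability that the leading digit is d as log10(1 + 1/d): log10(2), about 30.1%, for a 1 and log10(4/3), about 12.5%, for a 3. Below is a minimal Python sketch of the law plus a way to compare it with your own amounts. The function names and the sample amounts are hypothetical illustrations, not taken from the article, and real data only follows the law when it spans several orders of magnitude.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that the leading digit is d (1-9) under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


def leading_digit(x: float) -> int:
    """Return the first non-zero digit of x, e.g. 0.034 -> 3, -1520.75 -> 1."""
    # str() gives forms like '0.034', '1520.75' or '3.5e+20'; in each case
    # the first non-zero character is the leading significant digit.
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")


def first_digit_shares(amounts) -> dict:
    """Share of non-zero amounts whose leading digit is 1, 2, ..., 9."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one non-zero amount")
    return {d: digits.count(d) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical transaction amounts, for illustration only.
    sample = [12.50, 3.99, 140.00, 1.25, 27.80, 19.99, 310.45, 8.10]
    observed = first_digit_shares(sample)
    for d in range(1, 10):
        print(f"{d}: Benford {benford_probability(d):.1%}, observed {observed[d]:.1%}")
```

With a full year of transactions instead of eight made-up amounts, the observed column is what you would compare against the Benford column.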

So where did this rule come from and how can we use it?

History

Although the rule is commonly known as Benford's Law, like many famous laws it isn't named after the first person to discover it. It was actually an astronomer named Simon Newcomb who noticed in the late 1800s that in logarithm tables, some pages were worn much more than others, particularly the first few pages. …


If computers only do what we tell them, how do they create random numbers?


Random numbers are all around us, particularly when we look at computers. Our "auto-generated" passwords, the number of coins you win for logging in daily to your favorite game, and, of course, Excel's =RAND() function: all random. So where do these random numbers come from? Is there some magical random place within your computer?

As with everything in computing (quantum computers excluded), things don't just happen on their own. Computers do what they're programmed to do, and the same applies to random numbers. Not to burst your bubble, but those "random" numbers aren't actually random, as we'll see. …
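To make "not actually random" concrete, here is one classic pseudo-random technique, a linear congruential generator (LCG), sketched in Python. This is an illustration of the general idea only, not what Excel or any particular library uses; the class name is hypothetical and the default constants are the widely published Numerical Recipes values.

```python
class LinearCongruentialGenerator:
    """Pseudo-random generator using x_{n+1} = (a * x_n + c) mod m."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m  # the seed fully determines every later output

    def next_int(self) -> int:
        """Advance the state and return it as an integer in [0, m)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def random(self) -> float:
        """Return a float in [0, 1), similar in spirit to =RAND()."""
        return self.next_int() / self.m


if __name__ == "__main__":
    first = LinearCongruentialGenerator(seed=42)
    second = LinearCongruentialGenerator(seed=42)
    # Same seed, same "random" numbers: the sequence is completely deterministic.
    print([round(first.random(), 4) for _ in range(3)])
    print([round(second.random(), 4) for _ in range(3)])
```

Both lines print the same three numbers. The output only looks random because the formula scrambles the state well; anyone who knows the seed and constants can reproduce it exactly.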



Is our company’s Facebook advertising even worth the effort?

QUESTION:

A company would like to know if its advertising is effective. Before you ask: yes, Facebook does have analytics for users who actually use its advertising platform. Our client does not. Their "advertisements" are regular posts on their feed and are not promoted by Facebook.

DATA:

The data comes from the client's POS system and its Facebook feed.

MODEL:

KISS. A simple linear model will suffice.
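As a rough illustration of what "a simple linear model" can mean here, the sketch below fits daily sales against the number of posts that day using ordinary least squares, in plain Python. The choice of predictor, the function names and the numbers are all hypothetical; the article's actual model specification is not shown in this excerpt. A real analysis would also check whether the slope is statistically distinguishable from zero and control for effects like day of the week.

```python
from statistics import mean


def fit_simple_linear_model(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical daily data: number of Facebook posts and POS sales that day.
    posts = [0, 1, 0, 2, 1, 0, 3, 1, 0, 2]
    sales = [410, 455, 395, 520, 470, 430, 560, 440, 400, 505]
    b0, b1 = fit_simple_linear_model(posts, sales)
    print(f"baseline sales ~ {b0:.1f}, extra sales per post ~ {b1:.1f}")
    print(f"R^2 = {r_squared(posts, sales, b0, b1):.2f}")
```

The slope is the quantity management cares about: the estimated change in daily sales for each additional post.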

First, we need to obtain our data. We can use the facebook_scraper package to scrape the latest posts in a usable format.

#install & load scraper
!pip install facebook_scraper
from facebook_scraper import get_posts
import pandas as…


Photo by ELEVATE on Pexels.com

It's fairly rare to find data science tutorials that don't use the iris or Titanic datasets, which is why, for my first-ever Medium post, I'm going to show a project from start to finish. Actual data. Actual problem. Actual solution that saves a business money.

QUESTION:

A restaurant would like to optimize its staffing levels. It wants to avoid paying for unneeded staff while still having enough staff to serve customers. It would also like to look after its staff by ensuring they're not overworked. How can we achieve this?

DATA:

The data comes from the restaurant's point-of-sale (POS) system.

MODEL:

An M/M/c/K queuing model. …
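In Kendall notation, M/M/c/K means Poisson arrivals at rate λ, exponential service at rate μ per server, c servers (staff) and room for at most K customers in the system. The article's own details are cut off here, so below is a minimal sketch of the standard steady-state formulas under those assumptions: with a = λ/μ, the probability of n customers is proportional to a^n/n! for n ≤ c and to a^n/(c!·c^(n−c)) for c < n ≤ K. The function name, its inputs and the example numbers are hypothetical.

```python
from math import factorial


def mmck_metrics(lam, mu, c, K):
    """Steady-state metrics of an M/M/c/K queue."""
    if lam <= 0 or mu <= 0:
        raise ValueError("rates must be positive")
    if c < 1 or K < c:
        raise ValueError("need c >= 1 and K >= c")
    a = lam / mu
    # Unnormalised probability of n customers in the system, n = 0..K.
    weights = [
        a ** n / factorial(n) if n <= c else a ** n / (factorial(c) * c ** (n - c))
        for n in range(K + 1)
    ]
    total = sum(weights)
    p = [w / total for w in weights]
    p_block = p[K]                   # arrivals who find the system full
    lam_eff = lam * (1 - p_block)    # arrivals actually admitted
    L = sum(n * pn for n, pn in enumerate(p))                  # mean in system
    Lq = sum((n - c) * pn for n, pn in enumerate(p) if n > c)  # mean waiting
    return {
        "p_block": p_block,
        "lam_eff": lam_eff,
        "utilization": lam_eff / (c * mu),
        "L": L,
        "Lq": Lq,
        "W": L / lam_eff,    # mean time in system (Little's law)
        "Wq": Lq / lam_eff,  # mean time waiting for a server
    }


if __name__ == "__main__":
    # Hypothetical inputs: 30 customers/hour arrive, each server handles
    # 12 customers/hour, and at most 20 customers can be in the system.
    for servers in range(3, 7):
        m = mmck_metrics(lam=30, mu=12, c=servers, K=20)
        print(f"c={servers}: wait {60 * m['Wq']:.1f} min, "
              f"turned away {m['p_block']:.1%}, utilization {m['utilization']:.0%}")
```

Looping over c is one way to frame the staffing trade-off: more servers cut waiting and lost customers but lower utilization, while fewer servers push utilization (and workload per person) up.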

About

Garrett Bushnell

Consumed with learning. lowhangingfruitanalytics.com
