Any good data scientist is (or at least should be) adept at taking complex mathematical and statistical models and explaining them in a simple and concise manner. In the end, our job is to create value for our company or client. Even if we have a model with 99.9999999% accuracy, management is unlikely to use it to make decisions unless they understand (at the very least) the basics of the model.
A large part of any business is built around understanding the company’s clients and ensuring their needs and wants are being satisfied. This helps us ensure our clients are actually using the products we’re creating or providing, and that we’re spending our own resources on the right business segments. A common approach to understanding our clients is to segment them into distinct groups. Instead of trying to understand and develop products for hundreds of thousands of individual people or companies, we can focus our efforts on a few distinct groups that represent our underlying clients. This allows us to make more informed, targeted decisions that will have a greater impact. …
Suppose you download your bank transactions for 2020. What are the chances that a random transaction’s amount begins with 3? Considering that there are 9 possible digits (omitting 0 as a first digit), you’d logically guess 1/9, or about 11%. Surprisingly, this is wrong. The true probability is actually around 12%. And the probability that the first digit is a 1 is over 30%.
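To make those percentages concrete: Benford’s Law says the probability that the leading digit is d equals log10(1 + 1/d). Here is a minimal sketch of that formula. The function names `benford_probability` and `leading_digit` are my own, chosen for illustration, and the helper assumes non-zero amounts.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability that the leading digit equals `digit` under Benford's Law."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


def leading_digit(amount: float) -> int:
    """First significant digit of a non-zero number, ignoring its sign."""
    # Scientific notation always puts the first significant digit up front.
    return int(f"{abs(amount):.15e}"[0])


if __name__ == "__main__":
    # Expected share of transactions starting with each digit
    for d in range(1, 10):
        print(d, f"{benford_probability(d):.1%}")
```

To compare against real data, you could apply `leading_digit` to every transaction amount, count how often each digit appears, and set those frequencies next to the expected shares.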
So where did this rule come from and how can we use it?
Although commonly known as Benford’s Law, like many famous laws, it’s not named after the first person to discover it. It was actually an astronomer named Simon Newcomb who noticed in the late 1800s that in logarithm tables, some pages were worn much more than others, particularly the first few pages. …
If computers only do what we tell them, how do they create random numbers?
Random numbers are all around us, particularly when we look at computers. Our “auto-generated” passwords, the number of coins you win for logging in daily to your favorite game, and, of course, the =RAND() Excel function: all random. So where do these random numbers come from? Is there some magical random place within your computer?
As with everything in computing (quantum computers excluded), things don’t just happen on their own. Computers do what they’re programmed to do. The same applies to random numbers. Not to burst your bubble, but those “random” numbers aren’t actually random, as we’ll see. …
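One classic way to produce such “random” numbers is a linear congruential generator (LCG): start from a seed and repeatedly apply state = (a * state + c) mod m. The sketch below is only an illustration of the idea, not necessarily what Excel or your operating system uses. The class name is my own, and the default constants are the widely published Numerical Recipes values.

```python
class LinearCongruentialGenerator:
    """Minimal LCG: each new state is (a * state + c) mod m."""

    def __init__(self, seed: int, a: int = 1664525,
                 c: int = 1013904223, m: int = 2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self) -> int:
        """Advance the generator and return the new integer state."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def next_float(self) -> float:
        """Pseudo-random float in [0, 1), similar in spirit to =RAND()."""
        return self.next_int() / self.m


if __name__ == "__main__":
    first = LinearCongruentialGenerator(seed=42)
    second = LinearCongruentialGenerator(seed=42)
    # Same seed, same "random" sequence: the output is fully deterministic.
    print([first.next_int() for _ in range(3)])
    print([second.next_int() for _ in range(3)])
```

Because the whole sequence is determined by the seed, two generators started from the same seed always agree. That is exactly why these numbers are called pseudo-random.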
Is our company’s Facebook advertising even worth the effort?
A company would like to know if their advertising is effective. Before you start: yes, Facebook does provide analytics for users of its advertising platform. Our customer is not one of them. Their “advertisements” are ordinary posts on their feed and are not promoted by Facebook.
Data is from the client’s POS system and their Facebook feed.
KISS. A simple linear model will suffice.
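To give a feel for what “a simple linear model” can look like here, the sketch below fits sales = intercept + slope * posted by ordinary least squares. Everything in it is an assumption for illustration: the function name, the choice of a posted/not-posted indicator as the predictor, and the numbers, which are made up and are not the client’s data.

```python
from statistics import mean


def fit_simple_linear_model(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical daily data, invented purely for this example
    posted = [0, 1, 0, 0, 1, 1, 0, 1]  # 1 if the company posted that day
    sales = [100.0, 130.0, 95.0, 105.0, 125.0, 135.0, 100.0, 130.0]
    intercept, slope = fit_simple_linear_model(posted, sales)
    print(f"baseline sales: {intercept:.1f}, lift on post days: {slope:.1f}")
```

With a 0/1 predictor, the intercept is the average on days without a post and the slope is the average difference on days with one, which keeps the result easy to explain to the client.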
First, we need to obtain our data. We can use a nice Facebook scraper to pull the most recent posts into a usable format.
# install & load scraper
!pip install facebook_scraper
from facebook_scraper import get_posts
import pandas as…
I think it’s fairly rare to find data science write-ups that aren’t using the iris or Titanic datasets. That’s why, for my first-ever Medium post, I’m going to show a project from start to finish. Actual data. Actual problem. Actual solution that saves a business money.
QUESTION:
A restaurant would like to optimize its staff levels. It would like to ensure that it’s not paying for unneeded staff, while also having enough staff to serve customers. It also would like to take its staff into consideration by ensuring they’re not overworked. How can we achieve this?
DATA:
Data is from the restaurant’s Point-of-Sale (POS) system.
MODEL:
An M/M/c/K queuing model. …
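For readers new to the notation, an M/M/c/K queue has Poisson arrivals, exponential service times, c servers, and room for at most K customers in total. The sketch below computes its standard steady-state probabilities and a few derived metrics. The function and parameter names are my own, and mapping c to staff on shift and K to restaurant capacity is my assumption about how the model applies here.

```python
from math import factorial


def mmck_state_probabilities(arrival_rate, service_rate, servers, capacity):
    """Steady-state probability of n customers in the system, n = 0..capacity."""
    if servers < 1 or capacity < servers:
        raise ValueError("need servers >= 1 and capacity >= servers")
    a = arrival_rate / service_rate  # offered load
    weights = []
    for n in range(capacity + 1):
        if n <= servers:
            weights.append(a**n / factorial(n))
        else:
            weights.append(a**n / (factorial(servers) * servers ** (n - servers)))
    total = sum(weights)
    return [w / total for w in weights]


def mmck_metrics(arrival_rate, service_rate, servers, capacity):
    """Blocking probability, mean queue length, mean wait, and utilization."""
    probs = mmck_state_probabilities(arrival_rate, service_rate, servers, capacity)
    p_block = probs[-1]  # an arrival finding the system full is turned away
    in_system = sum(n * p for n, p in enumerate(probs))
    in_queue = sum((n - servers) * p for n, p in enumerate(probs) if n > servers)
    effective_rate = arrival_rate * (1 - p_block)
    return {
        "p_block": p_block,
        "mean_in_queue": in_queue,
        "mean_wait": in_queue / effective_rate,  # Little's law
        "utilization": (in_system - in_queue) / servers,
    }


if __name__ == "__main__":
    # Hypothetical rates: 20 arrivals/hour, 6 served/hour per staff member
    for staff in (3, 4, 5):
        print(staff, mmck_metrics(20, 6, servers=staff, capacity=30))
```

Running the metrics for several staffing levels shows the trade-off from the question: more staff lowers waiting and blocking, but also lowers utilization, which is where unneeded labor cost shows up.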
When you think of the largest companies in the world, who do you think of?