machine learning – Karthik Shashidhar

analytics, arbit, data; big data, data science, machine learning, statistics

Statistical analysis revisited – machine learning edition

January 7, 2020

Over ten years ago, I wrote this blog post that I had termed a “lazy post” – it was an email that I’d written to a mailing list, which I’d then copied onto the blog. It was triggered by someone on the group making an off-hand comment about “doing regression analysis”, and I had…

analytics, data, work; clustering, customer segmentation, machine learning, segmentation

Segmentation and machine learning

October 3, 2019

For best results, use machine learning to do customer segmentation, but then get humans with domain knowledge to validate the segments. There are two common ways in which people do customer segmentation. The “traditional” method is to manually define the axes along which the customers will be segmented, and then simply look through the data…

analytics, data, work; amazon, artificial intelligence, customers, intelligence, machine learning, spotify

Taking Intelligence For Granted

August 26, 2019

There was a point in time when the use of artificial intelligence or machine learning or any other kind of intelligence in a product was a source of competitive advantage and differentiation. Nowadays, however, many people have got so spoiled by the use of intelligence in many products they use that it has become more…

analytics, arbit, data; goodness of fit, machine learning, regression, signal to noise, statistics

More on statistics and machine learning

August 9, 2019

I’m thinking of a client problem right now, and I thought that something that we need to predict can be modelled as a function of a few other things that we will know. Initially I was thinking about it from the machine learning perspective, and my thought process went “this can be modelled as a…

arbit, computer science, data; animals, artificial intelligence, machine learning, netflix

Human, Animal and Machine Intelligence

November 10, 2018

Earlier this week I started watching this series on Netflix called “Terrorism Close Calls”. Each episode is about an instance of attempted terrorism that has been foiled in the last two decades. For example, one episode covers the plot to bomb a set of transatlantic flights from London to North America in 2006…

business, data, work; data science, machine learning, suitcase term, wave

I’m not a data scientist

October 24, 2018

After a little over four years of trying to ride a buzzword wave, I hereby formally cease to call myself a data scientist. There are some ongoing assignments where that term is used to refer to me, and that usage will continue, but going forward I’m not marketing myself as a “data scientist”, and will…

analytics, data, work; data science, machine learning, regression, statistics

Statistics and machine learning approaches

September 3, 2018

A couple of years back, I was part of a team that delivered a workshop in machine learning. Given my background, I had been asked to do a half-day session on Regression, and was told that the standard software package being used was the scikit-learn package in python. Both the programming language and the package…

analytics, data, work; classification, credit scoring, domain knowledge, image recognition, machine learning

Meaningful and meaningless variables (and correlations)

August 30, 2018

A number of data scientists I know like to go about their business in a domain-free manner. They make a conscious choice to not know anything about the domain in which they are solving the problem, and instead treat a dataset as just a set of anonymised data, and attack it with the usual methods.…

analytics, computer science, data, technology; collaborative filtering, computer science, machine learning, netflix

Beer and diapers: Netflix edition

April 14, 2018

When we started using Netflix last May, we created three personas for the three of us in the family – “Karthik”, “Priyanka” and “Berry”. At that time we didn’t realise that there was already a pre-created “kids” (subsequently renamed “children” – don’t know why that happened) persona there. So while Priyanka and I mostly use…

analytics, computer science, data; analytics, data science, jupyter, machine learning, python, sklearn, stirring the pile

Stirring the pile efficiently

February 14, 2018

Warning: This is a technical post, and involves some code, etc. As I’ve ranted a fair bit on this blog over the last year, a lot of “machine learning” in the industry can be described as “stirring the pile”. Regular readers of this blog will be familiar with this image from XKCD by now: Basically…

arbit, data, personal, randomness; astrology, data science, machine learning

Astrology and Data Science

February 12, 2018

The discussion goes back some 6 years, when I’d first started setting up my data and management consultancy practice. Since I’d freshly quit my job to set up the said practice, I had plenty of time on my hands, and the wife suggested that I spend some of that time learning astrology. Considering that I’ve…

analytics; data analysis, data science, dimension, machine learning

High dimension and low dimension data science

April 7, 2017

I’ve observed that there are two broad approaches that people take to getting information out of data. One approach is to simply throw a kitchen sink full of analytical techniques at the data. Without really trying to understand what the data looks like, and what the relationships may be, the analyst simply uses one method…