This morning, I found this tweet by John Burn-Murdoch, a statistician at the Financial Times, about a graphic he had made for a Simon Kuper (of Soccernomics fame) piece on Jose Mourinho. Burn-Murdoch also helpfully shared the code he had written to produce this graphic, through which I discovered ClubElo, a website that produces chess-style…
Just Plot It
One of my favourite work stories is from this job I did a long time ago. The task given to me was demand forecasting, and the variable I needed to forecast was so “micro” (this intersection that intersection the other) that forecasting was an absolute nightmare. A side effect of this has been that I…
Bangalore names are getting shorter
The Bangalore Names Dataset, derived from the Bangalore Voter Rolls (cleaned version here), validates a hypothesis that a lot of people had – that given names in Bangalore are becoming shorter. From an average of 9 letters in the name for a male aged around 80, the length of the name comes down to 6.5…
Smashing the Law of Conservation of H
A decade and a half ago, Ravikiran Rao came up with what he called the “law of conservation of H”. The concept has to do with the South Indian practice of adding an “H” to denote a soft consonant, a practice not shared by North Indians (Karthik instead of Kartik for example). This practice, Ravikiran claims,…
The Comeback of Lakshmi
A few months back I stumbled upon this dataset of all voters registered in Bangalore. A quick scraping script followed by a run later, I had the names and addresses and voter IDs of all voters registered to vote in Bangalore in the state assembly elections held this year. As you can imagine, this is…
Human, Animal and Machine Intelligence
Earlier this week I started watching this series on Netflix called “Terrorism Close Calls”. Each episode is about an instance of attempted terrorism that has been foiled in the last two decades. For example, one episode covers the plot to bomb a set of transatlantic flights from London to North America in 2006…
Single Malt Recommendation App
Life is too short to drink whisky you don’t like. How often have you found yourself in a duty free shop in an airport, wondering which whisky to take back home? Unless you are a pro at this already, you might want something you haven’t tried before, but don’t want to end up buying something…
I’m not a data scientist
After a little over four years of trying to ride a buzzword wave, I hereby formally cease to call myself a data scientist. There are some ongoing assignments where that term is used to refer to me, and that usage will continue, but going forward I’m not marketing myself as a “data scientist”, and will…
Attractive graphics without chart junk
A picture is worth a thousand words, but ten pictures are worth much less than ten thousand words. One of the most common problems with visualisation, especially in the media, is that of “chart junk”. Graphic designers working for newspapers and television channels like to decorate their graphs, to make them more visually appealing. And…
The missing middle in data science
Over a year back, when I had just moved to London and was job-hunting, I was getting frustrated by the fact that potential employers didn’t recognise my combination of skills of wrangling data and analysing businesses. A few saw me purely as a business guy, and most saw me purely as a data guy, trying…
Statistics and machine learning approaches
A couple of years back, I was part of a team that delivered a workshop in machine learning. Given my background, I had been asked to do a half-day session on regression, and was told that the standard software package being used was the scikit-learn package in Python. Both the programming language and the package…
Dam capacity
In Mint, Narayan Ramachandran has a nice op-ed on the issue of dam capacity and dam management in the wake of the floods in Kerala last year. In that, he writes: “For dams to do their jobs in extreme situations, they should have large unfilled capacity in their reservoirs when extreme events occur.” Reading this…
Why data scientists should be comfortable with MS Excel
Most people who call themselves “data scientists” aren’t usually fond of MS Excel. It is slow and clunky, can only handle a million rows of data (and nearly crashes your computer if you go anywhere close to that), and despite the best efforts of Visual Basic, is not very easy to program for doing repeatable…
Meaningful and meaningless variables (and correlations)
A number of data scientists I know like to go about their business in a domain-free manner. They make a conscious choice to not know anything about the domain in which they are solving the problem, and instead treat a dataset as just a set of anonymised data, and attack it with the usual methods.…
Yet another way of classifying data scientists
There are many axes along which we can classify data scientists. We can classify based on the primary specialty, in terms of “analytics”, “business intelligence” and “machine learning”. We can classify based on domain, into “financial data scientists” and “retail data scientists” and “industrial data scientists”. We can classify by the choice of primary software tool,…
Stocks and flows
One common mistake even a lot of experienced analysts make is comparing stocks to flows. Recently, for example, Apple’s trillion dollar valuation was compared to countries’ GDP. A few years back, an article compared the quantum of bad loans in Indian banks to the country’s GDP. Following an IPL auction a few years back, a…
Why AI will always be biased
Out on Marginal Revolution, Alex Tabarrok has an excellent post on why “sexism and racism will never diminish”, even when people on the whole become less sexist and racist. The basic idea is that there is always a frontier – even when we all become less sexist or racist, there will be people who will…
Podcast on election forecasting
I recorded a podcast for Pragati on opinion polls, exit polls, election forecasting and all such. You can listen to it right here, or on the podcast page here. The Pragati Podcast is available on all major podcast apps.
Beer and diapers: Netflix edition
When we started using Netflix last May, we created three personas for the three of us in the family – “Karthik”, “Priyanka” and “Berry”. At that time we didn’t realise that there was already a pre-created “kids” (subsequently renamed “children” – don’t know why that happened) persona there. So while Priyanka and I mostly use…
Stirring the pile efficiently
Warning: This is a technical post, and involves some code, etc. As I’ve ranted a fair bit on this blog over the last year, a lot of “machine learning” in the industry can be described as “stirring the pile”. Regular readers of this blog will be familiar with this image from XKCD by now: Basically…
Astrology and Data Science
The discussion goes back some 6 years, when I’d first started setting up my data and management consultancy practice. Since I’d freshly quit my job to set up the said practice, I had plenty of time on my hands, and the wife suggested that I spend some of that time learning astrology. Considering that I’ve…