This morning, I discovered the Club Elo Ratings, and promptly proceeded to analyse Liverpool FC’s performance over the years based on these ratings, and then correlated the performance by manager. Then, playing around with the data of different clubs, I realised that there are plenty more stories to be told using this data, and they…
Built by Shanks
This morning, I found this tweet by John Burn-Murdoch, a statistician at the Financial Times, about a graphic he had made for a Simon Kuper (of Soccernomics fame) piece on Jose Mourinho. Burn-Murdoch also helpfully shared the code he had written to produce this graphic, through which I discovered ClubElo, a website that produces chess-style…
Just Plot It
One of my favourite work stories is from this job I did a long time ago. The task given to me was demand forecasting, and the variable I needed to forecast was so “micro” (this, intersection that, intersection the other) that forecasting was an absolute nightmare. A side effect of this has been that I…
Bangalore names are getting shorter
The Bangalore Names Dataset, derived from the Bangalore Voter Rolls (cleaned version here), validates a hypothesis that a lot of people had – that given names in Bangalore are becoming shorter. From an average of 9 letters in the name for a male aged around 80, the length of the name comes down to 6.5…
Smashing the Law of Conservation of H
A decade and a half ago, Ravikiran Rao came up with what he called the “law of conservation of H”. The concept has to do with the South Indian practice of adding an “H” to denote a soft consonant, a practice not shared by North Indians (Karthik instead of Kartik for example). This practice, Ravikiran claims,…
The Comeback of Lakshmi
A few months back I stumbled upon this dataset of all voters registered in Bangalore. A quick scraping script followed by a run later, I had the names and addresses and voter IDs of all voters registered to vote in Bangalore in the state assembly elections held this way. As you can imagine, this is…
Human, Animal and Machine Intelligence
Earlier this week I started watching this series on Netflix called “Terrorism Close Calls”. Each episode is about an instance of attempted terrorism that has been foiled in the last 2 decades. For example, one episode covers the plot to bomb a set of transatlantic flights from London to North America in 2006…
Single Malt Recommendation App
Life is too short to drink whisky you don’t like. How often have you found yourself in a duty free shop in an airport, wondering which whisky to take back home? Unless you are a pro at this already, you might want something you haven’t tried before, but don’t want to end up buying something…
Elegant and practical solutions
There are two ways in which you can tie a shoelace – one is the “ordinary method”, where you explicitly make the loops around both ends of the lace before tying together to form a bow. The other is the “elegant method” where you only make one loop explicitly, but tie with such great skill…
I’m not a data scientist
After a little over four years of trying to ride a buzzword wave, I hereby formally cease to call myself a data scientist. There are some ongoing assignments where that term is used to refer to me, and that usage will continue, but going forward I’m not marketing myself as a “data scientist”, and will…
Book challenge update
At the beginning of this year, I took a break from Twitter (which lasted three months), and set myself a target to read at least 50 books during the calendar year. As things stand now, the number stands at 28, and it’s unlikely that I’ll hit my target, unless I count Berry’s story books in…
Networking events and positions of strength
This replicates some of the stuff I wrote in a recent blog post, but I put this on LinkedIn and wanted a copy here for posterity. Having moved my consulting business to London earlier this year, I’ve had a problem with marketing. The basic problem is that while my network and brand is fairly strong…
Triangle marketing
This blog post is based more on how I have bought rather than how I have sold. The basic concept is that when you hear about a product or service from two or more independent sources, you are more likely to buy it. The threshold varies by the kind of product you are looking at.…
Attractive graphics without chart junk
A picture is worth a thousand words, but ten pictures are worth much less than ten thousand words. One of the most common problems with visualisation, especially in the media, is that of “chart junk”. Graphics designers working for newspapers and television channels like to decorate their graphs, to make them more visually appealing. And…
Taking your audience through your graphics
A few weeks back, I got involved in a Twitter flamewar with Shamika Ravi, a member of the Indian Prime Minister’s Economic Advisory Council. The object of the argument was a set of gifs she had released to show different aspects of the Indian economy. Admittedly I started the flamewar. Guilty as charged. Thinking about…
Analytics for general managers
While good managers have always been required to be analytical, the level of analytical ability being asked of managers has been going up over the years, with the increase in availability of data. Now, this post is once again based on that one single and familiar data point – my wife. In fact, if you…
The missing middle in data science
Over a year back, when I had just moved to London and was job-hunting, I was getting frustrated by the fact that potential employers didn’t recognise my combination of skills of wrangling data and analysing businesses. A few saw me purely as a business guy, and most saw me purely as a data guy, trying…
Statistics and machine learning approaches
A couple of years back, I was part of a team that delivered a workshop in machine learning. Given my background, I had been asked to do a half-day session on Regression, and was told that the standard software package being used was the scikit-learn package in Python. Both the programming language and the package…
Dam capacity
In Mint, Narayan Ramachandran has a nice op-ed on the issue of dam capacity and damn management in the wake of the floods in Kerala last year. In that, he writes: “For dams to do their jobs in extreme situations, they should have large unfilled capacity in their reservoirs when extreme events occur.” Reading this…
Why data scientists should be comfortable with MS Excel
Most people who call themselves “data scientists” aren’t usually fond of MS Excel. It is slow and clunky, can only handle a million rows of data (and nearly crashes your computer if you go anywhere close to that), and despite the best efforts of Visual Basic, is not very easy to program for doing repeatable…
Meaningful and meaningless variables (and correlations)
A number of data scientists I know like to go about their business in a domain-free manner. They make a conscious choice to not know anything about the domain in which they are solving the problem, and instead treat a dataset as just a set of anonymised data, and attack it with the usual methods.…