As the old song went, “when the giver gives, he tears the roof and gives”. Last week the Government of Karnataka released its report on the covid-19 serosurvey done in the state. You might recall that it had concluded that the number of cases had been undercounted by a factor of 40, but then some…
Communicating binary forecasts
One silver lining in the madness of the US Presidential election counting is that there are some interesting analyses floating around regarding polling and surveying and probabilities and visualisation. Take this post from Andrew Gelman’s blog, for example: Suppose our forecast in a certain state is that candidate X will win 0.52 of the two-party…
How do bored investors invest?
Earlier this year, the inimitable Matt Levine (currently on paternity leave) came up with the “boredom markets hypothesis” ($, Bloomberg). If you like eating at restaurants or bowling or going to movies or going out dancing, now you can’t. If you like watching sports, there are no sports. If you like casinos, they are closed.…
Covid-19 Prevalence in Karnataka
Finally, many months after other Indian states had conducted a similar exercise, Karnataka released the results of its first “covid-19 sero survey” earlier this week. The headline number being put out is that about 27% of the state has already suffered from the infection, and has antibodies to show for it. From the press release:…
Election Counting Day
At the outset I must say that I’m deeply disappointed (based on the sources I’ve seen, mostly from googling) with the reporting around the US presidential elections. For example, if I google, I get something like “Biden leads Trump 225-213”. At first glance, that seems like useful information. However, the “massive discretisation” of the…
Opinion polling in India and the US
(Relative) old-time readers of this blog might recall that in 2013-14 I wrote a column called “Election Metrics” for Mint, where I used data to analyse elections and everything else related to that. This being the election where Narendra Modi suddenly emerged as a spectacular winner, the hype was high. And I think a lot…
Covid-19 superspreaders in Karnataka
Through a combination of luck and competence, my home state of Karnataka has handled the Covid-19 crisis rather well. While the total number of cases detected in the state edged past 2000 recently, the number of locally transmitted cases detected each day has hovered in the 20-25 range. It might make news that Karnataka has…
Placing data labels in bar graphs
If you think you’re a data visualisation junkie, it’s likely that you’ve read Edward Tufte’s Visual Display Of Quantitative Information. If you are only a casual observer of the topic, you are likely to have come across these gifs that show you how to clean up a bar graph and a data table. And if…
Tests per positive case
I seem to be becoming a sort of “testing expert”, though the so-called “testing mafia” (ok I only called them that) may disagree. Nothing external happened since the last time I wrote about this topic, but here is more “expertise” from my end. As some of you might be aware, I’ve now created a script…
Simulating Covid-19 Scenarios
I must warn that this is a super long post. Also I wonder if I should put this on medium in order to get more footage. Most models of disease spread use what is known as a “SIR” framework. This Numberphile video gives a good primer into this framework. The problem with the framework is…
Statistical analysis revisited – machine learning edition
Over ten years ago, I wrote this blog post that I had termed a “lazy post” – it was an email that I’d written to a mailing list, which I’d then copied onto the blog. It was triggered by someone on the group making an off-hand comment of “doing regression analysis”, and I had…
Big Data and Fast Frugal Trees
In his excellent podcast episode with EconTalk’s Russ Roberts, psychologist Gerd Gigerenzer introduces the concept of “fast and frugal trees”. When someone needs to make decisions quickly, Gigerenzer says, they don’t take into account a large number of factors, but instead rely on a small set of thumb rules. The podcast itself is based on…
Yet another “big data whisky”
A long time back I had used a primitive version of my Single Malt recommendation app to determine that I’d like Ardbeg. Presently, the wife was travelling to India from abroad, and she got me a bottle. We loved it. And so I had screenshots from my app stored on my phone all the time,…
Segmentation and machine learning
For best results, use machine learning to do customer segmentation, but then get humans with domain knowledge to validate the segments There are two common ways in which people do customer segmentation. The “traditional” method is to manually define the axes through which the customers will get segmented, and then simply look through the data…
Taking Intelligence For Granted
There was a point in time when the use of artificial intelligence or machine learning or any other kind of intelligence in a product was a source of competitive advantage and differentiation. Nowadays, however, many people have got so spoiled by the use of intelligence in many products they use that it has become more…
More on statistics and machine learning
I’m thinking of a client problem right now, and I thought that something that we need to predict can be modelled as a function of a few other things that we will know. Initially I was thinking about it from the machine learning perspective, and my thought process went “this can be modelled as a…
Data, football and astrology
Jonathan Wilson has an amusing article on data and football, and how many data-oriented managers in football have also been incredibly superstitious. This is in response to BT Sport’s (one of the UK broadcasters of the Premier League) announcement of its “Unscripted” promotion where “some of the world’s foremost experts in both sports and artificial intelligence…
Periodicals and Dashboards
The purpose of a dashboard is to give you a live view of what is happening with the system. Take for example the instrument it is named after – the car dashboard. It tells you at the moment what the speed of the car is, along with other indicators such as which lights are on,…
Telling stories with data
I’m about 20% through with The Verdict by Prannoy Roy and Dorab Sopariwala. It’s a fascinating book, except for one annoyance – it is full of tables that serve no purpose but to break the flow of text. I must mention that I’m reading the book on the Kindle, which means that the tables can pose…
The problem with spider charts
On FiveThirtyEight, Nate Silver has a piece looking ahead to the Democratic primaries before the presidential elections in the US next year. I don’t know enough about US politics to comment on the piece itself, but what caught my eye is the spider chart describing the various Democratic nominees. This is a standard spider…
Football Elo Application
This morning, I discovered the Club Elo Ratings, and promptly proceeded to analyse Liverpool FC’s performance over the years based on these ratings, and then correlated the performance by manager. Then, playing around with the data of different clubs, I realised that there are plenty more stories to be told using this data, and they…