Yesterday at the local council library, I came across this book called “Duckworth Lewis” written by Frank Duckworth and Tony Lewis (who “invented” the eponymous rain rule). While I’d never heard about the book, given my general interest in sports analytics I picked it up, and duly finished reading it by this morning. The good…
Ratings revisited
Sometimes I get a bit narcissistic, and check how my book is doing. I log on to the seller portal to see how many copies have been sold. I go to the Amazon page and see which other books people who have bought my book are buying (on the US store it’s…
Newsletter!
So after much deliberation and procrastination, I’ve finally started a newsletter. I call it “the art of data science” and the title should be self-explanatory. It’s pure unbridled opinion (the kind that usually goes on this blog), except that I only write about one topic there. I intend to have three sections and then…
High dimension and low dimension data science
I’ve observed that there are two broad approaches that people take to getting information out of data. One approach is to simply throw a kitchen sink full of analytical techniques at the data. Without really trying to understand what the data looks like, and what the relationships may be, the analyst simply uses one method…
When a two-by-two ruins a scatterplot
The BBC has some very good analysis of the Brexit vote (how long back was that?), using voting data at the local authority level, and correlating it with factors such as ethnicity and educational attainment. In terms of educational attainment, there is a really nice chart that shows the proportion of voters who voted to…
Medium stats
So Medium sends me this email: Congratulations! You are among the top 10% of readers and writers on Medium this year. As a small thank you, we’ve put together some highlights from your 2016. Now, I hardly use Medium. I’ve maybe written one post there (a long time ago) and read only a little bit (blogs…
Quantifying life
During a casual conversation on Monday, the wife remarked that given my interests and my profession (where I mostly try to derive insights from data), she was really surprised that I had never tried using data to optimise my own life. This is a problem I’ve had in the past – I can look at clients’…
Restaurants, deliveries and data
Delivery aggregators are moving customer data away from the retailer, who now has less knowledge about his customer. Ever since data collection and analysis became cheap (with cloud-based on-demand web servers and MapReduce), there have been attempts to collect as much data as possible and use it to do better business. I must admit to…
On Uppi2’s top rating
So it appears that my former neighbour Upendra’s new magnum opus Uppi2 is currently the top rated movie on IMDB, with a rating of 9.7/10.0. The Times of India is so surprised that it has done an entire story about it, which I’ve screenshot here: The story also mentions that another Kannada movie RangiTaranga (which…
Using all available information
In “real-life” problems, it is not necessary to use all the given data. My mind goes back eleven years, to the first exam in the Quantitative Methods course at IIMB. The exam contained a monster probability problem. It was so monstrous that only some two or three out of my batch of 180 could solve…
Recommendations and rating systems
This is something that came out of my IIMB class this morning. We were discussing building recommendation systems, using the whisky database (check related blog posts here and here). One of the techniques of recommendation we were discussing was the “market basket analysis”, where you recommend products to people based on combinations of products that…
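A minimal sketch of the co-occurrence counting at the heart of market basket analysis, assuming made-up shopping baskets rather than the whisky database used in class. The function names `co_occurrence` and `recommend` are hypothetical, and this version ranks by raw pair counts; a fuller treatment would typically weigh support, confidence or lift instead.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair a canonical order
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(item, baskets, top_n=2):
    """Recommend the items most often bought together with `item` (raw counts only)."""
    scores = Counter()
    for (a, b), count in co_occurrence(baskets).items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical toy data, not from the post
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
print(recommend("bread", baskets))
```

Here "butter" comes out on top for "bread" because the two appear together in two of the four baskets.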
Rating systems need to be designed carefully
Different people use the same rating scale in different ways. Hence, nuance is required while aggregating ratings and taking decisions based on them. During the recent Times Lit Fest in Bangalore, I was talking to some acquaintances regarding the recent Uber rape case (where a car driver hired through the Uber app in Delhi allegedly raped…
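One common way to account for raters using the same scale differently (not necessarily the approach this post goes on to recommend) is to standardise each rater's scores against that rater's own mean and spread before aggregating. A minimal sketch, assuming hypothetical riders rating hypothetical drivers on a 1 to 5 scale:

```python
from statistics import mean, pstdev


def normalise_ratings(ratings_by_rater):
    """Re-express each rater's scores as z-scores relative to that rater's own habits."""
    normalised = {}
    for rater, ratings in ratings_by_rater.items():
        scores = list(ratings.values())
        mu, sigma = mean(scores), pstdev(scores)
        normalised[rater] = {
            # A rater who gives every item the same score carries no relative information
            item: 0.0 if sigma == 0 else (score - mu) / sigma
            for item, score in ratings.items()
        }
    return normalised


# Hypothetical data: a lenient and a strict rider rank three drivers the same way
ratings = {
    "lenient_rider": {"driver_a": 5, "driver_b": 5, "driver_c": 4},
    "strict_rider": {"driver_a": 3, "driver_b": 3, "driver_c": 2},
}
for rater, scores in normalise_ratings(ratings).items():
    print(rater, {item: round(z, 2) for item, z in scores.items()})
```

After normalisation, the strict rider's 2 and the lenient rider's 4 both read as "below this person's usual", which raw averaging would miss.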
The Ramayana and the Mahabharata principles
An army of monkeys can’t win you a complex war like the Mahabharata. For that you need a clever charioteer. A business development meeting didn’t go well. The potential client indicated his preference for a different kind of organisation to solve his problem. I was about to say “why would you go for an army of…
Selection bias and recommendation systems
Yesterday I was watching a video on YouTube, and at the end of it it recommended another (the “top recommendation” at that point in time). This video floored me – it was a superb rendition of Endaro Mahaanubhaavulu by Mandolin U Shrinivas. Listen and enjoy as you read the rest of the post. I was…
Data Science is a Creative Profession
About a month or so back I had a long telephonic conversation with this guy who runs an offshored analytics/data science company in Bangalore. Like most other companies that are being built in the field of analytics, this follows the software services model – a large team in an offshored location, providing long-term standardised data…
Datapukes and Dashboards
Avinash Kaushik has put out an excellent, if long, blog post on building dashboards. A key point he makes is about the difference between dashboards and what he calls “datapukes” (while the name is quite self-explanatory and graphic, it basically refers to a report with a lot of data and little insight). He goes on…
The most unique single malt
There might have been a time in life when you would’ve had some Single Malt whisky and thought that it “doesn’t taste like any other”. In fact, you might have noticed that some single malt whiskies are more distinct than others. It is possible you might want to go on a quest to find the…
The Significance of Statistical Significance
Last year, an aunt was diagnosed with extremely low bone density. She had been complaining of back pain and weakness, and a few tests later, her orthopaedist confirmed that bone density was the problem. She was put on a course of medication, and then given shots. A year later, she got her bone…
Exponential need not mean explosive
Earlier on this blog I’ve written about the misuse of the term “exponential” when it is used to describe explosive increase in a particular number. My suspicion is that this misuse of the word “exponential” comes from Computer Science and complexity theory – where the hardest problems to crack are those which require time/space that…
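To make the title concrete, here is a small sketch; the 1% growth rate and the n² comparison are arbitrary choices for illustration, not taken from the post. An exponential whose base is close to 1, such as 1.01ⁿ, stays below the polynomial n² for well over a thousand steps, so over that range the "exponential" quantity is the slower-growing one.

```python
import math


def crossover(rate, power, start=2, limit=1_000_000):
    """Return the first n >= start at which (1 + rate)**n exceeds n**power, or None."""
    for n in range(start, limit):
        # Compare logarithms so that large n cannot overflow a float
        if n * math.log1p(rate) > power * math.log(n):
            return n
    return None


n = crossover(rate=0.01, power=2)
print(n)
```

The exponential does win eventually, which is what complexity theory cares about; it just need not be explosive over the range anyone actually observes.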
Standard deviation is over
I first learnt about the concept of Standard Deviation sometime in 1999, when we were being taught introductory statistics in class 12. It was classified under the topic of “measures of dispersion”, and after having learnt the concepts of “mean deviation from median” (and learning that “mean deviation from mean” is identically zero) and “mean…
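For reference, a short derivation of why the "mean deviation from mean" is identically zero, which is why dispersion measures take absolute values or squares of the deviations instead. This uses the population form with divisor n; sample versions of the standard deviation divide by n − 1.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\Longrightarrow\qquad
\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)
  = \frac{1}{n}\sum_{i=1}^{n} x_i - \bar{x}
  = \bar{x} - \bar{x} = 0

\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}
```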
Calibration and test sets
When you’re doing any statistical analysis, the standard thing to do is to divide your data into “calibration” and “test” data sets. You build the model on the “calibration” data set, and then test it on the “test” data set. The purpose of this slightly complicated procedure is so that you don’t “overfit” your model.…
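A minimal sketch of the calibration/test procedure described above, assuming synthetic data (y = 2x + 1 plus Gaussian noise), a straight-line model, and a 70/30 split; all of these, and the helper names `split`, `fit_line` and `mse`, are illustrative choices rather than anything from the post. The model is fitted only on the calibration rows and then scored on rows it has never seen.

```python
import random


def split(data, test_fraction=0.3, seed=42):
    """Shuffle a copy of the data and split it into calibration and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = len(rows) - round(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]


def fit_line(rows):
    """Ordinary least squares fit of y = a + b*x on (x, y) pairs."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    b = sxy / sxx
    return my - b * mx, b


def mse(rows, a, b):
    """Mean squared prediction error of the fitted line on the given rows."""
    return sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows)


# Synthetic data, assumed for illustration: y = 2x + 1 plus noise
rng = random.Random(0)
data = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in range(100)]

calibration, test = split(data)
a, b = fit_line(calibration)  # the test rows play no part in fitting
print(f"calibration MSE: {mse(calibration, a, b):.2f}, test MSE: {mse(test, a, b):.2f}")
```

If the test error comes out much worse than the calibration error, that gap is the usual symptom of overfitting.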