A few people who I’ve spoken to as part of my job hunt have asked to see some “detailed descriptions” of work that I’ve done. The other day, I put together an email with some of these descriptions. I thought it might make sense to “document” it in one place (and for me, the “obvious…
Scrabble
I’ve forgotten at which stage of lockdown or “unlock” e-commerce for “non-essential goods” reopened, but among the first things we ordered was a Scrabble board. It was an impulse decision. We were on Amazon ordering puzzles for the daughter, and she had just about started putting together “sounds” to make words, so we thought “scrabble tiles…
This year on Spotify
I’m rather disappointed with my end-of-year Spotify report this year. I mean, I know it’s automated analytics, and no human has really verified it, etc., but there are some basics that the algorithm failed to cover. The first few slides of my “annual report” told me that my listening changed by seasons. That in January…
EPL: Mid-Season Review
Going into the November international break, Liverpool are eight points ahead at the top of the Premier League. Defending champions Manchester City have slipped to fourth place following their loss to Liverpool. The question most commentators are asking is if Liverpool can hold on to this lead. We are one-third of the way through the…
Periodicals and Dashboards
The purpose of a dashboard is to give you a live view of what is happening with the system. Take for example the instrument it is named after – the car dashboard. It tells you at the moment what the speed of the car is, along with other indicators such as which lights are on,…
Vlogging!
The first seed was sown in my head by Harish “the Psycho” J, who told me a few months back that nobody reads blogs any more, and I should start making “analytics videos” to increase my reach and hopefully hit a new kind of audience with my work. While the idea was great, I wasn’t…
Analytics for general managers
While good managers have always been required to be analytical, the level of analytical ability being asked of managers has been going up over the years, with the increase in availability of data. Now, this post is once again based on that one single and familiar data point – my wife. In fact, if you…
The missing middle in data science
Over a year back, when I had just moved to London and was job-hunting, I was getting frustrated by the fact that potential employers didn’t recognise my combination of skills of wrangling data and analysing businesses. A few saw me purely as a business guy, and most saw me purely as a data guy, trying…
Why data scientists should be comfortable with MS Excel
Most people who call themselves “data scientists” aren’t usually fond of MS Excel. It is slow and clunky, can only handle a million rows of data (and nearly crashes your computer if you go anywhere close to that), and despite the best efforts of Visual Basic, is not very easy to program for doing repeatable…
Stocks and flows
One common mistake even a lot of experienced analysts make is comparing stocks to flows. Recently, for example, Apple’s trillion-dollar valuation was compared to countries’ GDP. A few years back, an article compared the quantum of bad loans in Indian banks to the country’s GDP. Following an IPL auction a few years back, a…
Stirring the pile efficiently
Warning: This is a technical post, and involves some code, etc. As I’ve ranted a fair bit on this blog over the last year, a lot of “machine learning” in the industry can be described as “stirring the pile”. Regular readers of this blog will be familiar with this image from XKCD by now: Basically…
Duckworth Lewis Book
Yesterday at the local council library, I came across this book called “Duckworth Lewis” written by Frank Duckworth and Tony Lewis (who “invented” the eponymous rain rule). While I’d never heard about the book, given my general interest in sports analytics I picked it up, and duly finished reading it by this morning. The good…
The (missing) Desk Quants of Main Street
A long time ago, I’d written about my experience as a Quant at an investment bank, and about how banks like mine were sitting on a pile of risk that could blow up any time soon. There were two problems as I had documented then. Firstly, most quants I interacted with seemed to be solving…
Medium stats
So Medium sends me this email: Congratulations! You are among the top 10% of readers and writers on Medium this year. As a small thank you, we’ve put together some highlights from your 2016. Now, I hardly use Medium. I’ve maybe written one post there (a long time ago) and read only a little bit (blogs…
Restaurants, deliveries and data
Delivery aggregators are moving customer data away from the retailer, who now has less knowledge about his customer. Ever since data collection and analysis became cheap (with cloud-based on-demand web servers and MapReduce), there have been attempts to collect as much data as possible and use it to do better business. I must admit to…
On Uppi2’s top rating
So it appears that my former neighbour Upendra’s new magnum opus Uppi2 is currently the top rated movie on IMDB, with a rating of 9.7/10.0. The Times of India is so surprised that it has done an entire story about it. The story also mentions that another Kannada movie RangiTaranga (which…
The Ramayana and the Mahabharata principles
An army of monkeys can’t win you a complex war like the Mahabharata. For that you need a clever charioteer. A business development meeting didn’t go well. The potential client indicated his preference for a different kind of organisation to solve his problem. I was about to say “why would you go for an army of…
Datapukes and Dashboards
Avinash Kaushik has put out an excellent, if long, blog post on building dashboards. A key point he makes is about the difference between dashboards and what he calls “datapukes” (while the name is quite self-explanatory and graphic, it basically refers to a report with a lot of data and little insight). He goes on…
Calibration and test sets
When you’re doing any statistical analysis, the standard thing to do is to divide your data into “calibration” and “test” data sets. You build the model on the “calibration” data set, and then test it on the “test” data set. The purpose of this slightly complicated procedure is so that you don’t “overfit” your model.…
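A minimal sketch of the calibration/test procedure described above, in plain Python. Everything here is made up for illustration: the data (a noisy straight line), the 70/30 split, and the helper names `split_calibration_test`, `fit_line` and `mean_squared_error`. It is one simple way to do this, not a prescribed method.

```python
import random

def split_calibration_test(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (calibration, test)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_squared_error(points, intercept, slope):
    """Average squared gap between actual y and the fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)

# Hypothetical data: y is roughly 2x + 1, plus some noise.
noise = random.Random(42)
data = [(x, 2 * x + 1 + noise.gauss(0, 1)) for x in range(100)]

# Build the model on the calibration set only...
calibration, test = split_calibration_test(data)
intercept, slope = fit_line(calibration)

# ...and then check it on data the model has never seen.
print("calibration MSE:", mean_squared_error(calibration, intercept, slope))
print("test MSE:", mean_squared_error(test, intercept, slope))
```

If the error on the test set is much worse than on the calibration set, that is the usual hint that the model has been overfitted to the calibration data.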
Analytics and complexity
I recently learnt that a number of people think that the more variables you use in your model, the better your model is! What has surprised me is that I’ve met a lot of people who think so, and recommendations for simple models haven’t been taken too kindly. The conversation usually goes…
Black Box Models
A few years ago, Felix Salmon wrote this article in Wired called “The Formula That Killed Wall Street”. It was about the “Gaussian Copula”, a formula for estimating the joint probability of a set of events happening, if you knew the individual probabilities. It was a mathematical breakthrough. Unfortunately, it fell…