

Bayesian treatment of product metrics? 2

In a recent conversation, a friend complained to me that the product his team operates behaves strangely. On one hand, the product's daily active users have declined over time; on the other hand, the metric they use to measure user engagement has increased dramatically. His conclusion was that the metric is wrong and that they should find something else that captures the dynamics of the user base more accurately.

While the situation described above could be a bug in the system, it can in fact arise quite naturally. Let’s use Click-Through-Rate (CTR) as an example. For a product, the CTR is defined as \( \theta_{1} = P( c = 1 | v =1) \), where \( c \) is a binary random variable indicating whether a click happens on the product and \( v \) is a binary random variable indicating whether the product is viewed. The Maximum Likelihood Estimate (MLE) of the CTR is:

\begin{equation}
\theta_{1} = \frac{N_{c}}{V_{c}}
\end{equation}
where \( N_{c} \) is the number of visitors who clicked on the product and \( V_{c} \) is the total number of visitors. From this estimate, we can easily see that:

  1. CTR is a ratio whose denominator is the total number of visitors.
  2. It is possible for \( V_{c} \) to drop while the ratio as a whole increases.

In fact, I would argue that any metric with the total number of visitors (users) in the denominator can have this issue. One potential reason, as in the CTR example above, is that the particular way the product is optimized might drive away some users, reducing \( V_{c} \), while making the heavily engaged users who remain even more engaged, so that \( N_{c} \) shrinks much less than \( V_{c} \) does. Thus, even though \( V_{c} \) and \( N_{c} \) may both decrease, the ratio may increase significantly.
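To make this concrete, here is a minimal sketch in Python. The visitor and click counts are hypothetical numbers made up for illustration, and the helper name `ctr` is my own, not part of any library. It shows both \( V_{c} \) and \( N_{c} \) going down while the ratio doubles.

```python
def ctr(clicks: int, visitors: int) -> float:
    """MLE of the CTR: visitors who clicked divided by all visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return clicks / visitors


# Hypothetical counts, invented purely for illustration.
visitors_before, clicks_before = 10_000, 500
visitors_after, clicks_after = 4_000, 400  # both counts went down

ctr_before = ctr(clicks_before, visitors_before)
ctr_after = ctr(clicks_after, visitors_after)

print(f"before: V_c={visitors_before}, N_c={clicks_before}, CTR={ctr_before:.3f}")
print(f"after:  V_c={visitors_after}, N_c={clicks_after}, CTR={ctr_after:.3f}")
```

In this made-up scenario the product lost 60% of its visitors and 20% of its clickers, yet the CTR moved from 5% to 10%.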

OK, if that’s the case, what can we do? One alternative is to measure \( \theta_{2} = P( c=1 | \Omega ) = \sum_{v \in \{0, 1\}} P( c=1 | v ) P( v | \Omega ) \), where \( \Omega \) represents the whole population. If a click can only happen after a view, the \( v = 0 \) term vanishes and \( \theta_{2} = \theta_{1} \, P( v=1 | \Omega ) \). The idea is to measure how likely a random user in the whole universe is to click on your product, rather than a visitor who is already on your product. This is, of course, much harder to compute accurately.

One way to estimate \( P(v = 1 | \Omega) \) is to gather data from the Internet or from mobile devices on the total number of visitors per month or per year, which tells you how popular your product is relative to the whole population.
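As a rough sketch of how the population-level metric behaves, the Python snippet below assumes that a click can only follow a view, so that \( P( c=1 | \Omega ) = P( c=1 | v=1 ) \, P( v=1 | \Omega ) \), and that the size of the whole population is known. The function name `population_click_rate` and all counts, including the population of one million, are hypothetical.

```python
def population_click_rate(clicks: int, visitors: int, population: int) -> float:
    """Estimate P(c=1 | Omega) as CTR times reach.

    Assumes a click can only happen after a view, and that
    `population` is an externally obtained estimate of |Omega|.
    """
    if visitors <= 0 or population <= 0:
        raise ValueError("visitors and population must be positive")
    theta_1 = clicks / visitors    # P(c=1 | v=1), the usual CTR
    reach = visitors / population  # P(v=1 | Omega)
    return theta_1 * reach


POPULATION = 1_000_000  # hypothetical size of the whole user universe

# Hypothetical counts: visitors drop sharply, clickers drop slightly.
theta_2_before = population_click_rate(clicks=500, visitors=10_000, population=POPULATION)
theta_2_after = population_click_rate(clicks=400, visitors=4_000, population=POPULATION)

print(f"theta_2 before: {theta_2_before:.6f}")
print(f"theta_2 after:  {theta_2_after:.6f}")
```

With these invented numbers the CTR rises from 5% to 10%, while the population-level rate falls, which matches the shrinking user base. The hard part in practice is the population estimate itself, not the arithmetic.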