Signal, Noise, and HR Reporting: Less Really Is More, Part 1

Introduction

Everyone wants to be ahead of the curve and in the know. In HR analytics, this typically means more data and more reports. But there’s a big fat problem: the increased ease and frequency of our “analytics insights” dramatically increases the impact of random movements in those reports.

The result? More noise and less signal along with an ironic increase in our confidence in those decisions based on that fresh data. The disquieting reality is that when it comes to HR reporting, less is almost certainly more.

This is a strong claim so today we are going to dig down deep because I didn’t TRULY understand how this worked until I took it apart and put it back together.

What You Will Learn

  • Why more frequent reporting is often a bad thing
  • Why noise increases as we increase observation frequency
  • What the signal-to-noise ratio is and why it matters for HR reporting
  • How simulations can aid our understanding of analytics insights

Preliminary Notes

In Part 1, we use an example from an investment scenario to really dig into the details, but the insights are directly applicable for HR analytics and HR reporting. In Part 2, we’ll apply these insights to a critical HR measurement, turnover. Focus here on principles and don’t be distracted by surface differences in the topic. I always encourage everyone to play along at home but you can skip the coding details and get the main point just from the text and figures.

Signal, Noise and Insights from Nassim Taleb

In his insightful book “Fooled by Randomness”, Nassim Taleb observes that increasing the frequency of one’s observations for a given outcome (say, stock returns) can dramatically decrease the signal-to-noise ratio. The signal-to-noise ratio (SNR) is the ratio of the mean to the standard deviation.

When we decrease the SNR, we end up getting results that swing around a bit more than we might expect. The result is that we end up mistaking movements in the observed value that arise simply from random variation for meaningful shifts in the underlying system.

Most of us recognize that continually buying and selling stocks based on small weekly or daily shifts in the value is likely to be a losing strategy in the long run; it’s better to target performance for the long haul. But we somehow fail to apply that same core logic to our activities in human capital analytics.

An Example with Stock Returns

To see how increasing the frequency of our observations can decrease the value of our observations, we’ll start with this example of a stock with a 15% annual return and a 10% volatility (standard deviation or SD; please note a MAJOR thank you to Steven Bagley for his post formally breaking down Taleb’s logic).

Let’s first use a small bit of R code to tell us the probability of observing an annual return greater than 0.

The result shows us that a mean annual return of 15% and a volatility (SD) of 10% will lead to a positive return 93% of the time for any given year.

library(knitr) # for the kable function to make a nice table
# pnorm() is the normal cumulative distribution function (CDF);
# lower.tail = FALSE returns P(return > 0) instead of P(return <= 0)
pnorm(0, mean = 15, sd = 10, lower.tail = FALSE)
## [1] 0.9331928

To develop our intuitions about this result, without diving into the scary sounding “cumulative distribution function”, let’s take a more grounded approach.

Let’s pretend that we have a stock and we look at the value of the stock each year for a million years (just bear with me). The return of this stock each year is a randomly selected value from a normal distribution that has a mean of 15 and a standard deviation of 10.

To see what this would look like, we’ll make a histogram with these million individual annual returns; 93% of the returns (green) are greater than 0.

# simulate one million annual returns and check the share that is above 0
set.seed(42)
returns <- rnorm(n = 10^6, mean = 15, sd = 10)
prop.table(table(returns > 0))
## 
##    FALSE     TRUE 
## 0.066924 0.933076
h <- hist(returns, breaks = 1000, plot = F)
cuts <- cut(h$breaks, c(-Inf, 0, Inf))
plot(h, col = c("red", "dark green")[cuts], lty = "blank", main = "Histogram of Annual Returns")

Observations: More is NOT Always Merrier

That’s nice but we’re only human and we can’t resist taking a look at our intermediate results. We are just dying to take a peek more than once a year. Besides, what happens if the stock value has recently declined and I don’t know about it? Shouldn’t I know this? Shouldn’t I at least consider intervening if the number looks bad?

In a moment of weakness, we decide to take just a little tiny peek at our stock value every quarter.

Frequency Impacts the Mean and Standard Deviation

To see what happens when we shift to quarterly observations, we need to first adjust the mean and the standard deviation to reflect the fact that we are looking more often.

Since we are starting with a 15% annual return, we need to break that return up into 4 pieces (1 for each quarter). If the average annual return is 15% then, on average, we should expect to return about 3.75% per quarter (15%/4 = 3.75%). That is, if I get an average quarterly return of 3.75% then overall I would expect an average annual return of 15%. In essence, we are dividing the annual mean by the time ratio (\(t\)) of our transformation, here \(t = 4\).

So far, so good. What about the standard deviation? This is where the action is!

Unlike the mean, which is scaled by the time ratio (\(t\)) directly, the standard deviation is scaled by the square root of the time ratio. Thus, we divide our standard deviation of 10 by \(\sqrt t\), not \(t\) directly. Our annual SD of 10 becomes a quarterly SD of 10/\(\sqrt 4\) = 5.

So by going to a quarterly frequency, we divide our return by 4 but our SD by only 2 (\(\sqrt 4\)). This little technical detail turns out to be a big deal.

Why? Remember that the signal-to-noise ratio is the mean divided by the SD. The SNR worsens as we increase the frequency of our observations because our mean value shrinks at a faster rate than the SD.

In this particular case, the SNR for the annual observation was 15/10 = 1.5 but only 3.75/5 = .75 for our quarterly observations.
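The scaling rules above can be wrapped in a small helper. This is only a sketch: `rescale_returns()` and its argument names are hypothetical (they are not from the original post), and it assumes that returns simply add up across independent periods with the same distribution.

```r
# Hypothetical helper: rescale an annual mean and SD to `periods`
# observations per year.
# Assumption: returns add across independent, identically distributed periods.
rescale_returns <- function(annual_mean, annual_sd, periods) {
  period_mean <- annual_mean / periods        # the mean scales with t
  period_sd   <- annual_sd / sqrt(periods)    # the SD scales with sqrt(t)
  data.frame(periods = periods,
             mean    = period_mean,
             sd      = period_sd,
             snr     = period_mean / period_sd)
}

# Annual (1 observation per year) versus quarterly (4 observations per year)
rescale_returns(annual_mean = 15, annual_sd = 10, periods = c(1, 4))
```

For the numbers used in this post, the `snr` column should come out as 1.5 for the annual row and 0.75 for the quarterly row, matching the hand calculation.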

What’s the big deal, you say? When we look at annual returns, we get positive returns 93% of the time. When we shift to quarterly observations, we see positive returns only 77% of the time.

# using the distribution function, really the cumulative distribution function
pnorm(0, mean = 3.75, sd = 5, lower.tail = F)
## [1] 0.7733726
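For normally distributed returns, the chance of a positive observation depends only on the SNR, which is why a falling SNR drags the probability down with it. A quick sketch to check this (the variable names `snr_annual` and `snr_quarterly` are just illustrative):

```r
# For a normal distribution, P(return > 0) equals the standard normal CDF
# evaluated at mean / SD, i.e. at the SNR.
snr_annual    <- 15 / 10     # 1.5
snr_quarterly <- 3.75 / 5    # 0.75

pnorm(snr_annual)      # same as pnorm(0, mean = 15, sd = 10, lower.tail = FALSE)
pnorm(snr_quarterly)   # same as pnorm(0, mean = 3.75, sd = 5, lower.tail = FALSE)
```

Halving the SNR from 1.5 to 0.75 is all it takes to move from roughly 93% to roughly 77% positive observations.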

We can see this more clearly with a histogram of 4 million random draws from our quarterly observations distribution.

# simulate four million quarterly returns and check the share that is above 0
set.seed(42)
returns <- rnorm(n = 4*10^6, mean = 3.75, sd = 5)
prop.table(table(returns > 0))
## 
##     FALSE      TRUE 
## 0.2263763 0.7736238
h <- hist(returns, breaks = 1000, plot = F)
cuts <- cut(h$breaks, c(-Inf, 0, Inf))
plot(h, col = c("red", "dark green")[cuts], lty = "blank", main = "Histogram of Quarterly Returns")

Note that absolutely nothing has changed about the underlying system generating the values, only how often we choose to take a peek. Yet, when we look more frequently, we’re more likely to see a negative outcome. We’ll see a negative outcome for almost 1 out of 4 observations.

Let that sink in.

We are running our own simulations where we actually choose the average annual return and the standard deviation. Yet simply looking at the intermediate, quarterly returns instead of the annual returns gives a substantially worse result.

The difference between seeing a positive return in 93% of observations versus 77% of observations is big enough to make us think there is something different about the quality of our investment, but there isn’t. It is simply a consequence of increasing our reporting frequency.
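To get a feel for how quickly those negative observations add up, here is a rough sketch of the chance of seeing at least one negative quarter in a given year. It assumes the four quarters are independent draws from the quarterly distribution used above; the variable names are just illustrative.

```r
# Probability that a single quarterly observation is positive
p_positive_quarter <- pnorm(0, mean = 3.75, sd = 5, lower.tail = FALSE)

# Assumption: the four quarters in a year are independent draws.
# P(at least one negative quarter) = 1 - P(all four quarters are positive)
p_any_negative <- 1 - p_positive_quarter^4
p_any_negative
```

Under that assumption, roughly 64% of years will contain at least one “bad” quarter, even though only about 7% of years end with a negative annual return.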

This Is Not Sleight of Hand

You might be asking yourself “But why is the standard deviation proportional to the square root of time and not time directly?”

This is a good question and one that has been asked (and answered) elsewhere, including in [one such post that I found very helpful](http://www.macroption.com/why-is-volatility-proportional-to-square-root-of-time/).

The short answer is that the variance IS proportional to time directly and the standard deviation is the square root of the variance.

In our example, we have a standard deviation of 10. Squaring this to get the variance gives us 100. When we go to quarterly observations, we divide both our mean and our variance by 4.

Accordingly, our expected mean quarterly return is 15/4 = 3.75 and our expected quarterly variance is 100/4 = 25.

So far so good, right?

Ok, now remember that we use the standard deviation to calculate the signal-to-noise ratio; the standard deviation is the square root of the variance.

The end result is that dividing our variance by time means that our standard deviation must be divided by the square root of time (in this case, \(\sqrt 4 = 2\)). Therefore our annual SD of 10 becomes a quarterly SD of 5.
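Written out, the same argument looks like this. It is a sketch that assumes the four quarterly returns \(Q_1, \dots, Q_4\) are independent, share the same mean \(\mu_q\) and variance \(\sigma_q^2\), and simply add up to the annual return \(A\). These symbols are introduced here for illustration and are not from the original sources.

\[
\begin{aligned}
A &= Q_1 + Q_2 + Q_3 + Q_4 \\
E[A] &= 4\mu_q = 15 \;\Rightarrow\; \mu_q = 3.75 \\
\mathrm{Var}(A) &= 4\sigma_q^2 = 10^2 = 100 \;\Rightarrow\; \sigma_q^2 = 25,\ \sigma_q = 5 \\
\mathrm{SNR}_q &= \frac{\mu_q}{\sigma_q} = \frac{15/4}{10/\sqrt 4} = \frac{1.5}{\sqrt 4} = 0.75
\end{aligned}
\]

More generally, with \(t\) observations per year, the SNR per observation is the annual SNR divided by \(\sqrt t\).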

Simulation Is Believing

If you are like me, you might sort of, kinda believe this result and the accompanying explanation but you don’t necessarily FEEL it.

To really understand it and see that this actually works out, it helps to run a few more simulations.

The goal of these simulations is to work backwards from the quarterly values up to the annual values and show how they are connected (and that my explanation involving \(\sqrt t\) is actually right).

As an aside, I am a big believer in using simulations to develop a deeper understanding of statistics and analytics. If you can recreate the critical values yourself through selections from the relevant distributions, you’ll understand your system better.

Simulation Steps

  1. Randomly draw a bunch of numbers from a distribution with a mean of 3.75 and a standard deviation of 5. These are the values for the mean and the standard deviation that we used when moving to quarterly observations. They are also the ones that gave us the more noisy results.
  2. Group these draws into sets of 4 and sum them for each set of 4. Summing sets of 4 quarterly values is like looking at the total return for the year instead of peeking at every quarter.
  3. Once we sum each set of 4, calculate the mean and standard deviation of those summed quarterly results to get annual results.

If our calculations check out, then the mean of those sums of 4 should be 15 and their standard deviation should be 10.

By creating a simulated set of quarterly returns but then only looking at them in groups of 4 to mimic annual observations, we are deriving the less noisy “annual” results from the more noisy “quarterly” simulations.

First, we’ll do our draws for the quarterly results and confirm that we get the noisier numbers we expect.

# Draws from a distribution with a mean of 3.75 and a standard deviation of 5
set.seed(42)
randomdeviates4_big <- rnorm(10000000,15/4,10/sqrt(4))

# Confirming that we get what we expect
# Calculate the mean quarterly return

mean(randomdeviates4_big)
## [1] 3.752376
# Calculate the SD of quarterly returns

sd(randomdeviates4_big)
## [1] 5.001103
# Calculate the SNR of our quarterly returns
mean(randomdeviates4_big)/sd(randomdeviates4_big)
## [1] 0.7503098
# Calculate proportion of positive quarterly returns
prop.table(table(randomdeviates4_big > 0))
## 
##     FALSE      TRUE 
## 0.2265676 0.7734324

Having confirmed that our quarterly return distributions agree with our earlier results, we’ll now annualize these data by looking at them in groups of 4.

Taking quarterly results but only looking at them in groups of 4 is like waiting until the end of the year to get the overall result.

# Creating a data frame with a column for creating groups of 4
# This gives each set of 4 observations the same number
temp <- data.frame(val = randomdeviates4_big, group = rep(1:2500000, each = 4))


# getting the sum of the return for each group of 4
# Think of each group of 4 as equivalent to a year
temp_agg <- aggregate(val ~ group, data = temp, sum)

# Calculate the mean annual return (based on group of 4)
mean(temp_agg$val)
## [1] 15.00951
# Calculate the SD of annual returns
sd(temp_agg$val)
## [1] 10.00322
# Calculate SNR of annual returns
mean(temp_agg$val)/sd(temp_agg$val)
## [1] 1.500468
# Calculate the proportion of positive annual returns
prop.table(table(temp_agg$val > 0))
## 
##     FALSE      TRUE 
## 0.0669504 0.9330496
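The same grouping step can also be sketched more compactly with a matrix instead of a data frame. This is an alternative illustration, not the code that produced the numbers above: the names `n_years`, `quarterly` and `annual` are made up here, and the number of simulated years is smaller purely to keep the run quick.

```r
set.seed(42)
n_years <- 250000   # assumption: fewer simulated years than above, for speed

# Quarterly draws with the time-adjusted mean (15/4) and SD (10/sqrt(4))
quarterly <- rnorm(n_years * 4, mean = 15 / 4, sd = 10 / sqrt(4))

# matrix() fills column by column, so each column holds 4 consecutive
# quarters; colSums() then gives one "annual" return per column
annual <- colSums(matrix(quarterly, nrow = 4))

# Mean, SD and SNR of the annualized returns
c(mean = mean(annual), sd = sd(annual), snr = mean(annual) / sd(annual))

# Proportion of positive "years"
mean(annual > 0)
```

The results should land close to a mean of 15, an SD of 10 and about 93% positive years, with small differences from the figures above because the draws are different.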

What We Did and Why It Matters

The point here is that creating quarterly results using our time-adjusted values with all of that square root business above but then looking at the overall results on an aggregated, annual basis gave us a mean return of 15% and a standard deviation of 10% (along with a 93% chance of observing a positive result for a given year).

This shows us that the increase in noise coming from the increased frequency of observations (here, from annual to quarterly) is not some dodgy sleight of hand. If we take more noisy quarterly data but look at the overall results on an annual basis, we get a better, higher-signal picture of how the system is working as a whole.

The same system gives a different picture depending on how frequently you look at the outcomes…and looking less frequently reduces the chances of overinterpreting a random, negative outcome.

More Reporting Gets Ugly in a Hurry

The following table shows us these differences in a snapshot. The value of our observations declines rapidly as the frequency of observation increases.

Note especially how poorly our monthly observations fare…the very same frequency of descriptive reporting that dominates the HR sphere.

library(knitr) # for the kable function to make a nice table
annual_mean <- 15   # 15% annual return
annual_sd <- 10     # 10% annual volatility (SD)
time <- c(1, 4, 12, 365, 365*24, 365*24*60, 365*24*60*60)   # observations per year
label <- c("year", "quarter", "month", "day", "hour", "minute", "second")
# what fraction of the distribution is > 0 at each frequency?
fbr <- data.frame(interval = time,
                  prob = round(pnorm(0, mean = annual_mean/time, sd = annual_sd/sqrt(time), lower.tail = FALSE), 2),
                  SNR = round((annual_mean/time)/(annual_sd/sqrt(time)), 2),
                  row.names = label)

kable(fbr)
|        | interval| prob|  SNR|
|:-------|--------:|----:|----:|
|year    |        1| 0.93| 1.50|
|quarter |        4| 0.77| 0.75|
|month   |       12| 0.67| 0.43|
|day     |      365| 0.53| 0.08|
|hour    |     8760| 0.51| 0.02|
|minute  |   525600| 0.50| 0.00|
|second  | 31536000| 0.50| 0.00|

Summary and Preview

Simply changing the frequency of observations and reporting can dramatically impact our signal-to-noise ratio. In this investment example, just moving from annual to quarterly results dramatically lowered the likelihood of seeing a positive return.

This happened even though the underlying system was exactly the same.

If you remember one thing from this post, it’s this: More frequent reporting means more noise.

In our follow-up post, we’ll apply these lessons to a few HR metrics and show you how much of our HR Analytics reporting efforts may be a waste of time at best and, at worst, counter-productive.

Comments or Questions?

Add your comments OR just send me an email: john@hranalytics101.com

I would be happy to answer them!
