how to featured2

So that’s how you do a random sample in Excel

We’ve all been there. The boss shows up and says “I want you to select a random sample of files for the audit”. The boss leaves and you frantically begin searching for your old college textbooks.
Fear these technical challenges no longer. The Oregon Audits blog will be rolling out new posts covering practical and useful audit tools. Random sampling will be our first topic, but if you have any requests please don’t hesitate to contact us.

What the heck is random sampling?

Random sampling is useful to gain an understanding of a population without examining every file. By randomly selecting our sample, bias is also eliminated because every “file” has an equal chance of being selected. One word of caution though, if you are trying to look for outliers you will need a large sample size.

That begs the question: How big of a sample do I need to take? The short answer: 30 is usually good. If it is a simple test and not the critical element of your finding, 30 should cover you almost every time.

The longer answer is it depends. You need to consider your objectives, how confident you want to be about your results, how much margin of error is tolerable, and how big and varied the population is. More confidence requires larger samples. Less margin of error also increases sample size. Populations that are less uniform (have higher standard deviations) require larger samples too. And if this is a critical element of your finding, you need even more.

This is a handy online calculator for calculating the sample size needed to estimate the average of a population. Older textbooks like this one are great office resources (this one is super easy to follow). Better yet, it sells for about $10 making it a steal.

Excel: The easy way to pull a random sample

If the population you are reviewing is not numbered, you will need to create an index number for each file.

Excel has a built in random number generator. By using “RandBetween()” we can generate our sample. Enter “1” or the lowest possible index number for the bottom, and the largest possible index number for the top. In this scenario, I will use 1 and 100.

how to 1

Once you have your function looking like this how to 2you copy the formula into other cells. I am pulling a sample of 12, so I will drag the formula down 12 cells.

how to 3

You will note right away that each time you change something on the sheet, the numbers change. So if you want to lock in a sample, you need to copy the cells with the “randbetween” function and paste them as “values”.

how to 4

I prefer to paste over the cells I just copied.

Here’s the sample I got:how to 5
If you come across a duplicate number, you will need to add another row or replace the duplicate with a new “randbetween” function.

Pitfall: Weak audit trail

One of the drawbacks of Excel is that the audit trail is weak. The documentation you have are numbers in a spreadsheet that you could have easily entered manually. If you are working on a piece of evidence that is critical, you will probably want more documentation on how you arrived at your sample.

Our next post will cover how to document a random sample using technology such as ACL and how to document it the low-tech way using a “random number table”.

 

 

Data Wonk Featured How To