Auditing How To: Document Sample Selections in ACL

Hello again, fellow data wonks and wonk wannabes!

Last time, we discussed random sampling in Excel and what factors you should consider when determining your sample size. (Hint: 30 is generally large enough, but not in all cases)
One of the downfalls of Excel is the lack of an audit trail. In these examples, we will provide a high-tech and low-tech way to document your sample selection process in detail. First up, ACL.

The High-Tech Method

I am working with fictional data below. As you can see, our population contains 36 counties. Make note of your population size when working in ACL as this will be important later on. You can count a table by using the shortcut “CRTL + 3”.
ACL sampling pic 1

Next you select the “Sampling” menu and click on “Sample Records”. This also has a shortcut, which is “CTRL + 9”.

ACL sampling pic 2

Change “Sample Type” from “MUS” to “Record”. Then click on “random” on the middle left of the interface. Enter in the “size” of the sample. I pulled a sample of 10. The “Seed” allows you to document and repeat a random sample. Any number will do – just pick the first one that comes to mind.

I know what you’re thinking. However, just because something is repeatable does not change the fact that it is random.

Enter in the “population” we recorded earlier, then define the table name you want the sample sent to.

ACL sampling pic 3

There you have it; a random sample of 10 counties in Oregon, with a full log file and repeatable methodology in case you ever get questioned about how you pulled your sample.

The Low-Tech Method

If you are still hung up on what a seed has to do with random sampling, the low tech way will make it clear to you. Below we have a copy of a random number table. You can find these in the appendix of most statistics textbooks or via Google.

ACL sampling pic 5

The “seed” tells you where to start on the table. If I have a seed of 1, we would start at the 1st number, which also happens to be a 1. A seed of “3” start at the 3rd number in which in this case is 4. This is what makes it repeatable. Our population was 36, so to pull a sample we will be looking at sequences of 2-digit numbers. I will use a seed of “3” and pull just three samples.

In the random number table to the right, I’ve crossed out the first two numbers since our seed was “3”. ACL sampling pic 6Starting with the 3rd number, I looked at each 2 digit sequence. If the number fell between 01 and 36, it was a valid random sample and highlighted in green. If the number was above 36, I moved to the next sequence. Also, if repeats are not allowed in your sample you would move to the next number as well (e.g. 11 would be my next sample, but it was already pulled so I would skip over the repeat). Keep moving right and down until you have pulled the full sample.

In this case, my sample was 01, 11, and 20 or Baker, Gilliam, and Lane (shown below). Functionally, this manual low-tech process is identical to what ACL does.

ACL sampling pic 7

You can apply the Random Number table approach to extremely large files. If you had 1,000,000 records you would look at 7-digit sequences rather than 2-digit shown above.

And there we have it! Two useful methods for documenting sample selection.

If you are stuck on a project in ACL, Excel, or ArcGIS please submit your topic suggestions for a future blog post.

Auditors at Work Data Wonk Featured How To New Audit Release

So that’s how you do a random sample in Excel

We’ve all been there. The boss shows up and says “I want you to select a random sample of files for the audit”. The boss leaves and you frantically begin searching for your old college textbooks.
Fear these technical challenges no longer. The Oregon Audits blog will be rolling out new posts covering practical and useful audit tools. Random sampling will be our first topic, but if you have any requests please don’t hesitate to contact us.

What the heck is random sampling?

Random sampling is useful to gain an understanding of a population without examining every file. By randomly selecting our sample, bias is also eliminated because every “file” has an equal chance of being selected. One word of caution though, if you are trying to look for outliers you will need a large sample size.

That begs the question: How big of a sample do I need to take? The short answer: 30 is usually good. If it is a simple test and not the critical element of your finding, 30 should cover you almost every time.

The longer answer is it depends. You need to consider your objectives, how confident you want to be about your results, how much margin of error is tolerable, and how big and varied the population is. More confidence requires larger samples. Less margin of error also increases sample size. Populations that are less uniform (have higher standard deviations) require larger samples too. And if this is a critical element of your finding, you need even more.

This is a handy online calculator for calculating the sample size needed to estimate the average of a population. Older textbooks like this one are great office resources (this one is super easy to follow). Better yet, it sells for about $10 making it a steal.

Excel: The easy way to pull a random sample

If the population you are reviewing is not numbered, you will need to create an index number for each file.

Excel has a built in random number generator. By using “RandBetween()” we can generate our sample. Enter “1” or the lowest possible index number for the bottom, and the largest possible index number for the top. In this scenario, I will use 1 and 100.

how to 1

Once you have your function looking like this how to 2you copy the formula into other cells. I am pulling a sample of 12, so I will drag the formula down 12 cells.

how to 3

You will note right away that each time you change something on the sheet, the numbers change. So if you want to lock in a sample, you need to copy the cells with the “randbetween” function and paste them as “values”.

how to 4

I prefer to paste over the cells I just copied.

Here’s the sample I got:how to 5
If you come across a duplicate number, you will need to add another row or replace the duplicate with a new “randbetween” function.

Pitfall: Weak audit trail

One of the drawbacks of Excel is that the audit trail is weak. The documentation you have are numbers in a spreadsheet that you could have easily entered manually. If you are working on a piece of evidence that is critical, you will probably want more documentation on how you arrived at your sample.

Our next post will cover how to document a random sample using technology such as ACL and how to document it the low-tech way using a “random number table”.

 

 

Data Wonk Featured How To

Auditing and GIS: Visuals Help Engage Readers

In 2013 the Oregon Audits Division began expanding the use of Geographic Information System (GIS) software to improve the quality of our reports.  If you have not heard of GIS before, it is a tool used to create maps.

In our modern world, policy makers face a steady barrage of reports from multiple sources.  No one has enough time to read every page that passes by his or her desk.  The reality is that legislators and their staff skim reports.  As program evaluators, we should be aware of this fact. We must strive to capture the attention of our audience.  The use of striking visuals is a great method to do so.  A wonderful visual can engage the reader, signaling them to read the accompanying text of the report.

One important caveat: the quality of the text should match the quality of the visual.  You can have the best graphic in the world, but if the text is dense and lifeless, you will lose the reader.

Maps are a great example of a striking visual.  Everyone can easily relate to and interpret most maps.  Not only are maps easy for the public to understand, but legislators often focus on issues in the districts they serve.  By creating maps, you can effectively engage policy makers with the information they want.

unemployment

Unemployment Rates, 2007-2009

Maps can tell stories

The adage goes “a picture is worth a thousand words”.  I am here to tell you that maps can also tell a persuasive story.  To the left is an example from our audit of Oregon’s Temporary Assistance for Needy Families’ program.

The program is very sensitive to economic downturns.  When millions of people lost their jobs during the great recession, they sought out public assistance in record numbers.  That should not come as a surprise since helping people during tough times is the raison d’être of these safety net programs.

By highlighting the unemployment rates, we reminded our readers of the economic conditions of the time.  This important context was essential for our readers to consider when evaluating our findings.

Working with the press

The press loves including maps and graphics with their stories. By sharing your work with reporters, you will get more attention from the press and better articles than you would otherwise. Our office recently sent the maps and other graphics we worked on to a reporter who covered our audit. The reporter loved the free materials and the audit report ended up getting a second front page article that prominently featured four of our graphics (see image below).

Remember to check in with your leadership team before working with the press. The last thing you want is that to be a surprise.

Here are some other things to consider:u.s.

(1) Deadlines matter – hand off your materials sooner rather than later.  Do your research and make sure you go directly to people who cover the topic;

(2) Give advance notice when possible. Reporters need time to brief editors, call sources, coordinate with their graphics departments, and then write and fact check their stories. Do not expect that releasing information at 4:55pm will result in an article in the morning’s paper.

(3) Make yourself available. Provide your contact number or arrange an interview. Make sure you coordinate with communication staff and leadership so that the right people are involved.

(4) Learn the lingo. When you tell a reporter that information is “on background” the reporter will not attribute the information to you. This is the origin of all the stories with lines like “Senior officials said…”. When you provide information “off the record”, the reporter can’t use the information unless they can confirm it with another source.

(5) Establish clear expectations about when the material can be published.  Providing an advanced copy of your report with an “embargo” can help ensure the information is published on the right day at the right time. Reporters respect embargos, but there are situations where other sources may leak the same material. In those circumstances, a reporter may publish their article early without your approval. This does not violate the embargo because the source of the information is someone else; and

(6) Build a working relationship with reporters between projects. It takes time to develop the trust needed for a successful relationship. Remember, it is a two-way street.

Maps can effectively highlight issues

One of the best benefits of maps is the ability to spot problems that would not be found without a geographic perspective.

Below is a classic map.  It is 1854 and London is facing a daunting epidemic of cholera.  At the time, no one knew how the disease spread.  Day by day, more Londoners succumbed to the disease. John Snow, a local physician, set out to identify the cause of the outbreak. John believed that once the cause was identified the city would be able to develop a plan of action to stop the outbreak.

John examined the relationship between the location of individuals who had died and the location of their water source.  It was clear from his analysis that the Broad Street Pump in the center of the map was the source of the outbreak. John convinced local officials to disable the pump based on this information and helped quell the epidemic.

LGAQ_winter14_FINAL_201501050943150864

London Cholera Epidemic, 1854

Fraud Detection

Recently GIS has found applications in fraud detection.  GIS supplements other fraud tools by looking at fraud from a spatial point of view.

Consider for example the Medicaid program.  The program has both clients and providers.  It may be suspicious if you find a provider who is serving multiple clients that live hours away. Examining the average diving distance per provider might uncover some fraud that would otherwise be missed.  This concept can extend to any program where you can analyze some geographic component.

lousianaWe can see a real world example of this type of fraud detection in the map to the left.  This analysis was conducted by the Louisiana Legislative Auditor’s office in 2013. They examined the distance clients traveled to use SNAP benefits, a.k.a. food stamps, and found some suspicious patterns.

Some stores had individuals traveling five hours across the state to purchase food. These patterns make a case that fraud is being committed. GIS not only makes your findings stronger, but it will also help law enforcement prosecute the offenders. The best-case scenario is when you provide enough evidence to law enforcement that the individuals plead out, thereby avoiding the cost of a trial.

Another example of fraud detection could include looking at ambulance billings relative to distance traveled.  If the company is billing for more time or mileage a GIS analysis could quickly detect this.

Other Audit Applications

San Diego has conducted some interesting audits of their motor pool and fuel purchases that utilized GIS. This 2011 audit looked at the number of take home vehicles from the motor pool relative to their duty location.  The audit found a number of employees had vehicles they were allowed to take home that resulted in extra maintenance and expense for the motor pool. In some cases, employees were driving over 50 miles per day in a government vehicle.  By reducing the number of take home vehicles the city was poised to save half a million dollars per year.

The audit also looked at fuel purchases.  Each time a car fills up at a retail gas station the city pays full price rather than the wholesale price for city operated facilities.  A number of individuals were filling up at retail locations even though city owned facilities were nearby.

 Exploring Demography

There is a world of free demographic information out there.  The U.S. Census Bureau as part of the decennial census and the American Community Survey collects most of the information.

Analyzing available demographics can add depth to your work.  Suppose you are tasked with assessing economic development projects.  These racial dot maps from the Cooper Center take basic census data and create engaging visuals that highlight the divided nature of many of our communities.  By overlaying economic development projects, you could test if opportunities are being enjoyed by everyone. This same principle holds for looking at educational attainment or income.  When evaluating a human services program it is very easy to pull up poverty demographics, and compare and contrast office locations relative to where clientele live.  Alternatively, it is also possible to check to see what populations are being reached and if certain areas are being underserved.

dotmap

Dot maps emphasize density and grouping

Free tools

The U.S. Census Bureau is an excellent source to find data to map.  Recently, the Census has also put together an easy to use Census Data Mapper. Beyond data and interactive websites, there are a number of free analytic tools out there. A quick internet search for “open source GIS” will yield multiple products to try.  QGIS and GRASS GIS are two great options. Google also has some easy to implement interactive web maps if you feel comfortable working with basic HTML.

Pay the price

For some, the options mentioned above will meet your needs.  For others, you may have to open your checkbook to get the right tool.

The simplest to use, and one of the least expensive options, is SocialExplorer. It takes only a few seconds to create beautiful maps using the intuitive interface.  SocialExplorer even has a free version with limited functionality. One of the drawbacks is that you cannot import your own data and are limited to mostly U.S. Census surveys. It also lacks some customizable features found in other programs. If you want to start somewhere, SocialExplorer is a great place to start.

ArcGIS and Tableau are two power players in the data visualization market.  The features and analytical capabilities of their software are quite amazing.  The biggest drawback though is the cost of the licenses and the amount of staff time you need to invest to learn the software.

A more user friendly alternative is ArcGIS Online. This cloud based software is an add-in for Excel 2010 and 2013. ArcGIS Online is easier to use and much cheaper to purchase than the full desktop version, but you also lose some of the more advanced features. For example, some of the fraud detection tools such as calculating driving distance have not been implemented in the online version.

In either case, the benefits of using GIS can far outweigh the costs when you leverage the tool properly.

Good luck on your journey with GIS!

ian

Ian Green, M.Econ, CGAP Senior Auditor with Oregon Secretary of State Audits Division

Data Wonk Featured