Methods (to our Madness): Disaggregating Data to Improve Evaluation and Transparency of Economic Development Programs

 Periodically, we will highlight some of the methods used in a recently released audit. Every performance audit is unique and can require creative thinking and methodologies to answer our audit objective. Some of these methods could be replicated or present valuable lessons for future projects.

 

Disaggregating data can improve transparency and evaluation of programs. In a recent audit of Business Oregon, the audit team divided data about business finance and forgivable loans by different programs to estimate job growth and return on investment, as well as rural investments for each program.

Business Oregon reports jobs created for all of its programs combined, in one Key Performance Measure. The audit team saw their more detailed analysis as an approach that Business Oregon’s analysts could emulate and improve on, in order to foster greater transparency and improve understanding of the investments and outcomes of individual programs.

I recently sat down with Jon Bennett, a Performance Auditor, to learn about their analysis strategy and any lessons learned.

Business Oregon’s programs contribute to job growth

Jon’s team analyzed different business finance programs at Business Oregon and calculated net job growth for participating businesses over a four year period. They found that about two-thirds of businesses had net job growth. They also found that most of the awards went to businesses that paid wages below the county average, important since Business Oregon has a mission to create living wage jobs.

They also looked at investment by geography and found that most awards go to non-rural areas. This is important because rural areas contain 40% of Oregon’s population and were struggling to get out of the recession.

While it is valuable to look at the big picture, separating data into different programs and different measures can provide greater insights into the effectiveness of each program and how each program’s investments reflect the agency’s priorities.

Lessons Learned: Time management and planning ahead

Jon had a few different lessons learned, but the big take away is one I’ve experienced before – time management. Doing data analysis always seems to take longer than you expect it to. One of the time consuming aspects Jon faced was combining data from two different data sources. He thought it would be simple because he had a unique identifier, but it turns out that some of the businesses he was looking at had multiple locations and he had to look at the data more carefully.

Next time, Jon would also like to do a better job of planning how to document his work before he does it. As auditors, we always have our work checked for accuracy, which can be challenging if there is not a clear documentation trail. That is one of the benefits of using ACL, since it automatically creates a log. But sometimes other tools can be more useful. Jon interestingly switched between Excel, ACL, and STATA to use the tool that could do the task most efficiently in the way that he knew best.

 

CZ_photo

Caroline Zavitkovski, OAD Principal Performance Auditor, MPA

 

Auditing and Methodology Featured Performance Audit

How to: Use Benford’s law in ACL

Last time, we spoke about a fraud detection technique known as Benford’s Law.  In summary, it can easily detect outliers in certain sets of data by looking at the first digit of a number.  Those outliers can then be examined to see if they are fraudulent.  For many financial transactions, approximately 30% of the numbers should start with “1” and approximately 5% should start with “9”.  In essence, Benford’s law compares what your data exhibits against what is expected.  For more details, read the link above.

For the following, I’ll show how to do the first digit test and then I’ll show how to do the first two digit test. The two digit test is useful to drill down on the data to show patterns that might not exist in the first digit test. For example, the two digit test is very effective at picking up multiples of ten (10, 20, 30, etc.).

Steps

  • Load data in ACL
  • Analyze Tab ⇒ Benfords
  • Select field to analyze
  • Select number of digits
  • Graph

The following contains real data that was used to help identify and prosecute fraudsters in Oregon.


 

Screenshot of step 2

 benford1

Screenshot of steps 3 & 4

benford2

Screenshot of Step 5

Note the big difference between observed and expected values of the first digit “1”.

benford3

Conducting the first two digit test

After looking at the first digit, consider also looking at the first two digits, even if nothing showed up the first time. Just run Benford again and change to “2” leading digits.

 benford4

Look at all those spikes in the data presented below at multiples of $10. In this instance, many of those were fraudulent.

benford5

And if you look further you find a very suspicious pattern of dozens of $100 transactions in the first 3 digit test.

benford6

Read more about the outcome of the fraud investigation here, and check back next month for more on the how to’s of data analysis in the world of auditing!

 

Data Wonk Featured How To

Auditing How To: Document Sample Selections in ACL

Hello again, fellow data wonks and wonk wannabes!

Last time, we discussed random sampling in Excel and what factors you should consider when determining your sample size. (Hint: 30 is generally large enough, but not in all cases)
One of the downfalls of Excel is the lack of an audit trail. In these examples, we will provide a high-tech and low-tech way to document your sample selection process in detail. First up, ACL.

The High-Tech Method

I am working with fictional data below. As you can see, our population contains 36 counties. Make note of your population size when working in ACL as this will be important later on. You can count a table by using the shortcut “CRTL + 3”.
ACL sampling pic 1

Next you select the “Sampling” menu and click on “Sample Records”. This also has a shortcut, which is “CTRL + 9”.

ACL sampling pic 2

Change “Sample Type” from “MUS” to “Record”. Then click on “random” on the middle left of the interface. Enter in the “size” of the sample. I pulled a sample of 10. The “Seed” allows you to document and repeat a random sample. Any number will do – just pick the first one that comes to mind.

I know what you’re thinking. However, just because something is repeatable does not change the fact that it is random.

Enter in the “population” we recorded earlier, then define the table name you want the sample sent to.

ACL sampling pic 3

There you have it; a random sample of 10 counties in Oregon, with a full log file and repeatable methodology in case you ever get questioned about how you pulled your sample.

The Low-Tech Method

If you are still hung up on what a seed has to do with random sampling, the low tech way will make it clear to you. Below we have a copy of a random number table. You can find these in the appendix of most statistics textbooks or via Google.

ACL sampling pic 5

The “seed” tells you where to start on the table. If I have a seed of 1, we would start at the 1st number, which also happens to be a 1. A seed of “3” start at the 3rd number in which in this case is 4. This is what makes it repeatable. Our population was 36, so to pull a sample we will be looking at sequences of 2-digit numbers. I will use a seed of “3” and pull just three samples.

In the random number table to the right, I’ve crossed out the first two numbers since our seed was “3”. ACL sampling pic 6Starting with the 3rd number, I looked at each 2 digit sequence. If the number fell between 01 and 36, it was a valid random sample and highlighted in green. If the number was above 36, I moved to the next sequence. Also, if repeats are not allowed in your sample you would move to the next number as well (e.g. 11 would be my next sample, but it was already pulled so I would skip over the repeat). Keep moving right and down until you have pulled the full sample.

In this case, my sample was 01, 11, and 20 or Baker, Gilliam, and Lane (shown below). Functionally, this manual low-tech process is identical to what ACL does.

ACL sampling pic 7

You can apply the Random Number table approach to extremely large files. If you had 1,000,000 records you would look at 7-digit sequences rather than 2-digit shown above.

And there we have it! Two useful methods for documenting sample selection.

If you are stuck on a project in ACL, Excel, or ArcGIS please submit your topic suggestions for a future blog post.

Auditors at Work Data Wonk Featured How To New Audit Release