Methods (to our Madness): Disaggregating Data to Improve Evaluation and Transparency of Economic Development Programs

 Periodically, we will highlight some of the methods used in a recently released audit. Every performance audit is unique and can require creative thinking and methodologies to answer our audit objective. Some of these methods could be replicated or present valuable lessons for future projects.

 

Disaggregating data can improve the transparency and evaluation of programs. In a recent audit of Business Oregon, the audit team disaggregated business finance and forgivable loan data by program to estimate job growth, return on investment, and rural investment for each individual program.

Business Oregon reports jobs created for all of its programs combined in a single Key Performance Measure. The audit team saw its more detailed analysis as an approach Business Oregon’s analysts could emulate and improve on to foster greater transparency and a better understanding of each program’s investments and outcomes.

I recently sat down with Jon Bennett, a Performance Auditor, to learn about their analysis strategy and any lessons learned.

Business Oregon’s programs contribute to job growth

Jon’s team analyzed different business finance programs at Business Oregon and calculated net job growth for participating businesses over a four-year period. They found that about two-thirds of businesses had net job growth. They also found that most of the awards went to businesses that paid wages below the county average, an important finding since Business Oregon’s mission includes creating living-wage jobs.

They also looked at investment by geography and found that most awards went to non-rural areas. This matters because rural areas contain 40% of Oregon’s population and were still struggling to recover from the recession.

While it is valuable to look at the big picture, separating data into different programs and different measures can provide greater insights into the effectiveness of each program and how each program’s investments reflect the agency’s priorities.

Lessons Learned: Time management and planning ahead

Jon had a few different lessons learned, but the big takeaway is one I’ve experienced before: time management. Data analysis always seems to take longer than you expect. One of the time-consuming tasks Jon faced was combining data from two different sources. He thought it would be simple because he had a unique identifier, but it turned out that some of the businesses he was looking at had multiple locations, so he had to examine the data more carefully.

Next time, Jon would also like to do a better job of planning how to document his work before he does it. As auditors, we always have our work checked for accuracy, which can be challenging if there is not a clear documentation trail. That is one of the benefits of using ACL, since it automatically creates a log. But sometimes other tools are more useful. Interestingly, Jon switched between Excel, ACL, and STATA, using whichever tool could do each task most efficiently in the way he knew best.

 


Caroline Zavitkovski, OAD Principal Performance Auditor, MPA

 


Goodwill Community Foundation Repost: Why, and how, to make a pivot table in Excel

When you have a lot of data, it can sometimes be difficult to analyze all of the information in your worksheet. Pivot tables can help make your worksheets more manageable by summarizing data and allowing you to manipulate it in different ways.

Check out the two-part video series below on the many uses and strengths of Excel pivot tables, and geek out quietly to yourself over your newfound powers of data analysis. You know we are!

 

 

 

Check out more here.


How To: Use Excel’s Named Cell Range Feature

Developing formulas can be tricky when you are dealing with Excel’s built-in cell references (e.g., tabname!$J$5:$J$356). You may know where in the spreadsheet this information is and what it describes, but you still have to go back and forth between tabs to pull in the reference or copy it from another cell. That can eat up a lot of valuable time.

Wouldn’t it be great if you could just call data by its own name, such as “PayrollHours” or “AR” for accounts receivable? Well, you can. Your formulas don’t have to be a tangle of cryptic cell references.

The first step is accessing Excel’s named range feature. As with most things in Excel, there are a couple of ways to do this. The easiest is to highlight the cell or range you want to name, then look to the top left of the spreadsheet for the little box with a cell reference in it. It sits just to the left of the formula bar and looks like this:

cellrange2a

Click in the box (shown as V8 above) and rename that to anything you want. For example, I renamed this reference “Apple”. Now in the formula bar you can type “Apple” anywhere you would normally put a cell reference and Excel will know exactly what data to pull in there.

Say you are a financial auditor and want to conduct a ratio analysis. You can go through the balance sheets and various financial documents and rename all the important cells to the terms you are used to. For example, call the cell that contains current assets “Assets” and the cell that contains current liabilities “Liabilities”. Now you can calculate the current ratio by simply entering “=assets/liabilities”.
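
To make the difference concrete (the tab name and cell addresses below are made up for illustration), the same current ratio could be written either as

=BalanceSheet!$B$5/BalanceSheet!$B$12

or, with named ranges, simply as

=Assets/Liabilities

The second version documents itself: anyone reviewing the workbook can see what is being divided without chasing down the cell references.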

The other way of naming references is to use the Name Manager on the Formulas tab of the ribbon.

cellrange3a

Click New to create a new name. Enter the name and choose whether it applies to the whole workbook or just a single sheet. You can also add a comment to document what you are defining.

cellrange4

At the bottom, you can adjust which individual cell or range you are referencing. Click the little box with the arrow to select it manually.

Now you have the tools available to change your formulas to look like this:

cellrange5
The formula is now much easier to understand: it is summing (adding) all payroll hours, broken out by location, classification, and pay code. To learn more, check out this website.
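
As a rough illustration of the idea (these particular range names are hypothetical, not necessarily the ones in the screenshot), a named-range version of that kind of formula might look like

=SUMIFS(PayrollHours, Location, "CCCF", Classification, "C6775", PayCode, "OT")

which reads almost like a sentence: sum the payroll hours where the location is CCCF, the classification is C6775, and the pay code is OT.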

Ian Green, CGAP and OAD Senior Auditor


Enjoy reading our ‘how to’ posts? Subscribe to the Oregon Audits blog through the ‘Follow’ link in the right hand sidebar and keep tabs on the latest data wonk tips and tricks, audit releases, interviews, and more!


How To: use Excel shortcuts to save you time and be more productive

Excel is a great tool for auditors, but we often do not take advantage of the shortcuts and other tools available to boost our productivity. This post covers some of the shortcuts I use to be more efficient.

My favorite and most used shortcut is auto filling formulas. Make sure you read the post on relative and absolute referencing or else you might make a critical error when auto-filling formulas.

Data Entry Tips

CTRL_D_1

CTRL+D, =SUM formula

To auto-fill a formula down, you can use CTRL + D. First, enter the formula you want to apply in the topmost cell of the column you are working in. Highlight the cell with the formula and all the other cells you want filled in, then hit CTRL + D and the formula will be applied to all of those cells. If you are working left to right in a spreadsheet, the same approach can be used with CTRL + R.
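
For example (a hypothetical layout): if cell C2 contains the formula

=A2*B2

highlighting C2:C100 and hitting CTRL + D copies that formula down into C3 through C100, with the relative references adjusting row by row.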

A helpful trick when working with big datasets is to use CTRL + arrow key to jump to the end of a row or column. To highlight those cells, use CTRL + SHIFT + arrow key. If you keep holding SHIFT but drop CTRL, you can fine-tune the selection with the arrow keys, for example to deselect the row or column label.

Excel has a handy shortcut if you work with dates and times. To enter today’s date simply hit CTRL + semicolon. For the current time hit CTRL + Shift + colon.

For a long time I could never get Excel to add another line of text within a cell. I would turn on text wrapping and add spaces until the text I wanted dropped down a line, which is quite an inefficient process. Save yourself the trouble and use ALT + ENTER to insert a line break inside the cell.

ALT_ENTER

Switch cell formats to currency by hitting CTRL + SHIFT + $. To change it to text/General use CTRL + SHIFT + ~. For Percentages use CTRL + SHIFT + %. For numbers, CTRL + SHIFT + !. For Dates, CTRL + SHIFT + #. Want to switch to accounting format? Send Microsoft an email requesting that a shortcut feature be added, because right now there is no shortcut and you’ll need to use the manual process. The quickest way is to pull up a cell format dialog box using CTRL + 1.

To auto-sum a column or row of numbers, select the cell just below the column (or to the right of the row) and hit ALT + =.

Inserting rows and columns can be done by highlighting either an entire row or column and then hitting CTRL + SHIFT + =.

Editing Cell Formulas

If you select a cell and hit F2, Excel puts you into edit mode for that cell’s formula, a great shortcut compared to double-clicking the cell or clicking into the formula bar each time. One of the most important changes you can make while editing is switching between absolute and relative references using F4.

F3 is handy if you use named ranges in complex formulas; it pulls up a dialog box listing your named ranges so you can paste one into the formula.

Reviewing worksheets

One of the best ways to review worksheets is to inspect the formulas used to calculate various fields. You can access this option from the ribbon, but a quick CTRL + ` toggles it instantly (that is the backtick key above Tab, not the apostrophe next to the Enter key). Hit the shortcut again to turn the formula view off.

awesomesheet

Use CTRL+TAB instead

You can move to the next worksheet with CTRL + PAGE DOWN and back to the previous one with CTRL + PAGE UP; CTRL + TAB and CTRL + SHIFT + TAB cycle through your open workbooks.

 Other useful shortcuts

We probably all use CTRL + V to paste, but did you know CTRL + ALT + V allows you to quickly paste special? Paste special is great for removing formatting copied from other documents or transposing data.

Turn auto-filter on and off with CTRL + SHIFT + L.

Auto-updating charts

You can tie charts directly to data ranges. If you compile an annual report, creating charts that auto-update can save you a lot of time. Read more about this process in Stephanie Evergreen’s most recent blog post.

Conclusion

By using one or more of the tips above, you will become more efficient in conducting your audit work, and in no time you will be the next Excel super user in your office.

Ian Green, CGAP and OAD Senior Auditor


 



How to: Apply Benford’s Law in Excel to Detect Fraudulent Activity

We apply Benford’s Law here at the Oregon Audits Division as part of our fraud investigations.

For those who haven’t heard of it, Benford’s Law describes a natural phenomenon that occurs in certain data sets. Just as the bell curve predicts how values cluster around an average, Benford’s Law predicts how leading digits are distributed. You can use Benford’s to detect potentially fraudulent transactions by looking for outliers.

Benford’s Law predicts that the number 1 will occur more often as the first digit than any other number. In fact, the number 1 is about 6 times more likely to occur than the number 9 (30.1% vs. 4.6%). The law can also be applied to the first two digits and other applications, but we won’t get into that now.
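
In numerical form, Benford’s Law says the probability that a number’s first digit is d equals log10(1 + 1/d). That is where the 30.1% and 4.6% come from: log10(1 + 1/1) ≈ 0.301 and log10(1 + 1/9) ≈ 0.046. In Excel you can reproduce the expected share for any digit with

=LOG10(1 + 1/1)

swapping in the digit (or a cell containing it) for the 1 inside the parentheses.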

So what data sets conform to Benford’s? Some, like the drainage of rivers, are not relevant to auditing, but plenty of financial transactions are. First, you want a data set with a large sample size, ideally over 1,000 records. This is one of the cases where 30 is a very inappropriate sample size.

Second, you want data that is not limited. ATM transactions, for example, are limited because there are minimum and maximum withdrawals, and they generally require increments of $20. Assigned values, such as invoice numbers, also count as limited. All of the digits (1 through 9) should be possible as a first digit.

The data should also ideally cross multiple orders of magnitude (e.g. 1 to 10, 10 to 100, 100 to 1,000).

Here’s a list of data that should generally conform:

  • Home addresses
  • Bank account balances
  • Census data
  • Accounting related data such as Accounts Receivables
  • Transaction level data

Now that I know what data to use, how can I analyze it? With Excel of course!

Steps:

1 – Load Data in Excel

2 – Calculate first digit

3 – Run Benford’s using Countif

4 – Graph

The following uses real world data that helped to convict several fraudsters in Oregon.

Screenshot of Steps 1 & 2

benfords 1

Using the LEFT function, you can pull out the first digit of each number.
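
A minimal sketch, assuming the transaction amounts sit in column A starting in row 2:

=LEFT(A2, 1)

Fill that formula down the column to get the first digit of every amount. Keep in mind that LEFT returns text, so either keep your digit list as text as well or wrap the formula in VALUE() before counting in the next step.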

Screenshot of Step 3

benfords 2

Using the COUNTIF function, you can count how many times each first digit appears in your data. You will also need to calculate each digit’s percentage of the total. The LOG formula on the right is Benford’s Law in numerical form.
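
A minimal sketch, assuming the first digits calculated above are in B2:B1001 and the digits 1 through 9 are listed in D2:D10:

=COUNTIF($B$2:$B$1001, D2)

counts the transactions whose first digit matches D2; dividing that count by COUNTA($B$2:$B$1001) gives the observed percentage; and

=LOG10(1 + 1/D2)

gives the percentage Benford’s Law expects for that digit, so the observed and expected distributions can be charted side by side.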

Screenshot of Step 4

 benfords 4

Looking at the graph, you can see that the digit 1 is overrepresented. The next step is to drill down into the records that do not match Benford’s. A closer examination of the records with a first digit of 1 yields a large number of $100 transactions, and those $100 transactions were largely, if not entirely, fraudulent. By using Benford’s, you can quickly identify suspicious patterns to help detect fraud.

Benford’s will also produce false positives, so do not assume that every outlier is fraud. Next time: how to run Benford’s in ACL, and why you should use the two-digit Benford’s test.


How To: Analyze payroll data using Excel’s SUMIFS function

A few years ago, I worked on an audit at the Oregon Department of Corrections (DOC). Elected officials were concerned that the DOC was spending too much money on overtime. We used a combination of ACL and Excel to conduct this audit work. We followed the National Institute of Corrections Net Annual Work Hours Post Relief Factor methodology.

First, we accessed our state’s central payroll database. We pulled the tables we needed that recorded hours worked by pay code by month. The way the table was organized was not right for Excel, so we first needed to prepare the information by using the Summarize function in ACL. For this example, you will need to summarize on employee classification, pay code, and location. Aggregate both hours and dollars.
With the data prepared, it should look something like this:

Table 1 – Data

sumif post pic 1

The first column denotes the prison. The next is the classification of the correctional employees (e.g., sergeant, corporal, officer). The following column shows pay codes; we had 50 different pay codes in our data set. The next three are straightforward: dollars, hours, and counts. Lastly, we have a short description of each pay code. For example, CD is career development/training.

After developing this table, I calculated the average for each row by dividing by the FTE in each classification at each prison. For example, there were 3881.5 hours of CD and 184 FTE, yielding an average of 21 CD hours per staff. Now that I have my data ready I can start analyzing it. I want to know if there are differences between locations, classifications, and pay codes to see if this is driving any overtime.
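
As a quick sketch of that calculation (the column letters here are hypothetical), if the summarized hours sit in column E and the FTE count for that classification and prison sits in, say, column H, the per-staff average for a row is

=E2/H2

which for the CD example works out to 3,881.5 / 184, or roughly 21 hours per staff member.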

I set up a table in Excel, shown below. CCCF stands for Coffee Creek Correctional Facility and TRCI for Two Rivers Correctional Institution. Pay code descriptions are above. Classifications range from officer (C6775) to captain (X6780).

Table 2 – PRF

sumif post pic 2

I can now use this table within my SUMIFS function to pull average hours from the other table.
The SUMIFS function has three main parameters: Sum_range, Criteria_range, and Criteria.

sumif post pic 3

Sum_range is the range of data you want summarized. In this case, I wanted to pull average hours from my payroll data. You only have one sum_range, although you can have as many criteria as you want; I will have three. I want the average to come from (1) the correctional institution, (2) the employee classification, and (3) the pay code.

Columns A, B, and C from Table 1 will each be a criteria range, and I will use Table 2 for my criteria. Here is what the formula looks like. Note that I use a combination of absolute and relative references (our next post will delve into this in more detail). Absolute references ‘lock’ cell ranges in a formula so that they do not change when you drag the formula; they are denoted by a “$”.

SUMIFS function

sumif post pic 4

Reading left to right, the function is asked to sum the averages calculated in Table 1. It looks at the first column for the prison acronym matching “CCCF” (highlighted in blue), then searches the employee classification in column B for “C6775”, and finally matches pay code “CD” (highlighted in purple).
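
As a rough sketch of the pattern (the actual sheet names and ranges in the screenshot will differ), the formula might look something like

=SUMIFS(Data!$H:$H, Data!$A:$A, "CCCF", Data!$B:$B, "C6775", Data!$C:$C, "CD")

where column H holds the calculated average hours and columns A, B, and C hold the prison, classification, and pay code. In practice, the three criteria are usually references to the row and column labels of Table 2, with absolute and relative references mixed as described above, so a single formula can be dragged across the entire table.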

Once I have the one formula set up, I can drag it over and down and calculate over 500 different averages in a few seconds. After setting up one year of data, all you need to do is copy the tab and re-link it to the next year’s data, and you can compare year-over-year trends in minutes.

There are a few steps I’m skipping over, but the end result of these calculations looks something like this:

sumif post pic 5

So what did we find in our audit? Overtime is not as big a problem as people perceive it to be.

Most people think that overtime has to be more expensive because you are paying time and a half. What is often left out is the cost of leave time and other benefits, which often adds up to close to 50% of salary, making the pay difference negligible. Furthermore, if you hire an officer to replace overtime, you must pay them for about 2,000 hours per year, whereas with overtime you only pay for the hours you need. Which is cheaper: $65,000 for a new officer, or $25,000 for 500 hours of overtime?

So if you pay only 500 hours of overtime per year in a given shift, it doesn’t make sense to hire an officer to cover that time, because they would be paid for hours not needed. Below is a great example.

Overtime at CCCF

sumif post pic 6

You can see that overtime is quite varied. It peaks around hunting and fishing seasons, flu season, and winter holidays. This is not that surprising as more people are calling in sick during these times and someone needs to work the overtime to replace them.

If CCCF hired an additional officer to work these hours, they would only reduce overtime by a small fraction. At best, CCCF could eliminate all of the overtime between 0 and 8 hours. To eliminate all overtime, CCCF would need to hire 6 FTE on the graveyard shift, or 48 hours of coverage, which is vastly more expensive than simply paying the overtime.

As we found out in this audit, sometimes your gut feeling (in this case, that overtime is costly) is wrong.


Post prepared by Guest Blogger and OAD Senior Auditor Ian Green
