Evergreen Data Reblog: State of the Union 2016 Slideshow

“Dudes, the State of the Union is my Superbowl. I look forward to this event like some people anticipate the Oscars. This is my deal. Why? Because the enhanced version of the SOTU features a data-based slideshow, a perfect study in how graphics can help underscore a point. And, as every year, the White House graphic design team hits some out of the park and strikes out on others.”


SOTU example of using data to illustrate a point

Read more here from the wise and very entertaining data wizard Stephanie Evergreen and her “review” of the State of the Union 2016 address (or rather, the graphics used during the SOTU speech). Evergreen is widely regarded as a master of simple and digestible data visualization using a basic Excel program. She also shares useful how-to’s that anyone who works with data (not just auditors) can understand and start applying to their own work.

Data Wonk Featured

How to: know the difference between absolute and relative cell references in Excel

As auditors we are often working with spreadsheets. Over the next few months the OAD “How To” blog series will be focusing on some cool tips and tricks to use in Excel.

We will start with a relatively simple, but vital technique to understand: the difference between absolute and relative cell references.

Absolute cell references are used when you want to refer to the same specific cell every time. To do this, prefix the column letter and row number with $. For example, =$A$2 will always refer to cell A2. This can be useful if you want to test transactions to see if they match a specified tax rate, as shown in the example below.


This example can also be used to show relative references. Relative references change when you drag a formula down or across the spreadsheet. For example, C5 will become C6, C7, C8, etc. The same is true of the D5 reference. As you can see in the screenshot below, the formula maintains the $A$2 reference while the other references change based on the row number.


Sometimes you need to develop large tables within a spreadsheet with multiple column fields and row fields. You can use a combination of absolute and relative references to help you develop those quickly and efficiently.

For example, if you want a reference to stay within a column, use $A2. This will lock the reference on the A column as you drag the formula to the right.

If you want a reference to stay constant within a row, use A$2. This will lock the reference on row 2 as you drag the formula up and down.

Now you have the basic tools you need to create quick and easy tables. Just remember to think about what type of cell reference you need when you start developing your formulas.  A helpful trick is to press F4 to cycle through the reference types (A2 → $A$2 → A$2 → $A2).  For a more advanced walk-through of creating tables, see this previous post on using SUMIFS.
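For readers who like to see the mechanics spelled out, here is a small sketch, in Python purely for illustration, of how dragging a formula adjusts each reference type: the $ locks the column or row part, and anything unlocked shifts by the drag offset.

```python
# Sketch of how Excel adjusts a reference when a formula is dragged.
# A "$" locks the column or row part; unlocked parts shift by the offset.
import re

def shift_ref(ref, d_rows, d_cols):
    """Shift an A1-style reference by a drag offset, honoring $ locks."""
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    col_lock, col, row_lock, row = m.groups()
    if not col_lock:  # relative column: shift the letters
        n = 0
        for ch in col:
            n = n * 26 + (ord(ch) - ord("A") + 1)
        n += d_cols
        col = ""
        while n:
            n, r = divmod(n - 1, 26)
            col = chr(ord("A") + r) + col
    if not row_lock:  # relative row: shift the number
        row = str(int(row) + d_rows)
    return f"{col_lock}{col}{row_lock}{row}"

# Dragging one row down and one column right:
print(shift_ref("A2", 1, 1))    # -> B3   (fully relative)
print(shift_ref("$A$2", 1, 1))  # -> $A$2 (fully absolute)
print(shift_ref("$A2", 1, 1))   # -> $A3  (column locked)
print(shift_ref("A$2", 1, 1))   # -> B$2  (row locked)
```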

R1C1 Style

Before Excel, spreadsheet programs such as Microsoft's Multiplan used a different cell referencing style that can be activated in Excel if you want. Just go to File > Options > Formulas > Working with formulas and click on R1C1 reference style.


The way R1C1 style works is that you use R (row) and C (column) with bracketed offsets to create relative references. So if you were entering a formula in cell B3 and wanted to reference A2, you would enter =R[-1]C[-1]: row 3 minus 1 is row 2, and column B minus 1 is column A (cell A2).  This is an example of a relative reference in R1C1 style. For an absolute reference, use R2C1 for A2 (A2 is in the 2nd row and 1st column of the spreadsheet, hence R2C1).
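The conversion described above can be sketched as a small helper function. This is purely illustrative Python, not anything built into Excel:

```python
import re

def a1_to_r1c1(target, origin=None):
    """Convert an A1-style cell like 'A2' to R1C1 notation.
    With no origin the result is absolute (R2C1); given the cell the
    formula lives in, the result is relative offsets like R[-1]C[-1]."""
    def parse(cell):
        m = re.fullmatch(r"([A-Z]+)(\d+)", cell)
        col = 0
        for ch in m.group(1):
            col = col * 26 + (ord(ch) - ord("A") + 1)
        return int(m.group(2)), col  # 1-based (row, column)

    row, col = parse(target)
    if origin is None:
        return f"R{row}C{col}"
    o_row, o_col = parse(origin)
    d_row, d_col = row - o_row, col - o_col
    r_part = f"R[{d_row}]" if d_row else "R"
    c_part = f"C[{d_col}]" if d_col else "C"
    return r_part + c_part

print(a1_to_r1c1("A2"))              # absolute: R2C1
print(a1_to_r1c1("A2", origin="B3")) # relative from B3: R[-1]C[-1]
```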

See the screenshot below for how the formulas from the examples above look in R1C1:


Most people prefer to work in A1 style, but in some instances R1C1 style can make it easier to create the formulas you need, so keep this trick in mind.

Ian Green, CGAP and OAD Senior Auditor


Enjoy reading our ‘how to’ posts? Subscribe to the Oregon Audits blog through the ‘Follow’ link in the right hand sidebar and keep tabs on the latest data wonk tips and tricks, audit releases, interviews, and more!

Data Wonk Featured How To

Methods (to our Madness): Using Data to Tell the Story of a Debt Problem

Periodically, we will highlight some of the methods used in a recently released audit. Every performance audit is unique and can require creative thinking and methodologies to answer our audit objective. Some of these methods could be replicated or present valuable lessons for future projects.

Sometimes it takes a number to get the point across. And sometimes it takes actually doing the work to show that the work can be done.

These were two of the big takeaways from a recent conversation I had with Jamie Ralls, principal auditor at the OAD and project lead for a recently released performance audit on debt collection: Oregon Needs Stronger Leadership, Sustained Focus to Improve Delinquent Debt Collection.

Vendor offset analysis showed potential savings of at least $750,000 a year

Jamie conducted an analysis for the audit on vendor offset. Vendor offset is when a state matches a list of debtors who owe the state money against a list of vendors the state pays for services. Instead of paying those vendors, the state intercepts the payments and applies them to the debt. Forty other states do this, but Oregon did not at the time of the audit.

Jamie looked at what Oregon could have collected had it used vendor offset. The result: At least $750,000 a year.

Limitations in the data resulted in a cautious estimate

The $750,000 a year estimate was likely low considering that the list of debtors was incomplete from a statewide perspective. The Department of Revenue maintained the list and it did not include debt held at other agencies. Additionally, due to the complexity of the analysis, the team only calculated debt and payments by year. An ongoing monthly calculation would have produced a greater collection amount.

Lessons learned: document along the way

Jamie said that if she could go back she would have been better about documenting all of the steps she took in the analysis as she went along. She was so caught up in the excitement of the work that she did not always stop to document everything. She then had to go back and retrace some of her work.

Using data to tell the story of a debt problem

When I asked Jamie why the audit team did this specific analysis, she said that paying vendors who owe money to the state has been a long-standing problem. The Audits Division had first recommended vendor offset in 1997. However, in the past our office had only talked about it anecdotally.

Being able to show the extent of the problem through data analysis had a big impact. Actually going through the methodology also demonstrated that doing vendor offset was technically possible. During the course of the audit, in part due to testimony from the audit team showing this analysis, the Oregon Legislature passed Senate Bill 55. SB 55 requires the Oregon Department of Revenue to do a vendor offset.

Breaking down the methodology

Here is a step-by-step look at how Jamie analyzed vendor offset:

  1. She took a list of approved vendors from the Oregon Department of Administrative Services and a list of debtors from the Oregon Department of Revenue. She matched the lists based on tax ID numbers. She found 9,140 debtors who were approved as vendors to receive payment from the state. These vendors owed a total of $67 million in debt.
  2. Next, she pulled queries in the Statewide Financial Management Application (SFMA) to find and export records of all payments to these vendors for the time period of 2011 to 2014.
  3. She then summarized the debt and the payments by year.
  4. She took the debt for the first year (2011) and subtracted the payments for the following year (2012). If a balance of debt remained, it was rolled over to the next year (2012) to create a rolling balance of debt.
  5. For each year, the amount of debt that could have been collected through payments in the following year was also calculated and rolled forward, to create a rolling balance of what the state could have collected.
  6. She computed $3 million in debt that could have been collected, or an average of $750,000 a year.
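The rolling-balance logic in steps 3 through 5 can be sketched as follows. The yearly totals below are made-up illustrations, not the audit's actual figures:

```python
# Simplified sketch of the rolling-balance calculation: debt outstanding
# in one year could be offset against vendor payments made the following
# year, with any unpaid balance rolling forward. Numbers are made up.
debt_by_year = {2011: 1_000_000, 2012: 400_000, 2013: 250_000}
payments_by_year = {2012: 300_000, 2013: 500_000, 2014: 200_000}

rolling_debt = 0
total_collectible = 0
for year in sorted(debt_by_year):
    rolling_debt += debt_by_year[year]
    # Offset the outstanding debt against next year's vendor payments.
    collected = min(rolling_debt, payments_by_year.get(year + 1, 0))
    total_collectible += collected
    rolling_debt -= collected  # unpaid balance rolls forward

print(f"Could have collected: ${total_collectible:,}")  # $1,000,000
```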


Caroline Zavitkovski, OAD Senior Performance Auditor, MPA


Auditing and Methodology Data Wonk Featured

Governing Reblog: County Migration Trends

County in-migration and out-migration trends are tracked by the IRS, with tax returns serving as a proxy for the number of households. Any single tracking method like this will miss families and individuals who do not file taxes, but it still provides some insight into changes in county populations, and even economic health. Fully understanding a county’s economic wellbeing requires many factors beyond in- or out-migration, but whether a county retains enough of a working-age population from year to year to sustain local industry is a fairly reliable indicator.

We can see a stark difference in migration patterns between two of Oregon’s counties below:

Washington-County Migration Data


Harney-County Migration Data

While Washington County saw the rate of in-migration growth increase from 2011 to 2014, Harney County saw a fairly sizable net loss in the number of filed tax returns over that same period. The data available may not tell us why such a pattern is evident, but it may point to other indicators (such as changes in local industry) that can.

Interested in comparing your home county to its neighbors? Check out the original post over at Governing.com!

Data Wonk Featured

Stats in Auditing: Medicare, hospital-physician consolidation, and panel data analysis at the GAO

In an audit released last December, the Government Accountability Office (GAO) looked at whether increasing consolidation between hospitals and physicians had something to do with more evaluation and management (E/M) services being provided at hospital outpatient departments (HOPDs), rather than in the more typical setting of a physician’s office.

Why does this matter?

According to the audit, the concern is that these trends may be driving up Medicare expenditures. As the GAO notes in the report, Medicare pays providers a higher rate for E/M visits and other types of procedures if the service is provided at a hospital facility (at a HOPD) than if the same service is performed at a physician’s office. This rate premium then creates an incentive for shifting the provision of services like E/M visits from physician offices to hospitals. As more hospitals acquire physician practices or hire physicians as regular employees (also known as hospital-physician vertical consolidation or vertical integration), more hospitals are able to provide these services and be eligible for the higher rate.

The evidence

While performance audits routinely use data as evidence or background information, not many use statistical analysis in their methodology, which makes this GAO audit even more interesting. To make their analysis possible, the GAO auditors combined various sources of medical data to construct a dataset of county-level variables (for 3,121 counties) with annual observations over the 2007-2013 period.

The main variables of interest in the dataset are:

  1. The median percentage of all E/M visits in a given county that were conducted at hospital outpatient facilities.
  2. The level of hospital-physician vertical consolidation in each county, which is a measure of how much a county is served by hospitals that have consolidated with physician practices.

After grouping counties into varying levels of hospital-physician consolidation, the auditors compare the median percentage of E/M office visits conducted in hospital facilities for each of these groups.

The results for all years are in table form in Appendix III of the report, but to make it easy on the eyes, we quickly put the data in the tables through Tableau and came up with some colorful charts (click to see the full interactive visualization):

E_M Office Visits and Vertical Consolidation

The charts show that counties with higher levels of hospital-physician consolidation tend to have higher percentages of E/M visits performed at hospital outpatient facilities. The charts also show that this positive correlation between the two variables increases over time.

But correlation does not imply causation. It could certainly be the case that the increase in the volume of E/M visits performed at hospital outpatient facilities is attributable to other factors, in which case the correlation would be spurious. This is where statistical analysis methods – such as regression analysis – come in handy, because they can help test whether the relationship between two variables is still there after accounting for other factors.

In this case, the GAO auditors used regressions to check whether the correlation above changes or disappears after adding other (relevant) variables into the mix, such as the level of competition between hospitals and between physicians in each county.

But what separates these regressions from the standard regression model is that they exploit the nature of the dataset used in the audit, which follows thousands of counties over several years. This type of dataset allows for panel data analysis methods that have some advantages over standard regressions. In particular, it allows for the use of “fixed effects models” that can account for unobserved differences in county characteristics that remain stable over certain periods of time – such as urban/rural status or the state in which a county is located – as well as factors that vary over time but affect all counties simultaneously, such as national boom and bust economic cycles. Note that the emphasis is on unobserved: the models account for these differences automatically, without the user needing to add data on these unobserved characteristics.
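The idea behind the fixed effects ("within") estimator can be shown on synthetic county-year data: demeaning each variable by county and by year strips out exactly those stable county traits and common yearly shocks. This is a minimal sketch under illustrative assumptions, not the GAO's actual data or model:

```python
# Two-way fixed effects via the "within" transformation on a balanced
# synthetic panel. All variable names and numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_counties, n_years = 200, 7
county = np.repeat(np.arange(n_counties), n_years)  # county id per row
year = np.tile(np.arange(n_years), n_counties)      # year id per row

county_effect = rng.normal(size=n_counties)[county]  # stable county traits
year_effect = rng.normal(size=n_years)[year]         # shocks common to all

consolidation = rng.normal(size=county.size) + county_effect
# True within-county effect of consolidation on the HOPD share: 0.5
hopd_share = (0.5 * consolidation + county_effect + year_effect
              + rng.normal(scale=0.1, size=county.size))

def demean(v):
    """Subtract county means, then year means (the 'within' transform)."""
    v = v - (np.bincount(county, weights=v) / n_years)[county]
    return v - (np.bincount(year, weights=v) / n_counties)[year]

x, y = demean(consolidation), demean(hopd_share)
beta = (x @ y) / (x @ x)  # OLS slope on the demeaned data
print(round(beta, 2))  # recovers a value close to the true 0.5
```

A naive regression on the raw data would be biased here, because consolidation is correlated with the county effect; demeaning removes that confounding without ever observing the county traits directly.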

The regression results are presented in Appendix II and confirm the positive relationship between hospital-physician consolidation (‘Vertical consolidation’ in Table 4) and the percentage of E/M visits performed at hospital outpatient facilities.

Based on the results, here is what the GAO recommends: “In order to prevent the shift of services from physician offices to HOPDs from increasing costs for the Medicare program and beneficiaries, Congress should consider directing the Secretary of HHS to equalize payment rates between settings for E/M office visits–and other services that the Secretary deems appropriate–and to return the associated savings to the Medicare program.”

Luis Sandoval OAD Performance Auditor


Data Wonk Featured

How to: Use Benford’s law in ACL

Last time, we discussed a fraud detection technique known as Benford’s Law.  In summary, it can easily flag outliers in certain sets of data by looking at the first digit of each number.  Those outliers can then be examined to see if they are fraudulent.  For many financial transactions, approximately 30% of the numbers should start with “1” and approximately 5% should start with “9”.  In essence, Benford’s Law compares what your data exhibits against what is expected.  For more details, read the link above.

Below, I’ll show how to do the first-digit test and then the first-two-digit test. The two-digit test is useful for drilling down on the data to reveal patterns that might not show up in the first-digit test. For example, the two-digit test is very effective at picking up multiples of ten (10, 20, 30, etc.).
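The expected frequencies ACL compares your data against come from Benford's formula, P(d) = log10(1 + 1/d), which covers both the first-digit and the first-two-digit tests. A quick sketch:

```python
# Benford's expected frequency for a leading digit d is log10(1 + 1/d);
# the same formula covers the two-digit test for d = 10..99.
import math

first_digit = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
first_two = {d: math.log10(1 + 1 / d) for d in range(10, 100)}

print(f"P(first digit 1) = {first_digit[1]:.1%}")  # about 30.1%
print(f"P(first digit 9) = {first_digit[9]:.1%}")  # about 4.6%
print(f"P(first two 10)  = {first_two[10]:.1%}")   # about 4.1%
print(f"P(first two 99)  = {first_two[99]:.1%}")   # about 0.4%
```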


  1. Load data in ACL
  2. Analyze Tab ⇒ Benfords
  3. Select field to analyze
  4. Select number of digits
  5. Graph

The following contains real data that was used to help identify and prosecute fraudsters in Oregon.


Screenshot of step 2


Screenshot of steps 3 & 4


Screenshot of Step 5

Note the big difference between observed and expected values of the first digit “1”.


Conducting the first two digit test

After looking at the first digit, consider also looking at the first two digits, even if nothing showed up the first time. Just run Benford again and change to “2” leading digits.


Look at all those spikes at multiples of $10 in the data presented below. In this instance, many of those were fraudulent.


And if you look further, the first-three-digit test reveals a very suspicious pattern: dozens of $100 transactions.


Read more about the outcome of the fraud investigation here, and check back next month for more on the how to’s of data analysis in the world of auditing!


Data Wonk Featured How To

How to: Apply Benford’s Law in Excel to Detect Fraudulent Activity

We apply Benford’s Law here at the Oregon Audits Division as part of our fraud investigations.

For those who haven’t heard of it yet, Benford’s Law is a natural phenomenon that occurs in certain data sets. Just as the Bell Curve predicts a certain distribution of numbers, so does Benford’s. You can use Benford’s to detect fraudulent transactions by looking for outliers.

Benford’s Law predicts that the number 1 will occur more often as the first digit than any other number. In fact, the number 1 is about 6 times more likely to occur than the number 9 (30.1% vs. 4.6%). The law can also be applied to the first two digits and other applications, but we won’t get into that now.

So what data sets conform to Benford’s? Well, there are some, like river drainage data, that do not apply to auditing, but there are also plenty of financial transactions that do.  First off, you want a dataset with a large sample size – ideally over 1,000 records.  This is one of the cases where 30 is a very inappropriate sample size.

Second, you want data that is not limited. ATM transactions, for example, are limited because there are minimum and maximum withdrawals, and they generally require increments of $20.  Assigned values, like invoice numbers, also count as limited.  All of the digits (1 through 9) should be possible as the first digit.

The data should also ideally cross multiple orders of magnitude (e.g. 1 to 10, 10 to 100, 100 to 1,000).

Here’s a list of data that should generally conform:

  • Home addresses
  • Bank account balances
  • Census data
  • Accounting related data such as Accounts Receivables
  • Transaction level data

Now that I know what data to use, how can I analyze it? With Excel, of course!


1 – Load Data in Excel

2 – Calculate first digit

3 – Run Benford’s using Countif

4 – Graph

The following uses real-world data that helped convict several fraudsters in Oregon.

Screenshot of Steps 1 & 2

benfords 1

Using the LEFT function, you can extract the first digit of each number.

Screenshot of Step 3

benfords 2

Using the COUNTIF function, you can count how many times each first digit appears in your data. You will need to calculate the percentages too.  The LOG formula on the right is Benford’s Law in numerical form.
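The same extract-count-compare workflow can be sketched outside Excel. The amounts below are made up for illustration, not the audit's data:

```python
# A sketch of the LEFT/COUNTIF/LOG approach: tally observed first
# digits and compare against Benford's expectation. Amounts are made up.
import math
from collections import Counter

amounts = [120.50, 1890.00, 100.00, 340.25, 100.00, 19.99,
           100.00, 2750.00, 104.10, 88.00, 1500.00, 100.00]

# Roughly what LEFT(cell, 1) does in Excel: grab the first digit.
digits = [str(a).lstrip("0.")[0] for a in amounts]
observed = Counter(digits)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)                 # the LOG formula
    actual = observed.get(str(d), 0) / len(amounts)  # the COUNTIF share
    print(f"{d}: observed {actual:.1%}, expected {expected:.1%}")
```

In this toy data the digit 1 is heavily overrepresented, driven by the repeated $100 amounts, which is the same pattern the audit's graph shows.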

Screenshot of Step 4

 benfords 4

Looking at the graph, you can see that the digit 1 is overrepresented. The next step is to drill down on the records that do not match Benford’s.  A closer examination of the records with a first digit of 1 yields a large number of $100 transactions.  Those $100 transactions were largely, if not all, fraudulent.  By using Benford’s you can quickly identify suspicious patterns to help detect fraud.

Benford’s will lead to false positives, so do not assume that an outlier has to be fraud. Next time: how to do Benford’s in ACL, and why you should use the two-digit Benford’s test.

Data Wonk Featured Fraud Investigation How To