How To: Analyze payroll data using Excel’s SUMIFS function

A few years ago, I worked on an audit at the Oregon Department of Corrections (DOC). Elected officials were concerned that the DOC was spending too much money on overtime. We used a combination of ACL and Excel to conduct the audit work, following the National Institute of Corrections’ Net Annual Work Hours Post Relief Factor methodology.

First, we accessed our state’s central payroll database and pulled the tables we needed, which recorded hours worked by pay code by month. The way the table was organized was not right for Excel, so we first needed to prepare the information using the Summarize function in ACL. For this example, you will need to summarize on employee classification, pay code, and location, aggregating both hours and dollars.

With the data prepared, it should look something like this:

Table 1 – Data

[Image: the summarized payroll data]

The first column denotes the prison. The next is the classification of the correctional employees (e.g., sergeant, corporal, officer). The following column shows pay codes; we had 50 different pay codes in our data set. The next three are straightforward: dollars, hours, and counts. Lastly, we have a short description of each pay code. For example, CD is career development/training.

After developing this table, I calculated the average for each row by dividing by the FTE in each classification at each prison. For example, there were 3,881.5 hours of CD and 184 FTE, yielding an average of 21 CD hours per staff member. Now that I have my data ready, I can start analyzing it. I want to know if there are differences between locations, classifications, and pay codes that might be driving overtime.
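
As a minimal sketch of that per-FTE calculation, assuming the hours sit in column E and you add each group’s FTE count in a hypothetical column G, the average is a simple division:

=E2/G2

For the CD row above, that works out to 3,881.5 / 184 ≈ 21.1 hours per FTE.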

I set up a table in Excel, shown below. CCCF stands for Coffee Creek Correctional Facility and TRCI is Two Rivers Correctional Institution. Pay code descriptions are above. Classifications range from officer (C6775) to captain (X6780).

Table 2 – PRF

[Image: Table 2, the PRF criteria table]

I can now use this table within my SUMIFS function to pull average hours from the other table.

The SUMIFS function has three main parameters: sum_range, criteria_range, and criteria.

[Image: the SUMIFS function arguments in Excel]
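
Excel’s general syntax for the function is below; each additional criteria_range/criteria pair is optional, and you can add as many pairs as you need.

=SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...)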

Sum_range is the range of data you want summed. In this case, I wanted to pull average hours from my payroll data. You only get one sum_range, although you can have as many criteria pairs as you want. I will have three criteria: I want the average to come from (1) the correctional institution, (2) the employee classification, and (3) the pay code.

Columns A, B, and C from Table 1 will each be a criteria range, and I will use Table 2 for my criteria. Here is what the formula looks like. Note that I use a combination of absolute and relative references (our next post will delve into this in more detail). Absolute references “lock” in cell ranges in a formula so that when you drag the formula they do not change. Absolute references are denoted by a “$”.
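
For quick reference, here is how each reference style behaves when you drag a formula:

A1 – both the column and row shift (fully relative)
$A$1 – nothing shifts (fully absolute)
$A1 – the column is locked, the row shifts
A$1 – the row is locked, the column shifts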

SUMIFS function

[Image: the completed SUMIFS formula as entered in Excel]
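
Here is a sketch of what that formula could look like. The exact layout is an assumption for illustration: suppose Table 1 sits on a sheet named Data with prisons in column A, classifications in column B, pay codes in column C, and the per-FTE averages in column D, and suppose Table 2 names the prison in cell $B$1, runs classifications across row 3, and lists pay codes down column B:

=SUMIFS(Data!$D$2:$D$500, Data!$A$2:$A$500, $B$1, Data!$B$2:$B$500, C$3, Data!$C$2:$C$500, $B4)

The data ranges and the prison cell are fully absolute, while the mixed references C$3 and $B4 let the classification update as you drag across and the pay code update as you drag down.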

Reading left to right, the function is asked to sum the averages calculated in Table 1. It looks in the first column for the prison acronym matching “CCCF”. Next, it searches the employee classifications in column B for “C6775”. Finally, it matches pay code “CD”.

Once I have one formula set up, I can drag it over and down and calculate over 500 different averages in a few seconds. After setting up one year of data, all you need to do is copy the tab and re-link it to the next year’s data, and you can compare year-over-year trends in minutes.

There are a few steps I’m skipping over, but the end result of these calculations looks something like this:

[Image: completed table of average hours by prison, classification, and pay code]

So what did we find in our audit? Overtime is not as big of a problem as people perceive it to be.

Most people think that overtime has to be more expensive because you are paying time and a half. What is often left out is the cost of leave time and other benefits, which can add up to close to 50% of salary, making the pay difference negligible. Furthermore, if you hire an officer to replace overtime, you must pay them for about 2,000 hours per year, whereas with overtime you only pay for the hours you actually need. Which is cheaper: $65,000 for a new officer, or $25,000 for 500 hours of overtime?
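
Put those figures on a per-hour basis and the comparison is stark:

$25,000 / 500 hours = $50 per overtime hour, paid only when needed
$65,000 / 2,000 hours = $32.50 per hour, paid whether the hours are needed or not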

So if a given shift only generates 500 hours of overtime per year, it doesn’t make sense to hire an officer to cover that time, because they would be paid for hours that are not needed. Below is a great example.

Overtime at CCCF

[Image: chart of overtime hours at CCCF throughout the year]

You can see that overtime varies quite a bit. It peaks around hunting and fishing seasons, flu season, and the winter holidays. This is not surprising, as more people call in sick during these times and someone needs to work overtime to replace them.

If CCCF hired an additional officer to work these hours, it would only reduce overtime by a small fraction. At best, CCCF could eliminate all of the overtime between 0 and 8 hours. To eliminate all overtime, CCCF would need to hire 6 FTE on the graveyard shift alone to provide 48 hours of daily coverage, which is vastly more expensive than the overtime itself.
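
A rough worked comparison using the figures above (illustrative, not audited numbers):

6 FTE × $65,000 ≈ $390,000 per year to eliminate overtime that costs a fraction of that amount.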

As we found out in this audit, sometimes your gut instinct (in this case, that overtime is costly) is wrong.


Post prepared by Guest Blogger and OAD Senior Auditor Ian Green


Auditing How To: Document Sample Selections in ACL

Hello again, fellow data wonks and wonk wannabes!

Last time, we discussed random sampling in Excel and what factors you should consider when determining your sample size. (Hint: 30 is generally large enough, but not in all cases.)

One of the downfalls of Excel is the lack of an audit trail. In these examples, we will provide a high-tech and a low-tech way to document your sample selection process in detail. First up, ACL.

The High-Tech Method

I am working with fictional data below. As you can see, our population contains 36 counties. Make note of your population size when working in ACL, as this will be important later on. You can count a table by using the shortcut “CTRL + 3”.
[Image: ACL table listing the 36 Oregon counties]

Next, select the “Sampling” menu and click on “Sample Records”. This also has a shortcut: “CTRL + 9”.

[Image: the ACL Sampling menu]

Change “Sample Type” from “MUS” to “Record”. Then click on “Random” on the middle left of the interface and enter the “Size” of the sample; I pulled a sample of 10. The “Seed” allows you to document and repeat a random sample. Any number will do – just pick the first one that comes to mind.

I know what you’re thinking. But just because something is repeatable does not make it any less random.

Enter the “population” we recorded earlier, then define the name of the table you want the sample sent to.

[Image: the ACL sample dialog with these settings]

There you have it: a random sample of 10 counties in Oregon, with a full log file and a repeatable methodology in case you are ever questioned about how you pulled your sample.

The Low-Tech Method

If you are still hung up on what a seed has to do with random sampling, the low-tech method will make it clear. Below is a copy of a random number table. You can find these in the appendix of most statistics textbooks or via Google.

[Image: a random number table]

The “seed” tells you where to start on the table. If I have a seed of 1, we start at the 1st number, which here happens to be a 1. A seed of “3” starts at the 3rd number, which in this case is 4. This is what makes the sample repeatable. Our population was 36, so to pull a sample we will be looking at sequences of 2-digit numbers. I will use a seed of “3” and pull just three samples.

In the random number table shown below, I’ve crossed out the first two numbers, since our seed was “3”. Starting with the 3rd number, I looked at each 2-digit sequence. If the number fell between 01 and 36, it was a valid random selection and highlighted in green. If the number was above 36, I moved to the next sequence. Also, if repeats are not allowed in your sample, you skip those as well (e.g., 11 would be my next selection, but it was already pulled, so I would skip over the repeat). Keep moving right and down until you have pulled the full sample.

[Image: the random number table with crossed-out digits and valid selections highlighted]

In this case, my sample was 01, 11, and 20: Baker, Gilliam, and Lane counties (shown below). Functionally, this manual low-tech process is identical to what ACL does.

[Image: the resulting sample of Baker, Gilliam, and Lane counties]

You can apply the random number table approach to extremely large files as well. If you had 1,000,000 records, you would look at 7-digit sequences rather than the 2-digit sequences shown above.

And there we have it! Two useful methods for documenting sample selection.

If you are stuck on a project in ACL, Excel, or ArcGIS please submit your topic suggestions for a future blog post.


So that’s how you do a random sample in Excel

We’ve all been there. The boss shows up and says, “I want you to select a random sample of files for the audit.” The boss leaves and you frantically begin searching for your old college textbooks.

Fear these technical challenges no longer. The Oregon Audits blog will be rolling out new posts covering practical and useful audit tools. Random sampling is our first topic, but if you have any requests, please don’t hesitate to contact us.

What the heck is random sampling?

Random sampling is useful for gaining an understanding of a population without examining every file. Randomly selecting the sample also eliminates bias, because every “file” has an equal chance of being selected. One word of caution, though: if you are trying to look for outliers, you will need a large sample size.

That raises the question: how big of a sample do I need? The short answer: 30 is usually good. If it is a simple test and not the critical element of your finding, 30 should cover you almost every time.

The longer answer is that it depends. You need to consider your objectives, how confident you want to be in your results, how much margin of error is tolerable, and how big and varied the population is. More confidence requires a larger sample. A smaller margin of error also increases sample size. Populations that are less uniform (with higher standard deviations) require larger samples too. And if this is a critical element of your finding, you need even more.
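
For estimating an average, the classic textbook formula is n = (z × σ / E)², where z is the z-score for your confidence level, σ is the population standard deviation, and E is the tolerable margin of error. As a rough illustration with assumed numbers (95% confidence, so z ≈ 1.96; σ = 10; E = 3), you can work it out right in Excel:

=ROUNDUP((1.96*10/3)^2, 0)

This returns 43, so you would sample 43 files.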

This is a handy online calculator for determining the sample size needed to estimate the average of a population. Older textbooks like this one make great office resources (this one is super easy to follow). Better yet, it sells for about $10, making it a steal.

Excel: The easy way to pull a random sample

If the population you are reviewing is not numbered, you will need to create an index number for each file.

Excel has a built-in random number generator. Using RANDBETWEEN() we can generate our sample. Enter “1” (or the lowest possible index number) for the bottom, and the largest possible index number for the top. In this scenario, I will use 1 and 100.
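
With those bounds, the formula is simply:

=RANDBETWEEN(1,100)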


Once you have your function looking like this, you can copy the formula into other cells. I am pulling a sample of 12, so I will drag the formula down 12 cells.

[Image: the formula dragged down 12 cells]

You will notice right away that each time you change something on the sheet, the numbers change. So if you want to lock in a sample, you need to copy the cells containing the RANDBETWEEN function and paste them as “values”.

[Image: Paste Special – values]

I prefer to paste over the cells I just copied.

Here’s the sample I got:

[Image: the 12 sampled numbers]
If you come across a duplicate number, you will need to add another row or replace the duplicate with a new RANDBETWEEN formula.
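
If you want Excel to flag duplicates for you, one small sketch (assuming your sample sits in cells A1:A12) is a COUNTIF helper column, which shows TRUE next to any value that appears more than once:

=COUNTIF($A$1:$A$12, A1)>1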

Pitfall: Weak audit trail

One of the drawbacks of Excel is that the audit trail is weak. The only documentation you have is a set of numbers in a spreadsheet that you could just as easily have entered manually. If you are working on a piece of evidence that is critical, you will probably want more documentation of how you arrived at your sample.

Our next post will cover how to document a random sample using technology such as ACL and how to document it the low-tech way using a “random number table”.



Auditing and GIS: Visuals Help Engage Readers

In 2013 the Oregon Audits Division began expanding the use of Geographic Information System (GIS) software to improve the quality of our reports.  If you have not heard of GIS before, it is a tool used to create maps.

In our modern world, policy makers face a steady barrage of reports from multiple sources. No one has enough time to read every page that passes by his or her desk. The reality is that legislators and their staff skim reports. As program evaluators, we should be aware of this fact and strive to capture the attention of our audience. Striking visuals are a great way to do so. A wonderful visual can engage readers and draw them into the accompanying text of the report.

One important caveat: the quality of the text should match the quality of the visual.  You can have the best graphic in the world, but if the text is dense and lifeless, you will lose the reader.

Maps are a great example of a striking visual.  Everyone can easily relate to and interpret most maps.  Not only are maps easy for the public to understand, but legislators often focus on issues in the districts they serve.  By creating maps, you can effectively engage policy makers with the information they want.

[Image: map of Oregon unemployment rates, 2007-2009]

Maps can tell stories

The adage goes, “a picture is worth a thousand words.” I am here to tell you that maps can also tell a persuasive story. Above is an example from our audit of Oregon’s Temporary Assistance for Needy Families program.

The program is very sensitive to economic downturns. When millions of people lost their jobs during the Great Recession, they sought out public assistance in record numbers. That should not come as a surprise, since helping people during tough times is the raison d’être of these safety net programs.

By highlighting the unemployment rates, we reminded our readers of the economic conditions of the time.  This important context was essential for our readers to consider when evaluating our findings.

Working with the press

The press loves including maps and graphics with their stories. By sharing your work with reporters, you will get more attention from the press and better articles than you would otherwise. Our office recently sent the maps and other graphics we worked on to a reporter who covered our audit. The reporter loved the free materials and the audit report ended up getting a second front page article that prominently featured four of our graphics (see image below).

Remember to check in with your leadership team before working with the press. The last thing you want is for that coverage to come as a surprise.

[Image: front page article featuring the audit’s graphics]

Here are some other things to consider:

(1) Deadlines matter – hand off your materials sooner rather than later. Do your research and make sure you go directly to the people who cover the topic.

(2) Give advance notice when possible. Reporters need time to brief editors, call sources, coordinate with their graphics departments, and then write and fact check their stories. Do not expect that releasing information at 4:55pm will result in an article in the morning’s paper.

(3) Make yourself available. Provide your contact number or arrange an interview. Make sure you coordinate with communication staff and leadership so that the right people are involved.

(4) Learn the lingo. When you tell a reporter that information is “on background” the reporter will not attribute the information to you. This is the origin of all the stories with lines like “Senior officials said…”. When you provide information “off the record”, the reporter can’t use the information unless they can confirm it with another source.

(5) Establish clear expectations about when the material can be published. Providing an advance copy of your report with an “embargo” can help ensure the information is published on the right day at the right time. Reporters respect embargoes, but there are situations where other sources may leak the same material. In those circumstances, a reporter may publish their article early without your approval. This does not violate the embargo, because the source of the information is someone else.

(6) Build a working relationship with reporters between projects. It takes time to develop the trust needed for a successful relationship. Remember, it is a two-way street.

Maps can effectively highlight issues

One of the greatest benefits of maps is the ability to spot problems that would go unnoticed without a geographic perspective.

Below is a classic map. In 1854, London was facing a daunting cholera epidemic. At the time, no one knew how the disease spread, and day by day more Londoners succumbed. John Snow, a local physician, set out to identify the cause of the outbreak, believing that once the cause was identified, the city could develop a plan of action to stop it.

John examined the relationship between the location of individuals who had died and the location of their water source.  It was clear from his analysis that the Broad Street Pump in the center of the map was the source of the outbreak. John convinced local officials to disable the pump based on this information and helped quell the epidemic.

[Image: John Snow’s map of the London Cholera Epidemic, 1854]

Fraud Detection

Recently, GIS has found applications in fraud detection. It supplements other fraud tools by looking at fraud from a spatial point of view.

Consider, for example, the Medicaid program. The program has both clients and providers. It may be suspicious if you find a provider who is serving multiple clients who live hours away. Examining the average driving distance per provider might uncover fraud that would otherwise be missed. This concept extends to any program with a geographic component you can analyze.
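
Full GIS packages can compute true driving distance, but as a rough first pass you can approximate it with straight-line distance in Excel using the spherical law of cosines. A sketch, assuming the client’s latitude and longitude sit in A2 and B2, the provider’s in C2 and D2, and 3959 is the earth’s radius in miles:

=3959*ACOS(SIN(RADIANS(A2))*SIN(RADIANS(C2))+COS(RADIANS(A2))*COS(RADIANS(C2))*COS(RADIANS(D2)-RADIANS(B2)))

Sorting providers by their average distance to clients would surface the kind of outliers described below.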

[Image: Louisiana Legislative Auditor’s map of SNAP client travel distances]

We can see a real-world example of this type of fraud detection in the map above. This analysis was conducted by the Louisiana Legislative Auditor’s office in 2013. They examined the distance clients traveled to use SNAP benefits, a.k.a. food stamps, and found some suspicious patterns.

Some stores had individuals traveling five hours across the state to purchase food. These patterns build a case that fraud is being committed. GIS not only makes your findings stronger, it also helps law enforcement prosecute the offenders. The best-case scenario is providing enough evidence that the individuals plead out, thereby avoiding the cost of a trial.

Another example of fraud detection could involve looking at ambulance billings relative to distance traveled. If a company is billing for more time or mileage than the trips warrant, a GIS analysis could quickly detect it.

Other Audit Applications

San Diego has conducted some interesting audits of its motor pool and fuel purchases that utilized GIS. A 2011 audit looked at the number of take-home vehicles in the motor pool relative to their duty locations. The audit found that a number of employees were allowed to take home vehicles, resulting in extra maintenance and expense for the motor pool. In some cases, employees were driving over 50 miles per day in a government vehicle. By reducing the number of take-home vehicles, the city was poised to save half a million dollars per year.

The audit also looked at fuel purchases. Each time a car fills up at a retail gas station, the city pays full retail price rather than the wholesale price available at city-operated facilities. A number of individuals were filling up at retail locations even though city-owned facilities were nearby.

Exploring Demography

There is a world of free demographic information out there. The U.S. Census Bureau collects most of it through the decennial census and the American Community Survey.

Analyzing available demographics can add depth to your work. Suppose you are tasked with assessing economic development projects. These racial dot maps from the Cooper Center take basic census data and create engaging visuals that highlight the divided nature of many of our communities. By overlaying economic development projects, you could test whether opportunities are being enjoyed by everyone. The same principle holds for educational attainment or income. When evaluating a human services program, it is very easy to pull up poverty demographics and compare office locations relative to where clientele live. Alternatively, you can check which populations are being reached and whether certain areas are underserved.

[Image: racial dot map]

Dot maps emphasize density and grouping

Free tools

The U.S. Census Bureau is an excellent source of data to map. Recently, the Census has also put together an easy-to-use Census Data Mapper. Beyond data and interactive websites, there are a number of free analytic tools out there. A quick internet search for “open source GIS” will yield multiple products to try. QGIS and GRASS GIS are two great options. Google also has some easy-to-implement interactive web maps if you feel comfortable working with basic HTML.

Pay the price

For some, the options mentioned above will meet your needs.  For others, you may have to open your checkbook to get the right tool.

The simplest to use, and one of the least expensive options, is SocialExplorer. It takes only a few seconds to create beautiful maps using its intuitive interface. SocialExplorer even has a free version with limited functionality. One drawback is that you cannot import your own data and are limited mostly to U.S. Census surveys. It also lacks some customizable features found in other programs. Still, if you want to start somewhere, SocialExplorer is a great place to begin.

ArcGIS and Tableau are two power players in the data visualization market.  The features and analytical capabilities of their software are quite amazing.  The biggest drawback though is the cost of the licenses and the amount of staff time you need to invest to learn the software.

A more user-friendly alternative is ArcGIS Online. This cloud-based software is an add-in for Excel 2010 and 2013. ArcGIS Online is easier to use and much cheaper to purchase than the full desktop version, but you lose some of the more advanced features. For example, some of the fraud detection tools, such as calculating driving distance, have not been implemented in the online version.

In either case, the benefits of using GIS can far outweigh the costs when you leverage the tool properly.

Good luck on your journey with GIS!


Ian Green, M.Econ, CGAP, Senior Auditor with the Oregon Secretary of State Audits Division
