Are you saving enough for retirement? Maybe you have given this question some serious thought. Then again, maybe not… According to research cited in a recent GAO audit report on retirement security, a large portion of workers in the United States have done little retirement planning, and many people don’t know how much they should be saving.

According to the audit, **income replacement rates** are commonly used tools in retirement planning. These rates target the percentage of pre-retirement income that a worker would need in retirement to maintain a certain standard of living, once this lucky worker is finally free to go fishing or has enough time to read interesting blogs about audits.
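A replacement rate is retirement income expressed as a share of pre-retirement income, so the arithmetic is a single ratio. The sketch below is purely illustrative: the helper names, the $60,000 income and the 75% target are hypothetical values chosen for the example, not figures from the GAO report.

```python
def replacement_rate(retirement_income: float, pre_retirement_income: float) -> float:
    """Share of pre-retirement income replaced by retirement income."""
    if pre_retirement_income <= 0:
        raise ValueError("pre-retirement income must be positive")
    return retirement_income / pre_retirement_income


def target_retirement_income(pre_retirement_income: float, target_rate: float) -> float:
    """Annual retirement income needed to hit a chosen replacement rate."""
    return pre_retirement_income * target_rate


# Hypothetical worker: $60,000 pre-retirement income, 75% target rate (illustrative only).
needed = target_retirement_income(60_000, 0.75)
print(f"Target retirement income: ${needed:,.0f}")  # Target retirement income: $45,000
print(f"Rate check: {replacement_rate(needed, 60_000):.0%}")  # Rate check: 75%
```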

As you might have guessed from the title of the report – *Better Information on Income Replacement Rates Needed to Help Workers Plan for Retirement* – the GAO audit looked at how replacement rates are constructed by researchers and financial planners.

The audit also reviewed the information that the US Department of Labor (DOL) provides to workers about these rates (such as the Savings Fitness Worksheets) as part of its mission to promote retirement security, and it recommends ways to improve that information.

But even *more* interestingly, the audit analyzed how spending patterns vary by age, using national consumer survey data to characterize consumption once people retire. In doing so, it uses a technique that shows how the standard linear regression model (a.k.a. Ordinary Least Squares, or OLS) can accommodate relationships among variables that are not linear (spending and age, in this case).

### Household spending varies by age and income levels

To analyze spending patterns in the US, the GAO used 2013 household spending data from the Bureau of Labor Statistics’ Consumer Expenditure Survey (CES). You can access aggregate data from the survey through the BLS website, but if you ever feel ambitious (I mean, very ambitious!), you can also download the annual public-use microdata (PUMD) files for 1996 through 2014 (the BLS even links to the Analyze Survey Data for Free blog, which has a tutorial on how to start analyzing the CES using R).

The main results from this analysis are summarized in Figure 1 of the report, which shows average household expenditures for 2013 by age group and by type of expenditure.

As the graph shows, the relationship between average household spending and age follows an inverted U shape. Starting with younger households (under 25), spending increases from around $30,000 to a peak of almost $60,000 for households in the 40-49 age range. After that, spending decreases with age to a little over $30,000 for households in the 80+ age group.

[If you’re wondering how a “household” is defined here, or how households can be grouped into age groups when people of different ages live under the same roof, the notes to Figure 1 (page 9) will help with that.]

As Figure 3 of the report shows, another interesting result is that the inverted U shape (i.e., the non-linearity) of the relationship between age and spending found in the data does not apply equally to all income levels.

In fact, for households in the lowest income quartile, spending is relatively flat across all age groups. Compare that to those in the highest income quartile. For these households, average household spending goes from around $50,000 to a peak of over $100,000 for the 50-54 age group and then down to around $55,000 for those in the 80+ age group.

### Non-linear relationships in the linear regression model

Although the results above make it seem clear that household spending varies by age, the audit tests this relationship more formally using regression analysis.

As explained in the methodology section, expenditures are modeled as a function of age and other control variables such as education and race.* To account for the non-linear relationship between spending and age (the inverted U shape above), the model includes *age squared* as an ‘extra’ variable, so that the regression equation includes a quadratic term for the age variable (in bold):

$$\text{Expenditures} = B_0 + \mathbf{B_1\,Age + B_2\,Age^2} + \text{(other controls)} + \varepsilon$$

This is a common way of adding non-linearity to regression models. Without the inclusion of *age squared*, the model would be trying to fit a straight line through a set of observations that clearly imply a non-linear relationship.
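To make the ‘extra’ variable concrete, here is a minimal sketch in Python, assuming NumPy is available. It builds a design matrix with an intercept, age and age squared, then solves the OLS problem with `numpy.linalg.lstsq`. The data are hypothetical, generated from an inverted-U curve plus noise, and the other controls (education, race, etc.) are left out for brevity; `fit_quadratic_ols` is an illustrative helper name, not something from the GAO report.

```python
import numpy as np


def fit_quadratic_ols(age, spending):
    """OLS fit of spending = b0 + b1*age + b2*age**2; returns (b0, b1, b2)."""
    age = np.asarray(age, dtype=float)
    spending = np.asarray(spending, dtype=float)
    # Design matrix: intercept, age, and the 'extra' age-squared regressor.
    X = np.column_stack([np.ones_like(age), age, age**2])
    coefs, *_ = np.linalg.lstsq(X, spending, rcond=None)
    return coefs


# Hypothetical data: an inverted U peaking at age 50, plus normal noise.
rng = np.random.default_rng(0)
age = rng.uniform(20, 85, size=500)
spending = -20_000 + 3_000 * age - 30 * age**2 + rng.normal(0, 5_000, size=500)

b0, b1, b2 = fit_quadratic_ols(age, spending)
print(f"b0={b0:,.0f}  b1={b1:,.1f}  b2={b2:.2f}")
```

The model is still linear in the coefficients (age squared is just another column), which is why plain OLS can fit a curved relationship.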

This is better illustrated in the graph below, which plots *fake* observations on spending and age created in Excel (with randomness added using the RANDBETWEEN() function).

The Linear Model in the graph is a simple linear trend line added to the scatter plot and is not such a good fit (the model explains only 6.45% of the variation in spending).

Now, the quadratic model (which is an Excel 2nd-order polynomial trend line) is a much better fit for the observed data, with 60% of the variation in spending explained by the model.
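The Excel comparison can be sketched in Python as well, assuming NumPy. Random integers stand in for RANDBETWEEN(), `numpy.polyfit` plays the role of the linear and 2nd-order polynomial trend lines, and R² is computed as 1 minus the ratio of residual to total sum of squares. Because the data are freshly generated (hypothetical curve and noise range), the R² values will not match the 6.45% and 60% above; only the pattern (quadratic beats linear) should carry over.

```python
import numpy as np


def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


rng = np.random.default_rng(42)
age = rng.integers(20, 91, size=200)  # ages 20..90, like RANDBETWEEN(20, 90)
# Hypothetical inverted U plus uniform integer noise (RANDBETWEEN-style).
spending = -20_000 + 3_000 * age - 30 * age**2 + rng.integers(-15_000, 15_001, size=200)

results = {}
for deg, label in [(1, "Linear"), (2, "Quadratic")]:
    coefs = np.polyfit(age, spending, deg)  # highest power first
    results[label] = r_squared(spending, np.polyval(coefs, age))
    print(f"{label} model R^2: {results[label]:.3f}")
```

Since the linear model is nested inside the quadratic one, adding the squared term can never lower R² on the same data; the question is whether the gain is large, as it is here.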

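Finally, the quadratic model has the added benefit of allowing us to find the age at which spending is predicted to peak. This is done through the wonderful magic of partial differentiation. Using the equation above, if you take the first-order (partial) derivative with respect to age and set it equal to zero, you would obtain:

$$\frac{\partial\,\text{Expenditures}}{\partial\,\text{Age}} = B_1 + 2B_2\,\text{Age} = 0$$

Then, solving for age, we get an equation for peak age (because B2 is negative for an inverted U, this point is a maximum rather than a minimum):

$$\text{Age}^{*} = -\frac{B_1}{2B_2}$$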

In the case of the fake Excel data above, the regression results produce an estimate of 8,652.61 for the first coefficient (B1) and -72.01 for the second (B2), which yields a peak age of 8,652.61 / (2 × 72.01) ≈ 60.1 years. At this age, the model predicts a peak spending level of $123,452. As the graph shows, spending does seem to reach a maximum around age 60 and declines after that.
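The peak-age formula is easy to check numerically. This short sketch plugs in the B1 and B2 estimates reported above for the fake Excel data; `peak_age` and `marginal_effect` are hypothetical helper names. The marginal effect (the derivative B1 + 2·B2·Age) is positive before the peak and negative after it, which is exactly the inverted U.

```python
def marginal_effect(b1: float, b2: float, age: float) -> float:
    """Slope of predicted spending with respect to age: B1 + 2*B2*Age."""
    return b1 + 2 * b2 * age


def peak_age(b1: float, b2: float) -> float:
    """Age where the slope B1 + 2*B2*Age equals zero."""
    if b2 >= 0:
        raise ValueError("B2 must be negative for an inverted U (a maximum)")
    return -b1 / (2 * b2)


# Coefficients reported in the text for the fake Excel data.
B1, B2 = 8652.61, -72.01
print(f"Peak age: {peak_age(B1, B2):.1f}")  # Peak age: 60.1
for a in (40, 60, 70):
    # Positive slope = spending still rising with age; negative = falling.
    print(a, round(marginal_effect(B1, B2, a), 2))
```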

The GAO audit actually uses this method to estimate the peak age for different types of expenditures (page 15):

*The age at which expenditures peak also shows how spending patterns varied. For example, the amount a household spends on apparel was estimated to peak at age 42, which was significantly younger than for entertainment, where the amount a household spends was estimated to peak around age 52. Spending on items such as apparel and transportation may be more relevant during a household’s working years.*

In conclusion, all this gibberish about non-linearities and quadratic terms boils down to one lesson: regression models have some flexibility to accommodate relationships between variables that are not linear, and one way of doing this is by adding the square of one or more explanatory variables to the model.

* Technically, the GAO uses the natural log of expenditures as the dependent variable. This allows the estimated coefficients to be interpreted, approximately, in percentage terms. So, for example, if the estimated coefficient for education were 0.10, household expenditures would be predicted by the model to be roughly 10% higher for every additional year of education.
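Reading a log-model coefficient of 0.10 as “10%” is the usual approximation; the exact percentage change implied by a coefficient β is exp(β) − 1. A quick check, using the 0.10 education coefficient from the example as an illustrative value (`pct_change_from_log_coef` is a hypothetical helper name):

```python
import math


def pct_change_from_log_coef(beta: float) -> float:
    """Exact proportional change in expenditures for a one-unit increase in X,
    given the coefficient on X in a regression of ln(expenditures)."""
    return math.exp(beta) - 1


beta_education = 0.10  # illustrative coefficient from the example above
print(f"Approximation (100*beta): {100 * beta_education:.1f}%")  # 10.0%
print(f"Exact (exp(beta) - 1):    {100 * pct_change_from_log_coef(beta_education):.2f}%")  # 10.52%
```

For small coefficients the two readings are nearly identical; the gap widens as β grows.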