## Let’s play “Guess the smoother”!

Andre de Boer writes:

In my profession as a risk manager I encountered this graph:

I can’t figure out what kind of regression this is, would you be so kind to enlighten me?
The points represent (maturity,yield) of bonds.

My reply: That’s a fun problem, reverse-engineering a curve fit! My first guess is lowess, although it seems too flat and asymptotic on the right side of the graph to be lowess. Maybe a Gaussian process? Looks too smooth to be a spline. I guess I’ll go with my original guess, on the theory that lowess is the most accessible smoother out there, and if someone fit something much more complicated they’d make more of a big deal about it. On the other hand, if the curve is an automatic output of some software (Excel? Stata?) then it could be just about anything.

Does anyone have any ideas?
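For readers who want to test the lowess guess, here is a minimal sketch of the core idea: at each x, fit a straight line to the nearest points, weighted by a tricube kernel. The function names, the `frac` default, and the sample points are all made up for illustration, and the robustness iterations of full lowess are left out.

```python
def tricube(u):
    """Tricube kernel: weight falls smoothly from 1 at u=0 to 0 at u>=1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_at(x0, xs, ys, frac=0.5):
    """Fitted value at x0 from a locally weighted straight-line fit."""
    # Bandwidth h: distance from x0 to its k-th nearest data point.
    k = min(len(xs), max(2, round(frac * len(xs))))
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    w = [tricube(abs(x - x0) / h) if h > 0 else 0.0 for x in xs]
    sw = sum(w)
    if sw == 0.0:
        # Degenerate case (ties at the bandwidth): plain local average.
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
        return sum(near) / len(near)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    if sxx <= 1e-12 * sw * h * h:
        # Effectively one distinct x in the window: no slope to estimate.
        return ybar
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


# Hypothetical (maturity, yield) points, NOT read off the graph in the post.
maturity = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 20.0]
yields = [2.5, 2.0, 2.2, 2.8, 3.1, 3.5, 3.6, 3.9, 4.0, 4.2]
smoothed = [lowess_at(m, maturity, yields, frac=0.6) for m in maturity]
```

With a lone point far to the right, the local line near x=20 is driven by the few points inside its window, so one can compare how such a fit behaves on the right side with the curve in the graph.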

1. Jayanth Varma says:

Yield curves are often estimated after imposing a constraint that the curve is asymptotically flat. It is also common to constrain the short end to an observable short term interest rate. This means that the initial U shape and the final flat portion could be coming from the constraints. Only the rest comes from the data. That middle portion could be anything including parametric functional forms.

2. Scott says:

“Yield”, “curve”, two inflection points. I’d guess a Svensson (Extended Nelson-Siegel) yield curve model. I don’t think it’s a generic smoothing technique.
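For reference, the Svensson form adds a second hump term to the Nelson-Siegel curve. The sketch below evaluates it in plain Python; the parameter values are invented for illustration and are not fitted to the graph in the post.

```python
import math


def svensson_yield(t, beta0, beta1, beta2, beta3, tau1, tau2):
    """Zero-coupon yield at maturity t (in years) under the Svensson form."""
    if t <= 0:
        # Limit as t -> 0: the curve starts at beta0 + beta1.
        return beta0 + beta1
    x1 = t / tau1
    x2 = t / tau2
    slope = (1.0 - math.exp(-x1)) / x1           # decays from 1 to 0
    hump1 = slope - math.exp(-x1)                # first hump, scale tau1
    hump2 = (1.0 - math.exp(-x2)) / x2 - math.exp(-x2)  # second hump, scale tau2
    return beta0 + beta1 * slope + beta2 * hump1 + beta3 * hump2


# Illustrative parameters only (assumed, not estimated from any data).
params = dict(beta0=4.0, beta1=-1.0, beta2=-3.0, beta3=2.0, tau1=1.5, tau2=8.0)
for t in (0, 1, 2, 5, 10, 20, 30):
    print(f"{t:>2} years: {svensson_yield(t, **params):.2f}%")
```

Every term except `beta0` dies out as `t` grows, so the curve is asymptotically flat at `beta0` by construction, and the two hump terms are what allow a dip at the short end followed by a rise.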

3. I have no idea what kind of smoothing was used, but having worked many years in the rubber industry, I swear that looks like a vulcanization curve.

4. Michael Lew says:

I don’t know what process was used for the smooth, but the resulting curve doesn’t seem to represent the data very well. I can’t see a downward trend at the left hand end of the data. On that basis the smooth could be called a failing function…

5. Alex says:

Well, whatever created it, the curve looks pretty dodgy—one point with x>10 determines the whole shape on the RHS!

• Kaiser says:

I can’t get past this feature either. What is also a mystery to me is how come the asymptote doesn’t pass right through that one outlier point? That would surely be a better “fit”.

• Andrew says:

Kaiser:

I assume it’s minimizing some function which has the form misfit + penalty. Going through that one point would minimize misfit and increase the penalty, I’m guessing by increasing the curvature of the fitted line.

What baffles me is that the last point seems to be at x=20, but the line goes to 30 and the x-axis goes all the way to 35.
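The misfit + penalty idea can be made concrete with a toy calculation. The sketch below assumes a squared-error misfit and a squared-second-difference roughness penalty on an evenly spaced grid; that particular penalty, the weight `lam`, and the numbers are assumptions for illustration, not what the mystery software actually does.

```python
def misfit(fitted, observed):
    """Sum of squared residuals between the curve and the data."""
    return sum((f - y) ** 2 for f, y in zip(fitted, observed))


def roughness(fitted):
    """Sum of squared second differences: a discrete stand-in for curvature."""
    return sum(
        (fitted[i - 1] - 2.0 * fitted[i] + fitted[i + 1]) ** 2
        for i in range(1, len(fitted) - 1)
    )


def objective(fitted, observed, lam):
    """Penalized criterion: misfit plus lam times the roughness penalty."""
    return misfit(fitted, observed) + lam * roughness(fitted)


# Toy data: four points on a flat line and one high point at the right end.
observed = [3.0, 3.0, 3.0, 3.0, 5.0]

# Three candidate curves evaluated at the same x positions.
flat = [3.0, 3.0, 3.0, 3.0, 3.0]      # ignores the last point
halfway = [3.0, 3.0, 3.0, 3.0, 4.0]   # bends partway toward it
through = [3.0, 3.0, 3.0, 3.0, 5.0]   # passes exactly through it

lam = 2.0
scores = {
    "flat": objective(flat, observed, lam),
    "halfway": objective(halfway, observed, lam),
    "through": objective(through, observed, lam),
}
```

With `lam = 2.0` the scores are 4.0 for `flat`, 3.0 for `halfway`, and 8.0 for `through`: bending all the way to the point removes the misfit but costs more in penalty than it saves, so among these three the curve that moves only partway wins.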

• Foster Boondoggle says:

1. Almost certainly what Scott said: Svensson or some similar form (a sum of exponential-ish functions, motivated purely by having relatively few parameters and able to accommodate the typical large-scale behavior)
2. It’s typical to represent yield curves out to 30 years in the US (due to US Treasury bonds being issued out to 30 years), so even if the data only go out to 20, “30” is probably hardcoded somewhere.
3. Excel’s axis algorithm often runs past the next larger round value, so the x-axis goes to 35.

• Scott says:

1. It’s likely that the points and the curve are not directly comparable, and the curve is just there for illustration. The yield curve shows the interest rate for a payment made at each fixed future point in time, but bonds typically make a regular stream of payments (say, every six months) up until they mature. The curve is fit to those payment streams, rather than the “quoted” yields shown on the graph.

To check the fit, the actual yield for each security should be compared to the predicted yield for the security, which is a weighted sum of points along the curve, where the weights vary by security. In other words, you can’t see from this graph whether the fit of the point at 20 years is bad.

The graph you really want to see to assess the fit is on page 35 here: http://www.federalreserve.gov/pubs/feds/2006/200628/200628pap.pdf

2. I think Foster is right about the axes. It’s common to show these curves out to 30 years because of the US Treasury market, and finance guys love Excel. The Svensson yield curve model is also implemented in the YieldCurves package in R, where the only tricky part is figuring out what the payment streams of the bonds are.

I should note that this type of model tends to fit government securities, especially US government securities, very well. Something like a 3 basis point (0.03%) average absolute fitting error per security, or less. Here you have an asset class where there’s up to a 250bp spread between the lowest and highest at all maturities, so there’s probably a wide variety of asset qualities, and fitting a single curve to them seems unwise.
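To illustrate point 1 above, here is a rough sketch of how a model yield for a coupon bond can be derived from a zero-coupon curve: discount each payment at the curve's rate for that date, then solve for the single yield that reproduces the resulting price. The curve `zero_rate`, the 5% semiannual coupon, and continuous compounding are all assumptions chosen to keep the example short.

```python
import math


def zero_rate(t):
    """Hypothetical upward-sloping zero curve (continuously compounded, decimal)."""
    return 0.02 + 0.02 * (1.0 - math.exp(-t / 5.0))


def cash_flows(maturity, coupon_rate, face=100.0, freq=2):
    """(time, amount) pairs for a bond paying coupons freq times a year."""
    n = round(maturity * freq)
    flows = [(k / freq, face * coupon_rate / freq) for k in range(1, n + 1)]
    t_last, c_last = flows[-1]
    flows[-1] = (t_last, c_last + face)  # principal repaid with last coupon
    return flows


def price_from_curve(flows, curve):
    """Discount each payment at the curve's rate for its own date."""
    return sum(cf * math.exp(-curve(t) * t) for t, cf in flows)


def price_from_yield(flows, y):
    """Discount every payment at one common yield y."""
    return sum(cf * math.exp(-y * t) for t, cf in flows)


def yield_to_maturity(flows, price, lo=-0.5, hi=1.0, tol=1e-10):
    """Bisection: price_from_yield is decreasing in y."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if price_from_yield(flows, mid) > price:
            lo = mid  # yield too low, price too high
        else:
            hi = mid
    return 0.5 * (lo + hi)


flows = cash_flows(maturity=20, coupon_rate=0.05)
model_price = price_from_curve(flows, zero_rate)
model_yield = yield_to_maturity(flows, model_price)
```

Because the early coupons are discounted at the lower short-maturity rates, `model_yield` for this 20-year bond comes out below `zero_rate(20)`, which is why a plotted (maturity, quoted yield) point need not sit on the curve even when the fit is good.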

6. Alastair says:

The data:

If you hover over the chart it has the title “CollAAA” which would suggest that the “yield” points are the yield to maturity (i.e. interest rates) of a series of bonds (i.e. loans). Each yield point represents the interest rate of the bond on the vertical axis and the number of years until repayment of the loan on the horizontal axis.

The Coll in the title may refer to the fact that the bonds are “collateralised”, i.e. there is specific security the investor will receive if the bond is not repaid (this is similar to the bank being able to repossess a house if mortgage payments are not made). The title also suggests that the bonds/loans are AAA rated (this is a credit rating issued by credit rating agencies), which would imply that they have a low probability of default (the investor not getting their money back). Note that the underlying loans/bonds will be issued by a number of companies/legal entities with different businesses and default probabilities, and this AAA rating is an attempt at grouping bonds/loans with a similar probability of repayment. So we would expect there to be some difference in the yields of the bonds (as investors may have a different view on the creditworthiness of a loan than the rating agencies), although the dispersion above seems pretty extreme. Take for example the 5 year point (on the horizontal axis), where one bond has a yield/interest rate (on the vertical axis) of around 1.2% and another has an interest rate over 5%.

Reading between the lines I would suspect that these are collateralised debt obligations or other “structured products”:
http://en.wikipedia.org/wiki/Collateralized_debt_obligation
They are often very illiquid, and rating agencies have had a rather poor track record of predicting defaults for them. So the dispersion between points may be because the loans are not directly comparable in any meaningful sense even though they have the same rating. See here for an example:
http://centerforpbbefr.rutgers.edu/20thFEA/FinancePapers/Session10/Luo,%20Tang,%20and%20Wang.pdf

The curve:

As Scott says, this will be some sort of parametric interest rate curve fitted to the yields, such as the Nelson-Siegel model. This type of approach is typically used for bonds/loans which have similar features, e.g. US Treasury bills. See page 77 for examples of other fits to rated bond data in this link:
http://jeanpaul.renne.pagesperso-orange.fr/pdf/Evaluation/p06.pdf

Interest rate curve modelling is quite a developed part of modern finance. This paper gives a good introduction and details of some approaches to modelling these curves.
http://www.frbatlanta.org/filelegacydocs/erq304_fisher.pdf

There may also be other factors impacting the curve like government bond yields acting as a floor to the curve.

7. Ken Carlson says:

Stata’s fracpoly often produces curves that look like that: sharp changes in regions with a lot of data, and relatively flat where data are thin.

8. P. Saffi says:

The output looks like it came from Stata, not Excel. As someone mentioned above, it seems to have come from fitting a model for the yield curve.

9. Raymond says:

Symmetric nearest neighbour smoothing.