Tuesday, March 21, 2017

Forecasting and "As-If" Discounting

Check out the fascinating and creative new paper, "Myopia and Discounting", by Xavier Gabaix and David Laibson.

From their abstract (slightly edited):
We assume that perfectly patient agents estimate the value of future events by generating noisy, unbiased simulations and combining those signals with priors to form posteriors. These posterior expectations exhibit as-if discounting: agents make choices as if they were maximizing a stream of known utils weighted by a discount function. This as-if discount function reflects the fact that estimated utils are a combination of signals and priors, so average expectations are optimally shaded toward the mean of the prior distribution, generating behavior that partially mimics the properties of classical time preferences. When the simulation noise has variance that is linear in the event's horizon, the as-if discount function is hyperbolic.
Among other things, then, they provide a rational foundation for the "myopia" associated with hyperbolic discounting.

Note that in the Gabaix-Laibson environment everything depends on how forecast error variance behaves as a function of forecast horizon \(h\). But we know a lot about that. For example, in linear covariance-stationary \(I(0)\) environments, optimal forecast error variance grows with \(h\) at a decreasing rate, approaching the unconditional variance from below. Hence it cannot grow linearly with \(h\), and linear growth is what produces hyperbolic as-if discounting. In contrast, in non-stationary \(I(1)\) environments, optimal forecast error variance does eventually grow linearly with \(h\). In a random walk, for example, \(h\)-step-ahead optimal forecast error variance is just \(h \sigma^2\), where \(\sigma^2\) is the innovation variance. It would be fascinating to put people in \(I(1)\) vs. \(I(0)\) laboratory environments and see if hyperbolic as-if discounting arises in \(I(1)\) cases but not in \(I(0)\) cases.
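To make the contrast concrete, here is a minimal Python sketch. It rests on two assumptions that are mine, not the paper's: (1) the simulation noise variance at horizon \(h\) is taken to equal the optimal \(h\)-step-ahead forecast error variance, and (2) prior and noise are both Gaussian, so the weight on the signal is the textbook shrinkage factor \(1/(1 + \text{noise variance}/\text{prior variance})\). The function names and parameter values are purely illustrative.

```python
def ar1_forecast_error_variance(h, phi, sigma2):
    """h-step-ahead optimal forecast error variance for a stationary AR(1).

    Equals sigma2 * (1 + phi^2 + ... + phi^(2(h-1))), which rises toward the
    unconditional variance sigma2 / (1 - phi^2) at a decreasing rate.
    """
    return sigma2 * (1.0 - phi ** (2 * h)) / (1.0 - phi ** 2)


def random_walk_forecast_error_variance(h, sigma2):
    """h-step-ahead optimal forecast error variance for a random walk."""
    return h * sigma2


def as_if_discount(noise_variance, prior_variance):
    """Gaussian shrinkage weight on the signal (assumed as-if discount factor)."""
    return 1.0 / (1.0 + noise_variance / prior_variance)


# Illustrative values only (not taken from the paper).
phi, sigma2, prior_variance = 0.9, 1.0, 5.0

print("h  var_I0  D_I0  var_I1  D_I1")
for h in (1, 2, 5, 10, 50):
    v_i0 = ar1_forecast_error_variance(h, phi, sigma2)
    v_i1 = random_walk_forecast_error_variance(h, sigma2)
    print(h,
          round(v_i0, 3), round(as_if_discount(v_i0, prior_variance), 3),
          round(v_i1, 3), round(as_if_discount(v_i1, prior_variance), 3))
```

Under these assumptions the \(I(1)\) discount factor is \(1/(1 + h\sigma^2/\text{prior variance})\), which is hyperbolic in \(h\), whereas the \(I(0)\) discount factor levels off at a strictly positive floor as the forecast error variance approaches the unconditional variance.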

Sunday, March 19, 2017

ML and Metrics VIII: The New Predictive Econometric Modeling

[Click on "Machine Learning" at right for earlier "Machine Learning and Econometrics" posts.]

We econometricians need -- and have always had -- cross section and time series ("micro econometrics" and "macro/financial econometrics"), causal estimation and predictive modeling, structural and non-structural. And all continue to thrive.

But there's a new twist, happening now, making this an unusually exciting time in econometrics. Predictive econometric modeling is not only alive and well, but also blossoming anew, this time at the interface of micro-econometrics and machine learning. A fine example is the new Kleinberg, Lakkaraju, Leskovec, Ludwig and Mullainathan paper, “Human Decisions and Machine Predictions”, NBER Working Paper 23180 (February 2017).

Good predictions promote good decisions, and econometrics is ultimately about helping people to make good decisions. Hence the new developments, driven by advances in machine learning, are most welcome contributions to a long and distinguished predictive econometric modeling tradition.

Monday, March 13, 2017

ML and Metrics VII: Cross-Section Non-Linearities

[Click on "Machine Learning" at right for earlier "Machine Learning and Econometrics" posts.]

The predictive modeling perspective needs not only to be respected and embraced in econometrics (as it routinely is, notwithstanding the Angrist-Pischke revisionist agenda), but also to be enhanced by incorporating elements of statistical machine learning (ML). This is particularly true for cross-section econometrics insofar as time-series econometrics is already well ahead in that regard.  For example, although flexible non-parametric ML approaches to estimating conditional-mean functions don't add much to time-series econometrics, they may add lots to cross-section econometric regression and classification analyses, where conditional mean functions may be highly nonlinear for a variety of reasons.  Of course econometricians are well aware of traditional non-parametric issues/approaches, especially kernel and series methods, and they have made many contributions, but there's still much more to be learned from ML.
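A toy illustration of the point, in Python: when the conditional mean is strongly nonlinear, a linear regression can miss it entirely while even a crude non-parametric method picks it up. Everything here is an assumption chosen for illustration: the data are noiseless and U-shaped (\(y = x^2\)), and a hand-rolled \(k\)-nearest-neighbors average stands in for "flexible ML".

```python
def ols_fit(x, y):
    """Simple least-squares regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def knn_predict(x_train, y_train, x0, k):
    """Average y over the k training points whose x is closest to x0."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))[:k]
    return sum(y_train[i] for i in nearest) / k


def mse(actual, predicted):
    """Mean squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical cross section with a U-shaped conditional mean, E(y | x) = x^2.
x_train = [i / 10 for i in range(-20, 21)]
y_train = [xi ** 2 for xi in x_train]
x_test = [i / 10 + 0.05 for i in range(-20, 20)]
y_test = [xi ** 2 for xi in x_test]

intercept, slope = ols_fit(x_train, y_train)
ols_pred = [intercept + slope * xi for xi in x_test]
knn_pred = [knn_predict(x_train, y_train, xi, k=4) for xi in x_test]

print("OLS out-of-sample MSE:", mse(y_test, ols_pred))
print("k-NN out-of-sample MSE:", mse(y_test, knn_pred))
```

Because the design is symmetric around zero, the fitted slope is essentially zero and the linear fit reduces to a constant, so its out-of-sample MSE is far larger than that of the nearest-neighbor average.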

Monday, March 6, 2017

ML and Metrics VI: A Key Difference Between ML and TS Econometrics

[Click on "Machine Learning" at right for earlier "Machine Learning and Econometrics" posts.]


So then, statistical machine learning (ML) and time series econometrics (TS) have lots in common. But there's also an interesting difference: ML's emphasis on flexible nonparametric modeling of conditional-mean nonlinearity doesn't play a big role in TS.

Of course there are the traditional TS conditional-mean nonlinearities: smooth non-linear trends, seasonal shifts, and so on. But there's very little evidence of important conditional-mean nonlinearity in the covariance-stationary (de-trended, de-seasonalized) dynamics of most economic time series. Not that people haven't tried hard -- really hard -- to find it, with nearest neighbors, neural nets, random forests, and lots more. 

So it's no accident that things like linear autoregressions remain overwhelmingly dominant in TS. Indeed I can think of only one type of conditional-mean nonlinearity that has emerged as repeatedly important for (at least some) economic time series: Hamilton-style Markov-switching dynamics.

[Of course there's a non-linear elephant in the room:  Engle-style GARCH-type dynamics. They're tremendously important in financial econometrics, and sometimes also in macro-econometrics, but they're about conditional variances, not conditional means.]

So there are basically only two important non-linear models in TS, and only one of them speaks to conditional-mean dynamics. And crucially, they're both very tightly parametric, closely tailored to specialized features of economic and financial data.
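To see just how tightly parametric the two models are, here is a minimal simulation sketch in Python. It makes simplifying assumptions that are mine: the Markov-switching example has only a switching mean with two states (Hamilton's model also has autoregressive dynamics), the GARCH example is the plain GARCH(1,1) case with Gaussian shocks, and all parameter values are made up.

```python
import random


def simulate_markov_switching_mean(n, mu, p_stay, sigma, rng):
    """y_t = mu[s_t] + sigma * e_t, where s_t is a two-state Markov chain.

    p_stay[j] is the probability of remaining in state j next period.
    """
    state, states, y = 0, [], []
    for _ in range(n):
        if rng.random() > p_stay[state]:
            state = 1 - state  # switch regime
        states.append(state)
        y.append(mu[state] + sigma * rng.gauss(0.0, 1.0))
    return states, y


def simulate_garch11(n, omega, alpha, beta, rng):
    """r_t = sqrt(h_t) * e_t, with h_t = omega + alpha * r_{t-1}^2 + beta * h_{t-1}."""
    h = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    variances, returns = [], []
    for _ in range(n):
        r = (h ** 0.5) * rng.gauss(0.0, 1.0)
        variances.append(h)
        returns.append(r)
        h = omega + alpha * r * r + beta * h  # next period's conditional variance
    return variances, returns


rng = random.Random(0)  # fixed seed so the run is reproducible
states, y = simulate_markov_switching_mean(
    n=200, mu=(1.0, -1.0), p_stay=(0.95, 0.90), sigma=0.5, rng=rng)
variances, returns = simulate_garch11(
    n=200, omega=0.05, alpha=0.10, beta=0.85, rng=rng)

print("share of periods in state 1:", sum(states) / len(states))
print("max / min conditional variance:", max(variances) / min(variances))
```

The first model is pinned down by a handful of numbers (two means, two staying probabilities, one variance) and the second by three, which is a long way from flexible non-parametric approximation.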

Now let's step back and assemble things:

ML emphasizes approximating non-linear conditional-mean functions in highly-flexible non-parametric fashion. That turns out to be doubly unnecessary in TS: There's just not much conditional-mean non-linearity to worry about, and when there occasionally is, it's typically of a highly-specialized nature best approximated in highly-specialized (tightly-parametric) fashion.

Sunday, February 26, 2017

Machine Learning and Econometrics V: Similarities to Time Series

[Notice that I changed the title from "Machine Learning vs. Econometrics" to "Machine Learning and  Econometrics", as the two are complements, not competitors, as this post will begin to emphasize. But I've kept the numbering, so this is number five.  For others click on Machine Learning at right.]

Thanks for the overwhelming response to my last post, on Angrist-Pischke (AP).  I'll have more to say on AP a few posts from now, but first I need to set the stage.

A key observation is that statistical machine learning (ML) and time-series econometrics/statistics (TS) are largely about modeling, and they largely have the same foundational perspective. Some of the key ingredients are:

-- George Box got it right: "All models are wrong, but some are useful", so search for good approximating models, not "truth".

-- Be explicit about the loss function, that is, about what defines a "good approximating model" (e.g., 1-step-ahead out-of-sample mean-squared forecast error)

-- Respect and optimize that loss function in model selection (e.g., BIC)

-- Respect and optimize that loss function in estimation (e.g., least squares)

-- Respect and optimize that loss function in forecast construction (e.g., Wiener-Kolmogorov-Kalman)

-- Respect and optimize that loss function in forecast evaluation, comparison, and combination (e.g., Mincer-Zarnowitz evaluations, Diebold-Mariano comparisons, Granger-Ramanathan combinations).
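As a small illustration of the last ingredient, here is a Python sketch of a Diebold-Mariano-style comparison for 1-step-ahead forecasts under squared-error loss. It is a stripped-down version under stated assumptions: at horizon 1 the loss differential is treated as serially uncorrelated, so the plain sample variance is used (longer horizons would call for a long-run variance estimate instead), and the forecast errors below are hypothetical numbers.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """DM statistic for equal 1-step-ahead accuracy under squared-error loss.

    Positive values indicate that forecast A has the larger average loss.
    Under the null of equal accuracy the statistic is asymptotically N(0, 1).
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((di - d_bar) ** 2 for di in d) / n
    return d_bar / math.sqrt(var_d / n)


# Hypothetical 1-step-ahead forecast errors from two competing models.
errors_a = [1.0, -2.0, 1.5, -0.5, 2.0, -1.0]
errors_b = [0.5, -1.0, 1.0, -0.5, 1.5, -0.5]

print("DM statistic:", diebold_mariano(errors_a, errors_b))
```

The loss function enters in exactly one place (the definition of `d`), so changing from squared error to, say, absolute error means changing one line.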

So time-series econometrics should embrace ML -- and it is.  Just look at recent work like this.

Sunday, February 19, 2017

Econometrics: Angrist and Pischke are at it Again

Check out the new Angrist-Pischke (AP), "Undergraduate Econometrics Instruction: Through Our Classes, Darkly".

I guess I have no choice but to weigh in. The issues are important, and my earlier AP post, "Mostly Harmless Econometrics?", is my all-time most popular.

Basically AP want all econometrics texts to look a lot more like theirs. But their books and their new essay unfortunately miss (read: dismiss) half of econometrics.

Here's what AP get right:

(Goal G1) One of the major goals in econometrics is predicting the effects of exogenous "treatments" or "interventions" or "policies". Phrased in the language of estimation, the question is "If I intervene and give someone a certain treatment \(\partial x\), \(x \in X\), what is my minimum-MSE estimate of her \(\partial y\)?" So we are estimating the partial derivative \(\partial y / \partial x\).

AP argue the virtues and trumpet the successes of a "design-based" approach to G1. In my view they make many good points as regards G1: discontinuity designs, dif-in-dif designs, and other clever modern approaches for approximating random experiments indeed take us far beyond "Stones'-age" approaches to G1. (AP sure turn a great phrase...). And the econometric simplicity of the design-based approach is intoxicating: it's mostly just linear regression of \(y\) on \(x\) and a few cleverly-chosen control variables -- you don't need a full model -- with White-washed standard errors. Nice work if you can get it. And yes, moving forward, any good text should feature a solid chapter on those methods.

Here's what AP miss/dismiss:

(Goal G2) The other major goal in econometrics is predicting \(y\). In the language of estimation, the question is "If a new person \(i\) arrives with covariates \(X_i\), what is my minimum-MSE estimate of her \(y_i\)?" So we are estimating a conditional mean \(E(y | X)\), which in general is very different from estimating a partial derivative \(\partial y / \partial x\).

The problem with the AP paradigm is that it doesn't work for goal G2. Modeling nonlinear functional form is important, as the conditional mean function \(E(y | X)\) may be highly nonlinear in \(X\); systematic model selection is important, as it's not clear a priori what subset of \(X\) (i.e., what model) might be most important for approximating \(E(y | X)\); detecting and modeling heteroskedasticity is important (in both cross sections and time series), as it's the key to accurate interval and density prediction; detecting and modeling serial correlation is crucially important in time-series contexts, as "the past" is the key conditioning information for predicting "the future"; etc.

(Notice how often "model" and "modeling" appear in the above paragraph. That's precisely what AP dismiss, even in their abstract, which very precisely, and incorrectly, declares that "Applied econometrics ...[now prioritizes]... the estimation of specific causal effects and empirical policy analysis over general models of outcome determination".)

The AP approach to goal G2 is to ignore it, in a thinly-veiled attempt to equate econometrics exclusively with G1. Sorry guys, but no one's buying it. That's why the textbooks continue to feature G2 tools and techniques so prominently, as well they should.

Monday, February 13, 2017

Predictive Loss vs. Predictive Regret

It's interesting  to contrast two prediction paradigms.

A.  The universal statistical/econometric approach to prediction:  
Take a stand on a loss function and find/use a predictor that minimizes conditionally expected loss.  Note that this is an absolute standard.  We minimize loss, not some sort of relative loss.

B.  An alternative approach to prediction, common in certain communities/literatures:
Take a stand on a loss function and find/use a predictor that minimizes regret.  Note that this is a relative standard.  Regret minimization is relative loss minimization, i.e., striving to do no worse than others.

Approach A strikes me as natural and appropriate, whereas B strikes me as quirky and "behavioral".  That is, it seems to me that we generally want tools that perform well, not tools that merely perform no worse than others.

There's also another issue, the ex ante nature of A (standing in the present, conditioning on available information, looking forward) vs. the ex post nature of B (standing in the future, looking backward).  Approach A again seems more natural and appropriate.
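The contrast between A and B can be made concrete with a few lines of Python. This is only a sketch under assumptions of my own: loss is squared error, regret is measured against the best comparator in hindsight (one common definition; others exist), and the outcomes and forecasts are made-up numbers.

```python
def squared_error(forecast, outcome):
    """Squared-error loss for a single forecast."""
    return (forecast - outcome) ** 2


def cumulative_loss(forecasts, outcomes, loss=squared_error):
    """Absolute standard: total realized loss of one forecast sequence."""
    return sum(loss(f, y) for f, y in zip(forecasts, outcomes))


def regret(forecasts, comparators, outcomes, loss=squared_error):
    """Relative standard: own loss minus the loss of the best comparator in hindsight."""
    own = cumulative_loss(forecasts, outcomes, loss)
    best = min(cumulative_loss(c, outcomes, loss) for c in comparators)
    return own - best


# Hypothetical outcomes and forecasts.
outcomes = [1.0, 2.0, 3.0, 4.0]
good = [1.5, 1.5, 3.5, 3.5]           # tracks the outcomes reasonably well
constant_zero = [0.0, 0.0, 0.0, 0.0]  # a poor comparator
constant_three = [3.0, 3.0, 3.0, 3.0]  # a mediocre comparator
comparators = [constant_zero, constant_three]

print("loss of good forecast:", cumulative_loss(good, outcomes))
print("loss of constant-three forecast:", cumulative_loss(constant_three, outcomes))
print("regret of constant-three forecast:", regret(constant_three, comparators, outcomes))
print("regret of good forecast:", regret(good, comparators, outcomes))
```

The constant-three forecast has zero regret against this comparison class even though its loss is six times that of the good forecast. Note also that regret can only be computed after the outcomes are in, which is the ex post flavor mentioned above.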

Sunday, February 5, 2017

Data for the People

Data for the People, by Andreas Weigend, is coming out this week, or maybe it came out last week. Andreas is a leading technologist (at least that's the most accurate one-word description I can think of), and I have valued his insights ever since we were colleagues at NYU almost twenty years ago. Since then he's moved on to many other things; see http://www.weigend.com.

Andreas challenges prevailing views about data creation and "data privacy". Rather than perpetuating a romanticized view of data privacy, he argues that we need increased data transparency, combined with increased data literacy, so that people can take command of their own data. Drawing on his work with numerous firms, he proposes six "data rights":

-- The right to access data
-- The right to amend data
-- The right to blur data
-- The right to port data
-- The right to inspect data refineries
-- The right to experiment with data refineries

Check out Data for the People at http://ourdata.com.

[Acknowledgment: Parts of this post were adapted from the book's web site.]

Monday, January 30, 2017

Randomization Tests for Regime Switching

I have always been fascinated by distribution-free non-parametric tests, or randomization tests, or Monte Carlo tests -- whatever you want to call them.  (For example, I used some in ancient work like Diebold-Rudebusch 1992.)  They seem almost too good to be true: exact finite-sample tests without distributional assumptions!  They also still seem curiously underutilized in econometrics, notwithstanding, for example, the path-breaking and well-known contributions over many decades by Jean-Marie Dufour, Marc Hallin, and others.

For the latest, see the fascinating new contribution by Jean-Marie Dufour and Richard Luger. They show how to use randomization to perform simple tests of the null of linearity against the alternative of Markov switching in dynamic environments.  That's a very hard problem (nuisance parameters not identified under the null, singular information matrix under the null), and several top researchers have wrestled with it (e.g., Garcia, Hansen, Carrasco-Hu-Ploberger). Randomization delivers tests that are exact, distribution-free, and simple. And power looks pretty good too.
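For readers who haven't seen the basic idea, here is a generic Monte Carlo randomization test in Python. To be clear, this is not the Dufour-Luger procedure; it is the simplest textbook case (a two-sample test of equal means under exchangeability), with hypothetical data, included only to show the mechanics: compute a statistic, recompute it on random relabelings, and rank the observed value among the simulated ones.

```python
import random


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def randomization_p_value(sample_a, sample_b, n_draws, rng):
    """Monte Carlo p-value for the null that the two samples are exchangeable.

    The statistic is the absolute difference in means. The p-value is
    (1 + number of simulated statistics >= observed) / (n_draws + 1).
    """
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    count = 0
    for _ in range(n_draws):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            count += 1
    return (count + 1) / (n_draws + 1)


# Hypothetical samples with very different means.
sample_a = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_b = [100.0, 101.0, 102.0, 103.0, 104.0]

rng = random.Random(12345)  # fixed seed so the run is reproducible
print("p-value:", randomization_p_value(sample_a, sample_b, n_draws=999, rng=rng))
```

No distributional assumption appears anywhere in the code; the reference distribution is generated from the data themselves, which is what makes such tests so appealing.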

Monday, January 23, 2017

Bayes Stifling Creativity?

Some twenty years ago, a leading Bayesian econometrician startled me during an office visit at Penn. We were discussing Bayesian vs. frequentist approaches to a few things, when all of a sudden he declared that "There must be something about Bayesian analysis that stifles creativity.  It seems that frequentists invent all the great stuff, and Bayesians just trail behind, telling them how to do it right".

His characterization rings true in certain significant respects, which is why it's so funny.  But the intellectually interesting thing is that it doesn't have to be that way.  As Chris Sims notes in a recent communication: 
... frequentists are in the habit of inventing easily computed, intuitively appealing estimators and then deriving their properties without insisting that the method whose properties they derive is optimal.  ... Bayesians are more likely to go from model to optimal inference, [but] they don't have to, and [they] ought to work more on Bayesian analysis of methods based on conveniently calculated statistics.

See Chris' thought-provoking unpublished paper draft, "Understanding Non-Bayesians". 

[As noted on Chris' web site, he wrote that paper for the Oxford University Press Handbook of Bayesian Econometrics, but he "withheld [it] from publication there because of the Draconian copyright agreement that OUP insisted on --- forbidding posting even a late draft like this one on a personal web site."]