Christophe Gaillac

I am a postdoctoral research fellow in Economics at Nuffield College and the Department of Economics at the University of Oxford.

I hold a PhD in Economics from the Toulouse School of Economics. I am also a research fellow at the Center for Research in Economics and Statistics (CREST). From 2017 to 2019, I coordinated the statistics and mathematics teaching at ENSAE.

You can find my CV here.

My research focuses on Econometrics, Statistics, and Machine Learning.

I also have interests in Labor Economics and Political Science.

The handout for the course Machine Learning for Econometrics is available here. New version coming soon!



Keywords: Partially Linear Model; Data combination; Partial Identification; Intergenerational Mobility.

R Package: RegCombin and vignette (beta version), with Xavier D’Haultfoeuille (CREST) and Arnaud Maurel (Duke University).

We consider the identification of and inference on a partially linear model, when the outcome of interest and some of the covariates are observed in two different datasets that cannot be linked. This type of data combination problem arises very frequently in empirical microeconomics. Using recent tools from optimal transport theory, we derive a constructive characterization of the sharp identified set. We then build on this result and develop a novel inference method that exploits the specific geometric properties of the identified set. Our method performs well in finite samples, while remaining very tractable. Finally, we apply our methodology to study intergenerational income mobility over the period 1850-1930 in the United States. Our method allows us to relax the exclusion restrictions used in earlier work while delivering confidence regions that are informative.

Keywords: Recommender systems, Matching, Congestion, Optimal Transport.

We examine the properties of a recommender algorithm currently under construction at the Public Employment Service (PES) in France, before its implementation in the field. The algorithm assigns to each offer-job seeker pair a predicted "matching probability" using a very large set of covariates. We first compare this new AI algorithm with a matching tool mimicking the one currently used at the PES, based on a score measuring the "proximity" between the job seeker's profile or preference and the characteristics of the offer. We detail and discuss the trade-off between matching probability and preference score when switching from one system to the other. We also examine the issue of congestion. We show on the one hand that the AI algorithm tends to increase congestion and on the other hand that this strongly reduces its performance. We finally show that using optimal transport to derive recommendations from the matching probability matrix mitigates this problem significantly. The main lesson at this stage is that an algorithm ignoring preferences and competition in the labor market would have very limited performance, but that tweaking the algorithm to fit these dimensions substantially improves its properties, at least "in the lab".
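To make the congestion issue concrete, here is a minimal toy sketch, not the algorithm studied in the paper. It assumes a small square matrix of made-up matching probabilities and compares "recommend each job seeker their best offer" with a one-to-one assignment that maximizes the total matching probability. The assignment problem is a special case of discrete optimal transport with uniform marginals; it is solved here by brute force, which is only feasible for tiny examples. All function names and numbers are hypothetical.

```python
from itertools import permutations


def greedy_recommendations(prob):
    """For each job seeker (row), pick the offer (column) with the highest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob]


def assignment_recommendations(prob):
    """One-to-one assignment maximizing the total matching probability.

    Brute force over all permutations: fine for a toy example, not for real data.
    """
    n = len(prob)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(prob[i][perm[i]] for i in range(n)),
    )
    return list(best)


def congestion(recommendations):
    """Largest number of job seekers sent to the same offer."""
    return max(recommendations.count(j) for j in set(recommendations))


# Hypothetical matching probabilities: rows are job seekers, columns are offers.
prob = [
    [0.9, 0.5, 0.1],
    [0.8, 0.6, 0.2],
    [0.7, 0.3, 0.4],
]

greedy = greedy_recommendations(prob)
assigned = assignment_recommendations(prob)
print("greedy:", greedy, "congestion:", congestion(greedy))
print("assignment:", assigned, "congestion:", congestion(assigned))
```

In this toy matrix every job seeker's best offer is the same one, so the greedy rule sends everyone to it, while the assignment spreads job seekers across offers.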

Keywords: Random Coefficients, Quasi-analyticity, Deconvolution, Identification.

Summary: This paper studies point identification of the distribution of the coefficients in some random coefficients models with exogenous regressors when their support is a proper subset, possibly discrete but countable. We exhibit trade-offs between restrictions on the distribution of the random coefficients and the support of the regressors. We consider linear models including those with nonlinear transforms of a baseline regressor, with an infinite number of regressors and deconvolution, the binary choice model, and panel data models such as single-index panel data models and an extension of the Kotlarski lemma.


Keywords: Rational expectations, Test, Subjective expectations, Data combination.

R Package: RationalExp and vignette, with Xavier D’Haultfoeuille (CREST) and Arnaud Maurel (Duke University).

Summary: In this paper, we build a new test of rational expectations based on the marginal distributions of realizations and subjective beliefs. This test is widely applicable, including in the common situation where realizations and beliefs are observed in two different datasets that cannot be matched. We show that whether one can rationalize rational expectations is equivalent to the distribution of realizations being a mean-preserving spread of the distribution of beliefs. The null hypothesis can then be rewritten as a system of many moment inequality and equality constraints, for which tests have been recently developed in the literature. The test is robust to measurement errors under some restrictions and can be extended to account for aggregate shocks. Finally, we apply our methodology to test for rational expectations about future earnings. While individuals tend to be right on average about their future earnings, our test strongly rejects rational expectations.
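The mean-preserving spread condition can be illustrated with a small numerical check. This is only a descriptive sketch on made-up samples, not the statistical test of the paper and not the interface of the RationalExp package: it involves no inference and ignores sampling error. It uses the standard characterization that Y is a mean-preserving spread of psi when both have the same mean and E[(Y - t)^+] >= E[(psi - t)^+] for every threshold t. For empirical distributions the difference is piecewise linear in t, so it is enough to check the thresholds in the pooled sample. All names are hypothetical.

```python
def mean(values):
    """Plain sample mean."""
    values = list(values)
    return sum(values) / len(values)


def positive_part_mean(sample, t):
    """Empirical counterpart of E[(X - t)^+]."""
    return mean(max(x - t, 0.0) for x in sample)


def is_mean_preserving_spread(realizations, beliefs, tol=1e-9):
    """Check whether the empirical distribution of realizations is a
    mean-preserving spread of the empirical distribution of beliefs.

    Descriptive check only: no sampling error is taken into account.
    """
    if abs(mean(realizations) - mean(beliefs)) > tol:
        return False
    # Kinks of the piecewise-linear difference lie at the pooled sample points.
    thresholds = sorted(set(realizations) | set(beliefs))
    return all(
        positive_part_mean(realizations, t) >= positive_part_mean(beliefs, t) - tol
        for t in thresholds
    )


# Made-up data: each realization equals a belief plus or minus 1.
beliefs = [1.0, 1.0, 3.0, 3.0]
realizations = [0.0, 2.0, 2.0, 4.0]
print(is_mean_preserving_spread(realizations, beliefs))
print(is_mean_preserving_spread(beliefs, realizations))
```

The first call corresponds to data compatible with rational expectations in this descriptive sense; the second swaps the roles, so beliefs are more dispersed than realizations and the condition fails.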

R Package: RationalExp. This package implements a test of the rational expectations hypothesis based on the marginal distributions of realizations and subjective beliefs. The package also computes the estimator of the minimal deviations from rational expectations that can be rationalized by the data. R and the package RationalExp are open-source software projects and can be freely downloaded from CRAN.

Keywords: Adaptation, Ill-posed Inverse Problem, Minimax, Random Coefficients.

R Package: RandomCoefficients and vignette, with Eric Gautier (TSE).

Summary: We consider a linear model where the coefficients - intercept and slopes - are random with a distribution in a nonparametric class and independent from the regressors. The main drawback of this model is that identification usually requires the regressors to have a support which is the whole space. This is rarely satisfied in practice. Rather, in this paper, the regressors can have a support which is a proper subset. This is possible by assuming that the slopes do not have heavy tails. Lower bounds on the supremum risk for the estimation of the joint density of the random coefficients are derived for this model and a related white noise model. We present an estimator, its rates of convergence, and a data-driven rule which delivers adaptive estimators.

R Package: RandomCoefficients. This package implements the estimator proposed in Gaillac and Gautier (2019). The estimator is based on prolate spheroidal wave functions, which RandomCoefficients computes efficiently following Osipov, Rokhlin, and Xiao (2013). The package also provides a parallel implementation of the estimator.

Keywords: Analytic continuation, Nonbandlimited functions, Heavy tails, Uniform estimates, Extrapolation, Singular value decomposition, Truncated Fourier transform, Singular Sturm Liouville Equations, Superresolution.

Summary: The Fourier transform truncated on [-c,c] is usually analyzed when acting on L2(-1/b,1/b) and its right-singular vectors are the prolate spheroidal wave functions. This paper considers the operator acting on the larger space L2(exp(b|.|)) on which it remains injective. We give nonasymptotic upper and lower bounds on the singular values with similar qualitative behavior in m (the index), b, and c. The lower bounds are used to obtain rates of convergence for stable analytic continuation of possibly nonbandlimited functions whose Fourier transform belongs to L2(exp(b|.|)). We also derive bounds on the sup-norm of the singular functions. Finally, we propose a numerical method to compute the SVD and apply it to stable analytic continuation when the function is observed with error on an interval.


Jérémy L'hour and I will teach a class on High-dimensional econometrics at the Fime Lab Summer School on Big Data & Finance, 12-16 June 2023.

We are also writing a textbook on machine learning methods for econometrics, which is inspired by the "Machine Learning for Econometrics" lecture notes that are available below. The French version is complete and will be released in 2023.

You can now download a copy of the handout for the course Machine Learning for Econometrics here. The course is taught at ENSAE Paris and Institut Polytechnique de Paris, jointly with Jérémy L'hour (CFM, CREST) and Bruno Crépon (CREST).

Keywords: High-Dimension, Variable Selection, Post-Selection Inference, Methodology, Endogeneity, Synthetic Control Method, Heterogeneous Treatment Effects, Policy Evaluation, Text Data.

This course covers recent applications of high-dimensional statistics and machine learning to econometrics, including variable selection, inference with high-dimensional nuisance parameters in different settings, heterogeneity, networks and text data. The focus will be on policy evaluation problems. Recent advances in causal inference such as the synthetic control method will be reviewed.

The goal of the course is to give insights about these new methods, their benefits and their limitations. It will mostly benefit students who are highly curious about recent advances in econometrics, whether they want to study theory or use them in applied work. Students are expected to be familiar with Econometrics 2 (2A) and Statistical Learning (3A).

In 2020, the outline was:

  1. Introduction

  2. High-Dimension, Variable Selection and Post-Selection Inference

  3. Methodology: Using Machine Learning Tools in Econometrics

  4. High-Dimension and Endogeneity

  5. The Synthetic Control Method

  6. Machine Learning Methods for Heterogeneous Treatment Effects

  7. Prediction Policy Problems

  8. Fairness and optimal treatment allocation


Install it by typing: ssc install mfelogit

Keywords: Fixed effects logit models, Panel Data, Partial Identification.

mfelogit implements the estimators of the sharp bounds on the AME and the related confidence intervals on the AME and ATE from Davezies et al. (DDL hereafter). It also implements the second method proposed in DDL, which is faster to compute but may result in larger confidence intervals. When the covariate is binary, the command computes the ATE; otherwise it computes the AME.

Keywords: Fixed effects logit models, Panel Data, Partial Identification.

This package implements the estimators of the sharp bounds on the AME and the related confidence intervals on the AME and ATE from Davezies et al. (DDL hereafter). It also implements the second method proposed in DDL, which is faster to compute but may result in larger confidence intervals. When the covariate is binary, the command computes the ATE; otherwise it computes the AME.



Machine Learning for Econometrics (2018, 2019, 2020), ENSAE Paris and Institut Polytechnique de Paris (previously "High-Dimensional Econometrics"), joint with Jérémy L'hour (INSEE, CREST) and Bruno Crépon (CREST).

Mathematics for Economists (Analysis and Optimisation) (2018), Master in Economics, Paris-Saclay University, PhD track

Mathematics for Economists (2017, 2018), Sciences-Po Paris, PhD track

Mathematics for Economists (2018), ENSAE Paris, Specialised Master

Algebra and Python (2018), HEC Paris and ENSAE Paris, Undergraduate

TA sessions:

Advanced Econometrics (2021-2023), University of Oxford, Anders Kock and Martin Weidner

Statistics 1 (2017-2018), ENSAE Paris, Nicolas Chopin

Numerical Analysis (2016-2018), ENSAE Paris, Cristina Butucea

Econometrics 2 (2017-2018), ENSAE Paris, Xavier D’Haultfoeuille

Simulations and Monte-Carlo (2018), ENSAE Paris, Nicolas Chopin

Time Series Analysis (2015-2017), ENSAE Paris, Christian Franck