Christophe Gaillac

I am a postdoctoral research fellow in Economics at Nuffield College and the Department of Economics at the University of Oxford.

I hold a PhD in Economics from the Toulouse School of Economics. I am also a research fellow at the Center for Research in Economics and Statistics (CREST). From 2017 to 2019, I coordinated the teaching of statistics and mathematics at ENSAE.

You can find my CV here.  

My research focuses on Econometrics, Statistics, and Machine Learning.

I also have interests in Labor Economics and Political Science.

The handout for the course Machine Learning for Econometrics is available here. New version coming soon!


              WORKING PAPERS

Keywords: Partially Linear Model; Data combination; Partial Identification; Intergenerational Mobility.

New R Package available on CRAN: RegCombin and vignette with several simulated and real examples, with Xavier D’Haultfoeuille (CREST) and Arnaud Maurel (Duke University).

We study partially linear models when the outcome of interest and some of the covariates are observed in two different datasets that cannot be linked. This type of data combination problem arises very frequently in empirical microeconomics. Using recent tools from optimal transport theory, we derive a constructive characterization of the sharp identified set. We then build on this result and develop a novel inference method that exploits the specific geometric properties of the identified set. Our method performs well in finite samples, while remaining very tractable. We apply our approach to study intergenerational income mobility over the period 1850-1930 in the United States. Our method allows us to relax the exclusion restrictions used in earlier work, while delivering confidence regions that are informative.

Keywords: Job Recommender Systems, Machine Learning, Two-sided Market, Congestion, Optimal Transport.

This paper questions the design of job recommender systems (RS). A direct application of sophisticated Machine Learning (ML) algorithms to build recommendations, such as identifying the offers most likely to lead to a job from the prediction of successful matches, does not necessarily improve the situation of job seekers. This is because the objectives of these recommendations do not align with those of the job seekers, and because the recommendations are usually generated independently of each other, without taking competition into account. Using a theoretical model of a two-sided market with an application step, we show that the ML tools from which the recommendations are directly derived can be more usefully mobilized to identify quantities that job seekers might have difficulty accessing. Our empirical analysis confirms these insights using the RS designed within a long-term project we are conducting with the French Public Employment Service (Pôle Emploi), which leverages rich and detailed data on applicants, firms, and past job searches. It illustrates that RS based solely on the chances of being hired or on the utility of the jobs are dominated by ones that mix the two dimensions to come closer to the expected utility. We also discuss how RS can avoid increasing congestion by using a collective objective rather than an individual one to generate the recommendations, using optimal transport to make it tractable.

Keywords: Random Coefficients, Quasi-analyticity, Deconvolution, Identification.

Summary: This paper studies point identification of the distribution of the coefficients in some random coefficients models with exogenous regressors when their support is a proper subset, possibly discrete but countable. We exhibit trade-offs between restrictions on the distribution of the random coefficients and the support of the regressors. We consider linear models including those with nonlinear transforms of a baseline regressor, with an infinite number of regressors and deconvolution, the binary choice model, and panel data models such as single-index panel data models and an extension of the Kotlarski lemma. 


Keywords: Rational expectations, Test, Subjective expectations, Data combination.

R Package available on CRAN: RationalExp and vignette, with Xavier D’Haultfoeuille (CREST) and Arnaud Maurel (Duke University).

Summary: In this paper, we build a new test of rational expectations based on the marginal distributions of realizations and subjective beliefs. This test is widely applicable, including in the common situation where realizations and beliefs are observed in two different datasets that cannot be matched. We show that the data can be rationalized by rational expectations if and only if the distribution of realizations is a mean-preserving spread of the distribution of beliefs. The null hypothesis can then be rewritten as a system of many moment inequality and equality constraints, for which tests have been recently developed in the literature. The test is robust to measurement errors under some restrictions and can be extended to account for aggregate shocks. Finally, we apply our methodology to test for rational expectations about future earnings. While individuals tend to be right on average about their future earnings, our test strongly rejects rational expectations.
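To make the mean-preserving spread condition concrete, here is a minimal Python sketch. It is not the test implemented in the RationalExp package and it ignores sampling uncertainty: it only checks, on the empirical distributions, that the two means coincide and that E[(t - Y)^+] >= E[(t - psi)^+] on a grid of thresholds t, where Y denotes realizations and psi denotes beliefs. The function names, the grid size, and the toy data are illustrative assumptions.

```python
import numpy as np

def mean_positive_part(sample, grid):
    """Empirical E[(t - X)^+] for every threshold t in grid."""
    sample = np.asarray(sample, dtype=float)
    return np.maximum(grid[:, None] - sample[None, :], 0.0).mean(axis=1)

def mean_preserving_spread_gaps(realizations, beliefs, n_grid=200):
    """Return (difference in means, smallest gap over the grid).

    Realizations form a mean-preserving spread of beliefs when the first
    value is zero and the second is non-negative (population statement;
    this sketch does not account for sampling noise).
    """
    y = np.asarray(realizations, dtype=float)
    psi = np.asarray(beliefs, dtype=float)
    grid = np.linspace(min(y.min(), psi.min()), max(y.max(), psi.max()), n_grid)
    gap = mean_positive_part(y, grid) - mean_positive_part(psi, grid)
    return y.mean() - psi.mean(), gap.min()

# Toy data (hypothetical): beliefs are all 0, realizations are 0 plus mean-zero noise.
beliefs = np.zeros(4)
realizations = np.array([-1.0, 1.0, -1.0, 1.0])
mean_diff, min_gap = mean_preserving_spread_gaps(realizations, beliefs)
print(mean_diff, min_gap)
```

Swapping the two arguments in the toy example makes the smallest gap clearly negative, which is the kind of violation the moment inequalities are designed to detect.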

R Package: RationalExp. This package implements a test of the rational expectations hypothesis based on the marginal distributions of realizations and subjective beliefs. The package also computes the estimator of the minimal deviations from rational expectations that can be rationalized by the data. R and the package RationalExp are open-source software projects and can be freely downloaded from CRAN.

Keywords: Adaptation, Ill-posed Inverse Problem, Minimax, Random Coefficients.

R Package archive available on CRAN: RandomCoefficients and vignette, with Eric Gautier (TSE).

Summary: We consider a linear model where the coefficients (intercept and slopes) are random with a distribution in a nonparametric class and independent from the regressors. The main drawback of this model is that identification usually requires the regressors to have a support which is the whole space. This is rarely satisfied in practice. Rather, in this paper, the regressors can have a support which is a proper subset. This is possible by assuming that the slopes do not have heavy tails. Lower bounds on the supremum risk for the estimation of the joint density of the random coefficients are derived for this model and a related white noise model. We present an estimator, its rates of convergence, and a data-driven rule which delivers adaptive estimators.

R Package: RandomCoefficients. This package implements the estimator proposed in Gaillac and Gautier (2019). The estimator is based on prolate spheroidal wave functions, which RandomCoefficients computes efficiently following Osipov, Rokhlin, and Xiao (2013). The package also provides a parallel implementation of the estimator.

Keywords: Analytic continuation, Nonbandlimited functions, Heavy tails, Uniform estimates, Extrapolation, Singular value decomposition, Truncated Fourier transform, Singular Sturm Liouville Equations, Superresolution. 

Summary: The Fourier transform truncated on [-c,c] is usually analyzed when acting on L2(-1/b,1/b) and its right-singular vectors are the prolate spheroidal wave functions. This paper considers the operator acting on the larger space L2(exp(b|.|)) on which it remains injective. We give nonasymptotic upper and lower bounds on the singular values with similar qualitative behavior in m (the index), b, and c. The lower bounds are used to obtain rates of convergence for stable analytic continuation of possibly nonbandlimited functions whose Fourier transform belongs to L2(exp(b|.|)). We also derive bounds on the sup-norm of the singular functions. Finally, we propose a numerical method to compute the SVD and apply it to stable analytic continuation when the function is observed with error on an interval.  
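As a rough numerical illustration of the classical setting mentioned at the start of this summary (not the numerical method proposed in the paper, and not the weighted space L2(exp(b|.|))), the following Python sketch discretizes the operator f -> int_{-1}^{1} exp(i c x t) f(t) dt on L2(-1,1) with Gauss-Legendre quadrature and computes its singular values. The normalization of the operator, the function name, and the number of nodes are assumptions made for the example.

```python
import numpy as np

def truncated_fourier_singular_values(c, n_nodes=64):
    """Approximate singular values of f -> int_{-1}^{1} exp(i c x t) f(t) dt
    on L2(-1, 1), using a Gauss-Legendre discretization (illustrative only)."""
    # Quadrature nodes and (positive) weights on [-1, 1].
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    sqrt_w = np.sqrt(w)
    # Kernel evaluated on the tensor grid of nodes.
    kernel = np.exp(1j * c * np.outer(x, x))
    # Symmetric weighting so that the matrix SVD mimics the L2 operator SVD.
    matrix = sqrt_w[:, None] * kernel * sqrt_w[None, :]
    return np.linalg.svd(matrix, compute_uv=False)

c = 10.0
singular_values = truncated_fourier_singular_values(c)
# The first few values are close to sqrt(2*pi/c); later ones decay very quickly.
print(singular_values[:12])
```

With this normalization, roughly the first 2c/pi singular values stay near sqrt(2*pi/c) before dropping sharply, which is why inverting the operator is severely ill-posed.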


              HANDOUTS & BOOKS

Jérémy L'hour and I will teach a class on High-dimensional econometrics at the Fime Lab Summer School on Big Data & Finance, 12-16 June 2023.

We are also writing a textbook on machine learning methods for econometrics, inspired by the "Machine Learning for Econometrics" lecture notes available below. The French version is complete and will be released in 2023.

You can now download a copy of the handout for the course Machine Learning for Econometrics here. The course is taught at ENSAE Paris and Institut Polytechnique de Paris, jointly with Jérémy L'hour (CFM, CREST) and Bruno Crépon (CREST).

Keywords: High-Dimension, Variable Selection, Post-Selection Inference, Methodology, Endogeneity, Synthetic Control Method, Heterogeneous Treatment Effects, Policy Evaluation, Text Data

This course covers recent applications of high-dimensional statistics and machine learning to econometrics, including variable selection, inference with high-dimensional nuisance parameters in different settings, heterogeneity, networks and text data. The focus will be on policy evaluation problems. Recent advances in causal inference such as the synthetic controls method will be reviewed.

The goal of the course is to give insights about these new methods, their benefits and their limitations. It will mostly benefit students who are highly curious about recent advances in econometrics, whether they want to study theory or use them in applied work. Students are expected to be familiar with Econometrics 2 (2A) and Statistical Learning (3A). 

In 2020, the outline was: 


Install it by typing: ssc install mfelogit

Keywords: Fixed effects logit models, Panel Data, Partial Identification. 

mfelogit implements the estimators of the sharp bounds on the average marginal effect (AME) and the related confidence intervals on the AME and the average treatment effect (ATE) from Davezies et al. (DDL hereafter). It also implements the second method proposed in DDL, which is faster to compute but may result in larger confidence intervals. When the covariate is binary, the command computes the ATE; otherwise it computes the AME.

Keywords: Fixed effects logit models, Panel Data, Partial Identification. 

This package implements the estimators of the sharp bounds on the AME and the related confidence intervals on the AME and ATE from Davezies et al. (DDL hereafter). It also implements the second method proposed in DDL, which is faster to compute but may result in larger confidence intervals. When the covariate is binary, the command computes the ATE; otherwise it computes the AME.



Machine Learning for Econometrics (2018, 2019, 2020), ENSAE Paris and Institut Polytechnique de Paris (previously "High-Dimensional Econometrics"), joint with Jérémy L'hour (INSEE, CREST) and Bruno Crépon (CREST).

Mathematics for Economists (Analysis and Optimisation) (2018), Master in Economics, Paris-Saclay University, PhD track

Mathematics for Economists (2017, 2018), Sciences-Po Paris, PhD track

Mathematics for Economists (2018), ENSAE Paris, Specialised Master 

Algebra and Python (2018), HEC Paris and ENSAE Paris, Undergraduate.

TA sessions:

Advanced Econometrics (2021-2023), University of Oxford, Anders Kock and Martin Weidner

Statistics 1 (2017-2018), ENSAE Paris, Nicolas Chopin

Numerical Analysis (2016-2018), ENSAE Paris, Cristina Butucea

Econometrics 2 (2017-2018), ENSAE Paris, Xavier D’Haultfoeuille

Simulations and Monte-Carlo (2018), ENSAE Paris, Nicolas Chopin 

Time Series Analysis (2015-2017), ENSAE Paris, Christian Franck