Lecture 08

Causal Inference & Machine Learning in Finance

Financial Machine Learning · Lecture 8

Outline


Part 1 · Theoretical Framework of Causal Inference

  • Importance: Integrating machine learning with causal inference to enable credible estimation of treatment effects in high‑dimensional, nonlinear economic and financial data — improving bias reduction, uncovering heterogeneous effects, and supporting robust policy and decision analysis.

  • Focus:

    • Potential outcomes framework and identification conditions (SUTVA, ignorability, overlap).
    • Classical causal methods: regression adjustment, propensity‑score matching, IV/2SLS, DiD, RDD.
    • ML‑augmented causal tools: Lasso (variable selection), post‑selection inference, double‑selection; Double/Debiased Machine Learning (cross‑fitting, orthogonal scores); Causal Trees and Causal Forests for heterogeneous treatment effects; TMLE for targeted estimation.
    • Practical concerns: model selection, sparsity, diagnostic checks (balance, instrument validity, parallel trends), and sensitivity analyses.

Potential Outcomes Model

  • Let:

    • Y_i(0): potential outcome without treatment.
    • Y_i(1): potential outcome with treatment.
  • Observed outcome: Y_i = D_i Y_i(1) + (1 − D_i) Y_i(0), where D_i ∈ {0, 1} is the treatment indicator.


Potential Outcomes Model: Treatment Effects

  • Individual Treatment Effect: Δ_i = Y_i(1) − Y_i(0)

    • Δ_i is unobservable (only one potential outcome is realized per individual), so it cannot be estimated directly.
  • Average Treatment Effect (ATE): τ = E[Y_i(1) − Y_i(0)]

  • Average Treatment Effect on Treated (ATT): τ_ATT = E[Y_i(1) − Y_i(0) | D_i = 1]


Randomized Controlled Trials (RCTs)

  • RCTs: Random assignment to treatment/control groups.
  • Control group serves as a baseline for measuring treatment effects.
  • Aims to facilitate comparison with counterfactual scenarios (outcomes without treatment).

RCTs: Importance of Random Assignment

  • Reduces selection bias, ensuring comparability between groups.
  • Helps in establishing credible causal relationships by controlling external factors.

RCTs: Assumption (SUTVA)

  • The Stable Unit Treatment Value Assumption (SUTVA): the treatment status of one individual does not affect the outcomes of the other individuals.
  • An alternative model would be to assume that Y_i = Y_i(D_1, …, D_n), i.e., an individual's outcome depends on the treatment status of other individuals.

RCTs: Assumption (Random assignment)

The assumption (Y_i(0), Y_i(1)) ⊥ D_i expresses the independence of treatment with respect to potential outcomes, which is credible when treatment is randomly assigned, without reference to the individual's potential outcomes.

By comparing the average outcomes between the treated group and the control group, we have:

E[Y_i | D_i = 1] − E[Y_i | D_i = 0] = E[Y_i(1)] − E[Y_i(0)] = τ,

where τ denotes the (average) treatment effect.


RCTs: The difference-in-means (DM) estimator

Using the central limit theorem, we obtain that the difference-in-means estimator τ̂_DM = Ȳ₁ − Ȳ₀ is asymptotically normal:

√n (τ̂_DM − τ) →d N(0, V_DM),

where

V_DM = Var(Y_i(1)) / P(D_i = 1) + Var(Y_i(0)) / P(D_i = 0),

and we can derive confidence intervals on the ATE.
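As a sketch, the difference-in-means estimate and its normal-approximation confidence interval can be computed on simulated RCT data (the data-generating process, effect size, and sample size below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 2.0

# Simulated RCT: random assignment, potential outcomes
d = rng.integers(0, 2, size=n)         # treatment indicator D_i
y0 = rng.normal(0.0, 1.0, size=n)      # Y(0)
y1 = y0 + true_effect                  # Y(1) = Y(0) + tau
y = np.where(d == 1, y1, y0)           # observed outcome

# Difference-in-means estimate of the ATE
y_t, y_c = y[d == 1], y[d == 0]
tau_hat = y_t.mean() - y_c.mean()

# Plug-in asymptotic standard error: Var(Y|D=1)/n1 + Var(Y|D=0)/n0
se = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))
ci = (tau_hat - 1.96 * se, tau_hat + 1.96 * se)
print(f"ATE estimate: {tau_hat:.3f}, 95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Across repeated simulations, an interval built this way covers the true effect roughly 95% of the time, as the CLT result above suggests.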


RCTs: Estimating Treatment Effects

Assumptions:

  1. SUTVA: The treatment effect on one individual does not affect others.
  2. Random Assignment: Treatment assignment is independent of potential outcomes.

Estimation Approach:

  • Compare sample means: the difference-in-means estimator τ̂_DM = Ȳ₁ − Ȳ₀, or equivalently an OLS regression of Y_i on D_i.


RCTs: OLS Estimation of Treatment Effect

  • Linear Model: Y_i = α + τ D_i + ε_i

  • OLS estimation of τ recovers the average treatment effect under SUTVA and random assignment; τ̂_OLS coincides with the difference-in-means estimator.

Conditional Independence & Propensity Score

Key Assumption

  • Conditional Independence (unconfoundedness): (Y_i(0), Y_i(1)) ⊥ D_i | X_i

Propensity Score

  • The propensity score e(x) = P(D_i = 1 | X_i = x) is the probability of treatment given observable characteristics.

  • The propensity score allows us to reduce the dimension of the problem, since it satisfies:

    (Y_i(0), Y_i(1)) ⊥ D_i | e(X_i)

  • That is, conditional on the propensity score e(X_i), treatment assignment is independent of the potential outcomes.

Characterizing the ATE: Using Inverse-Propensity Weighting (IPW)

  • First step: estimate the propensity score e(·) using non-parametric regression.
    • When X_i is discrete and takes values x_1, …, x_K, a natural estimator is ê(x_k) = Σ_i D_i 1{X_i = x_k} / Σ_i 1{X_i = x_k}.
    • Then, we use the IPW estimator:

      τ̂_IPW = (1/n) Σ_i [ D_i Y_i / ê(X_i) − (1 − D_i) Y_i / (1 − ê(X_i)) ]

    • Define the oracle estimator τ̂*, obtained assuming the propensity score is known:

      τ̂* = (1/n) Σ_i [ D_i Y_i / e(X_i) − (1 − D_i) Y_i / (1 − e(X_i)) ]

    • The analysis of the asymptotic properties of τ̂_IPW proceeds by decomposing

      τ̂_IPW − τ = (τ̂_IPW − τ̂*) + (τ̂* − τ) = (A) + (B).


Notice that term (B) vanishes asymptotically since the oracle estimator τ̂* is unbiased: E[τ̂*] = τ.

Assumption (Overlap condition). There exists η > 0 such that, for all x, η ≤ e(x) ≤ 1 − η.

Assuming bounded outcome variables, |Y_i| ≤ M, we obtain a control in probability for the distance to the oracle estimator (A):

|τ̂_IPW − τ̂*| = O_P(‖ê − e‖_∞).

Thus, under the sufficient condition of having a convergent estimator of the propensity score in sup-norm, ‖ê − e‖_∞ →P 0, the estimator τ̂_IPW is consistent.
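The two-step procedure can be sketched on simulated data with a discrete confounder: estimate e(x) by cell frequencies, then plug into the IPW formula (the data-generating process and cell probabilities below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Confounded assignment: treatment probability depends on a discrete X
x = rng.integers(0, 3, size=n)                # discrete covariate in {0, 1, 2}
e_true = np.array([0.2, 0.5, 0.8])[x]         # true propensity score e(x)
d = (rng.random(n) < e_true).astype(int)
y = 0.5 * x + 1.0 * d + rng.normal(0, 1, n)   # true ATE = 1.0, confounded by x

# First step: estimate e(x) by the empirical treated share in each cell of X
e_hat = np.empty(n)
for k in range(3):
    mask = x == k
    e_hat[mask] = d[mask].mean()

# IPW estimator of the ATE
tau_ipw = np.mean(d * y / e_hat - (1 - d) * y / (1 - e_hat))

# Naive difference in means is biased upward here (treated units have larger x)
tau_naive = y[d == 1].mean() - y[d == 0].mean()
```

The naive comparison mixes the treatment effect with the confounding through x, while weighting by the estimated score recovers the ATE.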


Characterizing the ATE: Regression Function Differences

For each d ∈ {0, 1}, define the regression function:

μ_d(x) = E[Y_i | D_i = d, X_i = x],

where μ_d(x) = E[Y_i(d) | X_i = x] under conditional independence, and thus

τ = E[μ_1(X_i) − μ_0(X_i)].

One way to obtain a consistent estimator of the ATE with this characterization would be to use consistent non-parametric estimators μ̂_0, μ̂_1 and then average over the observations:

τ̂_RA = (1/n) Σ_i [ μ̂_1(X_i) − μ̂_0(X_i) ].


Efficient estimation of treatment effect

The augmented inverse propensity weighting (AIPW) estimator, defined by Robins et al. (1994) and Hahn (1998), is designed to correct the bias in τ̂_IPW due to the estimation of e. It is given by

τ̂_AIPW = (1/n) Σ_i [ μ̂_1(X_i) − μ̂_0(X_i) + D_i (Y_i − μ̂_1(X_i)) / ê(X_i) − (1 − D_i) (Y_i − μ̂_0(X_i)) / (1 − ê(X_i)) ].

This AIPW estimator has two important properties:

  • It achieves the semiparametric efficiency bound (Robins et al., 1994; Hahn, 1998), with asymptotic variance

    V* = E[ (μ_1(X_i) − μ_0(X_i) − τ)² + σ_1²(X_i)/e(X_i) + σ_0²(X_i)/(1 − e(X_i)) ],

    • where σ_d²(x) = Var(Y_i(d) | X_i = x) for d ∈ {0, 1}.
  • It is doubly robust, meaning that it is consistent either if the estimators μ̂_0, μ̂_1 of the regression functions are consistent, or if the propensity-score estimator ê is consistent.
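A minimal sketch of the AIPW estimator on a simulated discrete design, with cell means as the regression estimates μ̂_d and cell frequencies as ê (the data-generating process is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x = rng.integers(0, 3, size=n)                # discrete covariate
e_true = np.array([0.2, 0.5, 0.8])[x]
d = (rng.random(n) < e_true).astype(int)
y = 0.5 * x + 1.0 * d + rng.normal(0, 1, n)   # true ATE = 1.0

# Nuisance estimates: cell means for mu_d(x), treated shares for e(x)
e_hat = np.empty(n)
mu0 = np.empty(n)
mu1 = np.empty(n)
for k in range(3):
    m = x == k
    e_hat[m] = d[m].mean()
    mu0[m] = y[m & (d == 0)].mean()
    mu1[m] = y[m & (d == 1)].mean()

# AIPW / doubly robust estimator of the ATE
tau_aipw = np.mean(
    mu1 - mu0
    + d * (y - mu1) / e_hat
    - (1 - d) * (y - mu0) / (1 - e_hat)
)
```

The first term is the regression-difference characterization; the weighted residual terms correct its bias, which is what delivers the double-robustness property.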


Instrumental Variables (IV): 1 Endogeneity and instrumental variables

  • Outside randomized experiments (e.g., in natural experiments),
    • the treatment is not random and therefore not independent from the potential outcomes.
    • Consider a linear model allowing to estimate the effect of the treatment D_i (discrete or continuous, and of dimension 1 here) while controlling for variables X_i (which generally include the intercept):

      Y_i = α D_i + X_i'β + ε_i.

  • The exogeneity assumption E[ε_i (D_i, X_i')'] = 0 may not hold.
    • Define the pseudo-true parameters (α*, β*) such that

      E[(D_i, X_i')' (Y_i − α* D_i − X_i'β*)] = 0.

    • Assume E[ε_i X_i] = 0.
    • The ordinary least squares estimator will estimate (α*, β*), which generally differ from the true parameters (α, β).
    • α* D_i + X_i'β* is the best linear predictor of Y_i on (D_i, X_i'), but α* is no longer the causal effect.

  • We are in a context where E[ε_i X_i] = 0 but not E[ε_i D_i] = 0.

  • Strategy: identify instrumental variables Z_i, where the dimension of Z_i is at least the number of endogenous variables (here: D_i is scalar).

  • Instrument definition:

    1. Correlated with the endogenous variable D_i.
    2. Uncorrelated with the residuals ε_i.
  • Define D̂_i as the best linear prediction of D_i using (Z_i', X_i')'.

  • Denote W_i = (D̂_i, X_i')'.

  • Formal Assumptions

    • Assumption (Rank condition). E[W_i (D_i, X_i')'] is non-singular.
    • Assumption (Exogeneity). E[ε_i Z_i] = 0 and E[ε_i X_i] = 0.
  • Identification Result

    • The true parameters take the form:

      (α, β')' = E[W_i (D_i, X_i')']⁻¹ E[W_i Y_i].

  • Relevance (Lower-Level Conditions)

    • Lower-level conditions for Assumption (3.5) (relevance of Z_i for D_i):
      • E[(Z_i', X_i')' (Z_i', X_i')'] is non-singular.
      • The partial correlation between Z_i and D_i, controlling for X_i, is non-zero.

Two-Stage Least Squares (2SLS) — Intuition

  • The estimator obtained from the empirical counterpart of the identification formula is the two-stage least squares (2SLS) estimator.

  • 2SLS procedure:

    1. Regress D_i on the instrument Z_i and controls X_i; obtain the prediction D̂_i.
    2. Regress Y_i on the predicted D̂_i and controls X_i.
  • Special case: with one instrument and no controls, 2SLS reduces to the Wald ratio

    α̂_2SLS = Ĉov(Z_i, Y_i) / Ĉov(Z_i, D_i).
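The two stages, and their equivalence to the Wald ratio in the one-instrument, no-controls case, can be sketched on simulated data (the data-generating process and coefficient values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
alpha = 1.5                                    # true causal effect of D on Y

z = rng.integers(0, 2, size=n).astype(float)   # binary instrument
u = rng.normal(0, 1, n)                        # unobserved confounder
d = 0.5 * z + 0.8 * u + rng.normal(0, 1, n)    # endogenous treatment
y = alpha * d + u + rng.normal(0, 1, n)        # outcome: OLS on d is biased

# One instrument, no controls: 2SLS reduces to the Wald ratio
alpha_2sls = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

# Equivalently: stage 1 regress d on z, stage 2 regress y on fitted d
d_hat = np.polyval(np.polyfit(z, d, 1), z)
alpha_two_stage = np.polyfit(d_hat, y, 1)[0]

# Biased benchmark: OLS slope of y on d
alpha_ols = np.cov(d, y)[0, 1] / np.var(d, ddof=1)
```

Because the confounder u moves d and y in the same direction, the OLS slope overshoots the causal effect, while both IV computations agree and recover it.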


Multiple Instruments and Transformations

  • Researchers may have multiple instruments or consider transformations of the initial instrument.

  • If Assumption (3.6) holds in the conditional form E[ε_i | Z_i, X_i] = 0, then the conditional moment implies unconditional moments:

    E[f(Z_i, X_i) ε_i] = 0

    for any vector of instruments f(Z_i, X_i) with E[‖f(Z_i, X_i)‖²] < ∞.

  • This leads to the question: how to choose f to minimize the asymptotic variance of the GMM estimator of (α, β)?


Efficiency Considerations

  • The choice of f does not affect identification of the causal effect, but it affects the precision of the 2SLS (or GMM) estimator.

  • We overview classical results on optimal instruments.

  • For simplicity, we restrict attention to conditional homoscedasticity:

    E[ε_i² | Z_i, X_i] = σ².


IV: Optimal Instrumental Variables

  • Setting and Goal

    • Assume only D_i is endogenous and scalar; denote X̃_i = (D_i, X_i')' and Z̃_i = (Z_i', X_i')'.
    • Consider moment conditions for GMM based on instruments f(Z̃_i), with dim(f) ≥ dim((α, β')'):

      E[f(Z̃_i) (Y_i − α D_i − X_i'β)] = 0

    • Define θ = (α, β')'.
  • GMM Estimator

    • The GMM estimator θ̂ solves the empirical counterpart of these moment conditions, using a weighting matrix when the system is over-identified (dim(f) > dim(θ)).
  • Asymptotics
    • Asymptotic normality:

      √n (θ̂ − θ) →d N(0, V(f)),

      where, under conditional homoscedasticity,

      V(f) = σ² E[f(Z̃_i) X̃_i']⁻¹ E[f(Z̃_i) f(Z̃_i)'] E[X̃_i f(Z̃_i)']⁻¹.

    • The optimal f* minimizes the asymptotic variance V(f).

Necessary Condition for an Efficient Instrument

  • Theorem (Necessary condition; Newey & McFadden, 1994):
    If an efficient choice f* exists, it must satisfy

    V(f) ≥ V(f*)   (in the positive semi-definite sense)

    for all f with E[‖f(Z̃_i)‖²] < ∞.

  • Equivalently: no admissible instrument choice f yields a smaller asymptotic variance than f*.

Reformulation via Iterated Expectations

  • Rearranging the necessary condition and applying the law of iterated expectations characterizes f* through conditional moments.

  • Under conditional homoscedasticity,

    E[ε_i² | Z̃_i] = σ²,

    the condition is satisfied by

    f*(Z̃_i) = E[X̃_i | Z̃_i] = (E[D_i | Z_i, X_i], X_i')'.

Optimal Instrument and Efficiency Bound

  • Since multiplying f by a non-singular constant matrix does not change efficiency, f* = E[X̃_i | Z̃_i] minimizes the asymptotic variance.
  • The resulting semi-parametric efficiency bound is:

    V* = σ² E[ E[X̃_i | Z̃_i] E[X̃_i | Z̃_i]' ]⁻¹.

  • Interpretation:
    • The optimal instrument is the regression function E[X̃_i | Z̃_i]; for the endogenous variable, this is E[D_i | Z_i, X_i].
    • The estimator θ̂ based on f* attains the semi-parametric efficiency bound (see Van der Vaart, 1998, Ch. 25).

Practical Remarks

  • The regression function E[D_i | Z_i, X_i] may be high-dimensional and challenging to estimate nonparametrically.
  • With few instruments, Newey & McFadden (1994) suggest series (sieve) estimators for this regression.
  • The optimal instrument improves precision but does not affect identification.

Summary of Key Concepts

  • Key Terms: Causal inference, RCT, treatment effects, average treatment effect (ATE), conditional independence, propensity score, IV methods, local average treatment effect (LATE).
  • References:
    • Angrist & Pischke (2009)
    • Imbens & Rubin (2015)

Part 2 · Classic Causal Inference Models

  • DID (Difference-in-Differences)
  • PSM (Propensity Score Matching)
  • RDD (Regression Discontinuity Design)

Difference-in-Differences (DID): Introduction

  • Definition:

    • Difference-in-Differences (DID) is a statistical technique used to estimate causal effects by comparing the outcomes of treatment and control groups before and after an intervention.
  • Objective:

    • To isolate the effect of a specific treatment or policy by controlling for unobserved factors that could influence the outcome.
  • Applicable Issues:

    • Policy analysis, economic interventions, program evaluations in finance, and social sciences.

DID: Basic Concept

  • Model / Formula:

    • Y_it = β₀ + β₁ Treat_i + β₂ Post_t + δ (Treat_i × Post_t) + ε_it, where δ is the DID estimate of the treatment effect.

  • Assumptions:

    • Parallel Trends Assumption: The treatment and control groups would have followed the same trajectory in the absence of treatment.
  • Causal Inference Analysis:

    • The DID approach helps mitigate bias from confounding factors by ensuring that any changes in outcomes can be attributed to the intervention rather than other influences.

DID: Key Components

  • Treatment Group:

    • The group that receives the intervention.
  • Control Group:

    • The group that does not receive the intervention, used for comparison.
  • Time Periods:

    • Pre-treatment: Measurements taken before the intervention.
    • Post-treatment: Measurements taken after the intervention.

DID: Example

  • Context:

    • Analyzing the effect of a financial deregulation (e.g., the repeal of the Glass–Steagall Act) on bank performance.
  • Data:

    • Treatment Group: Banks affected by deregulation.
    • Control Group: Similar banks that weren’t affected.
  • Observation:

    • Compare key financial metrics (e.g., ROA, stock performance) before and after deregulation to estimate its impact.

DID: Steps to Implement

  1. Identify treatment and control groups.
  2. Collect longitudinal data for both groups before and after the intervention.
  3. Calculate changes in outcomes for both groups.
  4. Apply the DID formula to estimate the causal effect.
  5. Validate the parallel trends assumption using pre-treatment data.
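The steps above can be sketched on simulated two-period panel data (the data-generating process, common trend, and effect size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000                                       # units per group

delta = 0.5                                     # true treatment effect
# Parallel trends: both groups share a common +1.0 trend from pre to post
pre_c  = rng.normal(0.0, 1.0, n)                # control, pre-period
post_c = rng.normal(0.0 + 1.0, 1.0, n)          # control, post-period (trend only)
pre_t  = rng.normal(0.3, 1.0, n)                # treated, pre (level gap allowed)
post_t = rng.normal(0.3 + 1.0 + delta, 1.0, n)  # treated, post: trend + effect

# DID: change for the treated group minus change for the control group
did = (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())
```

The same number is the coefficient δ on the Treat × Post interaction in the regression form of the model; differencing removes both the group-level gap and the common trend.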

DID: Limitations and Considerations

  • Assumption Validity:

    • The parallel trends assumption may not be justified in all cases.
  • Confounding Variables:

    • External factors can skew results; controlling for these factors is crucial.
  • Data Quality:

    • Requires reliable longitudinal data and appropriate sample sizes for meaningful results.

DID: Applications of Machine Learning

  • Latest Literature:
    • Machine learning techniques can be applied to enhance DID models by:
      • Improving Propensity Score Matching: Using algorithms to better pair treatment and control groups.
      • Feature Selection: Identifying key variables that affect treatment outcomes and adjusting for them.
      • Robustness Checks: Applying machine learning methods to assess the validity of findings from traditional DID.

DID: Conclusion and References

  • Summary:

    • DID is a powerful tool for estimating causal effects in the absence of randomization, particularly useful in the fields of economics and finance.
  • References:

    1. Ashenfelter, O. (1978). "Determining participation in income maintenance programs."
    2. Card, D. (1990). "The impact of the Mariel boatlift on the Miami labor market."
    3. Goodman-Bacon, A. (2018). "Difference-in-Differences with Variation in Treatment Timing."
    4. Doudchenko, N., & Imbens, G. W. (2016). "Balancing, Regression, and Randomization Inference."

Propensity Score Matching (PSM) Methodology: Introduction

  • Definition:

    • Propensity Score Matching (PSM) is a statistical technique used to estimate the effect of a treatment or intervention by accounting for the covariates that predict receiving the treatment.
  • Objective:

    • To reduce selection bias by matching treated and control units with similar characteristics based on their propensity scores.
  • Applicable Issues:

    • Policy evaluations, treatment effects in healthcare, finance, and social sciences.

PSM: Basic Concept

  • Model / Formula:

    • Propensity Score: e(X) = P(D = 1 | X),
      • where D indicates treatment and X includes covariates.
  • Assumptions:

    • Ignorability: The potential outcomes are independent of treatment assignment given the observed covariates.
    • Common Support: There is overlap in the distribution of covariates between treated and control groups.
  • Causal Inference Analysis:

    • PSM aims to create a quasi-experimental condition that allows the estimation of causal effects without randomization by balancing covariates.

PSM: Key Components

  • Treatment Group:

    • Individuals receiving the intervention or treatment.
  • Control Group:

    • Individuals not receiving the intervention, used for comparison.
  • Propensity Score:

    • The probability of receiving the treatment given a set of observed covariates.

PSM: Example

  • Context:

    • Evaluating the impact of financial literacy training on savings behavior.
  • Data:

    • Treatment Group: Individuals who completed the training.
    • Control Group: Similar individuals who did not participate.
  • Process:

    • Match participants based on their propensity scores to estimate the training's effect on savings rates.

PSM: Steps to Implement

  1. Estimate the propensity score using a logistic regression model (or other methods) to predict treatment assignment based on covariates.
  2. Match treated and control units based on their propensity scores (e.g., nearest neighbor, caliper matching).
  3. Assess the balance of covariates post-matching to ensure similarity between groups.
  4. Estimate treatment effects using the matched sample.
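Steps 2–4 can be sketched with 1-nearest-neighbour matching on the propensity score (simulated data; for simplicity the true score is used where, in practice, step 1 would estimate it, e.g. by logistic regression):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
x = rng.normal(0, 1, n)                    # covariate driving selection
e_true = 1 / (1 + np.exp(-x))              # true propensity (logit in x)
d = (rng.random(n) < e_true).astype(int)
y = x + 1.0 * d + rng.normal(0, 1, n)      # true treatment effect = 1.0

# Simplification: match on the true score; in practice it is estimated
score = e_true

# 1-nearest-neighbour matching with replacement (for the ATT)
treated = np.where(d == 1)[0]
control = np.where(d == 0)[0]
ctrl_sorted = control[np.argsort(score[control])]
ctrl_scores = score[ctrl_sorted]

pos = np.searchsorted(ctrl_scores, score[treated])
pos = np.clip(pos, 1, len(ctrl_scores) - 1)
left, right = ctrl_sorted[pos - 1], ctrl_sorted[pos]
nearest = np.where(
    np.abs(score[left] - score[treated]) <= np.abs(score[right] - score[treated]),
    left, right,
)

# ATT estimate from the matched sample, vs. the naive comparison
att = np.mean(y[treated] - y[nearest])
naive = y[d == 1].mean() - y[d == 0].mean()
```

Because selection depends on x, the naive difference in means is badly biased, while the matched comparison recovers the effect; the post-matching balance check (step 3) would compare x across matched pairs.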

PSM: Limitations and Considerations

  • Assumption Validity:

    • The method relies heavily on the ignorability assumption; unmeasured confounders can bias results.
  • Matching Quality:

    • Poor matching can lead to inadequate control of bias; it’s essential to check balance.
  • Data Quality:

    • Requires a rich dataset with sufficient covariates to ensure valid estimates.

PSM: Applications of Machine Learning

  • Latest Literature:
    • Machine learning techniques are being increasingly integrated into PSM to enhance model estimation:
      • Propensity Score Estimation: Use of advanced algorithms (e.g., Random Forest, Gradient Boosting) to improve the accuracy of propensity score estimation.
      • Automation of Matching Process: ML algorithms can automate the matching of treated and control groups more effectively than traditional methods.
      • Robustness and Generalization: Assessing treatment effects across various populations by leveraging machine learning’s predictive capabilities.

PSM: Conclusion and References

  • Summary:

    • PSM is a valuable tool for estimating causal effects in observational studies, particularly useful in finance and economics.
  • References:

    1. Rosenbaum, P. R., & Rubin, D. B. (1983). "The central role of the propensity score in observational studies for causal effects."
    2. Imbens, G. W. (2004). "Nonparametric estimation of average treatment effects under exogeneity: A review."
    3. Lechner, M. (2002). "Some practical issues in the evaluation of labor market policies."
    4. Leuven, E., & Sianesi, B. (2003). "PSMATCH2: Stata Module to Perform Full Mahalanobis and Propensity Score Matching."

Regression Discontinuity Design (RDD) Methodology: Introduction

  • Definition:

    • Regression Discontinuity Design (RDD) is a quasi-experimental pretest-posttest design that capitalizes on a cutoff or threshold to assign treatment, allowing for causal inferences about the effect of an intervention.
  • Objective:

    • To estimate causal effects by exploiting a discontinuity in the assignment of treatment based on an observed variable.
  • Applicable Issues:

    • Policy evaluations, program assessments in education and health, and financial interventions.

RDD: Basic Concept

  • Model / Formula:

    • The basic RDD model can be expressed as:

      Y_i = α + τ D_i + f(X_i − c) + ε_i, with D_i = 1{X_i ≥ c},

      • where Y_i is the outcome variable, D_i is an indicator for whether the treatment is applied, and f(X_i − c) captures the functional form of the running variable around the cutoff c.
  • Assumptions:

    • Continuity Assumption: The potential outcomes are continuous at the cutoff; no abrupt changes other than treatment effect.
    • Local Randomization: Units near the cutoff are considered as if randomly assigned between treatment and control groups.
  • Causal Inference Analysis:

    • RDD allows for the estimation of causal effects by comparing outcomes just above and just below the cutoff, assuming that all other factors remain constant.

RDD: Key Components

  • Running Variable:

    • The continuous variable that determines treatment assignment based on a specified cutoff.
  • Cutoff Point:

    • The threshold at which the treatment changes (e.g., income level, test score).
  • Treatment Group:

    • Units just above the cutoff that receive the treatment.
  • Control Group:

    • Units just below the cutoff that do not receive the treatment.

RDD: Example

  • Context:

    • Evaluating the impact of a minimum wage increase on employment levels.
  • Data:

    • Running Variable: Average wage level of firms.
    • Cutoff Point: Minimum wage legislation threshold.
  • Observation:

    • Compare employment trends for firms just above and just below the minimum wage threshold to estimate the policy's effect.

RDD: Steps to Implement

  1. Identify the running variable and the cutoff point for treatment assignment.
  2. Collect data on the outcome variable and covariates around the cutoff.
  3. Perform graphical analysis to visualize the discontinuity at the cutoff.
  4. Estimate the causal effects using local polynomial regression or parametric methods.
  5. Check the robustness of results through sensitivity analyses and various bandwidths.
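A minimal sharp-RDD sketch of steps 1, 2, and 4 on simulated data, using separate local linear fits on either side of the cutoff (the data-generating process, jump size, and bandwidth are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
c = 0.0                                        # cutoff point
tau = 0.8                                      # true jump at the cutoff

x = rng.uniform(-1, 1, n)                      # running variable
d = (x >= c).astype(int)                       # sharp RDD assignment rule
y = 0.5 * x + tau * d + rng.normal(0, 1, n)    # smooth trend + discontinuity

# Local linear regression on each side of the cutoff within a bandwidth h
h = 0.2
left = (x >= c - h) & (x < c)
right = (x >= c) & (x <= c + h)

b_left = np.polyfit(x[left], y[left], 1)       # [slope, intercept]
b_right = np.polyfit(x[right], y[right], 1)

# RDD estimate: gap between the two fits evaluated at the cutoff
tau_hat = np.polyval(b_right, c) - np.polyval(b_left, c)
```

Step 5 would repeat this for several bandwidths h and check that the estimate is stable; step 3 would plot binned means of y against x to visualize the jump.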

RDD: Limitations and Considerations

  • Assumption Validity:

    • Results are only valid under the assumption that no other factors change at the cutoff.
  • Limited Generalization:

    • The results may only apply to units near the cutoff; the external validity can be limited.
  • Data Quality:

    • Requires large datasets with precise measures around the cutoff to ensure reliable estimates.

RDD: Applications of Machine Learning

  • Latest Literature:
    • Recent studies have begun integrating machine learning techniques into RDD to enhance causal inference:
      • Flexible Functional Forms: Machine learning algorithms (e.g., Random Forests, Neural Networks) can model nonlinear relationships around the cutoff.
      • Automated Bandwidth Selection: Using ML approaches to determine optimal bandwidth can improve estimation accuracy.
      • Robustness Checks: ML methods can help validate RDD findings by comparing predictions from different models.

RDD: Conclusion and References

  • Summary:

    • RDD is a robust method for estimating causal effects where random assignment is not feasible, particularly relevant in economics and finance.
  • References:

    1. Imbens, G. W., & Lemieux, T. (2008). "Regression Discontinuity Designs: A Guide to Practice."
    2. Lee, D. S., & Lemieux, T. (2010). "Regression Discontinuity Designs in Economics."
    3. Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2019). "A Practical Introduction to Regression Discontinuity Designs."
    4. Gelman, A., & Imbens, G. W. (2019). "Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs."

Part 3 · Post-selection Inference

  • Importance: Model selection and sparsity in high-dimensional data.
  • Focus:
    • Post-selection inference
    • Lasso estimator as a variable selection tool
    • Double selection method
  • Related Chapters:
    • Chapter 5: Post-machine learning inference
    • Chapter 6: Instrumental variable models
    • Chapter 7: Further developments

The Post-selection Inference Problem

  • Two-step inference:
    • First, select a model.
    • Then, report results as if true.
  • Challenge: Ignoring the selection step can lead to misleading results.
  • Context: Based on the works of Leeb and Pötscher (2005).

The Model

  • Assumption: Sparse Gaussian linear model

    Y_i = X_i'θ + ε_i,  ε_i ~ N(0, σ²),

    • where E[X_i X_i'] is non-singular.
  • Models:
    • Restricted (R): sparse, with some coefficients constrained to zero, vs. Unrestricted (U): all covariates included.

Consistent Model Selection

  • Decision rule for inclusion of a covariate: retain it when its t-statistic exceeds a threshold growing with the sample size.

  • Asymptotic properties (Lemma 4.1):
    • Consistency of model selection as n → ∞.

Distribution of the Post-selection Estimator

  • Post-selection estimator: the OLS estimator computed on the selected model.

  • Distribution results indicate deviation from the Gaussian limit even under consistent selection: the finite-sample distribution of the post-selection estimator can be highly non-normal.

High Dimension, Sparsity, and the Lasso

  • Lasso estimator defined as:

    θ̂_Lasso = argmin_θ (1/n) ‖Y − Xθ‖₂² + λ ‖θ‖₁

    • Balances fit (MSE) and sparsity (L1 norm).
  • Assumptions: Sparse Gaussian linear model, with a sparsity constraint on θ (few non-zero coefficients).

Theoretical Elements on the Lasso

Theorem: Consistency of Lasso

  • The Lasso estimator is consistent under a suitable choice of the penalty λ.

  • The rate of convergence depends on the penalty strength, the sparsity level, and eigenvalue (restricted-eigenvalue) conditions on the design.

Regularization Bias

  • Regularization Bias: bias from shrinking coefficients and from including/removing variables, leading to erroneous inference on the parameter of interest.
  • Risk of pursuing dual objectives (selecting variables and accurately estimating the treatment effect) with a single Lasso step.

The Double Selection Method

  1. Selection on Treatment: Regress D on X using the Lasso; keep the selected controls.
  2. Selection on Outcome: Regress Y on X using the Lasso; keep the selected controls.
  3. Final Estimation: OLS regression of Y on D and the union of the controls selected in steps 1 and 2.
  • Outcome:
    • Provides valid inference on treatment effects.
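The three steps can be sketched with a hand-rolled coordinate-descent Lasso (in practice one would use a dedicated package such as hdm; the data-generating process, penalty level, and iteration count are illustrative assumptions):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Plain coordinate-descent Lasso: (1/2)||y - Xb||^2 + lam*n*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]     # partial residual
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam * n, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(7)
n, p = 1_000, 50
X = rng.normal(0, 1, (n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n)            # treatment eq.
y = 1.0 * d + X[:, 1] + 2.0 * X[:, 2] + rng.normal(0, 1, n)  # outcome eq.

lam = 0.1
sel_d = np.where(np.abs(lasso_cd(X, d, lam)) > 1e-6)[0]  # step 1: D on X
sel_y = np.where(np.abs(lasso_cd(X, y, lam)) > 1e-6)[0]  # step 2: Y on X
union = np.union1d(sel_d, sel_y)

# Step 3: OLS of y on d and the union of selected controls
Z = np.column_stack([np.ones(n), d, X[:, union]])
tau_hat = np.linalg.lstsq(Z, y, rcond=None)[0][1]
```

Taking the union of the two selected sets is what protects the final OLS step against omitting a confounder that is only weakly related to one of the two equations.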

Empirical Application: Education on Wage

  • Data: French Enquête Emploi from 2017-2019, 162,254 observations.
  • Modeling: Log of monthly salary explained by education levels and other controls.
  • Importance:
    • Double selection method improves precision in estimating treatment effects.

Summary

  • Key Concepts:
    • Post-selection inference, Lasso estimator, regularization bias, double selection method.
  • Recommendations:
    • Understanding the nuances of variable selection and estimation under uncertainty.

Additional References

  • Tibshirani (1994) - Lasso regression.
  • Belloni et al. (2014) - Accessible econometric references.
  • R Packages: hdm for practical implementation.

Part 4 · Generalization and Methodology

  • Focus: Importance of generalization in econometric modeling.
  • Goals:
    • Understand methodologies for assessing model performance.
    • Explore overfitting and its implications.
  • Contents:
    • Statistical learning theory
    • Cross-validation techniques
    • Regularization methods

Generalization in Econometrics

  • Definition: Generalization refers to a model's ability to perform well on unseen data.
  • Key Concept: Bias-variance tradeoff impacts prediction accuracy:
    • Bias: Error due to approximating complex truths.
    • Variance: Error due to model sensitivity to fluctuations in training set.

Statistical Learning Theory

  • Framework: Analyzes relationships between generalization abilities of learning algorithms and model complexity.
  • Risk Function:

    R(f) = E[ℓ(Y, f(X))]

    • ℓ: Loss function.
    • R(f): Expected risk (average loss).

Model Complexity

  • Definition: Complexity measures the capacity of a model to fit data.
  • Types:
    • Flexible models: Can fit diverse patterns (e.g., high-degree polynomials).
    • Rigid models: Offer less variance (e.g., linear models).
  • Implication: Increased complexity can lead to overfitting.

Overfitting

  • Definition: Occurs when a model captures noise in training data rather than underlying patterns.
  • Indicators:
    • High accuracy on training set vs. low accuracy on validation/test set.
  • Consequences: Leads to poor generalization and predictive performance.

Cross-Validation Techniques

  • Purpose: Assess model performance and avoid overfitting.
  • Methods:
    • K-fold Cross-Validation: Divide data into K subsets; train on K − 1 of them and validate on the remaining one; rotate and average.
    • Holdout Method: Split dataset into training and test sets.
  • Goal: Ensure the model generalizes well to new data.
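K-fold cross-validation for choosing model complexity can be sketched as follows (the data-generating process and candidate polynomial degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(-1, 1, n)
y = 4 * x ** 3 + rng.normal(0, 0.1, n)      # cubic truth + noise

def kfold_mse(degree, k=5):
    """K-fold CV estimate of out-of-sample MSE for a polynomial fit."""
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        errs.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errs))

# Degree 1 underfits the cubic truth; degree 3 matches it
cv_scores = {deg: kfold_mse(deg) for deg in (1, 3, 10)}
```

The rigid linear model shows a large CV error from bias, while the correctly specified cubic gets close to the irreducible noise level; comparing scores across degrees is the model-selection step.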

Regularization Techniques

  • Purpose: Combat overfitting by adding a penalty for complexity.
  • Common Methods:
    • Lasso Regression:

      min_β ‖Y − Xβ‖₂² + λ ‖β‖₁

    • Ridge Regression:

      min_β ‖Y − Xβ‖₂² + λ ‖β‖₂²
  • Effect: Reduces model complexity and enhances prediction stability.

Model Selection Criteria

  • Common Criteria:
    • AIC (Akaike Information Criterion):

      AIC = 2k − 2 ln(L̂)

    • BIC (Bayesian Information Criterion):

      BIC = k ln(n) − 2 ln(L̂)

      where k is the number of parameters, n the sample size, and L̂ the maximized likelihood.
  • Use: Select models based on the trade-off between goodness of fit and model complexity.

Conclusion

  • Key Takeaways:
    • Emphasis on generalization is crucial for model validity.
    • Understanding the balance between bias and variance helps improve performance.
    • Regularization and cross-validation are powerful tools against overfitting.

Further Reading

  • Flach, P. A. (2015): "Machine Learning: The Art and Science of Algorithms that Make Sense of Data".
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009): "The Elements of Statistical Learning".
  • Bishop, C. M. (2006): "Pattern Recognition and Machine Learning".

Part 5 · High Dimension and Endogeneity

  • Focus: Addressing endogeneity and high-dimensional settings in econometrics.
  • Key Topics:
    • Sources of endogeneity
    • Instrumental variables (IV)
    • Double/debiased machine learning techniques
  • Goal: Develop practical inference methods in complex models.

Introduction to Endogeneity

  • Definition: Occurs when an explanatory variable correlates with the error term in a regression model.
  • Consequences:
    • Biased and inconsistent estimates.
    • Misleading inference results.

Sources of Endogeneity

  1. Omitted Variable Bias:
    • When a relevant variable is left out of the model.
  2. Measurement Error:
    • Errors in the variable values leading to biased estimates.
  3. Simultaneity:
    • Mutual causation between dependent and independent variables.

Instrumental Variables (IV)

  • Purpose: Provide valid estimates when endogeneity is present.
  • Conditions:
    1. Relevance: Instrument must be correlated with the endogenous variable.
    2. Exogeneity: Instrument affects the dependent variable only through the endogenous variable.

The Two-Stage Least Squares (2SLS) Method

  1. First Stage: Regress the endogenous variable on the instrument(s) and other controls:

     D_i = π Z_i + X_i'γ + v_i.

  2. Second Stage: Replace the endogenous variable with its predicted values D̂_i and estimate the outcome equation:

     Y_i = α D̂_i + X_i'β + ε_i.

Limitations of IV

  • Weak Instruments: Instruments that are weakly correlated with the endogenous variable can lead to unreliable estimates.
  • Overidentification: Using more instruments than needed permits specification tests, but many (or weak) instruments can bias finite-sample estimates.

Double/Debiased Machine Learning (DML)

  • Purpose: Address challenges in high-dimensional settings with endogeneity.
  • Key Features:
    • Combines machine learning with econometric techniques.
    • Enhances estimation precision while addressing bias.

DML Framework

  1. Modeling: Uses machine learning algorithms to predict both treatment and outcome.
  2. Debiasing: Adjusts for the bias introduced by selection procedures.
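A minimal sketch of these two steps using a partialling-out (orthogonal) score with 2-fold cross-fitting; degree-5 polynomial fits stand in for the ML learners, and the data-generating process is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10_000
x = rng.normal(0, 1, n)
d = np.sin(x) + rng.normal(0, 1, n)            # treatment depends on x nonlinearly
y = 1.0 * d + np.cos(x) + rng.normal(0, 1, n)  # true effect theta = 1.0

# Cross-fitting: split into K folds, predict nuisances out-of-fold
K = 2
folds = np.array_split(rng.permutation(n), K)
d_res = np.empty(n)
y_res = np.empty(n)
for i in range(K):
    test = folds[i]
    train = np.concatenate([folds[j] for j in range(K) if j != i])
    # Flexible nuisance fits (degree-5 polynomials as a stand-in for ML)
    m_hat = np.polyfit(x[train], d[train], 5)   # E[D|X]
    g_hat = np.polyfit(x[train], y[train], 5)   # E[Y|X]
    d_res[test] = d[test] - np.polyval(m_hat, x[test])
    y_res[test] = y[test] - np.polyval(g_hat, x[test])

# Orthogonal (partialling-out) estimate of the treatment effect
theta_hat = (d_res @ y_res) / (d_res @ d_res)
```

Regressing the out-of-fold residuals on each other is what makes the estimate insensitive to first-order errors in the nuisance fits; cross-fitting removes the overfitting bias from reusing the same data for prediction and estimation.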

Theoretical Foundation of DML

  • Assumptions:
    • Cross-sectional independence between the error terms.
    • Stable treatment effects across observations.
  • Estimation: Neyman-orthogonal scores combined with cross-fitting (sample splitting) deliver √n-consistent, asymptotically normal estimates of the treatment effect.


Practical Application

  • Example: Estimating causal effects of education on wages.
  • Data: Longitudinal data capturing education, experience, and wages.
  • Analysis Steps:
    1. Identify endogenous variables.
    2. Apply 2SLS or DML as needed.

Conclusion

  • Summary:
    • High-dimensional data requires robust techniques to address endogeneity.
    • DML serves as an effective approach for causal inference.
    • Strong emphasis on model selection and validation.
Financial Machine Learning · Lecture 8

Further Reading

  • Bibliography:
    • Angrist, J. D., & Pischke, J. S. (2009). "Mostly Harmless Econometrics".
    • Belloni, A., Chen, D., et al. (2014). "Inference in High-Dimensional Sparse Econometric Models".
    • Imbens, G. W., & Rubin, D. B. (2015). "Causal Inference in Statistics, Social, and Biomedical Sciences".
Financial Machine Learning · Lecture 8

Part 6 · Going Further

  • Focus: Advanced topics in econometrics and machine learning.
  • Goals:
    • Explore state-of-the-art methods and their applications.
    • Discuss limitations and future research directions.
Financial Machine Learning · Lecture 8

Advanced Econometric Techniques

  • Machine Learning Integration: Harnessing ML methods like random forests, boosting, and neural networks.
  • Model Flexibility: Adoption of flexible modeling approaches to capture complex relationships.
Financial Machine Learning · Lecture 8

Regularization Techniques

  • Purpose: Prevent overfitting and enhance predictive performance.

  • Common Algorithms:

    • Lasso Regression:

      $\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \big(y_i - x_i^\top \beta\big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$

    • Ridge Regression:

      $\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \big(y_i - x_i^\top \beta\big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$
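A quick scikit-learn comparison of the two penalties on a simulated sparse problem (dimensions and penalty levels are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]            # only 3 of 50 coefficients matter
y = X @ beta + rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)     # L1 penalty: sets coefficients to zero
ridge = Ridge(alpha=1.0).fit(X, y)     # L2 penalty: shrinks, but keeps all

print("Lasso nonzero coefficients:", np.sum(lasso.coef_ != 0))
print("Ridge nonzero coefficients:", np.sum(ridge.coef_ != 0))
```

The L1 penalty produces a sparse fit that keeps the three true signals, while the L2 penalty shrinks every coefficient without zeroing any out.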

Financial Machine Learning · Lecture 8

Model Evaluation Metrics

  • Root Mean Square Error (RMSE):

    $\text{RMSE} = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2}$

  • Confusion Matrix: Evaluates classification performance.
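Both metrics in a few lines of scikit-learn, on made-up toy values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, confusion_matrix

# Regression: RMSE penalizes large errors quadratically.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"RMSE: {rmse:.3f}")

# Classification: rows = actual class, columns = predicted class.
labels_true = [1, 0, 1, 1, 0, 1]
labels_pred = [1, 0, 0, 1, 0, 1]
print(confusion_matrix(labels_true, labels_pred))
```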
Financial Machine Learning · Lecture 8

Causal Inference Frameworks

  • Counterfactual Framework: Establishing causal relationships using potential outcomes.
  • Graphical Models:
    • Directed Acyclic Graphs (DAGs) to illustrate assumptions and dependencies.
Financial Machine Learning · Lecture 8

Limitations and Challenges

  • Data Limitations: Quality and quantity of data can restrict model performance.
  • Interpretability: Balancing model complexity with the need for explainability.
Financial Machine Learning · Lecture 8

Future Directions

  • Research Focus: Addressing complexities in data and methodologies.
  • Collaboration: Interdisciplinary approaches to enrich econometric methods.
Financial Machine Learning · Lecture 8

Conclusion

  • Summary: This chapter enriches understanding of advanced econometric techniques and their applications.
  • Impact: Provides a foundation for future research and method development.
Financial Machine Learning · Lecture 8

Part 7 · Inference on Heterogeneous Effects

  • Focus: Understanding and estimating heterogeneous treatment effects.
  • Key Topics:
    • Methods for heterogeneous effects
    • Implications for policy and treatment personalization
Financial Machine Learning · Lecture 8

Understanding Heterogeneity

  • Definition: Variation in treatment effects across different individuals or groups.
  • Importance: Acknowledges that a single policy may not be effective for everyone.
Financial Machine Learning · Lecture 8

Methods for Estimating Heterogeneous Effects

  • Subgroup Analysis: Stratifying data to observe effects within specific groups.
  • Quantile Regression: Examining effects across different quantiles of the outcome distribution.
Financial Machine Learning · Lecture 8

Statistical Framework

  • Model Specification:

    $Y_i = \alpha + \tau D_i + X_i^\top \beta + \varepsilon_i$

Where:

  • $Y_i$: Outcome
  • $D_i$: Treatment indicator
  • $X_i$: Covariates
Financial Machine Learning · Lecture 8

Identifying Heterogeneous Effects

  • Interactions: Incorporating interaction terms to capture effect variations:

    $Y_i = \alpha + \tau D_i + X_i^\top \beta + D_i \, X_i^\top \gamma + \varepsilon_i$

  • Random Effects Models: Include random intercepts/slopes to account for individual variations.
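An interaction model of this kind can be estimated with plain OLS; the sketch below uses a single covariate and a simulated effect that grows with it (all parameter values are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
x = rng.normal(size=n)                      # single covariate
d = rng.integers(0, 2, size=n)              # randomized treatment
tau = 1.0 + 0.5 * x                         # treatment effect varies with x
y = 2.0 + tau * d + 0.3 * x + rng.normal(size=n)

# OLS with an interaction term: y ~ 1 + d + x + d*x
X = np.column_stack([np.ones(n), d, x, d * x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"baseline effect:          {beta[1]:.2f}")
print(f"interaction coefficient:  {beta[3]:.2f}")
```

The coefficient on $d$ recovers the effect at $x = 0$, and the coefficient on the interaction recovers how the effect changes per unit of $x$.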
Financial Machine Learning · Lecture 8

Use of Machine Learning

  • Flexible Models: Employ machine learning models to identify and estimate treatment heterogeneity.
  • Feature Selection: Utilize high-dimensional covariates for improved accuracy in predictions.
Financial Machine Learning · Lecture 8

Implications for Policy

  • Personalized Treatment: Tailoring interventions based on estimated heterogeneous effects.
  • Evaluation: Regular assessments to gauge the effectiveness across different demographics.
Financial Machine Learning · Lecture 8

Conclusion

  • Summary: Understanding heterogeneity is fundamental for effective policy-making.
  • Future Research: Exploring novel methodologies to estimate and interpret heterogeneous effects effectively.
Financial Machine Learning · Lecture 8

Part 8 · Optimal Policy Learning

  • Focus: Principles of optimal policy learning in econometrics.
  • Key Objectives:
    • Develop methods for policy evaluation and optimization.
    • Establish effective treatment assignment strategies.
Financial Machine Learning · Lecture 8

Introduction to Optimal Policy Learning

  • Concept: Learning policies that maximize some objective function based on data.
  • Applications: Personalized treatment assignment, resource allocation.
  • Challenge: Balancing exploration and exploitation in decision-making.
Financial Machine Learning · Lecture 8

Framework for Policy Learning

  1. Assumptions:

    • The environment influences outcomes through policies.
    • Historical data is pivotal in learning efficient policies.
  2. Objective: Maximize expected utility or outcome over a policy class $\Pi$, defined as:

    $\pi^{*} = \arg\max_{\pi \in \Pi} V(\pi), \qquad V(\pi) = \mathbb{E}\big[ Y\big( \pi(X) \big) \big]$

Financial Machine Learning · Lecture 8

Treatment Assignment Strategies

  • Randomized Control Trials (RCTs): Benchmark for evaluating policy effectiveness.
  • Adaptive Treatments: Modification of treatment based on observed data over time.
  • Policy Evaluation: Use of counterfactual estimators to assess the effectiveness of candidate policies before deployment.
Financial Machine Learning · Lecture 8

Estimation of Policy Effects

  • Key Metrics:

    • Average Treatment Effect (ATE):

      $\text{ATE} = \mathbb{E}[Y(1) - Y(0)]$

  • Conditional ATE:

    $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$
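Under random assignment, both quantities reduce to differences in means, overall and within covariate cells. A toy sketch with one binary covariate and a simulated effect (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10000
x = rng.integers(0, 2, size=n)              # binary covariate (e.g., a group)
d = rng.integers(0, 2, size=n)              # randomized treatment
y0 = x + rng.normal(size=n)                 # potential outcome without treatment
y1 = y0 + 1.0 + 1.0 * x                     # effect is 1 if x=0, 2 if x=1
y = np.where(d == 1, y1, y0)                # only one outcome is ever observed

# ATE: overall difference in means, unbiased under random assignment.
ate = y[d == 1].mean() - y[d == 0].mean()

# Conditional ATE: difference in means within each covariate cell.
cate = {v: y[(d == 1) & (x == v)].mean() - y[(d == 0) & (x == v)].mean()
        for v in (0, 1)}
print(f"ATE:  {ate:.2f}")
print(f"CATE: x=0 -> {cate[0]:.2f}, x=1 -> {cate[1]:.2f}")
```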

Financial Machine Learning · Lecture 8

Exploration vs. Exploitation

  • Exploration: Gathering more information to improve future policies.
  • Exploitation: Utilizing existing knowledge to optimize immediate outcomes.
  • Balancing Act: Policy learning algorithms must manage exploration and exploitation effectively for improved outcomes.
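A classic way to manage this trade-off is an $\varepsilon$-greedy rule: with probability $\varepsilon$ try a random policy (explore), otherwise use the best one found so far (exploit). The sketch below uses three hypothetical candidate policies with made-up reward means:

```python
import numpy as np

rng = np.random.default_rng(5)
true_means = [0.3, 0.5, 0.7]      # expected reward of three candidate policies
eps, T = 0.1, 5000
counts = np.zeros(3)
values = np.zeros(3)              # running mean reward per policy

for _ in range(T):
    if rng.random() < eps:                 # explore: try a random policy
        a = int(rng.integers(0, 3))
    else:                                  # exploit: best policy so far
        a = int(np.argmax(values))
    r = rng.normal(true_means[a], 0.1)     # noisy observed reward
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # incremental mean update

print("best policy found:", int(np.argmax(values)))
```

With enough exploration, the running means converge and the rule concentrates on the policy with the highest expected reward.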
Financial Machine Learning · Lecture 8

Implications for Policy Design

  • Adaptive policies: Tailored as more information becomes available.
  • Robustness: Policies must be resilient to model specifications.
  • Evaluation: Implement regular evaluations to refine policies based on new data.
Financial Machine Learning · Lecture 8

Conclusion

  • Summary: Optimal policy learning is crucial for effective econometric modeling and decision-making.
  • Future Directions: Continued integration of econometric and machine learning techniques for enhanced policy learning.
Financial Machine Learning · Lecture 8

Further Reading

  • Related Works:
    • "Causal Inference in Statistics, Social, and Biomedical Sciences" by Imbens and Rubin.
    • "Reinforcement Learning: An Introduction" by Sutton and Barto.
Financial Machine Learning · Lecture 8

Part 9 · Literature

Financial Machine Learning · Lecture 8

Literature: Inference on Treatment Effects after Selection among High-Dimensional Controls

Financial Machine Learning · Lecture 8

Inference on Treatment Effects after Selection among High-Dimensional Controls

  • Research Content: This paper presents a novel method for estimating treatment effects in the presence of high-dimensional control variables, particularly when the number of controls exceeds the sample size. The authors propose post-double-selection, a method that conducts two rounds of variable selection followed by treatment effect estimation in a partially linear model:

$y_i = \alpha_0 d_i + x_i^\top \beta_0 + \zeta_i \quad \text{and} \quad d_i = x_i^\top \gamma_0 + v_i$

Financial Machine Learning · Lecture 8
  • Main Ideas and Contributions
    • The proposed method selects controls that predict either the outcome or the treatment, requiring only approximate sparsity and guarding against omitted-variable bias from imperfect model selection.
    • It provides uniformly valid inference, meaning the confidence intervals obtained are robust across a wide class of models.
    • The authors illustrate the effectiveness of this approach with a reanalysis of the impact of abortion on crime rates, showing that their method can yield different conclusions than previous, less rigorous models.
  • Significance for Empirical Finance
    • The methodology enhances empirical finance research by allowing more flexible model specifications without overfitting.
    • It highlights the importance of rigorous variable selection, addressing the potential biases associated with omitted variable errors.
    • The findings compel researchers to use robust, adaptive techniques when estimating treatment effects, thereby contributing to more accurate economic models and policy analyses.
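In the spirit of the paper, post-double-selection can be sketched with scikit-learn (the paper sets Lasso penalty levels theoretically; the cross-validated penalties below are a simplification, and the data are simulated):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
n, p = 500, 100
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)      # treatment: confounded
y = 0.5 * d + X[:, 0] + X[:, 2] + rng.normal(size=n)  # true effect = 0.5

# Round 1: Lasso of y on X — controls that predict the outcome.
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
# Round 2: Lasso of d on X — controls that predict the treatment.
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)

# Final step: OLS of y on d plus the UNION of selected controls.
union = np.union1d(sel_y, sel_d)
W = np.column_stack([np.ones(n), d, X[:, union]])
alpha_hat = np.linalg.lstsq(W, y, rcond=None)[0][1]
print(f"post-double-selection estimate: {alpha_hat:.2f}")
```

Taking the union of the two selection rounds is the key step: a confounder that weakly predicts the outcome but strongly predicts the treatment is still kept as a control.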
Financial Machine Learning · Lecture 8

Literature: Propensity score estimation with boosted regression

Financial Machine Learning · Lecture 8

Propensity Score Estimation with Boosted Regression

  • Research Content: This paper addresses the challenges of estimating treatment effects in observational studies where participants differ significantly in their pre-treatment characteristics. Traditional methods like logistic regression may struggle with high-dimensional covariates. The authors propose using Generalized Boosted Models (GBM) for estimating propensity scores, enhancing the capacity to capture complex relationships between treatment assignment and covariates.

  • Main Ideas and Contributions

    • Boosting Technique: The paper demonstrates that the boosting technique can effectively model non-linear relationships and interactions among a large number of covariates, thus providing robust propensity score estimates.
    • Case Study Application: The method is illustrated using data from adolescent substance abuse treatment programs, revealing that GBM can significantly alter perceived treatment effects when compared to traditional logistic methods.
Financial Machine Learning · Lecture 8
    • Bias Reduction: By employing GBM, the authors manage to reduce hidden biases and improve covariate balance between treatment and control groups, contributing to better estimates of average treatment effects.
  • Significance for Empirical Finance

  • Enhanced Estimation: The use of GBM provides empirical finance researchers with a powerful tool to adjust for confounding variables, particularly in studies involving complex datasets where traditional methods fail.

  • Adapting to Non-Linearity: Recognizing and modeling non-linear relationships in financial data can lead to improved causal inferences, ultimately aiding in the evaluation of policy impacts and treatment efficacy in economic research.

  • Implications for Causal Inference: Findings underscore the necessity of utilizing advanced statistical techniques like boosting to mitigate biases in observational studies, setting a precedent for future empirical research methodologies.

Financial Machine Learning · Lecture 8

Literature: Recursive Partitioning for Heterogeneous Causal Effects

Financial Machine Learning · Lecture 8

Recursive Partitioning for Heterogeneous Causal Effects

  • Research Content: This paper introduces a novel method for identifying heterogeneous treatment effects using recursive partitioning techniques. The authors develop a framework that allows researchers to uncover variations in treatment effects across different subpopulations based on observed covariates. This approach is particularly relevant in contexts where treatment effects are expected to vary significantly among different groups.

  • Main Ideas and Contributions

    • Methodology: The study demonstrates how recursive partitioning can be effectively applied to causal inference, yielding insights into how different covariate patterns influence treatment effectiveness. This is formalized through the concept of Causal Trees, which create a tree structure reflecting variations in treatment effects.
    • Implementation and Results: The method is tested on synthetic and real data, showing robust performance relative to conventional approaches such as linear regression with interactions, even when treatment assignment is imbalanced across groups.
Financial Machine Learning · Lecture 8
    • Flexibility in Modeling: By accommodating non-linear interactions and allowing for easy interpretability through tree structures, this methodology enhances the understanding of causal relationships.
  • Significance for Empirical Finance

    • Targeted Policy Interventions: This approach can improve the design of financial policies and programs by identifying which subgroups benefit most from interventions, fostering targeted strategies.
    • Enhanced Predictive Performance: Recursive partitioning serves as a powerful tool for empirical finance, enabling the modeling of complex relationships where traditional methods may falter.
    • Guidance for Future Research: The findings encourage the adoption of machine learning techniques for causal inference, suggesting a shift towards more flexible, data-driven approaches in empirical finance research.
Financial Machine Learning · Lecture 8

Literature: Double/debiased machine learning for treatment and structural parameters

Financial Machine Learning · Lecture 8

Double Debiased Machine Learning for Treatment and Structural Parameters

  • Research Content: This paper introduces a Double Debiased Machine Learning (DML) framework designed to improve the estimation of treatment effects and structural parameters in complex econometric models. It addresses the challenge of bias when using machine learning methods for estimating treatment effects, particularly in high-dimensional settings where traditional inference methods may fall short. The DML framework combines regularization techniques with debiasing strategies, enhancing statistical efficiency without compromising the interpretability of the results.

  • Main Ideas and Contributions

    • DML Framework: The paper outlines the DML procedure, which consists of two stages: first, estimating nuisance parameters using machine learning techniques, and second, employing these estimators in a debiased manner to achieve consistent treatment effect estimates. The theoretical foundations of DML ensure that valid inference can be made even in high-dimensional contexts.
Financial Machine Learning · Lecture 8
    • Empirical Validation: The authors showcase the application of their method through simulations and real data examples, demonstrating significant performance improvements over traditional methods.
    • Broader Applicability: This method is generalizable to various econometric settings, making it applicable to a wide range of empirical research scenarios in finance and beyond.
  • Significance for Empirical Finance

    • Improved Estimation: The DML approach allows researchers in finance to obtain more reliable estimates of treatment effects and structural parameters, especially in complicated datasets characterized by many covariates.
    • Robustness: By addressing bias effectively, DML improves the robustness of causal inferences drawn from observational studies, leading to better policy implications and decision-making.
    • Guidance for Research Design: The findings promote the integration of advanced machine learning methods into causal inference, encouraging empirical finance scholars to rethink traditional methodological frameworks.
Financial Machine Learning · Lecture 8

Literature: Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs

Financial Machine Learning · Lecture 8

Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs

  • Research Content: This paper focuses on strengthening statistical inference in regression-discontinuity (RD) designs, a widely used method for causal inference in policy evaluation and economics. The authors develop robust nonparametric confidence intervals that correct for the smoothing bias of local polynomial estimators near the cutoff, allowing researchers to make credible inferences about treatment effects at the point of discontinuity.

  • Main Ideas and Contributions

    • Robust Methodology: The proposed method improves on conventional parametric approaches by utilizing nonparametric techniques that maintain robustness against misspecifications. This is particularly relevant in cases where the treatment effect can vary significantly at and around the cutoff.
    • Confidence Interval Construction: The authors derive new confidence intervals that are valid under weaker conditions while employing a data-driven bandwidth selection approach, ensuring optimal interval calibration based on the underlying data distribution.
Financial Machine Learning · Lecture 8
    • Simulation Studies: Extensive simulations and applications illustrate the advantages of the proposed method over standard methods, demonstrating greater accuracy and reliability in estimating treatment effects in RD designs.

Significance for Empirical Finance

  • Enhanced Causal Inference: The findings empower researchers in finance to more accurately estimate the impact of interventions and policies that exhibit discontinuities, leading to better-informed decisions.
  • Applicability: This robust framework can be applied to various empirical contexts, including evaluations of fiscal policy, educational programs, and health interventions, enhancing the quality of causal inference across disciplines.
  • Methodological Contribution: Encouraging the adoption of robust nonparametric methods in causal inference highlights the importance of flexibility in model specifications, ultimately leading to more reliable empirical findings in finance and economics.
Financial Machine Learning · Lecture 8

Literature: Policy Learning with Observational Data

Financial Machine Learning · Lecture 8

Policy Learning with Observational Data

  • Research Content: This paper investigates the general challenges and methodologies associated with learning optimal policies from observational data. It develops a comprehensive framework that integrates causal inference with machine learning techniques for effective policy evaluation in non-experimental settings. This approach acknowledges the inherent selection biases present in observational data while aiming to derive robust policy recommendations.

  • Main Ideas and Contributions

    • Framework Development: The authors propose a novel algorithm combining policy learning and counterfactual predictions, allowing for the identification of optimal treatment strategies. The framework encompasses a two-step process: estimating the causal effect of treatments and refining the policy recommendation based on those estimates.
    • Addressing Selection Bias: By utilizing techniques in both causal inference and machine learning, the paper demonstrates how to mitigate biases that typically skew the estimation of treatment effects.
Financial Machine Learning · Lecture 8
    • Empirical Applications: Through various empirical illustrations, the authors validate the effectiveness of their proposed methods in real-world scenarios, showing improvements in policy outcomes compared to traditional approaches.

Significance for Empirical Finance

  • Practical Implications: The findings underscore the potential of leveraging observational data for better policy-making in finance, particularly in contexts where randomized controlled trials are infeasible.
  • Advancing Methodologies: This research contributes a methodological advancement by combining machine learning with causal inference, encouraging empirical finance researchers to adopt more nuanced analytic techniques.
  • Improved Decision-Making: The ability to derive actionable insights from observational data enhances financial decision-making processes, allowing for the formulation of policies that are not only effective but also responsive to real-world complexities.
Financial Machine Learning · Lecture 8

Literature: GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets

Financial Machine Learning · Lecture 8

GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets

  • Research Content: This paper introduces GANITE, a novel framework utilizing Generative Adversarial Networks (GANs) to estimate Individualized Treatment Effects (ITE) from observational data. The authors aim to address challenges in traditional ITE estimation methods, particularly in high-dimensional covariate spaces where conventional techniques tend to underperform or overfit.

  • Main Ideas and Contributions

    • Generative Approach: GANITE leverages the power of GANs to generate counterfactual outcomes, enabling more accurate estimation of the treatment effect for each individual. The framework incorporates a generator that models treatment assignment and an adversarial discriminator that assists in assessing the quality of counterfactual predictions.
    • Improved ITE Estimation: By formalizing the relationship between treatment assignment, covariates, and outcomes, GANITE enhances the precision of ITE estimates compared to traditional regression-based methods.
Financial Machine Learning · Lecture 8
    • Empirical Validation: The authors validate their approach through extensive simulations and real-world data applications, demonstrating significant improvements in estimating individualized treatment responses over existing methodologies.
  • Significance for Empirical Finance

    • Personalized Decision Making: GANITE's robust estimation of ITE allows financial analysts and policymakers to tailor interventions to individual characteristics, enhancing the efficacy of financial products and services.
    • Advancements in Methodology: The integration of GANs into causal inference represents a methodological leap, encouraging researchers to adopt advanced machine learning techniques in empirical finance studies.
    • Insightful Policy Formulation: By providing accurate treatment effect estimates, GANITE aids in the formulation of data-driven policies that respond to nuanced economic behaviors and conditions, contributing to more effective financial governance.
Financial Machine Learning · Lecture 8