Logistic regression geometry

1 Citation · 2013
K. Anaya-Izquierdo, F. Critchley, P. Marriott
arXiv: Methodology

Abstract

It is well known that the maximum likelihood estimate in a logistic regression model may not exist, and a number of recent papers have explored the underlying geometrical basis of this phenomenon. As [9], [12] and [7] point out, existence and non-existence of the estimate can be fully characterised by considering the closure of the model as an exponential family. In this formulation it becomes clear that the maximum is always well-defined, but can lie on the boundary rather than in the relative interior. Furthermore, the boundary can be considered as a polytope characterised by a finite number of extremal points. This paper builds on this work and shows that the boundary affects more than the existence of the maximum likelihood estimate. In particular, even when the estimate exists, the geometry and boundary can strongly affect inference procedures. First and higher order asymptotic results cannot be uniformly applied. Indeed, near the boundary, effects such as high skewness, discreteness and collinearity dominate, any of which could render inference based on asymptotic normality suspect. The paper presents a simple diagnostic tool which allows the analyst to check whether the boundary is going to have an appreciable effect on standard inferential techniques. The tool, and the effect that the boundary can have, are illustrated in a well-known example and through simulated datasets.

Example 1. The Fisher iris data set, [8], is often used to illustrate classification and binary regression. Even in this familiar case we show that the boundary is close enough to have significant effects on inference. Let us focus on the problem