Partial identification of probability distributions. (Q1812248): Difference between revisions

This book is an excellent and rigorous presentation of the state of research in the area of partial identification of populations and credible inference, in which the author has made many important contributions. The main theme of the book stems from the author's research in nonparametric regression analysis using data generated by random sampling processes with partial observability of outcomes when nothing is known about the missingness mechanisms. It reveals why there is much scope for statistical inference, mainly nonparametric, using data and assumptions in partially identifying the parameters of a population and how such inference can be made. The material is organized in ten chapters. Chapter 1 is devoted to the study of identification using empirical evidence and to the description of the identification regions of specific population parameters. The foundation of random sampling is extended to cases where the data generating mechanisms are multiple random sampling processes, each with only partially observable outcomes. Further, the problem of sampling with missing outcomes is presented as a special case of the problem of interval measurements of outcomes. Chapter 2 refers to various types of assumptions on the distributions of populations under study that employ instrumental variables as an aid in identifying their forms. Starting with the common assumptions that data are missing at random and that outcomes are statistically independent, the focus turns to the identification of expectations of real-valued functions of the outcomes under the weaker assumptions of mean independence of outcomes and of means missing at random. These assumptions are further weakened to distributional assumptions implying various types of monotonicity in the form of missingness of the outcomes (means missing monotonically) or replacing statistical independence of outcomes with some form of monotonicity in their means.
The analysis readily extends to inference on conditional outcome distributions when the conditioning event is observable. Chapter 3 focuses on the possibility of conditional prediction when the realizations of outcomes or covariates are entirely observable or completely unobservable, thus extending the analysis of the previous chapter to inference on conditional outcome distributions when data on outcomes or on conditioning events may be missing. Three patterns of missing data are considered: sample members have only outcome data missing, only covariate data missing or have jointly missing outcome and covariate data. More general missing data patterns are also considered. Together with Chapters 1 and 2, Chapter 3 provides a thorough insight into the problem of prediction with outcome or covariate data that may be missing. The next two chapters of the book focus on the decomposition of finite mixtures. The view taken is that the available data are realizations of the model \(y= y^*z+ e(1- z)\), where \(z\) is an unobservable binary variable taking the values \(0\) or \(1\) according as \(e\) or \(y^*\) is observed, with \(e\) representing an error variable and \(y^*\) representing the error free part of \(y\). In the context of the above model, \(y\) is regarded as a contaminated version of \(y^*\). Chapter 4 studies the identification of two outcome distributions. One is the distribution of \(y|(z =1)\) (equivalently, the distribution of \(y^*|(z = 1)\)). The other is the distribution of \(y^*\). This is equivalent to the problem of identifying the components of a probability mixture. Identification regions for the distributions of \(y|(z= 1)\) and \(y^*\) are derived. The case of regions for event probabilities and for parameters under the assumption of stochastic dominance is also considered. 
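
As a rough illustration of the contaminated-outcome setup described for Chapter 4, the following sketch computes bounds on an event probability for \(y|(z= 1)\) and for \(y^*\). It assumes that an upper bound `lam` on \(P(z= 0)\) is known, an assumption the review does not spell out. The function names are hypothetical, and the formulas are derived here from the law of total probability rather than quoted from the book.

```python
from typing import Tuple


def _check_inputs(p_event: float, lam: float) -> None:
    """Validate the observed event probability and the assumed error bound."""
    if not 0.0 <= p_event <= 1.0:
        raise ValueError("p_event must lie in [0, 1]")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")


def bounds_given_error_free(p_event: float, lam: float) -> Tuple[float, float]:
    """Bounds on P(y in B | z = 1), given P(y in B) and P(z = 0) <= lam.

    Since P(y in B) = P(y in B | z = 1)(1 - p0) + P(y in B | z = 0) p0,
    letting P(y in B | z = 0) range over [0, 1] and p0 over [0, lam]
    gives the interval below.
    """
    _check_inputs(p_event, lam)
    lower = max(0.0, (p_event - lam) / (1.0 - lam))
    upper = min(1.0, p_event / (1.0 - lam))
    return lower, upper


def bounds_error_free_outcome(p_event: float, lam: float) -> Tuple[float, float]:
    """Bounds on P(y* in B), given P(y in B) and P(z = 0) <= lam.

    The unobserved y* may differ from y only on the event z = 0,
    so the two probabilities differ by at most lam.
    """
    _check_inputs(p_event, lam)
    lower = max(0.0, p_event - lam)
    upper = min(1.0, p_event + lam)
    return lower, upper
```

With `lam = 0` both intervals collapse to the observed probability, and they widen as the assumed error rate grows, which is the sense in which the distributions are only partially identified.
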
Chapter 5 generalizes the binary mixture problem considered in Chapter 4 to the case where the random variable \(z\), instead of being binary with values indicating a data error \((z= 0)\) or an error-free realization \((z= 1)\), takes values in a finite space. Such a case arises often in the context of ecological inference problems. Specifically, it addresses the problem of determining the structure of the regions of identification of the conditional distributions of variables of the form \(y|(z= z)\) or of the form \(y|(z= z,x= x)\) with \(x\) being another covariate. Interest subsequently shifts to the case of inference on the corresponding structure of identification regions for conditional expectations. Chapter 6 is about approaches to the identification problem in the case of data obtained via response-based sampling. This is a case that often arises in certain application fields of statistics, where random sampling can be very costly as far as yielding observations with very rare attributes is concerned and hence a less expensive stratified sampling design is adopted by dividing the population into a stratum of items having the attribute \((y= 1)\) and its complementary stratum of items not having the attribute \((y= 0)\). The chapter is devoted to various approaches to the problem of identification of the distribution of \(y\) conditional on covariates within the inferential frameworks of reverse regression, epidemiology (retrospective sampling) and econometrics (choice-based sampling), and examines inference on partial identification in binary response setups and when covariate data are available from one of the response strata. The remaining four chapters of the book deal with the problem of the nonobservability of counterfactual outcomes in empirical analysis of treatment response.
Chapter 7 examines the problem of predicting outcomes under conjectural treatment rules from knowledge of the realized outcomes under the treatment rule applied (selection problem) and addresses the question of treatment choice in heterogeneous populations. Approaches to the selection problem using empirical evidence or various distributional assumptions (including instrumental variables) are also presented. Chapter 8 deals with the selection problem under restrictions on the shape of the response function. In particular, it examines the case where the response function is monotone in treatments, in the sense that treatment response outcomes vary monotonically with the intensity of the treatment. The weaker case of semi-monotone responses and the stronger case of concave-monotone responses are also considered. Chapter 9 also deals with the selection problem under weaker monotonicity assumptions on the treatment responses. These refer to monotonicity restrictions on their means conditional on instrumental variables. These restrictions are the analogues of restrictions introduced in Chapter 2 in the context of prediction with missing outcome data. So, the effect on the selection problem of the assumption of monotonicity of treatment means in treatments is examined and sharp bounds on mean responses conditional on treatments, under both monotone treatment selection and monotone treatment response assumptions, are provided. Chapter 10, the last chapter, focuses on the problem of predicting outcomes under conjectural rules assuming treatment rules that allow different treatment of persons with the same observable covariates (mixing problem). Thus, the interest is now on the identification of the overall distribution of outcomes as a mixture of the response distributions over the various treatments. The overall quality of the book is very good.
The presentation is not too mathematically sophisticated, largely resting on elementary probability theory, and the notation and terminology are unified. The main part of each chapter is written in a textbook style. All references to sources as well as historical remarks are made in a section titled "endnotes" at the end of each chapter. Further, each chapter has very useful complements which provide insight on how various aspects of the material presented can be brought into an application context, some of them even providing numerical examples. Clearly, both the methodology and the applications presented are intended to provide statisticians with a good foundation for further study in the subject and scientists in applied fields (e.g., econometricians or epidemiologists) with a good statistical background and useful insight into methodological approaches to the identification problems in their fields.
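
To make the idea of an identification region under missing outcomes (the theme of Chapter 1) concrete, here is a minimal sketch of worst-case bounds on a mean. It assumes the outcome is known to lie in an interval `[y_min, y_max]` and marks missing outcomes as `None`; these conventions and the function name are illustrative assumptions, not taken from the book. The bounds are sample analogues and ignore sampling variability.

```python
from typing import Optional, Sequence, Tuple


def worst_case_mean_bounds(
    outcomes: Sequence[Optional[float]],
    y_min: float,
    y_max: float,
) -> Tuple[float, float]:
    """Worst-case bounds on the mean of y when some outcomes are missing.

    Nothing is assumed about why outcomes are missing: each missing value
    is replaced by y_min for the lower bound and by y_max for the upper one.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one sample member")
    if y_min > y_max:
        raise ValueError("y_min must not exceed y_max")

    observed = [y for y in outcomes if y is not None]
    if any(not (y_min <= y <= y_max) for y in observed):
        raise ValueError("observed outcomes must lie in [y_min, y_max]")

    n_missing = n - len(observed)
    total_observed = sum(observed)

    lower = (total_observed + y_min * n_missing) / n
    upper = (total_observed + y_max * n_missing) / n
    return lower, upper
```

The width of the resulting interval equals the missing fraction times `y_max - y_min`, so the mean is point-identified only when no outcomes are missing.
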

reviewed by

Evdokia Xekalaki

zbMATH Keywords

partial identification

prediction with missing data

instrumental variables

response-based sampling

treatment response

monotonicity

mean independence

mean monotonicity

contaminated outcomes

mixture models

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1007/b97478

Identifiers

zbMATH Open document ID

1047.62001

DOI

10.1007/b97478

Mathematics Subject Classification ID

Sitelinks

Mathematics(1 entry)

mardi Publication:1812248

@@ Property / full work available at URL @@
+https://doi.org/10.1007/b97478
@@ Property / full work available at URL: https://doi.org/10.1007/b97478 / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W1544362532
@@ Property / OpenAlex ID: W1544362532 / rank @@
+Normal rank