
Statistical techniques and modelling for complex substantive questions with complex data

Research project P5/24 (Research action P5)

Description:

A key task for statistics is to provide researchers with tools to frame their substantive questions within formal models, so that these questions become amenable to empirical research. For such research, an equally important task of statistical analysis is to take the very nature of the data into account. Many fields now show an increasing demand to capture more adequately the complexity of the data collected to investigate substantive research questions. Moreover, the substantive research questions themselves have become increasingly complex, especially over the last decade. Both types of complexity constitute a major challenge for contemporary statistics. Handling questions and data with a complicated underlying structure requires novel models and techniques. These build on up-to-date methods in statistical modeling and inference and often adapt techniques originally developed for simpler structures.

The proposed network activities start from a broad range of complex substantive data sets and questions arising in various disciplines (including psychology, biomedical sciences, economics, and climatology). The overall aim of the project is to develop appropriate statistical models and techniques to deal with these data and questions.

To this end, the network activities will be organized into six work packages, grouped into two major sections. Section I comprises four work packages (WP1-WP4), each focusing on a well-delineated class of models. Section II comprises two work packages (WP5-WP6) that deal with statistical meta-modeling aspects. These aspects can be studied in their own right, but they can also be incorporated into the different classes of models distinguished in Section I.

The key aims of the six work packages (WPs) can be summarized as follows:

- WP1 (Functional estimation): to extend classical functional estimation of one- and multidimensional curves in line with more realistic (but more complex) substantive theories (in particular, the economic theories underlying frontier estimation), and to capture change points or break points appropriately;

- WP2 (Time series): to deal with two major sources of complexity in the analysis of multivariate time series, namely nonstationarity and high dimensionality;

- WP3 (Survival analysis): to study nonparametric regression models that involve a complex censoring mechanism or discontinuities, as well as frailty models that capture heterogeneity;

- WP4 (Mixed models): to identify adequate random-effects distributions;

- WP5 (Classification and mixture models): to capture the heterogeneity in a population and to determine its exact nature;

- WP6 (Incompleteness and latent variables): to develop (semi)parametric missingness models for incomplete and latent data, and to study how sensitive the results are to the various assumptions this modeling entails.
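As a purely illustrative sketch of the kind of model WP5 addresses, and not the project's own method, the Python code below fits a two-component univariate Gaussian mixture with the EM algorithm. The estimated mixing weight and the component-specific means and spreads describe the heterogeneity in the sample. The function names `normal_pdf` and `em_two_gaussians`, the quartile-based starting values and the fixed number of iterations are assumptions made for this example. The code also does not guard against a component collapsing to zero weight.

```python
import math
import random


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=200, min_sigma=1e-6):
    """Fit a two-component Gaussian mixture by EM (illustrative sketch).

    Returns (weight_1, mu_1, sigma_1, mu_2, sigma_2).
    """
    xs = sorted(data)
    n = len(xs)
    # Assumed deterministic start: means at the lower and upper quartiles,
    # both spreads equal to the overall standard deviation, equal weights.
    mu1, mu2 = xs[n // 4], xs[(3 * n) // 4]
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    sigma1 = sigma2 = max(sd, min_sigma)
    w1 = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point comes from component 1.
        resp = []
        for x in xs:
            a = w1 * normal_pdf(x, mu1, sigma1)
            b = (1.0 - w1) * normal_pdf(x, mu2, sigma2)
            resp.append(a / (a + b) if a + b > 0.0 else 0.5)
        # M-step: responsibility-weighted maximum-likelihood updates.
        n1 = sum(resp)
        n2 = n - n1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1
        var2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2
        sigma1 = max(math.sqrt(var1), min_sigma)
        sigma2 = max(math.sqrt(var2), min_sigma)
        w1 = n1 / n
    return w1, mu1, sigma1, mu2, sigma2


if __name__ == "__main__":
    # Simulated heterogeneous population: 60% from N(0, 1), 40% from N(5, 1).
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
    sample += [rng.gauss(5.0, 1.0) for _ in range(200)]
    print(em_two_gaussians(sample))
```

Starting the two means at the lower and upper quartiles keeps the component labels stable for well-separated groups. A real analysis would also compare the fit against models with other numbers of components before drawing conclusions about the nature of the heterogeneity.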

Integration of the network activities will be achieved on four different levels:

1) substantive: data sets will be shared across work packages and hence analyzed with different, complementary models;

2) between work packages: cross-links will be established between pairs of work packages, e.g., survival models will be studied in WPs 3 and 4, and latent variables will be addressed in WPs 5 and 6;

3) interaction between Section I and Section II: e.g., classification techniques and mixture models as studied in WP5 to capture heterogeneity will be applied within various WPs of Section I;

4) common methodological ground: (a) most of the work packages will use smoothing and bootstrap/Bayesian data analysis techniques as common methodological tools; (b) several methodological research topics will be addressed by combining findings from different work packages, which should ultimately lead to generic conclusions.
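Smoothing and the bootstrap are named above as shared tools. The Python code below is a minimal sketch of how the two combine. It assumes a Gaussian-kernel Nadaraya-Watson smoother and a pairs (case-resampling) percentile bootstrap, both chosen here only for illustration. The smoother estimates a regression curve at a point, and the bootstrap gives a rough interval for that estimate. The function names, the bandwidth and the number of resamples are hypothetical choices, not taken from the project.

```python
import math
import random


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(xs, ys, x0, bandwidth):
    """Kernel-weighted average of ys around x0, an estimate of E[Y | X = x0]."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:  # all weights underflowed: no data near x0
        return float("nan")
    return sum(w * y for w, y in zip(weights, ys)) / total


def bootstrap_interval(xs, ys, x0, bandwidth, n_boot=500, alpha=0.05, seed=0):
    """Percentile interval for the smoother at x0, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        est = nadaraya_watson([xs[i] for i in idx], [ys[i] for i in idx], x0, bandwidth)
        if not math.isnan(est):  # skip degenerate resamples
            estimates.append(est)
    estimates.sort()
    m = len(estimates)
    lower = estimates[int(math.floor(alpha / 2 * m))]
    upper = estimates[int(math.ceil((1 - alpha / 2) * m)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Simulated data: y = sin(x) + noise, so the true value at pi/2 is 1.
    rng = random.Random(42)
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(200)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.2) for x in xs]
    x0 = math.pi / 2
    print(nadaraya_watson(xs, ys, x0, bandwidth=0.3))
    print(bootstrap_interval(xs, ys, x0, bandwidth=0.3))
```

The bandwidth is fixed by hand here. In practice it would be selected from the data, for example by cross-validation. The percentile interval also ignores the smoothing bias, which is largest near peaks of the curve.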

The proposed research should result in novel statistical methods, models and model extensions that better fit complex substantive theories as well as complex data. As such, they should provide researchers with more effective and useful analysis tools for answering important present-day questions.