Developing crucial Statistical methods for Understanding major complex Dynamic Systems in natural, biomedical and social sciences (StUDyS)

Research project P7/06 (Research action P7)

Description:

Stochastic systems are influenced by internal characteristics of the system, but may also depend on (interactions with) external factors, and may evolve over time and/or space. The main goal of this IAP network is to develop statistical methods that are crucial for a complete understanding of certain classes of complex dynamic systems, and to use these methods to answer challenging questions in focused applications.

Due to advances in technology, the overall data structure is often quite involved: different types of data, several sources of information on the same subsystem, few measurements on a large number of characteristics, … Although more and/or more diverse information is welcome, extracting the important information from it becomes a real challenge.

Examples of complex stochastic systems where such challenges arise are:

(a) According to the WHO, depression will very soon become the biggest health burden on society, both economically and sociologically. Understanding depression and affective disorders, and more generally human affect, is however an extremely hard problem, because emotions and affects are: (1) highly multifaceted and multilayered phenomena that comprise a broad range of interrelated system components (including cognitive, experiential, and physiological ones), (2) inherently time-bound phenomena that can only be adequately captured by a true account of their temporal dynamics, (3) subject to sizeable individual differences, which also play a major role in the development of psychopathology.

(b) A main lesson of the recent economic and financial crisis is that detecting macroeconomic risk in real time requires an understanding of the potentially perverse interactions between a large number of markets and institutions. This requires the analysis of the joint dynamic behavior of a large number of time series. Similar situations are very common in economic analyses, e.g. professional forecasters and policy makers look at a variety of different indicators to predict key aggregate variables and make their decisions.

(c) Approving the release of a new vaccine on the market and determining whether it is eligible for reimbursement, or approving a change in recommended vaccination schedules, are two of the major themes in infectious disease epidemiology. Given the limited data available, statistical analyses to determine the host dynamics, the time-, space- and age-specific serial seroprevalence and the cost-effectiveness have to take into account the complexities and uncertainties in the data. To quantify the impact on the stochastic dynamics of the infectious disease and the associated costs, a full account of the different sources of uncertainty and the interplay between the different (sub)systems is needed.

(d) In the food industry, an important issue concerns the classification of meat samples, in view of quality and/or safety considerations. Various types of measurements on the meat samples are available, from protein, fat, and water content (among others) to near-infrared absorbance spectra. Important research questions are the discrimination between the samples, and their relationship with various characteristics. The complexity of the data (continuous values, categories, functional data, ...) and their statistical interdependencies need to be fully taken into account to answer these questions.

Common elements in these examples are the complex interplay between various characteristics (of possibly very different conceptual nature) and the various layers of dynamics (time, space, ...). The analysis of such complex stochastic systems, based on advanced data structures, poses important challenges for statistics research, which translate into the following main objectives:

1. How to model and analyze dependence structures between random variables (of possibly different nature – such as real numbers, discrete values, functions, graphs) that themselves may vary: (i) with other covariates (also of possibly different nature); (ii) in time and/or space; (iii) differently in the tails of the distributions;

2. How to efficiently analyze data that exhibit several dynamics, e.g. in space and time. What are the most efficient statistical modeling techniques incorporating the various layers of dynamics?

3. How to efficiently analyze data that are hierarchically structured (e.g. data with some cluster structure, network structure, missing data...)?

4. How to account for the influence of non-observable variables in complex stochastic systems? What are the most efficient modeling techniques and associated statistical methods?

5. How to select from a large, very large, or even huge set of measured characteristics (of possibly different nature) those that influence a variable of interest? What are the most efficient so-called sparse regularization techniques, and how to select regularization parameters? How to draw conclusions from data sets that contain multiple sorts of information regarding the same complex (sub)system, and how to do data fusion?

In line with these main objectives, the research work in the network will be organized around five work packages and one meta work package:

Work package 1: The study of associations and dependencies in complex systems.
Measuring associations between characteristics (of scalar type, functional type, ...) of a stochastic system can enter at various levels (in a time evolution, in tails of distributions, ...). This work package studies the statistical modeling of associations and dependencies in complex stochastic systems, including testing for specific association structures.

Work package 2: The study of different dynamics in complex systems.
A stochastic system can exhibit different dynamics (time dynamics, spatial dynamics, ...). This work package is concerned with efficiently modeling these different layers of dynamics, and with the development of statistical methodology for these complex dynamic systems.

Work package 3: Multivariate modeling and hierarchically structured data.
The aim of this work package is twofold: analyzing hierarchically structured data, and using hierarchical modeling to unravel the dynamics of the underlying stochastic system. Survival data, for example, are often structured hierarchically, and exhibit complex association structures, possibly changing in time or space. Hierarchical non-linear spatial and/or temporal modeling of species in the environment (describing e.g. the biological behavior of animals) is an important modeling tool.

Work package 4: Dynamics of a stochastic system and the impact of non-observed characteristics.
For complex systems it is often impossible to have observations on all important variables. In micro-econometrics, for example, the price of a good in a country may also depend on the welfare of the people living in that country, which is difficult to measure. This work package studies how to deal with non-observed (latent) characteristics in complex dynamic systems.

Work package 5: Variable and model selection and the study of (ultra-)high dimensional data.
The number of observed characteristics (variables) can be large, high, or even ultra-high compared to the number of individuals (subjects) for which these variables are measured. Methods that can automatically select the important characteristics (possibly of different nature) in complex systems need to be developed, also for situations in which the adopted model specifies that the number of characteristics grows very fast (even at a polynomial rate) with the number of subjects. Even for a set of selected characteristics, several models may be plausible, and model selection comes into play.

Meta work package: The developed statistical methods in full use.
In this meta work package we aim, through continuous interactions with the other work packages, to answer specific questions in focused application areas, in particular in econometrics, biomedical sciences, and human and natural sciences. Among others, we refer back to the challenging questions in (a) to (d). These questions on the one hand motivate the research questions and on the other hand serve to demonstrate the impact of the planned research on application areas.

The emphasis in each work package is on different aspects of the complexities of the dynamics of a stochastic system, and only their synergies and joint successes will lead to a final comprehensive statistical analysis of the system. A high level of interaction between the various work packages is indispensable, and will be stimulated by:

(i) the synergy needed in the meta work package;
(ii) the use of common state-of-the-art statistical tools and the complementary expertise related to these within the network: flexible modeling techniques; hierarchical modeling techniques; sparse representation techniques; dimension reduction methods; techniques for data aggregation and data fusion; rank-based and robust methods;
(iii) specific managerial and networking activities (joint postdoctoral researchers, interuniversity PhD committees, focused working groups, …).