Abstract
Networks are ubiquitous in modern science. Network extraction has become, for many fields, a popular approach to exploring a complex of interrelations between entities of interest. The entities in which we take interest are molecular features stemming from omics studies. A challenge with omics data is that they are often high-dimensional, i.e., the number of features exceeds the number of observations.
We approach network extraction for high-dimensional data as a problem in penalized graphical
modeling. Graphical models utilize graphs to express conditional (in)dependence relations between random
variables. Penalization then ensures that these models are estimable from high-dimensional data. We use, in
contrast to popular L1 approaches, an L2-approach to penalization.
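To make the role of penalization concrete, the following is a minimal sketch, not the course software. It assumes simulated Gaussian data and uses the zero-target form of the ridge precision estimator described in van Wieringen & Peeters (2016). The function name `ridge_precision`, the penalty value `lam = 1.0`, and the dimensions are illustrative choices only.

```python
import numpy as np


def ridge_precision(S, lam):
    """Ridge-type estimate of the precision matrix with a zero target.

    Illustrative helper (hypothetical name). It maximizes the penalized
    log-likelihood  ln|Omega| - tr(S Omega) - (lam / 2) * ||Omega||_F^2,
    whose solution shares its eigenvectors with S.
    """
    # Eigendecomposition of the symmetric sample covariance: S = V diag(d) V^T.
    d, V = np.linalg.eigh(S)
    # Each eigenvalue d is mapped to 1 / (sqrt(lam + d^2 / 4) + d / 2),
    # which is strictly positive whenever lam > 0, even if d = 0.
    omega = 1.0 / (np.sqrt(lam + d**2 / 4.0) + d / 2.0)
    return (V * omega) @ V.T


# Simulated high-dimensional setting (assumption): more features than samples.
rng = np.random.default_rng(0)
n, p = 10, 25
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)

# Sample covariance: rank-deficient because p > n, hence not invertible.
S = Xc.T @ Xc / n
rank_S = np.linalg.matrix_rank(S)

# The penalized estimate is positive definite despite the singular S.
lam = 1.0
Omega = ridge_precision(S, lam)
min_eig = np.linalg.eigvalsh(Omega).min()

print(f"rank of S: {rank_S} (p = {p})")
print(f"smallest eigenvalue of the ridge precision estimate: {min_eig:.4f}")
```

Unlike an L1 penalty, this L2-type estimate is not sparse: every off-diagonal entry is generally nonzero, so a separate edge-selection step is needed to turn the estimate into a network.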
In this course we first show why L2-based network extraction may be preferred over its L1-based analogue. We will then focus on the following situations of interest:
- Extracting a single network from steady-state data;
- Simultaneously extracting multiple networks from multiple related data sets and/or data consisting of distinct (disease) subclasses;
- Extracting networks from time-course data.
Importantly, for each of these situations we will explore methodology to analyze and exploit the networks in
order to enhance their practical value. Hence, the course revolves around (i) estimating graphical models, and (ii) translating these models into tangible information and practical consequences for the medical collaborator.
Prerequisites
The course is intended to be challenging, but will not be overly difficult. It will be oriented towards all (applied) statisticians with an interest in reverse-engineering and analyzing networks from high-dimensional omics data. We expect participants to have a working knowledge of (i) linear algebra, (ii) penalization methods, and (iii) the R platform and language. Knowledge of network science is not required: basic network concepts will be introduced during the course.
Learning Objectives
Participants will become familiar with basic concepts from network science and current approaches for the extraction of networks from high-dimensional data. Participants will also gain hands-on experience with the extraction, visualization and basic analysis of molecular networks. Moreover, we believe the course will support participants in envisioning how network information can be of clinical interest.
Textbook
No textbook is required. Recommended reading consists of articles, in particular:
- Bilgrau, A.E., Peeters, C.F.W., et al. (2020). Targeted fused ridge estimation of inverse covariance matrices from multiple high-dimensional data classes. Journal of Machine Learning Research, 21(26): 1-52.
- Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3): 432-441.
- Miok, V., Wilting, S.M., & van Wieringen, W.N. (2016). Ridge estimation of the VAR(1) model and its time series chain graph from multivariate time-course omics data. Biometrical Journal, 59(1): 172-191.
- van Wieringen, W.N., & Peeters, C.F.W. (2016). Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis, 103(November): 284-303.
Laptop
We require the participants to bring a laptop with the latest version of R installed. Moreover, we expect the participants to have the latest version of the R packages rags2ridges and ragt2ridges. Preferably, the participants will also have RStudio installed.
About the Instructors
Carel F.W. Peeters is an associate professor of Statistical Learning at the Division of Mathematical & Statistical Methods of Wageningen University & Research. He specializes in multivariate and high-dimensional statistical learning. His current interests lie in the network integration of omics data and in representation learning. For more information, see his personal website.
Latest course taught:
- Statistics for Data Scientists
- Focusing on the interface between statistics and machine learning
- Taught in English
- Audience: Master students in Applied Data Science
Wessel N. van Wieringen is an associate professor of Molecular Biostatistics at the Department of Mathematics of the VU University Amsterdam and at the Statistics for Omics Research Group in the Department of Epidemiology & Data Science, VU University medical center Amsterdam, the Netherlands. His research facilitates the deduction of biologically tangible conclusions (with respect to the functioning and dysfunctioning of the cell) from omics data. This requires, among other things, 1) the formulation of multivariate statistical models (e.g., graphical models) describing cellular processes, 2) the estimation of model parameters from the (high-dimensional) data, 3) understanding the models' limitations in their capacity to explain observed data from dysregulated processes, and 4) iterating over the previous three steps to improve the models.
Latest courses taught:
- 1) Advanced biostatistics and 2) High-dimensional data analysis
- Focusing on 1) biomedical applications of Markov models and 2) regularized learning
- Taught in 1) Dutch and 2) English
- Audience: 1) bachelor students of Medical Natural Sciences and 2) master students in Statistical Science