Title: Reproducibility challenges in data analysis with examples in brain imaging.
Facilitator: JB Poline (McGill University)
Duration: full day (9:00 am to 4:00 pm).
Description of the workshop:
The scientific community as a whole, and the life sciences community in particular, increasingly recognizes that many published results are difficult to reproduce (Collins and Tabak, Nature, 2014). The causes range from technical to sociological. In this workshop, we will first discuss scientific reproducibility, with examples from the life sciences and specific illustrations drawn from brain imaging. We will then introduce practical open science tools that support more reproducible research from a software and data management point of view. We will show how the statistical community can use these tools, and we will pose specific statistical problems (association or prediction tests). Examples and practical exercises will be presented in Python on brain imaging data. Participants must bring their own laptop. We will provide instructions for installing the required software.
9:30 am - 10:30 am: Introduction to reproducibility (1h)
- Lecture on the causes of non-reproducibility in science and potential solutions
10:30 am - 12:30 pm: Tools for reproducible science: software (2h)
- Introduction to version control
- Local and distributed version control (Git, interactive)
- Collaborative web infrastructure for Git (GitHub, interactive)
12:30 pm - 1:30 pm: Lunch
1:30 pm - 2:45 pm: Practical tools for reproducible science: data management
- Introduction: Common issues in data management
- Data versioning: introduction to git-annex and Git LFS (interactive)
- Introduction to containers
- Data processing with DataLad (interactive)
2:45 pm - 3:00 pm: Coffee break
3:00 pm - 4:30 pm: Statistical reproducibility: challenges
- In this session, we pose specific statistical reproducibility challenges that illustrate common problems statisticians and data scientists face when developing a solution to a biological question. This will be an interactive session in Python.
Collins, Francis S., and Lawrence Tabak. 2014. “NIH Plans to Enhance Reproducibility.” Nature 505: 612–13.