Wednesday, September 7, 2016

Webinar series - Fundamentals of Data Science

The BD2K Guide to the Fundamentals of Data Science Series

Every Friday beginning September 9, 2016

12pm - 1pm Eastern Time / 9am - 10am Pacific Time


Working jointly with the BD2K Centers-Coordination Center (BD2KCCC) and the NIH Office of Data Science, the BD2K Training Coordinating Center (TCC) is spearheading this virtual lecture series on the data science underlying modern biomedical research. Beginning in September 2016, the series will consist of weekly webinar presentations covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to “big data” biomedicine. The series will provide essential training suitable for individuals at all levels of the biomedical community. All presentations will be streamed for live viewing, recorded, and posted online for future viewing and reference. These videos will also be indexed as part of TCC’s Educational Resource Discovery Index (ERuDIte) and shared or mirrored with the BD2KCCC and other BD2K resources.

View all archived videos on our YouTube channel: 
https://www.youtube.com/channel/UCKIDQOa0JcUd3K9C1TS7FLQ

Please join our weekly meetings from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/786506213
You can also dial in using your phone.
United States +1 (872) 240-3311
Access Code: 786-506-213 
First GoToMeeting? Try a test session: http://help.citrix.com/getready

SCHEDULE
9/9/16:  Introduction to big data and the data lifecycle (Mark Musen, Stanford).
9/16/16: SECTION 1: DATA MANAGEMENT OVERVIEW (Bill Hersh, Oregon Health Sciences).
9/23/16: Finding and accessing datasets, Indexing and Identifiers (Lucila Ohno-Machado, UCSD).
9/30/16: Data curation and Version control (Pascale Gaudet, Swiss Institute of Bioinformatics).
10/7/16: Ontologies (Michel Dumontier, Stanford).
10/14/16: Provenance (Zachary Ives, Penn).
10/21/16: Metadata standards (Susanna-Assunta Sansone, Oxford).

10/28/16: SECTION 2: DATA REPRESENTATION OVERVIEW (Anita Bandrowski, UCSD).
11/4/16:  Databases and data warehouses, Data: structures, types, integrations (Chaitan Baru, NSF).
11/11/16: No lecture - Veterans Day.
11/18/16: Social networking data (TBD).
12/2/16:  Data wrangling, normalization, preprocessing (Joseph Picone, Temple).
12/9/16:  Exploratory Data Analysis (Brian Caffo, Johns Hopkins).
12/16/16: Natural Language Processing (Noemie Elhadad, Columbia).

The following topics will be covered in January through May of 2017:
SECTION 3: COMPUTING OVERVIEW
  Workflows/pipelines
  Programming and software engineering; API; optimization
  Cloud, Parallel, Distributed Computing, and HPC
  Commons: lessons learned, current state

SECTION 4: DATA MODELING AND INFERENCE OVERVIEW
   Smoothing, Unsupervised Learning/Clustering/Density Estimation
   Supervised Learning/prediction/ML, dimensionality reduction
   Algorithms, incl. Optimization
   Multiple testing, False Discovery rate
   Data issues: Bias, Confounding, and Missing data
   Causal inference
   Data Visualization tools and communication
   Modeling Synthesis

SECTION 5: ADDITIONAL TOPICS
   Open science
   Data sharing (including social obstacles)
   Ethical Issues
   Extra considerations/limitations for clinical data
   Reproducible Research
   SUMMARY and NIH context