Announcements, projects, and favorite things from the UNCG health sciences librarian.
Wednesday, September 7, 2016
Webinar series - Fundamentals of Data Science
The BD2K Guide to the Fundamentals of Data Science Series
Every Friday beginning September 9, 2016
12pm - 1pm Eastern Time / 9am - 10am Pacific Time
Working jointly with the BD2K Centers-Coordination Center (BD2KCCC) and the NIH Office of Data Science, the BD2K Training Coordinating Center (TCC) is spearheading this virtual lecture series on the data science underlying modern biomedical research. Beginning in September 2016, the series will consist of weekly webinar presentations covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to “big data” biomedicine. The series will provide essential training suitable for individuals at all levels of the biomedical community. All presentations will be streamed for live viewing, recorded, and posted online for future viewing and reference. The videos will also be indexed as part of TCC’s Educational Resource Discovery Index (ERuDIte) and shared or mirrored with the BD2KCCC and other BD2K resources.
SCHEDULE
9/9/16: Introduction to big data and the data lifecycle (Mark Musen, Stanford)
9/16/16: SECTION 1: DATA MANAGEMENT OVERVIEW (Bill Hersh, Oregon Health Sciences)
9/23/16: Finding and accessing datasets, Indexing and Identifiers (Lucila Ohno-Machado, UCSD)
9/30/16: Data curation and Version control (Pascale Gaudet, Swiss Institute of Bioinformatics)
10/7/16: Ontologies (Michel Dumontier, Stanford)
10/14/16: Provenance (Zachary Ives, Penn)
10/21/16: Metadata standards (Susanna-Assunta Sansone, Oxford)
10/28/16: SECTION 2: DATA REPRESENTATION OVERVIEW (Anita Bandrowski, UCSD)
11/4/16: Databases and data warehouses, Data: structures, types, integrations (Chaitan Baru, NSF)
11/11/16: No lecture - Veterans Day
11/18/16: Social networking data (TBD)
12/2/16: Data wrangling, normalization, preprocessing (Joseph Picone, Temple)
12/9/16: Exploratory Data Analysis (Brian Caffo, Johns Hopkins)
12/16/16: Natural Language Processing (Noemie Elhadad, Columbia)
The following topics will be covered in January through May of 2017:
SECTION 3: COMPUTING OVERVIEW
Workflows/pipelines
Programming and software engineering; API; optimization
Cloud, Parallel, Distributed Computing, and HPC
Commons: lessons learned, current state
SECTION 4: DATA MODELING AND INFERENCE OVERVIEW
Smoothing, Unsupervised Learning/Clustering/Density Estimation
Supervised Learning/prediction/ML, dimensionality reduction
Algorithms, incl. Optimization
Multiple testing, False Discovery rate
Data issues: Bias, Confounding, and Missing data
Causal inference
Data Visualization tools and communication
Modeling Synthesis
SECTION 5: ADDITIONAL TOPICS
Open science
Data sharing (including social obstacles)
Ethical Issues
Extra considerations/limitations for clinical data
Reproducible Research
SUMMARY and NIH context