Data Analysis and Machine Learning Applications
 Course: PHYS 398MLA
 Instructor: Prof. Mark Neubauer, msn@illinois.edu
 TA: Dewen Zhong, dzhong6@illinois.edu
 Lectures: Mondays from 3-4:50 pm in 222 Loomis Laboratory of Physics
 NOTE: Due to a room conflict, the first lecture will be in 322 Loomis
 Need help?

 It sends message digests to people who aren’t active in the room, so feel free to ask a question even if no one’s around.
 Look through and create issues
 Office Hours
 Dewen Zhong: Thursdays from 2-3 pm in 420 Loomis
 Prof. Neubauer: Thursdays from 3-4 pm in 411 Loomis
 Email for 1-on-1 help, or to set up a time to meet
Course Description
Welcome to the Data Analysis and Machine Learning Applications (for physicists) course!
In this course, you will learn the fundamentals of analyzing and interpreting scientific data, and of applying modern machine learning tools and techniques to problems common in physics research, such as classification and regression. This course offering is very timely given the explosion of interest in, and rapid development of, data science and artificial intelligence. Every day there are new applications of machine learning to the physical sciences that are advancing our knowledge of nature.
This course is designed to be interactive and collaborative, employing Active Learning methods while developing your own skills and knowledge (and course grade :). I initiated this course because I believe we live in an increasingly data-centric world, with both people and machines learning from vast amounts of data. There has never been a time when early-career physicists were more in need of a solid understanding of the basics of scientific data analysis, data-driven inference and machine learning, and of a working knowledge of the most important tools and techniques of modern data science.
This is the second offering of a new course, unlike any other taught in our Department. As such, I welcome your feedback on any aspect of the course so that I can work to improve the curriculum.
Prerequisites
Courses
Hardware
 You need a laptop for this course, running macOS, Linux or Windows, for use both inside and outside of class.
Software
 Some knowledge of Python is preferred but not required. You do need a working knowledge of the basics of computer programming.
Setting up your environment
 There is some setup required to ensure a consistent and functioning software environment on your computer to use the Jupyter notebooks in this course. This setup is detailed here and is best started before the first lecture to work out any wrinkles so that we can get started on the physics and data science content of the course.
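Once the setup is done, a quick way to confirm that the main packages can be found is a short check script like the one below. This is a minimal sketch, not part of the official setup instructions: the package list is an assumption based on the topics in this syllabus (numerical Python, plotting, Scikit-Learn, Jupyter), and the authoritative list is in the setup guide.

```python
import importlib.util
import sys

# Hypothetical package list (an assumption, not the official course list):
# NumPy, Matplotlib, Scikit-Learn (imported as "sklearn") and IPython.
REQUIRED = ("numpy", "matplotlib", "sklearn", "IPython")


def check_environment(packages=REQUIRED):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Using `importlib.util.find_spec` only locates each package without importing it, so the check is fast and a broken package will not stop the script from reporting the others.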
Course Overview
Topics covered include:
 Notebooks and numerical python
 Handling and Visualizing Data
 Finding structure in data
 Measuring and reducing dimensionality
 Adapting linear methods to nonlinear problems
 Estimating probability density
 Probability theory
 Statistical methods
 Bayesian statistics
 Markov-chain Monte Carlo in practice
 Stochastic processes and Markov-chain theory
 Variational inference
 Optimization
 Computational graphs and probabilistic programming
 Bayesian model selection
 Learning in a probabilistic context
 Supervised learning in Scikit-Learn
 Cross validation
 Neural networks
 Deep learning
Topics will be demonstrated in class through live-code examples and slides in Jupyter notebooks, available at syllabus/notebooks.
Class Participation
The lectures will include physics and data science pedagogy, demonstrated through live examples in Jupyter notebooks that you will work through in class. You are required to attend each lecture with your laptop and working environment. Attendance will be taken.
Homework
Homework is an important part of the course, where you will have an opportunity to apply the techniques you are learning to problems relevant to the analysis of scientific data. All assignments are listed in the Course Outline. You will submit your homework via your private GitHub repository.
Projects
Grading
 Class Participation: ~20%
 Homework: ~45%
 Research project: ~35%
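To see how the three components combine, the sketch below computes a weighted total from component scores. It assumes each component is scored on a 0-100 scale and uses the approximate weights listed above (they are "~" values, so the actual weighting may differ); the example scores are hypothetical.

```python
# Approximate weights copied from the Grading section; they are "~" values,
# so the actual weighting may differ slightly.
WEIGHTS = {"participation": 0.20, "homework": 0.45, "project": 0.35}


def weighted_grade(scores, weights=WEIGHTS):
    """Combine component scores (each on a 0-100 scale) into a 0-100 total."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[part] * scores[part] for part in weights)


# Hypothetical scores for illustration only.
example = {"participation": 100, "homework": 90, "project": 80}
print(round(weighted_grade(example), 2))  # prints 88.5
```

The function raises an error for a missing component rather than silently counting it as zero, which makes an incomplete set of scores obvious.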
Course Outline
[Aug 26] Lec 01: Introduction
Goals
 Get an overview of the course, including the reading list and homework assignments
 Setting up your environment
Lecture notebooks
Homework
 Finish setting up your environment so that you can launch and execute notebooks
Required reading
 A Whirlwind Tour of Python, Jake VanderPlas: free PDF, notebooks online.
Supplemental reading
[Sep 02] NO LECTURE (Labor Day)
[Sep 09] Lec 02: Data Science
Goals
 Gain familiarity with Jupyter notebooks and numerical Python
 Learn about handling and describing data
Lecture notebook(s)
Homework
 Homework 1: Numerical python and data handling
 Released on Monday, Sep 9
 Due by 3:00 pm CDT on Monday, Sep 16
Supplemental reading
[Sep 16] Lec 03: Visualizing & Finding Structure in Data
Goals
 Learn about visualizing data
 Learn about the importance of clustering data in physics
 Learn how to find structure in data (clustering)
 K-Means, Spectral Clustering, DBSCAN
Lecture notebook(s)
Homework
Supplemental reading
[Sep 23] Lec 04: Dimensionality & Linearity
Goals
 Measure and reduce dimensionality
 Adapt linear models to nonlinear problems
Lecture notebook(s)
Homework
 Homework 2: Visualization, Covariance and Correlation
 Released on Monday, Sep 23
 Due by 3:00 pm CDT on Monday, Sep 30
Supplemental reading
[Sep 30] Lec 05: Kernel Functions & Probability Theory
Goals
 Learn about Kernel functions
 Learn about Probability Theory
Lecture notebook(s)
Homework
 Homework 3: Expectation-Maximization Algorithm, K-Means, Principal Component Analysis
 Released on Monday, Sep 30
 Due by 3:00 pm CDT on Monday, Oct 7
Supplemental reading
[Oct 07] Lec 06: Probability Density Estimation & Statistics
Goals
 Estimate probability density
 Learn about Statistical Methods
Lecture notebook(s)
Homework
 Homework 4: Probability
 Released on Monday, Oct 07
 Due by 3:00 pm CDT on Monday, Oct 14
Supplemental reading
[Oct 14] Lec 07: Bayesian Statistics & Markov-chain Monte Carlo
Goals
 Learn about Bayesian Statistics
 Put Markov-chain Monte Carlo into practice
Lecture notebook(s)
Homework
 Homework 5: Kernel Density Estimation
 Released on Monday, Oct 14
 Due by 3:00 pm CDT on Monday, Oct 21
Supplemental reading
[Oct 21] Lec 08: Stochastic Processes, Markov Chains & Variational Inference
Goals
 Learn about Stochastic Processes in Data Science
 Learn about Markov-chain Theory
 Learn about the Variational Inference Method
Lecture notebook(s)
Homework
 Homework 6: Bayesian Statistics and Markov Chain Monte Carlo
 Released on Monday, Oct 21
 Due by 3:00 pm CDT on Monday, Oct 28
Supplemental reading
[Oct 28] Lec 09: Optimization, Comput. Graphs & Prob. Prog.
Goals
 Learn about Optimization and Stochastic Gradient Descent
 Learn about Frameworks for Computational Graphs
 Learn about Probabilistic Programming methods
Lecture notebook(s)
Homework
 Homework 7: Markov Chains
 Released on Monday, Oct 28
 Due by 3:00 pm CST on Monday, Nov 11
Supplemental reading
[Nov 04] NO LECTURE
[Nov 11] Lec 10: Bayesian Models & Probabilistic Learning
Goals
Lecture notebook(s)
Homework
Supplemental reading
[Nov 18] Lec 11: Supervised Learning & Cross Validation
Goals
 Learn about Cross Validation
Lecture notebook(s)
Homework
Supplemental reading
[Nov 25] NO LECTURE (Fall Break)
[Dec 02] Lec 12: Artificial Neural Networks
Goals
 Learning and Inference using Neural Networks
Lecture notebook(s)
Homework
 Homework 8: Cross Validation and Neural Networks
 Released on Monday, Dec 02
 Due by 3:00 pm CST on Monday, Dec 09
Supplemental reading
[Dec 09] Lec 13: Deep Learning
Goals
 Learn about Deep Learning
Lecture notebook(s)
Homework
Resources
References

You can find the references list, including required and recommended reading, at Reading list

Some quick reference guides
Git and GitHub
Anaconda and Conda
Project Jupyter
Atom
I found the Atom editor to be the best option. It has full GitHub integration, which avoids having to type git commands every time.
With plugins, it also does LaTeX syntax highlighting. Install them with:
apm install latex
apm install language-latex
Acknowledgements
I would like to acknowledge David Kirby at the University of California, Irvine for the materials and setup on which this course is based, and for the helpful discussions we have had. I would like to thank Matthew Feickert and Dewen Zhong for their guidance and contributions to the course. I also acknowledge the course at github.com/advancedjs, whose syllabus template was used here.
_________________
Material for a University of Illinois course offered by the Physics Department.
Content is maintained on GitHub and distributed under a BSD-3 license.