Data Analysis and Machine Learning Applications
- Course: PHYS 398MLA
- Instructor: Prof. Mark Neubauer, firstname.lastname@example.org
- TA: Dewen Zhong, email@example.com
- Lectures: Mondays from 3-4:50 pm in 222 Loomis Laboratory of Physics
- NOTE: Due to a room conflict, the first lecture will be in 322 Loomis
- Need help?
- The course chat room sends message digests to people who aren’t active in the room, so feel free to ask a question even if no one’s around.
- Look through existing issues and create new ones
- Office Hours
- Dewen Zhong: Thursdays from 2-3 pm in 420 Loomis
- Prof. Neubauer: Thursdays from 3-4 pm in 411 Loomis
- Email for 1-on-1 help, or to set up a time to meet
Welcome to the Data Analysis and Machine Learning Applications (for physicists) course!
In this course, you will learn the fundamentals of how to analyze and interpret scientific data and how to apply modern machine learning tools and techniques to problems common in physics research, such as classification and regression. This course offering is very timely given the explosion of interest and rapid development of data science and artificial intelligence. Every day there are new applications of machine learning to the physical sciences in ways that are advancing our knowledge of nature.
This course is designed to be interactive and collaborative, employing Active Learning methods while developing your own skills and knowledge (and course grade :). I initiated this course out of my view that we live in an increasingly data-centric world, with both people and machines learning from vast amounts of data. Early-career physicists have never been more in need of a solid understanding of the basics of scientific data analysis, data-driven inference, and machine learning, along with a working knowledge of the most important tools and techniques from modern data science.
This is the second offering of a new course that is unlike any other being taught in our Department. As such, I ask for your feedback on any aspect of the course so that I can work to improve the curriculum.
- You need a laptop for this course. It is assumed that you have a laptop running macOS, Linux or Windows for use both inside and outside of class.
- Some knowledge of Python is preferred but not required. You do need to have a working knowledge of the basics of computer programming.
Setting up your environment
- There is some setup required to ensure a consistent and functioning software environment on your computer to use the Jupyter notebooks in this course. This setup is detailed here and is best started before the first lecture to work out any wrinkles so that we can get started on the physics and data science content of the course.
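As a quick sanity check once setup is done, a short script along these lines can report whether key packages are visible to your Python interpreter. This is only a minimal sketch: the package list below is an assumption based on the topics in this syllabus (numerical Python, Scikit-Learn, Jupyter notebooks), and the function name `find_missing` is hypothetical. The setup instructions remain the authoritative reference for what must be installed.

```python
import importlib.util

# Assumed package list for illustration only; the course setup
# instructions define what is actually required.
ASSUMED_PACKAGES = ["numpy", "sklearn", "notebook"]


def find_missing(packages):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(ASSUMED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages were found.")
```

Run it from inside the environment you created for the course. If a package is reported missing, revisit the setup instructions rather than installing it ad hoc, so that your environment stays consistent with everyone else's.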
Topics covered include:
- Notebooks and numerical python
- Handling and Visualizing Data
- Finding structure in data
- Measuring and reducing dimensionality
- Adapting linear methods to nonlinear problems
- Estimating probability density
- Probability theory
- Statistical methods
- Bayesian statistics
- Markov-chain Monte Carlo in practice
- Stochastic processes and Markov-chain theory
- Variational inference
- Computational graphs and probabilistic programming
- Bayesian model selection
- Learning in a probabilistic context
- Supervised learning in Scikit-Learn
- Cross validation
- Neural networks
- Deep learning
Topics will be demonstrated in class through live-code examples and slides in Jupyter notebooks, available at syllabus/notebooks.
The lectures will include physics and data science pedagogy, demonstrated through live examples in Jupyter notebooks that you will work through in class. You are required to attend each lecture with your laptop and working environment. Attendance will be taken.
Homework is an important part of the course, giving you an opportunity to apply the techniques you are learning to problems relevant to the analysis of scientific data. All assignments are listed within the Course Outline. You will submit your homework via your private GitHub repository.
- Class Participation: ~20%
- Homework: ~45%
- Research project: ~35%
[Aug 26] Lec 01: Introduction
- Get an overview of the course, including reading list and homework assignments
- Setting up your environment
- Complete setting up your environment so that you can launch and execute notebooks
- A Whirlwind Tour of Python, Jake VanderPlas: free PDF, notebooks online.
[Sep 02] NO LECTURE (Labor Day)
[Sep 09] Lec 02: Data Science
- Gain familiarity with Jupyter Notebooks and Numerical python
- Learn about handling and describing data
- Homework 1: Numerical python and data handling
- Released on Monday, Sep 09
- Due by 3:00 pm CDT on Monday, Sep 16
[Sep 16] Lec 03: Visualizing & Finding Structure in Data
- Learn about visualizing data
- Learn about the importance of clustering data in physics
- Learn how to find structure in data (clustering)
- KMeans, Spectral Clustering, DBSCAN
[Sep 23] Lec 04: Dimensionality & Linearity
- Measure and reduce dimensionality
- Adapt linear models to nonlinear problems
- Homework 2: Visualization, Covariance and Correlation
- Released on Monday, Sep 23
- Due by 3:00 pm CDT on Monday, Sep 30
[Sep 30] Lec 05: Kernel Functions & Probability Theory
- Learn about Kernel functions
- Learn about Probability Theory
- Homework 3: Expectation-Maximization Algorithm, K-Means, Principal Component Analysis
- Released on Monday, Sep 30
- Due by 3:00 pm CDT on Monday, Oct 07
[Oct 07] Lec 06: Probability Density Estimation & Statistics
- Estimate probability density
- Learn about Statistical Methods
- Homework 4: Probability
- Released on Monday, Oct 07
- Due by 3:00 pm CDT on Monday, Oct 14
[Oct 14] Lec 07: Bayesian Statistics & Markov-chain Monte Carlo
- Learn about Bayesian Statistics
- Markov-chain Monte Carlo put into practice
- Homework 5: Kernel Density Estimation
- Released on Monday, Oct 14
- Due by 3:00 pm CDT on Monday, Oct 21
[Oct 21] Lec 08: Stochastic Processes, Markov Chains & Variational Inference
- Learn about Stochastic processes in the realm of Data Science
- Learn about Markov-chain Theory
- Learn about the Variational Inference Method
- Homework 6: Bayesian Statistics and Markov Chain Monte Carlo
- Released on Monday, Oct 21
- Due by 3:00 pm CDT on Monday, Oct 28
[Oct 28] Lec 09: Optimization, Comput. Graphs & Prob. Prog.
- Learn about Optimization and Stochastic Gradient Descent
- Learn about Frameworks for Computational Graphs
- Learn about Probabilistic Programming methods
- Homework 7: Markov Chains
- Released on Monday, Oct 28
- Due by 3:00 pm CST on Monday, Nov 11
[Nov 04] NO LECTURE
[Nov 11] Lec 10: Bayesian Models & Probabilistic Learning
[Nov 18] Lec 11: Supervised Learning & Cross Validation
- Learn about Cross Validation
[Nov 25] NO LECTURE (Fall Break)
[Dec 02] Lec 12: Artificial Neural Networks
- Learning and Inference using Neural Networks
- Homework 8: Cross Validation and Neural Networks
- Released on Monday, Dec 02
- Due by 3:00 pm CST on Monday, Dec 09
[Dec 09] Lec 13: Deep Learning
- Learn about Deep Learning
You can find the reference list, including required and recommended reading, at Reading list
Some quick reference guides
Git and GitHub
Anaconda and Conda
I found the Atom editor to be the best option. It has full GitHub integration, which avoids having to type git commands every time.
With plug-ins, it also does LaTeX syntax highlighting. Install them with:
apm install latex
apm install language-latex
I would like to acknowledge David Kirby at the University of California at Irvine for the materials and setup on which this course is based and for the helpful discussions we have had. I would like to thank Matthew Feickert and Dewen Zhong for their guidance and contributions to the course. I also acknowledge the course at github.com/advanced-js, whose syllabus template was used here.
Material for a University of Illinois course offered by the Physics Department.
Content is maintained on GitHub and distributed under a BSD3 license.