A Practical Tutorial for Simple and Reproducible Materials Informatics
Logan Ward, Postdoctoral Scholar, University of Chicago
17 October 2018
What is Materials Informatics?
Another kind of computational materials science tool
Goal: Accelerate design of materials
Method: Replace experiments with computers
Many Theory-Based Tools:
• Density Functional Theory
• Phase Field
• Finite Element Analysis
• Computational Thermodynamics
Collaborators: UIUC Deep Learning of Materials Data
Collaborators: NU [D. Jha]
What have I learned from working on these projects?
Objectives
Goal: Pass on “lessons learned” from doing materials informatics
A Few General Questions:
• How Should I Construct a Problem?
• What Tools Are Available?
• How Should I Organize My Project?
• What About Publishing?
How to make/use ML models?
Practical concerns:
• What tools should I use?
• How should I avoid doing unneeded work?
• How to make sure my science is reproducible?
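Materials Informatics Workflow
Collect → Process → Represent ($\vec{X}$) → Learn ($\vec{y}$)
Collect: two measurements, $\Delta H_f = -1.0$ and $\Delta H_f = -0.5$
Process/Represent: a table of inputs and outputs

  Z_A   Z_B   ΔH_f
  3     4     -1.0
  3     5     -0.5

Learn: $\Delta H_f = f(Z_A, Z_B)$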
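The toy table on this slide can be walked through the four workflow steps in a few lines. This is only a sketch: the record layout and the nearest-neighbor `predict` function are assumptions chosen for illustration, standing in for whatever model f would actually be fit.

```python
# Collect/Process: each record is one measurement from the slide's toy table.
records = [
    {"Z_A": 3, "Z_B": 4, "dHf": -1.0},
    {"Z_A": 3, "Z_B": 5, "dHf": -0.5},
]

# Represent: turn each record into an input vector x and an output value y.
X = [[r["Z_A"], r["Z_B"]] for r in records]
y = [r["dHf"] for r in records]


def predict(x_new):
    """Stand-in for f(Z_A, Z_B): return the output of the closest training point."""
    def squared_distance(x):
        return sum((a - b) ** 2 for a, b in zip(x, x_new))

    nearest = min(range(len(X)), key=lambda i: squared_distance(X[i]))
    return y[nearest]


# Learn/use: query the stand-in model for an unseen (Z_A, Z_B) pair.
print(predict([3, 6]))
```

Any regression model could replace `predict`; the point is that every step consumes the output of the one before it.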
Data Collection and Processing
Collect → Process → Represent → Learn
Collect and Process: Collect data, define training set
Challenge 1: Finding/storing structured datasets
• LIMS Tools
• Data Resources
• NLP
There are a variety of tools that can help you, such as Materials Commons
Challenge 2: Clearly define inputs and outputs (e.g., $\Delta H_f = -1.0$, $\Delta H_f = -0.5$)
Data Management Need Not Be Fancy…
Write code to format data (Don’t edit by hand!)
Knowing “what you have” and “where you got it” should not rely on memory
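A minimal sketch of what "write code to format data" can look like with only the standard library. The column names, the inline CSV text, and the source label `hypothetical_export.csv` are assumptions made up for illustration; the values are the toy numbers from the workflow slide.

```python
import csv
import hashlib
import io
import json


def format_raw_data(raw_text, source):
    """Parse a raw CSV export into typed rows and record where it came from."""
    reader = csv.DictReader(io.StringIO(raw_text), skipinitialspace=True)
    rows = [
        {"Z_A": int(row["Z_A"]), "Z_B": int(row["Z_B"]), "dHf": float(row["dHf"])}
        for row in reader
    ]
    # Provenance: enough to answer "what do I have" and "where did I get it".
    provenance = {
        "source": source,
        "sha256": hashlib.sha256(raw_text.encode("utf-8")).hexdigest(),
        "n_rows": len(rows),
    }
    return rows, provenance


# Hypothetical raw export; in practice this text would be read from a file.
raw = "Z_A, Z_B, dHf\n3, 4, -1.0\n3, 5, -0.5\n"
rows, provenance = format_raw_data(raw, source="hypothetical_export.csv")
print(json.dumps(provenance, indent=2))
```

Because the cleaning lives in a function rather than in hand edits, rerunning it on an updated export reproduces the same processed table, and the stored hash shows whether the raw file has changed.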
…but There Are Modern Tools
Common Data Curation System
Citrination
Organizing your data provides a solid starting point
Data Representation
Collect → Process → Represent → Learn
Represent: Encode domain knowledge into inputs
$\mathrm{Property} = f(\mathrm{Attributes})$
Representation of material. Ex: $\mathrm{Attributes} = g(x_H, x_{He}, \ldots)$
[Figure: example materials LiF, NaPb, and Na2O, labeled with $x_{Na}$ and |ΔΧ|]
What does a representation need?
• Completeness: Differentiate materials
• Efficiency: Quick to compute
• Accuracy: Capture important effects
End product: Machine-learning compatible inputs
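One simple instance of Attributes = g(x_H, x_He, …) is a vector of element fractions. The sketch below assumes formulas written as plain element symbols followed by optional amounts, and it uses a short, hypothetical element list rather than the full periodic table.

```python
import re

# Hypothetical, truncated element list; a real representation would cover
# every element that can appear in the dataset.
ELEMENTS = ["Li", "O", "F", "Na", "Pb"]


def element_fractions(formula):
    """Return the atomic fraction of each element in ELEMENTS for a formula."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        if symbol not in ELEMENTS:
            raise ValueError(f"Element {symbol!r} is not in the element list")
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return [counts.get(element, 0.0) / total for element in ELEMENTS]


for formula in ["LiF", "NaPb", "Na2O"]:
    print(formula, element_fractions(formula))
```

This representation is quick to compute and distinguishes NaPb from Na2O, but it carries no chemistry beyond composition, which is where attributes built on element properties come in.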
Data Representation: Open Codes
There is a variety of reusable approaches and codes for:
• Electronic Structures
• Crystal Structures
• Microstructures (pyMKS)
• Molecules
There may be a code that does what you need
Pictures from GitHub, Wikipedia
Matminer: Toolkit for Materials ML
https://github.com/hackingmaterials/matminer/
Usable model in 5 lines of code:

    from matminer.data_retrieval.retrieve_MDF import MDFDataRetrieval
    from matminer.featurizers.composition import ElementProperty
    from pymatgen.core import Composition
    from sklearn.linear_model import Lasso

    data = MDFDataRetrieval().get_dataframe(source='oqmd')  # call as shown on the slide
    data['composition'] = data['mdf.composition'].apply(Composition)
    f = ElementProperty.from_preset('magpie')  # called on the class, not an instance
    data = f.featurize_dataframe(data, 'composition')
    m = Lasso().fit(data[f.feature_labels()], data['oqmd.delta_e.value'])
Matminer: “Featurizer” Library
Matminer has methods from many groups
Machine Learning
Collect → Process → Represent → Learn
Learn: Fit a model to the data
There are many algorithms. How do you decide?
1. Identify algorithms with desired properties. Ex: differentiability
2. Use cross-validation to optimize parameters
3. Pick the one with the lowest “loss”
The process and tools for building an ML model are well established
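The selection steps above can be spelled out with the standard library alone. This is a sketch under stated assumptions: the data are synthetic, the two candidate "algorithms" (a constant-mean model and a straight-line fit) are deliberately trivial, and the loss is mean squared error. Libraries such as scikit-learn provide cross-validation utilities that would normally be used instead of hand-written loops.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares straight line."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / s_xx
    return lambda x: mean_y + slope * (x - mean_x)


def cv_loss(fit, xs, ys, k=5):
    """Mean squared error averaged over k held-out folds."""
    fold_losses = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_losses.append(
            statistics.fmean((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
        )
    return statistics.fmean(fold_losses)


# Synthetic data: a noisy straight line (not real materials data).
rng = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

candidates = {"mean": fit_mean, "line": fit_line}
losses = {name: cv_loss(fit, xs, ys) for name, fit in candidates.items()}
best = min(losses, key=losses.get)
print(losses, "best:", best)
```

The same loop extends to tuning a parameter: treat each parameter value as its own candidate and keep the one with the lowest cross-validated loss.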
Think Hard About Validation
Test Model How You Will Use It
• Evaluating new alloy systems? [Figure: Exp. vs. ML]
• Predicting Future Events?
• Interpolating to new clusters?
Cited on this slide: Ward et al., npj Comp. Mat. (2016); Ward et al., in preparation; Meredig et al., MSDE (2018)
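If the model will be used on alloy systems it has never seen, the test split should hold out whole systems rather than random rows. The sketch below shows one way to build such splits; the group labels "A-B", "A-C", and "B-C" are hypothetical placeholders, and this is an illustration of the idea rather than the procedure used in the cited papers.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for group, test_idx in by_group.items():
        train_idx = [i for i in range(len(groups)) if groups[i] != group]
        yield group, train_idx, test_idx


# Hypothetical alloy-system label for each row of a dataset.
groups = ["A-B", "A-B", "A-C", "A-C", "A-C", "B-C"]

for held_out, train_idx, test_idx in leave_one_group_out(groups):
    print(f"hold out {held_out}: train on {train_idx}, test on {test_idx}")
```

A model scored this way never sees the held-out system during training, so the score reflects the "new alloy system" use case instead of interpolation within known systems.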
How to Organize? [Jupyter]
What is Jupyter? Why care?
• Jupyter is an environment designed for reproducible computational science
• Stores code with outputs and documentation in a single notebook
• Modeled after Mathematica
https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/
• Why do I care? Jupyter lets me…
  • organize my code
  • easily run on a remote system
  • keep track of results and rationale
  • communicate my research better
How I write a notebook
One notebook per “experiment” or “idea”
Outline:
1. Title and short abstract
   1. “What am I doing here and why”
2. Load in libraries
   1. Put them up front, so it crashes early
3. Load in data from disk
   1. Ex: training data, results from other notebook
4. Each step in its own block
   1. Introduction
   2. Code and explanation
   3. Figure/visualization
   4. Explanation of finding
Example Step
[Figure: example notebook step, annotated with “Introduction”, “Explanation”, and “Visualization and conclusion”]
Someone should be able to understand this without knowing Python!
The Full Narrative
[A Collection of Many Small Steps]
Pitfalls When Making Notebooks
1. Not including documentation
   Advice: Write what you’re going to do first
2. Notebook not capturing entire process
   Problem: Jupyter lets you execute cells out of order
   Advice: Periodically “Restart Kernel and Run All Cells”
3. Duplicate code between notebooks
   Advice: Make a separate module for common code
4. Library conflicts
   Advice: Run mature projects in container, separate machines
   Advice: Make a “requirements.txt” file
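For the last piece of advice, one way to produce the lines of a “requirements.txt” file is to ask Python which versions are installed. This is a sketch assuming Python 3.8 or newer (for `importlib.metadata`); the package list is a hypothetical example, and `pip freeze` is the usual tool when every installed package should be pinned.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return 'name==version' lines for the packages that are installed."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment, so there is no version to pin.
            continue
    return lines


# Hypothetical list of the libraries a project's notebooks import.
packages = ["numpy", "pandas", "scikit-learn", "matminer"]
print("\n".join(pinned_requirements(packages)))
```

Saving the printed lines next to the notebooks lets a later reader rebuild the same environment with `pip install -r requirements.txt`.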