Technical Education Community
www.teccses.org

Introduction to Materials Informatics (Volume I): Fundamentals of Machine Learning (English Edition)

Author: Zhang Tongyi (张统一)

Pages: 480

Publisher: Science Press (科学出版社)

Publication date: 2022

ISBN: 9787030728982

E-book formats: pdf/epub/txt

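Synopsis

Materials informatics is an emerging interdisciplinary field that offers a new approach to accelerating materials science research and technology development under the materials genome concept. As a scholar of materials and mechanics, the author has done extensive work to advance materials informatics and has gathered considerable experience in integrating artificial intelligence (AI), machine learning (ML), and materials science and technology. The author's aim is an accessible introduction to materials informatics that further promotes the field. To help readers grasp the core content quickly while keeping the treatment complete, the work is divided into two volumes: Volume I focuses on the fundamentals of machine learning, and Volume II focuses on deep learning and reviews the current state and prospects of materials informatics.
Volume I has twelve chapters, covering linear regression and linear classification, support vector machines, decision trees and K-nearest neighbors (KNN), ensemble learning, Bayes' theorem and the expectation-maximization (EM) algorithm, symbolic regression, neural networks, hidden Markov chains, data preprocessing and feature selection, interpretable machine learning, and more. The exposition starts from simple, clear mathematical definitions and physical pictures, ties closely to case studies from materials science research, and gives detailed steps for each algorithm so that readers can learn and apply them.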

Table of Contents

Contents

Foreword

Preface

Symbols and Notations

Chapter 1 Introduction 1

References 13

Chapter 2 Linear Regression 15

2.1 Least Squares Linear Regression 15

2.2 Principal Component Analysis and Principal Component Regression 26

2.3 Least Absolute Shrinkage and Selection Operator (L1) 37

2.4 Ridge Regression (L2) 40

2.5 Elastic Net Regression 44

2.6 Multiply Task LASSO (MultiTaskLASSO) 49

Homework 52

References 53

Chapter 3 Linear Classification 55

3.1 Perceptron 57

3.2 Logistic Regression 60

3.3 Linear Discriminant Analysis 73

Homework 80

References 82

Chapter 4 Support Vector Machine 83

4.1 SVC 83

4.2 Kernel Functions 88

4.3 Soft Margin 96

4.4 SVR 102

Homework 108

References 110

Chapter 5 Decision Tree and K-Nearest-Neighbors (KNN) 112

5.1 Classification Trees 112

5.2 Regression Tree 121

5.3 K-Nearest-Neighbors (KNN) Methods 129

Homework 133

References 134

Chapter 6 Ensemble Learning 136

6.1 Boosting 137

6.1.1 AdaBoost 137

6.1.2 Gradient Boosting Machine (GBM) 145

6.1.3 eXtreme Gradient Boosting (XGBoost) 151

6.2 Bagging 153

Homework 158

References 159

Chapter 7 Bayesian Theorem and Expectation-Maximization (EM) Algorithm 160

7.1 Bayesian Theorem 160

7.2 Naive Bayes Classifier 161

7.3 Maximum Likelihood Estimation 168

7.3.1 Gaussian distribution 168

7.3.2 Weibull distribution 170

7.4 Bayesian Linear Regression 175

7.5 Expectation-Maximization (EM) Algorithm 184

7.5.1 Gaussian mixture model (GMM) 185

7.5.2 The mixture of Lorentz and Gaussian distributions 197

7.6 Gaussian Process (GP) Regression 209

Homework 219

References 219

Chapter 8 Symbolic Regression 221

8.1 Overview of Evolutionary Computation 221

8.2 Genetic Programming 223

8.3 Grammar-Guided Genetic Programming and Grammatical Evolution 225

8.4 The Application of LASSO in Symbolic Regression 234

Homework 235

References 235

Chapter 9 Neural Networks 238

9.1 Neural Networks and Perceptron 238

9.2 Back Propagation Algorithm 241

9.3 Regularization in NNs 250

9.3.1 L1 regularization 250

9.3.2 L2 regularization 257

9.4 Classification NNs 261

9.4.1 Binary classification 261

9.4.2 Multiclassification of multiply grades in a category 267

9.5 Autoencoders 272

9.5.1 Introduction 272

9.5.2 Denoising autoencoder 273

9.5.3 Sparse autoencoder 280

9.5.4 Variational autoencoder 288

Homework 311

References 312

Chapter 10 Hidden Markov Chains 313

10.1 Markov Chain 313

10.2 Stationary Markov Chain 317

10.3 Markov Chain Monte Carlo Methods 318

10.3.1 Metropolis Hastings (M-H) algorithm 320

10.3.2 Gibbs sampling algorithm 321

10.4 Calculation Methods for the Probability of Observation Sequence 325

10.4.1 Direct method 325

10.4.2 Forward method 328

10.4.3 Backward method 330

10.5 Estimation of Optimal State Sequence 332

10.5.1 Direct method 332

10.5.2 Viterbi algorithm 333

10.6 Estimation of Intrinsic Parameters—The Baum-Welch Algorithm 334

Homework 344

References 345

Chapter 11 Data Preprocessing and Feature Selection 347

11.1 Reliable Data, Normals and Anomalies 348

11.1.1 Local outlier factor 348

11.1.2 Isolated forest 352

11.1.3 One-class support vector machine 355

11.1.4 Support vector data description 361

11.2 Feature Selection 365

11.2.1 Filter approach 366

11.2.2 Wrapper approach 394

11.2.3 Embedded approach 402

Homework 408

References 408

Chapter 12 Interpretative SHAP Value and Partial Dependence Plot 410

12.1 SHapley Additive exPlanation value 410

12.2 The joint SHAP value of two features 426

12.3 Partial Dependence Plot 427

Homework 440

References 440

Appendix 1 Vector and Matrix 442

A1.1 Definition 442

A1.1.1 Vector 442

A1.1.2 Matrix 442

A1.2 Matrix Algebra 442

A1.2.1 Inverse and transpose 442

A1.2.2 Trace 443

A1.2.3 Determinant 443

A1.2.4 Eigenvalues and eigenvectors 444

A1.2.5 Singular value decomposition (SVD) 444

A1.2.6 Pseudo inverse 445

A1.2.7 Some useful identities 445

A1.3 Matrix Analysis 446

A1.3.1 Derivative of matrix 446

A1.3.2 Derivative of the determinant of a matrix 446

A1.3.3 Derivative of an inverse matrix 447

A1.3.4 Jacobian matrix and Hessian matrix 447

A1.3.5 The chain rule 447

References 447

Appendix 2 Basic Statistics 448

A2.1 Probability 448

A2.1.1 Joint probability 448

A2.1.2 Bayesian theorem and conjugation 448

A2.1.3 Probability density of continuous variables 449

A2.1.4 Quantile function 449

A2.1.5 Expectation, variance and covariance of random variables 449

A2.2 Distributions 449

A2.2.1 Bernoulli distribution 450

A2.2.2 Binomial distribution 450

A2.2.3 Poisson distribution 450

A2.2.4 Gaussian distribution 450

A2.2.5 Weibull distribution 451

A2.2.6 The chi-square (χ2) distribution and χ2-test 451

A2.2.7 Th

Excerpt

Chapter 1 Introduction
Materials informatics is an emerging and rapidly developing field, particularly since the launch of the Materials Genome Initiative (MGI) in 2011. Ramakrishna et al. (2019) defined materials informatics as follows: “the materials informatics employs techniques, tools, and theories drawn from the emerging fields such as data science, internet, computer science and engineering, and digital technologies to the materials science and engineering to accelerate materials, products and manufacturing innovations”. Agrawal and Choudhary (2016) proposed that materials informatics has become the “fourth paradigm” in the research and development of materials. Following the guidance of MGI, many free-access, publicly open materials databases have been built, and materials data are growing exponentially thanks to the development of high-throughput experiments and high-throughput calculation techniques, as well as data sharing among the materials communities. A list of available materials databases can be found in the review paper of Ramakrishna et al. (2019). Materials informatics integrates materials science and engineering with the sciences and technologies of artificial intelligence (AI), databases, and machine learning (ML). Its aims are to accelerate innovation across the whole materials development continuum, from discovery, development, property optimization, and system design and integration to manufacturing and deployment, and to speed up the process from data to knowledge, so as to understand and master the relationship between material micro-structures and macro-properties based on material composition, processing, and performance. Materials informatics adds the novel tools of AI and ML to the toolbox of materials science and engineering, which strengthens the methodologies of materials research and development.
Materials informatics utilizes AI and ML to analyze large ensembles of materials data from experiments, computations, manufacturing, industry, and daily life efficiently and cost-effectively, and to deliver materials knowledge and technology in user-friendly ways to designers, scientists, engineers, and manufacturers of materials and products. ML gives computers the ability to learn from data and make predictions based on data (Samuel 1967). Mitchell (1997) defined ML as follows: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” In materials informatics, ML algorithms learn from existing materials data on properties and/or performance of interest, in order to improve targeted properties and performance of existing materials and/or to design and discover new materials with desired properties and performance. Since materials datasets are usually small, adaptive design with feedback from experiments and/or computations has been proposed to enhance the ML ability. Figure 1.1 shows the adaptive learning and design in materials informatics. Initially, data from both successful and failed trials are generated from experiments, calculations, and/or manufacturing; they record the material properties and performance of interest together with adjustable parameters such as material composition, processing, testing conditions, and/or service environments. A dataset is usually built from one’s own data and/or by collecting data from various sources. The material properties and performance of interest are usually called output variables, and the parameters of material composition, processing, and/or testing conditions are normally called input variables. In addition, many molecular, atomic and electronic parameters, thermodynamic properties, kinetic parameters, crystalline and amorphous information, etc. are often employed as input and/or output variables.
ML is conducted on an initial dataset. The learning results are evaluated and interpreted automatically and/or by domain experts, who are authorities in a particular area or topic; in materials informatics, the domain experts are materials experts and scientists. The adaptive learning process is an iterative and interactive loop. The learning results are used to design and guide the next experiments, calculations, and/or manufacturing, and the new results validate the ML prediction and are simultaneously fed back into the dataset for the next cycle of the adaptive learning loop, so the dataset grows after each iteration. The next round of adaptive learning may give a different result because more data have been added, which provides modified guidance on experiments, calculations, and/or manufacturing. Each step in the adaptive design loop thus refines the inputs and outputs to some degree, and the iterations continue until the design goal for the material properties is reached. Lookman
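
The adaptive learning loop described in the excerpt can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the book: `run_experiment` is a hypothetical stand-in for a real experiment or calculation, the single input `x` plays the role of an adjustable parameter such as composition, and a 1-nearest-neighbour lookup is swapped in as a deliberately simple surrogate model. The sketch only shows the shape of the loop: learn from the current dataset, pick the next trial, measure it, and feed the result back.

```python
def run_experiment(x):
    """Hypothetical experiment: the property peaks at 1.0 when x = 0.7."""
    return 1.0 - (x - 0.7) ** 2


def predict(dataset, x):
    """Toy surrogate (assumption): property of the closest measured input."""
    nearest_x = min(dataset, key=lambda known: abs(known - x))
    return dataset[nearest_x]


def adaptive_design(candidates, initial_inputs, target, budget):
    """Iterate learn -> propose -> measure -> feed back until target or budget."""
    # Initial dataset: input variable -> measured output variable.
    dataset = {x: run_experiment(x) for x in initial_inputs}
    for _ in range(budget):
        if max(dataset.values()) >= target:
            break  # design goal reached
        untested = [x for x in candidates if x not in dataset]
        if not untested:
            break  # nothing left to try
        # Use the current learning result to choose the next trial.
        next_x = max(untested, key=lambda x: predict(dataset, x))
        # Run the trial and feed the new result back into the dataset.
        dataset[next_x] = run_experiment(next_x)
    return dataset


candidates = [i / 20 for i in range(21)]
dataset = adaptive_design(
    candidates, initial_inputs=[0.0, 0.5, 1.0], target=0.995, budget=18
)
best_x = max(dataset, key=dataset.get)
print(f"measured {len(dataset)} points; best input {best_x}, "
      f"property {dataset[best_x]:.4f}")
```

In practice the surrogate would be a proper ML model and the selection rule would typically balance exploiting good predictions against exploring uncertain regions; the greedy rule here is only the simplest choice that keeps the loop visible.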

Article Title: Introduction to Materials Informatics (Volume I): Fundamentals of Machine Learning (English Edition)
Article link: https://www.teccses.org/1401317.html