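Academic Talk
Doubly Divided Massive Data for Prediction Using Model Aggregation: Prof. Wu Yuanshan (Zhongnan University of Economics and Law)
Talk in the "2020 Capital Normal University Youth Statistics Forum" series
Title: Doubly Divided Massive Data for Prediction Using Model Aggregation
Speaker: Prof. Wu Yuanshan (Zhongnan University of Economics and Law)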
Time: Thursday, December 10, 2020, 20:00-21:00
Venue: Online via Tencent Meeting (Meeting ID: 766 943 031)
Abstract: Massive data today are often characterized by both high dimensionality and huge sample size. Such data typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach to predicting the conditional mean of the response, which overcomes the storage limitation of a single machine and the curse of dimensionality. Specifically, on each local machine, which stores a part of the data with a relatively moderate sample size, we develop a model aggregation approach that splits the predictors and is computed by a greedy algorithm. To obtain the optimal weights across all local machines, we further design a distributed, communication-efficient algorithm. Our procedure effectively distributes the workload and dramatically reduces the communication cost. Theoretically, we establish a prediction error bound for the DGMA method that can be expressed explicitly in terms of the local sample size and the number of communication rounds. We further show that if the local sample size or the number of communication rounds is sufficiently large, the proposed method attains the prediction error bound of the oracle global method that has access to the full data.
Extensive numerical experiments are carried out on both simulated and real datasets to demonstrate the feasibility of the DGMA method.
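About the speaker: Wu Yuanshan is a professor and doctoral supervisor at the School of Statistics and Mathematics, Zhongnan University of Economics and Law. Professor Wu's research focuses on survival analysis and related problems. Professor Wu has led several national and provincial/ministerial-level research projects and has published more than 20 papers in mainstream statistics journals.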
Contacts: Zhou Jie, Hu Tao
Organizers: Department of Statistics; Beijing Applied Statistics Society
Institute of Interdisciplinary Science
All faculty and students are warmly welcome to attend!