
Academic Seminar

Distributed Subdata Selection for Big Data via Sampling-Based Approach: Associate Professor Zhang Haixiang (Center for Applied Mathematics, Tianjin University)

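"2020 Capital Normal University Youth Statistics Forum" seminar series

Title: Distributed Subdata Selection for Big Data via Sampling-Based Approach

Speaker: Associate Professor Zhang Haixiang (Center for Applied Mathematics, Tianjin University)

Time: Wednesday, December 30, 2020, 20:00-21:00

Location: Online via Tencent Meeting (meeting ID: 766 943 031)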

Abstract: With the development of modern technologies, it is possible to gather an extraordinarily large number of observations. Because of storage and transmission burdens, big data are usually scattered across multiple locations, and it is difficult to transfer all of the data to a central server for analysis. A distributed subdata selection method for big data linear regression models is proposed. In particular, a two-step subsampling strategy with optimal subsampling probabilities and optimal allocation sizes is developed. The subsample-based estimator effectively approximates the ordinary least squares estimator from the full data. The convergence rate and asymptotic normality of the proposed estimator are established. Simulation studies and an illustrative example using airline data are provided to assess the performance of the proposed method.
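The general idea of two-step subsampling across data blocks can be sketched as follows. This is only an illustration under stated assumptions, not the speaker's method: the function name `distributed_subsample_ols`, the score `|residual| * ||x||`, and the rule of allocating sizes in proportion to block score totals are hypothetical stand-ins for the optimal probabilities and allocation sizes mentioned in the abstract.

```python
import numpy as np

def distributed_subsample_ols(blocks, pilot_size, total_size, rng):
    """blocks: list of (X_k, y_k) pairs, one per storage location (hypothetical layout)."""
    # Step 1: a small uniform pilot subsample from each block gives a rough estimate.
    Xp, yp = [], []
    for X, y in blocks:
        idx = rng.choice(len(y), size=pilot_size, replace=False)
        Xp.append(X[idx])
        yp.append(y[idx])
    beta_pilot, *_ = np.linalg.lstsq(np.vstack(Xp), np.concatenate(yp), rcond=None)

    # Assumed sampling scores: |pilot residual| * ||x_i||, computed locally in each block.
    scores = [np.abs(y - X @ beta_pilot) * np.linalg.norm(X, axis=1) for X, y in blocks]
    totals = np.array([s.sum() for s in scores])
    # Assumed allocation rule: subsample sizes proportional to each block's score total.
    sizes = np.maximum(1, np.round(total_size * totals / totals.sum()).astype(int))

    # Step 2: sample with replacement inside each block; weight by inverse probability.
    Xs, ys, ws = [], [], []
    for (X, y), s, r in zip(blocks, scores, sizes):
        p = s / s.sum()
        idx = rng.choice(len(y), size=r, replace=True, p=p)
        Xs.append(X[idx])
        ys.append(y[idx])
        ws.append(1.0 / (r * p[idx]))
    Xs, ys, w = np.vstack(Xs), np.concatenate(ys), np.concatenate(ws)

    # Weighted least squares on the pooled subsample approximates full-data OLS.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xs * sw[:, None], ys * sw, rcond=None)
    return beta
```

Only the pilot estimate, the per-block score totals, and the selected rows need to move between locations in this sketch, which is the motivation for subsampling when the full data cannot be centralized.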

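About the speaker: Zhang Haixiang is an associate professor and master's supervisor at the Center for Applied Mathematics, Tianjin University. He received his PhD from Jilin University in 2012 and held postdoctoral positions at the Chinese Academy of Sciences and Northwestern University (USA). His main research interests include statistical inference for big data, mediation analysis, and microbiome data analysis. He has published 27 papers in international journals including Statistica Sinica, Bioinformatics, Computational Statistics and Data Analysis, and Journal of Multivariate Analysis.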

Contacts: Zhou Jie, Hu Tao

Organizers: Department of Statistics, 77779193永利官网; Beijing Association of Applied Statistics; Institute of Interdisciplinary Science