CHINA·77779193永利(集团)有限公司-Official website

Academic Seminar

Optimal distributed subsampling under heterogeneity

Title: Optimal distributed subsampling under heterogeneity

Speaker: Wang Lei (王磊), School of Statistics and Data Science, Nankai University

Abstract: Distributed subsampling approaches have been proposed to process massive data in a distributed computing environment, where subsamples are taken from each site and then analyzed collectively to address statistical problems when the full data are not available. In this paper, we consider the setting in which each site involves a common parameter and site-specific nuisance parameters, and we formulate a unified framework of optimal distributed subsampling under heterogeneity for general optimization problems with convex loss functions that may be nonsmooth. By establishing the consistency and asymptotic normality of the distributed subsample estimators for the common parameter of interest, we derive the optimal subsampling probabilities and allocation sizes under the A- and L-optimality criteria. A two-step algorithm is proposed for practical implementation, and the asymptotic properties of the resulting estimator are established. For nonsmooth loss functions, an alternating direction method of multipliers (ADMM) is proposed to obtain the subsample estimator, and a random perturbation procedure is proposed to estimate the covariance matrices for statistical inference. The finite-sample performance of the proposed method is demonstrated for linear regression, logistic regression, and quantile regression models through simulation studies, and an application to the National Longitudinal Survey of Youth dataset is also provided.
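For readers unfamiliar with two-step optimal subsampling, the following is a minimal single-site sketch of the general idea, not the speaker's algorithm. It assumes ordinary linear regression with a smooth squared-error loss, uses a uniform pilot subsample to get a rough estimate, and then draws a second subsample with probabilities proportional to |residual| times the covariate norm, a form commonly associated with L-optimality in the subsampling literature. The distributed setting, site-specific nuisance parameters, allocation sizes, and nonsmooth losses described in the abstract are all omitted. The function name `two_step_subsample_ols` and its parameters `r0` and `r` are hypothetical and chosen for illustration only.

```python
import numpy as np


def two_step_subsample_ols(X, y, r0, r, rng):
    """Illustrative two-step subsampling for linear regression (single site).

    Hypothetical helper: r0 is the pilot subsample size, r the second-step size.
    """
    n = X.shape[0]

    # Step 1: uniform pilot subsample and an ordinary least-squares fit.
    pilot_idx = rng.choice(n, size=r0, replace=True)
    beta_pilot, *_ = np.linalg.lstsq(X[pilot_idx], y[pilot_idx], rcond=None)

    # Step 2: sampling probabilities proportional to |residual| * ||x||
    # (an assumed L-optimality-style score, computed from the pilot fit).
    scores = np.abs(y - X @ beta_pilot) * np.linalg.norm(X, axis=1)
    probs = scores / scores.sum()
    idx = rng.choice(n, size=r, replace=True, p=probs)

    # Inverse-probability weighted least squares on the second-step subsample.
    sqrt_w = np.sqrt(1.0 / probs[idx])
    beta_hat, *_ = np.linalg.lstsq(X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None)
    return beta_hat, probs


# Usage on synthetic data (all values below are made up for illustration).
rng = np.random.default_rng(0)
n, d = 20000, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)
beta_hat, probs = two_step_subsample_ols(X, y, r0=500, r=2000, rng=rng)
print(beta_hat)
```

The inverse-probability weights keep the subsample estimator targeting the same parameter as the full-data fit, which is why nonuniform sampling does not by itself introduce bias in this sketch.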

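Speaker bio: Wang Lei is a professor and doctoral supervisor at the School of Statistics and Data Science, Nankai University, and holds the title of "Hundred Young Academic Leaders" (百名青年学科带头人). Professor Wang's research interests are statistical learning and complex data analysis, with more than 90 papers published in statistics journals including Biometrika, Science China Mathematics, AOAS, Bernoulli, JCGS, Statistica Sinica, and KBS, and in interdisciplinary journals including Current Biology and eLife. Professor Wang has led three projects funded by the National Natural Science Foundation of China and one funded by the Tianjin Natural Science Foundation.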

Time: Tuesday, June 10, 2025, 10:00-11:00

Venue: Tencent Meeting, ID 700-479-742

Contact: Hu Tao (胡涛)