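Academic Events

Lecture by Professor Jun Wang, University of Central Florida, USA

Speaker: Dr. Jun Wang (University of Central Florida, USA)

Venue: Room 229, Information Building

Time: 13:30, December 18, 2015

Title: Analysis and Optimization of Parallel Data Access on Big Data File Systems

Abstract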

In this work, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk access and data transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize Parallel Data Access on Distributed File Systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a better balance of data read requests across cluster nodes. To achieve this goal, we represent the data read requests that parallel applications issue to cluster nodes as a graph data structure in which edge weights encode the demands of data locality and load capacity. We then propose new matching-based algorithms that match processes to data based on the configuration of this graph, so as to compute the maximum degree of data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmarks and well-known parallel applications show the performance benefits and scalability of Opass.
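To make the matching idea in the abstract concrete, here is a minimal Python sketch. It is not the Opass algorithm itself, which the abstract does not spell out; it is a plain capacity-constrained bipartite matching using augmenting paths, with unit edge weights. The names `assign_chunks`, `replicas`, `nodes`, and the per-node `quota` are hypothetical and chosen only for illustration.

```python
from math import ceil


def assign_chunks(replicas, nodes):
    """Assign each data chunk to one node, preferring nodes that hold a replica.

    replicas: dict mapping chunk id -> list of nodes storing a copy (assumed input).
    nodes:    list of node ids that run reader processes.
    Returns (owner, remote): owner maps chunk -> node, remote lists the chunks
    that could not be read locally under the balance quota.
    """
    # Balance constraint (an assumption of this sketch): no node reads more
    # than an even share of the chunks.
    quota = ceil(len(replicas) / len(nodes))
    load = {n: [] for n in nodes}   # node -> chunks currently assigned to it
    owner = {}                      # chunk -> node

    def try_place(chunk, seen):
        # Search for an augmenting path that gives `chunk` a local reader.
        for node in replicas[chunk]:
            if node in seen or node not in load:
                continue
            seen.add(node)
            if len(load[node]) < quota:
                load[node].append(chunk)
                owner[chunk] = node
                return True
            # Node is full: try to move one of its chunks to another replica holder.
            for other in list(load[node]):
                if try_place(other, seen):
                    load[node].remove(other)
                    load[node].append(chunk)
                    owner[chunk] = node
                    return True
        return False

    remote = [c for c in replicas if not try_place(c, set())]

    # Chunks with no feasible local reader go to the least-loaded node.
    for chunk in remote:
        node = min(nodes, key=lambda n: len(load[n]))
        load[node].append(chunk)
        owner[chunk] = node
    return owner, remote
```

Calling `assign_chunks({"c0": ["n0", "n1"], "c1": ["n0"]}, ["n0", "n1"])` shows why matching beats a first-come greedy choice: `c0` first lands on `n0`, then is moved to `n1` so that `c1`, which exists only on `n0`, can also be read locally.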

About the speaker:

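Dr. Jun Wang is currently director of the Computer Architecture and Storage Laboratory in the Department of Electrical Engineering and Computer Science at the University of Central Florida, USA. He is a recipient of the U.S. National Science Foundation CAREER Award (NSF CAREER Award) and the U.S. Department of Energy Early Career Principal Investigator Award (DOE Early Career Principal Investigator Award). He has published more than 80 papers in leading journals and top conferences in his field, including IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems (12 papers in total, 11 of them as corresponding or first author), as well as HPDC, ICS, EuroSys, Middleware, FAST, and IPDPS, among others. His papers have been cited many times by leading researchers worldwide, including researchers at UIUC, Microsoft Research, and IBM T. J. Watson Research. According to Google Scholar, his publications had received more than 7,000 citations as of January 31, 2015.

Over the past five years, the Computer Architecture and Storage Laboratory that Dr. Wang leads has headed 7 research projects and taken part in more than ten in total, receiving more than US$5 million in U.S. federal research funding. Professor Wang currently holds three NSF research projects and one NASA research project. Most recently, he led to completion an NSF project funded at nearly US$400,000 over three years to research and develop a next-generation cloud computing platform that efficiently supports high-performance big data analytics applications. He was the principal and sole investigator on this project.

Dr. Wang has served many times as a reviewer for the NSF (11 times in total), for U.S. Department of Energy research programs, and for research programs of a U.S. health organization. He serves on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and the International Journal of Parallel, Emergent and Distributed Systems (IJPEDS), and on the program committees of several international conferences. He was the organizer of the first international conference on storage, virtualization, performance, and energy (SPEED2008), conference chair of the 10th IEEE conference on Networking, Architecture, and Storage (NAS), storage program chair of the 7th IEEE NAS conference, and a Cyber Physical System Cloud panelist at the 23rd IEEE ICCCN conference. All eight Ph.D. students who graduated under his supervision work at leading U.S. IT companies, including Google, Apple, Microsoft, and EMC.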