21.4.2 HPL Single Node Evaluation
In order to determine whether the effects we have seen using DGEMM scale to multiple nodes, we use the HPL benchmark (Petitet et al., 2010), which solves a system of linear equations via LU factorization. HPL can scale to large compute grids. However, we first consider HPL on matrices similar to those in the previous DGEMM experiments, solving a system of linear equations on a single node in two regimes: one in which the data fits into the LLC of the processor and one in which it does not. In the following section, we extend HPL to examine internode scaling using parallel algorithms.
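The two problem sizes used below were chosen relative to the last-level cache. A minimal sketch of that sizing argument follows; the 12 MB LLC value is an illustrative assumption for this sketch, not a measured property of the c1.xlarge instance, and HPL's actual working set also includes workspace beyond the matrix itself.

    # Working-set estimate for an n x n double-precision matrix, used to judge
    # whether a problem size fits in the last-level cache (LLC).
    # The 12 MB LLC size is an illustrative assumption, not a measured value.

    LLC_BYTES = 12 * 1024 * 1024     # assumed total last-level cache size
    BYTES_PER_DOUBLE = 8

    def matrix_bytes(n: int) -> int:
        """Memory footprint of a dense n x n matrix of doubles."""
        return n * n * BYTES_PER_DOUBLE

    for n in (1_000, 25_000):
        footprint = matrix_bytes(n)
        verdict = "fits" if footprint <= LLC_BYTES else "does not fit"
        print(f"n = {n:6d}: {footprint / 2**20:8.1f} MiB, {verdict} in the assumed LLC")

Under this assumption, the n = 1k matrix (about 7.6 MiB) fits in the cache while the n = 25k matrix (several GiB) clearly does not, which is the distinction the two experiments below are designed to expose.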
In Fig. 21.5 we repeatedly execute HPL with n = 25k on the Amazon EC2 c1.xlarge instance. We plot average and minimum execution times against the number of cores. As with the DGEMM experiments, error bars show one standard deviation.
[Figure 21.5 plot: minimum execution times of 1203.42 s (1 core), 612.49 s (2 cores), 321.11 s (4 cores), 228.25 s (6 cores), and 182.44 s (8 cores); time in seconds versus number of cores]
Fig. 21.5 Average execution time of repeated HPL with n = 25k on the c1.xlarge instance by number of cores used. The matrices cannot fit within the last-level cache. The input configuration to each execution of HPL is identical. Error bars on the average show one standard deviation
As with the first DGEMM experiment, these matrices do not fit within the LLC of the processor, but they do fit within the main memory of a single instance. The results are quite similar to DGEMM but show slightly different trends. We do not compare the DGEMM and HPL experiments directly because the two benchmarks use very different algorithms; we only point out that the trends show similar behavior on a single node.
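For reference, a minimal sketch of the measurement loop behind this kind of experiment is shown below. The mpirun invocation, the ./xhpl binary path, and the repetition count are assumptions made for illustration; the chapter does not describe the actual benchmark harness, and HPL reads its problem size and process grid from its HPL.dat input file.

    import statistics
    import subprocess
    import time

    # Measurement loop: repeatedly run HPL for each core count and record
    # wall-clock times. The mpirun command and ./xhpl path are illustrative
    # assumptions; the problem size is taken from HPL.dat.
    CORE_COUNTS = [1, 2, 4, 6, 8]
    REPETITIONS = 20

    results = {}
    for cores in CORE_COUNTS:
        samples = []
        for _ in range(REPETITIONS):
            start = time.time()
            subprocess.run(["mpirun", "-np", str(cores), "./xhpl"],
                           check=True, stdout=subprocess.DEVNULL)
            samples.append(time.time() - start)
        results[cores] = samples

    for cores, samples in results.items():
        print(f"{cores} cores: min = {min(samples):9.2f} s  "
              f"avg = {statistics.mean(samples):9.2f} s  "
              f"stdev = {statistics.stdev(samples):7.2f} s")

The minimum, average, and standard deviation reported by such a loop correspond to the quantities plotted in Figs. 21.5 and 21.6.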
In the last single-node figure, we show HPL with a matrix of size n = 1k so that all data for the computation fits within the LLC. Figure 21.6 plots the average (with one standard deviation) and minimum execution times of repeated runs of the HPL benchmark. The results show behavior similar to the corresponding DGEMM experiment. We could not identify a cause for the anomalously better performance at six cores compared to four or eight. In the following section, we extend HPL to multiple nodes to examine the effects of contention on parallel computations in a commercial cloud environment.
[Figure 21.6 plot: minimum execution times of 0.10 s (1 core), 0.06 s (2 cores), 0.07 s (4 cores), 0.07 s (6 cores), and 0.09 s (8 cores); time in seconds versus number of cores]
Fig. 21.6 Average execution time of repeated HPL with n = 1k on the c1.xlarge instance by number of cores used. The matrices fit within the last-level cache. The input configuration to each execution of HPL is identical. Error bars on the average show one standard deviation
From the single-node experiments we conclude that the very high variance in performance means that the best achieved performance is not a good measure of expected performance. Depending on the cache behavior of the algorithm, the average expected performance can be several orders of magnitude worse than the best, in both efficiency and time to solution. In our experiments, the best average expected performance on Amazon is obtained using much less of the machine than is allocated: in an experiment accessing a large amount of main memory, using only a quarter of the machine provided the best expected performance!
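The distinction between best-achieved and expected performance can be made concrete with a small sketch, continuing from the results dictionary gathered in the earlier measurement loop: the allocation with the lowest single run time need not be the one with the lowest mean.

    import statistics

    def best_allocations(results):
        """Given {core_count: [timing samples]}, return the core counts with
        the lowest single (best-achieved) time and the lowest mean (expected)
        time."""
        by_min = min(results, key=lambda c: min(results[c]))
        by_mean = min(results, key=lambda c: statistics.mean(results[c]))
        return by_min, by_mean

    # The full machine may post the single fastest run while a smaller
    # allocation has the lower mean, i.e. the better expected time to solution.
    best_min, best_mean = best_allocations(results)
    print(f"fastest single run: {best_min} cores; lowest mean time: {best_mean} cores")

Selecting the allocation by the mean rather than the minimum is the criterion behind the quarter-of-the-machine observation above.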