Amazon Web Services Accommodates Big Data Storage
Amazon's new High Storage EC2 package is customized for jobs with large volumes of data and high throughput
Fri, December 21, 2012
IDG News Service (New York Bureau) — Eyeing the growing market for big data analysis, Amazon Web Services (AWS) has introduced a storage package, called High Storage, that can offer fast access to large amounts of data.
High Storage, an Amazon Elastic Compute Cloud (EC2) package, is designed to run data-intensive analysis jobs, such as seismic analysis, log processing and data warehousing, according to the company. It is built on a parallel file system architecture that allows data to be moved on and off multiple disks at once, increasing throughput.
"Instances of this family provide proportionally higher storage density per instance, and are ideally suited for applications that benefit from high sequential I/O performance across very large data sets," AWS states in the online marketing literature for this service. The company is pitching the service as a complement to its Elastic MapReduce service, which provides a platform for Hadoop big data analysis. AWS itself is using the High Storage instances to power its Redshift data warehouse service.
An AWS instance is a bundle of compute units, memory, storage and other services configured to the characteristics of a particular type of workload. High Storage is the ninth type of compute instance that AWS has introduced. It joins other instance types customized for particular workloads, such as instances optimized for using GPUs (graphics processing units) or for HPC (high performance computing) jobs.
The High Storage instance offers 35 EC2 compute units (ECUs) of compute capacity and 117GB of working memory. Up to 48TB of storage is spread across 24 direct attached storage (DAS) hard disk drives. Spreading data across multiple disks can speed data transfers because the read-and-write speed of a single disk is no longer a bottleneck. The system can offer more than 2.4GB per second of sequential I/O performance.
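As a rough illustration of what those figures imply, the sketch below divides the stated totals across the 24 drives. It assumes decimal units (1TB = 1,000GB, 1GB = 1,000MB) and an even spread of data and load across the disks, neither of which the announcement specifies. The function names are illustrative only, and because AWS quotes "more than" 2.4GB per second, the throughput result is a floor and the scan time a ceiling.

```python
# Figures quoted for the High Storage instance.
TOTAL_STORAGE_TB = 48            # "up to 48TB"
DISK_COUNT = 24                  # direct attached hard disk drives
AGGREGATE_THROUGHPUT_GB_S = 2.4  # "more than 2.4GB per second"


def per_disk_capacity_tb(total_tb, disks):
    """Capacity of each drive, assuming all drives are the same size."""
    return total_tb / disks


def per_disk_throughput_mb_s(aggregate_gb_s, disks):
    """Sequential throughput each drive must sustain, assuming an even load."""
    return aggregate_gb_s * 1000 / disks


def full_scan_hours(total_tb, aggregate_gb_s):
    """Hours needed to read the full capacity once at the aggregate rate."""
    seconds = total_tb * 1000 / aggregate_gb_s
    return seconds / 3600


if __name__ == "__main__":
    capacity = per_disk_capacity_tb(TOTAL_STORAGE_TB, DISK_COUNT)
    throughput = per_disk_throughput_mb_s(AGGREGATE_THROUGHPUT_GB_S, DISK_COUNT)
    hours = full_scan_hours(TOTAL_STORAGE_TB, AGGREGATE_THROUGHPUT_GB_S)
    print(f"Per-disk capacity: {capacity:.1f} TB")
    print(f"Per-disk throughput: {throughput:.0f} MB/s")
    print(f"Full sequential scan: {hours:.1f} hours")
```

Under those assumptions, each drive holds about 2TB and needs to sustain roughly 100MB per second, and reading a full 48TB once would take about five and a half hours.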
Customers can launch High Storage instances from the AWS Management Console, from the EC2 or Elastic MapReduce command lines, or from the AWS SDK (software development kit) or third-party libraries. The High Storage instance is currently available on the U.S. east coast and will be available in other parts of the world in the next few months. High Storage instances can be purchased either on demand or reserved ahead of time at reduced cost.