Amazon S3

Saturday, September 20, 2008 10:38 PM By Dhina

Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

Amazon S3 is intentionally built with a minimal feature set.

  • Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited.
  • Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
  • A bucket can be located in the United States or in Europe. All objects within the bucket will be stored in the bucket's location, but the objects can be accessed from anywhere.
  • Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
  • Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
  • Built to be flexible so that protocol or functional layers can easily be added. Default download protocol is HTTP. A BitTorrent(TM) protocol interface is provided to lower costs for high-scale distribution. Additional interfaces will be added in the future.
  • Reliability backed with the Amazon S3 Service Level Agreement.
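As a concrete illustration of the REST interface and its authentication mechanism, the sketch below computes the request signature the way the S3 REST API of this era did: the client builds a "string to sign" from the request verb, headers, and resource path, and signs it with HMAC-SHA1 under the secret access key. The credentials and object key here are hypothetical values for illustration only.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key: str, verb: str, content_md5: str,
                 content_type: str, date: str, resource: str) -> str:
    """Compute an S3 REST request signature: Base64(HMAC-SHA1(secret, string-to-sign))."""
    # The string to sign concatenates the request elements, one per line.
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical credentials and request, for illustration only.
signature = sign_request(
    secret_key="EXAMPLE-SECRET-KEY",
    verb="GET",
    content_md5="",
    content_type="",
    date="Tue, 27 Mar 2007 19:36:42 +0000",
    resource="/examplebucket/photos/puppy.jpg",
)
# The result goes into the request's Authorization header:
#   Authorization: AWS <AccessKeyId>:<signature>
```

A GET with a valid signature retrieves the object stored under the developer-assigned key (here `photos/puppy.jpg`) in the named bucket; omitting the header works only for objects made public.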

Amazon S3 is based on the idea that quality Internet-based storage should be taken for granted. It helps free developers from worrying about how they will store their data, whether it will be safe and secure, or whether they will have enough storage available. It frees them from the upfront costs of setting up their own storage solution as well as the ongoing costs of maintaining and scaling their storage servers. The functionality of Amazon S3 is simple and robust: Store any amount of data inexpensively and securely, while ensuring that the data will always be available when you need it. Amazon S3 enables developers to focus on innovating with data, rather than figuring out how to store it.

Amazon S3 was built to fulfill the following design requirements:

  • Scalable: Amazon S3 can scale in terms of storage, request rate, and users to support an unlimited number of web-scale applications. It uses scale as an advantage: Adding nodes to the system increases, not decreases, its availability, speed, throughput, capacity, and robustness.

  • Reliable: Store data durably, with 99.99% availability. There can be no single points of failure. All failures must be tolerated or repaired by the system without any downtime.

  • Fast: Amazon S3 must be fast enough to support high-performance applications. Server-side latency must be insignificant relative to Internet latency. Any performance bottlenecks can be fixed by simply adding nodes to the system.

  • Inexpensive: Amazon S3 is built from inexpensive commodity hardware components. As a result, frequent node failure is the norm and must not affect the overall system. It must be hardware-agnostic, so that savings can be captured as Amazon continues to drive down infrastructure costs.

  • Simple: Building highly scalable, reliable, fast, and inexpensive storage is difficult. Doing so in a way that makes it easy to use for any application anywhere is more difficult. Amazon S3 must do both.

A forcing-function for the design was that a single Amazon S3 distributed system must support the needs of both internal Amazon applications and external developers of any application. This means that it must be fast and reliable enough to run Amazon.com's websites, while flexible enough that any developer can use it for any data storage need.

Amazon S3 Design Principles

The following principles of distributed system design were used to meet Amazon S3 requirements:

  • Decentralization: Use fully decentralized techniques to remove scaling bottlenecks and single points of failure.

  • Asynchrony: The system makes progress under all circumstances.

  • Autonomy: The system is designed such that individual components can make decisions based on local information.

  • Local responsibility: Each individual component is responsible for the consistency of its own state; this is never the burden of its peers.

  • Controlled concurrency: Operations are designed such that no or limited concurrency control is required.

  • Failure tolerant: The system considers the failure of components to be a normal mode of operation, and continues operation with no or minimal interruption.

  • Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and to make recovery and the introduction of new nodes more robust.

  • Decompose into small well-understood building blocks: Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.

  • Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.

  • Simplicity: The system should be made as simple as possible (but no simpler).
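S3's internal partitioning scheme is not public, but one well-known technique that embodies the decentralization and symmetry principles is a consistent-hash ring: every node is functionally identical, and any node can determine which peer owns a key from local information alone, with no central coordinator. The toy sketch below is purely illustrative, not a description of S3's actual design.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy consistent-hash ring. Every node is identical (symmetry) and
    key placement is computed locally (decentralization). Illustrative
    only; it does not describe Amazon S3's actual internals."""

    def __init__(self, nodes, replicas=100):
        # Each physical node appears at many points ("virtual nodes")
        # on the ring so keys spread evenly.
        self._ring = []  # sorted list of (hash_value, node)
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}:{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Return the node owning `key`: the first ring point at or after its hash."""
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("photos/puppy.jpg")
```

Because adding or removing a node remaps only the keys adjacent to its ring points, a scheme like this lets a system grow by adding identical nodes, in line with the "scale as an advantage" requirement above.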
