Taxonomy of virtualization techniques
Hardware-level virtualization
Operating system-level virtualization
Programming language-level virtualization
Application-level virtualization
Other types of virtualization
Application server virtualization
Understanding Amazon Web Services
AWS Components and Services
Amazon Virtual Private Cloud (VPC)
Amazon Simple Notification Service (SNS)
Amazon Simple Email Service (SES)
Virtualization covers a wide range of emulation techniques that are applied to different areas of computing. A classification of these techniques helps us better understand their characteristics and use.
The first classification discriminates among techniques based on the service or entity that is being emulated. Virtualization is mainly used to emulate execution environments, storage, and networks.
In particular we can divide these execution virtualization techniques into two major categories by considering the type of host they require.
Process-level techniques are implemented on top of an existing operating system, which has full control of the hardware.
System-level techniques are implemented directly on hardware and do not require—or require a minimum of support from—an existing operating system.
Within these two categories we can list various techniques that offer the guest a different type of virtual computation environment: bare hardware, operating system resources, low-level programming language, and application libraries.
Execution virtualization includes all techniques that aim to emulate an execution environment.
All these techniques concentrate on providing support for the execution of programs, whether these are the operating system, a binary specification of a program compiled against an abstract machine model, or an application.
Therefore, execution virtualization can be implemented directly on top of the hardware by the operating system, an application, or libraries dynamically or statically linked to an application image.
Virtualizing an execution environment at different levels of the computing stack requires a reference model that defines the interfaces between the levels of abstractions, which hide implementation details.
Figure 2: Computing Reference Model
Modern computing systems can be expressed in terms of the reference model described in Figure 2.
At the bottom layer, the model for the hardware is expressed in terms of the Instruction Set Architecture (ISA), which defines the instruction set for the processor, registers, memory, and interrupt management.
ISA is the interface between hardware and software, and it is important to the operating system (OS) developer (System ISA) and developers of applications that directly manage the underlying hardware (User ISA).
The application binary interface (ABI) separates the operating system layer from the applications and libraries, which are managed by the OS. ABI covers details such as low-level data types, alignment, and call conventions and defines a format for executable programs. System calls are defined at this level. This interface allows portability of applications and libraries across operating systems that implement the same ABI.
The highest level of abstraction is represented by the application programming interface (API), which interfaces applications to libraries and/or the underlying operating system.
For any operation to be performed at the application-level API, the ABI and ISA are responsible for making it happen.
The high-level abstraction is converted into machine-level instructions to perform the actual operations supported by the processor.
Machine-level resources, such as processor registers and main memory capacity, are used to perform the operation at the hardware level.
This layered approach simplifies the development and implementation of computing systems and of multiple execution environments.
In fact, such a model not only requires limited knowledge of the entire computing stack, but it also provides ways to implement a minimal security model for managing and accessing shared resources.
For this purpose, the instruction set exposed by the hardware has been divided into different security classes that define who can operate with them.
The first distinction can be made between privileged and nonprivileged instructions.
Nonprivileged instructions are those instructions that can be used without interfering with other tasks because they do not access shared resources. This category contains, for example, all the floating-point, fixed-point, and arithmetic instructions.
Privileged instructions are those that are executed under specific restrictions and are mostly used for sensitive operations, which expose or modify the privileged state.
Hardware-level virtualization is a virtualization technique that provides an abstract execution environment in terms of computer hardware on top of which a guest operating system can be run.
In this model, the guest is represented by the operating system
the host by the physical computer hardware
the virtual machine by its emulation
and the virtual machine manager by the hypervisor
The hypervisor is generally a program or a combination of software and hardware that allows the abstraction of the underlying physical hardware.
Hypervisors
A fundamental element of hardware virtualization is the hypervisor, or virtual machine manager (VMM).
It recreates a hardware environment in which guest operating systems are installed.
There are two major types of hypervisor: Type I and Type II (see Figure 3).
Type I hypervisors run directly on top of the hardware. Therefore, they take the place of the operating systems and interact directly with the ISA interface exposed by the underlying hardware, and they emulate this interface in order to allow the management of guest operating systems.
Type II hypervisors require the support of an operating system to provide virtualization services. This means that they are programs managed by the operating system, which interact with it through the ABI and emulate the ISA of virtual hardware for guest operating systems. This type of hypervisor is also called a hosted virtual machine since it is hosted within an operating system.
Conceptually, a virtual machine manager is internally organized into three main modules: dispatcher, allocator, and interpreter. These modules coordinate their activity in order to emulate the underlying hardware.
The dispatcher constitutes the entry point of the monitor and reroutes the instructions issued by the virtual machine instance to one of the two other modules.
The allocator is responsible for deciding the system resources to be provided to the VM.
The interpreter module consists of interpreter routines. These are executed whenever a virtual machine executes a privileged instruction: a trap is triggered and the corresponding routine is executed.
Figure 3: Two types of Hypervisors
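To make the interplay among the three modules concrete, the following Python sketch models a toy trap-and-emulate monitor. It is purely illustrative: the instruction names, the page-based allocator, and the ToyVMM class are invented for this example and do not correspond to any real hypervisor.

    # Illustrative sketch only: a toy model of the dispatcher/allocator/interpreter
    # structure of a VMM. Instruction names and the VM state are hypothetical.
    class ToyVMM:
        def __init__(self, total_memory_pages=1024):
            self.free_pages = total_memory_pages        # resource pool managed by the allocator
            self.interpreter_routines = {
                "IO_WRITE": self._emulate_io_write,      # privileged: must be emulated
                "SET_TIMER": self._emulate_set_timer,
            }

        def dispatch(self, vm_id, instruction, operand=None):
            """Dispatcher: entry point that reroutes a trapped instruction."""
            if instruction == "ALLOC_PAGES":
                return self._allocate(vm_id, operand)
            routine = self.interpreter_routines.get(instruction)
            if routine is None:
                raise ValueError(f"unexpected trap: {instruction}")
            return routine(vm_id, operand)

        def _allocate(self, vm_id, pages):
            """Allocator: decides which system resources the VM receives."""
            granted = min(pages, self.free_pages)
            self.free_pages -= granted
            return granted

        def _emulate_io_write(self, vm_id, data):
            """Interpreter routine: emulate a privileged I/O instruction."""
            print(f"[vmm] vm={vm_id} I/O write emulated: {data}")

        def _emulate_set_timer(self, vm_id, ticks):
            print(f"[vmm] vm={vm_id} timer set to {ticks} ticks")

    vmm = ToyVMM()
    vmm.dispatch(vm_id=1, instruction="ALLOC_PAGES", operand=16)
    vmm.dispatch(vm_id=1, instruction="IO_WRITE", operand="hello")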
Hardware-assisted virtualization: This technique was originally introduced with the IBM System/370.
At present, examples of hardware-assisted virtualization are the extensions to the x86-64 architecture introduced with Intel VT (formerly known as Vanderpool) and AMD-V.
After 2006, Intel and AMD introduced processor extensions, and a wide range of virtualization solutions took advantage of them: Kernel-based Virtual Machine (KVM), VirtualBox, Xen, VMware, Hyper-V, Sun xVM, Parallels, and others.
Operating system-level virtualization offers the opportunity to create different and separated execution environments for applications that are managed concurrently.
Unlike hardware virtualization, there is no virtual machine manager or hypervisor; the virtualization is done within a single operating system, whose kernel allows for multiple isolated user space instances.
The kernel is also responsible for sharing the system resources among instances and for limiting the impact of instances on each other.
A user space instance in general contains a proper view of the file system, which is completely isolated, and separate IP addresses, software configurations, and access to devices.
Examples include FreeBSD Jails, IBM Logical Partitions (LPARs), and Solaris Zones.
Programming language-level virtualization is mostly used to achieve ease of deployment of applications, managed execution, and portability across different platforms and operating systems. It consists of a virtual machine executing the byte code of a program. Examples of languages associated with this approach include BCPL, C, Pascal, and Java.
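As a minimal illustration of this idea, the following Python sketch implements a toy stack-based virtual machine that executes a program expressed as byte code rather than native instructions. The instruction set here is invented for the example; the Java Virtual Machine is a real instance of the technique.

    # Toy stack-based VM: the same byte code runs on any host that provides this
    # interpreter, which is what gives the technique its portability.
    def run(bytecode):
        stack = []
        for op, arg in bytecode:
            if op == "PUSH":
                stack.append(arg)
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == "PRINT":
                print(stack.pop())

    # (2 + 3) * 4 expressed as byte code for the toy VM.
    program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
               ("PUSH", 4), ("MUL", None), ("PRINT", None)]
    run(program)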
Application-level virtualization is a technique allowing applications to be run in runtime environments that do not natively support all the features required by such applications.
In this scenario, applications are not installed in the expected runtime environment but are run as though they were.
Such emulation is performed by a thin layer—a program or an operating system component—that is in charge of executing the application.
In this case, one of the following strategies can be implemented:
Interpretation. In this technique every source instruction is interpreted by an emulator that executes native ISA instructions, leading to poor performance. Interpretation has a minimal startup cost but a huge runtime overhead, since each instruction is emulated.
Binary translation. In this technique every source instruction is converted to native instructions with equivalent functions. Translation has a larger startup cost, but translated blocks can be cached and reused, so repeated code runs considerably faster.
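The difference between the two strategies can be sketched with a toy example. The "source ISA" below has only two invented instructions (INC and DEC): the interpreter decodes each instruction on every run, while the translator pays a one-time cost to produce an equivalent native (here, Python) function that can then be reused.

    source_program = ["INC", "INC", "DEC", "INC"]

    def interpret(program, value=0):
        # Interpretation: each source instruction is decoded and emulated every time.
        for instr in program:
            if instr == "INC":
                value += 1
            elif instr == "DEC":
                value -= 1
        return value

    def translate(program):
        # Binary translation: convert the whole program once into equivalent
        # "native" code, then execute that code directly on later runs.
        body = ["def _translated(value=0):"]
        for instr in program:
            body.append("    value += 1" if instr == "INC" else "    value -= 1")
        body.append("    return value")
        namespace = {}
        exec("\n".join(body), namespace)
        return namespace["_translated"]

    print(interpret(source_program))   # emulated run, per-instruction overhead
    native = translate(source_program) # one-time translation cost
    print(native())                    # later runs avoid per-instruction decoding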
Emulation, as described, is different from hardware-level virtualization.
The former simply allows the execution of a program compiled against a different hardware architecture, whereas the latter emulates a complete hardware environment in which an entire operating system can be installed.
One of the most popular solutions implementing application virtualization is Wine, which is a software application allowing Unix-like operating systems to execute programs written for the Microsoft Windows platform.
Wine features a software application acting as a container for the guest application and a set of libraries, called Winelib, that developers can use to compile applications to be ported to Unix systems.
-----------------------
Amazon Web Services
Amazon is the world's largest online retailer and a market leader with a wide range of products. It started as an online bookseller.
Amazon.com is one of the most important and heavily trafficked Web sites in the world, and its popularity and usage result in heavy traffic over the Internet.
It provides a vast selection of products using an infrastructure based on Web services.
As a result, Amazon.com has grown its infrastructure to accommodate peak traffic times.
Over time the company has made its network resources available to partners and affiliates, which has also improved its range of products.
Starting in 2006, Amazon.com made its Web service platform available to developers on a usage-basis model.
Through hardware virtualization on Xen hypervisors, Amazon.com has made it possible to create private virtual servers that you can run worldwide.
These servers can be provisioned with almost any kind of application software you might envisage, and they tap into a range of support services that not only make distributed cloud computing applications possible but also make them robust.
Amazon Web Services is based on SOA standards, including HTTP, REST,
and SOAP transfer protocols, open source and commercial operating systems, application servers, and browser-based access.
Virtual private servers can provision virtual private clouds connected through virtual private networks providing for reasonable security and administrative control.
AWS has a great value proposition: You pay for what you use. While you may not save a great deal of money over time using AWS for enterprise-class Web applications, there is very little barrier to entry in getting your site or application up and running quickly and cheaply.
Amazon.com is the world’s largest online retailer. The company is a long way past selling books and records.
Amazon.com offers the largest number of retail product SKUs through a large ecosystem of partnerships.
By any measure, Amazon.com is a huge business. To support this business, Amazon.com has built an enormous network of IT systems that handles not only average but peak customer demands. Amazon Web Services (AWS) takes what is essentially unused infrastructure capacity on Amazon.com's network and turns it into a very profitable business (http://aws.amazon.com/).
AWS has had an enormous impact on cloud computing.
Indeed, Amazon.com's services represent the largest pure Infrastructure-as-a-Service (IaaS) play in the marketplace today.
The structure of Amazon Web Services (AWS) is therefore highly educational for understanding just how disruptive cloud computing can be to traditional fixed-asset IT deployments, how virtualization enables a flexible approach to system rightsizing, and how dispersed systems can impart reliability to mission-critical systems.
Amazon Web Services comprises a number of components, such as EC2, S3, Amazon SimpleDB, Amazon Relational Database Service, and Amazon Elastic MapReduce, an interactive data analysis tool for performing indexing, data mining, file analysis, machine learning, and financial analysis.
At the base of the solution stack are services that provide raw compute and raw storage: Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
The largest component of Amazon's offerings is the Elastic Compute Cloud (EC2).
Amazon Elastic Compute Cloud (EC2; http://aws.amazon.com/ec2/) is the central application in the AWS portfolio.
EC2 is a virtual server platform that allows users to create and run virtual machines on Amazon's servers.
With EC2, you can launch and run server instances created from Amazon Machine Images (AMIs) and running different operating systems.
You can add or subtract virtual servers elastically as needed. The term elastic refers to the ability to size your capacity quickly as needed.
Amazon Machine Images (AMIs) are templates from which it is possible to create a virtual machine.
They are stored in Amazon S3 and identified by a unique identifier in the form ami-xxxxxx and a manifest XML file.
An AMI contains a physical file system layout with a predefined operating system installed. These are specified by the Amazon Ramdisk Image (ARI, id: ari-yyyyyy) and the Amazon Kernel Image (AKI, id: aki-zzzzzz), which are part of the configuration of the template.
AMIs are either created from scratch or “bundled” from existing EC2 instances.
A common practice for preparing new AMIs is to create an instance from a preexisting AMI, log into it once it is booted and running, and install all the software needed.
Using the tools provided by Amazon, we can convert the instance into a new image.
Once an AMI is created, it is stored in an S3 bucket and the user can decide whether to make it available to other users or keep it for personal use.
Finally, it is also possible to associate a product code with a given AMI, thus allowing the owner of the AMI to get revenue every time this AMI is used to create EC2 instances.
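The workflow just described can be sketched with boto3, the AWS SDK for Python (a tool more recent than the text above but illustrating the same steps). The AMI identifier, key pair name, and region below are placeholders rather than real values.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch an instance from an existing AMI (the template).
    reservation = ec2.run_instances(
        ImageId="ami-xxxxxxxx",      # placeholder AMI identifier
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        KeyName="my-key-pair",       # placeholder key pair name
    )
    instance_id = reservation["Instances"][0]["InstanceId"]

    # After logging in and installing software, convert the customized instance
    # into a new AMI that can be reused (and optionally shared with other users).
    image = ec2.create_image(InstanceId=instance_id, Name="my-custom-ami")
    print("new AMI:", image["ImageId"])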
EC2 instances represent virtual machines.
They are created using AMIs as templates and are specialized by selecting their computing power and the amount of installed memory.
The processing power is expressed in terms of virtual cores and EC2 Compute Units (ECUs).
The ECU is a measure of the computing power of a virtual core;
it is used to express a predictable quantity of real CPU power that is allocated to an instance.
By using compute units instead of real frequency values, Amazon can change over time the mapping of such units to the real amount of computing power allocated, thus keeping the performance of EC2 instances consistent with the standards set by the times.
Over time, the hardware supporting the underlying infrastructure
will be replaced by more powerful hardware, and the use of ECUs helps give users a consistent view of the performance offered by EC2 instances.
Since users rent computing capacity rather than buying hardware, this approach is reasonable.
The currently available configurations can be grouped into the following classes:
Standard instances. This class offers a set of configurations that are suitable for most applications. EC2 provides three different categories of increasing computing power, storage, and memory.
Micro instances. This class is suitable for those applications that consume a limited amount of computing power and memory. Micro instances can be used for small Web applications with limited traffic.
High-memory instances. This class targets applications that need to process huge workloads and require large amounts of memory. Three-tier Web applications characterized by high traffic are the target profile.
High-CPU instances. This class targets compute-intensive applications. Two configurations are available where computing power proportionally increases more than memory.
Cluster Compute instances. This class is used to provide virtual cluster services. Instances in this category are characterized by high CPU compute power, large memory, and extremely high I/O and network performance, which makes them suitable for HPC applications.
Cluster GPU instances. This class provides instances featuring graphic processing units (GPUs) and high compute power, large memory, and extremely high I/O and network performance. This class is particularly suited for cluster applications that perform heavy graphic computations, such as rendering clusters. Since GPU can be used for general-purpose computing, users of such instances can benefit from additional computing power, which makes this class suitable for HPC applications.
EC2 instances are executed within a virtual environment, which provides them with the services they require to host applications.
The EC2 environment is in charge of allocating addresses, attaching storage volumes, and configuring security in terms of access control and network connectivity.
By default, instances are created with an internal IP address, which makes them capable of communicating within the EC2 network and accessing the Internet as clients.
It is possible to associate an Elastic IP to each instance, which can then be remapped to a different instance over time.
Elastic IPs allow instances running in EC2 to act as servers
Together with an external IP, EC2 instances are also given a domain name that generally is in the form ec2-xxx-xxx-xxx.compute-x.amazonaws.com.
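A minimal boto3 sketch of this remapping, with hypothetical instance identifiers, looks as follows.

    import boto3

    ec2 = boto3.client("ec2")

    alloc = ec2.allocate_address(Domain="vpc")            # reserve an Elastic IP
    ec2.associate_address(InstanceId="i-aaaa1111",        # attach it to a first instance
                          AllocationId=alloc["AllocationId"])

    # Later, the same address can be remapped to a different instance, so clients
    # keep using one stable IP while the backing server changes.
    ec2.associate_address(InstanceId="i-bbbb2222",
                          AllocationId=alloc["AllocationId"],
                          AllowReassociation=True)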
Currently, instances can be deployed in five regions that are priced differently: two in the United States (Virginia and Northern California), one in Europe (Ireland), and two in Asia Pacific (Singapore and Tokyo).
Instance owners can only partially control where to deploy instances; they have, instead, finer control over the security of the instances as well as their network accessibility.
Instance owners can associate a key pair to one or more instances when these instances are created. A key pair allows the owner to remotely connect to the instance once this is running and gain root access to it.
Amazon EC2 controls the accessibility of a virtual instance with basic firewall configuration, allowing the specification of source address, port, and protocols (TCP, UDP, ICMP).
Rules can also be attached to security groups, and instances can be made part of one or more groups before their deployment.
Security groups and firewall rules constitute a flexible way of providing basic security for EC2 instances, which has to be complemented by appropriate security configuration within the instance itself.
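The following boto3 sketch shows both mechanisms together: a key pair for remote root/SSH access and a security group whose single rule restricts inbound traffic to SSH from one address range. All names and the CIDR range are placeholders.

    import boto3

    ec2 = boto3.client("ec2")

    key = ec2.create_key_pair(KeyName="demo-key")          # private key returned only once
    open("demo-key.pem", "w").write(key["KeyMaterial"])

    sg = ec2.create_security_group(GroupName="ssh-only",
                                   Description="allow inbound SSH")
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 22,
            "ToPort": 22,
            "IpRanges": [{"CidrIp": "203.0.113.0/24"}],    # example source range
        }],
    )
    # Instances launched with KeyName="demo-key" and this security group are then
    # reachable via SSH only from the specified address range.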
S3 key concepts
As the name suggests, S3 has been designed to provide a simple storage service that’s accessible through a Representational State Transfer (REST) interface, which is quite similar to a distributed file system but which presents some important differences that allow the infrastructure to be highly efficient:
The storage is organized in a two-level hierarchy
S3 organizes its storage space into buckets that cannot be further partitioned.
So it is not possible to create directories or other kinds of physical groupings for objects stored in a bucket.
Stored objects cannot be manipulated like standard files.
S3 has been designed to essentially provide storage for objects that will not change over time.
Therefore, it does not allow renaming, modifying, or relocating an object.
Once an object has been added to a bucket, its content and position are immutable.
The only way to change it is to remove the object from the store and add it again (a short sketch after these points illustrates this behavior).
Content is not immediately available to users.
The main design goal of S3 is to provide an eventually consistent data store.
As a result, because it is a large distributed storage facility, changes are not immediately reflected.
For instance, S3 uses replication to provide redundancy and efficiently serve objects across the globe; this practice introduces latencies when adding objects to the store—especially large ones—which are not available instantly across the entire globe.
Requests will occasionally fail.
Because of the large distributed infrastructure being managed, requests for an object may occasionally fail.
Under certain conditions, S3 can decide to drop a request by returning an internal server error.
Therefore, it is expected to have a small failure rate during day-to-day operations, which is generally not identified as a persistent failure.
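The points above can be seen concretely in a short boto3 sketch. The bucket name is a placeholder, a "modification" is simply a full re-upload under the same key, and the final read may briefly return stale content under the eventual-consistency model described.

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-bucket"          # placeholder; bucket names are globally unique

    # Store an object. The key may look like a path, but there is no real directory.
    s3.put_object(Bucket=bucket, Key="reports/2024/summary.txt", Body=b"version 1")

    # No partial edit or rename is possible: re-upload the whole object to change it.
    s3.put_object(Bucket=bucket, Key="reports/2024/summary.txt", Body=b"version 2")

    body = s3.get_object(Bucket=bucket, Key="reports/2024/summary.txt")["Body"].read()
    print(body)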
Amazon provides facilities to structure and facilitate the communication among existing applications and services residing within the AWS infrastructure.
These facilities can be organized into two major categories: virtual networking and messaging.
Virtual networking comprises a collection of services that allow AWS users to control the connectivity to and between compute and storage services. Amazon Virtual Private Cloud (VPC) and Amazon Direct Connect provide connectivity solutions in terms of infrastructure.
Amazon VPC provides a great degree of flexibility in creating virtual private networks within the Amazon infrastructure.
The service offers either templates covering most of the usual scenarios or fully customizable networks for advanced configurations.
Prepared templates include public subnets, isolated networks, private networks accessing Internet through network address translation (NAT), and hybrid networks including AWS resources and private resources.
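As a rough illustration of the public subnet template, the following boto3 sketch creates a small VPC with one subnet and an attached Internet gateway; the CIDR blocks are illustrative only.

    import boto3

    ec2 = boto3.client("ec2")

    vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]
    subnet = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24")["Subnet"]

    # An Internet gateway attached to the VPC is what makes the subnet "public".
    igw = ec2.create_internet_gateway()["InternetGateway"]
    ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"],
                                VpcId=vpc["VpcId"])
    print("VPC:", vpc["VpcId"], "subnet:", subnet["SubnetId"])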
Amazon Direct Connect allows AWS users to create dedicated network links between their private networks and Amazon Direct Connect locations, called ports.
Such a connection can be further partitioned into multiple logical connections and can give access to the public resources hosted on the Amazon infrastructure.
The advantage of using Direct Connect versus other solutions is the consistent performance of the connection between the users’ premises and the Direct Connect locations.
This service is compatible with other services such as EC2, S3, and Amazon VPC and can be used in scenarios requiring high bandwidth between the Amazon network and the outside world.
Messaging services constitute the next step in connecting applications by leveraging AWS capabilities. The three different types of messaging services offered are Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), and Amazon Simple Email Service (SES).
Amazon SQS constitutes the model for exchanging messages between applications by means of message queues, hosted within the AWS infrastructure.
Using the AWS console or the underlying Web service APIs directly, AWS users can create an unlimited number of message queues and configure them to control access.
Applications can send messages to any queue they have access to.
These messages are securely and redundantly stored within the AWS infrastructure for a limited period of time, and they can be accessed by other (authorized) applications.
While a message is being read, it is kept locked to avoid spurious processing from other applications.
Such a lock will expire after a given period.
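A minimal boto3 sketch of this exchange follows; the queue name, message body, and 30-second visibility timeout are illustrative choices, not values prescribed by the text.

    import boto3

    sqs = boto3.client("sqs")

    queue_url = sqs.create_queue(
        QueueName="orders",
        Attributes={"VisibilityTimeout": "30"},   # the "lock" expires after 30 seconds
    )["QueueUrl"]

    sqs.send_message(QueueUrl=queue_url, MessageBody='{"order_id": 42}')

    # While this consumer holds the message, other consumers will not receive it
    # until the visibility timeout expires or the message is deleted.
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    for msg in resp.get("Messages", []):
        print("processing:", msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])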
Amazon SNS provides a publish-subscribe method for connecting heterogeneous applications.
Unlike Amazon SQS, where it is necessary to continuously poll a given queue for a new message to process, Amazon SNS allows applications to be notified when new content of interest is available.
This feature is accessible through a Web service whereby AWS users can create a topic, which other applications can subscribe to.
At any time, applications can publish content on a given topic and subscribers can be automatically notified.
The service provides subscribers with different notification models (HTTP/HTTPS, email/email JSON, and SQS).
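The publish-subscribe flow can be sketched with boto3 as follows; the topic name and subscriber endpoints are placeholders.

    import boto3

    sns = boto3.client("sns")

    topic_arn = sns.create_topic(Name="price-alerts")["TopicArn"]

    # Subscribers choose their notification model (email, HTTP/HTTPS, SQS, ...).
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")
    sns.subscribe(TopicArn=topic_arn, Protocol="https", Endpoint="https://example.com/hook")

    # A single publish fans out to every confirmed subscriber; no polling needed.
    sns.publish(TopicArn=topic_arn, Subject="price drop", Message="item 42 is now cheaper")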
Amazon SES provides AWS users with a scalable email service that leverages the AWS infrastructure.
Once users have signed up for the service, they have to provide an email address that SES will use to send emails on their behalf.
To activate the service, SES sends a verification email to the given address and provides the users with the necessary information for activation.
Upon verification, the user is given an SES sandbox to test the service and can then request access to the production version.
Using SES, it is possible to send either SMTP-compliant emails or raw emails by specifying email headers and Multipurpose Internet Mail Extension (MIME) types.
Emails are queued for delivery, and the users are notified of any failed delivery. SES also provides a wide range of statistics that help users to improve their email campaigns for effective communication with customers.
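A brief boto3 sketch of the SES workflow described above; the sender and recipient addresses are placeholders.

    import boto3

    ses = boto3.client("ses")

    # SES sends a verification email to this address before it can be used as a sender.
    ses.verify_email_identity(EmailAddress="sender@example.com")

    ses.send_email(
        Source="sender@example.com",
        Destination={"ToAddresses": ["customer@example.com"]},
        Message={
            "Subject": {"Data": "Welcome"},
            "Body": {"Text": {"Data": "Thanks for signing up."}},
        },
    )

    # Sending statistics (deliveries, bounces, complaints) over recent time buckets.
    stats = ses.get_send_statistics()
    print(len(stats["SendDataPoints"]), "data points")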