It would be better to help our partners improve their volume managers and filesystems rather than compete with them. We should patent the technology for spreading data over many disks and license it to our volume manager partners. The lack of clustered filesystems was not such an important issue because very few customers ran Oracle Parallel Server. Implementing my proposed filesystem was a huge task that was not as important as providing other smaller features that our customers had requested.
No customer had asked us to solve their storage manageability problems. This view prevailed for over two years. The e-mail did influence a few of the features implemented in Oracle Database 9i.
Oracle-managed files (OMF) was a direct result of the proposed feature to create and delete files automatically as needed for database operations such as creating or deleting a tablespace. Another result was a richer filesystem interface for Oracle to create, delete, read, and write files, tailored to work with OMF to simplify management. This project produced a white paper on how to achieve the same load-balancing results as my proposal using existing volume manager technology.
The configurations were tested and shown to perform well without the need to place data carefully. The only problem was that it was difficult to add storage to a system using the SAME methodology.
It required downtime for restriping. This problem was an important validation of the underlying assumptions in the ASM design. I refined the design to flesh out some details and make the claims more believable. People in other areas of the company saw this as an important manageability feature that would help Oracle compete with other database systems.
By implementing this design in user mode, it would not be available to other database vendors. If features of automatically load balancing the disks were available in the OS, then our competitors could use them as well.
In any case, the volume manager companies were not interested in licensing a feature they did not invent. That was just human nature. RAC was becoming much more important and the lack of a clustered filesystem was becoming an important problem.
It was understandable that customers did not ask us to solve their storage management problems since we never had any features in that area. Describing my proposal to people in other areas of the company eventually succeeded. They brought the proposal to the attention of top management and recommended that it be funded.
Eventually, the project was approved and resources were allocated to it. However, my first task was to refine the design further to the point where developers could create detailed design and functional specifications. This task was not my only responsibility at the time, so it took several months.
Rich Long, a coauthor of this book, became the development manager for the project. He formed a team of six developers, and the project took off.
ASM became my full time job. Several months were spent producing a detailed design document before coding began. The resulting design was amazingly close to the proposal in my first e-mail. We did not use any of the existing media server filesystem code, but some of the developers from that project also worked on ASM. We could not eliminate the need for filenames completely. People needed meaningful names for their files, so we developed a traditional directory tree of filenames and automatically created filenames.
We never implemented any real-time performance guarantees. The apple-pie-and-motherhood feature would have been nice, but no one ever has enough time to implement everything. Otherwise, ASM lives up to all the promises in that first e-mail. The most difficult design challenge was managing the metadata so that it was redundant and evenly spread across all the disks, even when disks were added or dropped.
Some of the metadata related to just one disk, so it could be located at fixed locations on that disk, and it did not need to be redundant because it would not be needed if the disk failed. However, most of the metadata had to be relocatable and redundant.
An important design feature was the ability to move an entire diskgroup from one set of disks to another. We solved this problem by keeping the metadata in files just like database data. The metadata file extents are mirrored and relocated just like database file extents. This elegant solution works well, but causes many tricky coding issues to deal with the recursive nature of the metadata. For example, file 1 contains one block for each file describing the size of the file and how to find all the extents of the file.
Block 1 of file 1 describes file 1 itself. Most of the development was done under the name OSM. A few months before the product announcement at Oracle OpenWorld, the marketing department decided the name should be Automatic Storage Management (ASM) to stress how management of the database is becoming more automated.
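The recursive metadata layout described above (file 1 containing an entry for every file, including itself) can be mimicked in a toy model. This sketch is illustrative only; the class and field names are hypothetical, not Oracle's on-disk format:

```python
# Toy model of a self-describing file directory: file 1 holds one entry
# per file (size plus extent list), and entry 1 of file 1 describes the
# directory file itself -- the recursion the text describes.

class FileDirectory:
    def __init__(self):
        # entries[file_number] = {"size": ..., "extents": [(disk, offset), ...]}
        self.entries = {}
        # File 1 is the directory itself, so it must describe itself.
        self.entries[1] = {"size": 1, "extents": [(0, 0)]}

    def create_file(self, file_number, size, extents):
        self.entries[file_number] = {"size": size, "extents": list(extents)}
        # The directory grew, so its own entry (file 1) must be updated too.
        self.entries[1]["size"] = len(self.entries)

    def describe(self, file_number):
        return self.entries[file_number]

d = FileDirectory()
d.create_file(2, size=100, extents=[(0, 10), (1, 10)])
print(d.describe(1))   # file 1 describes itself; its size is now 2
print(d.describe(2))
```

The tricky coding issues mentioned in the text come from exactly this self-reference: relocating an extent of file 1 requires updating file 1.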
ASM implements the storage grid for Oracle databases. The first release of ASM had phenomenal acceptance by customers. This level of adoption is unheard of for a first release feature. The most important new feature is the ability to bring disks back online after a temporary failure that does not lose data. This was a significant development effort that had been planned for the first release of ASM, but the feature had to be dropped to make the release date for Oracle Database 10g Release 1.
Now you have this book, written by the people who know ASM better than anyone else. Nitin and I have coauthored a number of ASM white papers. This book contains a lot of practical information about how to use ASM in specific situations. Some statements about ASM are made by people who jump to incorrect conclusions based on misunderstandings. Some of the claims about ASM sound like marketing hype.
When you understand what is really going on, you will understand the advantages and limitations of ASM. You will see there is no magic. This book will give you the knowledge you need to succeed in deploying and maintaining a database that uses ASM.

—Nitin Vengurlekar

This book has been a real team effort, with each of us sharing several of the responsibilities and providing a helping hand where required, all with the goal of producing a wonderful book such as this one, on time and without any compromise on quality.
This is the first time I have coauthored a book and would like to take this opportunity to thank Nitin and Rich for their patience through the slow periods of my writing process and showing me the correct direction in completing my chapters. I am always without words in expressing my thanks to Jaya and my two children, Grishma and Nabhas, for accepting the unbelievable amount of time that I have been away from home due to my assignments and travel.
Although we missed each other every week, we just kept on going with the understanding that these sacrifices are all for the benefit of the family.
I love you all so much. To the technical review team Charles Kim and Phil Newlan, you guys were awesome in providing excellent feedback.
I thank Zafar Mahmood and Anthony Fernandez for the great comments. I wish you had the time to contribute to the entire project. Thanks for all your support and patience the several times that I missed my deadlines, while keeping up the pressure and encouragement to get this out on time. Thanks are also in order for Madhu Bhardwaj and her team for their great patience and for putting it all together.
In addition to providing sustenance to my family, you have made learning and solving issues an everyday challenge and benefit. Thank you all for your business. I am proud to have been involved in such an incredible project, and I hope that you benefit from the efforts of so many to bring this book into print. Without their amazing work we would not have a product to write about.
Thanks to Angelo Pruscino for obtaining funding for the ASM project and for giving me the opportunity to lead the development team. Thanks to Nadim Salah, who has been an invaluable mentor and friend.
Thanks to Annie Chen for offering me a development position at Oracle when my original team dissolved immediately after I joined the company. Thanks to all of my friends who put up with my long work hours. As my colleagues who have received writing reviews from me know, my high school English teacher Mary Mecom had a major influence on my writing. I hope that this book honors her memory. Nitin had the idea to write this book, and asked me to help.
Murali has helped us navigate the waters of the publishing industry. He has also managed to transform my scribbles into figures that I hope have helped explain ASM. Not surprisingly, the adoption rate for ASM has been outstanding. Users were fervent about getting information on how ASM works and how they could integrate it into their environment. Then we will quickly review ASM at a high level and its benefits. We have designed this book to provide the reader with the essentials of ASM and the best practices for implementation.
This design includes a chapter-by-chapter process flow for enabling, configuring, and managing ASM in any environment, as well as ensuring that the database's relational database management system (RDBMS) instances are configured correctly to leverage ASM.
Bill walks through the thought process by which he came up with the idea of ASM. This book has a unique feel, because the authors have either been associated with ASM since its inception or have been implementing ASM in production environments from the day that ASM was released.
This book is based on our background in the development of ASM and our experiences testing and configuring ASM and helping customers migrate to the technology. The book does not cover things that can be easily found on the Internet or in Oracle documentation, such as ASM SQL command reference and basic database operations and concepts. Rather than replicate the reference material contained in the Oracle documentation, the authors decided to jam-pack this book with information that would be most useful to the user migrating, implementing, or managing ASM.
This book also assumes that the reader is familiar with Oracle Database concepts and is familiar with common Oracle terminology. ASM Overview ASM is a management tool designed specifically to simplify database storage management by building filesystem and volume manager capabilities into the Oracle database kernel. This design simplifies storage management tasks, such as creating and laying out databases and disk space management.
ASM also offers a simple, consistent storage management interface across all server and storage platforms. In addition to the aforementioned benefits for single-instance databases, ASM provides a clustered filesystem for Real Application Clusters (RAC) databases and a consistent clusterwide namespace for database files.
ASM diskgroups provide much simpler management than shared raw devices while providing the same performance.
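ASM's even spreading of file extents across a diskgroup's disks is what lets it match raw-device performance without manual placement. Here is a toy sketch of that idea, using simple round-robin placement and a full recompute when a disk is added; this is not ASM's actual algorithm, and all names are illustrative:

```python
# Toy sketch: spread file extents evenly across a diskgroup, and
# "rebalance" when a disk is added by recomputing the placement.

def place_extents(num_extents, disks):
    """Assign extents round-robin so each disk holds a near-equal share."""
    layout = {d: [] for d in disks}
    for ext in range(num_extents):
        layout[disks[ext % len(disks)]].append(ext)
    return layout

layout = place_extents(12, ["disk1", "disk2", "disk3"])
print([len(v) for v in layout.values()])       # [4, 4, 4]

# Adding a disk triggers a rebalance: the extents are redistributed
# so the new disk carries its share of the load.
layout = place_extents(12, ["disk1", "disk2", "disk3", "disk4"])
print([len(v) for v in layout.values()])       # [3, 3, 3, 3]
```

Real ASM moves only the minimum number of extents during a rebalance and does it online, which is exactly the restriping-downtime problem with the SAME methodology described in the foreword.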
Because ASM enables the user to manage disks using familiar create, alter, and drop SQL statements, DBAs do not need to learn a new skill set or make crucial decisions on provisioning. At a high level, ASM reduces the cost of storage management, increases utilization, and works with any type of storage array, from modular storage to network attached storage (NAS) devices to storage area network (SAN) disk arrays.

Keep in mind that several of these components, such as Fibre Channel (FC) and the small computer system interface (SCSI), cannot be easily compartmentalized into a specific category, because they actually belong in several areas.
The chapter starts with disk drive technology and works its way up to disk interfaces, storage architecture, host bus adapters (HBAs), and finally host-side components such as logical volume managers, which is exactly where ASM picks up.

Disk Drive Technology

The terms hard disk drive, spindle, disk drive, and disk unit are all used interchangeably. In our discussion, we use the term disk drive.
From a component level, all disk drives are created essentially the same; that is, they all use the same types of parts. These parts include several thousand miniaturized components as well as several movable parts. The focus of this section is on platters, heads, and actuators.
Platters are circular, flat disks typically made from aluminum or a glass substrate material. They are coated on both sides with magnetic media that provides a surface on which discrete data bits can be recorded into a series of tracks. Platters have holes cut out from the center of the disk, similar to a small donut hole. With these holes, the platters are mounted and secured onto a spindle. The platters are driven by a special spindle motor (see Figure ) connected to the spindle, allowing them to rotate at very high speeds.
The sliders are mounted onto arms. Heads, sliders, and arms all are connected into a single assembly and positioned over the surface of the disk by a unit called an actuator. The surface of each platter, which can hold billions of bits of data, is organized into larger units to simplify organization and enable faster access to the data recorded on the platter.
Each platter has two heads, one on the top of the platter and one on the bottom, so that a hard disk with three platters has six surfaces and six total heads. As illustrated in Figure , each platter has its information recorded in concentric circles called tracks. Each track is further broken down into smaller pieces called sectors, each of which holds a fixed number of bytes of information. A cylinder is the logical grouping of the same track across all platter surfaces; all heads can read a cylinder without performing a seek operation. Think of a vertical slice through the stack of platters.
Nevertheless, for addressing purposes, a cylinder is equivalent to a track.

Figure: Disk platters

Disk Drive Performance

The performance of the disk drive shown in the figure is dictated by several factors. The first is rotational speed, measured in revolutions per minute (rpm); the others are seek time and rotational latency. So for a disk with a 5 millisecond (ms) seek time and 4ms rotational latency, it could take about 9ms from the moment the disk initiates the read request to the moment when it actually starts reading data.
The advent of faster disks, which essentially means disks that spin at a higher rpm, translates into a reduced rotational delay, with rotational delays typically around 4ms.
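The arithmetic behind these latency figures is simple: average rotational latency is the time for half a revolution at the drive's rpm, and average access time adds the seek time on top. A quick worked sketch (the formula is standard; the drive figures are illustrative):

```python
# Average rotational latency is half a revolution:
#   latency_ms = 0.5 * 60_000 / rpm
# Average access time adds the average seek time.

def avg_rotational_latency_ms(rpm):
    return 0.5 * 60_000 / rpm

def avg_access_time_ms(seek_ms, rpm):
    return seek_ms + avg_rotational_latency_ms(rpm)

print(round(avg_rotational_latency_ms(7200), 2))   # ~4.17 ms, the "around 4ms" above
print(round(avg_access_time_ms(5.0, 7200), 2))     # ~9.17 ms, the "about 9ms" example
print(avg_rotational_latency_ms(15000))            # 2.0 ms for a 15K rpm drive
```

This also shows why faster spindles help rotational delay but do nothing for the seek component, which the next paragraph takes up.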
However, by the same measure, the seek time has not decreased significantly over the years.

Figure: Seek time and seek speed

Seek time is one of the most commonly discussed metrics for hard disks, and it is one of the most important positioning performance specifications. Switching between tracks requires the head actuator to move the head arms physically, which, because it is a mechanical process, takes a specific amount of time.
The amount of time required to switch between two tracks depends on the distance between the tracks. However, there is a certain amount of overhead involved in track switching, so the relationship is not linear. Seek time is normally expressed in milliseconds, with average seek times for most drives today in a rather tight range of 5—7ms. However, a millisecond is an enormous amount of time, considering that the speed of the system bus is measured in nanoseconds.
That is a difference of about a factor of a million. Disk drive manufacturers define three different seek time specifications: track-to-track, average, and full-stroke. Track-to-track seek time is similar in concept to the often-advertised track switch time and is usually a fraction of a millisecond. At first glance, you would expect a head moving just to the next track to take far less time than a head moving across a thousand tracks, yet the difference is much smaller than the distances suggest.
This is because a significant portion of the seek time is really the settling time, a predetermined delay programmed in the drive electronics. Considering the preceding observation, it often makes sense to have many less-expensive 10K or even 7,200 rpm disks rather than a few expensive 15K rpm disks. Moreover, as the drive capacity increases within a particular family of drives, the performance per physical drive does not increase.
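A back-of-the-envelope sizing sketch makes the spindles-for-performance point concrete. The IOPS and capacity figures here are purely illustrative:

```python
# Sizing sketch: the drive count is driven by whichever is larger,
# the count needed for performance (IOPS) or the count needed for
# capacity. All workload figures below are hypothetical.
import math

def drives_needed(target_iops, iops_per_drive, target_tb, tb_per_drive):
    for_performance = math.ceil(target_iops / iops_per_drive)
    for_capacity = math.ceil(target_tb / tb_per_drive)
    return max(for_performance, for_capacity)

# A 6,000 IOPS workload on drives that each deliver ~150 IOPS needs
# 40 spindles, even though the 2 TB of data would fit on two 1 TB drives.
print(drives_needed(6000, 150, 2, 1))   # 40
```

The capacity requirement alone would badly undersize this system, which is exactly the balance the text describes.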
Thus, when deploying an appropriate configuration, it is often more important to configure the system for the number of drives required for performance rather than to configure only for the required capacity, striking a balance between performance and capacity.

Disk Interface

The disk drive interface is the language, or protocol, that a drive uses to communicate with the server. There are three main types of drive interfaces. The ATA standard defines a command and register set for the interface between the disk drive and the PC.
ATA is currently the standard hard disk drive interconnect in desktop PCs and desktop systems. As with most parallel architectures, ATA drives run at half-duplex, meaning that they can send data in only one direction at a time. In the SCSI model, target devices may include disks, backup tape devices, and optical storage devices, as well as printers and scanners.
The client is typically a host system, such as a file server, that issues requests to read or write data. The server is a resource, such as a disk array, that responds to client requests. In storage parlance, the client is an initiator and plays the active role in issuing commands. The storage array is a target and has a passive role in fulfilling client requests.
Parallel SCSI has a number of limitations, which motivated serial interconnects in which the SCSI transport specification was separated from the protocol layer. Serial ATA (SATA) has thinner cables and hence allows for smaller chassis designs. These cables can extend up to one meter. SATA supports only one drive per interface, and it is typically used for capacity-oriented workloads; for example, backups and archived datasets are usually stored on SATA storage. Fibre Channel (FC) is both a high-speed switched fabric technology and a disk interface technology.
All devices in an arbitrated loop operate similarly to token ring networking: when one device stops functioning, the entire loop gets interrupted. To overcome this failure, channel hubs are used to connect multiple devices together. FC-AL is deployed using either copper or optical fiber.
FC over copper is generally the standard for FC drives used inside storage array enclosures. FC-AL is designed for high-bandwidth, high-end systems and is compatible with mass storage devices and other peripheral devices that require very high bandwidth. Unlike the parallel SCSI interface, FC is a serial interface that employs a full-duplex architecture, thus one path can be used to transmit signals and another to receive signals.
The FC-AL interface is more expensive than a SCSI interface. FC allows for an active, intelligent interconnection scheme, called a fabric, to connect heterogeneous computer systems and peripherals. Peripherals can include storage devices such as disk or tape arrays. Today, an FC option is mandatory for high-end mass storage products, such as disk or tape arrays. When implemented in a continuous arbitrated loop (FC-AL), FC can support many individual storage devices and host systems without a switch.
FC was first approved as a standard in the 1990s and is primarily implemented in high-end SAN systems. FC has three different types of topologies, as shown in the figure: point-to-point, arbitrated loop, and switched fabric. Point-to-point is the simplest and least expensive; it is simply two N-ports communicating via a point-to-point connection. The arbitrated loop topology (FC-AL) is similar in concept to token ring, in that multiple end nodes can share a common medium but must arbitrate for access to it before beginning transmission.
The switched fabric topology, the most common in the enterprise data center, consists of one or more FC switches in a single network. The number of devices a fabric can theoretically address is very large, but it is strictly theoretical, because in practice it has been difficult for FC fabrics to support even a few hundred devices.
Serial Attached SCSI (SAS) offers a point-to-point serial connection, providing a more reliable connection than a traditional shared-bandwidth connection, and also allows a much higher level of scalability than parallel interfaces can attain.

Storage System Architectures

Storage architectures define how servers are connected to the storage array units.
This section discusses four popular storage architecture variants.

DAS

Direct attached storage is the term used to describe a storage device that is directly attached to a host system. This storage architecture has been the most common and most mature high-performance storage design for a considerably long period because of its simplicity; however, this model has several drawbacks.
These drawbacks prevent the model from fitting high-availability, high-performance architectural goals. On the other hand, because no switches or fabric infrastructure are required for DAS, the minimal learning curve associated with DAS technologies is also a factor that many organizations consider.
DAS is ideal for localized file sharing in environments with a single or a few servers, such as small businesses or departments and workgroups that do not need to share information over long distances or across an enterprise. Small companies traditionally utilize DAS for file serving and e-mail. DAS also offers ease of management and administration in this scenario, since it can be managed using the network operating system of the attached server.
However, management complexity can escalate quickly with the addition of new servers, because storage for each server must be administered separately. From an economic perspective, the initial investment in DAS is lower. This is a great benefit for information technology (IT) managers faced with shrinking budgets, who can quickly add storage capacity without the planning, expense, and greater complexity involved with networked storage.
DAS can also serve as an interim solution for those planning to migrate to networked storage in the future. For organizations that anticipate rapid data growth, it is important to keep in mind that DAS is limited in its scalability.

NAS

Network attached storage devices serve files over the same IP network that carries other traffic; for this reason, it is recommended to configure NAS traffic on a segmented network, isolated from the public network traffic.
The most common NAS-supported protocols are the Network File System (NFS) and the Common Internet Filesystem (CIFS), which was developed by Microsoft. So what is the difference between a NAS device and a standard file server? A NAS device is a dedicated appliance that serves files to heterogeneous clients over these standard protocols; therefore, NAS is particularly well suited to network topologies that have a mixture of clients and servers running different operating systems. NAS devices are part of a growing category of appliance-like servers that are easy to set up and manage.
If more storage space is needed, another NAS appliance can be added. Snapshots provide the capability to create a point-in-time copy of filesystems or files. NAS has both strengths and limitations that should be weighed for each deployment.

SAN

A storage area network is a specialized network, that is, a communication infrastructure, that provides physical connections and a management layer, as well as access to high-performance and highly available storage subsystems using block storage protocols.
The SAN is made up of specific devices, or nodes, such as HBAs in the host servers and front-end adapters that reside in the storage array. SANs use special switches, similar to Ethernet networking switches, as mechanisms to connect these SAN nodes together. Switches help route storage traffic to and from disk storage subsystems. The main characteristic of a SAN is that the storage subsystems are generally available to multiple hosts at the same time, making SANs scalable and flexible. Unlike DAS or NAS, SANs require careful planning before implementation, because they have more interconnecting components, as well as storage security, which is usually handled in the fabric network.
SANs also differ from other storage architectures in that they are dedicated storage networks and use their own network protocols (FC) and hardware components. There are essentially two types of SANs, and the SAN configuration shown in the figure consists primarily of three tiers. SANs bring both advantages and disadvantages, and several related terms are often confused; this section clarifies these terms and explains how these features fit in a SAN.
SAN is specifically discussed here because it is the most prevalent network infrastructure for Oracle databases. Networks adapt to changes, such as the addition or removal of a network node. Ethernet as well as FC-SAN networks are constantly changing, and the associated routing mechanism changes along with them; that is, the decision making is determined in-route and generally in the host. Networks are generally serial architectures and are characterized as spanning long distances but also experiencing higher latencies.
The FC infrastructure serves several functions across the stack, but when it is referred to in the protocol context, FC is generally called the Fibre Channel Protocol (FCP). FCP networks are characterized by high speeds, low latency, and longer distances. FC has several layers in its architecture, each of which performs a specific function. FC standards are developed in the National Committee for Information Technology Standards (NCITS) T11 standards body, which has defined a multilayer architecture for the transport of block data over a network infrastructure.
The following describes the layers in greater detail. For storage applications, FC-4 is responsible for mapping the SCSI-3 protocol for transactions between host initiators and storage targets. The framing and flow-control layer beneath it includes class-of-service implementations and flow-control mechanisms to facilitate transaction integrity. FC-1 provides facilities for encoding and decoding data for shipment and defines the command structure for accessing the media.
FC-0 establishes standards for different media types, allowable lengths, and signaling.

SAN Components

When selecting hardware components, it is generally a best practice to work from the bottom up; that is, you should first choose the storage array, as this will then help drive the choice of the other components based on the array certification. All of the ports of the switches run at full port speed. A discussion of fabric and FC switches is not complete without discussing switch fabric node security.
Fabric node security in this context refers to the security and isolation of ports and resources. This is generally referred to as zoning. Zoning allows logical segmentation of fabric devices to servers. These devices typically include components such as servers, storage devices, subsystems, and HBAs. While zoning is performed on a switch, it enforces security by setting up barriers using an access control list (ACL).
Zoning can be of two types: As the name indicates, hard zoning is implemented at the hardware level and consists of port- or switch-based zoning and is considered the more secure of the two types of zoning.
Soft zoning is implemented at the software level.

Hard Zoning

Hard, or hardware-enforced, zones are the most secure zones. They are created when all members of the zone are specified as switch ports (also called port zoning).
When a physical fabric port number specifies a zone member, then any and all devices connected to that port are in the zone. If this port is an arbitrated loop, then all devices on the loop are in the zone. Any number of ports in the fabric can be configured as a hard zone. Hard zoning physically blocks access to a zone from any device outside of the zone.
The switch contains a table of port addresses that are allowed to communicate with each other. If a port tries to communicate with a port in a different zone, the frames from the nonauthorized port are dropped and no communication can occur.

Soft Zoning

Soft, or software-enforced, zones simply influence the list of items returned by the name server. A host will be given the list of devices available within its zone. Soft zoning uses a filtering method implemented in the FC switches to prevent ports from being seen from outside of their assigned zones.
Soft zoning is automatically created whenever any element in the zone is specified by a world wide name (WWN; also called WWN zoning). A WWN is an identification and addressing mechanism that works similarly to a MAC address assigned to identify a NIC. The security vulnerability in soft zoning is that the ports are still accessible if a user in another zone correctly guesses the FC address. In addition, if the HBA changes for any reason, then the zoning must be updated. Hard zoning is more secure than its software-based counterpart.
However, hard zoning is less flexible than soft zoning. Because the zone assignment remains with the port rather than the device, keeping track of configuration changes is more difficult. If a device moves from one port to another, the network manager or administrator must reconfigure the zone assignment, which can result in a significant amount of overhead.
This approach can be particularly cumbersome in dynamic environments in which frequent configuration changes are required. Basically, apart from providing protection at the server level, each logical unit number (LUN) or set of LUNs can be masked to allow access to only one server or one set of servers. LUN masking is actually performed at a level above zoning; that is, although a zone grants access to only a given port on a storage array, LUN masking grants access to some of the LUNs on a port to one server and the remainder to another server.
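The two access-control layers just described, hard zoning (a switch table of ports allowed to talk to each other) and LUN masking (per-server visibility of LUNs behind a port), can be modeled in a small sketch. All port names and LUN numbers here are hypothetical:

```python
# Toy model of hard zoning and LUN masking. Real switches and arrays
# enforce these checks in hardware/firmware; this only mirrors the logic.

zones = {"zone_a": {"host1_port", "array_port1"},
         "zone_b": {"host2_port", "array_port2"}}

# LUN masking: which LUNs on the array each host port is allowed to see.
lun_masks = {"host1_port": {0, 1}, "host2_port": {2, 3}}

def frame_allowed(src, dst):
    """Switch-level check: ports must share a zone, else frames are dropped."""
    return any(src in z and dst in z for z in zones.values())

def lun_visible(host_port, lun):
    """Array-level check: the LUN must be masked to this host."""
    return lun in lun_masks.get(host_port, set())

print(frame_allowed("host1_port", "array_port1"))  # True  -- same zone
print(frame_allowed("host1_port", "array_port2"))  # False -- frames dropped
print(lun_visible("host1_port", 2))                # False -- masked away
```

Note the layering: a frame must first pass the zoning check at the switch, and even then the array only exposes the LUNs masked to that server.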
Having the appropriate system bus can make the difference in system scalability. It is also important to ensure that the bus is not overloaded. These adapters have the potential to saturate the bus very quickly.
The HBA is essentially a circuit board that provides physical connectivity among a server, a storage device, or any intermediary devices. HBA devices can be either single- or dual-ported, but always run at full-duplex transmission. This section describes a typical high-end storage array. Storage arrays typically have several front-end adapters. These front-end adapters have two sides. One side has one or more ports port logic that connect into the FC switch. The host servers also connect into this same fabric switch.
The other side of the front-end adapter connects into the inside of the storage array; more specifically, it connects into the cache controller. The cache controller manages the storage array cache. The storage array cache has a read-cache and a write-cache area, with the read cache used to store recently accessed data and the write cache used to buffer writes (write-back cache). Moreover, the write cache may be mirrored in some arrays. When a read request for a block enters the cache controller via the front-end port, the cache controller checks to see whether that block exists in the cache.
If the read request for a data buffer is found in the cache, then this event is called a read-cache hit. The cache controller is also responsible for providing prefetch of data blocks for sequentially accessed data.
If the cache controller determines that the application is accessing contiguous disk blocks, the prefetch algorithms (set with thresholds) are triggered to prefetch data into the read cache, providing a significant improvement in access time.
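The threshold-triggered prefetch logic described above can be sketched as follows. This is a hypothetical illustration only; the class name, threshold, and prefetch depth are invented and do not reflect any real array firmware:

```python
# Hypothetical sketch of a cache controller's sequential-prefetch trigger.
# Names and the threshold/depth values are illustrative, not from any array.

class PrefetchDetector:
    def __init__(self, threshold=3, prefetch_depth=8):
        self.threshold = threshold            # consecutive blocks before prefetch starts
        self.prefetch_depth = prefetch_depth  # how many blocks to read ahead
        self.last_block = None
        self.run_length = 0

    def on_read(self, block):
        """Return the list of block numbers to prefetch after reading `block`."""
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 1
        self.last_block = block
        if self.run_length >= self.threshold:
            # Sequential access detected: read ahead into the read cache.
            return list(range(block + 1, block + 1 + self.prefetch_depth))
        return []
```

A run of three contiguous reads trips the detector, after which the controller stages the next several blocks into the read cache before the application asks for them.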
If a write request is made to the storage array, it enters the cache controller via the front-end port just like a read request; however, the write request is written only to the write-cache area and destaged later to the back-end disks. Many people wonder what happens if the storage array crashes before the data are destaged to disk.
Most storage arrays have a nonvolatile random access memory (NVRAM) battery system that protects the system from loss of power. There is some debate about the usefulness of the read cache in databases that are 5TB or larger; contrast that size with the much smaller array cache. To determine whether the cache will be beneficial, you first must know your application and data pattern usage (temporal and spatial locality of reference). All the storage array disks are connected to the back-end adapters.
RAID is the use of two or more physical disks to create one logical disk, where the physical disks operate in tandem to provide greater size and more bandwidth. RAID configurations can be performed in either software or hardware. This section discusses hardware RAID. The controller can be internal to the server, in which case it is a card or a chip, or it can be external, in which case it is an independent enclosure.
RAID controllers come in two popular forms: internal to the host or external in the array enclosure; note that the cache is not shared across RAID controllers that reside on separate hosts. Within the box, the RAID controller manages the drives in the array, typically using SCSI, and then presents the logical drives of the array over a standard interface (again, typically a variant of SCSI) to the server using the array.
The server sees the array or arrays as just one or more very fast hard disks; the RAID is completely hidden from the machine.

RAID 0
RAID 0 provides striping, where a single data partition is physically spread across all the disks in the stripe bank, effectively giving that partition the aggregate performance of all the disks combined. The unit of granularity for distributing the data across the drives is called the stripe size or chunk size.
Typical settings for the stripe size are 32K, 64K, and larger. In Figure , there are eight disks, all striped across in different stripes or partitions.
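The stripe-size arithmetic can be shown with a short sketch that maps a logical byte offset to a physical location on a RAID 0 volume; the function name and default values here are assumptions for illustration:

```python
# Illustrative RAID 0 address mapping: logical byte offset -> (disk index,
# stripe row on that disk, byte offset within the chunk). Defaults assume
# a 64K stripe size across eight disks, purely for demonstration.

def raid0_locate(offset, stripe_size=64 * 1024, num_disks=8):
    chunk = offset // stripe_size   # which chunk of the logical volume
    disk = chunk % num_disks        # chunks are laid out round-robin across disks
    row = chunk // num_disks        # stripe row (depth) on the chosen disk
    within = offset % stripe_size   # byte offset inside the chunk
    return disk, row, within
```

Consecutive 64K chunks land on consecutive disks, which is what gives a large sequential read the aggregate bandwidth of all the spindles.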
This provides a high-availability solution: if the first disk fails, the second disk (or mirror) can take over without any data loss. Apart from providing redundancy for data on the disks, mirroring also helps reduce read contention by directing reads to disk volumes that are less busy. Figure illustrates a four-way striped mirrored volume with eight disks, A through H. Due to the method in which these disks are grouped and striped, if one of the pieces becomes unavailable due to a disk failure, the entire mirror member becomes unavailable.
However, the organization of mirrored sets differs from the previous configuration. This illustration shows eight mirrored and striped disks.

RAID 5
In RAID 5, the parity (that is, error checking) is distributed across the number of drives configured in the volume. Parity algorithms contain error correction code (ECC) capabilities, which calculate parity for a given stripe or chunk of data within a RAID volume.
If a single drive fails, the RAID 5 array can reconstruct that data from the parity information held on other disks.
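The XOR arithmetic behind that reconstruction can be demonstrated in a few lines. This is a simplified sketch with invented helper names; a real controller operates on fixed-size chunks with rotating parity placement, but the arithmetic is the same:

```python
# RAID 5 parity and reconstruction via XOR. The parity chunk is the XOR of
# all data chunks in a stripe; XORing the parity with the surviving chunks
# recovers a lost chunk. Helper names are illustrative.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def rebuild_lost_chunk(surviving_chunks, parity):
    # XOR of the parity with all surviving data chunks yields the lost chunk.
    return xor_blocks(surviving_chunks + [parity])
```

Because XOR is its own inverse, the lost chunk falls out directly; this is also why every small write must read the old stripe, recompute parity, and write both back, which is the source of the RAID 5 write penalty discussed next.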
Figure illustrates the physical placement of stripes DATA 01 through DATA 04 with their corresponding parities distributed across the five disks in the volume. It is a four-way striped RAID 5 illustration where data and parity are distributed. RAID 5 carries a write penalty: the continuous process of reading a stripe, calculating the new parity, and writing the stripe back to the disk with the new parity makes writes significantly slower.
RAID 5 has proven to be a very good solution where the application is read-mostly or the writes are significantly sequential.

RAID 6
As with RAID 5, data are striped on a block level across a set of drives, and a second set of parity is calculated and written across all the drives.
RAID 6 provides extremely high data fault tolerance and can sustain multiple simultaneous drive failures. Unlike the other RAID levels, however, RAID 6 imposes significant controller overhead to compute parity addresses, making it a poor choice for write-intensive applications. RAID 6 eliminates the risk of data loss if a second hard disk drive fails or an unrecoverable read error occurs while the RAID array is rebuilding.
As indicated in Figure , a second set of parity is calculated, written, and distributed across all the drives. This second parity calculation provides significantly more robust fault tolerance because two drives can fail without losing data. This is particularly useful for database applications.
The iSCSI protocol is used on servers (initiators), on storage devices (targets), and in protocol transfer gateways. As with normal IP implementations, direct connection to the IP network pushes responsibility for device discovery and connection establishment to the end devices.
Storage devices connected using this protocol also require a list of IP addresses; this list is provided by a lookup table or a domain naming service (DNS)-type service on the network. System configurations comprise one or more computers on the front end that do the basic processing of user requests, with a storage subsystem on the back end.
The front end is where the request for data is initiated and thus is called the initiator; the back end is the target storage from which data are retrieved and returned. The iSCSI protocol stack illustrated in Figure consists of one stack at the initiator and another at the target, each residing on the gigabit Ethernet interfaces.
For example, when the application issues a data write operation, the SCSI CDB encapsulates this information and transports it over a serial gigabit link that is then delivered to the target. The iSCSI process flow begins with discovery of the target, which is done using IP address lookup.
IPSec is an optional component of this layer. The physical layer transmits and receives IP packets, typically Ethernet packets. Make sure that the overall network infrastructure supports them. TCP was designed as a message-passing protocol that provides reliable messaging on unreliable networks. However, TCP was not designed for high-speed, high-throughput, low-latency block data transfers. The following subsections review the three types of iSCSI initiator implementations.
However, the host still has to perform SCSI command set processing. The demands of the Internet and distributed computing are challenging the scalability, reliability, availability, and performance of servers. InfiniBand addresses these demands with host channel adapters for high-performance interprocess communication (IPC) and with target channel adapters connecting InfiniBand-enabled servers to remote storage and communication networks through InfiniBand switches.
InfiniBand links transfer data at multigigabit rates. A key communication characteristic of the InfiniBand architecture is remote direct memory access (RDMA): because no CPU, cache, or context-switching overhead is needed to perform the transfer, and transfers can continue in parallel with other system operations, RDMA is particularly useful in applications that require high-throughput, low-latency networking, such as massively parallel Linux clusters.
RDMA supports zero-copy networking by enabling the network adapter to transfer data directly to or from application memory, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work by CPUs, caches, or context switches, and transfers continue in parallel with other system operations.
When an application performs an RDMA read or write request, the application data are delivered directly to the network, reducing latency and enabling fast message transfer. The memory protection mechanism defined by the InfiniBand architecture allows an InfiniBand HCA to transfer data directly into or out of an application buffer. To protect these buffers from unauthorized access, InfiniBand employs a process called memory registration.
Memory registration allows data transfers to be initiated directly from user mode, eliminating costly context switches to the kernel. Another benefit of allowing the InfiniBand HCA to transfer data directly into or out of application buffers is that it can make system buffering unnecessary. This eliminates the context switches to the kernel and frees the system from having to copy data to or from system buffers on a send or receive operation, respectively.
Another unique feature of the InfiniBand architecture is the memory window. Data can be transferred by either the push or the pull method; that is, either the sending node pushes the data over to the requester, or the requester reaches the holder and pulls the data.
An application can be characterized by how it accesses storage. Raw file access (RAW; see Figure ) goes through a character device and bypasses the internal page cache.
Each page in the internal cache houses at least one filesystem page; thus sequential reads and writes benefit from filesystem usage. Figure shows the filesystem stack used to access an Oracle database. Random file access, in other words, is not contiguous block access. Sequential access workloads are typical of data warehouses; queries with table or index scans; direct data loads; and backups, restores, and archives. DSS applications, which employ large full-table scans, are examples of high-data-rate systems.
Read(fname, open-type, address), where fname is the filename to open, open-type is the file open type flag, and address is the file address or offset. The return code from the queued request is a confirmed context handle; the caller uses this handle as a way to identify the request uniquely. Each disk device in the array has the same stripe size. The following subsections describe how a user request results in driver execution.
The kernel does some processing related to managing the process and resources for the request. This includes the following: The device switch table is essentially a table (matrix) that lists the names of all device drivers and their associated routines. Each device file has a major number and a minor number. The kernel uses the major number as an index into the device switch table to locate the device driver for the requested device.
The minor number is then used to locate the actual device and any device-specific information. There are character and block device switch tables. Each type of device on the system has a device driver. The kernel uses the major number to index into a device switch table and sets up parameters, if any, to be passed to the device driver. A device driver routine may wait by calling sleep(), in which case the user process is put to sleep until another routine issues a corresponding call to wakeup().
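The major/minor lookup described above can be illustrated with a short sketch. The device switch table here is a hypothetical stand-in for the kernel's, while os.major(), os.minor(), and os.makedev() are Python's standard wrappers for packing and unpacking raw device numbers:

```python
# Demonstration of how a major/minor pair selects a driver and a device.
# The device_switch dict is an invented stand-in for a kernel switch table;
# the "sd" entry under major 8 is illustrative only.

import os

def split_dev(dev):
    """Split a raw device number into (major, minor)."""
    return os.major(dev), os.minor(dev)

# Hypothetical device switch table: major number -> driver entry points.
device_switch = {
    8: {"name": "sd", "open": lambda minor: f"open sd minor {minor}"},
}

def open_device(dev):
    major, minor = split_dev(dev)
    driver = device_switch[major]   # major number indexes the switch table
    return driver["open"](minor)    # minor number selects the actual device
```

The major number chooses which driver's routines run; the minor number is opaque to the kernel and meaningful only to that driver.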
In the Interface Driver
The interface driver processes the request once the device is available. The interface driver is the software component that acts as the interface between the bus and the physical device. Only the device driver can call the interface driver. There is an interface driver for each interface hardware adapter card on the system.
A device driver has two logical components: A system call from a user program activates the upper half of the driver (the user-mode context). The lower half of the driver, or the interrupt context, processes interrupts from the device.
The halves work as follows: Interrupts are handled by an interrupt service routine (ISR) and supporting routines in the interface driver.

In the Device Driver
If the device driver has called sleep() and is waiting for the device to complete the transfer, the interrupt routine calls wakeup() to awaken the sleeping process.
When the process awakens, it continues to execute from the point at which it put itself to sleep, doing processing appropriate to complete the system call. Then it returns an integer value indicating the success or failure of the request to the kernel routine that invoked it, completing the original request.
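The sleep()/wakeup() handshake between the driver's halves can be modeled with a toy sketch; the Event-based mechanism, timer-driven "device," and all names are illustrative, not kernel code:

```python
# Toy model of the sleep()/wakeup() handshake between a driver's top half
# and its interrupt routine. A threading.Event stands in for the kernel's
# sleep channel; a Timer simulates the device completing the transfer.

import threading

done = threading.Event()
result = {}

def interrupt_handler():
    # Lower half: the device finished the transfer; record status and wake up.
    result["status"] = 0
    done.set()                                        # wakeup()

def driver_read():
    # Upper half: start the I/O, then sleep until the interrupt arrives.
    timer = threading.Timer(0.01, interrupt_handler)  # simulated device
    timer.start()
    done.wait()                                       # sleep()
    return result["status"]                           # status back to the kernel
```

The top half blocks on the event exactly where a real driver calls sleep(); the interrupt routine's set() is the wakeup() that lets it resume and return a status to the kernel.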
In the Kernel
The kernel interprets the return value from the device driver and sets the return value of the system call accordingly. It then returns control to the user process.
Protecting the Storage Stack from Failures
Unlike direct attached or network attached storage, SANs require careful planning before implementation.
File-level access is preferable for users and applications that need to access a particular file, whereas block-level data access is better for applications that need to access data quickly. SANs also differ from other storage architectures in that they are dedicated storage networks and use their own network protocols and hardware components.
SANs are the best way to ensure predictable performance and constant data availability and reliability. The importance of this is obvious for companies that conduct business on the web and require high-volume transaction processing.
It is also obvious for managers who are bound to service-level agreements SLAs and must maintain certain performance levels when delivering IT services. For a high-availability solution, it is important to consider redundant components of the entire infrastructure stack and not just the storage array.
For example, in Figure (SAN failure points), which is a very basic model of a SAN configuration, there are several single points of failure: the fibre channel conductor from the server to the first switch; the first switch (the incoming and outgoing gigabit interface converters [GBICs]); the second switch (the incoming and outgoing GBICs); the FC conductor between the second switch and the storage array; and the controller on the storage array. Failure of any of these components will result in unavailability of data to the database server when requests are made or when data needs to be persisted.
In Figure , the preceding SAN configuration has been redone to show how to provide additional redundancy for high availability.
The figure shows two data paths of access, A and B, illustrating redundancy and distribution of workload. These components not only provide availability when their counterparts fail, but can also transport data to provide dual connectivity to the database storage arrays.
The architecture illustrated in Figure raises another issue: two concurrent operations to the same device cannot be made using the SCSI protocol. Several vendors have overcome this limitation using a multipathing software layer. Apart from this functionality, the multipathing software offers two other functions: it provides failover across paths, and it performs dynamic load balancing by sending streams of data through the least-used path. Multipathing is the use of redundant storage network components responsible for the transfer of data between the server and storage.
These components include cabling, adapters, and switches and the software that enables this transfer. Beyond multipathing, failover, and availability, it is crucial that any one or more sets of servers can access the ports of a SAN.
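The least-used-path policy mentioned above can be sketched in a few lines; this is a hypothetical illustration of the idea, not any vendor's multipathing implementation:

```python
# Hypothetical least-used-path selector, mimicking the dynamic load
# balancing a multipathing layer performs across redundant paths.

class Multipath:
    def __init__(self, paths):
        # Track outstanding I/Os per path; all paths start idle.
        self.inflight = {p: 0 for p in paths}

    def pick_path(self):
        # Choose the path with the fewest outstanding I/Os.
        return min(self.inflight, key=self.inflight.get)

    def submit(self):
        path = self.pick_path()
        self.inflight[path] += 1
        return path

    def complete(self, path):
        self.inflight[path] -= 1
```

Failover falls out of the same structure: a dead path is simply removed from the table, and subsequent I/Os flow over the survivors.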
This accessibility is obtained by creating zones. Just as zoning can be applied at the disk level, as you learned earlier in this chapter, zoning can also be applied at the SAN level.
In the five storage farms shown in Figure , farms 1 through 3 may be assigned only to the set of servers on the left, and farms 4 and 5 may be assigned to the servers on the right. Apart from protecting data at the server level, each LUN or set of LUNs can be masked to provide access to only one server or a set of servers. LUN masking is actually performed at a level above zoning; that is, although a zone grants access only to a given port on a storage array, LUN masking grants access to some of the LUNs on the given port to one server and the remainder to another server.
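A LUN masking table can be pictured as a simple access map from (port, LUN) to the server WWNs permitted to see it. All the names below are made up for illustration:

```python
# Sketch of a LUN masking table: which server HBA WWNs may see which LUNs
# on a given array port. Port, LUN, and WWN names are hypothetical.

lun_masks = {
    # (array_port, lun_id): set of server HBA WWNs allowed access
    ("port0", 0): {"wwn-server1"},
    ("port0", 1): {"wwn-server1"},
    ("port0", 2): {"wwn-server2"},   # same port, different server
}

def can_access(wwn, port, lun):
    return wwn in lun_masks.get((port, lun), set())
```

Note how two LUNs on the same port are granted to one server and a third to another, which is exactly the finer-than-zoning granularity the text describes.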
As discussed earlier, each type of RAID is optimized for a specific type of operation. For example, RAID 5 is good for read-mostly operations and is found in data warehouse implementations. Each type of RAID has its own pros and cons. Understanding this, Oracle Corporation has developed a methodology based on RAID 0+1 technology for the best placement of data among all the allocated groups of disks.
Summary
The storage stack can get complicated if adequate care is not taken in its configuration. With several choices available today, the right type of configuration should be selected based on careful analysis, in-house expertise, and application data access patterns. We covered the basics of disk technology, followed by the various storage protocols and other networking options available.
ASM instances mount and manage diskgroups. RDBMS instances mount and manage databases. This section focuses on the ASM instance. The ASM instance executes only a small portion of the code in the Oracle kernel, so it is less likely to encounter failures or contention. Only one ASM instance is required per node regardless of the number of database instances on the node. ASM instances message other ASM instances to acknowledge or convey changes to a diskgroup; for example, disk state changes, diskgroup membership changes, and ASM cluster membership changes.
The startup of an ASM instance is as follows: If no diskgroups are listed in the init.ora file, the instance starts without mounting any. If the listed diskgroups cannot be mounted, then messages such as the following appear: ASM discovered an insufficient number of disks for diskgroup.
This option provides a more granular restriction. On Unix, the ASM processes can be listed using the ps command. This process runs in the ASM instance and is started only when an offlined disk is onlined. If you initially create a pfile for ASM, the SPFILE can be manually configured later if desired. However, if a single SPFILE is used in clustered ASM environments, then it must be on a shared access device (that is, a cluster filesystem, shared raw device, or network attached storage [NAS] device).
For example, the following directory structures will be created. The default value suits most configurations.
This block size is the buffer page size of the cached metadata and has no bearing on the database block size. The value is also used to store open file extent maps. You may need to modify this parameter from its default setting for Oracle Database 10g ASM instances.
The following guidelines work for most configurations; nevertheless, this value is meaningless in the ASM instance.
If this file is removed, the health check information for the instance will be inaccurate. Database clients only have the connection details that are needed to talk to the ASM instance. In these cases, in Oracle Database 10g, Oracle Support requests that you add a parameter that allows the ASM instance to process and accept these values.
However, when the OS filesystem or volume manager fails, it usually leads to a system crash, whereas the ASM instance can be restarted without restarting the server. The best practice for mitigating ASM as a single point of failure is to run ASM in a RAC environment and take advantage of high availability through clustering. An ASM failure is a softer crash with quicker recovery than traditional filesystems or volume managers.
DBCA can be launched using the dbca command. The first step provides the option to select the type of operation to be performed. Select the Configure Automatic Storage Management option. Click Yes to start up and configure ASM.
This is shown in Figures and , respectively. DBCA verifies whether the listener is running; if it is not, the program prompts the user to click OK to start the listener. The ASM Parameters screen enables you to adjust init.ora parameters. With either method, note that this upgrade is simply a software upgrade; the diskgroup compatibility needs to be advanced as well to take advantage of all the Oracle Database 11g features.
However, this can be done at a later time. Diskgroup compatibility is explained further in Chapter 4.

Manual Upgrade
With the manual upgrade process, the following needs to be performed: Copy over the ASM init.ora file.
This can be done using the srvctl command. DBUA performs the following steps: It copies the password file and re-creates the init.ora file. Click Next to start the upgrade process. The next screen is the Upgrade Operations screen, shown in Figure . This screen presents two options. Review the summary, and if everything is correct, click Finish. This starts the upgrade of the ASM instance. When you are finished reviewing the results, click Close.
Figure shows the final stage of the upgrade process, where the results of the upgrade are displayed for informational purposes. Click Close to complete the upgrade. After the upgrade is completed, you can start ASM using the standard methods. This includes applying critical patch updates (CPUs) as well. With rolling upgrade support for ASM, patchsets and migrations to future releases of ASM can be applied in rolling upgrade fashion, providing an even higher level of availability for the underlying applications.
Note that the Rolling Migration feature is applicable only if the initial ASM software is at the minimum required Oracle Database version. Additionally, Rolling Migration requires that the Oracle Clusterware software be at the highest level. Furthermore, database instances are connection load balanced across the set of available ASM instances. The default ASM cardinality is 3, but that can be changed with a Clusterware command. There are a reduced number of ASM instances on selected servers in the cluster, and Oracle Database 12c clients can connect across the network to ASM instances on different servers.
Furthermore, Oracle Database 12c clients can fail over to a surviving server with an ASM instance if a server with an ASM instance fails, all without disruption to the database client. The traffic on the ASM network is usually not significant and consists mostly of metadata, such as a particular file's extent map. Historically, this approach worked well because the database instances and the ASM instance ran on the same server.
With Oracle Database 12c, the database instances and ASM instances can now be on different servers, so ASM instances require a password file that is used to authenticate a database instance connecting to an ASM instance within the cluster. The ability to store password files in a Disk Group is extended to Oracle Database 12c clients.
Having a common global password file in a cluster addresses common issues related to synchronizing the multiple password files that had to be used previously. All the instances in an ASM cluster ensure they are running the same code release by validating the patch level across the cluster. The critical capabilities that are unavailable relate to pre-12c database clients not being able to access Oracle Flex ASM instances running on servers other than the one where the database instance runs.
This model offers the most separation between previous database releases and Oracle Database 12c. See Figure 6. The second approach provides for a mixed environment of previous database releases and Oracle Database 12c clients in the same cluster. In this particular model, both Oracle Database 12c and previous database releases operate in a cluster with an Oracle 12c ASM instance running on every server in the cluster. As before, the ASM Disk Group compatibility attribute is used for managing the compatibility between database instances.
There are two ways to achieve this mixed model of operation. The first is to install the cluster in standard mode, which assigns an ASM instance to every server. The advantage of this approach is that if an Oracle 12c database instance loses connectivity with an ASM instance, then the database connection fails over to another ASM instance on a different server.
See Figure 7. The primary objective for administrators is to keep the ASM instances up and running. This can be done using the Oracle Clusterware srvctl command. In addition, the default parameter settings have been adjusted to suit the Oracle Flex ASM architecture, making them sufficient to support most situations effectively.
While these features add value to Oracle's engineered systems, most of them also provide similar capability for deployments on non-engineered systems. Typically, it is more likely that some transient issue caused the failure associated with an ASM Failure Group.
For example, all the disks in an ASM Failure Group in an Exadata environment would become unavailable when there is a transient failure of an Exadata storage cell. Because Failure Group outages are more likely to be transient in nature, and recovering redundancy through an ASM rebalance is far more expensive than replacing a single disk, it makes sense in these environments for the loss of a Failure Group to have a larger repair time than that of an individual disk.
A larger repair time value ensures that the disks are not all dropped automatically in the event of a short-term, recoverable Failure Group outage.

ASM Disk Failure Handling Enhancements
Oracle 12c introduces a new feature that helps with the management of mirror synchronization in normal and high redundancy Disk Groups.
This feature provides administrators with the ability to control the amount of resources dedicated to mirror resynchronization. This is similar to the capability in previous ASM releases that allows administrators to control the resources dedicated to Disk Group rebalance operations.
A related feature of resync in Oracle Database 12c ASM is that if a resync operation is interrupted and later restarted, the previously completed phases of the resync are skipped and processing recommences at the beginning of the first remaining incomplete phase of the resync operation. If an ASM Disk goes offline and cannot be repaired, administrators have to replace the disk.
In prior versions of ASM, administrators had to drop the faulty disk and then add a new one back into the Disk Group. In the process, the entire Disk Group was rebalanced, which can be quite expensive and time consuming with respect to moving data.
Oracle Database 12c ASM allows administrators to simply replace an offline disk using one fast and efficient operation. Using this feature, the replacement disk is populated with mirror copies of the ASM extents from other disks, and there is no need for any additional reorganization or rebalancing across the rest of the unaffected Disk Group.
For example, if a disk fails and no replacement disk is available, a rebalance is required to redistribute the data across the remaining disks in the Disk Group and restore redundancy. This feature prioritizes quickly restoring the redundancy of critical files first, such as control files and online redo log files, to ensure that they are protected against a secondary failure.
Since the first release of ASM, when a data block is read, a series of checks is performed to validate the block's logical consistency. If corruption is detected, the database automatically recovers by reading the mirror copy when normal and high redundancy Disk Groups are used. Extending this type of protection against hidden corruption to non-accessed data is a new feature of Oracle Database 12c. Under administrative control, ASM can proactively check for data corruption, even without any database client accessing the data.
The value of proactive scrubbing is that without it, multiple corruptions could silently accumulate across all the copies of data that are infrequently accessed. Proactive scrubbing checks for and, where possible, repairs detected corruptions.
Furthermore, this data checking can be triggered during rebalance operations or by executing a scrubbing command. On-demand scrubbing can be performed on a Disk Group, on individual files, or on individual disks.
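The essence of scrubbing a mirrored layout can be pictured with a toy sketch: walk every extent, validate each copy, and repair a bad copy from its good mirror. This is not ASM's actual algorithm; CRC32 checksums and the extent record layout here stand in for ASM's logical consistency checks:

```python
# Toy illustration of proactive scrubbing on a two-way mirrored (normal
# redundancy) layout. CRC32 stands in for ASM's logical block checks; the
# extent record format is invented for this sketch.

import zlib

def scrub(extents):
    """extents: list of dicts {'primary': bytes, 'mirror': bytes, 'crc': int}.
    Returns the number of extents repaired."""
    repaired = 0
    for ext in extents:
        primary_ok = zlib.crc32(ext["primary"]) == ext["crc"]
        mirror_ok = zlib.crc32(ext["mirror"]) == ext["crc"]
        if primary_ok and not mirror_ok:
            ext["mirror"] = ext["primary"]   # repair bad copy from good mirror
            repaired += 1
        elif mirror_ok and not primary_ok:
            ext["primary"] = ext["mirror"]
            repaired += 1
    return repaired
```

The point of the walk is exactly what the text argues: rarely read extents get validated anyway, so a single bad copy is repaired before a second failure can make the corruption permanent.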
ASM Data Allocation Enhancement
For normal and high redundancy ASM Disk Groups, the algorithm determining the placement of secondary extents uses an adjacency measure to determine the placement of mirror copies of data. Three possible settings for the Disk Group content type are allowed: data, recovery, and system.