Wednesday 25 July 2012

RAID interview Q&A


1. What is RAID?
Redundant Array of Independent Disks (RAID), originally Redundant Array of Inexpensive Disks, is an umbrella term for data storage schemes that divide and/or replicate data among multiple hard drives. Depending on the scheme, they offer increased data reliability and/or throughput.

In its redundant levels, RAID stores the same data (or parity information from which the data can be rebuilt) on more than one hard disk, so the array can survive a disk failure.

2. What are the advantages of RAID?
Increased redundancy
Increased data availability
Higher read/write performance in some RAID levels
Higher data throughput
Better reliability

    * Higher Data Security: Through the use of redundancy, most RAID levels provide protection for the data stored on the array. This means that the data on the array can withstand even the complete failure of one hard disk (or sometimes more) without any data loss, and without requiring any data to be restored from backup. This security feature is a key benefit of RAID and probably the aspect that drives the creation of more RAID arrays than any other. Every RAID level except RAID 0 provides some degree of data protection, depending on the exact implementation.
    * Fault Tolerance: RAID implementations that include redundancy provide a much more reliable overall storage subsystem than can be achieved by a single disk. This means there is a lower chance of the storage subsystem as a whole failing due to hardware failures. (At the same time, the added hardware used in RAID increases the chance of a hardware problem with some individual component, even if that problem doesn't take down the storage subsystem.)
    * Improved Availability: Availability refers to access to data. Good RAID systems improve availability both by providing fault tolerance and by providing special features that allow for recovery from hardware faults without disruption.
    * Increased, Integrated Capacity: By turning a number of smaller drives into a larger array, you add their capacity together (though a percentage of total capacity is lost to overhead or redundancy in most implementations). This facilitates applications that require large amounts of contiguous disk space, and also makes disk space management simpler. Let's suppose you need 300 GB of space for a large database, and no single drive available to you is nearly that large. You could put five 72 GB drives into the system, but then you'd have to find some way to split the database into five pieces, and you'd be stuck trying to remember what was where. Instead, you could set up a RAID 0 array containing those five 72 GB hard disks; this will appear to the operating system as a single 360 GB hard disk. All RAID implementations provide this "combining" benefit, though the ones that include redundancy of course "waste" some of the space on that redundant information.
    * Improved Performance: Last, but certainly not least, RAID systems improve performance by allowing the controller to exploit the capabilities of multiple hard disks to get around performance-limiting mechanical issues that plague individual hard disks. Different RAID implementations improve performance in different ways and to different degrees, and most improve it in some way (a plain mirror, for example, can speed up reads but not writes).



3. What are different levels of RAID?
There are many levels, including RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, RAID 6, RAID 10, RAID 01 and RAID 50.
The popular ones are RAID 0, RAID 1, RAID 5, RAID 6, RAID 10, RAID 01 and RAID 50.
The most commonly used are RAID 0, RAID 1 and RAID 5.

4. Explain RAID0, RAID1, RAID5 ?
RAID 0:
The lowest designated level of RAID, level 0, is not true RAID in the strict sense, because it provides no redundancy for the data stored in the array; that is why it was given the designation of level 0. If one of the drives fails, all the data in the array is lost.

RAID 0 uses a method called striping. Striping takes a single piece of data, such as a graphic image, splits it into chunks, and spreads those chunks across multiple drives. The advantage of striping is improved performance: with two drives, up to roughly twice as much data can be written in a given time frame as to a single drive.
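
The striping idea can be sketched in a few lines of Python. This is only an illustration, not how any real controller is implemented: the function names `stripe` and `unstripe`, the in-memory lists standing in for drives, and the tiny chunk size are all assumptions made for the example.

```python
def stripe(data: bytes, num_drives: int, chunk_size: int) -> list:
    """Split data into fixed-size chunks and deal them round-robin to drives."""
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Chunk 0 goes to drive 0, chunk 1 to drive 1, and so on, wrapping around.
        drives[chunk_index % num_drives].append(data[offset:offset + chunk_size])
    return drives


def unstripe(drives: list) -> bytes:
    """Reassemble the original data by reading chunks back in round-robin order."""
    out = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)


if __name__ == "__main__":
    drives = stripe(b"ABCDEFGH", num_drives=2, chunk_size=2)
    print(drives)            # each drive holds every other chunk
    print(unstripe(drives))  # the original bytes again
```

Because each drive holds only part of the data, losing any one of the lists makes the original unrecoverable, which is the RAID 0 weakness described above.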

RAID 1

RAID level 1 was the first real implementation of RAID. It provides a simple form of redundancy for data through a process called mirroring. This form typically requires two individual drives of similar capacity. One drive is the active drive and the other drive is the mirror. When data is written to the active drive, the same data is written to the mirror drive, so either drive can fail without data loss.
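
As a rough sketch of mirroring, the toy class below writes every block to both drives and reads from whichever drive is still working. The class name `Mirror`, the dictionaries standing in for drives, and the `failed` flags are all hypothetical and exist only for this illustration.

```python
class Mirror:
    """Toy RAID 1: every write goes to both drives; reads use any surviving drive."""

    def __init__(self):
        self.drives = [{}, {}]        # block number -> data, one dict per drive
        self.failed = [False, False]  # set an entry to True to simulate a failure

    def write(self, block: int, data: bytes) -> None:
        # The same data is written to every drive that is still working.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving drive can serve the read.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise IOError("both drives have failed")


if __name__ == "__main__":
    m = Mirror()
    m.write(0, b"hello")
    m.failed[0] = True   # simulate losing the first drive
    print(m.read(0))     # data is still available from the mirror
```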

RAID 5

This is the most powerful form of RAID commonly found in desktop computer systems. It typically requires a hardware controller card to manage the array, although some desktop operating systems can create RAID 5 arrays in software. This method uses striping with parity to maintain data redundancy. A minimum of three drives is required to build a RAID 5 array, and they should be identical drives for the best performance.


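5. What's the difference between RAID0 & RAID1 ?
RAID 0 stripes data across drives for performance and provides no redundancy. RAID 1 mirrors the same data onto two drives for redundancy and provides no extra capacity. The two techniques can also be combined in hybrid levels:
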
RAID 0+1

This is a hybrid form of RAID that some manufacturers have implemented to give the advantages of both levels combined. Typically this can only be done on a system with a minimum of 4 hard drives. It combines mirroring and striping to provide both performance and redundancy. The first set of drives is active and has the data striped across it, while the second set of drives is a mirror of the data on the first set.

RAID 10 or 1+0

RAID 10 is similar to RAID 0+1, but the nesting is reversed. Rather than striping data across a set of disks and then mirroring that set, RAID 10 first mirrors the drives in pairs and then stripes data across the pairs. In a four-drive setup, drives 1 and 2 form a RAID 1 mirror, and drives 3 and 4 form another. These two mirrored sets are then set up as a striped array.


6. What's the difference between RAID1 & RAID5 ?
RAID1 : Minimum 2 drives are required. Gives only 50% of the total disk space as usable capacity.

RAID5 : Minimum 3 drives are required. Gives (n-1) x the capacity of a single disk as usable space, where n is the number of disks.
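
The capacity rules can be checked with a small calculator. This is a sketch under stated assumptions: all disks are the same size, RAID 1 is treated as a single mirror set (so it stores one disk's worth of data), RAID 10 uses two-way mirrors, and the function name `usable_capacity` is made up for this example.

```python
def usable_capacity(level: str, num_disks: int, disk_size_gb: float) -> float:
    """Return usable capacity in GB for an array of identical disks."""
    if level == "0":
        if num_disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return num_disks * disk_size_gb            # no redundancy, all space usable
    if level == "1":
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_size_gb                        # every disk holds the same data
    if level == "5":
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (num_disks - 1) * disk_size_gb      # one disk's worth goes to parity
    if level == "10":
        if num_disks < 4 or num_disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return (num_disks // 2) * disk_size_gb     # half the disks are mirrors
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level, n in [("0", 5), ("1", 2), ("5", 3), ("10", 4)]:
        print(f"RAID {level} with {n} x 72 GB: {usable_capacity(level, n, 72)} GB")
```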


7. What's the difference between RAID3 & RAID5 ?
RAID 3 and RAID 4: Striped Set (3 disk minimum) with Dedicated Parity: RAID 3 stripes data at the byte level and RAID 4 at the block level; in both, all parity is stored on one dedicated disk. They provide improved performance and fault tolerance similar to RAID 5, but with a dedicated parity disk rather than rotated parity stripes. The single parity disk is a bottleneck for writing, since every write requires updating the parity data. One minor benefit is that if the dedicated parity disk fails, operation continues without parity protection but with no performance penalty.

RAID 5 does not have a dedicated parity drive; the parity is rotated across all the drives, so it is distributed.
RAID 5: Striped Set (3 disk minimum) with Distributed Parity: Distributed parity requires all but one drive to be present to operate; drive failure requires replacement, but the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. The array will have data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive.

8. What's the difference between RAID01 & RAID10 ?
RAID 0+1: Striped Set + Mirrored Set (4 disk minimum; even number of disks) provides fault tolerance and improved performance but increases complexity. The array continues to operate with one failed drive. The key difference from RAID 1+0 is that RAID 0+1 creates a second striped set to mirror a primary striped set. As a result it is only guaranteed to survive a single disk loss (a second failure is survivable only if it hits the same striped set), whereas 1+0 can sustain multiple drive losses as long as no two failed drives belong to the same mirrored pair.

RAID 1+0: Mirrored Set + Striped Set (4 disk minimum; even number of disks) provides fault tolerance and improved performance but increases complexity. The array continues to operate with one or more failed drives. The key difference from RAID 0+1 is that RAID 1+0 creates a striped set from a series of mirrored drives.
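
To make the difference concrete, the sketch below enumerates every possible two-disk failure in a four-disk array and counts how many each layout survives. It assumes a simplified model: disks are numbered 0 to 3, RAID 1+0 survives as long as no mirrored pair loses both members, and RAID 0+1 survives only while at least one striped set is completely intact. The function names are invented for this illustration.

```python
from itertools import combinations


def raid10_survives(failed: set, mirror_pairs) -> bool:
    """RAID 1+0 survives unless some mirrored pair has lost both of its disks."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def raid01_survives(failed: set, stripe_sets) -> bool:
    """RAID 0+1 survives only if at least one striped set has no failed disk."""
    return any(not (set(stripe) & failed) for stripe in stripe_sets)


def count_survivable(survives, groups, num_disks=4, num_failures=2) -> int:
    """Count the failure combinations the given layout survives."""
    return sum(
        survives(set(failed), groups)
        for failed in combinations(range(num_disks), num_failures)
    )


if __name__ == "__main__":
    groups = [(0, 1), (2, 3)]  # mirrored pairs for 1+0, striped sets for 0+1
    print("RAID 1+0 survivable two-disk failures:", count_survivable(raid10_survives, groups))
    print("RAID 0+1 survivable two-disk failures:", count_survivable(raid01_survives, groups))
```

Under this model, both layouts survive any single failure, but RAID 1+0 survives more of the six possible double failures than RAID 0+1 does.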


9. How many minimum disk drives are needed for R0,R1,R5,R10,R01 ?
R0: Minimum 2
R1: Minimum 2
R5: Minimum 3
R10: Minimum 4
R01: Minimum 4

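10. How does RAID 5 work and how is parity calculated ?
RAID 5 stripes data across all the drives and, for each stripe, stores one parity block computed from that stripe's data blocks; the position of the parity block rotates from drive to drive. If one drive fails, each of its blocks can be recomputed from the remaining blocks in the same stripe. The parity calculation is typically performed using a logical operation called "exclusive OR" or "XOR". As you may know, the "OR" logical operator is "true" (1) if either of its operands is true, and false (0) if neither is true. The exclusive OR operator is "true" if and only if exactly one of its operands is true; it differs from "OR" in that if both operands are true, "XOR" is false.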
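
A short Python sketch shows why XOR is useful here: XOR-ing the surviving blocks of a stripe with the parity block gives back the missing block. The helper name `xor_blocks` and the sample byte values are assumptions chosen for the example; real controllers work on much larger blocks and rotate the parity position.

```python
from functools import reduce


def xor_blocks(blocks) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three data blocks belonging to one stripe (sample values only).
    data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
    parity = xor_blocks(data)

    # Simulate losing the drive that held data[1] and rebuild its block
    # from the surviving data blocks plus the parity block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

The same operation serves both purposes: computing parity when writing, and regenerating a lost block during a rebuild.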


11. Other than RAID itself, what other features does RAID management software provide?
Hot spare
RAID level migration (RLM)
SNMP interaction/management

12.What is initialization ?
Initialization is the process of preparing a drive for storage use. It erases all data on the drive & makes way for new file system creation.

13.What is Check consistency ?
Consistency check or CC verifies correctness of data in logical drives. This is a feature of some of the RAID hardware controller cards.

14.What is background initialization?
This is a consistency check process forced when a new logical drive is created. It is an automatic operation; on some controllers it starts 5 minutes after the new logical drive is created.

15.What is a RAID array ?
A RAID array is a group of disks configured with RAID. In every level except RAID 0, that means they are in a redundant setup that can tolerate one or more disk failures.

16. What's the difference between a JBOD & a RAID array ?
Just A Bunch Of Disks (JBOD) - hard disks that aren't configured in a RAID configuration. They are just disks piled or connected in one single enclosure.

A RAID array has the advantage that it can withstand a disk failure & still keep the data available.

17. When is JBOD preferred over a RAID array ?
JBOD is preferred over RAID when there is no need for redundancy & some hard disk failure or data unavailability is acceptable, because JBOD is an inexpensive storage solution. It is also easier to set up & start using than RAID.

18.What is a hot spare ?
A hot spare is an extra, unused disk drive that is part of the disk subsystem. It is usually in standby mode, ready for service if a drive fails. Whenever there is a drive failure, the hot spare kicks in & takes over the failed drive's role.

19.What is a Logical drive or Virtual drive ?
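A logical drive (also called a virtual drive) is a storage unit that the RAID controller creates from the space on one or more physical drives and presents to the operating system as a single disk. A single large physical drive or array can also be divided into two or more smaller logical drives.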

20.What is rebuilding of array ?
Whenever there is a disk failure in a redundant RAID array, the array goes into the degraded (downgraded) state. When we pull out the failed drive & insert a new functioning drive, the array starts regenerating the lost data onto the new drive. This process is called rebuilding.

21. What do you do when a drive in an array fails, and how do you bring the array back to optimal online mode ?
We swap out the failed drive, plug in a new functioning drive & wait for the rebuilding process to complete, making sure the rebuild finishes without any error. Once it completes, the array is back in the optimal online state.

22.What are the different states an array can be in and explain each state?
Online
Degraded (also called Downgraded)
Offline
Rebuilding

23.Explain Online,Offline,Degraded states of an array ?
Online - All drives are working fine.
Degraded (Downgraded) - There has been a drive failure, but the array is still functioning.
Offline - The array, and so the whole data store, is down.
Rebuilding - Storage access is available, but a new drive has been inserted in place of a failed one & data is being regenerated onto it, which may slow down the whole RAID array.
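
The states can be summarised as a small decision function. This is a simplified sketch, not the logic of any particular controller: the `FAULT_TOLERANCE` table, the function name `array_state`, and the treatment of RAID 1 as a two-disk mirror are assumptions for illustration. Here `failed_drives` counts members that do not yet hold valid data, including a replacement that is still being rebuilt.

```python
# Number of simultaneous drive failures each level is assumed to tolerate.
FAULT_TOLERANCE = {"0": 0, "1": 1, "5": 1, "6": 2}


def array_state(level: str, failed_drives: int, rebuilding: bool = False) -> str:
    """Classify an array as Online, Degraded, Rebuilding or Offline."""
    if failed_drives > FAULT_TOLERANCE[level]:
        return "Offline"      # more failures than the level can absorb
    if failed_drives > 0 and rebuilding:
        return "Rebuilding"   # replacement inserted, data being regenerated
    if failed_drives > 0:
        return "Degraded"     # still serving data, but with reduced redundancy
    return "Online"


if __name__ == "__main__":
    print(array_state("5", 0))
    print(array_state("5", 1))
    print(array_state("5", 1, rebuilding=True))
    print(array_state("5", 2))
    print(array_state("0", 1))
```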

24.What is the difference between a global hotspare & a dedicated hotspare ?
A global hot spare is available to any array in the whole enclosure or storage subsystem. A dedicated hot spare is reserved for one specific array.

Suppose an enclosure has 10 drives, with 3 drives in a RAID 5 array (1st array), 3 more drives in a second RAID 5 array (2nd array) & 2 more drives in a RAID 1 array (3rd array). In the RAID config utility we can assign a dedicated hot spare to the 1st RAID 5 array. If there is a drive failure in the 2nd or 3rd array, this dedicated hot spare will not be used there. But if the array to which it is dedicated has a drive failure, the dedicated hot spare takes over.
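
The selection rule can be sketched as follows. The `HotSpare` class, the `pick_spare` function, and the array names are hypothetical, and the choice to try a dedicated spare before a global one is an assumption of this sketch; actual controllers may order the choice differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HotSpare:
    name: str
    dedicated_to: Optional[str] = None  # None means the spare is global


def pick_spare(failed_array: str, spares: List[HotSpare]) -> Optional[HotSpare]:
    """Pick a spare for the failed array: dedicated first, then any global spare."""
    for spare in spares:
        if spare.dedicated_to == failed_array:
            return spare
    for spare in spares:
        if spare.dedicated_to is None:
            return spare
    return None  # no eligible spare; the array stays degraded


if __name__ == "__main__":
    spares = [HotSpare("disk9", dedicated_to="raid5-a"), HotSpare("disk10")]
    print(pick_spare("raid5-a", spares))  # the dedicated spare
    print(pick_spare("raid1-c", spares))  # falls back to the global spare
```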

25. How is RAID configured through the BIOS ?
If we have a hardware RAID controller card, it offers an option during boot to enter the RAID BIOS utility. This utility provides options to create RAID arrays using a semi-GUI (DOS-style) interface.

26. How is RAID configured at the OS level ?
Once we install the device drivers & the RAID config or management utility, we can use that utility to configure RAID at the OS level.

27.What is the difference between a software RAID & hardware RAID ?
In order for RAID to function, there needs to be software either through the operating system or via dedicated hardware to properly handle the flow of data from the computer system to the drive array. This is particularly important when it comes to RAID 5 due to the large amount of computing required to generate the parity calculations.

In the case of software implementations, CPU cycles are taken away from the general computing environment to perform the necessary tasks for the RAID interface. Software implementations are very low cost monetarily because all that is necessary to implement one is the hard drives. The problem with software RAID implementations is the performance drop of the system. In general, this performance hit can be 5% or more, depending upon the processor, memory, drives used and the level of RAID implemented. Falling prices for hardware RAID controllers over the years have also made software RAID less attractive to many users.

Hardware RAID has the advantage of dedicated circuitry to handle all the RAID drive array calculations outside of the processor. This provides excellent performance for the storage array. The drawback of hardware RAID has been the cost. In the case of RAID 0/1 controllers, those costs have become so low that many chipset and motherboard manufacturers include these capabilities on the motherboard. The real costs rest with RAID 5 hardware, which requires more circuitry for the added computing ability.

28.Which is best RAID level for performance and which is best for redundancy?
RAID 0 for performance
RAID 5 or RAID 6 for redundancy (availability)
