I\'m trying to understand how the 2nd parity bit or byte is set in RAID 6. I\'m
ID: 648272 • Letter: I
Question
I'm trying to understand how the 2nd parity bit or byte is set in RAID 6. I'm reading a paper by H. Peter Anvin, and it goes into Galois field algebra, which is somewhat new to me. Anyway, a rep from HP was trying to explain RAID 6 to me and she thought it was merely two XOR operations, one for the 1st parity bit and one for the 2nd. This doesn't make sense to me, but since I'm still working through the paper I don't know if it reduces to something simple for RAID 6 as opposed to RAID n. It looks to me like the 2nd parity bit is quite a bit more complicated than the XOR based 1st parity bit. Is that true?
Explanation / Answer
The general case is indeed a bit complicated.
However, in the case of 4 disks you can simplify it a lot; you do not really need to know any fancy math. You only need to know how to store 4 bits redundantly, and then you already know everything; just repeat the same scheme for each group of 4 bits that you need to store.
We can represent the scheme as 4 x 4 tables. The first 2 bits of our data determine the row and the last 2 bits of our data determine the column.
Disk 1: Just store the first 2 bits. That is:
00 00 00 00
01 01 01 01
10 10 10 10
11 11 11 11
Disk 2: Just store the last 2 bits. That is:
00 01 10 11
00 01 10 11
00 01 10 11
00 01 10 11
So far so good. Given disk 1 + disk 2, we can recover our original data: disk 1 tells us the row (the first 2 bits of the original data) and disk 2 tells us the column (the last 2 bits of the original data).
Disk 3: This is what we do on RAID5, just XOR the bits:
00 01 10 11
01 00 11 10
10 11 00 01
11 10 01 00
Again, everything is still fine. You can recover the original data using disk 1 + disk 3 or disk 2 + disk 3. A key observation is that the lookup table for disk 3 forms a Latin square: all elements of each row are distinct, and all elements of each column are distinct. For example, if you know the data on disk 1, you know the right row, and then you can use the data on disk 3 to recover the column. Conversely, if you know the data on disk 2, you know the right column, and then you can use the data on disk 3 to recover the row.
Disk 4: Here we can use the following lookup table:
00 01 10 11
10 11 00 01
11 10 01 00
01 00 11 10
Don't worry how it is constructed; we do not really care about that. The crucial properties are:
The lookup tables of both disk 3 and disk 4 are Latin squares. Therefore if you know 1 + 3 or 1 + 4 or 2 + 3 or 2 + 4, you can recover both row and column (i.e., the original data).
The lookup tables of disk 3 and disk 4 form orthogonal Latin squares, which makes it possible to recover the data if we only have disks 3 + 4.
Let's elaborate on the second point. By concatenating the lookup tables of disk 3 and disk 4, we get this matrix:
0000 0101 1010 1111
0110 0011 1100 1001
1011 1110 0001 0100
1101 1000 0111 0010
Now note that each 4-bit string occurs in this table exactly once. That is, if we know what is stored on disks 3 + 4, we know where we are on this table. We know both the row and the column, and hence we can recover the original data.
If you insist on seeing the connection to Galois fields, consider the field F=GF(22). Label the elements of the field with F={0,1,x,x+1}; these correspond to 2-bit strings (0 ? 00, 1 ? 01, x ? 10, x+1 ? 11). Now any 4-bit string can be encode as a pair (a,b), where a,b?F.
A pair (a,b) is now stored as follows:
Disk 1 stores a.
Disk 2 stores b.
Disk 3 stores a+b.
Disk 4 stores xa+b.
Now given, for example, just p=a+b and q=xa+b, you can solve a and b. Using the rule 2?0, you can find
p+q=(xa+b)+(a+b)=(x+1)a+2b?(x+1)a,
p+xq=(xa+b)+x(a+b)=2xa+(x+1)b?(x+1)b.
Then divide by x+1 to get a and b, etc. But in the end, this approach gives you precisely the same lookup tables as what was presented above.