A database architect must decide how to best design a database server for perfor
ID: 3876164 • Letter: A
Question
A database architect must decide how to best design a database server for performance. Assume that the database is 50 GB in size. The architect can use up to 32 GB of memory, may choose between hard drives and solid state drives, but only has $1000 to spend on memory and drives. Given the following limitations, come up with the best server design and provide its queries/second.
The database server requires at least 4 GB of memory without buffering any data. Any memory provided beyond 4 GB will buffer the data.
Must have secondary storage (HDD or SSD) of at least twice the database size for persistence.
Costs: Memory = $50/GB, SSD = $5/GB, HDD = $1/GB
Two types of queries: lookup and scan. Lookup finds a single record. # of lookup queries/second per type: Memory = 100K, HDD = 1K, SSD = 5K Assume that buffering in memory allows for faster lookup queries. For example, if memory is 29 GB, then 25 GB of data will be buffered in memory (50%), so lookup can be done from memory 50% of the time.
A scan reads all data in the database. Scan rate: Memory = 20 GB/sec., SSD = 500 MB/sec., HDD = 100 MB/sec. We will assume a scan can read as much out of the memory buffer that is there. i.e. If 50% of data is buffered in memory, then only the other 50% needs to be read from disk.
Query mix is 99% lookup and 1% scans. We will assume (naively) that lookups and scans will not affect each other's execution performance.
Calculate the time to perform 1 million queries given the above query parameters and then convert that into queries/second.
Test these two configurations: one with HDD and one with SSD with minimum storage amount (100GB) then spend rest of money to maximize memory available.
Explanation / Answer
Answer:
Given:
Size of database: 50 GB
Useable memory: 32 GB
Amount available to spend: $1000
Minimum memory requirement: 4 GB (without buffering)
Secondary storage requirement: At least twice the database size = 2*50 = 100 GB (at least)
Cost:
Memory = $50/GB, SSD = $5/GB, HDD = $1/GB
Types of query: 1) Lookup, 2) Scan
Query required to run: 1 million
Proportion of lookup queries = 1*99/100 = 0.99 million = 990000
Proportion of scan queries = 1*1/100 = 0.01 million = 10000
1. Server configuration 1:
Secondary storage: 100 GB HDD
Cost of secondary storage = 100*1 = $100
Balance amount to be spend on memory = 1000 - 100 = $900
Size of memory costing $900 = 900/50 = 18 GB
Memory available for buffering = 18 - 4 = 14 GB
Data to be read from HDD = 50 - 14 = 36 GB
Calculation for Lookup query:
Lookup for buffed data will be done from memory, while for rest will be done from HDD.
%data to be lookup from memory = 14*100/50 = 28%
Hence number of queries to be lookup from memory = 990000*28/100 = 277200
Number of queries to be lookup from HDD = 990000 - 277200 = 712800
Time required for lookup queries from memory = 277200/100000 = 2.772 seconds
Time required for lookup queries from HDD = 712800/1000 = 712.8 seconds
Total time required for lookup queries = 712.8+2.772 = 715.572 seconds
Calculation for scan query:
Time required to read buffered data from memory = 14/20 = 0.7 seconds
Time required to read 36 GB data from HDD = 36000/100 = 360 seconds
Total time required for scan queries = 360+0.7 = 360.7 seconds
Total time required for all queries = 715.572+360.7 = 1076.272 seconds
Queries/second = 1000000/1076.272 = 929.133 queries/second
-----------------------------------------
2. Server configuration 2:
Secondary storage: 100 GB SDD
Cost of secondary storage = 100*5 = $500
Balance amount to be spend on memory = 1000 - 500 = $500
Size of memory costing $500 = 500/50 = 10 GB
Memory available for buffering = 10 - 4 = 6 GB
Data to be read from HDD = 50 - 6 = 44 GB
Calculation for Lookup query:
Lookup for buffed data will be done from memory, while for rest will be done from SDD.
%data to be lookup from memory = 6*100/50 = 12%
Hence number of queries to be lookup from memory = 990000*12/100 = 118800
Number of queries to be lookup from SDD = 990000 - 118800 = 871200
Time required for lookup queries from memory = 118800/100000 = 1.118 seconds
Time required for lookup queries from SDD = 871200/5000 = 174.24 seconds
Total time required for lookup queries = 174.24+1.118 = 175.358 seconds
Calculation for scan query:
Time required to read buffered data from memory = 6/20 = 0.3 seconds
Time required to read 44 GB data from SDD = 44000/500 = 88 seconds
Total time required for scan queries = 88+0.3 = 88.3 seconds
Total time required for all queries = 175.358+88.3 = 263.658 seconds
Queries/second = 1000000/263.658 = 3792.792 queries/second
Conclusion: From this calculation, it is clearer that SSD based configuration will have much better performance than HDD based configuration i.e. almost 3-4 times better.