I have the SimCPU and Memory. I just want the cache done with the following requirements.
Question
I have the SimCPU and Memory. I just want the cache done with the following requirements. Thanks. I don't have time to finish it.
Please label which parts you finish.
For this assignment you will be implementing the simplified, yet operational, instruction
cache memory shown in Figure 1. Cache memory is described in Chapter 5 of Computer
Organization and Design, by Patterson and Hennessy. You will be working with a read-only memory, so there is no need to handle memory writes and cache consistency.
1. Create a new project for the assignment, download MemCacheTB.vhdl from the
course website, and create a new VHDL entity named Cache in Cache.VHDL.
2. Copy provided models for the SimCPU and Memory from Lab 11 into your
workspace directory.
3. Use the following entity declaration for the Cache component:
Generics:
- BITS_IN_BLOCKOFFSET: number of bits of the address for the block offset.
- BITS_IN_INDEX: number of bits of the address for the index.
- BITS_IN_TAG: number of bits of the address for the tag.
- Assume 4-byte words and a 32-bit address space.
entity Cache is
    generic(
        BITS_IN_BLOCKOFFSET : integer;
        BITS_IN_INDEX       : integer;
        BITS_IN_TAG         : integer
    );
    port (
        -- CPU side of interface:
        cpu_address : in std_logic_vector(31 downto 0);
        cpu_reading : in std_logic;
        cpu_data    : out std_logic_vector(31 downto 0);
        cpu_rdy     : out std_logic;
        -- MEMORY side of interface:
        mem_address : out std_logic_vector(31 downto 0);
        mem_reading : out std_logic;
        mem_data    : in std_logic_vector(31 downto 0);
        mem_rdy     : in std_logic;
        clk         : in std_logic
    );
end entity Cache;
4. General description of signals:
xxx_reading: On the clock rising edge, if '1' then a read is to be performed; '0'
otherwise. Driven by the unit doing the reading to tell the next level memory to
fetch a value.
Note that the CPU holds cpu_reading high the entire time it is running the
"program" because it always wants to read the next instruction.
The reading line between Cache and Memory will only be active when
fetching a new block into cache.
xxx_address: Byte address which is being read. Only valid if reading is high.
When reading from Memory, the address is the byte address of the first
word of the block, not necessarily the address the CPU asked for.
Since the memory only deals in words, the byte address will always be a
multiple of 4.
xxx_rdy: '1' if valid data is being returned at the clock's rising edge. Will be low
while a read takes more than one cycle to complete and will then go high when the
data is ready. The requesting component must wait (stall) until the data is ready.
xxx_data: Transmits the data being returned to the requesting component. To be
sampled at the next rising edge of the clock, but only when rdy is '1'.
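As a rough illustration of this rdy/data rule, the sketch below shows how a requesting component might sample the returned data. This is only an example: the ReaderSketch entity and the captured_word register are names invented for this sketch, not part of the provided lab files.

library ieee;
use ieee.std_logic_1164.all;

entity ReaderSketch is
    port (
        clk      : in std_logic;
        mem_rdy  : in std_logic;
        mem_data : in std_logic_vector(31 downto 0)
    );
end entity ReaderSketch;

architecture rtl of ReaderSketch is
    signal captured_word : std_logic_vector(31 downto 0);
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if mem_rdy = '1' then
                captured_word <= mem_data;  -- data is only valid while rdy is '1'
            end if;                         -- otherwise stall and keep waiting
        end if;
    end process;
end architecture rtl;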
5. The circuit shown in Figure 1 is implemented in MemCacheTB.
- It instantiates SimCPU, Cache, and Memory.
- Cache is given generic parameters for the number of bits in the address for the block
offset, the index, and the tag. These tell cache its size (block size, number of blocks).
6. In MemCacheTB, the following constants adjust the cache and memory configuration.
These values are converted to a number of bits for you by the log2() function.
During testing, you will need to change the bold-italic numbers.
constant BYTES_PER_WORD : integer := 4;
constant BLOCKSIZE_BYTES : integer := BYTES_PER_WORD * 1;
constant CACHESIZE_BYTES : integer := BYTES_PER_WORD * 4;
constant MEMSIZE_BYTES : integer := 1024 * 20; -- 20KB
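The log2() function itself is supplied by MemCacheTB and is not reproduced in this hand-out. As an assumption about its behaviour (smallest n such that 2**n >= x), a helper like the following would be consistent with the constants above; the derivation of the three generics shown in the trailing comments is also an assumption, not necessarily the exact expressions used in MemCacheTB.

-- Sketch only: an assumed equivalent of the provided log2() helper.
package log2_sketch_pkg is
    function log2(x : integer) return integer;
end package log2_sketch_pkg;

package body log2_sketch_pkg is
    function log2(x : integer) return integer is
        variable n : integer := 0;
    begin
        while 2 ** n < x loop   -- smallest n with 2**n >= x
            n := n + 1;
        end loop;
        return n;
    end function log2;
end package body log2_sketch_pkg;

-- One plausible derivation of the generics from the default constants above:
--   BITS_IN_BLOCKOFFSET = log2(BLOCKSIZE_BYTES)                    = log2(4) = 2
--   BITS_IN_INDEX       = log2(CACHESIZE_BYTES / BLOCKSIZE_BYTES)  = log2(4) = 2
--   BITS_IN_TAG         = 32 - BITS_IN_INDEX - BITS_IN_BLOCKOFFSET = 28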
2.1 Cache Misses
Below is a screen-shot of reading from addresses (32, 36); it shows two cache misses.
Cache size 4 words, block size 1 word. See image on website: Miss(32,36).jpg
1. During the first 10 ns the system has not yet initialized. This period is not critical.
2. In the first (real) clock cycle, the CPU asserts its reading signal and puts out the
first address (decimal 32 = 0x20).
Focus on rising clock edges at the end of the clock cycle; this is the important
moment when cache samples the reading signal and reads in the address.
3. During the second and third clock cycles, cache determines it is a read miss, and so
the block must be fetched from memory. Cache puts the byte address for the start of
the block onto mem_address and drives mem_reading high for 1 clock-cycle.
4. The next 4 clock-cycles simulate memory accessing the data. Nothing happens during
this time. (Simulates 1 cycle for memory to read the request, and 3 extra cycles to
'find' the data).
5. Next memory returns the data to cache. Cache sees mem_rdy is high and so on the
clock's rising edge it "clocks in" the value from mem_data, storing it in the cache.
Since this simulation is using a block size of one word, only one word is transferred.
6. Next the cache passes the value back to the CPU. At the same time, the CPU is
outputting the next address it will want. At the rising clock edge, the CPU reads in the
fetched data value from cache at the same time the cache reads in the next address
from the CPU (decimal 36 = 0x24).
7. The process repeats with memory returning a value to cache, and cache returning it to
the CPU.
8. Note that the second to last value on cpu_address is 0x0de2c0de (sound it out as
a word). This is just a value I chose to help highlight that the CPU has no new
addresses it wants to load. In fact, the important part is that cpu_reading has been
pulled low and stays low. While 0x0de2c0de is being output, it means the CPU has
no further instructions to perform, but has not yet finished its last instruction.
9. Finally, the cpu_address goes to 0xdeadc0de (sound it out as a word), which
shows us that the full simulation has completed. When measuring how long a given
sequence of reads takes, measure from the moment cpu_reading goes high until
cpu_address transitions to 0xdeadc0de.
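To tie the trace above to VHDL, here is a minimal sketch of one possible miss-handling state machine, assuming a direct-mapped, read-only cache with one word per block (the configuration in this screen-shot) and assuming BITS_IN_BLOCKOFFSET counts byte-address bits. The state names, the internal arrays, and the choice to register every output are my own choices; the exact cycle-by-cycle alignment with the provided SimCPU may differ.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

architecture miss_sketch of Cache is
    constant NUM_BLOCKS : integer := 2 ** BITS_IN_INDEX;

    -- One word per block in this sketch, so a block is a single 32-bit word.
    type block_array_t is array (0 to NUM_BLOCKS - 1) of std_logic_vector(31 downto 0);
    type valid_array_t is array (0 to NUM_BLOCKS - 1) of std_logic;
    type tag_array_t   is array (0 to NUM_BLOCKS - 1) of integer;

    signal data_blocks : block_array_t;
    signal valid_bits  : valid_array_t := (others => '0');
    signal tags        : tag_array_t;

    type state_t is (IDLE, REQUEST, WAIT_MEM);
    signal state : state_t := IDLE;

    -- Address fields decoded combinationally from cpu_address.
    signal index : integer range 0 to NUM_BLOCKS - 1;
    signal tag   : integer;
begin
    index <= to_integer(unsigned(cpu_address(BITS_IN_BLOCKOFFSET + BITS_IN_INDEX - 1
                                             downto BITS_IN_BLOCKOFFSET)));
    tag   <= to_integer(unsigned(cpu_address(31 downto BITS_IN_BLOCKOFFSET + BITS_IN_INDEX)));

    process(clk)
    begin
        if rising_edge(clk) then
            cpu_rdy     <= '0';   -- defaults; overridden below when data is ready
            mem_reading <= '0';
            case state is
                when IDLE =>
                    if cpu_reading = '1' then
                        if valid_bits(index) = '1' and tags(index) = tag then
                            cpu_data <= data_blocks(index);   -- hit: no stall needed
                            cpu_rdy  <= '1';
                        else
                            state <= REQUEST;                 -- miss: go fetch the block
                        end if;
                    end if;
                when REQUEST =>
                    -- Ask memory for the first word of the block (word-aligned address).
                    mem_address <= cpu_address(31 downto 2) & "00";
                    mem_reading <= '1';                       -- high for one cycle only
                    state       <= WAIT_MEM;
                when WAIT_MEM =>
                    if mem_rdy = '1' then
                        data_blocks(index) <= mem_data;       -- fill the block
                        tags(index)        <= tag;
                        valid_bits(index)  <= '1';
                        cpu_data           <= mem_data;       -- and pass it to the CPU
                        cpu_rdy            <= '1';
                        state              <= IDLE;
                    end if;
            end case;
        end if;
    end process;
end architecture miss_sketch;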
2.2 Cache Miss then Hit
This screen-shot shows the same sequence as above, except with two extra reads:
(32, 36, 32, 36). The last two reads will be cache hits, which do not stall the CPU: the
processor is able to read values at full speed (1 per clock-cycle). This highlights the value of
the cache because the CPU need not wait for the slow memory to fetch values.
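As a rough back-of-the-envelope comparison (cycle counts approximated from the trace in Section 2.1, not measured): if a miss costs on the order of 8 clock cycles from address to data while a hit costs 1, the sequence (32, 36, 32, 36) takes roughly 2 × 8 + 2 × 1 = 18 cycles, versus roughly 4 × 8 = 32 cycles if every read had to go to memory.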
2.3 Cache Miss then Hit with 2 Word Block Size
The following is the same sequence as above (32, 36, 32, 36), but with a block size of 2
words. This means that when reading from memory, memory automatically returns the
entire block.
3.1 Cache Setup
1. You will likely find you want to create constants for commonly used values such as
the number of words per block, and the number of blocks.
Hint: Calculate 2^n with the following VHDL code:
constant DA_VALUE : integer := 2 ** n;
Calculate these constants in your cache based on the generic values it is given.
Give the constants very clear names to express what they represent.
2. Create array types and signals as follows (see Figure 2 for a diagram of the types/signals):
- Create a type for a single block of data. Make it an array of std_logic_vector.
- Create a type for all the blocks of data. Make it an array of blocks.
- Create a signal which is an instance of the above array-of-blocks type.
- Create a type for the valid bits. Make it an array of std_logic (single bits).
- Create a signal which is an instance of your valid-bits-array type.
- Create a type and signal instance for the tags. Make it an array of integers. You
could make it an array of std_logic_vector to more closely resemble a real cache,
but using integers can simplify your coding.
3. Initialize valid bits to 0. The easiest way to do this is add the following at the end of
the line of your signal declaration: := (others => '0');
4. Create signals for each of the index, tag, and block-offset. It is easiest if these
are integers because they will be used for accessing elements in arrays.
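Putting steps 1-4 together, here is a sketch of how these declarations might look. All names are illustrative choices, and the block-offset decode assumes BITS_IN_BLOCKOFFSET counts byte-address bits (so the low two bits select the byte within a word); adjust to however you interpret the generics.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

architecture setup_sketch of Cache is
    -- Step 1: constants derived from the generics.
    constant BYTES_PER_WORD  : integer := 4;
    constant WORDS_PER_BLOCK : integer := (2 ** BITS_IN_BLOCKOFFSET) / BYTES_PER_WORD;
    constant NUM_BLOCKS      : integer := 2 ** BITS_IN_INDEX;

    -- Step 2: a block is an array of words; the cache is an array of blocks.
    type word_block_t  is array (0 to WORDS_PER_BLOCK - 1) of std_logic_vector(31 downto 0);
    type block_array_t is array (0 to NUM_BLOCKS - 1) of word_block_t;
    signal data_blocks : block_array_t;

    type valid_array_t is array (0 to NUM_BLOCKS - 1) of std_logic;
    signal valid_bits  : valid_array_t := (others => '0');   -- Step 3: start invalid

    type tag_array_t is array (0 to NUM_BLOCKS - 1) of integer;
    signal tags : tag_array_t;

    -- Step 4: decoded address fields as integers for easy array indexing.
    signal blockoffset : integer range 0 to WORDS_PER_BLOCK - 1;
    signal index       : integer range 0 to NUM_BLOCKS - 1;
    signal tag         : integer;
begin
    -- Combinational decode of cpu_address into its three fields.
    blockoffset <= to_integer(unsigned(cpu_address(BITS_IN_BLOCKOFFSET - 1 downto 2)))
                   when WORDS_PER_BLOCK > 1 else 0;
    index <= to_integer(unsigned(
                 cpu_address(BITS_IN_BLOCKOFFSET + BITS_IN_INDEX - 1 downto BITS_IN_BLOCKOFFSET)));
    tag   <= to_integer(unsigned(
                 cpu_address(31 downto BITS_IN_BLOCKOFFSET + BITS_IN_INDEX)));
end architecture setup_sketch;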
Explanation / Answer
In computers having very wide address fields, the size of the page table could become prohibitively large: for instance, 64-bit addresses with pages of 4 Kbytes would result in a page table with 2^(64-12) = 2^52 entries. It is quite obvious that such a table is impossible to store, but one should also notice that it would be an almost empty table, because no program needs anywhere near 2^52 pages. Just as for large segment tables, the solution is to be found in the classical techniques for storing and retrieving sparse sets of data, such as hashing or trees. Techniques based upon two or three successive layers of page tables, organized as a multiway tree, seem to be the most popular approach. The principle of a two-level page table is described in Fig. 8.12.
Fig. 8.12. Two-level page table.
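As a quick worked example of the idea (numbers chosen for illustration only, not taken from the figure): with 64-bit addresses and 4-Kbyte pages, the page number occupies 64 - 12 = 52 bits; splitting that field into, say, two 26-bit halves gives a first-level table of 2^26 entries, each of which points to a second-level table that only needs to be allocated when the corresponding region of the address space is actually in use.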
Pagination has several applications; two will be described in the following paragraphs, and a third will be the subject of the next section.
In many older computer families, the number of bits in the address fields is inadequate for the size of modern memories. Memory mapping is often used to allow larger central memories to be used despite the short addresses. In such situations, the physical page number has more bits than the logical page number. This means that, at a given moment, only a subset of the physical memory can be accessed from a program; but it suffices that, when requested by the program or by the operating system, the memory manager changes the contents of the page table to make another part of physical memory accessible.
This technique was used in early PC