We supply you with a simple write-back block cache, a shared lock server, and a simple client interface to the lock server. Your job is to modify fs.C to use locks, to modify the lock client to cache released locks and give them back to the server only when other hosts request them, and to achieve cache consistency by ensuring that ccfs only caches data as long as it holds the lock that protects that data.
Your server will be a success if it manages to operate out of its local block and lock cache when reading/writing files and directories that other hosts aren't looking at, but maintains correctness when the same files and directories are concurrently read and updated on multiple hosts.
% wget http://pdos.lcs.mit.edu/6.824/labs/fs-lab-2.tgz
% tar xzvf fs-lab-2.tgz
% cd fs-lab-2
% ./setup
% ./configure --with-dmalloc --with-classfs=/u/6.824/classfs-0.0 i386-freebsd
% gmake

At this point you should add your code for NFS3_REMOVE from the previous lab.
To illustrate the problem you're trying to solve, try this:
% ./blockdbd 5567 &
% ./ccfs d1 localhost 5567 &
root file handle: 69f83479b7517cab
% ./ccfs d2 localhost 5567 69f83479b7517cab &
% touch /classfs/d1/fff
% ls -lt /classfs/d2/.
%

Even though /classfs/d1 and /classfs/d2 should be the same underlying file system, a file created in one did not appear in the other. The reason is that this new version of the ccfs software has write-back block caching: when you created fff, the first ccfs did not write the modified blocks back to the block server, but modified them only in its local cache.
Since your file server from the previous lab doesn't have a block cache, it would have shown fff in the d2 directory listing.
blockdbc provides two new functions to control the cache: flush() and flush_prefix(). flush(key, cb) writes the indicated block to the server if it is dirty (modified) in the local cache, discards the block from the cache, and calls the callback when the server replies. flush_prefix(prefix, cb) does the same thing for every cached block whose key starts with the indicated prefix. flush_prefix() could be used to flush a file handle and all associated data blocks in one call.
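For illustration, here is a rough sketch of a helper that flushes one block; db (the blockdbc client pointer) and the libasync-style callback type are assumptions, not the handout's exact declarations:

  // Sketch only: write the block named by key back to the server if it is
  // dirty, drop it from the local cache, and run done once the server has
  // replied.
  void
  fs::flush_block (str key, callback<void>::ref done)
  {
    db->flush (key, done);
  }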
A client asks for a lock by sending the lock server an ACQUIRE RPC with the name of the lock as the argument. The lock server replies immediately, but the reply means nothing more than that the server has received the request. The server adds the client to a list of clients waiting for that lock. When the lock is free, the server will send that client a GRANT RPC with the name of the lock.
When a client is done with a lock, it sends the server a RELEASE RPC. The server may then grant the lock to some other waiting client, or (if no-one is waiting) simply forget about the lock.
To help support client caching of locks, the lock server sends a client a REVOKE RPC if the client is holding a lock that some other client has requested.
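One way to picture the client's half of this protocol is as a small per-lock state machine. The sketch below is illustrative only (the enum and its names are not part of the handout); the CACHED state anticipates the lock caching you will add later:

  // Possible states of one lock, as seen by the lock client.
  enum lockstate {
    NONE,        // not held, and no ACQUIRE outstanding
    ACQUIRING,   // ACQUIRE sent, waiting for the GRANT
    HELD,        // GRANT received; fs.C is currently using the lock
    CACHED       // fs.C has released it, but we keep it until a REVOKE
  };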
fs.C can ask for a lock by calling the lock client's acquire(name, cb) method; the lock client then asks the lock server for the named lock, and calls cb when the server grants the lock. fs.C should call release(name) when it's done with the lock.
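In code, the acquire half of a handler might look like this sketch; lc (the lock client pointer), getattr_start(), and getattr_locked() are illustrative names, not the handout's fs.C:

  // acquire() only starts the request; the lock client invokes the wrapped
  // callback once the lock server sends the GRANT.
  void
  fs::getattr_start (nfscall *nc, str lockname)
  {
    lc->acquire (lockname, wrap (this, &fs::getattr_locked, nc, lockname));
  }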
As a convenience for cache management, you can supply the lock client with a callback which it calls just before sending each RELEASE RPC to the lock server. This gives you the chance to flush blocks out of the block cache when you give a lock to some other ccfs process. fs.C uses this feature to arrange for fs::flush() to be called whenever a lock is released. The lock client calls fs::flush() with a callback which flush() must call when it's done flushing the block cache; it's this callback that actually sends the RELEASE RPC back to the lock server.
Concurrent NFS RPCs should be atomic with respect to each other. Suppose two ccfs processes are serving the same underlying file system (i.e. they share a block server and a root file handle). If you send each of them an NFS RPC at the same time, the final result should be one that could have occurred if you had sent the two RPCs one at a time.
For example, suppose there's a directory containing the file "f". If you send concurrent NFS3_REMOVE("f") RPCs to the two ccfs servers, then one REMOVE should delete the file from the directory and return a successful reply, and the other REMOVE should return an NFS3ERR_NOENT error and do nothing to the directory. They should not both erase the directory entry and both return a successful result.
A consistent cache is one that always provides the latest value of each block. If ccfs #1 modifies a block, and then ccfs #2 reads the block, then ccfs #2 should see the modifications made by ccfs #1.
First, use locks to build yourself a server that executes NFS RPCs atomically. You probably want to have one lock per file and directory, using the file handle as the lock's name. Modify each RPC handler to acquire the relevant lock (or locks) at the beginning, and release it (or them) just before the RPC sends the reply. You might find it convenient to modify get_fh() and put_fh() to acquire() and release() the relevant lock.
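Continuing the earlier sketch, the locked half of the handler does its work under the lock, releases, and replies last. The result-building details below are assumptions, not the real fs.C:

  void
  fs::getattr_locked (nfscall *nc, str lockname)
  {
    getattr3res res (NFS3_OK);
    // ... fill in the attributes from the cached file-handle block ...
    lc->release (lockname);   // give the lock back first
    nc->reply (&res);         // then reply; nc and its arguments are freed
  }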
Second, add cache consistency by flushing blocks when they are no longer protected by locks. You can do this by calling the block client's flush_prefix() method from fs::flush(). You want to maintain two invariants: every block is protected by exactly one lock, and a block is in a ccfs's cache only while that ccfs holds the lock. If you get() blocks only after acquiring the relevant lock, and flush blocks from your cache before the lock client sends the lock back to the lock server, you should be OK.
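A minimal sketch of fs::flush(), assuming it is handed the lock's name and a done callback, that every block protected by a lock has a key beginning with that name, and that the callback types line up:

  void
  fs::flush (str lockname, callback<void>::ref done)
  {
    // Write back and discard everything this lock protects; invoking done
    // is what finally lets the lock client send the RELEASE RPC.
    db->flush_prefix (lockname, done);
  }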
Third, add lock caching (or "lazy release" of locks). You want to cache locks because at this point the block cache isn't doing you much good. You can only cache blocks while you hold the corresponding lock, and you only hold locks for the duration of NFS RPCs. The solution is to modify the lock client to retain each lock after fs.C calls release(), and only send the lock back to the server when the server sends a REVOKE RPC. This means that you can also leave blocks in the client cache until the corresponding lock is REVOKEd. If no other ccfs's are using the same blocks as you, you can then operate entirely out of the block and lock caches, and never talk to the block server.
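A sketch of what lazy release might look like inside the lock client; the class name, cached_ (a table of locks we still hold but fs.C has finished with), flushcb_, and send_release() are invented names, not the handout's code:

  void
  lockclient::release (str name)
  {
    // Don't send RELEASE; keep the lock so a later acquire() of the same
    // name can be granted without talking to the lock server.
    cached_.insert (name);
  }

  void
  lockclient::revoked (str name)   // the server sent a REVOKE for this lock
  {
    // (A real client must also handle a REVOKE that arrives while fs.C is
    // still using the lock.)
    cached_.remove (name);
    // Let fs.C flush the blocks this lock protects; the callback passed
    // here is what actually sends the RELEASE RPC back to the server.
    flushcb_ (name, wrap (this, &lockclient::send_release, name));
  }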
You're not responsible for any of the following, some of which could make good 6.824 projects:
test2a tests cache consistency and atomicity. Run two copies of ccfs on the same host, using the same block server and root handle, but different directory names under /classfs. Then run test2a, specifying the two directory names. For example:
% ./blockdbd 5567 &
% ./ccfs dir1 localhost 5567 &
root file handle: 4d1def4be444163c
% ./ccfs dir2 localhost 5567 4d1def4be444163c &
% ./test2a /classfs/dir1 /classfs/dir2
Create then read: OK
Unlink: OK
Append: OK
Readdir: OK
Many sequential creates: OK
Concurrent creates: OK
Concurrent creates of the same file: OK
Concurrent create/delete: OK
Concurrent creates, same file, same server: OK
test2a: Passed all tests.

If test2a says "Passed all tests", then it thinks your server provides both atomicity and cache consistency. test2a works by creating and deleting files in the same directory via the two ccfs's.
Use test2b to test performance. It creates two sub-directories and creates/deletes files in only one of the two directories through each of the two ccfs servers. If your ccfs caches both blocks and locks, this should be pretty fast. In order to measure performance, we need the block server to print out how many put and get operations the ccfs's perform; you can make it do this with the -v flag. So do this in one window:
% ./blockdbd -v 5567

And this in another window:
% ./ccfs dir1 localhost 5567 &
root file handle: 4d1def4be444163c
% ./ccfs dir2 localhost 5567 4d1def4be444163c &
% ./test2b /classfs/dir1 /classfs/dir2
Create/delete in separate directories: OK
%

The blockdbd should print out lines of numbers, one line every 100 put or get operations; each line gives the total number of puts and gets so far. You're interested in the last line it prints, when test2b finishes. If you've implemented caching of locks and blocks well, the last line should show about 10 puts and about 800 gets. If your ccfs didn't cache anything, blockdbd would print about 1400 puts and 64000 gets; that is roughly what your ccfs from Lab 4 would do. Good caching can dramatically reduce the load on the shared block server.
You can test atomicity alone by un-defining BLOCKCACHE at the beginning of blockdbc.C. This will turn off caching, so that all put()s and get()s go to the block server. If your server has implemented atomicity correctly, then you should pass all the test2a tests when you turn off BLOCKCACHE.
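That is, at the top of blockdbc.C (a sketch; check how the handout actually defines the symbol):

  /* Comment this out (or change it to #undef BLOCKCACHE) so that every
     put() and get() goes straight to the block server.  Useful for
     separating atomicity bugs from cache-consistency bugs. */
  #define BLOCKCACHE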
Don't use the nfscall or the RPC arguments or reply areas after you send the RPC reply; those objects are deallocated as part of replying. In practice this means you should release locks before replying, since typically the lock will be named by a file handle that's contained in the RPC arguments.
The lock client interface uses str to name locks, but you'll probably be using file handles as lock names. You can convert between the two representations with fs::fh2s() and fs::s2fh().
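For example (fh here is an illustrative nfs_fh3):

  // Locks are named by str, so convert the handle before calling acquire();
  // s2fh() goes the other way when you need the handle back.
  str lockname = fh2s (fh);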
You can turn on dmalloc debugging to help you find malloc()/free() errors and use of deallocated memory:
% setenv DMALLOC_OPTIONS debug=0x24f41d03,inter=100
% ccfs ...
Hand in your solution by running gmake handin-2, which will produce a handin .tgz file. Copy this file to ~/handin/fs-lab-2-handin.tgz. We will use the first copy of the file that we can find after the deadline.