You are currently browsing the daily archive for October 4, 2007.
With this post I am starting a series of articles on some of the fundamental concepts that you should have at least a passing familiarity with if you are planning on working with GemStone/S. These articles aren’t intended to replace our most excellent documentation, but to serve as a tantalizing introduction to the wonderful delicacies one can enjoy if he or she decides to crack the shell of a fresh SAG (pdf), open a case of Topaz (pdf) and sit by a cheery fire some chilly evening this autumn.
First on the menu is the transaction. GemStone/S transactions are light on the nose and have a warm, smokey flavor with echoes of strawberries and glazed donuts that reverberates off the palate with the force of a Dandelion archene….ahem…one shouldn’t write a blog on an empty stomach.
GemStone/S is a full-fledged database complete with ACID properties. Not ACID as in Lucy in the Sky with Diamonds, but ACID as in a guarantee that the Smalltalk objects in your transaction have been completely and correctly written to disk: Atomicity, Consistency, Isolation, and Durability.
In GemStone/S, transactions are used for persistence, but they are also useful for sharing objects between vms via the Shared Page Cache (SPC).
I will warn you that I’ll get a little geeky during the following discussion and it will probably help if you take a peek at the Glossary to familiarize yourself with some of the terms I’ll be using, but I promise to try to avoid dragging you too far down the rabbit hole.
All transactions in GemStone start with an abort. When an abort is performed, the following steps take place in a gem:
- Invalidate objects
- Acquire new view
Invalidate objects
All of the dirty objects (i.e., persistent objects previously modified) in the vm are marked as invalid. The list of dirty objects in a vm is called a writeSet. All objects that have been changed by other transactions committed since the last time the gem updated its view (collectively, the writeSetUnion) are also marked as invalid.
A subsequent reference to an invalid object will cause a fresh version of the object to be copied into the vm.
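To make the invalidation step a bit more concrete, here is a toy model in Python. It is only a sketch and not GemStone/S code: `ToyGem`, its methods, and the dictionary standing in for the repository are all invented for illustration.

```python
class ToyGem:
    """Toy model of a gem's object cache; every name here is hypothetical."""

    def __init__(self, repository):
        self.repository = repository  # dict: oop -> committed value
        self.cache = {}               # oop -> value copied into the vm
        self.invalid = set()          # oops whose cached copy is stale
        self.write_set = set()        # oops modified in this transaction

    def read(self, oop):
        # A reference to a missing or invalid object copies in a fresh version.
        if oop not in self.cache or oop in self.invalid:
            self.cache[oop] = self.repository[oop]
            self.invalid.discard(oop)
        return self.cache[oop]

    def write(self, oop, value):
        # Modifying an object makes it dirty, so it joins the writeSet.
        self.cache[oop] = value
        self.invalid.discard(oop)
        self.write_set.add(oop)

    def abort(self, write_set_union):
        # Invalidate our own dirty objects plus everything others committed.
        self.invalid |= self.write_set | set(write_set_union)
        self.write_set.clear()


repository = {1: "a", 2: "b", 3: "c"}
gem = ToyGem(repository)
gem.read(1)
gem.read(2)
gem.write(1, "local change")           # dirty object in this gem
repository[2] = "committed elsewhere"  # another gem's commit
gem.abort(write_set_union={2})
```

After the abort, reading object 1 discards the local change and reading object 2 picks up the value committed elsewhere, because both were marked invalid.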
Each persistent object is identified by a unique id called an OOP. An OOP is a 61-bit value. The three extra bits in a 64-bit word are used as tag bits. Tag bits differentiate between regular objects (which need to be physically stored in the database) and special objects. The value of a special object is encoded in the 61 bits of the OOP itself. SmallIntegers, SmallDoubles, and Characters are examples of special objects. ….Was that a rabbit????
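The tagging idea can be sketched in a few lines of Python. The text above only says that a 64-bit word holds 61 bits of value plus three tag bits, so the details below are assumptions made for illustration: the tag is placed in the low three bits, and the particular tag values are made up.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
PAYLOAD_BITS = 61

# Hypothetical tag assignments; the real values may well differ.
TAG_OBJECT = 0b000         # regular object, stored on a data page
TAG_SMALL_INTEGER = 0b010  # special object, value lives in the word
TAG_CHARACTER = 0b100      # special object, value lives in the word


def make_word(payload, tag):
    """Pack a 61-bit payload and a 3-bit tag into one 64-bit word."""
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in 61 bits")
    return (payload << TAG_BITS) | tag


def tag_of(word):
    return word & TAG_MASK


def payload_of(word):
    return word >> TAG_BITS


def is_special(word):
    return tag_of(word) != TAG_OBJECT


def encode_small_integer(n):
    # Store the integer in 61-bit two's complement.
    if not -(1 << 60) <= n < (1 << 60):
        raise ValueError("integer does not fit in 61 bits")
    return make_word(n & ((1 << PAYLOAD_BITS) - 1), TAG_SMALL_INTEGER)


def decode_small_integer(word):
    if tag_of(word) != TAG_SMALL_INTEGER:
        raise ValueError("not a small integer")
    payload = payload_of(word)
    if payload >= (1 << 60):        # sign bit set: negative value
        payload -= 1 << PAYLOAD_BITS
    return payload
```

The point of the design is that a special object never needs a page lookup: the word that would otherwise identify the object already contains its value.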
One or more objects are stored on a data page. Objects that are larger than a page are transparently broken up into page-sized chunks. Because of this chunking it is possible to reference objects in a million element array without having to load the entire million element array into memory.
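Here is a small Python sketch of why chunking helps. It is a toy, not how GemStone/S lays out large objects: `ChunkedArray`, `load_chunk`, and the figure of 1000 elements per chunk are all invented for the example.

```python
PAGE_CAPACITY = 1000  # hypothetical number of elements per page-sized chunk


class ChunkedArray:
    """Toy array that loads page-sized chunks only when they are touched."""

    def __init__(self, size, load_chunk):
        self.size = size
        self.load_chunk = load_chunk  # callable: chunk number -> list
        self.loaded = {}              # chunk number -> list of elements

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        chunk_no, offset = divmod(index, PAGE_CAPACITY)
        if chunk_no not in self.loaded:
            # Only the chunk holding this element is brought into memory.
            self.loaded[chunk_no] = self.load_chunk(chunk_no)
        return self.loaded[chunk_no][offset]


def load_chunk(chunk_no):
    # Stand-in for reading one page; element i simply holds the value i.
    start = chunk_no * PAGE_CAPACITY
    return list(range(start, start + PAGE_CAPACITY))


big = ChunkedArray(1_000_000, load_chunk)
value = big[123_456]
```

Touching one element of the million element array loads a single chunk; the other 999 chunks stay where they are until something references them.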
Objects are read from and written to extents on disk in units of pages. Not surprisingly, pages are cached in memory in the Shared Page Cache.
The Object Table (OT) is a Btree that maps OOPs to pages. The OT is stored in a set of pages, cached in the SPC and written to an extent just like the data pages.
Acquire new view
The gem contacts the stone and gets a new view of the database. A view is a reference to the latest OT along with some other bookkeeping information.
As the vm executes code following an abort, it starts keeping track of all of the objects that are modified during the transaction in the writeSet.
Object references are stored in the body of an object as OOPs. When an instance variable is accessed in a persistent object, and the referenced object is not in the vm or has been invalidated, the OT is consulted and the SPC is checked to see if the page containing the object is already present. If the page isn’t in the SPC, then the page is loaded from disk. Finally, the object is copied from the page into the vm. ….A white rabbit????
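The lookup path just described (OT, then SPC, then disk, then copy into the vm) can be sketched as follows. This is a toy model with invented names (`ToyStore`, `ToyVm`, `page_for`, `fetch`), and plain dictionaries stand in for the OT, the SPC, and the extents.

```python
class ToyStore:
    """Toy stand-in for the OT, the SPC, and the pages on disk."""

    def __init__(self, disk_pages, object_table):
        self.disk_pages = disk_pages      # page id -> {oop: value}
        self.object_table = object_table  # oop -> page id
        self.spc = {}                     # cached pages: page id -> page
        self.disk_reads = 0

    def page_for(self, oop):
        page_id = self.object_table[oop]  # consult the OT
        if page_id not in self.spc:       # page not cached yet
            self.spc[page_id] = dict(self.disk_pages[page_id])
            self.disk_reads += 1
        return self.spc[page_id]


class ToyVm:
    """Toy vm that faults objects in on first reference."""

    def __init__(self, store):
        self.store = store
        self.objects = {}     # oop -> value copied into the vm
        self.invalid = set()  # oops invalidated by an abort

    def fetch(self, oop):
        if oop in self.objects and oop not in self.invalid:
            return self.objects[oop]
        page = self.store.page_for(oop)
        self.objects[oop] = page[oop]  # copy from the page into the vm
        self.invalid.discard(oop)
        return self.objects[oop]


store = ToyStore(
    disk_pages={7: {101: "alice", 102: "bob"}},
    object_table={101: 7, 102: 7},
)
vm = ToyVm(store)
first = vm.fetch(101)
second = vm.fetch(102)
```

Both objects live on page 7, so the second fetch finds the page already cached and no further disk read is needed.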
When a commit is performed, the following steps take place in the gem:
- Flush dirty objects
- Write transaction log entries
- Acquire commit token
- Check for conflicts
- Finalize commit
Flush dirty objects
During this step all of the objects in the writeSet and all of the newly created objects that are reachable from persistent, dirty objects are copied from the vm into new pages in the SPC.
The gem’s copy of the OT is then updated with the new OOP-to-page mapping. The OT data structure is designed to do a copy-on-write, so only the portion of the Btree that is changed needs to be written to new OT pages. In practice, large portions of the OT are shared amongst multiple views. ….Nope, it’s a Dormouse. I’m sure of it.
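The copy-on-write behaviour is easier to see in code. The sketch below is a simplification: it uses a plain binary search tree instead of a Btree spread over pages, and `Node`, `insert`, and `lookup` are names made up for the example. The idea it illustrates is the same, though: an update copies only the nodes on the path to the change and shares everything else with the older view.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    oop: int
    page: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, oop, page):
    """Return a new root; only nodes on the path to `oop` are copied."""
    if node is None:
        return Node(oop, page)
    if oop < node.oop:
        return node._replace(left=insert(node.left, oop, page))
    if oop > node.oop:
        return node._replace(right=insert(node.right, oop, page))
    return node._replace(page=page)  # same oop, new page


def lookup(node, oop):
    while node is not None:
        if oop == node.oop:
            return node.page
        node = node.left if oop < node.oop else node.right
    return None


# Build an "old view" of the table.
old_view = None
for oop, page in [(50, 1), (30, 1), (70, 2), (20, 3), (80, 4)]:
    old_view = insert(old_view, oop, page)

# A commit moves oop 80 to a new page; the old view is left untouched.
new_view = insert(old_view, 80, 9)
```

A gem still holding `old_view` keeps seeing oop 80 on page 4, while `new_view` sees page 9, and the two roots share the entire left subtree.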
The new pages in the SPC, containing the latest state of the modified objects, are not written directly to disk as part of the transaction. Doing so would take too much time, as disk writes are notoriously slow. A separate process (AIO pageServer) asynchronously writes the ‘dirty’ data pages to disk.
At periodic intervals a concerted effort is made to ensure that all ‘dirty’ pages written before a certain point in time are flushed to disk. This is called a checkpoint.
Write transaction log entries
In order to ensure Durability we do have to write something to disk as part of the transaction. It turns out that we can write a minimum amount of information about the changes to objects much faster than we can write the entire object to disk.
During this step, tranlog records are written by the stone for all of the changed and new objects. Asynchronous i/o is used (when available on the host os) to write transaction logs, so that commit processing can go on while the tranlog records make their way to disk. In a performance-sensitive installation, the transaction logs are located on a raw partition or on optimized disk arrays for the fastest i/o possible. ….Oh Oh, now there’s a wacky, little guy in a top hat.
In the event of a system crash, one can recover the database by replaying all tranlog records written since the last checkpoint.
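Recovery by replay can be sketched like this. The record format is an assumption made for the example (a sequence number, an OOP, and the new value); the actual tranlog format is not described here, and `recover` is a hypothetical helper, not a GemStone/S call.

```python
def recover(checkpoint_state, checkpoint_seq, tranlog):
    """Rebuild state by replaying records written after the checkpoint.

    `tranlog` is a list of (sequence number, oop, value) tuples in the
    order they were written; this layout is invented for illustration.
    """
    state = dict(checkpoint_state)
    for seq, oop, value in tranlog:
        if seq > checkpoint_seq:  # older records are already in the pages
            state[oop] = value
    return state


checkpoint_state = {1: "a", 2: "b"}  # what the checkpoint flushed to disk
tranlog = [
    (1, 1, "a"),
    (2, 2, "b"),
    (3, 2, "b2"),  # committed after the checkpoint
    (4, 3, "c"),   # committed after the checkpoint
]
recovered = recover(checkpoint_state, checkpoint_seq=2, tranlog=tranlog)
```

Only the two records written after the checkpoint are applied, which is why frequent checkpoints keep recovery short.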
Acquire commit token
Up to this point commit processing in multiple gems can occur in parallel, but in this final phase of the commit, only one gem can proceed at a time. The stone manages the queue of gems by handing out a commit token to one gem at a time.
Check for conflicts
When a gem gets the commit token, it begins checking for commit conflicts (i.e., verifying that the transaction is valid). It does this by comparing the gem’s current writeSet with the writeSetUnion (the union of all writeSets from the transactions that occurred since the gem acquired its original view). If any OOP is in both sets, a conflict has occurred and the commit fails. If the commit fails, the gem gives up the commit token and either aborts or attempts to recover from the commit failure. ….Little balls of ….. hedgehogs????
If there are no conflicts, then the gem returns the commit token to the stone along with a copy of its writeSet (for writeSetUnion processing in other gems) and a reference to the gem’s updated OT. The stone is then free to pass along the commit token, but the gem must still wait until the stone informs it that the asynchronous transaction log i/o has completed.
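The token and the conflict check can be modelled in a few lines of Python. This is a deliberately simplified sketch: `ToyStone` and `try_commit` are invented names, a lock stands in for the commit token, the check is folded into the stone for brevity (in the description above the gem performs it), and the wait for tranlog i/o is left out.

```python
import threading


class ToyStone:
    """Toy model of the serialized part of a commit; names are hypothetical."""

    def __init__(self):
        self.commit_token = threading.Lock()  # one holder at a time
        self.commits = []                     # writeSets in commit order

    def commit_count(self):
        # A gem records this number when it acquires its view.
        return len(self.commits)

    def try_commit(self, write_set, view):
        with self.commit_token:
            # Union of all writeSets committed since the gem's view.
            write_set_union = set().union(*self.commits[view:])
            conflicts = set(write_set) & write_set_union
            if conflicts:
                return False, conflicts
            self.commits.append(set(write_set))
            return True, set()


stone = ToyStone()
view_a = view_b = stone.commit_count()  # two gems share the same view

ok_a, _ = stone.try_commit({10, 11}, view_a)      # first committer wins
ok_b, clash = stone.try_commit({11, 12}, view_b)  # OOP 11 is in both sets

view_b = stone.commit_count()                     # gem B aborts: new view
ok_retry, _ = stone.try_commit({12}, view_b)
```

Gem B fails because OOP 11 appears both in its writeSet and in the writeSetUnion; after an abort gives it a fresh view, a commit that no longer overlaps goes through.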
….a toothy smile appears floating in mid-air and the Cheshire Cat slowly materializes, hands you a precious stone and fades away completely….
Oh well, maybe I went farther down the rabbit hole than I originally planned, but I did warn you!