With this post I am starting a series of articles on some of the fundamental concepts that you should have at least a passing familiarity with if you are planning on working with GemStone/S. These articles aren’t intended to replace our most excellent documentation, but to serve as a tantalizing introduction to the wonderful delicacies one can enjoy if he or she decides to crack the shell of a fresh SAG(pdf), open a case of Topaz(pdf) and sit by a cheery fire some chilly evening this autumn.
First on the menu is the transaction. GemStone/S transactions are light on the nose and have a warm, smokey flavor with echoes of strawberries and glazed donuts that reverberates off the palate with the force of a Dandelion archene….ahem…one shouldn’t write a blog on an empty stomach.
GemStone/S is a full-fledged database complete with ACID properties. Not ACID as in Lucy in the Sky with Diamonds, but ACID as in a guarantee that the Smalltalk objects in your transaction have been completely and correctly written to disk: Atomicity, Consistency, Isolation, and Durability.
In GemStone/S, transactions are used for persistence, but they are also useful for sharing objects between vms via the Shared Page Cache (SPC).
I will warn you that I’ll get a little geeky during the following discussion and it will probably help if you take a peek at the Glossary to familiarize yourself with some of the terms I’ll be using, but I promise to try to avoid dragging you too far down the rabbit hole.
Abort
All transactions in GemStone start with an abort. An abort performs the following steps in a gem:
- Invalidate objects
- Acquire new view
Invalidate objects
All of the dirty objects (i.e., persistent objects previously modified) in the vm are marked as invalid. The list of dirty objects in a vm is called a writeSet. All objects that have been changed by transactions committed since the last time the gem updated its view (the union of those writeSets, called a writeSetUnion) are also marked as invalid.
A subsequent reference to an invalid object will cause a fresh version of the object to be copied into the vm.
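To make the invalidation step a little more concrete, here is a toy model in Python. It is only a sketch: `ToyGem`, its attributes, and the `fetch` callback are hypothetical names invented for illustration, not GemStone API, and acquiring the new view from the stone is left out.

```python
# Toy model of abort-time invalidation. All names are hypothetical.
class ToyGem:
    def __init__(self):
        self.cache = {}         # oop -> object state copied into the vm
        self.write_set = set()  # oops modified locally since the last abort
        self.invalid = set()    # oops whose cached copy must not be trusted

    def abort(self, write_set_union):
        # Invalidate local dirty objects plus everything committed by
        # other sessions since this gem last refreshed its view.
        self.invalid |= self.write_set | set(write_set_union)
        self.write_set.clear()
        # Acquiring the new view from the stone is omitted in this sketch.

    def read(self, oop, fetch):
        # A reference to an invalid (or absent) object pulls in a fresh copy.
        if oop in self.invalid or oop not in self.cache:
            self.cache[oop] = fetch(oop)
            self.invalid.discard(oop)
        return self.cache[oop]
```

Until `abort` is called, `read` keeps returning the copy already in the vm, even if another session has since committed a newer version.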
Each persistent object is identified by a unique id called an OOP. An OOP is a 61-bit value. The three extra bits in a 64-bit word are used as tag bits. Tag bits differentiate between regular objects (which need to be physically stored in the database) and special objects. The value of a special object is encoded in the 61 bits of the OOP itself. SmallIntegers, SmallDoubles, and Characters are examples of special objects.….Was that a rabbit????
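The tagging idea can be sketched in a few lines of Python. The tag values below, and the choice to put the tag in the low three bits, are assumptions made up for illustration; they are not GemStone's actual encoding.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1   # assumed: tag lives in the low three bits
VALUE_BITS = 61

# Hypothetical tag assignments, NOT GemStone's real encoding.
TAG_OBJECT = 0b000      # regular object: the 61 bits identify it in the OT
TAG_SMALL_INT = 0b010   # special object: the 61 bits are the value itself

def encode_small_int(n):
    """Pack a signed integer into a tagged 64-bit word."""
    if not -(1 << (VALUE_BITS - 1)) <= n < (1 << (VALUE_BITS - 1)):
        raise OverflowError("does not fit in 61 bits")
    return ((n & ((1 << VALUE_BITS) - 1)) << TAG_BITS) | TAG_SMALL_INT

def is_special(word):
    """True when the value is carried in the word itself."""
    return (word & TAG_MASK) != TAG_OBJECT

def decode_small_int(word):
    assert (word & TAG_MASK) == TAG_SMALL_INT
    raw = word >> TAG_BITS
    # Sign-extend the 61-bit two's-complement value.
    if raw >= 1 << (VALUE_BITS - 1):
        raw -= 1 << VALUE_BITS
    return raw
```

The point of the scheme is that a special object never needs a lookup: checking the tag is enough to know the value is already in hand.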
One or more objects are stored on a data page. Objects that are larger than a page are transparently broken up into page-sized chunks. Because of this chunking it is possible to reference objects in a million element array without having to load the entire million element array into memory.
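Here is a rough Python sketch of why chunking helps. `CHUNK_SIZE`, `ChunkedArray`, and the `load_chunk` callback are hypothetical; the real page size and on-page layout are not described in this post.

```python
CHUNK_SIZE = 2000  # hypothetical number of elements that fit on one page

def locate(index):
    """Return (chunk number, offset within chunk) for a zero-based index."""
    return divmod(index, CHUNK_SIZE)

class ChunkedArray:
    def __init__(self, size, load_chunk):
        self.size = size
        self.load_chunk = load_chunk   # callable: chunk number -> list
        self.loaded = {}               # only the chunks touched so far

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        chunk_no, offset = locate(index)
        if chunk_no not in self.loaded:
            self.loaded[chunk_no] = self.load_chunk(chunk_no)
        return self.loaded[chunk_no][offset]
```

Reading the last element of a million element array in this model loads one chunk, not the other 499.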
Objects are read from and written to extents on disk in units of pages. Not surprisingly, pages are cached in memory in the Shared Page Cache.
The Object Table (OT) is a Btree that maps OOPs to pages. The OT is stored in a set of pages, cached in the SPC and written to an extent just like the data pages.
Acquire new view
The gem contacts the stone and gets a new view of the database. A view is a reference to the latest OT along with some other bookkeeping information.
Transaction Body
As the vm executes code following an abort, it starts keeping track of all of the objects that are modified during the transaction in the writeSet.
Object references are stored in the body of an object as an OOP, so when an instance variable is accessed in a persistent object and the object is not in the vm or the object has been invalidated, the OT is consulted, and the SPC is checked to see if the page containing the object is already present. If the page isn’t in the SPC, then the page is loaded from disk. Finally, the object is copied from the page into the vm.….A white rabbit????
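The lookup chain just described (OT, then SPC, then disk, then a copy into the vm) can be modelled roughly as follows. This is a sketch under simplifying assumptions: `ToyStore` is a hypothetical name, plain dicts stand in for the Btree, the cache, and the extents, and objects are small dicts.

```python
class ToyStore:
    def __init__(self, disk_pages, object_table):
        self.disk = disk_pages     # page id -> {oop: state}; stands in for extents
        self.ot = object_table     # oop -> page id (a dict here, not a Btree)
        self.spc = {}              # shared page cache: page id -> page
        self.disk_reads = 0

    def page_for(self, oop):
        page_id = self.ot[oop]                      # 1. consult the OT
        if page_id not in self.spc:                 # 2. check the SPC
            self.spc[page_id] = self.disk[page_id]  # 3. load the page from disk
            self.disk_reads += 1
        return self.spc[page_id]

    def fault_in(self, oop, vm_memory):
        # 4. copy the object from the page into the vm
        vm_memory[oop] = dict(self.page_for(oop)[oop])
        return vm_memory[oop]
```

Two objects that live on the same page cost only one disk read in this model, and changes made to the vm copy do not touch the cached page.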
Commit
A commit performs the following steps in the gem:
- Flush dirty objects
- Write transaction log entries
- Acquire commit token
- Check for conflicts
- Finalize commit
Flush dirty objects
During this step all of the objects in the writeSet and all of the newly created objects that are reachable from persistent, dirty objects are copied from the vm into new pages in the SPC.
The gem’s copy of the OT is then updated with the new OOP to page mapping. The OT data structure is designed to do a copy on write, so only the portion of the Btree that is changed needs to be written to new OT pages. In practice large portions of the OT are shared amongst multiple views.….Nope, it’s a Dormouse. I’m sure of it.
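Copy on write is easier to see in code. The sketch below uses a plain binary search tree with path copying rather than a real Btree, so it illustrates the sharing idea only, not the actual OT layout; `Node`, `insert`, and `lookup` are hypothetical names.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    oop: int
    page: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node, oop, page):
    """Return a new root; only the nodes on the path to `oop` are copied."""
    if node is None:
        return Node(oop, page)
    if oop < node.oop:
        return node._replace(left=insert(node.left, oop, page))
    if oop > node.oop:
        return node._replace(right=insert(node.right, oop, page))
    return node._replace(page=page)

def lookup(node, oop):
    while node is not None:
        if oop == node.oop:
            return node.page
        node = node.left if oop < node.oop else node.right
    return None
```

After an update there are two roots: the old one still maps every OOP to its old page, so gems holding the older view are undisturbed, and the untouched subtrees are shared between both.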
The new pages in the SPC, containing the latest state of the modified objects, are not written directly to disk as part of the transaction. Doing so would take too much time, as disk writes are notoriously slow. A separate process (AIO pageServer) asynchronously writes the ‘dirty’ data pages to disk.
At periodic intervals a concerted effort is made to ensure that all ‘dirty’ pages written before a certain point in time are flushed to disk. This is called a checkpoint.
Write transaction log entries
In order to ensure Durability we do have to write something to disk as part of the transaction. It turns out that we can write a minimum amount of information about the changes to objects much faster than we can write the entire object to disk.
During this step, tranlog records are written by the stone for all of the changed and new objects. Asynchronous i/o is used (when available on the host os) to write transaction logs, so that commit processing can go on while the tranlog records make their way to disk. In a performance sensitive installation, the transaction logs are located on a raw partition or on optimized disk arrays for the fastest i/o possible.….Oh Oh, now there’s a wacky, little guy in a top hat.
In the event of a system crash, one can recover the database by replaying all tranlog records written since the last checkpoint.
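Recovery by replay can be sketched like this. The record format, a tuple of (sequence number, OOP, new value), is an assumption made up for the example; the real tranlog format is not described in this post.

```python
def recover(checkpoint_state, tranlog, checkpoint_seq):
    """Rebuild object state by replaying records newer than the checkpoint.

    tranlog is an iterable of hypothetical (seq, oop, new_value) records,
    ordered by seq.
    """
    state = dict(checkpoint_state)
    for seq, oop, new_value in tranlog:
        if seq > checkpoint_seq:
            state[oop] = new_value
    return state
```

For example, with a checkpoint taken at sequence number 2:

```python
log = [(1, 101, "v1"), (2, 102, "v1"), (3, 101, "v2"), (4, 103, "v1")]
on_disk = {101: "v1", 102: "v1"}          # state as of the checkpoint
recovered = recover(on_disk, log, checkpoint_seq=2)
```

Only records 3 and 4 are applied, since everything up to the checkpoint is already on disk.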
Acquire commit token
Up to this point commit processing in multiple gems can occur in parallel, but in this final phase of the commit, only one gem can proceed at a time. The stone manages the queue of gems by handing out a commit token to one gem at a time.
Check for conflicts
When a gem gets the commit token, it begins checking for commit conflicts (i.e., verifying that the transaction is valid). It does this by comparing the gem’s current writeSet with the writeSetUnion (the union of all writeSets from the transactions that occurred since the gem acquired its original view). If any OOP appears in both sets, a conflict has occurred and the commit fails. If the commit fails, the gem gives up the commit token and either aborts or attempts to recover from the commit failure.….Little balls of ….. hedgehogs????
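The conflict check as described here boils down to a set intersection. A minimal Python sketch, with `check_conflicts` as a hypothetical helper and OOPs represented as plain integers:

```python
def check_conflicts(write_set, commits_since_view):
    """Return the OOPs that appear in both our writeSet and the writeSetUnion.

    commits_since_view is a list of writeSets, one per transaction that
    committed after this gem acquired its view. An empty result means
    the commit may proceed.
    """
    write_set_union = set().union(*commits_since_view)
    return set(write_set) & write_set_union
```

So a gem that modified objects 1 and 2, while other gems committed changes to {3} and {2, 4}, conflicts on object 2 and its commit fails.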
Finalize commit
If there are no conflicts, then the gem returns the commit token to the stone along with a copy of its writeSet (for writeSetUnion processing in other gems) and a reference to the gem’s updated OT. The stone is then free to pass along the commit token, but the gem must still wait until the stone informs it that the asynchronous transaction log i/o has completed.
….a toothy smile appears floating in mid-air and the Cheshire Cat slowly materializes, hands you a precious stone and fades away completely….
Oh well, maybe I went farther down the rabbit hole than I originally planned, but I did warn you!
11 comments
October 5, 2007 at 2:34 pm
Ramon Leon
Not too deep at all, great article, keep them coming. Eventually those of us lurking will get brave, find some hardware, and step through the Gemstone looking glass and try a world without Squeak, well, a production deployment without Squeak anyway (I’d never give it up for development).
October 5, 2007 at 3:03 pm
Dale Henrichs
Thanks Ramon. I’ve got a couple of follow on articles planned and this one was intended to lay a foundation. Production deployment is right where we think we add the most value. We’re still ironing out a thing or two, so it doesn’t hurt to wait. Meanwhile, we’ll keep the tea pot warm and a place cleared for you at the table..
October 7, 2007 at 8:35 pm
Ramon Leon
Ha, I appreciate the offer, I wish I could have made it up there but it was a bit out of the way. I just got home an hour ago, back in Phoenix now. Oregon was great though, really nice up there, I might find myself there when I get tired of the desert.
Anyway, looking forward to some more articles, as well as learning about Gemstone, I wouldn’t mind seeing some speed comparisons between Seaside in Gemstone and Squeak if you ever get around to it. Your VM has to be way better tuned than Squeak, even if I didn’t want the persistence, I’d want a faster VM for Seaside, a selling point I wouldn’t overlook.
October 8, 2007 at 9:54 am
Dale Henrichs
I’ve been doing a fair amount of performance work in the last week or so with an eye to a blog post … including Squeak numbers sounds like a good idea. Thanks, man!
October 8, 2007 at 11:20 am
Ramon Leon
No prob, I’m truly curious. I just notice how much snappier GemSource feels than SqueakSource.
October 8, 2007 at 12:26 pm
Philippe Marschall
stoned? Is that another of your legendary abbreviations Dale? ;)
@Ramon
The Speed of SqueakSource greatly depends on the size of the installation. The slowest I know is:
http://source.impara.de/
which lacks a lot of the performance tuning we did for the main installation. Compare that to a really small installation:
http://mc.bioskop.fr/
We now have almost 1000 projects with more than 25000 versions. Having that said, the Gemstone VM certainly is way more mature and tuned.
October 8, 2007 at 12:55 pm
Dale Henrichs
@Philippe – stoned is the legitimate name of the stoned(aemon), so I can’t claim credit for that:)
In the Seaside tests I’ve done, our vm is definitely snappier than Squeak: part of the performance gain is the fact that our vm executes byte codes faster, part of the gain is due to the fact that the in-vm garbage collector has a smaller load of objects to collect (we push Seaside session state to disk), and part of the gain is that we only keep a subset of all of the objects in the ‘image’ in memory at one time and that subset is the ‘most recently used’ objects….
October 13, 2007 at 9:58 pm
Ramon Leon
That’s another thing that will be an awesome feature you shouldn’t overlook, with Gemstone sessions going to disk, you no longer need to maintain session affinity in the web farm. Assuming one scaled up to say several servers all running in a Gem cluster (licensed of course), any server would be able to serve the request (unlike with Squeak). Needing session affinity is a big complaint for many Seaside newcomers because they think it limits scalability (Rails tends to hammer statelessness into them).
The truth is it just complicates deployment a bit. At the moment, when I roll out new code, I just copy out a new image and restart my 20 instance farm blowing away all current sessions, it’s the simplest thing that works for now. Once my app starts dealing with money and bookings, I’m going to have to revisit this a bit and look at having the running instances live upgrade themselves from Monticello to keep from losing session state.
October 15, 2007 at 2:02 pm
Dale Henrichs
@Ramon. Good observation on persistent session state.
Philippe has published an Eternity package (http://seaside.gemstone.com/ss/Eternity.html), which removes the session aging from Seaside2.8 (for both sessions and continuations), making it possible to keep session state around as long as you have enough disk space – bookmarking those ‘RESTless’ URLs would be a snap! Of course this illustrates the additional property of persistent session state in GemStone, that the expiration policy for sessions/continuations, does not have to take memory availability into account.
For deploying code updates to GemStone, the new code is basically available as soon as the session that loaded the code commits, so there is no need to bounce servers when deploying an update.
March 17, 2008 at 2:24 pm
GemStone 101: Transaction Conflicts « (gem)Stone Soup
[…] article in the GemStone 101 series. If you haven’t already done so, I recommend that you read GemStone 101: Transactions and Unlimited GemStone VMs in every Garage? ….and a Stone in every Pot before reading this post. […]
June 2, 2008 at 8:58 am
Scaling Seaside with GemStone/S « (gem)Stone Soup
[…] graphs that appeared to be related to file system buffer flushing. If you have read the post on transactions, then you know that we write tranlog records on every commit (page request), so disk i/o can be a […]