udanax_green_internals

This is part of The Pile, a partial archive of some open source mailing lists and newsgroups.



Subject: more Green information
From: "David G. Durand" <david@dynamicdiagrams.com>
Date: Thu, 19 Oct 2000 23:37:05 -0400

I've been talking with Roger about the internals of Green, and 
thought that the information might be of interest.

Herewith, a few messages in a discussion of details.

Once I feel I've got these straight I'll try to summarize, but that 
takes time that I will have to find.


At 9:21 AM -0400 10/18/00, David G. Durand wrote:
>
>At 6:12 AM -0500 10/18/00, Jeff Rush wrote:
>>Tonight I was decoding Abraham and trying to find the CurrentPacker
>>class, with no luck.
>>I was seeking the low-level disk I/O, trying to build up a layered
>>architecture diagram
>>like the traditional 7-level ISO/OSI diagram.  What's on top and what's
>>on bottom,
>>generally speaking re software modules (*NOT* datastructures as I know
>>Ents have no top/bottom
>>per se).  CurrentPacker is not included in the coordspace.st and top.st
>>files I received.
>>I'll look for it in what you upload tonight.
>>
>
>CurrentPacker and a number of other undefined globals were Fluid 
>variables. Fluids (in case you're not an old LISPer) are dynamically 
>scoped globals. I'm not sure why a special mechanism like this was 
>needed in Smalltalk, where globals _are_ all fluid, but I have a 
>pretty good idea for two reasosn:
>
>1. They are part of the garbage collector's start-set for a collection sweep.
>
>2. And of course, these variables were required to be persistent, 
>and so the variables needed to have an explicit storage model, and 
>there needed to be overridable methods to detect each access and 
>update event.
>
>Each Fluid also had an Emulsion, which was a persistent storage Area 
>in which a set of FLuids could be stored. There was a global 
>variable (CurrentEmulsion/GlobalEmulsion/CurrentGlobalEmulsion? I 
>forget) that stored the default Emulsion for new globals.
>
>I wasn't able to figure out how to correctly initialize these 
>things, I think because several global methods for Object and Class 
>are missing in the changesets we have. One thing that I would love 
>to see is a complete FileOut of the whole Xanadu Image, as I could 
>then use Smalltalk-based tools to select out more of those methods, 
>and I could also examine the way the globals were actually set up.
>
>At this point, my real issues around the Ent have to do with:
>
>The details of the canopy (where things seemed to change several 
>times during the course of the project)
>
>The details of balancing, and to what extent balancing the O- and 
>H-trees is independent.
>
>Whether it's possible to have a purely binary H-tree implementation.
>
>How the generalizations to arbitrary addressing schemes worked (It 
>almost seems that they treated some of them as ordered dictionaries 
>based on Hash values for objects).
>
>I also would be very happy to understand the overview of how the 
>Green implementation works. Jeff's work looks rather nice so far 
>(for C++). In particular, orgls seem to be different in Green and 
>Gold. And I don't understand the purpose of the Spanfilade -- It's a 
>2-d enfilade, mapping points in some pair of coordinate spaces into 
>some other coordinate space(s).

At 10:33 AM -0700 10/18/00, roger gregory wrote:
>
>"David G. Durand" wrote:CurrentPacker and a number of other 
>undefined globals were Fluid
>
>>  variables. Fluids (in case you're not an old LISPer) are dynamically
>  > scoped globals. I'm not sure why a special mechanism like this was
>>  needed in Smalltalk, where globals _are_ all fluid, but I have a
>>  pretty good idea for two reasosn:
>>
>
>As I remember they are needed on the C++ side, remember the 
>{declaration} stuff
>is ignored by smalltalk.
>
>>
>>  1. They are part of the garbage collector's start-set for a 
>>collection sweep.
>>
>>  2. And of course, these variables were required to be persistent,
>>  missing in the changesets we have. One thing that I would love to see
>>  is a complete FileOut of the whole Xanadu Image, as I could then use
>>  Smalltalk-based tools to select out more of those methods, and I
>>  could also examine the way the globals were actually set up.
>  >
>
>Coming in the tarfile,if it ever gets out on this flakey line.
>
>>
>  > At this point, my real issues around the Ent have to do with:
>>
>>  The details of the canopy (where things seemed to change several
>>  times during the course of the project)
>>
>
>They SURE did, seemingly every week.
>
>>
>>  The details of balancing, and to what extent balancing the O- and
>>  H-trees is independent.
>>
>
>The dynamic ballancing was a convienience, for the rabid implementation, and
>it was convienently forgotten by the design/bozos that it broke disk reading
>severely, by meaning that every read had to be rewritten, completely breaking
>the model for write once media.
>
>>
>>  Whether it's possible to have a purely binary H-tree implementation.
>>
>>  How the generalizations to arbitrary addressing schemes worked (It
>>  almost seems that they treated some of them as ordered dictionaries
>>  based on Hash values for objects).
>>
>>  I also would be very happy to understand the overview of how the
>>  Green implementation works. Jeff's work looks rather nice so far (for
>>  C++). In particular, orgls seem to be different in Green and Gold.
>>  And I don't understand the purpose of the Spanfilade -- It's a 2-d
>>  enfilade, mapping points in some pair of coordinate spaces into some
>>  other coordinate space(s).
>>
>
>the spanf map maps ffom spans to the docs they are part of ,or map to.

At 5:09 PM -0400 10/18/00, David G. Durand wrote:
>
>>
>>  > The details of balancing, and to what extent balancing the O- and
>>>  H-trees is independent.
>>>
>>
>>The dynamic ballancing was a convienience, for the rabid implementation, and
>>it was convienently forgotten by the design/bozos that it broke disk reading
>>severely, by meaning that every read had to be rewritten, completely breaking
>>the model for write once media.
>
>I'm not even worried about the splay versus incremental balancing, 
>just whether one can independently rebalance H-trees when a new 
>O-tree is added, as long chains of versions can make for long, 
>linear H-trees, if I'm not mistaken.
>>  > I also would be very happy to understand the overview of how the
>>>  Green implementation works. Jeff's work looks rather nice so far (for
>>>  C++). In particular, orgls seem to be different in Green and Gold.
>>>  And I don't understand the purpose of the Spanfilade -- It's a 2-d
>>>  enfilade, mapping points in some pair of coordinate spaces into some
>>>  other coordinate space(s).
>>>
>>
>>the spanf map maps ffom spans to the docs they are part of ,or map to.
>
>I asked a whole bunch of questions (that I'm leaving in case I'm 
>confused), but is this the basic sketch of the Green Architecture?
>
>Begin sketch:
>
>The Granfilade stores all the text in the system under private 
>addresses. As an efficiency measure, text inserted into the 
>granfilade is inserted adjacent to the text that it's near. All text 
>in the granfilade belongs to an existing document.
>
>Somewhere, perhaps in the granfilade (using a distinguished range of 
>addresses?) there's a global map that maps Xanadu server, account, 
>document, etc. addresses down to a single poomfilade per document. 
>The poomfilade is edited just like any other enfilade, but it 
>doesn't contain characters, but rather contains offsets and lengths 
>of items in the granfilade (stored as granfilade addresses).
>
>The spanfilade maintains a global inverse map to all of the 
>poomfilades, maitaining an association between ranges in the 
>granfilade to specific document ranges in the poomfilades.
>
>When new text is inserted, it's added at an appropriate spot in the 
>granfilade, and a new (offset,length) pair of granfilade tumblers is 
>added to the poomfilade. Simultaneously, the spanfilade is updated 
>to reflect that the new range in the granfilade maps to a specific 
>range in the relevant poomfilade.
>
>When text is transcluded from one edition into another, a new entry 
>is made in the destination poomfilade to indicate that a certain set 
>of granfilade addresses is now included in that poomfilade, and a 
>set of new mappings is added to the spanfilade, reflecting that 
>those ranges of granfilade addresses also map to certain poomfilade 
>addresses in the destination poomfilade.
>
>To backfollow from a document address, you start at its poom, and 
>harvest a set of granfilade ranges, then you pump them into the 
>spanfilade, and harvest a set of document ranges that point into 
>other poomfilades.
>
>Links may just be special records stored in the poomfilades, since 
>the same backfollow approach works for them.
>
>end of David's guess.
>
>Some random thoughts on the above.
>
>The above is actually very similar to the stuff that Steve DeRose 
>and I explained to you at Hypertext '89. In particular, I think we 
>had already independently derived the model-T enfilade, and 
>structure-sharing versioning strategy. We were sticking Markup nodes 
>in there, ala SGML, but that was more of a religious difference than 
>a data structure one.
>
>The problem of fragmenting end-sets is a real one in all of these 
>schemes, as a linear number of edits can create a linear number of 
>items that need to be returned in these intermediate queries, which 
>can make the run-time worse than log in some cases.
>
>Here are the random queries I started before coming up with the design above:
>
>Is this spanfilade map, just a just mapping from a (start, end) pair 
>of global xanadu addresses to a range of granfilade addresses, or 
>does it map to document addresses?
>
>Does the granfilade use tumblers only to allow the creation of new, 
>linearly ordered insertion points between existing granfilade 
>addresses? couldn't the granfilade be a linear, append-only 
>text-space? I guess the granfilade does allow for changes to 
>non-frozen versions to vanish silently as their versions are edited.
>
>[BTW if you're jut generating insertion points you can create a 
>simpler address space based on paths in an infinite binary tree. The 
>resulting numbers turn out to be isomorphic to the rational numbers.]
>
>The POOMfilade seems more obscure to me. Does it map from Document 
>addresses (global Udanax addresses) into the granfilade, or the 
>spanfilade.
>
>Links seem like a natural for the spanfilade, but they seem like 
>they ought to be stored in document addresses, not granfilade 
>addresses.
>
>   -- David

At 10:49 PM -0700 10/18/00, roger gregory wrote:
>
>"David G. Durand" wrote:I'm not even worried about the splay versus 
>incremental balancing,
>
>>just whether one can independently rebalance H-trees when a new
>>O-tree is added, as long chains of versions can make for long, linear
>>H-trees, if I'm not mistaken.
>>
>Here I'm not sure, but I'd be surprised if things had to get linear,
>why couldn't it be reballanced?  Long chans of versions sould result in a tree
>with the versions at the bottom of the tree.
>
>>
>>>   > I also would be very happy to understand the overview of how the
>>>>   Green implementation works. Jeff's work looks rather nice so far (for
>>>>   C++). In particular, orgls seem to be different in Green and Gold.
>>
>Right since Gold is MANY generations of design from Green, there are few
>direct identities.  Mostly anything that would limit performance to 
>square root
>is avoided in Gold (note that poom and span have square root worst case).
>
>>
>>>>   And I don't understand the purpose of the Spanfilade -- It's a 2-d
>>>>   enfilade, mapping points in some pair of coordinate spaces into some
>>>>   other coordinate space(s).
>>>>
>>>
>>>the spanf map maps ffom spans to the docs they are part of ,or map to.
>>
>>I asked a whole bunch of questions (that I'm leaving in case I'm
>>confused), but is this the basic sketch of the Green Architecture?
>>
>>Begin sketch:
>>
>>The Granfilade stores all the text in the system under private
>>addresses. As an efficiency measure, text inserted into the
>>granfilade is inserted adjacent to the text that it's near. All text
>>in the granfilade belongs to an existing document.
>>
>>Somewhere, perhaps in the granfilade (using a distinguished range of
>>addresses?) there's a global map that maps Xanadu server, account,
>>document, etc. addresses down to a single poomfilade per document.
>>
>The addresses map directly to the granfilade via the addresses.
>
>>
>>The poomfilade is edited just like any other enfilade, but it doesn't
>>contain characters, but rather contains offsets and lengths of items
>>in the granfilade (stored as granfilade addresses).
>>
>>The spanfilade maintains a global inverse map to all of the
>>poomfilades, maitaining an association between ranges in the
>>granfilade to specific document ranges in the poomfilades.
>>
>>When new text is inserted, it's added at an appropriate spot in the
>>granfilade, and a new (offset,length) pair of granfilade tumblers is
>>added to the poomfilade. Simultaneously, the spanfilade is updated to
>>reflect that the new range in the granfilade maps to a specific range
>>in the relevant poomfilade.
>>
>>When text is transcluded from one edition into another, a new entry
>>is made in the destination poomfilade to indicate that a certain set
>>of granfilade addresses is now included in that poomfilade, and a set
>>of new mappings is added to the spanfilade, reflecting that those
>>ranges of granfilade addresses also map to certain poomfilade
>>addresses in the destination poomfilade.
>>
>>To backfollow from a document address, you start at its poom, and
>>harvest a set of granfilade ranges, then you pump them into the
>>spanfilade, and harvest a set of document ranges that point into
>>other poomfilades.
>>
>>Links may just be special records stored in the poomfilades, since
>>the same backfollow approach works for them.
>>
>>end of David's guess.
>>
>>
>Right, note that there isn't currently enough info stored to remove a document
>or edit a link, as the poom entries can't be found.  This should be 
>simple to fix.
>
>>
>>Some random thoughts on the above.
>>
>>The above is actually very similar to the stuff that Steve DeRose and
>>I explained to you at Hypertext '89. In particular, I think we had
>>already independently derived the model-T enfilade, and
>>structure-sharing versioning strategy. We were sticking Markup nodes
>>in there, ala SGML, but that was more of a religious difference than
>>a data structure one.
>>
>>
>Really, I'm surprised, I would have thought I would remember so close a
>similarity.  We consider the i<->v mapping as fundamental, did you have
>anything like that? Is there any paper on that, so I can compare, it might
>be useful to see what the differences were, it could give some insight.
>
>>
>>The problem of fragmenting end-sets is a real one in all of these
>>schemes, as a linear number of edits can create a linear number of
>>items that need to be returned in these intermediate queries, which
>>can make the run-time worse than log in some cases.
>>
>>
>Well yes but,
>recall that we mostly will want to retireve small pieces of the document, some
>small multiple of display sizes, and recall that much (often all) of 
>the structure
>and text will just get cached in ram, the problem is an in CPU problem and
>not a big "read it in form disk and take forever" kind of thing.
>
>>
>>Here are the random queries I started before coming up with the design above:
>>
>>Is this spanfilade map, just a just mapping from a (start, end) pair
>>of global xanadu addresses to a range of granfilade addresses, or
>>does it map to document addresses?
>>
>>
>grandfilade addresses == i (iinvarient) addresses
>Yes the spanf maps to the i addresses,
>
>>
>>Does the granfilade use tumblers only to allow the creation of new,
>>linearly ordered insertion points between existing granfilade
>>addresses? couldn't the granfilade be a linear, append-only
>>text-space? I guess the granfilade does allow for changes to
>>non-frozen versions to vanish silently as their versions are edited.
>>
>>
>More than that the tumblers allow locality and authorship info to be incoded.
>Note that Gold hasn't yet decided on a particular address encoding!
>
>>
>>[BTW if you're jut generating insertion points you can create a
>>simpler address space based on paths in an infinite binary tree. The
>>resulting numbers turn out to be isomorphic to the rational numbers.]
>>
>>
>note that tumblers (Transfilite nUMBERS) are a LARGER set than 
>rationals.  They
>are isomorphic to the transfinite ordinals.
>
>
>>
>>The POOMfilade seems more obscure to me. Does it map from Document
>>addresses (global Udanax addresses) into the granfilade, or the
>>spanfilade.
>>
>>
>Poom maps  i<->v
>spanf maps i<->i
>so given an i (or granf address) the poom gives a v  (document address)
>given an ispan the spanf gives the i addresses in the documents 
>(including links) that map
>to that ispan.
>
>>
>>Links seem like a natural for the spanfilade, but they seem like they
>>ought to be stored in document addresses, not granfilade addresses.
>>
>>
>Ha, that would mean as the document changed the links would point to 
>different stuff.
>
>For example showing links as caps for easy ascii representation
>
>this is A text with TWO spans linked.
>   now if the addresses were to the v (document addresses) and we 
>edited by adding 6xes and a space
>xxxxx thIs is a tEXT with two spans linked.
>Make more sense now?


At 9:46 AM -0400 10/19/00, David G. Durand wrote:
>To: roger@xanadu.net
>From: "David G. Durand" <david@dynamicdiagrams.com>
>Subject: Re: I need a place to put ....
>Cc:
>Bcc:
>X-Attachments:
>
>>"David G. Durand" wrote:I'm not even worried about the splay versus 
>>incremental balancing,
>>
>>>just whether one can independently rebalance H-trees when a new
>>>O-tree is added, as long chains of versions can make for long, linear
>>>H-trees, if I'm not mistaken.
>>>
>>Here I'm not sure, but I'd be surprised if things had to get linear,
>>why couldn't it be reballanced?  Long chans of versions sould 
>>result in a tree
>>with the versions at the bottom of the tree.
>
>Right. As long as you can rebalance in H. If I had 3 concentrated 
>hours to analyze this, I could probably write down the invariants, 
>and see if they can be maintained. I've been assuming all along that 
>I can rotate in H (only) or in O (only), and that that's enough.
>
>
>>Right since Gold is MANY generations of design from Green, there are few
>>direct identities.  Mostly anything that would limit performance to 
>>square root
>>is avoided in Gold (note that poom and span have square root worst case).
>
>You can do span (range) queries in log N + S time (log of number of 
>spans, plus the size of the result set). Of course that's a 
>different algorithm.
>
>
>>>Somewhere, perhaps in the granfilade (using a distinguished range of
>>>addresses?) there's a global map that maps Xanadu server, account,
>>>document, etc. addresses down to a single poomfilade per document.
>>>
>>The addresses map directly to the granfilade via the addresses.
>
>So some granfilade addresses are externally visible ones (V-space 
>addresses that MIRV out via the Poomfilades), and some are internal 
>I-space addresses.
>
>>>
>>>The poomfilade is edited just like any other enfilade, but it doesn't
>>>contain characters, but rather contains offsets and lengths of items
>>>in the granfilade (stored as granfilade addresses).
>>>
>>>The spanfilade maintains a global inverse map to all of the
>>>poomfilades, maitaining an association between ranges in the
>>>granfilade to specific document ranges in the poomfilades.
>>>
>>>When new text is inserted, it's added at an appropriate spot in the
>>>granfilade, and a new (offset,length) pair of granfilade tumblers is
>>>added to the poomfilade. Simultaneously, the spanfilade is updated to
>>>reflect that the new range in the granfilade maps to a specific range
>>>in the relevant poomfilade.
>>>
>>>When text is transcluded from one edition into another, a new entry
>>>is made in the destination poomfilade to indicate that a certain set
>>>of granfilade addresses is now included in that poomfilade, and a set
>>>of new mappings is added to the spanfilade, reflecting that those
>>>ranges of granfilade addresses also map to certain poomfilade
>>>addresses in the destination poomfilade.
>>>
>>>To backfollow from a document address, you start at its poom, and
>>>harvest a set of granfilade ranges, then you pump them into the
>>>spanfilade, and harvest a set of document ranges that point into
>>>other poomfilades.
>>>
>>>Links may just be special records stored in the poomfilades, since
>>>the same backfollow approach works for them.
>>>
>>>end of David's guess.
>>>
>>>
>>Right, note that there isn't currently enough info stored to remove 
>>a document
>>or edit a link, as the poom entries can't be found.  This should be 
>>simple to fix.
>>
>>>
>>>Some random thoughts on the above.
>>>
>>>The above is actually very similar to the stuff that Steve DeRose and
>>>I explained to you at Hypertext '89. In particular, I think we had
>>>already independently derived the model-T enfilade, and
>>>structure-sharing versioning strategy. We were sticking Markup nodes
>>>in there, ala SGML, but that was more of a religious difference than
>>>a data structure one.
>>>
>>>
>>Really, I'm surprised, I would have thought I would remember so close a
>>similarity.  We consider the i<->v mapping as fundamental, did you have
>>anything like that? Is there any paper on that, so I can compare, it might
>>be useful to see what the differences were, it could give some insight.
>
>Nah, we never wrote it down intelligibly. We never looked at 
>addressing as central in the same way, and concentrated on structure 
>sharing. We had a more basic structure: a family of trees, sharing 
>many nodes in common, plus an external structure for tracking link 
>endpoints to node-IDs. That's more basic because you don't have an 
>easy way to find all sharers of a datum. But we did have the idea of 
>accumulating widths down the tree, and external indices from spans 
>to spans.
>
>As I said, similar, not the same.
>
>I came to look at addressing as much more critical later on, and 
>it's the key to my dissertation work, but by then I was more 
>interested in arbitrary combinations of changes and undo, coupled 
>with the ability to make edit-stable pointers into an alternative 
>version space.
>
>I devoted a lot less time to algorithms, because it's less important now.
>
>>>The problem of fragmenting end-sets is a real one in all of these
>>>schemes, as a linear number of edits can create a linear number of
>>>items that need to be returned in these intermediate queries, which
>>>can make the run-time worse than log in some cases.
>>>
>>>
>>Well yes but,
>>recall that we mostly will want to retireve small pieces of the 
>>document, some
>>small multiple of display sizes, and recall that much (often all) 
>>of the structure
>>and text will just get cached in ram, the problem is an in CPU problem and
>>not a big "read it in form disk and take forever" kind of thing.
>
>CPU problems are not problems at all anymore. At least not for most 
>data structures. There's a lot of cache-hit questions though around 
>disk traffic.
>
>>>Here are the random queries I started before coming up with the 
>>>design above:
>>>Does the granfilade use tumblers only to allow the creation of new,
>>>linearly ordered insertion points between existing granfilade
>>>addresses? couldn't the granfilade be a linear, append-only
>>>text-space? I guess the granfilade does allow for changes to
>>>non-frozen versions to vanish silently as their versions are edited.
>>>
>>>
>>More than that the tumblers allow locality and authorship info to be incoded.
>>Note that Gold hasn't yet decided on a particular address encoding!
>
>But does the granfilade include server/node/account/info in 
>i-addresses? I guess that makes sense, as the bytes are perpetually 
>"owned" by the home document.
>
>I tend to think of owning the address space that you publish as 
>opposed to the bytes at the end, but that reflects my own mental 
>drift, not your model. That makes things clearer.
>
>>>
>>>[BTW if you're jut generating insertion points you can create a
>>>simpler address space based on paths in an infinite binary tree. The
>>>resulting numbers turn out to be isomorphic to the rational numbers.]
>>>
>>>
>>note that tumblers (Transfilite nUMBERS) are a LARGER set than 
>>rationals.  They
>>are isomorphic to the transfinite ordinals.
>
>Sure, you have a lot more split points available. That's why 
>accounts/nodes etc. can split more easily. Just for a text stream, 
>though, the rationals are sufficient.
>
>>>
>>>The POOMfilade seems more obscure to me. Does it map from Document
>>>addresses (global Udanax addresses) into the granfilade, or the
>>>spanfilade.
>>>
>>>
>>Poom maps  i<->v
>>spanf maps i<->i
>>so given an i (or granf address) the poom gives a v  (document address)
>>given an ispan the spanf gives the i addresses in the documents 
>>(including links) that map
>>to that ispan.
>
>I guess I still have the poom backwards. How do I go from a 
>v-address (e.g. from the front end) to an i-address?
>
>>>Links seem like a natural for the spanfilade, but they seem like they
>>>ought to be stored in document addresses, not granfilade addresses.
>>>
>>>
>>Ha, that would mean as the document changed the links would point 
>>to different stuff.
>
>yeah, this is a perspective difference. I've spent a decade and a 
>half dealing with editing problems, and from that perspective you're 
>primarily focusing on a small number of "current states" and 
>unwinding the history to determine the original contexts, and do 
>document comparisons.
>
>>For example showing links as caps for easy ascii representation
>>
>>this is A text with TWO spans linked.
>>   now if the addresses were to the v (document addresses) and we 
>>edited by adding 6xes and a space
>>xxxxx thIs is a tEXT with two spans linked.
>>Make more sense now?
>
>Oh yeah. I actually figured out this mistake before writing my 
>design note/query. How do you record the originating document 
>context where a link was made? I guess by the link's home address. 
>When you make a new rev of a document you also create new links 
>(though structure is shared by transclusion). So I guess that tells 
>you what poomfilade to use to turn the link spans back into v-spans 
>for a particular document.
>
>Are you saying that there's no way to find the POOMs pointing to a 
>given i-span?
>
>PS. Can I post this thread to the udanax list?
>
>   -- David

===

Subject: Last item in Green thread
From: "David G. Durand" <david@dynamicdiagrams.com>
Date: Fri, 20 Oct 2000 08:47:32 -0400

At 11:33 AM -0700 10/19/00, roger gregory wrote:
>
>
>"David G. Durand" wrote:
>
>>
>>  >Right since Gold is MANY generations of design from Green, there are few
>>  >direct identities.  Mostly anything that would limit performance to
>>  >square root
>>  >is avoided in Gold (note that poom and span have square root worst case).
>>
>>  You can do span (range) queries in log N + S time (log of number of
>>  spans, plus the size of the result set). Of course that's a different
>>  algorithm.
>>
>
>Right, we might wnat to replace the spanf with some other algorithm, when
>we make it allow for editibility of links, and visibility protections for
>protected documents.
>
>>
>  > >>Somewhere, perhaps in the granfilade (using a distinguished range of
>>  >>addresses?) there's a global map that maps Xanadu server, account,
>>  >>document, etc. addresses down to a single poomfilade per document.
>>  >>
>>  >The addresses map directly to the granfilade via the addresses.
>>
>>  So some granfilade addresses are externally visible ones (V-space
>>  addresses that MIRV out via the Poomfilades), and some are internal
>>  I-space addresses.
>>
>
>Well they are all I-addresses, but the document addresses are visiable.
>
>>
>  > I came to look at addressing as much more critical later on, and it's
>  > the key to my dissertation work, but by then I was more interested in
>>  arbitrary combinations of changes and undo, coupled with the ability
>>  to make edit-stable pointers into an alternative version space.
>>
>>  I devoted a lot less time to algorithms, because it's less important now.
>>
>
>Nope, remember user expectations are higher and the universe is larger.
>We knew all of that in advance.  Algorithms, especially those needed for
>versioning and the related ability to not ever have to lock documents,
>are needed more now than ever before.  In fact our design was too early
>to generate user interest.
>
>>
>>  >>The problem of fragmenting end-sets is a real one in all of these
>>  >>schemes, as a linear number of edits can create a linear number of
>>  >>items that need to be returned in these intermediate queries, which
>>  >>can make the run-time worse than log in some cases.
>>  >>
>>  >>
>>  >Well yes but,
>>  >recall that we mostly will want to retireve small pieces of the 
>>document, some
>>  >small multiple of display sizes, and recall that much (often all) of
>>  >the structure
>>  >and text will just get cached in ram, the problem is an in CPU problem and
>>  >not a big "read it in form disk and take forever" kind of thing.
>>
>>  CPU problems are not problems at all anymore. At least not for most
>>  data structures. There's a lot of cache-hit questions though around
>>  disk traffic.
>>
>
>My point mostly, but recall that we don't need all the document at once
>and most documents are small  <20Mbytes.   The encyclopedia britanica
>12ed seems to be abut 400M, text only!
>
>  >
>  > >>Here are the random queries I started before coming up with the 
>design above:
>  > >>Does the granfilade use tumblers only to allow the creation of new,
>>  >>linearly ordered insertion points between existing granfilade
>>  >>addresses? couldn't the granfilade be a linear, append-only
>>  >>text-space? I guess the granfilade does allow for changes to
>>  >>non-frozen versions to vanish silently as their versions are edited.
>>  >>
>
>Note that the granfilade is used as linear within a document for the 
>text append.
>The tumblers are the the same as the conventional military numbering system
>which has time tested uses that match our usage.
>
>>
>>  >>
>>  >More than that the tumblers allow locality and authorship info to 
>>be incoded.
>>  >Note that Gold hasn't yet decided on a particular address encoding!
>>
>>  But does the granfilade include server/node/account/info in
>>  i-addresses? I guess that makes sense, as the bytes are perpetually
>>  "owned" by the home document.
>>
>
>The addresses are encoded, and perhaps that is a mistake, though it 
>is a small one.
>Mostly the addresses incode that info to give some locality as 
>documents get moved around
>and cached through the net.
>
>
>>
>>  I tend to think of owning the address space that you publish as
>>  opposed to the bytes at the end, but that reflects my own mental
>>  drift, not your model. That makes things clearer.
>>
>
>I don't get this, you don't own your text?  How so?
>
>>
>>  >>
>>  >>The POOMfilade seems more obscure to me. Does it map from Document
>>  >>addresses (global Udanax addresses) into the granfilade, or the
>>  >>spanfilade.
>>  >>
>>  >>
>>  >Poom maps  i<->v
>>  >spanf maps i<->i
>>  >so given an i (or granf address) the poom gives a v  (document address)
>>  >given an ispan the spanf gives the i addresses in the documents
>>  >(including links) that map
>>  >to that ispan.
>>
>>  I guess I still have the poom backwards. How do I go from a v-address
>>  (e.g. from the front end) to an i-address?
>>
>
>Simple the matrix maps both ways.  The code does also.
>
>>
>>  >>Links seem like a natural for the spanfilade, but they seem like they
>>  >>ought to be stored in document addresses, not granfilade addresses.
>>  >>
>>  >>
>>  >Ha, that would mean as the document changed the links would point to
>>  >different stuff.
>>
>>  yeah, this is a perspective difference. I've spent a decade and a
>>  half dealing with editing problems, and from that perspective you're
>>  primarily focusing on a small number of "current states" and
>>  unwinding the history to determine the original contexts, and do
>>  document comparisons.
>>
>
>Not to us,  we are focused on a set of notes, and the document built from
>them.  Thus the parallel text face stuff.  Note that there may be 
>many documents
>that are notes, that are spun together into many different resulatnt 
>documents,
>each containing pieces from most/all of the original notes documents.  Some
>people habitualy write this way, Tolstoy was famous for it, just as Asimov was
>famous for writing high quality first draft at 120wpm and handing it 
>off to his
>2 fulltime personal editors.
>
>>
>>  >For example showing links as caps for easy ascii representation
>>  >
>>  >this is A text with TWO spans linked.
>>  >   now if the addresses were to the v (document addresses) and we
>>  >edited by adding 6xes and a space
>>  >xxxxx thIs is a tEXT with two spans linked.
>>  >Make more sense now?
>>
>>  Oh yeah. I actually figured out this mistake before writing my design
>>  note/query. How do you record the originating document context where
>>  a link was made? I guess by the link's home address. When you make a
>>  new rev of a document you also create new links (though structure is
>>  shared by transclusion). So I guess that tells you what poomfilade to
>>  use to turn the link spans back into v-spans for a particular
>>  document.
>>
>
>Mostly right.  In the current implementation there is no way to record/find
>the originating document context (whatever that means, thus the problem).
>This is a bug, but a bug caused by pureism.  A bug nonetheless.
>
>A related bug in both Green and Gold is that they don't have representations
>of time, because we couldn't find a way that we could believe that 
>was close to
>correct.  Not time zones but special relativity!
>
>>
>>  Are you saying that there's no way to find the POOMs pointing to a
>>  given i-span?
>>
>
>No the spanf finds the pooms that map to the i-span.
>
>>
>>  PS. Can I post this thread to the udanax list?
>
>By all means,  and once some one has reviewed the mailing archives, we
>can put that up too.  I would like to have a tool to allow us to 
>navigate them,
>and make sure that we generate/retain a unique id for each mail so we can
>refer to them later and still make sense.

===



Subject: My Rough Notes re Udanax-Green
From: Jeff Rush <jrush@taupro.com>
Date: Wed, 25 Oct 2000 11:45:26 -0500

I'm working up some diagrams in Gimp to go with this)

In the Beginning was the Docuverse:

The docuverse can be thought of as a infinite number line
with all the information it contains or ever will contain
strung out along the line.  Since the number line is
infinitely divisible, we can insert new material at any
point, similar to how real numbers work on a number line.
We can also refer to a portion of the information, whether
it exists today or will someday, using a (position, width)
tuple.

Arbitrarily we limit the docuverse to occupying the range
of numbers from 1.<0> to 1.<infinity>, which allows us
the convenience of referencing the entire docuverse with
a span of (position 1, width 1) while not limiting
capacity in any real way.

Tumbling thru the Docuverse:

To represent positions along this docuverse line, we use
'tumblers', which are essentially base infinity real numbers,
where carries to/borrows from other digits don't occur.

Consider that if we used based 10, we could only have ten
positions, with ten positions inside each of those, etc.
So if we had, say, eight file cabinets, each with seven
drawers, each with five dividers, things would work out.

But as we store more information, the day comes with we
need more dividers than ten, or more drawers than ten.
Because of our many links between documents, we really
don't want to renumber everything, so we switch to
base 16, and have 16 cabinets of 16 drawers, etc.  This
lets us continue using standard arithmetic to locate
items -- 2nd cabinet, third drawer, fifth divider.

Eventually we fill that, so let's just raise the base
so we can have any number of anything.  That's base
infinity.  Now the rules of arithmetic get funny, because
how can you mathematically have a carry from a digit
when the digit, being potentially infinite, never overflows.

That's tumblers in a nutshell.

Inside Tip: The Udanax-Green software currently uses base
2 ** 32, storing tumblers as an array of 32-bit integers,
because of current CPU architectures.  But when we go to
64-bit CPUs and 2 ** 64 integers, the scheme transitions
transparently.


Nomenclature of the Enfilade:

An Enfilade is a collection of nodes called Crums, arranged in
tree structure.  The single node at the top of the tree is
the root, called the fullcrum.  Nodes at the bottom are called,
naturally enough, BottomCrums.  If a node is not a BottomCrum,
it is an UpperCrum.  The root node is an UpperCrum.

Crums exist in two states: persistent (on disk) and volatile
(in ram).

Persistence Mechanism:

  Each Crum has a dirty boolean and an age integer.  All Crums are
  linked into a global doubly-linked list.  There is a reaping
  process whereby Crums not accessed in a period of time are
  removed from memory, being written to disk if they have been
  modified.

  Crums on disk are addressed by a (block#, byte#) tuple, called
  a DiskLoafAddr.  Once assigned, Crums retain the same address
  on disk and do not move about.

Nature of their Mapping:

  Enfilades are unidirectional, one-to-one mappings, where a
  given key corresponds to a given datum.  The choice of what
  is a key and what is a datum define three types of Enfilades:

  * Grand (granf)    I -> I

    Maps coordinates in the IStream up to but not into the
    VStream of a specific document.  The VStream is represented
    by either (a) a block of original text or (b) the root
    node of a nested POOM enfilade describing a permutation of
    the VStream of this and/or another document that when
    rearranged, windows into this (position, width) gap in the
    IStream.

    The Grand Enfilade is the starting point for front-ends
    reaching into the docuverse.

    Key:
        (a tumbler indicating a width along the IStream)

    Datum:
        one of:

	(a Crum containing a piece of original text)

	or:

	(a block#/byte# tuple pointing to a POOM enfilade)

    NOTE: By summing the width keys of each Crum as we descend
          thru the UpperCrums of the Grand Enfilade seeking a
          position in the docuverse, we can derive our
          displacement within the docuverse and know when we
          have arrived.

    There is only one Grand Enfilade in the Udanax-Green system.

  * Span (spanf)  V -> V

    Maps coordinates from the VStream to elsewhere in the VStream

    Key:
        (a displacement/width tuple in the VStream)

    Datum:
        (a displacement/width tuple elsewhere in the VStream)

    There is only one Span Enfilade in the Udanax-Green system.

    The Span Enfilade is used to answer the questions:
    
    1) Given a range of characters on my screen, determine
       where/in which other documents those characters also appear.

    2) ???

    The spanf finds the pooms that map to the i-span.

  * Poom (poom) [Permutation Of Order Matrix]   I -> V

    Maps coordinates from the VStream to the IStream

    There are many POOM Enfilades in the Udanax-Green system,
    one for every rearrangement of the document content in use.

    Key:
      (an IStream displacement/width tuple)

    Datum:
      (a VStream displacement/width tuple)

    The POOM says, in effect: given that there is a specific
    place and width of information in the docuverse, provide
    a map of how that information appears, in a linear fashion
    within that window, from the various pieces scattered
    elsewhere in the docuverse.

    NOTE: Since POOMs appear under BottomCrums in the Grand
          Enfilade, you'd think a shorter relative IStream
          would suffice.  But by retaining the context of
          fully-qualified IStream address in the key, a
          single POOM can be linked into an arbitrary number
          of Grand BottomCrums.


Coordinate Spaces:

  Enfilades rely upon two types of addresses:

  * a tumbler address representing an unchanging or invariant
    point in the docuverse, called an IStreamAddr.  The IStream
    is the sequence of original data as stored on disk, which
    never changes ordering once written.

  * a tumbler address representing a relative position within
    a document, whose mapping to data shifts about, called a
    VStreamAddr.

  The IStream

  An IStreamAddr is a fully-qualified tumbler broken into up
  to four fields by zero (.0.) digits.  If there are fewer than
  four fields present, the fields are associated with their
  functions left-to-right.

     <node>.0.<account>.0.<document>.0.<element>

     <node>.0.<account>.0.<document>

     <node>.0.<account>

     <node>

  IStreamAddrs are assigned to information or 'babtized' in a
  hierarchical fashion, via the API operations:
  
  * create node representing our server
  * create account within a (node)
  * create document within a (node, account)
  * create version within a (node, account, document)
  * insert/remove/rearrange text

  IStreamAddrs are never deleted, retired or reused.

  The VStream

  The Variant-Stream or VStream is the permutation of data that
  represents a particular version.  VStreamAddrs refer to
  positions within a specific document.  They are the <element> part
  after the:

      <node>.0.<account>.0.<document>

  where the above portion of a tumbler is called the 'docid'.

  The VStream of a document consists of several dataspaces, (a) any
  of which may be empty and (b) directly correspond to specific types
  of BottomCrums in Udanax-Green.

  1) VStream addresses beginning with a '1.n...' *may* represent
     either original text entered into this specific document, *or*
     text from another document that is *windowed* into this one
     at a specific offset/width.

     If it is original text, it is broken up into small pieces and
     stored within TextCrums, on the bottom layer of the Grand
     Enfilade.

     If it is windowed text, a (block#, byte#) tuple pointing to
     the root of a POOM enfilade is stored in an OrglCrum.

     [ORGL means 'link to original' ?]

  2) VStream addresses beginning with a '2.n...' represent links
     that reside within (have as their home document) this specific
     document.  Links may have multiple end-points all or none of
     which may point into this specific document.


Glossary:

The home document is the document within which the data was
originally stored.

A vaddress is an intra-document reference, using the
provided document ID.  A vspan is a tumbler range, within a
single document.  A vspan to cover an entire document,
exclusive of links, is displacement 1.1, length 1.  A span
is a tumbler range, using global or fully-specified tumbler
fields.

There are three transforms: I-to-V and V-to-I and V-to-V


(Clarify how sequential-versions and parallel-versions work.)



The default node is     "1.1".          (node 1)
The default account is  "1.1.0.1".      (account 1)
The default document is "1.1.0.1.0.1".  (document 1)


Explanations of Design Choices:

* Why do IStreamAddr tumblers always start with a 1?

  This "one" refers to the all-encompassing docuverse and
  derives from the fact that all server nodes are descended
  from it.  This convention allows us to refer to the
  entire docuverse with simply a "one" in the first position of a
  tumbler.

* Why is the server node broken out as a separate field in
  an IStreamAddr tumbler?

  ??? so we can link to all documents on a specific server ??? why?

* Why is the account broken out as a separate field in an
  IStreamAddr tumbler?

  So we can link to all documents owned by a group/subgroup.

* Why is the document number broken out as a separate field
  in an IStreamAddr tumbler?

  This allows us to indicate daughter documents and versions easily. 
  This convention allows us to make a link to all versions
  of a document.  [ACTUALLY ONLY TO OWNER-MADE VERSIONS, BUT
  NOT TO NON-OWNER-MADE VERSIONS]

===

Subject: Udanax mailing list archives now online
From: Andrew Pam <xanni@glasswings.com.au>
Date: Fri, 27 Oct 2000 13:09:05 +1100

The Udanax mailing list archives are now online at:
http://www.xanadu.com.au/mail/

Share and enjoy,
		Andrew
===



the rest of The Pile (a partial mailing list archive)

doom@kzsu.stanford.edu