June 1996

Date: Sun, 02 Jun 1996 00:00:01 +0200
From: Gary Hoffman
Reply-To: ghoffman@UCSD.EDU
Organization: IR/PS, UC San Diego
X-Mailer: Mozilla 3.0b4 (Macintosh; I; 68K)
MIME-Version: 1.0
To: genweb@UCSD.EDU
Subject: GenWeb is a Trademark
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

To GenWeb host sites and developers:

Since publishing over two years ago the proposal that led to the development of the GenWeb network, I have been promoting the concept of a coordinated, standardized method of linking genealogy information on the World Wide Web. I am pleased that others have taken this idea and have added their improvements. This is one of the ways the Internet itself has developed and this mailing list is an aid in promoting advancement in the state of the art of the GenWeb concept.

Early on, I realized that some central body should maintain some (loose) reins on the GenWeb network in order to enforce any standards that develop. For that reason, I and some close associates have organized The GenWeb Foundation as a non-profit organization. As yet, our only asset is the trademark on the term GenWeb and the domain name 'genweb.org'.

It is planned that only those host sites who conform to an agreed-upon set of certification standards would be permitted to use the mark 'GenWeb' in identifying their site. To protect our rights to the term, we have put the standard label TM on the mark on its first appearance on a WWW page and are seeking official registration of the mark. We are also developing a logo that will have similar protection.

We now ask anyone currently using the term GenWeb to indicate that it is a trademark belonging to The GenWeb Foundation. Genealogy database servers currently using the term 'GenWeb' may continue their use pending the development of certification standards.

This may seem somewhat formalistic and legalistic, but I have just discovered a genweb.com domain and we need to make clear that we do not intend this term to enter the "public domain" or lose its descriptive value. Trademarks are terms used as adjectives that uniquely identify a good or service in commerce. We often refer to "The GenWeb" but it would be better to use it as a modifier: 'the GenWeb network,' or 'the GenWeb distributed genealogy database on the Internet.'

Thanks to all for your help. Remember, we are trying to encourage the development of the art, not stifle it. If you have concerns about this issue, please respond to the list, genweb@ucsd.edu.

Cheers,
Gary
--
***************************************************************************
*Gary B. Hoffman, Computing Services Manager    e-mail: ghoffman@ucsd.edu*
*Graduate School of International Relations and Pacific Studies (IR/PS)*
*University of California, San Diego (UCSD)     voice: (619) 534-1989*
*9500 Gilman Dr., La Jolla, CA 92093-0519 USA   fax: (619) 534-3939*
***************************************************************************

Date: Sun, 02 Jun 1996 07:31:29 -0700
To: genweb@UCSD.EDU
From: Jeff Murphy
Subject: Re: GenWeb is a Trademark

I'm not trying to be argumentative here, but there are some points that any group stepping in to claim to represent "GenWeb" is going to have to address to achieve any legitimacy.

At 12:00 AM 6/2/96 +0200, Gary Hoffman wrote:
>Early on, I realized that some central body should maintain some (loose)
>reins on the GenWeb network in order to enforce any standards that develop.

But Gary, as far as I can tell, no standards have developed. There has been quite a bit of "discussion", but in the six months I've been participating in GENWEB, no standards have emerged. If there are standards, they remain remarkably well hidden.

>For that reason, I and some close associates have organized The GenWeb
>Foundation as a non-profit organization.
>As yet, our only asset is the
>trademark on the term GenWeb and the domain name 'genweb.org'.

This seems a little after the fact. One cannot create an organization in response to someone else using a similar organization name, and then claim retroactive existence. So, when was the trademark created? What use has been made of it to date? When was the organization created?

>It is planned that only those host sites who conform to an agreed-upon set
>of certification standards would be permitted to use the mark 'GenWeb' in
>identifying their site.

Let me cut to the bottom line. In developing the Kentucky Comprehensive Database Project, I have added the GenWeb logo to my county page, and have insisted that it be added to each subsequent page. This was done to indicate our participation in what we understood to be the GenWeb project - which at this time seems to be Gene Stark's index. And maybe John Rigdon's index.

I have been asked if there was any problem with using the GenWeb logo, and told everyone that according to my understanding, it was in the public domain. If that is not true, then we need to remove it from our pages. But please bear in mind that we have grown from one county to 71. There are similar projects going on in Arkansas and (I believe) Rhode Island. This is the first project of its kind, and I expect it will become a standard. Does that mean that we need to copyright kygenweb-l, the name of our mailing list? One would hope not.

>We now ask anyone currently using the term GenWeb to indicate that it is a
>trademark belonging to The GenWeb Foundation. Genealogy database servers
>currently using the term 'GenWeb' may continue their use pending the
>development of certification standards.

See, what you're asking is that we agree that, first, the term GenWeb is a trademark. But the term predates the creation of The GenWeb Foundation. And second, you are asking us to agree to a set of certification standards that do not exist.

This reminds me of two events, both of which I suspect you recall. There was an organization which offered a certificate in data processing - the CDP, CCP, and maybe a couple of other acronyms. They wanted to create the DP equivalent of the CPA. They insisted that they were "the right man for the job" when it came to certifying everyone else, which would have been fine if it were not for the fact that certification has always been fought by DP professionals (with a few exceptions, mostly those in government jobs), and they couldn't make it stick.

The other is the lawsuit against Phil Katz, who created a product called pkarc as an alternative to arc. He was sued because of the similarity in names. So he changed his product name to pkzip, and hardly anyone uses arc these days. Both were unnecessary battles that took care of themselves.

>This may seem somewhat formalistic and legalistic, but I have just
>discovered a genweb.com domain and we need to make clear that we do not
>intend this term to enter the "public domain" or lose its descriptive
>value.

Ah, but their genweb has to do with genetics. And I believe you will find they have been around for some time. When I was first trying to find GenWeb in the various search engines, they kept popping up. Of course, Gene Stark could always create a "GeneWeb" logo, but I suspect we'd be back where we started.

There is no point in creating an adversarial relationship in all of this. We can replace the logo on the KY GenWeb Project pages if we have to, with one of our own creation. But whereas there are no apparent existing standards for the GENWEB group, there *are* standards for the KY GenWeb Project. And I can see no reason for us to retroactively authorize a different group to establish and enforce standards, when those we have are working just fine, and creating a viable project.

Okay, argue me out of it. :-)

Jeff Murphy
735 NW 8th
Redmond, Oregon 97756
h.
(541) 548-4478
Specializing in the genealogy of Muhlenberg Co., Kentucky
Comprehensive KY Genealogy Project: http://www.teleport.com/~jmurphy/
subscribe to PAFHELP-L or KYGENWEB-L at majordomo@teleport.com

Date: Sun, 2 Jun 1996 11:41:57 -0400 (EDT)
To: genweb@UCSD.EDU
From: beaur@CAM.ORG (Denis Beauregard)
Subject: Re: GenWeb is a Trademark

One comment: this is totally useless.

For one, I tried some time ago to search for GENWEB using this keyword and I remember many occurrences of it in NOT GENEALOGICAL pages, where it stands for GENetics !!!

Also, it would protect only in the genealogy field (i.e. you have to define what area is covered by the trademark). GENWEB.COM (visit their site at www.genweb.com) covers GENETICS not GENEALOGY, and I would think they have done so for a while.

So, GENWEB alone can no longer be trademarked in a wide fashion, which limits it to genealogy on the web. But your protection can be set for the genealogy area.

Denis

### Denis Beauregard, genealogiste amateur, Internet: beaur@cam.org
### Page web de genealogie: http://www.cam.org/~beaur/gen/index.html
### Genealogy Web page: http://www.cam.org/~beaur/gen/welcome.html
### Sujets: Quebec, France, Acadie, experts francophones, etc.

To: genweb@UCSD.EDU
cc: "Dr. Brian Leverich"
Reply-to: "Dr. Brian Leverich"
Subject: Re: GenWeb is a Trademark
In-reply-to: Your message of Sun, 02 Jun 1996 07:31:29 PDT. <1.5.4.32.19960602143129.00a5ea50@mail.teleport.com>
Date: Sun, 02 Jun 1996 10:19:53 -0700
From: Brian Leverich

-- Your message was: (from "Jeff Murphy" quoting "Gary Hoffman")

>For that reason, I and some close associates have organized The GenWeb
>Foundation as a non-profit organization. As yet, our only asset is the
>trademark on the term GenWeb and the domain name 'genweb.org'.

This seems a little after the fact. One cannot create an organization in response to someone else using a similar organization name, and then claim retroactive existence. So, when was the trademark created?
What use has been made of it to date? When was the organization created?

>It is planned that only those host sites who conform to an agreed-upon set
>of certification standards would be permitted to use the mark 'GenWeb' in
>identifying their site.

------------------

OK, let me preface these remarks by saying I'm *very* old-school Internet, and feel a total revulsion towards anything that smacks of centralization or nonconsensual standardization. There seems to be a continuing problem with genealogists on The Net just not grokking how you get things done around here.

All that notwithstanding, I think Gary is pretty much on target trademarking GenWeb for the following reason.

The InterNIC, braindead monopoly that it is, has implemented a policy that essentially allows any trademark holder to strip a current domain name holder of a domain name with no costs and only a 1-month delay. There's no effective defense short of a very messy lawsuit. Given current InterNIC policy, anyone who doesn't trademark their domain name is a fool.

I'm not a trademark lawyer and I don't even play one on the Internet, but I think one requirement for having a defensible trademark is to have some controls on who may use it under what circumstances. One would hope that Gary would make those controls sufficiently broad in the case of GenWeb to encourage diversity and experimentation.

BTW, Gary, you know about Tunisian trademarks, don't you? You can get those puppies in 24 hours, and they offer at least some defense while you wait a year for the U.S. government to do its thing ...

Also BTW, it's generally recognized as a moby blunder to not register your domain name in as many TLDs as you can possibly justify under the RFCs. (And yes, I know the RFCs discourage that. Sometimes you gotta do The Wrong Thing ... ) Somebody should be concocting a good story line and getting genweb.net registered pronto before it goes the way of genweb.com ... -B

--
Dr.
Brian Leverich    Co-moderator, soc.genealogy.methods/GENMTD-L
RootsWeb Genealogical Data Cooperative    leverich@rootsweb.com

From: mavrogeorge@genealogysf.com
Date: Sun, 2 Jun 1996 10:30:38 -0700
Subject: GENWEB trademark
To: genweb@UCSD.EDU
X-Mailer: SPRY Mail Version: 04.00.06.21

The concept and the term GENWEB predate Gene Stark's work. Gene's wonderful utility was a response to what GENWEB was proposing. It is unfortunate that Gary has not aggressively policed the use of the term to date but, what the heck, we are genealogists, not trademark specialists.

Gary's request that we add a trademark symbol seems understandable and easy to implement. Of course if he had included the HTML code for "trademark" that would have been icing on the cake.

I also have been frustrated that no standards have emerged. There was lots of discussion at the beginning about the concept and the vision was well explained and promulgated. What has not happened is the emergence of the "mechanic" to get it implemented. How can we work to now get a mechanism in place? If we continue to explore nuances nothing will happen. There have been lots of very good ideas floated.

Gary (and your organizers, whom you should identify, by the way), what can -I- personally do to move GENWEB from the vision to the implementation? What can other readers do? GENWEB seems to be moving towards stagnation.

From: birger@sdata.no (Birger Wathne)
Received: by alme (SMI-8.6) id TAA17594; Sun, 2 Jun 1996 19:53:38 +0200
Date: Sun, 2 Jun 1996 19:53:38 +0200
Message-Id: <199606021753.TAA17594@alme>
To: genweb@UCSD.EDU, mavrogeorge@genealogysf.com
Subject: Re: GENWEB trademark

The logo used by most GenWeb sites seems to be the one I created for my own GenWeb software. The GenWeb organization may do as they choose with this logo. So if you wish to use it as the basis for a new 'official' logo, feel free to do so. It should be somewhat redone to look better on light-colored backgrounds.

Birger

From: birger@sdata.no (Birger Wathne)
Received: by alme (SMI-8.6) id TAA17597; Sun, 2 Jun 1996 19:58:20 +0200
Date: Sun, 2 Jun 1996 19:58:20 +0200
Message-Id: <199606021758.TAA17597@alme>
To: genweb@UCSD.EDU
Subject: My home page is unavailable for some time

I know there are lots of links to my home page around, so I just want to warn you all that my home pages are offline for a while. We are restructuring our internal networks, and I have to wait until we get a new external web server before I can get it up and running again.

I have developed a new version of my software. Still not quite what I want, but closer.... It is now purely based on lifeLines. No ged2html any more. It is also far easier to install and get running.

I'll mail the list again when it's available again.

Birger

From: birger@sdata.no (Birger Wathne)
Received: by alme (SMI-8.6) id TAA17600; Sun, 2 Jun 1996 19:59:47 +0200
Date: Sun, 2 Jun 1996 19:59:47 +0200
Message-Id: <199606021759.TAA17600@alme>
To: genweb@UCSD.EDU
Subject: Forgot the URL of my temporarily closed pages

The URL for my temporarily closed pages is:
//www.vest.sdata.no/~birger/GenWeb/

Birger

Date: Mon, 3 Jun 96 03:09:40 +0200
From: Anders Andersson
Message-Id: <9606030109.AA28930@Mizar.DoCS.UU.SE>
To: genweb@UCSD.EDU, mavrogeorge@genealogysf.com
Subject: Re: GENWEB trademark

[GenWeb project groups: Legal and Ethical Issues, User Interfaces]

mavrogeorge@genealogysf.com writes:
>Gary's request that we add a trademark symbol seems
>understandable and easy to implement. Of course if he had
>included the HTML code for "trademark" that would have been icing
>on the cake.

Are we talking about a Registered Trademark? That would be simply GenWeb&reg; to get the word "GenWeb" followed by an "R" in a circle. However, that assumes that Gary has indeed registered the name. For the more common claimed trademark, I suggest GenWeb<SUP>TM</SUP>.
Since superscripts and subscripts are not yet part of official HTML (currently 2.0), you can't rely on browsers supporting them, but at least NCSA Mosaic and Netscape do. Browsers not supporting the SUP tag will render it as "GenWebTM".

--
Anders Andersson, Dept. of Computer Systems, Uppsala University
Paper Mail: Box 325, S-751 05 UPPSALA, Sweden
Phone: +46 18 183170    EMail: andersa@DoCS.UU.SE

From list-relay@UCSD.EDU Sun Jun 2 20:14:12 1996
Received: from UCSD.EDU (mailbox2.ucsd.edu [132.239.1.54]) by fuji.ucsd.edu (8.6.9/8.6.9) with ESMTP id UAA09255; Sun, 2 Jun 1996 20:14:11 -0700
Received: from emout14.mail.aol.com (emout14.mx.aol.com [198.81.11.40]) by UCSD.EDU (8.7.5/8.6.9) with SMTP id UAA18985; Sun, 2 Jun 1996 20:08:44 -0700 (PDT)
From: JohnR238@aol.com
Received: by emout14.mail.aol.com (8.6.12/8.6.12) id XAA21547 for genweb@ucsd.edu; Sun, 2 Jun 1996 23:08:43 -0400
Date: Sun, 2 Jun 1996 23:08:43 -0400
Message-ID: <960602230842_209124260@emout14.mail.aol.com>
To: genweb@UCSD.EDU
Subject: Re: GenWeb is a Trademark

Let me throw my two cents worth in here.

While I have lurked (and sometimes participated) on the GenWeb list for well over a year now, I am not aware that any "standard" has emerged, with the possible exception of Gene Stark's effort which has emerged, not because we as a group set any standards and then decided to have our sites conform, but because Gene took the initiative to experiment with a procedure, and now a good number (I have no idea how many) have followed his procedure.

I've personally refrained from using the GEN WEB name on any of my endeavors, primarily because I've looked at The Genealogist's Index to the World Wide Web as an experiment in itself.

Like Jeff, I'm all for participating in a group endeavor. Perhaps it's time to draw up some guidelines now, but then again perhaps not. There are still a LOT of logistical issues to be resolved before we have a working structure.

Jeff's project is one good example of a trial which is succeeding
Cliff Manus' is another
Gene Stark's is another
Michael Cooley's is another

I'm sure there are others which I've overlooked here, and for that I apologize.

We're all going after different aspects of the same problem - and to a certain degree, all of our efforts have been in different directions to date.

Ah, one other area which we at GEN WEB have essentially ignored, but is critical. Cindy's and Lori Hoffman's "Table of Contents" approach to organizing the sites on the WEB for genealogists is probably one of the "high level" tasks that a GEN WEB steering committee needs to address.

If we're going to collectively have a working organization here, let's draw up an organization chart, designate committees, set agendas, all that stuff which we computer people detest, but which is necessary if we're going to "herd" these million plus genealogy cats who are now roaming cyberspace.

John Rigdon
WEB Index Master

If you think getting your ducks in a row is tough, just try herding a dozen cats!

To: mavrogeorge@genealogysf.com, genweb@UCSD.EDU
From: ghoffman@UCSD.EDU (Gary Hoffman)
Organization: IR/PS UC San Diego, La Jolla CA 92093-0519
Date: Sun, 02 Jun 1996 22:16:35 PDT
Subject: Re: GENWEB trademark

mavrogeorge@genealogysf.com writes:

Gary's request that we add a trademark symbol seems understandable and easy to implement. Of course if he had included the HTML code for "trademark" that would have been icing on the cake.
---

Thanks for the reminder, Brian. Here is what I use on my demo page to superscript the TM properly (at least to my eye). Please try it.

GenWeb<sup><sup><sup><sup><font size=2>TM</font></sup></sup></sup></sup>

The use of the TM merely puts the world on notice that we intend to assert rights to this mark. Full registration gives us those rights. I will look into Brian Leverich's suggestion to seek Tunisian registration. At least we could sue in Tunisia, hey!

[Upon fine examination, this might become a Service Mark (or SM), depending on whether our GenWeb project is considered a "good" or a "service." I think SM is too awkward, unfamiliar, and obfuscatory, so I'm sticking to TM. Someday, when I become a real lawyer, I may know all the answers.]

Cheers,
Gary

From: JohnR238@aol.com
Received: by emout14.mail.aol.com (8.6.12/8.6.12) id GAA08670; Mon, 3 Jun 1996 06:46:32 -0400
Date: Mon, 3 Jun 1996 06:46:32 -0400
Message-ID: <960603064631_548097098@emout14.mail.aol.com>
To: leverich@rootsweb.com
cc: genweb@UCSD.EDU
Subject: Re: GenWeb is a Trademark

In a message dated 96-06-02 13:27:33 EDT, Brian Leverich wrote...

>Thing ... ) Somebody should be concocting a good story line and getting
>genweb.net registered pronto before it goes the way of genweb.com ... -B

Actually we should be registering GENWEB.ORG. NETs are reserved for ISPs and COM is taken - and I think the objective is to make GENWEB a non-profit standards organization.

John Rigdon

To: genweb@UCSD.EDU
From: ghoffman@UCSD.EDU (Gary Hoffman)
Organization: IR/PS UC San Diego, La Jolla CA 92093-0519
Date: Mon, 03 Jun 1996 08:51:46 PDT
Subject: You are tuned to GenWeb--Do Not Reply

Anyone receiving this message is one of over 600 people subscribed to the e-mail mailing list GENWEB. The purpose of this list is to facilitate the development of a linked, worldwide distributed genealogy database. We do NOT generally discuss individual research problems.

If this topic is not of interest to you ... here is how to unsubscribe:

Send an e-mail message to
    listserv@ucsd.edu
In the body of the message put the words:
    UNSUB GENWEB
That's all!

-Do not reply to this message.
-Do not send these commands to genweb@ucsd.edu.
-Do not send me a message about unsubscribing.

Just do it as outlined above.

(Note: some people have subscribed with an e-mail address that is no longer valid. If you have trouble unsubscribing, then please e-mail me with your problem.)

If you still want to read about the GenWeb project, please point your WWW browser to the URL

http://www.genweb.org/genweblist/genweblist.html

All current and archived messages are there for your perusal without cluttering your mailbox.

Thanks,
Gary

Date: Tue, 4 Jun 96 22:23:37 +0200
From: Anders Andersson
Message-Id: <9606042023.AA02739@Mizar.DoCS.UU.SE>
To: genweb@UCSD.EDU, ghoffman@UCSD.EDU
Subject: HTML (was: GENWEB trademark)

[GenWeb project group: User Interfaces]

Gary Hoffman writes:
>Thanks for the reminder, Brian. Here is what I use on my demo page to
>superscript the TM properly (at least to my eye). Please try it.
>
>GenWeb<sup><sup><sup><sup><font size=2>TM</font></sup></sup></sup></sup>

Ouch! Please don't do that, it hurts! ;-)

Seriously, if you care about proper kerning, the exact size and position of superscripts, and other fine typographical detail, then HTML is the wrong tool for the job. Use PostScript instead.

The idea behind HTML is to tag the information with structural elements, and let the user's client software decide exactly how to represent a quoted piece of text, an emphasized word, a table, or a mathematical formula. That idea makes it possible to render the same information on a number of different devices with different capabilities.

When you start examining the rendering offered by your particular browser and try to "polish" it by adjusting font sizes and spacing and throwing lots of random HTML elements at your code, you have essentially abandoned the point of using HTML in the first place. Your typographical efforts are lost on anyone using a different browser with different graphical qualities. Even worse, your tricks may make it look unnecessarily bad to someone else.

Superscripts and subscripts have existed in handwritten and printed text for centuries, and there seems to be some kind of a consensus among typographers regarding how they should be rendered. Sometimes mathematicians have use for multiple levels of superscripts, but at least they put something at each level (10<SUP>10<SUP>2</SUP></SUP>).
I have never before heard of or seen a four-level superscript with three empty levels, and I have no intuitive idea of what a "proper" typographical rendering of such a construct would be.

If you want a free-standing logotype in a particular typeface, use some other tool (such as PostScript) to generate a GIF image, and use it in-line in your HTML code (but I'd discourage you from mixing GIF text with HTML text in the same passage, as you have little control over the appearance of the latter).

If you are dissatisfied with the rendering of plain superscripts in your browser, then I'd suggest that you talk to your browser vendor about it.

--
Anders Andersson, Dept. of Computer Systems, Uppsala University
Paper Mail: Box 325, S-751 05 UPPSALA, Sweden
Phone: +46 18 183170    EMail: andersa@DoCS.UU.SE

From: "W. Wesley Groleau (Wes)"
Subject: Re: PAF Doc Guidelines from SVPAFUG
To: bap@goldrush.com
Date: Wed, 5 Jun 96 13:40:14 EST
Cc: genweb@UCSD.EDU
Ann Prescott" at May 28, 96 7:41 am
Mailer: Elm [revision: 70.85]

:> of PAF I need a guide. I am basically following the PAF Documentation
:> Guidelines of the Silicon Valley PAF Users Group. It seems to me that their
:> format is clear and could easily become a universal format. I am not a

Are these guidelines "on the Web" ? I don't use PAF, but I'd like to look at these since so many people seem to think they are "standard"

Date: Wed, 5 Jun 1996 18:34:13 -0500
Message-Id: <2.2.16.19960605183500.2ec78076@connect.net>
X-Sender: beau@connect.net
X-Mailer: Windows Eudora Pro Version 2.2 (16)
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: "W. Wesley Groleau (Wes)" , bap@goldrush.com
From: Beau Sharbrough
Subject: Re: PAF Doc Guidelines from SVPAFUG
Cc: genweb@UCSD.EDU

Hi Wes,

At 01:40 PM 6/5/96 EST, W. Wesley Groleau (Wes) wrote:
>:> of PAF I need a guide. I am basically following the PAF Documentation
>:> Guidelines of the Silicon Valley PAF Users Group.
>:> It seems to me that their
>:> format is clear and could easily become a universal format. I am not a
>
> Are these guidelines "on the Web" ? I don't use PAF, but I'd like to
> look at these since so many people seem to think they are "standard"

No Wes, they sell them in hard copy. It's how their group makes money. I can find the address for you, it's only like USD3.00 or so.

See you,
-----------------------------------------------------------------------
Beau Sharbrough The Aggie Players - beau@connect.net 50 years of theater at http://www.connect.net/beau Texas A&M University

From: "W. Wesley Groleau (Wes)"
Subject: slightly off-topic humor
To: genweb@UCSD.EDU
Date: Thu, 6 Jun 96 12:34:52 EST
Mailer: Elm [revision: 70.85]

Did I say off-topic or off-color? The genweb connection is the last sentence.

Extracted from the much longer http://www.emap.com/nww/bofh/bofh8nov.html

> The silence is broken by the CEO's PC telling
> him he has new mail. I know this has to be from
> Personnel (I filtered everything else to /dev/null
> earlier lest this message get lost among a flood of
> trivia). I excuse myself, reasoning that I probably
> couldn't keep a straight face as the CEO inquired
> of the FD whether he thought that a director
> who employs a crooked consultant who happens to be
> married to his sister could possibly stay in office.
>
> As I sit by my console and gaze out of the window,
> I see our ex-FD drop the contents of his ex-desk
> all over the car park as Security body-search him
> for the keys of his company Jag. On-line registers
> of births, deaths and marriages are a wonderful thing ...

---------------------------------------------------------------------------
W.
Wesley Groleau (Wes)                              Office: 219-429-4923
Magnavox - Mail Stop 10-40                        Home: 219-471-7206
Fort Wayne, IN 46808           elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

Date: Thu, 06 Jun 1996 11:09:39 -0700
From: Floyd Nordin
Organization: Nordin Enterprises
X-Mailer: Mozilla 2.01 (Win95; U)
MIME-Version: 1.0
To: genweb@UCSD.EDU
Subject: Re: PAF Doc Guidelines from SVPAFUG
References: <2.2.16.19960605183500.2ec78076@connect.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

> >:> of PAF I need a guide. I am basically following the PAF Documentation
> >:> Guidelines of the Silicon Valley PAF Users Group. It seems to me that their
> >:> format is clear and could easily become a universal format. I am not a
> >
> > Are these guidelines "on the Web" ? I don't use PAF, but I'd like to
> > look at these since so many people seem to think they are "standard"
>

We do indeed sell hard copies of our "Documentation Guidelines" at US$3.00. This includes shipping. Just send a check to:

Silicon Valley PAF Users Group
4417 Pitch Pine Court
San Jose, CA. 95136-2410

It will be sent right out to you. It is a guide on how to record in your notes Source Citations of that individual's data etc. We are trying to encourage genealogists to include such data so the reader/viewer/recipient of this data may judge its authenticity. Most of the genealogy info we see on the internet is what we call "Junk Genealogy".

Cheers

Floyd Nordin, President of the SV-PAF-UG
fnordin@slip.net
see our Home page at: http://www.rahul.net/svpafug

From: "W. Wesley Groleau (Wes)"
Subject: Re: PAF Doc Guidelines from SVPAFUG
To: genweb@UCSD.EDU
Date: Thu, 6 Jun 96 15:58:46 EST
Cc: gedcom-l%listserv.nodak.edu@vm1.nodak.edu
In-Reply-To: <31B71EE3.96E@slip.net>; from "Floyd Nordin" at Jun 06, 96 11:09 am
Mailer: Elm [revision: 70.85]

:> > >:> of PAF I need a guide.
:> > >:> I am basically following the PAF Documentation
:> > >:> Guidelines of the Silicon Valley PAF Users Group. It seems to me that their
:> > >:> format is clear and could easily become a universal format. I am not a
:> > > Are these guidelines "on the Web" ? I don't use PAF, but I'd like to
:> > > look at these since so many people seem to think they are "standard"
:> We do indeed sell hard copies of our "Documentation Guidelines" at US$3.00.
:> It will be sent right out to you. It is a guide on how to record in your notes
:> Source Citations of that individual's data etc. We are trying to encourage

It is ironic that the same organization that invented the imperfect but certainly usable SOUR tag structure is still promoting a program that does not support it. There is a program that I really like a lot that I no longer use merely because it maintains compatibility with PAF by putting all sources in NOTEs.

:> genealogists to include such data so the reader/viewer/recipient of this data
:> may judge its authenticity. Most of the genealogy info we see on the internet
:> is what we call "Junk Genealogy".

This is certainly true. And if you HAVE to use PAF, then this ridiculous notes format is a hundred times better than not citing sources at all.

I do not mean to denigrate SV PAF UG. In my view, standardizing the solution to a problem is a Good Thing. I am, however, disappointed in the originators of the problem (FHC), especially since, after coming up with a better way, they continue to promote the old way.

--
---------------------------------------------------------------------------
W.
Wesley Groleau (Wes)                              Office: 219-429-4923
Magnavox - Mail Stop 10-40                        Home: 219-471-7206
Fort Wayne, IN 46808           elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

From: TomRaynor@aol.com
Received: by emout07.mail.aol.com (8.6.12/8.6.12) id NAA07766 for genweb@ucsd.edu; Fri, 7 Jun 1996 13:21:50 -0400
Date: Fri, 7 Jun 1996 13:21:50 -0400
Message-ID: <960607132149_551426557@emout07.mail.aol.com>
To: genweb@UCSD.EDU
Subject: Re: PAF Doc Guidelines from SVPAFUG

This is a timely discussion. I, too, use PAF as my "primary" database, although I import the data into other products, as well. Now, it seems to me it would be nice to have a program that takes a GEDCOM with PAF-style "tagged notes" (i.e., "!BIRTH:...") and converts them to the newer GEDCOM SOUR format.

The irony of the fact that everybody but the LDS church is getting "on board" with the new GEDCOM format that the LDS church created is not lost on me. However, as has been pointed out, that's the way it is, and a solution is better than a complaint!

Anybody know of a program that converts a GEDCOM with PAF "tagged notes" to the new SOUR format? Anybody willing to take on that challenge?

Thank you!

- Tom

From: "W. Wesley Groleau (Wes)"
Subject: More on stale links
To: genweb@UCSD.EDU
Date: Mon, 10 Jun 96 12:17:17 EST
Mailer: Elm [revision: 70.85]

I know that something like this has already been suggested, but I'm bringing it up again as an example where it actually works.

If someone happens to find my data on the Web, and discovers that I have an entry for someone who is the SAME person that they have, here is one way they can create a link to the record:

0 INDI
1 NAME Richard Junior /Groleau/
1 SOUR Wes Groleau's database
2 CONT Click here to see it

Now this link can still get "stale" like any other. However, it will NOT go stale if I change INDI.
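
The link markup in the GEDCOM example above is not preserved in this archive. As a minimal sketch, assuming the link followed the same pattern as the wildcard lookup URL quoted later in this message, the address for a given name could be built as below. The function names `person_url` and `anchor` are hypothetical, and the base URL is copied from that wildcard example; this is an illustration, not the code of the CGI tools themselves.

```python
import html
from urllib.parse import quote_plus

# Base of the lookup script, copied from the wildcard URL quoted in the message.
BASE = "http://www.genealogy.org/~smcgee/cgi-bin/genweb.cgi"


def person_url(database, given_names, surname):
    """Build a name-based lookup URL (hypothetical helper).

    The name is written in GEDCOM style (given names, then /surname/),
    lowercased because the message says the name portion is not case
    sensitive. Spaces become '+', '/' becomes %2F, and '*' is left
    unescaped so that wildcards in the given names survive.
    """
    gedcom_name = "{}/{}/".format(given_names, surname).lower()
    return "{}/DB={}?dbn={}".format(BASE, database, quote_plus(gedcom_name, safe="*"))


def anchor(url, text):
    """Wrap a URL in an HTML anchor suitable for a GEDCOM CONT line (assumed layout)."""
    return '<A HREF="{}">{}</A>'.format(html.escape(url, quote=True), text)


if __name__ == "__main__":
    url = person_url("groleau", "Richard Junior", "Groleau")
    print(url)  # ends with dbn=richard+junior%2Fgroleau%2F
    print("2 CONT " + anchor(url, "Click here to see it"))
    # Wildcards are only described as working in the given names.
    print(person_url("groleau", "r* j*", "groleau"))
```

Because the link is keyed on the name rather than on the INDI record number, renumbering the database would not break it, which is the property the message describes.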
It can only go stale if I delete the person from the database, change the name, or change the first part of the URL. If I were to add another person of the same exact name, the link would go to a menu from which the person could select which person. If you'd like to try that, remove the " +junior " from the URL. (That's his true middle name, not a misplaced suffix.) The name portion ( " richard+junior%2Fgroleau%2F " ) is NOT case sensitive. The rest of it probably is. Also, wild cards can be used: http://www.genealogy.org/~smcgee/cgi-bin/genweb.cgi/DB=groleau?dbn=r*+j*%2Fgroleau%2F gets a menu of everyone in my DB with those initials and if only one, gets the actual person. Unfortunately, wild cards do NOT work in the surname. BTW, %2F is the URL-encoded form of / (2F is its ASCII code in hex) so this is basically GEDCOM name format. This will also work for anyone else whose data is accessed by the CGI tools built by Scott McGee and Thomas Wetmore (see http://genealogy.org/~smcgee/genweb/other_db.html) -- ------------------------------------------------------------------------- -- W. Wesley Groleau (Wes) Office: 219-429-4923 Magnavox - Mail Stop 10-40 Home: 219-471-7206 Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com ------------------------------------------------------------------------- -- Subject: Re: More on stale links To: "W. Wesley Groleau" Date: Mon, 10 Jun 1996 19:18:11 +0100 (BST) From: Ben Laurie Cc: genweb@UCSD.EDU In-Reply-To: <9606101719.AA08411@most> from "W. Wesley Groleau" at Jun 10, 96 12:17:17 pm Reply-To: ben@algroup.co.uk X-Mailer: ELM [version 2.4 PL24 PGP2] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9606101918.aa01714@gonzo.ben.algroup.co.uk> W. Wesley Groleau wrote: > > I know that something like this has already been suggested, but I'm > bringing it up again as an example where it actually works. 
> > If someone happens to find my data on the Web, and discovers that I have > an entry for someone who is the SAME person that they have, here is one > way the can create a link to the record: > > 0 INDI > 1 NAME Richard Junior /Groleau/ > 1 SOUR Wes Groleau's database > 2 CONT Click here to see it > > Now this link can still get "stale" like any other. However, it will NOT > go stale if I change INDI. It can only go stale if I delete the person > from the database, change the name, or change the first part of the URL. > If I were to add another person of the same exact name, the link would > go to a menu from which the person could select which person. > > If you'd like to try that, remove the " +junior " from the URL. > (That's his true middle name, not a misplaced suffix.) > > The name portion ( " richard+junior%2Fgroleau%2F " ) is NOT case sensitive. > The rest of it probably is. > > Also, wild cards can be used: > http://www.genealogy.org/~smcgee/cgi-bin/genweb.cgi/DB=groleau?dbn=r*+j*% 2Fgroleau%2F > gets a menu of everyone in my DB with those initials and if only one, gets the > actual person. Unfortunately, wild cards do NOT work in the surname. > BTW, %2F is ASCII for / so this is basically GEDCOM name format. > > This will also work for anyone else whose data is accessed by the CGI > tools built by Scott McGee and Thomas Wetmore (see > http://genealogy.org/~smcgee/genweb/other_db.html) My plan is to assign each individual a unique ID as they are created. This is robust against name changes, and can be used with a similar mechanism. Cheers, Ben. > > -- > ------------------------------------------------------------------------- -- > W. 
Wesley Groleau (Wes) Office: 219-429-4923 > Magnavox - Mail Stop 10-40 Home: 219-471-7206 > Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com > ------------------------------------------------------------------------- -- -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. 
From: "W. Wesley Groleau (Wes)" Subject: Re: More on stale links To: genweb@UCSD.EDU Date: Mon, 10 Jun 96 14:38:03 EST Cc: genweb@pseserv3.fw.hac.com In-Reply-To: <9606101918.aa01714@gonzo.ben.algroup.co.uk>; from "Ben Laurie" at Jun 10, 96 7:18 pm Mailer: Elm [revision: 70.85] :> My plan is to assign each individual a unique ID as they are created. This is :> robust against name changes, and can be used with a similar mechanism. I think it has also been already pointed out that this approach has certain prerequisites: 1. You must never create an individual under this scheme unless you are absolutely certain that this person will NEVER be found to be the same as someone else. 2. 
You must never merge data from someone else until you verify that NO person in the source database duplicates one in the destination. 3. You must never discover that anyone in your database is actually two people. 4. If you discover duplicates or persons who are really two persons, you must not fix it. 5. You must not use any genealogical program that wants to control the assignment of such IDs. Or risk violating any of the above. In other words, this approach is just as imperfect as mine. The difference is that they don't get stale under the same circumstances. ------------------------------------------------------------------------- -- W. Wesley Groleau (Wes) Office: 219-429-4923 Magnavox - Mail Stop 10-40 Home: 219-471-7206 Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com ------------------------------------------------------------------------- -- From: "W. Wesley Groleau (Wes)" Subject: Re: More on stale links To: genweb@UCSD.EDU Date: Mon, 10 Jun 96 15:42:04 EST Mailer: Elm [revision: 70.85] :> My plan is to assign each individual a unique ID as they are created. This is :> robust against name changes, and can be used with a similar mechanism. I think others have already pointed out that this approach has certain prerequisites: 1. You must never create an individual under this scheme unless you are absolutely certain that this person will NEVER be found to be the same as someone else. 2. You must never merge data from someone else until you verify that NO person in the source database duplicates one in the destination. 3. You must never discover that anyone in your database is actually two people. 4. If you discover duplicates or persons who are really two persons, you must not fix it. 5. You must not use any genealogical program that wants to control the assignment of such IDs. Or risk violating any of the above. In other words, this approach has as many restrictions as mine. Neither works if its assumptions are not true. 
The difference is what the actual assumptions are, i.e., under which circumstances the links get stale. ------------------------------------------------------------------------- -- W. Wesley Groleau (Wes) Office: 219-429-4923 Magnavox - Mail Stop 10-40 Home: 219-471-7206 Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com ------------------------------------------------------------------------- -- Date: 10 Jun 96 17:43:15 EDT From: N Oughtibridge <100020.1117@CompuServe.COM> To: GENWEB List Subject: Re: re More on stale links Message-ID: <960610214315_100020.1117_EHV147-1@CompuServe.COM> I'm glad that Wes has raised the issue of links again. Stale links are a problem; however, there are only two abstract solutions: a) the people on both sides of the link agree to talk to each other to manage the links b) we pay someone else to manage it for us (i.e. a link clearing house) Solution a) suits a small community, b) suits the larger one (and a commercial organisation). I am worried that we are moving towards including HTML in GEDCOM without saying that's what we are doing. Any program other than a web browser will be somewhat confused by the embedded HTML (or should I say the user will be!) I prefer the idea of a tag, or multimedia link type, of HTML followed by the URL. The structure of the URL can be wholly dependent on the destination system. What matters is that someone using a system can point to someone else's data. As for when the tag can be used - I would suggest anywhere a SOUR tag can be used (which includes in a SOURce structure). Enough of my thoughts - how are people representing the links at the moment in GEDCOM - can some people send me brief samples of how they are holding links in their GEDCOM, so that I can have my program, uFTi, in step with the rest of GENWEB. ______________ Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files. 
See HTTP://ourworld.compuserve.com/homepages/oughtibridge Email 100020.1117@compuserve.com Date: Tue, 11 Jun 96 02:22:49 +0200 From: Anders Andersson Message-Id: <9606110022.AA24328@Mizar.DoCS.UU.SE> To: GENWEB@UCSD.EDU Subject: External references Cc: 100020.1117@compuserve.com [GenWeb project groups: Resource Identifiers, User Interfaces] Nicholas Oughtibridge writes: >I prefer the idea of a tag, or multimedia link type, of HTML followed by the >URL. The structure of the URL can be wholly dependant on the destination >system. What matters is that someone using a system can point to someone elses >data. I'm reiterating an earlier point of mine here by fully supporting your view. There is no need to include HTML code in GEDCOM records, and any external references should be as simple and well-defined as possible, without wiring lots of volatile URL elements into the very database itself. [A few minor points on the terminology: Don't involve the acronym "HTML" at all. It means "HyperText Markup Language" and refers only to the common syntax for writing hypertext documents. The URL (Uniform Resource Locator) standard is entirely separate from the concept of HTML, although URLs are often used within HTML files. However, many documents referred to by URLs are in no way HTML documents. Also, the term URL should not be redefined to imply just any kind of "reference string"; a URL has a very precise syntax.] We should determine what elements are needed to uniquely identify a person, a location, an event, a source, or any other kind of object which we may want to establish external (universal) links to. Things like access methods (HTTP or FTP), server domain names (GenWeb.Org or IBM.COM), port numbers, paths and filename suffixes are irrelevant to the genealogist, and should be kept apart from the genealogical database in order to more easily implement changes in the physical infrastructure (like a database moving from one server to another). 
To me, a simple two-level "hierarchy" with a set of named databases, each containing a set of named objects, may look sufficient. I can imagine a million databases containing, on the average, a million atomic objects each. That would make a trillion (10^12) objects, presumably sufficient to cover the entire population of the Earth a few times over (although not if you want to represent each time they visit the marketplace as a separate event object). I'm in no way suggesting that we put a hard limit where it isn't necessary; I only mention these numbers in order to estimate the feasibility of various implementations. I can grow either number by a factor 1,000 if you like. Does anyone see a problem with having a flat namespace of 10^6 (or 10^9) databases? I do, but I feel I can't put my finger on it without a convoluted discussion of Internet protocol design and implementation case studies, and I'd very much appreciate the input from more experienced Internet protocol hackers here. What it may lead up to, is the idea of a database hierarchy with two (or more) levels above the individual object records. Should we thus define a generic multi-level syntax like HUSB @species/nation/family:person@ for an external link to the father in a FAM record, or should we settle for a simpler HUSB @database:person@ variant, assuming that the we can maintain a central registry of a billion named databases? Also, do we want the same syntax used for all kinds of objects, or do we want to provide each kind with an identifying tag (like @database:INDI/person@)? These examples may seem far out, but I think the actual numbers don't really matter; the same problems will have to be solved on any scale, and there is nothing to be gained from intentionally planning for a limited solution that by necessity will have to be replaced one day or another. 
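As an illustration of the two candidate syntaxes above, here is a minimal Python sketch that parses both the flat `@database:person@` form and the multi-level, type-tagged `@species/nation/family:INDI/person@` form, then turns the result into a URL through a lookup table. The names `ObjectRef`, `parse_ref` and `to_url`, the registry layout, and the `db.example` address are assumptions made for this example only; nothing here is an agreed GenWeb or GEDCOM convention.

```python
from typing import Dict, NamedTuple, Optional, Tuple
from urllib.parse import quote


class ObjectRef(NamedTuple):
    database: Tuple[str, ...]  # one or more levels, e.g. ("se", "andersson")
    kind: Optional[str]        # record type such as "INDI", or None if absent
    name: str                  # object name inside the database


def parse_ref(text: str) -> ObjectRef:
    """Parse '@database:person@' or '@a/b/c:KIND/person@' (hypothetical syntax)."""
    if len(text) < 3 or text[0] != "@" or text[-1] != "@":
        raise ValueError(f"not an @...@ reference: {text!r}")
    database, colon, obj = text[1:-1].partition(":")
    kind, slash, name = obj.partition("/")
    if not slash:
        # No type tag present: the whole object part is the name.
        kind, name = None, obj
    levels = tuple(database.split("/"))
    if not colon or not name or not all(levels) or kind == "":
        raise ValueError(f"malformed reference: {text!r}")
    return ObjectRef(levels, kind, name)


def to_url(ref: ObjectRef, registry: Dict[Tuple[str, ...], str]) -> str:
    """Translate a reference into a URL; raises KeyError for unknown databases.

    The registry maps database names to base URLs, so a server move only
    changes the registry entry, never the stored reference.
    """
    parts = [registry[ref.database].rstrip("/")]
    if ref.kind:
        parts.append(quote(ref.kind, safe=""))
    parts.append(quote(ref.name, safe=""))
    return "/".join(parts)
```

With a registry entry such as `{("groleau",): "http://db.example/groleau/"}`, the call `to_url(parse_ref("@groleau:I42@"), registry)` yields a URL under that base, and only the registry entry has to change if the database moves.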
Once this definition is in place, it should be a small matter of programming to translate this generic object reference into a URL or some other reference structure that is understood by existing software. --- There are two issues which have been discussed under the "stale link" subject. One is that of resolving links to servers which have gone out of operation for technical, economical or political reasons. I believe this issue is partly addressed by my suggestions above. Another is that of resolving links to ambiguous objects, such as an individual record that, due to further research, has been split in two, or links to objects which have been deleted due to their lack of relevance to real world facts. This is a much more difficult problem of Information Quality, which I haven't addressed at all, and which needs to be addressed separately. If others do that, I think I'll concentrate on the URLs and other Resource Identifiers... -- Anders Andersson, Dept. of Computer Systems, Uppsala University Paper Mail: Box 325, S-751 05 UPPSALA, Sweden Phone: +46 18 183170 EMail: andersa@DoCS.UU.SE From list-relay@UCSD.EDU Mon Jun 10 18:14:17 1996 Received: from UCSD.EDU (mailbox1.ucsd.edu [132.239.1.53]) by fuji.ucsd.edu (8.6.9/8.6.9) with ESMTP id SAA11007 for ; Mon, 10 Jun 1996 18:14:16 -0700 Received: from none.at.helo (desiree.teleport.com [192.108.254.21]) by UCSD.EDU (8.7.5/8.6.9) with ESMTP id SAA22667 for ; Mon, 10 Jun 1996 18:08:04 -0700 (PDT) Received: from jmurphy.teleport.com (ip-bend1-23.teleport.com [206.163.116.55]) by desiree.teleport.com (8.7.5/8.7.3) with SMTP id SAA10727; Mon, 10 Jun 1996 18:07:40 -0700 (PDT) Message-Id: <1.5.4.32.19960611010730.006df840@mail.teleport.com> X-Sender: jmurphy@mail.teleport.com X-Mailer: Windows Eudora Light Version 1.5.4 (32) Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Mon, 10 Jun 1996 18:07:30 -0700 To: N Oughtibridge <100020.1117@compuserve.com> From: Jeff Murphy Subject: Re: re More on stale links 
Cc: genweb@UCSD.EDU, kygenweb-l@teleport.com At 05:43 PM 6/10/96 EDT, N Oughtibridge wrote: >I prefer the idea of a tag, or multimedia link type, of HTML followed by the >URL. The structure of the URL can be wholly dependant on the destination >system. What matters is that someone using a system can point to someone elses >data. I fully agree, and have for some time proposed the use of a LINK tag in the notes which would tie us to a symbolic name for the second database. Unfortunately, although that would take us to the database, it will not give us the name of the individual in that database. So how about structuring the LINK command like LINK symbolic database name, surname, given name(s), birthdate, birthplace This assumes that the surname, given names, or birth information may be different. If those fields are blank, then the data can be picked up from the database fields. The symbolic name can be associated with a URL via a simple html form, where individuals with databases on the web can call and enter their symbolic name of choice. If a name is already selected, the person will have to choose another one. And provision can be made to update changes in the URL there, and to provide this information to those running the software which generates the html. The question is: what will give us enough information so that we can generate html directly from the gedcom file which will get us a direct link to the same individual in another database at another site? If this will do it, I've got 83 counties online in Kentucky who would like to try it. We have just been waiting for someone to come along who will provide us with the linkage facility we need. If you are willing to put it in, I think you will see a lot of people suddenly interested in your software. To date the other authors of html generators have chosen not to implement such a feature. 
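The LINK proposal above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the comma-separated layout follows the field order given in the message, but the exact syntax, the `Link` and `NameRegistry` names, the case-insensitive handling of symbolic names, and the `example` URLs are all hypothetical. Blank fields fall back to the local record, and the registry refuses a symbolic name that is already taken, as described.

```python
from dataclasses import dataclass
from typing import Dict

FIELDS = ("surname", "given", "birthdate", "birthplace")


@dataclass
class Link:
    database: str
    surname: str
    given: str
    birthdate: str
    birthplace: str


def parse_link(line: str, local: Dict[str, str]) -> Link:
    """Parse 'LINK db, surname, given, birthdate, birthplace'.

    Blank fields are filled from `local`, the record the note belongs to.
    """
    keyword, _, rest = line.strip().partition(" ")
    if keyword != "LINK" or not rest.strip():
        raise ValueError(f"not a LINK line: {line!r}")
    # Split at most four times so commas inside the birthplace survive.
    parts = [p.strip() for p in rest.split(",", 4)]
    parts += [""] * (5 - len(parts))
    database, *values = parts
    if not database:
        raise ValueError("LINK needs a symbolic database name")
    filled = {f: v or local.get(f, "") for f, v in zip(FIELDS, values)}
    return Link(database, **filled)


class NameRegistry:
    """First come, first served table of symbolic database names."""

    def __init__(self) -> None:
        self._urls: Dict[str, str] = {}

    def register(self, name: str, url: str) -> None:
        key = name.lower()  # assumption: names are compared case-insensitively
        if key in self._urls:
            raise KeyError(f"{name!r} is already taken, choose another name")
        self._urls[key] = url

    def update(self, name: str, url: str) -> None:
        """Record a move of an already registered database."""
        key = name.lower()
        if key not in self._urls:
            raise KeyError(f"{name!r} is not registered")
        self._urls[key] = url

    def resolve(self, name: str) -> str:
        return self._urls[name.lower()]
```

An HTML generator would call `parse_link` on each LINK note and `resolve` on the symbolic name, so a database that moves needs only one `update` in the registry rather than edits to every GEDCOM that points at it.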
>Enough of my thoughts - how are people representing the links at the moment in >GEDCOM - can some people send be brief samples of how they are holding links in >their GEDCOM, so that I can have my program, uFTi, in step with the rest of >GENWEB. At the moment, no one is providing support for links, unless the users work through a bunch of jury-rigged file setups. I have seen your program output on your home page, and have a few questions about the program. Does it generate html pages statically or dynamically? Is it available for download from your page? (I could not find it.) Are you planning to alter the output to be user defined, so that if we want a standard pedigree chart or descendant chart, we can get one? Do you support PAF gedcoms? Are user-defined backgrounds an option? If the pages are generated statically, have you considered trying to store the pages in compressed format, and only extracting them for display when they are called for? Jeff Murphy 735 NW 8th Redmond, Oregon 97756 h. (541) 548-4478 Specializing in the genealogy of Muhlenberg Co., Kentucky Comprehensive KY Genealogy Project: http://www.teleport.com/~jmurphy/ subscribe to PAFHELP-L or KYGENWEB-L at majordomo@teleport.com From: "W. Wesley Groleau (Wes)" Subject: Re: External references and stale links To: genweb@UCSD.EDU Date: Mon, 10 Jun 96 22:31:09 EST In-Reply-To: <9606110022.AA24328@Mizar.DoCS.UU.SE>; from "Anders Andersson" at Jun 11, 96 2:22 am Mailer: Elm [revision: 70.85] OK, I have THE solution: Step One: Wait until the Human Genome Mapping Project is complete. Step Two: Invent a GEDCOM tag GENOME which is followed by the uuencoded gzipped map of that person's chromosomes. Step three: Write a spider (webcrawler or whatever) that searches for these maps and stores them. Put a CGI tool on the same machine that can accept such a GENOME and return a menu of other researcher's URLs to the person with the same genes. 
But what about persons who are no longer alive for us to map their genes? Step Four: We need a program that can compare the genetic maps of a large group of people and by identifying similarities between them and their mapped ancestors, deduce the maps of their unmapped ancestors. The program can then repeat the process with the deduced data to keep working backward. Generation by generation, eventually we will have genetic maps for everyone who ever lived. Step five: Start speculating on names and birth dates to assign to these genetic maps. -- ------------------------------------------------------------------------- -- W. Wesley Groleau (Wes) Office: 219-429-4923 Magnavox - Mail Stop 10-40 Home: 219-471-7206 Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com ------------------------------------------------------------------------- -- Date: Mon, 10 Jun 1996 23:40:06 -0600 From: smcgee@sol.slcc.edu (Scott McGee (Personal)) Message-Id: <9606110540.AA14122@sol.slcc.edu.> To: ben@algroup.co.uk, wwgrol@pseserv3.fw.hac.com Subject: Re: More on stale links Cc: genweb@UCSD.EDU Ben, I played with the idea of assigning each person a unique ID, but ran into this problem: I load a gedcom into a database. Later, after many changes, I am given a new gedcom for the same database. How do I get the same IDs for people who are common to both, but not assign the same ID to similar but different people who are each in different versions of the database? For instance, if Bill Johnson was found to be the same person as William Johnson, but found to have a son named Bill, both databases would have a Bill Johnson, but they would NOT be the same person. I could never solve the problem despite bringing it up here many times. I suspect that Wes' proposed solution, that makes use of my search query, is likely the best possible solution for my GenWeb implementation as it is now. Oh, "Hi again, Everyone, it is great to be back!" 
(I accepted a job here in Utah as the Webmaster for the Salt Lake Community College, got dumped by my former employer when I gave them my two weeks notice, and thus lost my access for a time. It has been a very busy seven weeks since I started my new job, and couldn't resubscribe here until just yesterday. Someone want to fill me in on any exciting stuff I missed in the last three months? I notice that we still get unsubs on the list quite regularly!) Scott GENEALOGY | Do you know who your ancestors are? | Scott McGee -----------+---------------------------------------+--------------------- email: smcgee@genealogy.org | What? Me speak for web: http://genealogy.org/~smcgee/homepage.html | someone else? Nah! ---------------------------------------------------+--------------------- See my genealogy page at http://genealogy.org/~smcgee and my GenWeb page at http://genealogy.org/~smcgee/genweb Subject: Re: External references To: Anders Andersson Date: Tue, 11 Jun 1996 09:45:55 +0100 (BST) From: Ben Laurie Anders Andersson wrote: > Does anyone see a problem with having a flat namespace of 10^6 > (or 10^9) databases? I do, but I feel I can't put my finger on > it without a convoluted discussion of Internet protocol design > and implementation case studies, and I'd very much appreciate > the input from more experienced Internet protocol hackers here. The problem with a flat namespace (especially one of that size) is that there has to be someone to allocate names within it. The traditional solution to this is a hierarchical namespace, like DNS. Since DNS is already widely supported it makes sense to piggyback the database names on it. Cheers, Ben. -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. 
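To make the DNS suggestion concrete, here is a small Python sketch that turns a hierarchical database name into a host name, so that each level of the hierarchy can hand out names beneath itself without a single central allocator. The function name `database_hostname`, the slash-separated input and the `genweb.example` zone are assumptions for this example; only the label rules (letters, digits and hyphens, at most 63 characters per label, no leading or trailing hyphen, 253 characters overall) come from ordinary DNS host name practice.

```python
import re

# One DNS host name label: 1 to 63 characters, letters/digits/hyphens,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def database_hostname(path: str, zone: str = "genweb.example") -> str:
    """Map 'nation/family' to 'family.nation.<zone>' (hypothetical scheme).

    DNS puts the most specific label first, so the path is reversed.
    """
    labels = [part.lower() for part in path.split("/")]
    for label in labels:
        if not _LABEL.fullmatch(label):
            raise ValueError(f"{label!r} is not a valid DNS label")
    host = ".".join(reversed(labels)) + "." + zone
    if len(host) > 253:
        raise ValueError("host name too long for DNS")
    return host
```

Under this scheme whoever controls the `se` level would decide who gets `andersson.se`, which is the delegation property that a flat namespace lacks.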
Subject: Re: More on stale links To: Scott McGee Date: Tue, 11 Jun 1996 11:36:00 +0100 (BST) From: Ben Laurie Cc: ben@algroup.co.uk, wwgrol@pseserv3.fw.hac.com, genweb@UCSD.EDU In-Reply-To: <9606110540.AA14122@sol.slcc.edu.> from "Scott McGee" at Jun 10, 96 11:40:06 pm Reply-To: ben@algroup.co.uk X-Mailer: ELM [version 2.4 PL24 PGP2] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9606111136.aa03502@gonzo.ben.algroup.co.uk> Scott McGee wrote: > > Ben, > > I played with the idea of assigning each person a unique ID, but ran into this > problem: > > I load a gedcom into a database. Latter, after much changes, I am given a new > gedcom for the same database. How do I get the same ID's for people who are > common to both, but not assign the same ID to similare but different people > who are each in different versions of the database. For instance, if Bill > Johnson was found to be the same person as William Johnson, but found to have > a son named bill, both databases would have a Bill Johnson, but they would > NOT be the same person. If we assume that each database has a unique ID and each person within each database has an ID which is locally unique (that is, unique within the database), then the problem is one of mapping, surely? To take your example, lets say that Bill Johnson in database A has ID 1 (for the sake of simplicity), and that William and Bill Johnson in database B have IDs 1 and 2. Then, when it is realised that the first Bill corresponds to William a mapping must be established which says "A1 <-> B1" (and therefore B2 is A1's son). This needs to be done by the "owners" of the two databases, of course, and I suppose need not be mutual. That is, the owner of A can say "A1 -> B1" but B may decide not to (he doesn't believe it, can't be bothered, or whatever). What we need is a standard way to say this. Or am I misunderstanding? Are you assigning the IDs when you import from GEDCOM? 
If so, there is clearly a problem. They need to be assigned before the GEDCOM is exported from whichever package maintains the source data. > > I could never solve the problem despite bringing it up here many times. I > suspect that Wes' proposed solution, that makes use of my search query, is > likely the best possible solution for my GenWeb implementation as it is now. > > Oh, "Hi again, Everyone, it is great to be back!" (I accepted a job here in > Utah as the Webmaster for the Salt Lake Community College, got dumped by my > former employer when I gave them my two weeks notice, and thus lost my access > for a time. It has been a very busy seven weeks since I started my new job, > and couldn't resubscribe here until just yesterday. Someone want to fill me > in on any exciting stuff I missed in the last three months? I notice that we > still get unsubs on the list quite regularly!) Usually shortly after anyone says anything ;-) Cheers, Ben. > > Scott > > GENEALOGY | Do you know who your ancestors are? | Scott McGee > -----------+---------------------------------------+--------------------- > email: smcgee@genealogy.org | What? Me speak for > web: http://genealogy.org/~smcgee/homepage.html | someone else? Nah! > ---------------------------------------------------+--------------------- > See my genealogy page at http://genealogy.org/~smcgee > and my GenWeb page at http://genealogy.org/~smcgee/genweb -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. Date: 11 Jun 96 08:06:39 EDT From: N OUghtibridge <100020.1117@CompuServe.COM> To: Jeff Murphy Cc: GENWebbers Subject: Re: re More on stale links Message-ID: <960611120638_100020.1117_EHV56-2@CompuServe.COM> Jeff Murphy asked the following questions about uFTi Does it generate html pages statically or dynamically? 
Statically - it's a Windows program - although if I had an NT webserver I could create them dynamically through ISAPI. Is it available for download from your page? Sadly not - I am hoping to put it on an FTP site. Currently it's only on CompuServe. I could mail you a copy shortly; however, I am upgrading it to have a 16bit and 32bit version from the same code. Are you planning to alter the output to be user defined, so that if we want a standard pedigree chart or descendant chart, we can get one? You tell me what you want and I will code it! (within reason) - the data is relational using the Access JET engine so I can do most things. I have a good understanding of HTML tables. Do you support PAF gedcoms? I use Brother's Keeper which I think is PAF compatible(ish). I have currently got problems loading in sources which link to a single source. I will address this sometime. Gedcom tags are not fully hard coded, that is additional variants on NOTE such as OCCU etc can be added. Why not email me a short GEDCOM (say 100 people) and I'll let you know how I get on. Are user-defined backgrounds an option? Yes - any GIF or JPEG at the moment although I will work on single colours. The ability to view the GIF before generation will be lost due to the conversion. If the pages are generated statically, have you considered trying to store the pages in compressed format, and only extracting them for display when they are called for? In short no - can you elaborate? Thanks for your feedback and questions Nicholas ______________ Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files. See HTTP://ourworld.compuserve.com/homepages/oughtibridge Email 100020.1117@compuserve.com Date: 11 Jun 96 08:06:45 EDT From: N OUghtibridge <100020.1117@CompuServe.COM> To: GENWebbers Subject: Re: re External references Message-ID: <960611120645_100020.1117_EHV56-3@CompuServe.COM> Anders Andersson discussed links. 
You've persuaded me, the link should be LINK not HTML. HTML is too restrictive. In GEDCOM terms, I still think it is a multimedia type. I think in fact URL is also too restrictive. It might be more appropriate in some cases to link directly to a contact name and postal address, or to a phone number. One thing that those of us looking at generating Web Pages in Hypertext Markup Language will require is to know when the link is to an object which can be pointed to by a URL. I still think that for direct web links, the code should be the URL. I accept that it is fluid and will move but as soon as anything in the link moves the whole link needs re-writing. The best we can achieve is to keep the domain name constant by using the genweb.org domain name to the full. That way if a database moves from one server to another we can keep the links alive. It pushes towards using a virtual server name as the beginning of the path. In short I would see a link to one of my pages becoming something like http://oughtibridge.genweb.org/~oughtibridge/myfile for a static file and http://oughtibridge.genweb.org/path_through_cgi_to_the_generated_page for a dynamic file ______________ Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files. See HTTP://ourworld.compuserve.com/homepages/oughtibridge Email 100020.1117@compuserve.com From: "W. Wesley Groleau (Wes)" Subject: Re: re More on stale links To: genweb@UCSD.EDU Date: Tue, 11 Jun 96 10:20:33 EST In-Reply-To: <960610214315_100020.1117_EHV147-1@CompuServe.COM>; from "N Oughtibridge" at Jun 10, 96 5:43 pm Mailer: Elm [revision: 70.85] :> problem however there are only TWO abstract solutions :> a) the people on both sides of the link agree to talk to each other to manage :> the links :> b) we pay someone else to manage it for us (ie a link clearing house) :> Solution a) suits a small community, b) suits the larger one (and a commercial :> organisation) 3. 
Each database includes an explanation of the "best" way to link to it. For example, I could put a version of my recent post in an HTML file which might also include my disclaimer ("ASSUME NOTHING; VERIFY EVERYTHING) and figure out how to include a link to it on each generated page. :> that's what we are doing. Any program other than a web browser will be somewhat :> confused by the embedded HTML (or should I say the user will be!) Not if it is the variable part of a defined tag. -- ------------------------------------------------------------------------- -- W. Wesley Groleau (Wes) Office: 219-429-4923 Magnavox - Mail Stop 10-40 Home: 219-471-7206 Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com ------------------------------------------------------------------------- -- Date: Tue, 11 Jun 1996 11:42:55 -0700 To: MattHBrown@aol.com From: Jeff Murphy Subject: Re: re More on stale links Cc: genweb@UCSD.EDU I hope you don't mind my responding in the mailing list, as I know there will be others interested in what you have to say. At 02:45 AM 6/11/96 -0400, MattHBrown@aol.com wrote: ><< LINK symbolic database name, surname, given name(s), birthdate, birthplace > >> > >I got your message via genweb. On my pages, I have a CGI-based search engine >that accepts parameters for surname, given names, born before, born after, >and birthplace. Your link proposal would fit very well with this. I >envision this working as follows: > >CREATING LINKS > >1) My CGI script that generates web pages would see the LINK data in my >GEDCOM file >2) The script would look up the database name on some remote server to >resolve it to a URL >3) The script would create HTML code on the page to make the URL a hot-link. > The only problem here is that different people may want to structure the >name and date parameters different ways. Maybe the server that looks up the >database name also accepts the name and date parameters in a uniform format >and returns the complete URL as a result? 
> >FOLLOWING LINKS > >1) A remote system tries to link to one of my pages. This would call my >search.cgi script with name and/or date parameters. >2) If search.cgi yields only one match, then I should generate the data/page >for the person that matches. (my script doesn't do this part yet - it always >goes to step 3, but it wouldn't be much of a stretch to add this >functionality) >3) If search.cgi yields multiple matches, then I display a page with a list >of matches and let the user select the match they are interested in. > >I think this could work very well. The trickiest part will be getting >someone to build the database name server. The amount of data stored there >should be relatively small - just one entry per genealogy web site. The >tough part will be making it flexible enough so that everyone will want to >use it. > >Does this sound right to you? > >Matt Brown >Houston, TX >http://www.genealogy.org/~mbrown This sounds like it would solve the problem, if it can be implemented. I think that maybe we've been unable to do this because we've been thinking in terms of a one-step approach. Based on what you wrote, I think we need a multi-step approach. To clarify it in my own mind, I'm going to list them: 1. The individual gedcom owner places the LINK command in his notes. We will assume that there may be multiple databases where the same individual exists, as in the various royal databases. 2. There has been created a table of database names and their URLs. 3. The html generator reads the gedcom, finds a LINK command, hits against the table of database names to get the URL, and according to your method above goes out against the actual URL, finds the individual or individuals referenced, and generates the html. Now, having received your message and slept on it awhile, I think that maybe the hard part is in #3. It's too time-intensive. It requires the person who has the gedcom to hit various databases. Think of those few who support a number of databases. 
Plus, it complicates the html program needlessly. What the html generator needs is a simple command embedded in the gedcom that tells it to create a link.

With that in mind, I have a proposal for a gedcom pre-processor. The gedcom pre-processor would have the following jobs:

1. Read the gedcom file, searching for the LINK command in the notes. When found, it
   a. deletes the LINK command from the notes
   b. resolves the symbolic database information
   c. creates a gedcom LINK entry after the notes with the correct URL
   d. saves the necessary information to a temp file
      1) individual data
      2) URL
      3) LINK line number

2. Sort the URL data

3. Dial for each connection and find the match(es). Since the URL data is sorted, it can find all the matches at one time at a single gedcom site, thus reducing the amount of connect time, which will still be considerable. There are some other considerations here, like what do you do with an error 404, but these can be addressed later.
   a. as each match is found, the unique Record Identification Number (RIN) can be captured and added to the temp file

4. Merge the matches back into the link file
   a. read the gedcom file using the LINK line number as a relative record number
   b. update the data by adding the RIN to the line

Now, each html program will only have to read the LINK command and generate the code to go to that URL with that RIN. We've taken the burden off the html generator and placed it where it lies: with the user.

And we are back to square one, because any change in the gedcom to which we link will continue to cause us to reprocess our gedcom. :-) So, that is the weakness in all of this, and the one we seem to spend all our time to get around. Yet my solution is so elegant that I thought it best to share.

Jeff Murphy
735 NW 8th
Redmond, Oregon 97756
h.
(541) 548-4478
Specializing in the genealogy of Muhlenberg Co., Kentucky
Comprehensive KY Genealogy Project: http://www.teleport.com/~jmurphy/
subscribe to PAFHELP-L or KYGENWEB-L at majordomo@teleport.com

Date: Tue, 11 Jun 1996 23:30:04 -0600
From: smcgee@sol.slcc.edu (Scott McGee (Personal))
Message-Id: <9606120530.AA15845@sol.slcc.edu.>
To: ben@algroup.co.uk
Subject: Re: More on stale links
Cc: genweb@UCSD.EDU, wwgrol@pseserv3.fw.hac.com

Ben replies to my statements saying that assignments of stable ID must happen in the database where the info is maintained, and then written into the GEDCOM that is read into a GenWeb database. This all sounds good, but I don't believe we have a mechanism in GEDCOM to encode that in a way that is useful to many genealogy programs. Sure, with LifeLines, I can create an encoding method, and using a tag starting with a '_', I can even make it legal. Brother's Keeper or PAF or any other program will fail to support it, however. If another program does have a way to do such, what are the chances that any other program will support it?

Some of my databases are PAF, some are from a number of other programs, and a few even from LifeLines, but there is nothing I know of in common with all these that would allow the assigning of stable IDs. Hey, I'd love to find out I was wrong, but I have discussed it here before and nobody then had any good solutions.

Scott

GENEALOGY  | Do you know who your ancestors are?   | Scott McGee
-----------+---------------------------------------+---------------------
email: smcgee@genealogy.org                        | What? Me speak for
web: http://genealogy.org/~smcgee/homepage.html    | someone else? Nah!
---------------------------------------------------+--------------------- See my genealogy page at http://genealogy.org/~smcgee and my GenWeb page at http://genealogy.org/~smcgee/genweb Subject: Re: More on stale links To: Scott McGee Date: Wed, 12 Jun 1996 09:32:38 +0100 (BST) From: Ben Laurie Scott McGee wrote: > > Ben replies to my statements saying that asignments of stable ID must happen > in the database where the info is maintained, and then written into the > GEDCOM that is read into a GenWeb database. This all sounds good, but I don't > beleive we have a mechanism in GEDCOM to encode that that is usefull to many > genealogy progrms. Sure, with LifeLines, I can create an encoding method, and > using a tag starting with a '_', I can even make it legal. Brother's keeper or > PAF or any other program will fail to support it, however. If another program > does have a way to do such, what are the chances that any other program will > support it. > > Some of my databases are PAF, some are from a number of other programs, and > a few even from Lifelines, but there is nothing I know of in common with all > these that would allow the assigning of stable IDs. Hey, I'd love to find out > I was wrong, but I have discussed it here before and nobody then had any good > solutions. I suppose it depends what you mean by a good solution. I agree, its a drag that a "legitimate" GEDCOM solution can't be found - but surely every package allows some free text? If so, the obvious solution is to use some markup in some free text field in those packages that can't do it properly. I know this is horrible. I suppose the way to make it less horrible is to have a GEDCOM translation package which turns marked-up GEDCOM into legitimate GEDCOM and vice versa. Cheers, Ben. > > Scott > > GENEALOGY | Do you know who your ancestors are? | Scott McGee > -----------+---------------------------------------+--------------------- > email: smcgee@genealogy.org | What? 
Me speak for > web: http://genealogy.org/~smcgee/homepage.html | someone else? Nah! > ---------------------------------------------------+--------------------- > See my genealogy page at http://genealogy.org/~smcgee > and my GenWeb page at http://genealogy.org/~smcgee/genweb -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. Date: Wed, 12 Jun 96 15:24:54 +0200 From: Anders Andersson Message-Id: <9606121324.AA24045@Mizar.DoCS.UU.SE> To: 100020.1117@compuserve.com, GENWEB@UCSD.EDU Subject: Re: re External references Nicholas Oughtibridge writes: >I think in fact URL is also too restrictive. It might me more appropriate in >some cases to link directly to a contact name and postal address, or to a phone >number. We may be talking about different uses for the links. I was thinking of an implementation of external ("network" in GEDCOM lingo) links used in analogy with links internal to the database, that would enable seamless navigation within as well as between databases. In that case, a contact name and a postal address doesn't seem useful. However, you are quite right that we need a generic mechanism to provide references to arbitrary kinds of resources, including a mere postal address. The current URL specification doesn't include postal addresses, though I see no fundamental reason to why it shouldn't be extended in the future to cover also "off-line resources" like addresses, printed books, archived manuscripts, and physical tombstones. The idea is that any reference to an information resource can be expressed in a standard notation, thus enabling software to deal with the reference itself, even if your computer is unable to actually retrieve the information (I suppose a retrieval attempt for a printed book may put up a dialog box asking the user whether to send a purchase order for it). 
Now, getting the URL spec to include new access methods isn't done by snapping your fingers, and I don't advocate deploying a home-made scheme on the Internet without considerable input from appropriate IETF groups and similar authorities, so it's probably wise to make the URL one of several reference structures used simultaneously within the GenWeb context. As the URL spec develops, we may be able to translate our internal reference structures into a syntax which is understood by Internet software in general, not only GenWeb applications. It's often useful to have more than one way of expressing the same kind of information. One way to keep the reference structure simple is to represent it as a text string. In order to distinguish URLs from non-URLs, you may prefix them by "URL:" (as in "URL:http://ourworld.compuserve.com"). For any other kind of reference, use "X-BOOK:Grolier's Encyclopedia", "X-ADDRESS:Dr Smith/4711 Vermont Ave./Atlantic City 12345/USA", and so on. Note that these latter examples aren't URLs, and while they shouldn't normally be seen outside a GenWeb database, the "X-" prefix should tell any knowledgeable outsider who happens to see it anyway that it's an experimental scheme that they can't interpret without agreement with those who deployed it (us). >One thing that those of us looking at generating Web Pages in Hypertext Markup ^^^^^^ >Language will require is to know when the link is to an object which can be >pointed to by an URL. Thanks for correcting me! I don't know where I got "Meta" from... :-} >The best we can achieve is to keep the domain name >constant by using the genweb.org domain name to the full. That way if a >database moves from one server to another we can keep the links alive. It >pushes towards using a virtual server name as the beginning of the path. 
It still relies on you being able to allocate the same path within the new server as the one you had on the old server, unless you intend to employ a redirection server, in which case you may put a pretty heavy load on it (resolving cross-database links all over the world).

I initially supported the idea of assigning alias names to the database servers, but I'm becoming less and less fond of that solution, as I see the effects on the name space, which gets polluted with multiple (and misleading) names for the same documents.

--
Anders Andersson, Dept. of Computer Systems, Uppsala University
Paper Mail: Box 325, S-751 05 UPPSALA, Sweden
Phone: +46 18 183170   EMail: andersa@DoCS.UU.SE

From: mavrogeorge@genealogysf.com
Date: Wed, 12 Jun 1996 06:56:12 -0700
Subject: Stale links
To: genweb@UCSD.EDU
X-Mailer: SPRY Mail Version: 04.00.06.21

Why not let the individual link id be a concatenation of elements in an individual record? Then it is not necessary that the GEDCOM file itself contain the id, only that there be a standard way of determining the id.

How many elements do we need to make an id unique using only the elements in GEDCOM? surname+firstname+birthyear+ ....??

Then if I wanted to link to a person in your data I would have a process (independent of the data) that says link to surname+firstname+...etc at databasename. The process can determine where that database is located based on the databasename and then find the individual based on the key I supplied.
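As a rough illustration of the "key at databasename" idea in the message above, here is a minimal Python sketch. Everything named in it is an assumption made for the example: the registry contents, the genweb.example URL, the query parameter names, and the choice of exactly three key elements (the message leaves the element list open). It also assumes that names do not themselves contain a "+" character.

```python
from urllib.parse import urlencode

# Hypothetical registry mapping a symbolic database name to a search URL.
# The entry below is a placeholder, not a real GenWeb database.
REGISTRY = {
    "exampledb": "http://genweb.example/cgi-bin/search.cgi",
}


def make_key(surname, given, birth_year=None):
    """Concatenate record elements into a link key such as 'smith+john+1850'."""
    parts = [surname.strip().lower(), given.strip().lower()]
    # An unknown birth year leaves the last element empty.
    parts.append(str(birth_year) if birth_year is not None else "")
    return "+".join(parts)


def resolve(database, key, registry=REGISTRY):
    """Turn 'key at databasename' into a URL the target site could search."""
    base = registry[database]  # raises KeyError if the database is unknown
    surname, given, birth_year = key.split("+")
    query = {"surname": surname, "given": given}
    if birth_year:
        query["born"] = birth_year
    return base + "?" + urlencode(query)


if __name__ == "__main__":
    key = make_key("Smith", "John", 1850)
    print(key)
    print(resolve("exampledb", key))
```

The key is derived from the record rather than stored in it, so the GEDCOM file needs no extra tag. Nothing in the sketch makes the key unique; that depends entirely on which elements are chosen.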
Date: Wed, 12 Jun 1996 11:01:04 -0400 (EDT)
From: Denis Beauregard
To: genweb@UCSD.EDU
Subject: Re: Stale links
In-Reply-To: <199606121356.AA15119@relay.interserv.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 12 Jun 1996 mavrogeorge@genealogysf.com wrote:

> Why not let the individual link id be a concatenation of elements
> in an individual record. Then it is not necessary that the
> GEDCOM file itself contain the id only that there be standard way
> of determining the id.
>
> How many elements do we need to make an id unique using only the
> elements in GEDCOM? surname+firstname+birthyear+ ....??

IMO, this is not enough. Even adding the birth place is not enough. I know a software author who proposed this:

    a few letters (3? consonants?) of the family name
    a few letters of the first name
    a few letters of the father's first name
    a few letters of the mother's family and first names
    birth date

My own tag could be BEA-D-B-B-M-1956 (dashes are added to ease viewing).

That is not enough in a universal system, but here is what could be done:

    standardized (std) family name (based on the country or area where one is living)
    std first name (Denis/Denys/Dennis -> Dennis in Anglo area, Denis in Franco area, etc.)
std father's first name std mother's maiden and first names birth date When data is missing (unknown parents or birth date) then we could have another tag based on a relative, i.e. my father could be BEA-D-B-B-M-1956.F > Then if I wanted to link to a person in your data I would have a > process (independent of the data) that says link to > surname+firstname+...etc at databasename. The process can > determine where that database is located based on the > databasename and then find the individual based on the key I > supplied. That database would include std names. Birth date, family and given names are not enough: given names popularity depends on time period, so 2 given name may be popular the same year at the same place, so 2 unrelated persons would have the same tag. I believe in variable signatures to match 2 persons in a large database, but I too far from code at this time to even try it. My theory is that in a 1st step, we try many methods of signatures to match persons between 2 data sets. In 2nd step, relative are matched. Fine tuning is required so that the signatures are long enough to avoid mismatch, but short enough to find matches. Denis ### Denis Beauregard, genealogiste amateur, Internet: beaur@cam.org ### Page web de genealogie: http://www.cam.org/~beaur/gen/index.html ### Genealogy Web page: http://www.cam.org/~beaur/gen/welcome.html ### Sujets: Quebec, France, Acadie, experts francophones, etc. Subject: Re: Stale links To: mavrogeorge@genealogysf.com Date: Wed, 12 Jun 1996 16:18:54 +0100 (BST) From: Ben Laurie Cc: genweb@UCSD.EDU Reply-To: ben@algroup.co.uk mavrogeorge@genealogysf.com wrote: > > Why not let the individual link id be a concatenation of elements > in an individual record. Then it is not necessary that the > GEDCOM file itself contain the id only that there be standard way > of determining the id. > > How many elements do we need to make an id unique using only the > elements in GEDCOM? surname+firstname+birthyear+ ....?? 
> > Then if I wanted to link to a person in your data I would have a > process (independent of the data) that says link to > surname+firstname+...etc at databasename. The process can > determine where that database is located based on the > databasename and then find the individual based on the key I > supplied. IMHO the difficulty with this method is the nonconstant nature of the data in the record. We have people about whom we currently know _nothing_ except their parents and children. No names, no dates. Nothing. But, one day, we may know something. Or, we may discover that what we though we knew was wrong. Also, we have people with more than one name. Or names but no dates. Obviously, the possibilities are endless. Of course, we do know who we are talking about (the father of X, the second son of Y, for instance) but would links based on these relationships be helpful? Cheers, Ben. -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. To: mavrogeorge@genealogysf.com From: "John S. Quarterman" Cc: "John S. Quarterman" cc: genweb@UCSD.EDU Subject: Re: Stale links In-reply-to: Your message of "Wed, 12 Jun 96 06:56:12 PDT." <199606121356.AA15119@relay.interserv.com> Date: Wed, 12 Jun 96 11:34:59 -0500 Sender: jsq@tic.com >Why not let the individual link id be a concatenation of elements > in an individual record. Then it is not necessary that the >GEDCOM file itself contain the id only that there be standard way >of determining the id. > >How many elements do we need to make an id unique using only the >elements in GEDCOM? surname+firstname+birthyear+ ....?? Well, I tried that sort of thing recently just for making unique IDs for people in an index for a book produced from a database of only about 10,000 people. The concatenation you suggest doesn't work. 
In the nineteenth century and earlier people would commonly have a baby who died, and have another one nine months later with the same name. Also, cousins tended to end up with the same first name, and often the same birth year. Plus very frequently the birth year is unknown. The only thing I found that came close to handling the job was name+father's name+mother's name But of course often the parents are unknown, as well. And this is without even looking at Scandinavian databases, where everybody is Erik Eriksson or the like. Basically, there is no fixed set of attributes that can be depended upon to uniquely identify a person. If you want a unique identifier for a person, you have to make one up and assign it to the person. That is what we've taken to doing for the index problem, using REFN GEDCOM tags in a LifeLines database. To avoid collisions with REFNs imported from other databases, we're using a rudimentary database identifier (Q!) as part of the REFN. That part could clearly be improved. Thanks, John John S. Quarterman Editor, Matrix Maps Quarterly and Matrix News President, Matrix Information and Directory Services (MIDS) http://www.mids.org, +1-512-451-7602, fax: +1-512-452-0127 1106 Clayton Lane, Suite 500W Austin, TX 78723 U.S.A. To: genweb@UCSD.EDU From: mbr@dadd.ti.com (Martin Roberts) Subject: Re: Stale links Date: Wed, 12 Jun 1996 15:27:32 Message-ID: In article mavrogeorge@genealogysf.com writes: >Why not let the individual link id be a concatenation of elements > in an individual record. Then it is not necessary that the >GEDCOM file itself contain the id only that there be standard way >of determining the id. >How many elements do we need to make an id unique using only the >elements in GEDCOM? surname+firstname+birthyear+ ....?? > >Then if I wanted to link to a person in your data I would have a >process (independent of the data) that says link to >surname+firstname+...etc at databasename. 
The process can >determine where that database is located based on the >databasename and then find the individual based on the key I >supplied. There are many more people interested in genealogy than there are computer genealogists. It will be much easier to agree on a standard for links, like url's, than a standard for data records. Martin Date: 12 Jun 96 19:11:42 EDT From: N Oughtibridge <100020.1117@CompuServe.COM> To: Anders Andersson Cc: GENWEB List Subject: Re: re External references Message-ID: <960612231141_100020.1117_EHV63-1@CompuServe.COM> Anders, How about a syntax such as _LINK http://ourworld.compuserve.com/homepages/oughtibridge +1 TYPE URL in a formal GEDCOM syntax and a default value of URL for the type. That would solve the problems for new and old systems (except that a system which was not designed with GENWEB in mind would be limited to URL links). ______________ Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files. See HTTP://ourworld.compuserve.com/homepages/oughtibridge Email 100020.1117@compuserve.com Date: 12 Jun 96 19:11:44 EDT From: N Oughtibridge <100020.1117@CompuServe.COM> To: "Scott McGee (Personal)" Cc: GENWEB List Subject: Re: More on stale links Message-ID: <960612231144_100020.1117_EHV63-2@CompuServe.COM> Scott is concerned that no programs support a linking method now - if we never agree one, they never will! ______________ Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files. 
See HTTP://ourworld.compuserve.com/homepages/oughtibridge
Email 100020.1117@compuserve.com

From: JohnR238@aol.com
Date: Wed, 12 Jun 1996 19:22:37 -0400
Message-ID: <960612192236_326317715@emout08.mail.aol.com>
To: mavrogeorge@genealogysf.com
cc: genweb@UCSD.EDU
Subject: Proposed unique ID

In a message dated 96-06-12 10:21:24 EDT, you write:

>How many elements do we need to make an id unique using only the
>elements in GEDCOM? surname+firstname+birthyear+ ....??

It's interesting that you make this first step in the direction of defining the ID, because we've been moving in the same direction with the KYGENWEB project and my Genealogist's Index to the World Wide Web. Here is my proposal, which will generate an almost unique ID for datasets where we have complete data, and for incomplete data will still allow easy correlation.

    LAST NAME Soundex
    First Name Soundex
    BIRTHDATE
    DEATHDATE

Thus a complete id for me would be 24 characters

    R235J50019530818xxxxxxxx

and hopefully some kind soul will someday fill in the rest of the x's. For incomplete data there would be more likelihood of index collisions, but this could be handled within the search / indexing schemes.

This 24 character id has several advantages.

    Almost unique
    Easy to decipher by hoomans
    Easily sortable and manipulated by computers

The second tier reference file(s) / indexes can identify where the particular id code occurs, and the process of drilling down to a particular data object then becomes manageable.

Anders addresses some valid points regarding the existence of these objects in cyberspace or other universes, and the fact that the reference file can be structured to identify these other sources.
For most of these object classes identification schemes have already been agreed upon (ISBN #, Library of Congress #, IGI reference, home addresses, SS# - whatever) and we should move in the direction of identifying each class that is pertinent to us and agreeing to call it the same thing in the GEDCOM tags. The fact that the current GEDCOM spec does not provide for this means that we should collectively agree to proceed with a standard TAG or NOTE or CONT (or whatever) followed by our object class:reference location:individual id tag. As we move forward, the GEDCOM spec and software vendors will incorporate it. Comments are welcome. John Rigdon GEN WEB Master The Genealogist's Index to the World Wide Web Date: Thu, 13 Jun 1996 14:19:11 -0700 From: "Pete Cook" To: genweb@UCSD.EDU Subject: A unique ID proposal Content-Type: Text/Plain; charset=US-ASCII Content-Disposition: Inline Here is a suggestion for a name independent unique ID. It was originally developed to help in the process of scanning two genealogical data bases in an attempt to identify common lines. Obviously it is strongest when the grandparents of an individual are known. That is often the case when the youngest common ancestor in a line on seperate databases often has several preceeding generations. Addition of the Soundex code has been suggested to provide some name identidy. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ The Chronological Identifier (CID) is a technique for generating a nearly unique identifier for individuals without reference to surname. What is the CID? It is a 10 digit alphanumeric data element that can be generated from genealogical data bases with trivial computational effort. Genealogists operating independently will generate values that permit automated searches for common lines in completely separate data bases. How is it used? Assume that GEDCOM files are available containing the CID as a data value in a specifically named field. 
(The tagname "CID" is proposed.) An investigator will build a table of CIDs from their own local data base, and can then scan external GEDCOM files looking for CID tags. When found, a simple comparison technique determines whether the external CID matches one from the local database. Matching individuals are candidates for members of lines held in common by the two sets of genealogical records. As always in genealogical work, the match must be verified by critical examination of name, dates, etc.

How is the CID computed?

The starting point for the CID is the "C-Vector." That is a seven-element array consisting of the birth years of a base individual, and then that individual's two parents and four grandparents, arrayed in ahnentafel order (IND F M PGF PGM MGF MGM).

The second step is to subtract each of the other birth years from the base individual's birth year. This gives the age of each of these six individuals in the base person's birth year.

Third, a single character is found for each age from the following table:

    Character    Parent Age    Grandparent Age
    0 (zero)     0-15          0-30
    1-9          16-24         31-39
    A-Z          25-50         40-65
    a-y          51-75         66-90
    z            over 75       over 90
    -            unknown       unknown

For example, my C-Vector is:

    1933 1907 1905 1871 1871 1870 1876

My CID then becomes 1933BDWWXR

How are CIDs compared?

To be used, the birth year of the base individual must be known. Then elements corresponding to missing elements in each ID are set to missing in the other ID. Two corresponding valid values must still exist. Then a comparison will indicate whether they match.

My not very rigorous statistics indicate one random match in every 200,000 CIDs with two valid letters. With six valid letters it is one in 34 trillion. Genealogical matching still requires careful work, but this technique may, in effect, greatly increase the needle to hay ratio in the genealogical haystack - if there are any needles to be found.

How effective is it in locating matching ancestors?

In the limited trials I have made, it works pretty well.
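The CID computation and comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's own program: the function names are invented, an ancestor recorded as born after the base individual is treated as unknown (the post does not cover that case), and the comparison requires an exact match on every shared position and on the birth year.

```python
def age_code(age, is_grandparent):
    """Map one ancestor's age in the base person's birth year to a CID character."""
    if age is None or age < 0:
        return "-"  # unknown (negative ages treated as unknown: an assumption)
    # The grandparent column of the table is the parent column shifted by 15 years.
    a = age - 15 if is_grandparent else age
    if a <= 15:
        return "0"
    if a <= 24:
        return str(a - 15)                 # 16..24 -> '1'..'9'
    if a <= 50:
        return chr(ord("A") + a - 25)      # 25..50 -> 'A'..'Z'
    if a <= 75:
        return chr(ord("a") + a - 51)      # 51..75 -> 'a'..'y'
    return "z"


def cid(c_vector):
    """Build a 10-character CID from [IND, F, M, PGF, PGM, MGF, MGM] birth years.

    Unknown ancestor birth years are passed as None.
    """
    base = c_vector[0]
    if base is None:
        raise ValueError("the base individual's birth year must be known")
    codes = []
    for position, year in enumerate(c_vector[1:7]):
        age = None if year is None else base - year
        codes.append(age_code(age, is_grandparent=(position >= 2)))
    return f"{base:04d}" + "".join(codes)


def cids_match(a, b, min_common=2):
    """Compare two CIDs, ignoring positions unknown in either one."""
    if a[:4] != b[:4]:
        return False
    pairs = [(x, y) for x, y in zip(a[4:], b[4:]) if x != "-" and y != "-"]
    return len(pairs) >= min_common and all(x == y for x, y in pairs)


if __name__ == "__main__":
    print(cid([1933, 1907, 1905, 1871, 1871, 1870, 1876]))  # 1933BDWWXR
```

Applied to the C-Vector given in the post, the sketch reproduces the CID 1933BDWWXR. Tolerating small differences between databases would need a looser comparison than the exact one shown here.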
End of line ancestors, of course, don't have enough data. But when two or three generations of a single line have adequate data, they pretty clearly show up as a series of matches, even with some variation values between the two data bases. Author's note: I have been playing with this for several years, and don't know quite where to take it. I think the idea has merit, because I know a lot of genealogists cover the same ground repeatedly without communicating. Computers have opened up wide communication possibilities, but searching through GEDCOM files in their current format is still difficult. I also think it has some potential for reducing the tryanny of surnames that causes over-emphasis on paternal lines. (Everyone has only one paternal line, but dozens of lines through the women of the family.) REFERENCES: Cook, Peter G. "Is my John Cooke your John Cooke?" Genealogical Computing, Vol. 10, No. 1, Jul Aug Sept 1990 Cook, Peter G. "Chronological Ancestor Identification" The New England Computer Genealogist , 1995 Pete Cook p25359@email.mot.com GSTG AZ25 H1670C Motorola Diversified Technologies Services Phone: 602-441-1300 Fax: 602-441-1866 Pager - 1-800-759-8888 pin:2078366 To: Pete Cook Date: Thu, 13 Jun 1996 22:53:28 +0100 (BST) From: Ben Laurie Cc: genweb@UCSD.EDU In-Reply-To: <9606131419.AA11431@137.124.91.6> from "Pete Cook" at Jun 13, 96 02:19:11 pm Reply-To: ben@algroup.co.uk X-Mailer: ELM [version 2.4 PL24 PGP2] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9606132253.aa11339@gonzo.ben.algroup.co.uk> Pete Cook wrote: > > Here is a suggestion for a name independent unique ID. > > It was originally developed to help in the process of scanning two > genealogical data bases in an attempt to identify common lines. Obviously it > is strongest when the grandparents of an individual are known. 
> That is often the case when the youngest common ancestor in a line on
> separate databases has several preceding generations.
[snip]
>
> Addition of the Soundex code has been suggested to provide some name identity.
> How is the CID computed?
> The starting point for the CID is the "C-Vector." That is a seven-element
> array consisting of the birth years of a base individual, and then that
> individual's two parents and four grandparents, arrayed in ahnentafel order
> (IND F M PGF PGM MGF MGM).
>
> The second step is to subtract each of the other birth years from the base
> individual's birth year. This gives the age of each of these six individuals
> in the base person's birth year.
>
> Third, a single character is found from the following table:
>
> Character   Parent Age   Grandparent Age
> 0 (zero)    0-15         0-30
> 1-9         16-24        31-39
> A-Z         25-50        40-65
> a-y         51-75        66-90
> z           over 75      over 90
> -           unknown      unknown
>
> For example, my C-Vector is:
> 1933 1907 1905 1871 1871 1870 1876
>
> My CID then becomes 1933BDWWXR

The trouble is that a huge number of people have a CID of ----------. Nothing simple that is based on information known about the individual is going to work, because that information is often either unavailable or subject to change.

Cheers,

Ben.

--
Ben Laurie                  Phone: +44 (181) 994 6435
Freelance Consultant and    Fax:   +44 (181) 994 6472
Technical Director          Email: ben@algroup.co.uk
A.L. Digital Ltd,           URL: http://www.algroup.co.uk
London, England.

Date: 20 Jun 96 09:55:59 EDT
From: N Oughtibridge <100020.1117@CompuServe.COM>
To: GENWebbers
Subject: Trade marks and stale links
Message-ID: <960620135558_100020.1117_EHV142-1@CompuServe.COM>

I have just been revisiting this month's topics and a thought came to mind.

1. Wes Groleau suggests we wait until the human genome project is complete so that we can have a genuine universal ID for a person (using the GENOME GEDCOM Tag)

2. Does that mean that GENWEBTM.ORG and GENWEB.COM can marry each other since we will depend on genetics for success :-)>

Nicholas
_________________
Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files.

See HTTP://ourworld.compuserve.com/homepages/oughtibridge
Email 100020.1117@compuserve.com

Date: Sat, 22 Jun 1996 12:06:39 -0700
To: usgenweb@sirius.dsenter.com
From: Jeff Murphy
Subject: Re: WEB: Clayton Lib. in Houston and Dallas Lib. Gen dept.
Cc: genweb@UCSD.EDU, Naaman Nickell

At 11:10 AM 6/22/96 -0600, Jim Riddle wrote:
> I sent the Clayton Genealogical Research Library in Houston a brief description of what you guys have accomplished. They are looking at it and the site.
> The Dallas Genealogical Society will establish a link to USGENWEB. This ... Society is aware of your project and says it is one of the best efforts he has ever seen using the power of computers and the internet for genealogical research.

Thanks so much, Jim! Man, that's a nice compliment.

It would be great if we could contact other major libraries around the U.S. Is there a way to contact them via email? Some kind of list of libraries with genealogy departments and their email addresses?

In light of the comment about using the Internet, I'd like to finally talk about my view of an interconnected genealogical database, and how it might be accomplished. Some of you have heard this before, so if you wander off to other mail I won't object.

Functionally, the Internet is nothing more or less than a single large computer. All the individual components are in essence only addresses. Like a hard drive with multiple paths, it is possible to store the data in one (or several) places, the program in another, and the output in a third.

So, if we have a program which can read gedcoms and generate html, does it really matter where the program resides, or the gedcom resides, as long as the end result is a display of the data on the user's screen? No.
And since most of the programming we need is already in place, what we are mostly concerned with is the acquisition of space to put a gedcom online.

Now, one of the things that has not been addressed by the software developers (in my opinion) is the reduction of space. If we could store gedcoms in a compressed (zipped) format, we reduce the size of the gedcom by nearly 90%. My own database compresses by 87%. I would much rather store it compressed, because the space is the one thing I seriously begrudge. Why pay for space for 15Meg when compressed it would only be 2Meg? And even in those cases where pages are generated and stored for later processing, if they could be stored compressed until needed, we would save a great deal of space.

I have been notified of one place where users may store some of their data, and one of our group is looking into it to see if this means an uncompressed gedcom. If the data can be stored there, is there anything that prohibits a site processing a program at another site, referencing the data at this third site, and displaying the output to the user? Not that I know of, in a well-behaved program.

So, what does that mean to those of you who have websites, but no space for your own data? It should mean that it would be possible for you to arrange to display your data from your own page. Oh, it will be slower than if the data and program resided on the same server. But it should be possible.

This, plus the attempt currently going on to find a way to link to the same individual in a different database on the web, will make the envisioned world-wide genealogical database a reality.

We are *so* close. There will come a point in time when the technical problems are solved. Many different people have been working on a solution: Cliff Manis, ELIJAH, Gene Stark, John Rigdon, Scott McGee and others, plus all the various software authors who have been working on html generators. (I wonder if anyone is working to use Java for this?)
It may be some time down the road, but at least we have established the basic structure by location to take advantage of the technology when it gets here.

I appreciate all the work you have done to get the U.S. GenWeb Project to the position of completeness in which it finds itself. After a couple of weeks, we have 23 states online, and 8 more in various stages of completion. And some of them have already got a number of county pages up or being developed.

Jeff Murphy
735 NW 8th
Redmond, Oregon 97756
h. (541) 548-4478
Specializing in the genealogy of Muhlenberg Co., Kentucky USA GenWeb Project: http://www.teleport.com/~jmurphy/states.html
majordomo@nebr.dsenter.com subscribe usgenweb

To: genweb@UCSD.EDU
cc: leverich@rootsweb.com, Jeff Murphy
Subject: Inexpensive Web Space ...
Date: Sat, 22 Jun 1996 17:26:15 -0700
From: Brian Leverich

> I have been notified of one place where users may store some of their data,
> and one of our group is looking into it to see if this means an uncompressed
> gedcom. If the data can be stored there, is there anything that prohibits a
> site processing a program at another site, referencing the data at this
> third site, and displaying the output to the user? Not that I know of, in a
> well-behaved program.
>
> So, what does that mean to those of you who have websites, but no space for
> your own data?

From traffic I'm seeing on GENWEB and private e-mail, I'm getting the sense that disk space is a nasty constraint for many folks.

At RootsWeb, our internal (no accounting or tech support) cost for providing disk space (with a Web server) is running about $1 per MB per year, and may be under 50 cents/year after we've looked at OS-level compressed file systems.

We hadn't intended to support lineage-linked databases at RootsWeb, but we can change that if it looks like we can provide cheaper space than random ISPs. Let me know ...
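
For a rough sense of scale, the figures quoted in this thread can be put side by side: Jeff's database at 15 MB uncompressed versus about 2 MB compressed, and the two per-megabyte rates mentioned above. This is only a minimal sketch, assuming a flat yearly rate per megabyte and ignoring bandwidth, processing, and support costs; the function name `annual_storage_cost` is made up for illustration.

```python
def annual_storage_cost(size_mb: float, dollars_per_mb_year: float) -> float:
    """Yearly cost of keeping size_mb megabytes online at a flat per-MB rate."""
    return size_mb * dollars_per_mb_year


# Sizes and rates are the ones quoted in the thread; everything else is assumed.
for rate in (1.00, 0.50):
    raw = annual_storage_cost(15, rate)    # uncompressed gedcom
    packed = annual_storage_cost(2, rate)  # same gedcom, zipped
    print(f"${rate:.2f}/MB/yr: uncompressed ${raw:.2f}, compressed ${packed:.2f}")
```

At these rates the difference is on the order of ten dollars a year for one database, which may matter more to an individual publisher than to a host.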
BTW, my apologies if I've been slow to respond to correspondence of late -- karen@rand.org and I just moved to a new house and, concurrently, upgraded almost everything at RootsWeb and moved it onto a FT1 one router from the SprintLink backbone. We've been fighting off entropy with a broomstick! -B

--
Dr. Brian Leverich Co-moderator, soc.genealogy.methods/GENMTD-L
RootsWeb Genealogical Data Cooperative http://www.rootsweb.com/
leverich@rootsweb.com

Date: 23 Jun 96 08:52:26 EDT
From: N Oughtibridge <100020.1117@CompuServe.COM>
To: GENWEB List
Subject: Re: Trade marks and stale links
Message-ID: <960623125225_100020.1117_EHV93-4@CompuServe.COM>

Forwarded to GENWEB@UCSD.EDU

---------- Forwarded Message ----------

From: scott couch, INTERNET:scouch@inetworld.net
TO: N Oughtibridge, 100020,1117
DATE: 21/06/96 19:08

RE: Re: Trade marks and stale links

Sender: scouch@inetworld.net
To: N Oughtibridge <100020.1117@CompuServe.COM>
From: scott couch
Subject: Re: Trade marks and stale links

To Whom It May Concern:

Is it possible to identify any person through the genetic code of the blood? What I mean by identify is, who their parents are even if the father is unknown? I realize that computers are a major part in the input of what we call the family tree. But what if the family tree as we know it is not true? Is there a way to know exactly who the parents and lineage are of each human alive when they either give blood or walk through a special scanner?
I may be living in the future of science fiction, but these questions were on my mind. Thank you for helping.

Sincerely Submitted,
Scott Jordan Couch
California, USA
(800)962-0172 (home)
(619)667-0117 (home)

p.s. This is my first posting on the GENWEB newsgroup and I do not know how to post a fresh letter. So, I replied to this letter. I apologize for this chaotic decision.

At 09:55 AM 6/20/96 EDT, you wrote:
>I have just been revisiting this month's topics and a thought came to mind.
>
>1. Wes Groleau suggests we wait until the human genome project is complete so
>that we can have a genuine universal ID for a person (using the GENOME GEDCOM
>Tag)
>
>2. Does that mean that GENWEBTM.ORG and GENWEB.COM can marry each
>other since we will depend on genetics for success :-)>
>
>Nicholas
>_________________
>Nicholas Oughtibridge is the author of uFTi, a Windows program to
>generate World Wide Web pages from GEDCOM files.
>
>See HTTP://ourworld.compuserve.com/homepages/oughtibridge
>Email 100020.1117@compuserve.com

Date: 23 Jun 96 08:52:15 EDT
From: N Oughtibridge <100020.1117@CompuServe.COM>
To: Brian Leverich, GENWEB List, Jeff Murphy
Subject: Re: Inexpensive Web Space ...
Message-ID: <960623125215_100020.1117_EHV93-1@CompuServe.COM>

This is a problem for myself, in two forms.

1. I am preparing static HTML pages of my data which, when I make links to pictures etc, will need large space.

2. I want to publish the tool I use, uFTi, on the Web but can't find a home for it. At the moment only CompuServe users can download uFTi. For a few others, I can FTP it anonymously to their FTP sites.

This causes a problem when looking at renting out Web space. Text, such as HTML and GEDCOM, compresses beautifully. HTML even better than GEDCOM, although it starts off larger. However, images etc are usually already compressed (JPEG is already a lossy compression, GIF is an efficient exact one).
Should a service provider provide a quota based on the sectors of hard disk they supply (which would be attractive to them for HTML and not for GIFs etc) or should they make a guess as to the composition and charge on file size?

In any case, $1 per MB per annum seems reasonable. It is Gene Stark's rate at GENDEX.COM. If Brian can do it for less, I suggest he keeps the change!

What may prove interesting is the new breed of Windows NT servers running Microsoft's IIS (or any other for that matter). Since Version 3.51 of Windows NT, the operating system has been able to compress individual files from the file manager (including, for example, all HTML files).

I do not see much scope for applications doing the compressing, because most HTTP servers couldn't uncompress. We have to wait for widespread client-side processing before we get that good!

Nicholas
_________________
Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files.

See HTTP://ourworld.compuserve.com/homepages/oughtibridge
Email 100020.1117@compuserve.com

From: "W. Wesley Groleau (Wes)"
Subject: Trademarks, stale links, and science fiction
To: genweb@UCSD.EDU
Date: Mon, 24 Jun 96 11:43:22 EST
Mailer: Elm [revision: 70.85]

To WHOM IT MAY CONCERN:

Any remarks by me suggesting that a GEDCOM file contain an encoding of anyone's chromosomes were intended as a joke. Any such remarks in the future have the same intent unless I fail to include a smiley.

Just thought I'd clear that up.

---------------------------------------------------------------------------
W. Wesley Groleau (Wes)                Office: 219-429-4923
Magnavox - Mail Stop 10-40             Home: 219-471-7206
Fort Wayne, IN 46808                   elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

Date: Wed, 26 Jun 1996 21:34:48 +0200
From: Gary Hoffman
Reply-To: ghoffman@UCSD.EDU
Organization: IR/PS, UC San Diego
To: genweb@UCSD.EDU
Subject: Economics of File Compression
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Jeff Murphy wrote:

Now, one of the things that has not been addressed by the software developers (in my opinion) is the reduction of space. If we could store gedcoms in a compressed (zipped) format, we reduce the size of the gedcom by nearly 90%. My own database compresses by 87%. I would much rather store it compressed, because the space is the one thing I seriously begrudge. Why pay for space for 15Meg when compressed it would only be 2Meg? And even in those cases where pages are generated and stored for later processing, if they could be stored compressed until needed, we would save a great deal of space.

---------

In reply, I write my own observations:

This is not a technical issue, but an economic issue, still worthy of our discussion here.

The economics of computing are in constant flux. When storage space is dear or bandwidth is low, files are compressed for storage or transmission. When processors are slow, pre-processing is used to speed response. When memory is dear, programmers write tight code.

At present, storage space is becoming cheap, processor speeds are increasing, bandwidth is increasing, and memory prices are way down. As a result, we get bloated programs, huge databases, and fat HTML pages.

Now come along Internet service providers who need to recover their labor and equipment costs and are finding ways to charge for bandwidth and access time, processing time, and even data storage.
The Internet is not free, but how to allocate its costs among all the players is still a puzzler. Should the publisher of a GenWeb dataset pay a few dollars for its storage or should all the users be charged a few cents for each access to the database?

The GEDCOM format is a very inefficient storage medium. A PAF database of a few kilobytes can become a GEDCOM file of several hundred kilobytes. However, GEDCOM's virtue is that it is a text file that can be handled by most any storage or transmission system and recognized by most any genealogy program. Likewise, HTML is a very inefficient representation of a page of text, but benefits from its text format and the agreement to standards among using programs.

So, if you compress a text file (say, GEDCOM or HTML) for efficient storage, you must de-compress it for further processing or transmission. That costs processing power and response time, and you end up making the user pay more for the time spent online waiting for a response to an HTTP request. This tends to drive users away rather than into a Web site.

I see a trend towards commercially sponsored Web sites, and even some genealogy Web sites. Can we apply this model to the GenWeb? Who will sponsor a large, frequently accessed GenWeb site?

Commercial sponsorship can change Jeff's economics: he won't be worried about compressing his data. Instead he will be concerned about attracting more users to his site (so that the advertiser will have more exposure) by offering rapid access, high quality data, and attractive appearance.

I have some ideas about who some of these sponsors might be, but we can all observe who is sponsoring Web pages now and each come up with our own target list.

I am not in favor of compression but rather of changing the economics that is driving Jeff to consider it.

Cheers,
Gary
--
***************************************************************************
*Gary B. Hoffman, Computing Services Manager     e-mail: ghoffman@ucsd.edu*
*Graduate School of International Relations and Pacific Studies (IR/PS)*
*University of California, San Diego (UCSD)       voice: (619) 534-1989*
*9500 Gilman Dr., La Jolla, CA 92093-0519 USA       fax: (619) 534-3939*
***************************************************************************

From: "W. Wesley Groleau (Wes)"
Subject: Re: Economics of File Compression
To: genweb@UCSD.EDU
Date: Thu, 27 Jun 96 7:34:10 EST
In-Reply-To: <31D190D9.A1B@ucsd.edu>; from "Gary Hoffman" at Jun 26, 96 9:34 pm
Mailer: Elm [revision: 70.85]

:> The GEDCOM format is a very inefficient storage medium. A PAF database of a
:> few kilobytes can become a GEDCOM file of several hundred kilobytes. However,
:> GEDCOM's virtue is that it is a text file that can be handled by most any
:> storage or transmission system and recognized by most any genealogy program.
:> Likewise, HTML is a very inefficient representation of a page of text, but
:> benefits from its text format and the agreement to standards among using
:> programs.
:>
:> So, if you compress a text file (say, GEDCOM or HTML) for efficient storage,
:> you must de-compress it for further processing or transmission. That costs
:> processing power and response time, and you end up making the user pay more
:> for the time spent online waiting for a response to an HTTP request. This tends
:> to drive users away rather than into a Web site.

HTML when used "properly" may be MORE efficient than WordPerfect or MS Word. These programs carry all kinds of baggage about fonts, margins, indents, etc. I know some people load up their HTML with that stuff too (not to mention graphics) but ...

As for GEDCOM and HTML compression, the bottleneck is "you must de-compress it for ... transmission." I continue to be amazed that there is no standard protocol for Web browsers to decompress AFTER transmission.
For graphics, true, one doesn't save much, but text typically compresses to less than half, and as noted, GEDCOM much more.

Decompression time IN MEMORY (which is what you need for display) is quite fast--much faster than compression time. (Compression is slow because the algorithms typically must analyze the data.) In fact, on many systems, file decompression is limited by the I/O, not by the CPU. And for many users, transmission time is as slow as writing to disk.

I just tried an experiment. 24 genealogy-related files, 506+ KB: compression took 26 seconds, to under 304 KB. Decompression took 13 seconds.

Just one GEDCOM of 196 KB compressed in 15 seconds (80%) and decompressed in ONE second.

(Includes disk I/O over NFS Ethernet)

This is HP Unix on a 68030 (I think) with no local disk but plenty of memory.

---------------------------------------------------------------------------
W. Wesley Groleau (Wes)                Office: 219-429-4923
Magnavox - Mail Stop 10-40             Home: 219-471-7206
Fort Wayne, IN 46808                   elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

From: "W. Wesley Groleau (Wes)"
Subject: Correction on compression experiment
To: genweb@UCSD.EDU
Date: Thu, 27 Jun 96 7:40:14 EST
Mailer: Elm [revision: 70.85]

That was HP Unix, 68040, that DOES have local disk, and multiple users.

--
---------------------------------------------------------------------------
W. Wesley Groleau (Wes)                Office: 219-429-4923
Fort Wayne, IN 46808                   elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------
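
The kind of experiment Wes describes above (compress a GEDCOM, then time the decompression in memory) can be repeated with a short script. This is only a sketch under stated assumptions: the thread does not say which compression tool was used, so zlib (DEFLATE) from the Python standard library stands in for it; the sample data is synthetic and far more repetitive than a real GEDCOM; and the names `make_sample_gedcom` and `measure` are invented for illustration. Timings will differ from machine to machine, so none are quoted here.

```python
import time
import zlib


def make_sample_gedcom(n_individuals: int = 2000) -> bytes:
    """Build a synthetic, repetitive GEDCOM-like text (not real data)."""
    lines = ["0 HEAD", "1 SOUR SKETCH", "1 CHAR ASCII"]
    for i in range(1, n_individuals + 1):
        lines += [
            f"0 @I{i}@ INDI",
            f"1 NAME Person{i} /Surname{i % 50}/",
            "1 SEX M" if i % 2 else "1 SEX F",
            "1 BIRT",
            f"2 DATE {1800 + i % 150}",
        ]
    lines.append("0 TRLR")
    return ("\n".join(lines) + "\n").encode("ascii")


def measure(data: bytes) -> dict:
    """Time one in-memory compress/decompress round trip and report savings."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, 9)      # 9 = slowest, smallest output
    t1 = time.perf_counter()
    unpacked = zlib.decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == data              # the round trip must be lossless
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(packed),
        "saved_fraction": 1 - len(packed) / len(data),
        "compress_seconds": t1 - t0,
        "decompress_seconds": t2 - t1,
    }


if __name__ == "__main__":
    for key, value in measure(make_sample_gedcom()).items():
        print(f"{key}: {value}")
```

Because both steps run in memory, the script leaves out the disk and network I/O that Wes notes was included in his figures; reading the input from a file and writing the output back would be needed to compare like with like.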