From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Sep 12 09:14:41 1994
Date: Mon, 12 Sep 1994 18:04:25 --100
To: genweb@UCSD.EDU
Subject: GenWeb2000

Thanks, Gary, for your good work. Do you really have enough resources to maintain a GenWeb archive? But there will be more later...

I think partitioning the net into person info and marriage info should be one of the first decisions. This gives a net of person nodes, marriage nodes, person-marriage links, and marriage-person links. The nodes should be virtual WWW nodes, i.e. it should not be necessary to have actual HTML files. The links should be written as URLs or something similar. These links, which I call PRL (person resource name) and MRL (marriage resource name), should be definable locally. They should be unique for a person, as far as possible. It seems clear that name, date of birth, and place of birth could serve for the PRL, and the names of both partners plus the date and location of the marriage for the MRL.

A big problem with such a scheme is that for people from earlier times there are no dates and no locations. A transition procedure from an undetailed PRL/MRL to a detailed one (as more info becomes known) is necessary.

If we really want to keep multiple nodes for the same person, we need relay stations to manage them. However, I would prefer to remove multiple nodes. We need worms that find such duplicates and propose that the owners remove one of the nodes. Removal should mean shadowing the node from the net. In the net, everybody will be keen to use foreign resources, so reduction should be possible.

At present, most of us have our genealogical info in GEDCOM files. Is GEDCOM really a good basis? As a sequential file it cannot serve well. It could be a backup file behind some GenWeb server.
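As an illustration of the PRL/MRL idea above, here is a minimal Python sketch of how such names might be built from name, birth date, and birth place, and how an undetailed name could be checked against a more detailed one. The `prl:`/`mrl:` prefixes, the `_` placeholder for unknown fields, the alphabetical ordering of partners, and all function names are assumptions made for this example; the message does not specify a syntax.

```python
import re
import unicodedata

UNKNOWN = "_"  # assumed placeholder for a field that is not (yet) known


def _slug(text):
    """Lowercase, drop accents, and turn runs of other characters into '-'."""
    plain = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", plain.lower()).strip("-") or UNKNOWN


def _field(value):
    """Slug of a known value, or the UNKNOWN placeholder."""
    return _slug(value) if value else UNKNOWN


def make_prl(name, birth_date=None, birth_place=None):
    """Hypothetical person name: name / date of birth / place of birth."""
    return "prl:" + "/".join([_field(name), _field(birth_date), _field(birth_place)])


def make_mrl(partner_a, partner_b, date=None, place=None):
    """Hypothetical marriage name; partners are sorted so order does not matter."""
    first, second = sorted([_field(partner_a), _field(partner_b)])
    return "mrl:" + "/".join([first, second, _field(date), _field(place)])


def refines(detailed, undetailed):
    """True if `detailed` agrees with `undetailed` wherever the latter is known."""
    d_kind, _, d_rest = detailed.partition(":")
    u_kind, _, u_rest = undetailed.partition(":")
    d_fields, u_fields = d_rest.split("/"), u_rest.split("/")
    return (
        d_kind == u_kind
        and len(d_fields) == len(u_fields)
        and all(u == UNKNOWN or u == d for d, u in zip(d_fields, u_fields))
    )
```

A check like `refines` is one possible basis for the transition procedure: a node published under an undetailed name could be re-published under any detailed name that refines it.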
The GenWeb server should deliver:

- if a complete PRL/MRL is delivered: a Person/Marriage page with links to marriages/parents/husband/wife/children
- if additionally a positive depth number is delivered: an outline of the upward subnet (ancestry) starting from the node
- if additionally a negative depth number is delivered: an outline of the downward subnet (descendance) starting from the node

We need a common database tool for GenWeb. It should be a simple index-sequential database tool and not a full-fledged commercial database. What we should define is a protocol that can be implemented on various databases. Additionally, a GenWeb database tool should be available.

I doubt that a GEDCOM-to-HTML converter is the right thing, because in GEDCOM files the person info is usually not in the same place as the link info. One has to scan the whole GEDCOM file to bring them together. (In the marriage case, the links are in one place, but we have to translate them into PRLs.) I've written my own GEDCOM-to-HTML converter and used it to translate LDSROYAL.GED (12 000 persons) and my own ancestry (5-7000 persons). I have to manage thousands of files and want to get rid of them. That is, my converter translates the whole GEDCOM file into a bunch of HTML files, although on any one access we need only one HTML page: a person or a marriage.

We should study the work of Tim Berners-Lee, but I doubt we really need it. We have to manage a really specialized kind of info, and PRLs or MRLs, much as Birger has proposed them, can serve us well.

The point made by Bill (and the SVPAFUG), that we should collect person/marriage-relevant mail and save it, is really interesting. The question is what it should be used for. In normal genealogy, information about others who are researching the same people as I am is very helpful. The reason is: I want to know what they know. In the net, their knowledge is accessible. Now, reasons of priority could speak for this. We should make this clearer. What purpose can it serve?
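The depth-number convention proposed earlier in this message (positive for ancestry, negative for descendance) can be sketched as follows. This is only an illustration under stated assumptions: the in-memory dictionary layout, the keys `name`, `parents`, and `children`, and the function name `outline` are all hypothetical, and a real server would look nodes up by PRL rather than by short ids.

```python
def outline(people, start, depth):
    """Return an indented outline starting at `start`.

    depth > 0: follow parent links upward for `depth` generations.
    depth < 0: follow child links downward for `-depth` generations.
    depth == 0: only the starting node.
    """
    relation = "parents" if depth > 0 else "children"
    lines = []

    def walk(person_id, level):
        person = people.get(person_id)
        if person is None:  # link points outside this local net
            return
        lines.append("  " * level + person["name"])
        if level < abs(depth):
            for next_id in person.get(relation, []):
                walk(next_id, level + 1)

    walk(start, 0)
    return lines
```

For example, with a four-person net, `outline(people, "c", 2)` would list the child, then each parent indented once, then any known grandparents indented twice.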
The work of today's researchers is the net today. If future researchers want to destroy the net, they will do so.

Each proposal for PRL/MRL has to overcome the language problem. Most of you, if you have seen my GenWeb subnets on Charlemagne, might have been bothered by the German and Dutch naming. In extending the system, I have to decide: should I use German, French, or English names? Our net should be international. Therefore, we need bridges or net versions.

Finally: most of the criteria of the SVPAFUG are quite acceptable. They are ours as well.

Can somebody explain to me: what is the Richard Austin database? Can somebody send me royal92?

From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Sep 12 09:14:43 1994
Date: Mon, 12 Sep 94 11:53:32 EDT
To: genweb@UCSD.EDU
Subject: Newcomers/FAQ

I am way down on the learning curve here and would appreciate suggested references/sources for me to get up to speed on the basics of this new technology. Seems since I don't have Mosaic I can't even look at the pages that are being posted. True? Help - some pointers to basic references please. Thanks

From ucsd.edu!list-relay@netcomsv.netcom.com Mon Sep 12 13:14:39 1994
To: GenWeb@ucsd.edu
Date: Mon, 12 Sep 1994 12:52:59 PDT
Subject: Re: Newcomers/FAQ

Brian Mavrogeorge wrote:

I am way down on the learning curve here and would appreciate suggested references/sources for me to get up to speed on the basics of this new technology. Seems since I don't have Mosaic I can't even look at the pages that are being posted. True? Help - some pointers to basic references please. Thanks

My reply:

Brian, it is as difficult explaining the rainbow to an earthworm (which has no visual sensors) as it is to explain the fine points of HTML document preparation to someone who does not have access to Mosaic. No offense to you, of course, but the best pointer is to get up and running on Mosaic and then point it to my demo page at URL http://irpsbbs.ucsd.edu/gene/genedemo.html.
Any explanations you hear before doing this will just blow right past you. Please stay in touch and let us know when you've joined the Mosaic generation.

Gary
***************************************************************************
*Gary B. Hoffman, Computer/Language Lab Director e-mail: ghoffman@ucsd.edu*
*Graduate School of International Relations and Pacific Studies (IR/PS)*
*University of California, San Diego (UCSD) voice: (619) 534-7733*
*9500 Gilman Dr., La Jolla, CA 92093-0519 USA fax: (619) 534-3939*
***************************************************************************
--------------------------------------------------------------------------------

From ucsd.edu!list-relay@netcomsv.netcom.com Mon Sep 12 14:33:24 1994
To: GenWeb@ucsd.edu
Date: Mon, 12 Sep 1994 14:25:03 PDT
Subject: WWW Conference '94

Well, I will not be presenting a session at October's WWW Conference: Mosaic and the Web, for two reasons:

1) There were proposals from two presenters named Hoffman and an (in)efficient assistant deleted the references to one of them. Guess which one.

2) The conference coordinator did remember me and my proposal and would have put me into a panel anyway, despite #1, except that there was not a panel where this would have fit.

So, the committee did not accept the topic I proposed of Genealogy on the Web. But I learned that I should have couched the proposal in terms the "technoids" would relate to, such as "distributed database application" and "multiple hypertext linkages" rather than "pedigrees" and "ancestral information."

But I _was_ offered a one-meter square poster area to show off my proposal to passers-by at the conference. If I want, I might get access to a workstation on the Internet to show off whatever I want to whoever is interested, but my topic will not be included in the proceedings of the conference.

I have concluded that I will not attend the WWW Conference '94. But don't let this stop anyone from attending.
In fact, there should be some good information that may help us with the GenWeb project. I encourage anyone else who may wish to attend to post relevant findings to us here. For further information on the conference, please see URL http://www.ncsa.uiuc.edu/SDG/IT94/IT94Info.html Cheers, Gary *************************************************************************** *Gary B. Hoffman, Computer/Language Lab Director e-mail: ghoffman@ucsd.edu* *Graduate School of International Relations and Pacific Studies (IR/PS)* *University of California, San Diego (UCSD) voice: (619) 534-7733* *9500 Gilman Dr., La Jolla, CA 92093-0519 USA fax: (619) 534-3939* *************************************************************************** From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Sep 12 15:33:39 1994 Date: Tue, 13 Sep 94 00:11:02 +0200 To: genweb@UCSD.EDU, svpafug@rahul.net Subject: Re: GenWeb2000 Implementation Well, my message to svpafug@rahul.net was not in response to any particular query, but prompted by what I considered a misguided attempt at having HTML code produce a particular layout, which it isn't intended for. I didn't send my message to the GenWeb list as I thought it would be of little general interest. Bill Minnick writes: >Based on your observation, I'll design HTML pages for 640x480 monitors; higher >res displays will have a lot of blank space to the right side. This appears >to be a shortcoming of HTML which we have to live with. Do you concur? Whether it's a shortcoming or a feature of HTML is probably a matter of dispute. I have based my opinions largely on the guidelines written by Berners-Lee, and they say pretty clearly that HTML should not be tweaked into formatting the resulting text in any particular way, as that's the job of the client software. The purpose of this "shortcoming" is that HTML documents should be viewable on a wide variety of platforms, including plain old monospaced text screens. 
Thus it makes no sense to design HTML pages for any particular screen size, as you can't tell which fonts are being used on the client side. I may be privileged having an 1152x900 screen on my Sun, but my Mosaic is set up to use only about half that screen width for its window by default (and I can change the window size as I wish). Others may be stuck with Lynx on 80x24 character VT100 screens. All of us should be able to get the most possible out of the Web, although Lynx users will have to do without the sounds and images (for obvious reasons). Therefore, trying to center text in HTML by adding whitespace or other fillers means you are designing your document for a particular combination of client hardware and software, while those using a different platform most likely will get an aesthetically less pleasing result. The same goes for using header elements to produce different fonts for text--the result is unpredictable. If you want maximum control over page layout, you should use PostScript rather than HTML, but that's hardly an option here. Instead, just put

<H1> and </H1>

around your header, without any extra graphic filler. NCSA Mosaic will left-justify the header and use a big font, Lynx will center it and convert it to uppercase, and yet other clients may do otherwise. The choice is not yours. As for comments on your GenWeb proposal in particular, I'll have to defer them until I've had time to study it. I'm a newcomer to the GenWeb list, and I'm not familiar with the concepts you are discussing. -- Anders Andersson, Dept. of Computer Systems, Uppsala University Paper Mail: Box 325, S-751 05 UPPSALA, Sweden Phone: +46 18 183170 EMail: andersa@DoCS.UU.SE From ucsd.edu!list-relay@netcomsv.netcom.com Mon Sep 12 16:53:39 1994 To: GenWeb@ucsd.edu Date: Mon, 12 Sep 1994 16:40:03 PDT Subject: Re: Explaining a rainbow to an earthworm I meant no offense to Brian, or to earthworms. And I mistakenly referred to Mosaic when I should have written "WWW Browser" instead. With Lynx, you can browse Web-space on the Internet as well as with other browsers except that you don't see pictures. So I guess Lynx users can see light, but not colors, using the imagery of the earthworm. Anyway, to Brian and others not using a WWW browser, that's where you gotta be before these conversations make sense. Gary *************************************************************************** *Gary B. Hoffman, Computer/Language Lab Director e-mail: ghoffman@ucsd.edu* *Graduate School of International Relations and Pacific Studies (IR/PS)* *University of California, San Diego (UCSD) voice: (619) 534-7733* *9500 Gilman Dr., La Jolla, CA 92093-0519 USA fax: (619) 534-3939* *************************************************************************** From UCSD.EDU!list-relay@netcomsv.netcom.com Tue Sep 13 00:20:27 1994 To: genweb@UCSD.EDU Subject: Re: Your Comments on the SVPAFUG "Alternate Demo" Date: Mon, 12 Sep 1994 23:52:34 REPLY TO: Birger A. 
Wathne
FROM: Bill Minnick, SVPAFUG

If I send you my 2 Mbyte GEDCOM file of the Richard Austin database, can you put it online using your software and suggested approach? Do you have room on your computer equipment?

YOU WROTE:
>I looked at your (SVPAFUG's) alternative demo.
>I think having the children and basic marriage info on the first page
>looked up is nicer for the user.
>Of course, this means duplicated data *if* you register in html format.
>I plan to use a genealogy program centered around GEDCOM, and to use
>the multimedia tags in GEDCOM to include info about pictures, etc. I'll
>then use a CGI gateway between the web and my program to look up the info
>and generate HTML on the fly. This avoids duplicated data. And it lets me
>have my data in a database instead of in x000 files...
>You can see a very small first start (just to test that the concept
>worked) with a *very* small data base (4 persons...) at
>http://www.vest.sdata.no/cgi-bin/ll-search/I3?Index
>This invokes the gateway ll-search with the argument 'Index' and an env variable
>set to I3 (the person's internal index number in the base).

Yours is an interesting alternative to spewing out thousands of HTML pages. I am interested in getting wide feedback on the pros and cons of both approaches.

I see the many-HTML-pages approach as having the advantages of easy entry and of not requiring middlemen like ourselves to commit full time to maintaining the software. I'm wondering why I should care if the HTML pages represent many files. This is a burden for the computer, not people, which is where I want the burden.

I envision 1) a GEDCOM-to-HTML-page converter program for doing bulk genealogy transfers to GenWeb; and 2) perhaps a fill-in-the-blanks sheet for those who want to do direct entry into HTML pages. Simplicity of entry and direct entry by anyone are keys to the global success of GenWeb.
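To make the two approaches easier to compare, here is a minimal Python sketch of the on-the-fly idea: read the GEDCOM once into in-memory records keyed by cross-reference id, then render a single person page on request. It is not the ll-search gateway or LifeLines; the names `parse_gedcom` and `render_person` and the `?id=` link format are hypothetical, and only level-0 and level-1 GEDCOM lines are handled.

```python
from html import escape


def parse_gedcom(text):
    """Collect level-0 records and their level-1 tags into a dict keyed by id."""
    records = {}
    current = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        if level == "0":
            if second.startswith("@") and len(parts) == 3:
                current = {"TYPE": [parts[2]]}      # e.g. INDI or FAM
                records[second.strip("@")] = current
            else:
                current = None                      # HEAD, TRLR, and similar
        elif level == "1" and current is not None:
            value = parts[2] if len(parts) == 3 else ""
            if value.startswith("@"):               # pointer to another record
                value = value.strip("@")
            current.setdefault(second, []).append(value)
    return records


def render_person(records, person_id):
    """Build one person page with links to the related marriage records."""
    person = records[person_id]
    name = person.get("NAME", ["(unknown)"])[0].replace("/", "").strip()
    lines = ["<TITLE>%s</TITLE>" % escape(name), "<H1>%s</H1>" % escape(name), "<UL>"]
    for fam_id in person.get("FAMC", []):           # family in which this person is a child
        lines.append('<LI><A HREF="?id=%s">Parents\' marriage</A>' % escape(fam_id))
    for fam_id in person.get("FAMS", []):           # family in which this person is a spouse
        lines.append('<LI><A HREF="?id=%s">Marriage</A>' % escape(fam_id))
    lines.append("</UL>")
    return "\n".join(lines)
```

The same `render_person` function could equally be run in a loop over all records to write static files, so the choice between the two approaches is mostly about when the rendering happens, not how.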
I don't have the resources or time to develop database-to-HTML on-the-fly capability, but I do have the money to buy 1 Gig drives at $495 (the price last weekend). Further, I am considering as an HTTP server a 486 or Pentium CPU running a Unix package, or perhaps Windows NT running an available HTTP server for that platform.

Perhaps the "ROBOT" or web "Spiders" like the one at URL http://www.biotech.washington.edu/WebCrawler/WebQuery.html can be used to index every word in the GenWeb pages. Does anyone know the capacity of these robots, and are they planning to implement the selective fuzzy search capability we will need to locate people efficiently in a billion-person GenWeb?

I invite anyone to jump in to comment on any of this. When I sense the good new ideas are drying up, I'm going to start putting databases on the web to begin getting some real experience, perhaps learning the hard way, and hopefully getting feedback from everyone.

UNT, Bill Minnick, svPAFug, Santa Clara, CA

From UCSD.EDU!list-relay@netcomsv.netcom.com Tue Sep 13 00:20:29 1994
To: genweb@UCSD.EDU
Subject: Re: GenWeb2000: GEDCOM-to-HTML Converter
Date: Tue, 13 Sep 1994 00:04:49

TO: Gene Stark
FROM: Bill Minnick, svPAFug

>Bill Minnick wrote
>>We plan to write a GEDCOM to HTML converter and place the 6000 name Richard
>>Austin data base in GenWeb within two months. This will give us a means to
>>experiment with search/index techniques, and this data base will also serve as
>>a great example of how to document sources!

GENE STARK WRITES:
>I have already written such a converter, which is (at least) capable
>of processing the 30K line "royal92" file from ROOTS-L. I am sure that
>it would be much less effort to process the "Richard Austin data base"
>with my program than to rewrite another converter from scratch.
>I am willing to help out with any modifications necessary to get the
>database processed.

GENE: I tried out your DOS GED2HTML.EXE; found it quite interesting.
Your solution is to display the GEDCOM text in an HTML page, and build the links to jump around the lines of GEDCOM to family members and marriages, etc. I'm looking for a conversion to an "Individual Page" and a "Marriage Page" like the ones in our "ALTERNATIVE DEMO" at URL: http://www.rahul.net/svpafug. I think this would require quite a different program than your present one. Would you consider writing the program to generate these HTML Pages? - Bill Minnick From UCSD.EDU!list-relay@netcomsv.netcom.com Tue Sep 13 00:20:35 1994 To: genweb@UCSD.EDU Subject: Re: Alternative Demo Date: Tue, 13 Sep 1994 00:13:25 TO: Herbert Stoyan FROM: Bill Minnick YOU WRITE: >You should remove the parent links from the person page and include a link the the parents marriage instead. This is an excellent idea; Thanks. I'll try it in the Alternative Demo as soon as I can get some time. Regards, Bill Minnick, svPAFug From UCSD.EDU!list-relay@netcomsv.netcom.com Tue Sep 13 02:53:34 1994 Date: Tue, 13 Sep 1994 05:46:42 -0400 To: svpafug@rahul.net (Bill Minnick) Cc: genweb@UCSD.EDU Subject: Re: GenWeb2000: GEDCOM-to-HTML Converter > >Bill Minnick wrote > >>We plan to write a GEDCOM to HTML converter and place a the 6000 name Richard > >>Austin data base in GenWeb within two months. This will give us a means to > >>experiment with search/index techniques, and this data base will also serve as > >>a great example of how to document sources! > > GENE STARK WRITES: > >I have already written such a converter, which is (at least) capable > >of processing the 30K line "royal92" file from ROOTS-L. I am sure that > >it would be much less effort to process the "Richard Austin data base" > >with my program than to rewrite another converter from scratch. > >I am willing to help out with any modifications necessary to get the > >database processed. > > GENE: I tried out your DOS GED2HTML.EXE; found it quite interesting. 
Your That's interesting, since I did not supply a DOS program, only C and Yacc source, and I have not compiled the program under DOS, only under Berkeley Unix. Are you sure you obtained the version of the program from ftp://cs.sunysb.edu/pub/TechReports/stark/HOME.html If it was indeed my program you tried, that must mean you were able to compile it under DOS. Since I didn't do this, I would be interested to know what C compiler you used, whether you had to make any changes, and what build script, etc. you used. > solution is to display the GEDCOM text in an HTML page, and build the links to > jump around the lines of GEDCOM to family members and marriages, etc. > Yes, this is in fact what I did, even though I wonder if it was my program you tried. > I'm looking for a conversion to an "Individual Page" and a "Marriage Page" > like the ones in our "ALTERNATIVE DEMO" at URL: > http://www.rahul.net/svpafug. > I think this would require quite a different program than your present one. > Would you consider writing the program to generate these HTML Pages? My program is a syntax-directed translator written in YACC. I did it this way so that it would be easily modifiable to produce different output. The program should not be difficult to modify to do what you say. Do you have, say, a small sample of the input and output format that you would like that you could E-mail me? - Gene Stark From UCSD.EDU!list-relay@netcomsv.netcom.com Tue Sep 13 03:53:32 1994 Date: Tue, 13 Sep 1994 06:35:09 -0400 To: svpafug@rahul.net (Bill Minnick) Cc: genweb@UCSD.EDU Subject: Re: GenWeb2000: GEDCOM-to-HTML Converter > >Bill Minnick wrote > >>We plan to write a GEDCOM to HTML converter and place a the 6000 name Richard > >>Austin data base in GenWeb within two months. This will give us a means to > >>experiment with search/index techniques, and this data base will also serve as > >>a great example of how to document sources! 
> > GENE STARK WRITES: > >I have already written such a converter, which is (at least) capable > >of processing the 30K line "royal92" file from ROOTS-L. I am sure that > >it would be much less effort to process the "Richard Austin data base" > >with my program than to rewrite another converter from scratch. > >I am willing to help out with any modifications necessary to get the > >database processed. > > GENE: I tried out your DOS GED2HTML.EXE; found it quite interesting. Your > solution is to display the GEDCOM text in an HTML page, and build the links to > jump around the lines of GEDCOM to family members and marriages, etc. Oops, I just read your message again, and it is now clear that the program you tried was *not* my program. My program does not display GEDCOM text in an HTML page -- it converts the GEDCOM information into an HTML document. You can get my program from: ftp://cs.sunysb.edu/pub/TechReports/stark/HOME.html You can see a demo of the output of my program by visting the URL: ftp://starkhome.cs.sunysb.edu/pub/HTML/gene/personal/family_history/home.html This is over a 14.4K modem link to my home machine, and it takes about 40 seconds to dial, so if your browser times out, give it another shot right away and you should get through. The document isn't finished, but if you look at the first few sections and the "Index of Persons", you should be able to get the idea. >I'm looking for a conversion to an "Individual Page" and a "Marriage Page" >like the ones in our "ALTERNATIVE DEMO" at URL: > http://www.rahul.net/svpafug. I just tried this, but couldn't get through to www.rahul.net. My impression about this scheme is that it will be very slow for larger databases, as a browser will not be able to do random access of a particular person or marriage, but will have to do a linear scan of the entire "Individual Page" or "Marriage Page" each time you want to follow a link. 
- Gene Stark

From UCSD.EDU!list-relay@netcomsv.netcom.com Tue Sep 13 19:15:25 1994
Date: Tue, 13 Sep 1994 18:23:12 -0700
To: genweb@UCSD.EDU
Subject: GenWeb 2000: Whoa...
Cc: jjones@alabama.nas.nasa.gov

I have been sitting back, listening to the discussion of GenWeb, and thinking about it all. I am pleased to see the progress on the discussions since I last discussed similar ideas months ago with Bill Minnick and others. I think Gary's whole idea of GenWeb is the Genealogy of the Future, and the future begins now.

Allow me to comment on some past discussion:

ghoffman@ucsd.edu (Gary Hoffman) says:
| Anyway, to Brian and others not using a WWW browser,
| that's where you gotta be before these conversations make sense.

I disagree. You are looking at an end product instead of how to get there. The discussions should not be solely about the mechanics of multimedia viewing of genealogical data. Rather, I see GenWeb as comprising three separate but interconnected tasks:

o Collection of data (submission, entry)
o Analysis of data (verification, merger, correction)
o Dissemination of data (retrieval tools: GEDCOM, HTML, etc...)

And each of these can be broken down into many areas of discussion.

I think we as a group need to step back from Mosaic and multimedia issues, and discuss the overall structure of GenWeb. We have got to talk about more than playing with HTML if we want this idea to blossom into something of use to genealogists around the world. And I strongly suggest we discuss and decide upon this structure before Web-Fever-inspired ideas start dictating a structure to us.

I envision GenWeb being a subset of a large Genealogy resource network; let's call it GenNET. Consider the following fictitious scenario:

GenNET consists of nodes on the Internet (and other nets) dedicated to genealogy research.
Such a node COULD be a Surname Research Group (SRG) such as the Austin Family Association of America (AFAOA), who are dedicated to tracing all the American AUSTIN lines.

Consider further that the AFAOA "node" provides the following (DATA COLLECTION) services to its members:

o Email submission of GEDCOM and PAF data
o Ftp submission of data
o GUI interface for inputting data

The "worker bees" of AFAOA are the members. They are divided into groups, each working on part of the AUSTIN family tree.

They communicate via email over a news-group maintained by the AFAOA node. Each and every email message is saved for future reference. Periodically these saved email messages are automatically indexed and archived.

As a work group finds new information, they compare it (DATA ANALYSIS) against their known data, i.e. their verified "final" database. As new tid-bits come in, they are discussed via email, and merged into a "research" database. The members of the group hash out the conflicts of merging any data, and together reach a decision on what to add to the "final" database.

Any time a member of the group wants to access data (DATA DISSEMINATION) from either the "final" or the "research" database, they have multiple choices:

o Automatic database query via email
o WWW access (Mosaic, Lynx)
o FTP retrieval

The email query provides the member the ability to request that any report be run on a specified database on specified individual(s), and have the result emailed back as soon as the report is generated.

The WWW (Mosaic) query provides the member with the ability to extract and view any data from the database (via an automatic GEDCOM query of said data, converted to HTML on the fly). No disk space is wasted in maintaining hundreds of thousands of HTML documents, because they are created as needed, and then removed a short time later.
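The "created as needed, and then removed a short time later" behaviour can be sketched as a small expiring page cache. This is a hedged illustration only, not a description of LifeLines or any existing gateway: the class name `PageCache`, the `render` callback, the five-minute default lifetime, and the injectable `clock` are all assumptions made for the example.

```python
import time


class PageCache:
    """Keep generated pages in memory for a limited time, then drop them."""

    def __init__(self, render, ttl_seconds=300, clock=time.monotonic):
        self._render = render      # function: record id -> HTML text
        self._ttl = ttl_seconds    # how long a generated page is kept
        self._clock = clock        # injectable so the behaviour can be tested
        self._pages = {}           # record id -> (expiry time, HTML text)

    def get(self, key):
        """Return the page for `key`, generating it only if no fresh copy exists."""
        now = self._clock()
        self._evict(now)
        entry = self._pages.get(key)
        if entry is None:
            entry = (now + self._ttl, self._render(key))
            self._pages[key] = entry
        return entry[1]

    def _evict(self, now):
        """Remove every page whose lifetime has run out."""
        expired = [k for k, (deadline, _) in self._pages.items() if deadline <= now]
        for k in expired:
            del self._pages[k]

    def __len__(self):
        return len(self._pages)
```

Storage then scales with the number of pages recently requested rather than with the number of individuals in the database, which is the trade-off the scenario relies on.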
All of this is serviced by a genealogy program that is capable of maintaining a database of one billion (actually 10^9 minus 1) individuals, and capable of generating ANY type of report.

Each SRG can determine if they wish to open their research to the rest of the GenNET. And when they do add their site to the Web, their homepage can contain links to related SRG nodes, genealogy sites, etc.

So, what do you think of this scenario? Interesting? Futuristic? Impossible? What would you say if I told you every detail I just described is currently possible? Right now.

I have beta tested an email database query system with over 550,000 individuals in the database. I can get a reply to my query within minutes.

The genealogy program I described is a Unix application called LifeLines (of which I was one of the first beta testers a couple of years ago). The capabilities named exist today.

Birger A. Wathne has discussed GEDCOM -> HTML query/conversion on the fly. The database engine he is using is LifeLines.

It's just that none of this exists in one place. I would like to work on bringing the above scenario into existence in one place, to create the AFAOA node (with the Richard Austin database) as a test case. I just need a machine and a network connection...

Conclusion: We as a group need to discuss more than Mosaic issues. We need to discuss the pros and cons of a scenario like the one above. We need to define what we would like GenNET to look like and what functionality it will have. But we need to have the flexibility to take advantage of new technology. We have the chance to create the genealogy environment of the future. Let's take our time, think it out carefully, and suggest a structured, orderly foundation on which to build our ideas.
-James
_______________________________________________________________________________
James Patton Jones                              email: jjones@nas.nasa.gov
Parallel Systems Support                        phone: 415 604 4369
Computer Sciences Corporation                   home: 415 571 6762
                                                fax: 415 604 4377
Numerical Aerodynamic Simulation (NAS) Facility, M/S 258-6
NASA Ames Research Center, Moffett Field, California 94035-1000, USA
_______________________________________________________________________________

From UCSD.EDU!list-relay@netcomsv.netcom.com Wed Sep 14 05:15:53 1994
Date: Wed, 14 Sep 94 07:55:57 EDT
Subject: Re: GenWeb 2000: Whoa...
To: "James P. Jones" , genweb@UCSD.EDU

I liked James' note and imagine things will head that way. Eventually, we should hope that some permanent, motivated, well financed, non-profit organization will step in and help out as a repository/developer (and no, I'm not LDS...but that group is an obvious possibility). The point is, of course, that we are mostly hobbyists and often need to let fun stuff fall to the side...or lack resources for growth/maintenance.

George (hbladm47@uconnvm.uconn.edu)

From UCSD.EDU!list-relay@netcomsv.netcom.com Wed Sep 14 05:15:59 1994
Date: Wed, 14 Sep 94 08:09:22 EDT
To: genweb@UCSD.EDU
Subject: Directions

Re: GenWeb 2000: Whoa, 13 Sep 1994 by James Patton Jones, jjones@nas.nasa.gov

James,

As a very recent new subscriber to GenWeb I was about to pull the plug since it looked like a bunch of tech-weenies rather than genealogists. Your very thoughtful post appears to set it on a track toward being useful to me.

I have 3300+ documented related people in my genealogy database, I have been a subscriber to the ROOTS-L list on the Internet for more than a year, and I am retired from a career in computing, so I have some knowledge of the subjects involved. I don't have time (nor the inclination) to dig into the details of high-tech, state-of-the-art data management technology - I did that in a former life - I just want the use of it!
If GenWeb is for genealogists, I'll remain a subscriber. Maybe some lurking on ROOTS-L at LISTSERV@VM1.NODAK.EDU would provide some insight.

Dean Wheaton
wheaton@marconi.w8upd.uakron.edu

From UCSD.EDU!list-relay@netcomsv.netcom.com Wed Sep 14 08:53:16 1994
To: genweb@UCSD.EDU
Subject: Re: GenWeb2000: GEDCOM-to-HTML Converter
Date: Wed, 14 Sep 1994 08:31:21

TO: Gene Stark & Michael Cooley
FROM: Bill Minnick
SUBJECT: GEDCOM-HTML Converter Program

Bill Minnick WROTE:
>> GENE: I tried out your DOS GED2HTML.EXE; found it quite interesting.

Gene Stark WROTE:
>That's interesting, since I did not supply a DOS program, only C and Yacc
>source, and I have not compiled the program under DOS, only under Berkeley
>Unix.
> Are you sure you obtained the version of the program from
> ftp://cs.sunysb.edu/pub/TechReports/stark/HOME.html

It turns out that I had tried the program provided by Mike Cooley, and not your version, Gene. Sorry for the mixup; Mike Cooley gets the credit. Please check with Mike as to which version he used, and how he made the DOS conversion.

Bill Minnick WROTE:
>> I'm looking for a conversion to an "Individual Page" and a "Marriage Page"
>> like the ones in our "ALTERNATIVE DEMO" at URL:
>> http://www.rahul.net/svpafug.
>> I think this would require quite a different program than your present one.
>> Would you consider writing the program to generate these HTML Pages?

Gene Stark WROTE:
>My program is a syntax-directed translator written in YACC. I did it this
>way so that it would be easily modifiable to produce different output.
>The program should not be difficult to modify to do what you say.
>Do you have, say, a small sample of the input and output format that you
>would like that you could E-mail me?
Yes Gene, I will prepare and send separately a GEDCOM which will produce the HTML pages now visible in the "Alternative Demo" at URL: http://www.rahul.net/svpafug

- Bill Minnick

From UCSD.EDU!list-relay@netcomsv.netcom.com Wed Sep 14 10:36:33 1994
To: genweb@UCSD.EDU
Subject: Re: GenWeb 2000: Whoa...
Date: Wed, 14 Sep 1994 08:42:15

James: Your statement of how to proceed with the GenWeb project was thoughtful and far reaching. Thanks for putting the essence of our discussions last Sunday into a well organized statement of the problem and possible solution path.

-- Bill Minnick

From UCSD.EDU!list-relay@netcomsv.netcom.com Wed Sep 14 15:53:51 1994
Date: Wed, 14 Sep 1994 18:23:42 -0400
To: svpafug@rahul.net (Bill Minnick)
Cc: genweb@UCSD.EDU
Subject: Re: GenWeb2000: GEDCOM-to-HTML Converter

Bill Minnick writes:
>Bill Minnick WROTE:
>>> GENE: I tried out your DOS GED2HTML.EXE; found it quite interesting.
>
>Gene Stark WROTE:
>>That's interesting, since I did not supply a DOS program, only C and Yacc
>>source, and I have not compiled the program under DOS, only under Berkeley
>>Unix.
>
>> Are you sure you obtained the version of the program from
>
>> ftp://cs.sunysb.edu/pub/TechReports/stark/HOME.html
>
>It turns out that I had tried the program provided by Mike Cooley, and not
>your version, Gene. Sorry for the mixup; Mike Cooley gets the credit. Please
>check with Mike as to which version he used, and how he made the DOS
>conversion.

Sheesh. I give up on this. I am not very interested in how Mike Cooley got *Mike Cooley's* program to compile under DOS. I was merely trying to point out that the program you tried was NOT MY PROGRAM, either in source or executable form. Had you in fact tried my program (which you did not), I would have been interested in comments on how it worked and how you had compiled it under DOS (you didn't, and Mike Cooley didn't either).
- Gene From UCSD.EDU!list-relay@netcomsv.netcom.com Thu Sep 15 20:54:21 1994 To: genweb@UCSD.EDU Subject: Re: GenWeb 2000: Whoa... .. . . . Austin (AFAOA) Node Equipment Date: Thu, 15 Sep 1994 20:43:00 TO: James P. Jones FROM: Bill MInnick YOU WROTE: > GenNET consists of nodes on the Internet (and other nets) dedicated to > genealogy research. > Such a node COULD be a Surname Research Group (SRG) such as the Austin > Family Association of America (AFAOA) who are dedicated to tracing all > the American AUSTIN lines. > > Consider further that the AFAOA "node" provides the following (DATA > COLLECTION) services to its members: > > o Email submission of GEDCOM and PAF data > o Ftp submission of data > o GUI interface for inputting data > > The "worker bees" of AFAOA are the members. They are divided into groups, > each working on part of the AUSTIN family tree. > > They communicate via email over a news-group maintained by the AFAOA node. > Each and every email message is saved for future reference. Periodicly these > saved email messages are automatically indexed and archived. > > As a work group finds new information, they compare it (DATA ANALYSIS) > against their known data, i.e. their verified "final" database. As new > tid-bits come in, they are discussed via email, and merged into a > "research" database. The members of the group hash out the conflicts of > merging any data, and together reach a decision on what to add to the > "final" database. > Any time a member of the group wants to access data (DATA DISSEMINATION) from > either the "final" or the "research" database, they have multiple choices: > o Automatic database query via email > o WWW access (Mosaic, Lynx) > o FTP retrieval > The email query provides the member the ability to request any report > be run on a specified database on specified individual(s), and have the > result emailed back as soon as the report it generated. 
> The WWW (Mosaic) query provides the member with the ability to extract > and view any data from the database (via an automatic GEDCOM query of > said data, converted to HTML on the fly). No disk space is wasted in > maintaining hundreds of thousands HTML documents, because they are created > as needed, and then removed a short time later. > All of this serviced by a genealogy program that is capable of maintaining > a database of one billion (actually 10^9 minus 1) individuals, and capable > of generating ANY type of report. > Each SRG can determine if they wish to open their research to the rest of > the GenNET. And when they do add their site to the Web, their homepage > can contain links to related SRG nodes, genealogy sites, etc. >So, what do you think of this scenario? Interesting? Futuristic? Impossible? >What you say if I told you every detail I just described is currently >possible? Right now. >I have beta tested an email database query system with over 550,000 >individuals in the database. I can get a reply to my query within minutes. >The genealogy program I described is a Unix application called LifeLines >(of which I was one of the first beta testers a couple of years ago.) The >capabilities named exist today. >Birger A. Wathne has discussed GEDCOM -> HTML query/conversion on the fly. >The database engine he is using is LifeLines. >Its just that none of this exists in one place. I would like to work on >bringing the above scenario into existence in one place, to create the AFAOA >node (with the Richard Austin database) as a test case. I just need a machine >and a network connection... I will work with the Austin Families Association of America (AFAOA) to provide a 486-based PC with appropriate hard drive which we will dedicate to the AFAOA Node Function. 
If we associate this project with a legitimate non-profit organization like the Silicon Valley PAF user's group, will it be possible to hook into the Internet via a .edu or .gov facility in the San Francisco Bay Area? James, do you or any other participant have a suggestion about getting the network connection? -- -- -- Bill Minnick From UCSD.EDU!list-relay@netcomsv.netcom.com Fri Sep 16 07:14:46 1994 Date: Fri, 16 Sep 1994 08:26:11 -0500 To: genweb@UCSD.EDU Subject: Re: GenWeb 2000 Whoa .... One possible way to gain access to the Internet would be to approach a Freenet site. There is no such site in my area but there are large ones in Cleveland and other locations. These sites seem to have as a mission the making available of Internet access to the public. Bill Woodward From UCSD.EDU!list-relay@netcomsv.netcom.com Fri Sep 16 09:54:45 1994 To: genweb@UCSD.EDU Date: Fri, 16 Sep 94 12:33:55 EDT Subject: Genealogical Server Engines Quick note from a new reader. I'm relatively new to WWW, but an old hand at genealogical data processing. Any internet based genealogical service will require the services of a genealogical engine and surrounding system. One such e-mail based system that I am aware of is GENSERV. GENSERV uses my LifeLines program as the engine, and an automated e-mail and shell script system handles requests, searches databases, generates reports and responds. (Other than providing LifeLines I am not associated with GENSERV). LifeLines as a program is a UNIX-based genealogical system aimed at individual users. Not to extoll it, but LifeLines is far more flexible in what its databases can handle, and in what kinds of outputs it can generate, than any commercially available system. LifeLines was not constructed to be a genealogical server engine. However it is a very modular system. The database engine, the genealogical operations and the report generation components can all be rearranged to form other special purpose server engines. 
Some of the features that engines built from LifeLines modules would provide include: o Very large databases with fast access times. o Variable length records (unlimited sizes). o Unlimited record types (though persons, families, events and sources can have some builtin semantics); lineage-linking of course provided. o Fast name access to the persons in database. o Records kept in GEDCOM syntax (as opposed to semantics), providing unlimited depth of detail, and unlimited flexibility in structuring data. o Unlimited flexibility for generated output; the LifeLines system includes a report programming sub-system, complete with hundreds of genealogical, string processing, number processing, data accessing, control flow and other operations, allowing for essentially any output. This means, for example, postscript files for any kind of chart or report, LaTeX output, troff output, books, indices, you name it, you can generate it. o Anything else. Picking up on the output point, a few LifeLines users have already written LifeLines report programs for converting GEDCOM data to SGML. I have little time to explore these issues on my own. If, however, anyone knows enough to propose some requirements for genealogical server engines, I could probably glue together some experimental systems from the LifeLines piece parts in short order. Any interest in further discussion? By the way, LifeLines is free of charge and available in source format for porting to any UNIX-like system. Thomas T. "Tom" Wetmore IV, Ph.D. (crusty old fart) Distinguished Member of Technical Staff (translation: unpromotable) AT&T Bell Laboratories and Network Systems (we built your world) North Andover, Massachusetts (raining) ttw@beltway.att.com From UCSD.EDU!list-relay@netcomsv.netcom.com Fri Sep 16 17:54:35 1994 Date: Fri, 16 Sep 1994 20:07:24 -0400 (EDT) Subject: Re: GenWeb 2000 Whoa .... 
To: BILL WOODWARD On Fri, 16 Sep 1994, BILL WOODWARD wrote: > One possible way to gain access to the Internet would be to approach > a Freenet site. There is no such site in my area but there are large ones > in Cleveland and other locations. These sites seem to have as a mission the > making available of Internet access to the public. > > Bill Woodward Cleveland's too busy, try Columbus Freenet. The freenet there is just getting started and maybe they would be interested. freenet.columbus.oh.us Carol Swinehart cswineha@freenet.columbus.oh.us From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 17 01:05:33 1994 Date: Fri, 16 Sep 94 16:46:51 PDT Subject: ged2html - what the hey? To: genweb@UCSD.EDU I returned last night from a trip that has kept me away from my email for about three weeks. I'm surprised to see may name in so many posts! I'd like to say, just in case anyone is curious, that the small utility I wrote, ged2html, was meant to be nothing else but that and was my first attempt at doing any kind of html conversion - just one way for me to start getting familiar with it. I wrote the simplest thing imaginable; not meant to be a solution or anything else. - Just a quick and dirty and extraordinarily simple program. I made it available at my ftp site only because it was there and, at the time, knew of no other such program, at least for DOS. It may have an advantage for some uses as it is small and (I think) *very* fast. The resultant file may be, for instance, smaller than others, possibily making it useful for archival purposes, yet somewhat readable and useful for the WWW. Anyway, it is obvious that all the traffic on this resulted from someone looking at their notes cross-eyed. Too bad the subject wasn't dumped quicker - and that it was so public. Nevertheless, thanks for the free and effortless (on my part) exposure. Anyone still curious about it can download ged2html.exe from ftp.netcom.com/pub/nqf/UTIL. 
I know of no bugs, but if you find some, please let me know. Once I have time, I will play with it and create more attractive, and readable, output. Now. I am interested in getting involved somehow. It seems, at this point, absurd to spend much of my time trying to improve my program as much of the work has already been done. (It seems I got tuned into the Net a couple of years too late. Every "great" idea I've had has been covered, it seems). I will be putting my system on the Net full time in a few months. I hope to turn it into a commercial venture in time (I sold my business and have no job at present) but I am still very open to suggestions regarding content, etc. I have some good ideas, but there are probably better ones out there. The machine going online is a 486-66 with a 1 gig 9.5 ms HD, almost empty at this time, 16 megs of RAM and runs on Linux (with a small 100 meg DOS partition.) Not the greatest - but decent. The connection right now is only 14.4 kbps, soon to be increased to 28.8. Again, could be better. I will improve the connection if any money is derived from it -- otherwise, it would be too costly to do. Any suggestions for disk usage? Services? More? Emcee.com will be telnetable, etc. Feel free to comment privately - publicly if you really think everyone will be interested. Michael ---------------------------------------------------------------------------- Michael Cooley michael@genealogy.emcee.com From UCSD.EDU!list-relay@netcomsv.netcom.com Fri Sep 16 20:41:32 1994 To: genweb@UCSD.EDU Date: Fri, 16 Sep 94 23:39:03 EDT Subject: GEDCOM Databases Herbert (>): >At present, most of us have their genealogical info in gedcom files. Is >gedcom really a good basis? As sequential file it cannot serve well. GEDCOM is fine for raw record format if the data is kept in a database so that random records can be retrieved on demand. You are right that a single GEDCOM file would not work. 
In addition, if the database also supports conversion to HTML on record extraction, I think you've got most of what you need. The records, after translation to HTML, need have no similarity to the original GEDCOM records. There are systems that can provide these features now. >...The genweb server should deliver >- if a complete PRL/MRL is delivered: a Person/Marriage page with links to > marriages/parents//husband/wife/children >- if additionally a positive depth number is delivered: an outline of the > upward subnet (ancestry) starting from the node >- if additionally a negative depth number is delivered: an outline of the > downward subnet (descendance) starting from the node No sweat. Can be done with the program I mentioned in my last post. >We need a common data base tool for GenWeb. It should be a simple >indexsequential database-tool and not fullfledged commercial data base. Don't know where you're getting these requirements from. I would strongly recommend not using a simple, relational database system. These databases are too rigid in their field and size restrictions to meet the needs of genealogical data. The database must allow variable length records, nearly unlimited field lengths, and must be relatively lax on the format restrictions on the field values. The program I mentioned in my last post uses a BTree database, indexed by record key and by name, and it allows records of any length (0 bytes to file system maximums). >What we should define is a protocol, that can be implemented on various >data bases. Additionally, a GenWeb-data base tool should be available. "Protocol" is an over-used word; I don't know what you mean in this context. >I doubt that a gedcom-html converter is the right thing, because in >gedcom files the person-info is usually not at the same place as the >link info. One has to scan the whole gedcom-file to bring them together. Well, this is true if your "database" is just a single GEDCOM file. 
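The exchange above, keyed random access to GEDCOM records plus a signed depth number that selects an ancestry or descendance outline, can be illustrated with a minimal Python sketch. It is not LifeLines code and not a proposed GenWeb protocol: the function names (index_records, values, outline), the in-memory dict standing in for a BTree, and the three sample persons are all assumptions made up for illustration. Only the GEDCOM tags (INDI, FAM, NAME, FAMC, FAMS, HUSB, WIFE, CHIL) come from the GEDCOM format itself.

```python
# Hypothetical sample data: three invented persons and one family.
SAMPLE = """\
0 @I1@ INDI
1 NAME Anna /Example/
1 FAMC @F1@
0 @I2@ INDI
1 NAME Bernd /Example/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Clara /Muster/
1 FAMS @F1@
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I3@
1 CHIL @I1@
"""


def index_records(text):
    """Split GEDCOM text into records keyed by cross-reference id.

    A dict stands in for the keyed database; each record keeps its
    level-1 (tag, value) pairs only, which is enough for this sketch.
    """
    records = {}
    current = None
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            if len(parts) == 3 and parts[1].startswith("@"):
                current = {"type": parts[2], "fields": []}
                records[parts[1]] = current
            else:
                current = None  # records without a key, e.g. HEAD or TRLR
        elif parts[0] == "1" and current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current["fields"].append((parts[1], value))
    return records


def values(record, tag):
    """Return all values stored under one tag in a record."""
    return [v for t, v in record["fields"] if t == tag]


def outline(records, key, depth, level=0):
    """Indented outline from one person.

    depth > 0 walks upward (parents), depth < 0 walks downward
    (children), depth == 0 returns the person alone.
    """
    person = records[key]
    names = values(person, "NAME")
    lines = ["  " * level + (names[0] if names else "?")]
    if depth > 0:
        for fam in values(person, "FAMC"):
            for tag in ("HUSB", "WIFE"):
                for parent in values(records[fam], tag):
                    lines += outline(records, parent, depth - 1, level + 1)
    elif depth < 0:
        for fam in values(person, "FAMS"):
            for child in values(records[fam], "CHIL"):
                lines += outline(records, child, depth + 1, level + 1)
    return lines


if __name__ == "__main__":
    db = index_records(SAMPLE)
    print("\n".join(outline(db, "@I1@", 1)))   # ancestry of @I1@
    print("\n".join(outline(db, "@I2@", -1)))  # descendance of @I2@
```

Because every lookup goes through the key index, no request has to scan the whole file, which is the point being argued in this message. A real server would replace the dict with an on-disk keyed store.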
I'm starting to sound like a broken record now, but if your database is a random access database whose records are GEDCOM records, then basing a genealogical server on a system that includes GEDCOM to HTML conversion is, in my opinion, an excellent choice. I am biased, I know, but I think this is the best choice at present. >I have to manage thousands of files and want to get rid of them. I understand your frustration. Solution: load them all into a GEDCOM-based database program. All your data's there, still in complete GEDCOM format, with no data loss or modification. Then set your system to first extract the proper sets of person and family records based on a user request, and then do the GEDCOM to HTML to PRL/MRL translations. The program I keep not naming can do this. >We should study the work of Tim Berners-Lee... What is this? Tom Wetmore, ttw@beltway.att.com AT&T Bell Laboratories From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 17 05:57:15 1994 Date: Sat, 17 Sep 1994 14:38:55 --100 To: genweb@UCSD.EDU Subject: LifeLines If LifeLines works as well as is claimed, why not decide to take it as a common basis? We should standardize how LifeLines installations working at different sites could interact. From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 17 17:13:51 1994 To: hstoyan@faui80.informatik.uni-erlangen.de Cc: genweb@UCSD.EDU Date: Sat, 17 Sep 94 19:45:07 EDT Subject: Re: LifeLines Herbert, LifeLines is a single user genealogical database and report generation system. It has a multi-screen based user interface based on curses. It is being used in one application as an automatic genealogical engine, but it is a bit awkward to use it this way. All the parts are there, they're just not put together in the best way. I will be very willing, however, to help reshape those parts to help the Gen Web efforts. 
Tom Wetmore, ttw@beltway.att.com Newburyport, Massachusetts From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 17 23:37:54 1994 To: genweb@UCSD.EDU Subject: Suitability of Lifelines Program for the GenWeb Task Date: Sat, 17 Sep 1994 23:20:56 TO: Tom Wetmore FROM: Bill Minnick JAMES JONES WROTE: . . . . . Rather, I see GenWeb being comprised of three separate >but inter-connected tasks: > o Collection of data (submission, entry) > o Analysis of data (verification, merger, correction) > o Dissemination of data (retrieval tools: GEDCOM, HTML, etc...) >And each of these can be broken down into many area of discussion. I think >we as a group need to step back from Mosaic and multimedia issues, and >discuss the overall structure of GenWeb........... And I strongly suggest we >discuss and decide upon this structure. > ......... Any time a member of the >group wants to access data (DATA DISSEMINATION) from either the "final" or >the "research" database, they have multiple choices: > o Automatic database query via email > o WWW access (Mosaic, Lynx) > o FTP retrieval > The email query provides the member the ability to request any report > be run on a specified database on specified individual(s), and have the > result emailed back as soon as the report it generated. TOM: Can you give us your vision of what would need to be done to implement the above three tasks using Lifelines as the kernel or core program? There seems to be a positive feeling among GenWeb participants that Lifelines is a good solution. Do you feel that we can build the required interfaces and tailor Lifelines in a reasonable number of hours? Will the resultant software require be relatively maintenance-free? Will the above functions of GenWeb be user friendly to the millions of PC and Mac owners coming onto the Internet over the next year? We may not be able to answer all of these questions today, but we need to keep them in mind as we make commitments to pursue specific solutions. 
If Lifelines becomes the heart of our pilot GenWeb implementation, can we identify the pieces which need to be coded, and perhaps ask other members if they'd be willing to share some of the software design/code load? I look forward to your thoughts, as I think they will really help us to get GenWeb 2000 under way on a technically sound footing. --- --- ---Bill Minnick, Cupertino, CA From UCSD.EDU!list-relay@netcomsv.netcom.com Sun Sep 18 00:13:10 1994 To: genweb@UCSD.EDU Subject: RE: Your GenWeb Action Proposal Date: Sun, 18 Sep 1994 00:00:35 jjones@nas.nasa.gov (James P. Jones) SAYS: >I want to build a proof of concept for my genNET proposal, creating a host >that would serve as a test case for both my ideas, as well as those offered >for genWEB. And I know you would like to see real progress on genWEB. As I >set up my test-case, I'd be willing to use the AFAOA 6000+ indi database as >the initial starting ground, rather than my own data, if you would like. (I >know the quality of the data, and it certainly serves as an excellent example >of correct source documenting guidelines. And I will use LifeLines, of course, >as the genealogical engine. JAMES: We have a team of 30 people who have participated in creating the Richard Austin Data base. It has surpassed 7,000 linked individuals, and contains over 100 scanned portrait photos of Austin descendants, some dating back to 1848, ready for inclusion in the data base. Most individuals have excellent source documentation and many have biographical sketches. I am authorized to provide you with the deceased members of the Richard Austin data base for GenWeb testing on Internet. I can have the data base to you with about a one-week notice. We have about 4 Mbytes in GEDCOM Format, and 37 Mbytes of photos. Current average photo size is about 250 Kbytes, .TIF format. >I am working on the network connection now. 
For initial testing I am >seriously considering a full-time dial up slip connection (28.8kbps) >to a 486PC running Dell Unix, unless anyone has any better ideas. This >connection would provide a slow but usable connection to a platform >(at home) where I can put all the pieces together in one place, just >like in my 'fictious scenerio'. >If things work, I will look into a more permanent network connection. >Opinions? Comments? Bill? This sounds great, James. The Austin Families of America Association will offer to provide a Disk Drive for the 486 system, if that would help. What size hard disk will you need? I'll look into some other possibilities for Internet connection as a possible backup to your plan. -- -- -- Bill Minnick, Cupertino, CA From UCSD.EDU!list-relay@netcomsv.uucp Sun Sep 18 15:53:28 1994 To: genweb@UCSD.EDU Date: Sun, 18 Sep 94 18:41:20 EDT Subject: Re: GenWeb and LifeLines TO: Bill Minnick FROM: Tom Wetmore RE: LifeLines for Gen Web FROM: Bill Minnick JAMES JONES WROTE: >I see GenWeb...comprised of three...inter-connected tasks: > o Collection of data (submission, entry) > o Analysis of data (verification, merger, correction) > o Dissemination of data (retrieval tools: GEDCOM, HTML, etc...) >...Any time a member...wants to access data...they have multiple choices: > o Automatic database query via email > o WWW access (Mosaic, Lynx) > o FTP retrieval >...The email query provides the member the ability to request any report >be run on a specified database on specified individual(s), and have the >result emailed back as soon as the report it generated. BILL MINNICK WROTE: >TOM:...give us your vision of what would need to be done to implement the >above three tasks using Lifelines as the kernel or core program? The GENSERV system uses LL in the first manner. GENSERV has hundreds of databases and has handled hundreds of email-based, database search and report requests. 
I helped Cliff Manis get the GENSERV to LL interface working; I helped him write a few report programs and a few shell scripts to automate the interface. I did not modify LL to get it to work in this automated mode. For the Gen Web project, however, I would envision putting the parts of LL together in another way, to make a system intended for unattended use. The system would have a simple message-based interface of a small set of commands. The Gen Web community should work out these commands as part of the Gen Web requirements phase. Note. The database used by LL is a custom BTree that was developed with genealogical records in mind. It allows variable length records, and it places no restrictions on the record contents. The genealogical records in a LL database hold arbitraily structured and sized GEDCOM-format ASCII records. The interface to a BTree database is provided by a small library of routines that can be used in other programs. So the same database of genealogical data can be accessed by a variety of special purpose programs. For example, LL can be used to manage databases, another program can provide an email server interface to the database, another program can provide the WWW/HTML server interface, and so on. >Do you feel that we can build the required interfaces and tailor Lifelines >in a reasonable number of hours? Yes, and I can help a lot or a little. The LL source code, including the full BTree and report generation code, is freely available. The only issue I foresee is coming up with the interfaces. >Will the resultant software require be relatively maintenance-free? Yes. >Will the above functions of GenWeb be user friendly to the millions of PC >and Mac owners coming onto the Internet over the next year? I don't see this as an "engine" question. There could be a number of user interface components between the GenWeb users and the server engines, one for Mosaic, one for email and so on. 
These interfaces would provide the user friendliness and they would interact with the servers to perform the required data manipulation. I wish I understood better how WWW processing worked. If it is possible to create HTML pages dynamically, based on users' selections on a WWW page, then it I think it will be very easy to create an LL-based engine to support on-demand retrieval of genealogical records, including the dynamic preparation of HTML files based on the user selections. >If Lifelines becomes the heart of our pilot GenWeb implementation, can we >identify the pieces which need to be coded, and perhaps ask other members if >they'd be willing to share some of the software design/code load? I will be happy to participate. I can package LL parts up into separate libraries for others to build around (eg, the BTree database can be made available as a library; the name indexing scheme used in LL [which provides a very fast way to retrieve person records by name instead of by internal key] can be made a library; the report programming feature can be made a library). In addition, I could help write the higher level code that make up the engine server functions. It's knowing what that functionality should be that may slow us down, not the actual software development. Tom Wetmore, ttw@beltway.att.com AT&T Bell Laboratories and Network Systems From UCSD.EDU!list-relay@netcomsv.uucp Mon Sep 19 12:14:30 1994 To: genweb@UCSD.EDU Date: Mon, 19 Sep 94 13:44:52 EDT Subject: Broadened Horizons I have now read JAMES JONES' and BILL MINNICK's early messages stating their positions on the GEN WEB project. I was very relieved to see such level-headed and common sensical approaches. Frankly, my first impression of the GEN WEB stuff was not so positive. I had felt there was an over-emphasis on WWW stuff and a lack of emphasis on fundamentals. Don't get me wrong. The WWW/Mosaic/Lynx/Internet stuff is critical, but it is "only" the user interface. 
It might not be easy fluff, but it is the fluff nevertheless. The larger issues are the genealogical servers underneath that James spoke so well about. How to handle data submissions, merging of new data with old (a very difficult problem, which, in general, cannot be fully automated), how to handle conflicting data and so on, how to store data, how much and what kinds of data to store, how to be compatible with other systems, and so on, and so on, and so on. If the GEN WEB is willing to take on some of these issues, in my opinion, some really worthwhile work will get done. On the other hand, I would be very disappointed if GEN WEB turned out to be just a forum for discussing HTML layouts, and what ought to be hypertextable (though I think GEN WEB is the right forum for such things, and they ought to be discussed here). Thanks James and Bill. Tom Wetmore ttw@beltway.att.com From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Sep 19 14:55:30 1994 Date: Mon, 19 Sep 1994 14:46:19 -0700 To: genweb@UCSD.EDU Subject: proof of concept... Hi. I am working on getting the pieces together to implement the proof of concept for what I proposed as genNET. (I have wanted to do something like this for a while, and all the interest in genWEB and my proposal has made me decide to jump in headfirst.) I am getting a 28.8 kbps slip (serial line IP) connection. It'll be slow, but should be sufficient to test our ideas. Should be configured in a little over a week. This will enable me to put a 486 pc running Dell Unix on the Internet through a service provider. Next I will begin collecting the pieces I described in that "fictitious scenario" of last week. I first want to get the whole system working as I described it. Then I will want to stream-line and automate it. This is where we will work with Tom Wetmore to specialize parts of LifeLines to a certain purpose, and utilize all these great ideas that folks have been suggesting. Comments? 
The AFAOA have suggested testing this idea with their data, which I will use. They have also offered to provide a hard-disk. Once I get things in place and configured, I will announce the site to genWEB folks for dialogue. Speaking of dialogue, let's hear more on the fundamentals, as requested by Tom Wetmore and Bill Minnick. I suggested that genWEB or genNET nodes or sites were responsible for three separate but inter-connected tasks: o Collection of data (submission, entry) o Analysis of data (verification, merger, correction) o Dissemination of data (retrieval tools: GEDCOM, HTML, etc...) So how about the first one, Data Collection. (I'll hold off on the other two for now.) Here are some issues to think about. How do we resolve them? What are the available means of data submission? email, ftp, direct database entry, GUI interface (WWW)... What are the concerns, restrictions, benefits of each? What sort of structure should be imposed on the data? maintain data in GEDCOM? [my vote is YES!] require GEDCOM? Accept PAF (and others), and be able to convert? require sources? [YES!] format? (here I would suggest the AFAOA source documentation guideline as an excellent place to start. Bill? ) Who (within a GenNET node) can edit the research database? master db? How should a node handle data? What standards do we need to think about to allow the most useful use of everyone's time, AND provide the most useful service for genealogists now and in the future? Looking forward to what y'all have to say... 
-James _______________________________________________________________________________ James Patton Jones email: jjones@nas.nasa.gov Parallel Systems Support phone: 415 604 4369 Computer Sciences Corporation home: 415 571 6762 fax: 415 604 4377 Numerical Aerodynamic Simulation (NAS) Facility, M/S 258-6 NASA Ames Research Center, Moffett Field, California 94035-1000, USA _______________________________________________________________________________ From UCSD.EDU!list-relay@netcomsv.netcom.com Wed Sep 21 13:39:28 1994 Date: Wed, 21 Sep 94 19:56:03 +0200 To: genweb@UCSD.EDU Subject: Gedcom standard in HTML ? It would be very nice if someone could make a set of HTML pages of the current GEDCOM description. I just don't have the time..... Use the HoTMetaL editor, and just cut'n'paste things in some sensible way. Link things together, so it becomes possible to jump around the document and follow definitions. Birger From UCSD.EDU!list-relay@netcomsv.netcom.com Thu Sep 22 19:14:41 1994 Date: Thu, 22 Sep 94 21:36:55 -0500 To: genweb@UCSD.EDU Subject: Suggestion I've been looking at your demos and find them very interesting. One thought did cross my mind. With all these individuals linked together by hypertext links, the process of stepping up or down the tree could be rather lengthy; especially for those of us on dial-up lines. My suggestion would be to require that all html files have no icons or inline graphics ( text only). The advantage is obvious... speed. Pictures, sound recordings, etc. would require the browser to click a link, thus specifically requesting those items. This would greatly accelerate the process of moving through brush to get to the solid wood. Once you found the person or persons you desired, you could request the other info. Just a thought.... 
Keep those cards and letters comming to: Mike Fauber (mfauber@fsp.fsp.com) From UCSD.EDU!list-relay@netcomsv.netcom.com Fri Sep 23 17:57:18 1994 Date: Fri, 23 Sep 1994 20:37:46 -500 (EST) Subject: Re: Suggestion To: genweb@UCSD.EDU Oh no, when you have one or more inline gif files it takes much longer to download and eventually display the html. My thought is to load the text info in a jiffy (no pictures) then if you choose to request more info in the form of pictures, sound recording (even movies), you do so with a click of the mouse. This allows the movement around the database with the greatest ease and speed. At work my pc is tied to an ethernet network with ties to the outside world over T1 lines. Small pictures are not a hassle. At home I'm limited by a dial-up connection (14.4kb). That has changed my out look on how html pages should be designed. Again, I like the general concept. Make all the appropriate info avialable. Why load down the comm lines with pictures until they are really wanted... See ya... From UCSD.EDU!list-relay@netcomsv.netcom.com Fri Sep 23 20:56:16 1994 To: genweb@UCSD.EDU Subject: Re: Suggestion Date: Fri, 23 Sep 1994 20:05:33 Marion, YOU WROTE: >Again, I like the general concept. Make all the appropriate info avialable. >Why load down the comm lines with pictures until they are really wanted... I've pretty much come to the same conclusion. I believe that for the forseeable future, there should be an option to look at photo and sound objects when one has determined that this ia a person of interest to them. Thanks for the suggestion. Regards, Bill Minnick From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 24 03:34:24 1994 Date: Sat, 24 Sep 94 12:12:16 +0200 To: genweb@UCSD.EDU Subject: Re: Suggestion Bill Minnick writes in response to Marion M. Fauber: >I've pretty much come to the same conclusion. 
I believe that for the >foreseeable future, there should be an option to look at photo and sound >objects when one has determined that this is a person of interest to them. I think you are reinventing a small wheel here. There is such an option in NCSA Mosaic, i.e. the *client* program run by the user ("Delay Image Loading" in the "Options" menu). There should be in other clients as well. Lynx doesn't download any inline images at all. My point is that the client software is responsible for downloading any inline images as needed; the server never transmits any image not specifically requested. Inline images and links to images are coded almost the same in HTML; one is <IMG SRC="..."> and the other is <A HREF="...">...</A>. The server never cares about the difference. The difference comes in the client, which normally issues requests to download inline images immediately. However, a good client gives the user the ability to tailor the presentation according to his or her needs. Limiting document downloading to text only (because of slow modem lines or whatever) is a perfect example of such a need. Therefore, I feel it would be a waste of time (although entirely possible, of course) to cater specifically for this need in the server end, and provide a separate WWW tree without the inline images. Now, it could be that many GenWeb users happen to be stuck with clients not offering them the abovementioned option; I don't know. How about a survey to find out how many have this problem? Fixing it in the server end also limits the effect to your particular server, while there are yet thousands of WWW servers on the Internet with lots of inline images. If we want to help those users on slow lines, our time is better spent reminding their client software vendors to add those basic options, if they aren't already there, and then *all* of the Internet Web will be available with or without inline images as requested by the user, not just GenWeb. 
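The client-side distinction described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the page text, the file names thumb.gif and portrait.gif, and the class name ImageRefCollector are all made up for the example. It separates what a graphical client would normally fetch on its own (IMG elements) from what it fetches only when the reader follows a link (A elements).

```python
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Collect inline images and ordinary links found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.inline_images = []  # a graphical client requests these unless told to delay
        self.links = []          # requested only when the reader follows the link

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.inline_images.append(attrs["src"])
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])


# Hypothetical person page: one inline thumbnail, one link to the full photo.
PAGE = """
<h1>Example Person</h1>
<img src="thumb.gif">
<a href="portrait.gif">Full-size portrait</a>
"""

collector = ImageRefCollector()
collector.feed(PAGE)
print("inline:", collector.inline_images)
print("links: ", collector.links)
```

The server sends the same page text either way; whether thumb.gif is ever requested is decided by the client reading the page.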
Of course, inline images should always be used with reason, which among other things means they shouldn't be too big. A common solution is to make a small inline image a link to a bigger version of the same image. However, even a few small inline images may be too much for a slow modem line, and this is why the abovementioned option exists (or should exist, in case it isn't there). I'm sorry for getting technical, but I feared someone might spend a lot of work on this for little reason. I hope I have made myself understood; please ask if anything is still unclear. -- Anders Andersson, Dept. of Computer Systems, Uppsala University Paper Mail: Box 325, S-751 05 UPPSALA, Sweden Phone: +46 18 183170 EMail: andersa@DoCS.UU.SE From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 24 08:34:52 1994 Date: Sat, 24 Sep 1994 07:54:25 -0700 (PDT) Subject: Re: Suggestion To: "Marion M. Fauber" I agree with Marion.......NO pictures until needed or wanted by the researcher! I am also limited in my access! Amelia Chapman Painter apainter@san_marcos.csusm.edu From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 24 08:53:28 1994 To: genweb@UCSD.EDU Subject: Re: Suggestion Date: Sat, 24 Sep 1994 08:32:31 TO: Anders Andersson FROM: Bill Minnick SUBJECT : Suggestion (No photos in initial individual/marriage page) OUR CONVERSATION WENT: >Bill Minnick writes in response to Marion M. Fauber: >>I've pretty much come to the same conclusion. I believe that for the >>foreseeable future, there should be an option to look at photo and sound >>objects when one has determined that this is a person of interest to them. >I think you are reinventing a small wheel here. There is such an >option in NCSA Mosaic, i.e. the *client* program run by the user >("Delay Image Loading" in the "Options" menu). There should be in >other clients as well. Lynx doesn't download any inline images at >all. 
((----and so on ---)) Anders, I supported Marion's suggestion to avoid embedding photos in the individual/marriage pages, knowing full well all of the technical detail you presented. The issue is more fundamental. Once one turns off the option to receive photos in the page, one has no option to look at the picture by clicking on it (with NCSA Mosaic and probably others). Recall the page, and the "cached" page always comes back on screen without the image, whenever you recall the page (with NCSA Mosaic and probably others). The normal user sequence will be, "Ah, finally, this is the person I want. Now, let's see the picture!" So the obvious solution is to place an anchor for the "IMAGE" near the top of each page so everyone can receive a copy of the photo when they decide they need it. Same goes for sound, movie or the like. We have to be careful not to put existing features of software ahead of legitimate user needs. Regards, Bill Minnick From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 24 10:34:14 1994 Date: Sat, 24 Sep 1994 10:05:49 -0700 To: genweb@UCSD.EDU Subject: genweb proof of concept In reference to my creating a proof of concept of my "genNet" proposal, I am still in the preliminary stage. I should have my 28.8 kbps dialup slip line in place by the end of the week. Then it's a matter of getting my 486 configured to use the slip connection, and then getting the pieces I described installed and set up. I'll keep you posted... 
-James From UCSD.EDU!list-relay@netcomsv.netcom.com Sat Sep 24 10:34:16 1994 Date: Sat, 24 Sep 1994 10:02:37 -0700 To: genweb@UCSD.EDU Subject: re: photos Bill Minnick wrote: >> The issue is more fundamental. Once one turns off the option to >> receive photos in the page, one has no option to look at the picture by >> clicking on it (with NCSA Mosaic and probably others). Recall the page, and >> the "cached" page always comes back on screen without the image, whenever >> you recall the page (with NCSA Mosaic and probably others). Not necessarily true. The WWW viewer I use (NCSA Mosaic 2.4) allows me to select "Delay Image Loading". This prevents the downloading of any images. If I then want to see the image, I simply click on that image point, and it then retrieves and displays that image only. Works great. I would tend to agree with the idea of NOT writing html pages _specifically_ for slow modems, but if a site has lots of images, I also prefer that it let me request the images _when I want them_. For me it's not an issue: if I'm accessing over a slow modem, then I delay image loading. If not, I don't. 
-James From ucsd.edu!list-relay@netcomsv.netcom.com Sun Sep 25 00:56:32 1994 To: genweb@ucsd.edu Date: Sat, 24 Sep 1994 23:38:13 PDT Subject: Autoload Images Regarding the autoloading of images, I have found that in the Mac version of Mosaic, I can turn off "Autoload Images" and receive text only. In the place of the image appears an icon indicating where an image would show. If I want to see the image, I can click on that icon and Mosaic generates the request to download the image into that location. With this type of client support, I agree that the server should not be "tasked" with the job of withholding data. Remember, we are building a system for the future, where data will flow like water over very high speed links to everyone's personal (wireless) communicator. [Right!] Gary *************************************************************************** *Gary B. Hoffman, Computer/Language Lab Director e-mail: ghoffman@ucsd.edu* *Graduate School of International Relations and Pacific Studies (IR/PS)* *University of California, San Diego (UCSD) voice: (619) 534-7733* *9500 Gilman Dr., La Jolla, CA 92093-0519 USA fax: (619) 534-3939* *************************************************************************** From UCSD.EDU!list-relay@netcomsv.netcom.com Sun Sep 25 00:56:34 1994 To: genweb@UCSD.EDU Date: Sat, 24 Sep 94 12:11:52 EDT Subject: GenEngines It's been quiet on Gen Web, or little mail is reaching me. 
With the idea of preparing LifeLines for Gen Web experimentation, I have begun making changes to handle the one writer, multiple readers problem. Coming along. I will also add a simpler, message-based interface that will prepare LifeLines for straightforward engine work. I'm leaving for a week's business trip to England tonight, so please excuse tardy responses for a while. Tom Wetmore, ttw@beltway.att.com From UCSD.EDU!list-relay@netcomsv.netcom.com Sun Sep 25 06:34:14 1994 Date: Sun, 25 Sep 1994 09:22:58 -0500 To: genweb@UCSD.EDU Subject: Re: Autoload Images > Remember, we are building a system for the future, where >data will flow like water over very high speed links to everyone's personal >(wireless) communicator. [Right!] Gary brings up a very good point here. While some people may not have the desire or option of dealing with the inline images currently, the odds are that they will in the not too distant future. If you take a look at how modems have changed in the past few years, and then project that a few years ahead, that's the point that we should be looking at here. There are more than enough options out there currently to deal with this without adding something into the service to take care of it. 
************************************************************************** Bill Spurlock 5125-C BeverlyGlen Lane shadow@mindspring.com Norcross, GA (404) 368-8884 30092 ************************************************************************** SPURLOCK FAMILY GENEALOGY HOMEPAGE : http://www.mindspring.com/users/shadow/shadow.html ************************************************************************** From vic.cc.purdue.edu!abe@netcomsv.netcom.com Tue Sep 27 14:19:08 1994 To: B.C.Tompsett@computer-science.hull.ac.uk (Brian Tompsett) Cc: Birger.Wathne@vest.sdata.no, birger@sdata.no, frode@ifi.unit.no, ghoffman@ucsd.edu, hstoyan@faui80.informatik.uni-erlangen.de, pakers@netcom.uucp, shadow@mindspring.com Subject: Re: Experimental Genealogical Web data Date: Tue, 27 Sep 94 14:44:18 -0500 Brian, In message <9409271732.AA22701@olympus> you write: > > I have used Vic Abell's code to make a Web page for the Royal Families of >Europe. I had to modify the indexing code to index by Forename rather than >Surname because royals don't have surnames. The result is at > http://www.dcs.hull.ac.uk/public/genealogy/GEDCOM.html >if you wish to browse. > > I'm still not happy with the way the data looks and plan to smarten it up >a bit. The online version is also about 1000 names behind my offline version; >so don't regard the data as "proper". It's more the indexing paradigm and how >easy it is to locate the desired record that interests me. Welcome, and thanks for chiming in to this discussion -- it's been pretty quiet lately. I have looked at your data once, before you changed the indexing scheme, and I agree that what you are now doing makes much more sense for your data base. Have you reached a point where you're satisfied enough with the format that I could include your URL in my genealogical page? 
Vic From UCSD.EDU!list-relay@netcomsv.netcom.com Thu Sep 29 00:19:12 1994 To: genweb@UCSD.EDU Subject: Your Genealogy Data Base Search Concept Date: Wed, 28 Sep 1994 22:11:59 To: Tom Wetmore FROM: Bill Minnick SUBJECT: GenWeb Search Concept I am intrigued by the current activity at U.S. colleges on Robots and Spiders which crawl the web, and index every word of HTML pages on the Web. GenWeb will need a powerful, yet simple to use search capability of finding "a needle in a haystack" as the saying goes. Besides exact searches, we are going to need to be able to specify proximity range of words; date ranges; boolean combinations of dates, places, events, etc; "Fuzzy" matches to accommodate misspelling, possibly even common name substitutes (Bill for William, Merrimack Valley for North Andover, etc). Key question is, does LifeLines address the data base search question??? I don't recall you mentioning this, though you may have. In any case, I'm sure many of us would appreciate you laying out your ideas on the GenWeb search requirements and possible solutions. To stir the pot a little, I'll toss out an idea of what I'd like to see running alongside GenWeb. I'd like an option which would be (or appear to be) a search sequence, starting with the precise individual name, date, place name information, and if no match, a progressive combination of "Fuzzy" searches on each word of the sequence. The search would, in effect, automatically and slowly relax its match criteria, until at least one "match" is made and delivered up to the user as a page URL. This option would take the burden of defining search strategy off the user, if he/she chose this option. I know a lot of older folks doing their genealogy that would appreciate this kind of help when they get into GenWeb. I also want to see all past e-mail research letters regarding an individual kept with that individual's data, and indexed by the same "Spider" to help in searching for an individual. 
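The progressively relaxed search proposed above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the record layout (name, birth year, place, page URL), the example.org URLs, the sample people, the order in which criteria are loosened, and the 0.8 similarity threshold. It uses the standard library's difflib for the "fuzzy" name comparison and is not a description of how LifeLines or any spider works.

```python
from difflib import SequenceMatcher

# Hypothetical index records: (name, birth year, place, page URL).
RECORDS = [
    ("William Hale", 1850, "North Andover", "http://example.org/p/1"),
    ("Mary Stone", 1872, "Uppsala", "http://example.org/p/2"),
]


def similar(a, b, threshold):
    """True when two strings are at least `threshold` alike, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def relaxed_search(name, year, place, records=RECORDS):
    """Try strict criteria first, then loosen them until something matches.

    Returns (step number, list of URLs), or (None, []) when nothing matches.
    """
    # Each step: (required name similarity, allowed year difference, place must match)
    steps = [
        (1.0, 0, True),   # exact name, year and place
        (1.0, 0, False),  # ignore the place
        (1.0, 5, False),  # allow the year to be off by up to 5
        (0.8, 5, False),  # allow misspelled names
    ]
    for level, (name_threshold, year_slack, need_place) in enumerate(steps):
        hits = [
            url
            for rec_name, rec_year, rec_place, url in records
            if similar(name, rec_name, name_threshold)
            and abs(year - rec_year) <= year_slack
            and (not need_place or place.lower() == rec_place.lower())
        ]
        if hits:
            return level, hits
    return None, []


print(relaxed_search("William Hale", 1850, "North Andover"))
print(relaxed_search("Wiliam Hale", 1853, "Merrimack Valley"))
```

Returning the step number along with the URLs lets the page tell the user how far the criteria had to be relaxed before anything matched. Handling name substitutes such as Bill for William would need a separate lookup table, which this sketch leaves out.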
After many years the e-mail will leave a rich research history for future generations to work with as new researchers come on line and new sources turn up. For anyone who is not familiar with Web "Spiders", take a look at the following URLs: http://lycos.cs.cmu.edu/cgi-bin/pursuit and http://www.biotech.washington.edu/WebCrawler/WebQuery.html These "Spiders" operate totally independently of the data they are indexing, and work very well as totally separate functions from the data base entry and editing process. I'd like to hear any and all thoughts on the subject of searching the GenWeb data base. Tom, Hope you have a good trip to Jolly old England. -- -- Regards, Bill Minnick From UCSD.EDU!list-relay@netcomsv.netcom.com Thu Sep 29 03:34:23 1994 Date: Thu, 29 Sep 94 11:17:40 +0100 To: genweb@UCSD.EDU Subject: LifeLines -> WWW gateway I have now completed a new version of my LifeLines to WWW gateway. Remove any pointers you may have to my demo setup, and add a pointer to http://www.vest.sdata.no/skrivervik/employees/birger/genealogy.html This page should be stable, even if pointers to my database are not :) I only have a synchronous 19200 baud line to the internet. Better than nothing, but database access may be slow at times. There are no images in my bases at the moment, so transfer speed should be acceptable. There are still a lot of things to do. I use a modified version of Stark's ged2html, not the version pointed to from my genealogy page. I'll mail all modifications back to Gene Stark when I have finished them. I am currently loading the 'royal92.ged' file into a new LifeLines database, and I'll use that one for demo purposes. The demo link from my genealogy page may not be stable, so please try again later if it fails. But you should always be able to reach the genealogy page unless my server crashes.... I'm planning some changes to the URLs to access the base, so I would advise you to not save any links directly into the base. 
I have to encode the database name into the URL, as I want to use the same gateway to access several bases. I will also have to find a way to build index pages. ged2html can do it, but doing it directly from LifeLines could perhaps speed up the process? I'll be looking into it. Birger From UCSD.EDU!list-relay@netcomsv.netcom.com Thu Sep 29 08:37:49 1994 To: genweb@UCSD.EDU Subject: Re: LifeLines -> WWW gateway CONGRATULATIONS! Date: Thu, 29 Sep 1994 07:29:43 TO: Birger Wathne FROM: Bill Minnick SUBJECT: Congratulations! >I have now completed a new version of my LifeLines to WWW gateway. Your demo works GREAT! This is an exciting moment. >I will also have to find a way to build index pages. ged2html can do it, >but doing it directly from LifeLines could perhaps speed up the process? >I'll be looking into it. Have you thought about asking the "Spider" at URL: http://www.biotech.washington.edu/WebCrawler/WebQuery.html or http://lycos.cs.cmu.edu/cgi-bin/pursuit to do the indexing for you? Then you can just tell people to look up any royalty via that URL. You would have no more work to do if the Spider does the indexing. -- -- Great work, Bill Minnick, Cupertino, CA From UCSD.EDU!list-relay@netcomsv.netcom.com Thu Sep 29 09:13:18 1994 To: Birger.Wathne@vest.sdata.no (Birger A. Wathne) Cc: genweb@UCSD.EDU, cwg@mcc.com Subject: Re: LifeLines -> WWW gateway Date: Thu, 29 Sep 1994 10:54:36 -0500 From: Chris Garrigues In message <9409291017.AA13576@sdvest>, Birger A. Wathne typed: > > I have now completed a new version of my LifeLines to WWW gateway. I like it. In your page, you talk about wanting "one host on the internet with good connectivity, and guaranteed lasting dedication to genealogy". I might be able to provide this soon. My home system is now on the internet over a bonded BRI line giving me a 128kb connection. I'm getting it registered in the DNS as deepeddy.com. I don't yet have an http server running on it, but I certainly will. 
It's a lower-end Sun (an IPX), but I don't really intend for it to be used as much other than a permanent internet address. (I do my own hacking, programming, writing, and netsurfing from my Mac which is on the same ISDN link.) Once I have the setup fully registered and have a working http server (and am doing regular backups :-{), we should talk again about this possibility. I expect deepeddy.com to exist permanently, and since I own it and it's in a house that I own, it can have a lasting dedication to genealogy. (I'd probably create an alias for genealogy.deepeddy.com pointing to the same machine, so it could even move to its own system if need be.) Chris Chris Garrigues (MIME capable) cwg@mcc.com Microelectronics and Computer Technology Corporation +1 512 338 3328 3500 West Balcones Center Fax +1 512 338 3838 Austin, TX 78759-5398 USA From UCSD.EDU!list-relay@netcomsv.netcom.com Sun Oct 2 15:54:28 1994 To: genweb@UCSD.EDU Date: Sun, 2 Oct 94 18:14:44 EDT Subject: Re: Your Genealogy Data Base Search Concept TO: Bill Minnick FROM: Tom Wetmore SUBJECT: Re: GenWeb Search Concept >GenWeb will need a powerful ... search capability of finding "a needle in >a haystack" ... we are going to need to be able to specify proximity range >of words; date ranges; boolean combinations of dates, places, events, >etc; "Fuzzy" matches to accommodate misspelling, possibly even common >name substitutes ... does Lifelines address the data base search question??? There are two search targets: a GEDCOM record while still "inside" a database, LifeLines or otherwise; and an HTML file about a person. I assume that spiders and robots search HTML files only. So an important issue seems to be whether HTML files are to be kept available at all times, or only on demand when a user is browsing through the family. To LifeLines quickly. At present the basic LifeLines system indexes persons only by name. 
However, report programs can be written that can generate, as one or more files, any kind of index of a database you could want. These index files could then be searched, rather than the more costly searches of a database, for specific records, which could then be HTML'ized on the spot and presented. >... I'd like an option which would be ... a search sequence, starting with >the precise individual name, date, place name information, and if no >match, a progressive combination of "Fuzzy" searches on each word of the >sequence. The search would ... slowly relax its match criteria ... This >... would take the burden of defining search strategy off the user ... Good idea. This could all be done based on index files generated by report programs off a database. >Tom, Hope you have a good trip to Jolly old England. Bill, Thanks. Got back yesterday. Returned to Avebury (largest henge in England), Stonehenge, and saw the Roman baths in Bath. Tom Wetmore, ttw@beltway.att.com From UCSD.EDU!list-relay@netcomsv.netcom.com Sun Oct 2 17:33:24 1994 To: genweb@UCSD.EDU Subject: Re: Genealogy Data Base Search Concept Date: Sun, 2 Oct 1994 17:15:28 >Subject: Re: My Genealogy Data Base Search Concept >FROM: Bill Minnick >TO: Tom Wetmore & Birger Wathne >SUBJECT: Re: GenWeb Search Concept Glad you've returned from England safely. I have the following comment on your response (below). I WROTE: >>GenWeb will need a powerful ... search capability of finding "a needle in >>a haystack" ... we are going to need to be able to specify proximity range >>of words; date ranges; boolean combinations of dates, places, events, >>etc; "Fuzzy" matches to accommodate misspelling, possibly even common >>name substitutes ... does Lifelines address the data base search question??? YOU WROTE: >There are two search targets: a GEDCOM record while still "inside" a >database, LifeLines or otherwise; and an HTML file about a person. I >assume that spiders and robots search HTML files only. 
So an important >issue seems to be whether HTML files are to be kept available at all times, >or only on demand when a user is browsing through the family. Correct me if I'm wrong, but wouldn't the "Spider" issue one URL at a time, which would cause the GEDCOM-to-HTML converter to deliver up one HTML page at a time, which the Spider would sequentially analyze, indexing every word it saw? In other words, I think the spider will work with the generated HTML pages just as you or I would. Though this may seem inefficient, it would be an easy way to take advantage of existing, parallel technology developments. I'd like to hear your thoughts on this. I would also ask Birger Wathne for his thoughts on how a "Spider" utility might "see" the genealogy material in a LifeLines data base. Would it not appear as generated HTML pages to the Spider? I'll ask one of the "Spider" development teams to index Birger's Royal line to see what happens. TOM: By the way, have you checked out Birger's implementation of the Royal genealogy line at URL: http://www.vest.sdata.no/skrivervik/employees/birger/genealogy.html We await your comment on this breakthrough. Regards, Bill Minnick From UCSD.EDU!list-relay@netcomsv.netcom.com Sun Oct 2 17:53:15 1994 To: Brian Pinkerton Subject: Re: Genealogy Follow-up Date: Sun, 2 Oct 1994 17:44:08 Cc: genweb@UCSD.EDU TO: Brian Pinkerton FROM: Bill Minnick SUBJECT: Spider Indexing of Demo Royalty Data Base Have a favor to ask. Can you direct the "Spider" to index the HTML pages which begin at URL: http://www.vest.sdata.no/skrivervik/employees/birger/genealogy.html Will the Spider seek out and index URLs included on this first page? And if each page points to other URLs, can you direct the spider to track down all the referenced URLs? At the above URL, we have a genealogy of Royalty lines appearing as HTML pages to WWW users. We want to test the concept of Spider-indexing of the Royal line. Are you game? 
Of course our success depends on whether you index only the first (home) page, or all embedded URL references within pages. Looking forward to your reply (which will automatically go to our GenWEB mail list) . Appreciate your support to this experiment. -- Bill Minnick, GenWEB project team member. From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Oct 3 07:19:26 1994 To: genweb@UCSD.EDU Date: Mon, 3 Oct 94 09:37:17 EDT Subject: Re: Genealogy Data Base Bill Minnick (>): >... wouldn't the "Spider" issue one URL at a time which would cause the >GEDCOM-to-HTML converter to deliver up one HTML page at a time, which the >Spider would ... index ... ? ... I think the spider will work with the >generated HTML pages just as you or I would. Excuse the dark ages response, but I don't know what a URL is. From context I am guessing that a URL allows operations to be performed at a site that equate to operations that humans could perform. If this is the case, then a LifeLines database could be used to automatically and sequentially generate an HTML file for each member of the database. The file could then be indexed, presumably, and then either retained or removed. I am assuming that in large database one would not keep around thousands to hundreds of thousands of HTML files. >... though this may seem inefficient, it would be an easy way to take >advantage of existing, parallel technology developments. For something like this, efficiency does not seem a major issue! Tom Wetmore, ttw@beltway.att.com From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Oct 3 13:38:16 1994 Date: Mon, 3 Oct 1994 12:10:54 -0400 To: genweb@UCSD.EDU Subject: Re: Genealogy Data Base (INDEXING Question) Bill Minnick writes: >I've asked Brian Pinkerton, who manages the "Spider" (URL: >http://www.biotech.washington.edu/WebCrawler/WebQuery.html ) >to attempt a Spider indexing of the Royalty line just put on the WEB by >Birger Wathne. Let's see how that turns out. 
I missed some of this "Spider" discussion due to a mail outage, but it strikes me that the usefulness of a Spider would be to form a unified index of data residing on a number of different hosts. It would generally be pretty inefficient to index data residing on one single host, since that could be done very quickly by whatever database software is maintaining the data on that host. For example, my ged2html program, which Birger is using for the demo, can read and index the 30,000 line, 3000 individual royal92.ged file in about 15 seconds. Or, I am sure that the LifeLines program he is also using could do this as well. Any sort of "Spider" program that tries to follow all the links and generate and index will take much longer than this (I would estimate on the order of hours) and generate a great deal of load on the network subsystem of the host in the meantime. - Gene Stark From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Oct 3 13:38:20 1994 To: genweb@UCSD.EDU Subject: Re: Genealogy Data Base (INDEXING Question) Date: Mon, 3 Oct 1994 07:25:09 TO: TOM WETMORE FROM: BILL MINNICK SUBJECT INDEXING >Bill Minnick (>): >>... wouldn't the "Spider" issue one URL at a time which would cause the >>GEDCOM-to-HTML converter to deliver up one HTML page at a time, which the >>Spider would ... index ... ? ... I think the spider will work with the >>generated HTML pages just as you or I would. >Tom Wetmore (>): >Excuse the dark ages response, but I don't know what a URL is. From >context I am guessing that a URL allows operations to be performed at a >site that equate to operations that humans could perform. If this is the >case, then a LifeLines database could be used to automatically and >sequentially generate an HTML file for each member of the database. The >file could then be indexed, presumably, and then either retained or >removed. I am assuming that in large database one would not keep around >thousands to hundreds of thousands of HTML files. 
As I refer to it, the URL (Universal Resource Locator) is simply the address of the HTML page on the World Wide Web (www). It would normally point uniquely to the computer, the directory, and the HTML file being requested. My assumption is that when a GenWEB computer, using LifeLines and a GEDCOM to HTML converter, receives one of its URLs, it will generate the HTML page instead of delivering up a fixed file from its memory. I've asked Brian Pinkerton, who manages the "Spider" (URL: http://www.biotech.washington.edu/WebCrawler/WebQuery.html ) to attempt a Spider indexing of the Royalty line just put on the WEB by Birger Wathne. Let's see how that turns out. I invite a more exact definition of the URL in seasoned Internet lingo from anyone who can expand on this. We are all at some stage learning; a key value of the GenWEB mail list is to help get these questions answered. -- -- Regards, Bill Minnick From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Oct 3 13:38:23 1994 To: starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark) Cc: genweb@UCSD.EDU, cwg@mcc.com Subject: Re: Genealogy Data Base (INDEXING Question) Date: Mon, 03 Oct 1994 13:22:12 -0500 In message <199410031610.MAA01325@starkhome.cs.sunysb.edu>, Gene Stark typed: > For example, my ged2html program, which Birger is using for the demo, > can read and index the 30,000 line, 3000 individual royal92.ged file in > about 15 seconds. Or, I am sure that the LifeLines program he is also using > could do this as well. Any sort of "Spider" program that tries to > follow all the links and generate and index will take much longer than this > (I would estimate on the order of hours) and generate a great deal of load > on the network subsystem of the host in the meantime. I would think that a more useful approach would be to define an index for each server (or at least each genweb tree; possibly several per server) and then have tools which collect together these indexes into a master index. 
If the index is a simple format such as a text file containing keys and URLs, the tool to merge wouldn't have to be much more than a concatenation of these files. However, I suspect the real answer for how to do this right would be to use WAIS. WAIS (Wide Area Information Server) is an indexing tool which is often used in the back end for web servers. It can certainly index items located at a variety of different sites by a variety of different indices. Unfortunately, I don't know much more about just how it works. Hopefully, someone else on this list knows more than I do; otherwise we've got some reading to do. Using the search facility at http://galaxy.einet.net/galaxy.html (which uses WAIS as a backend), I find more references than I'm willing to deal with right now. If nobody else on this list knows more than I do, I'll do some research. Chris Chris Garrigues (MIME capable) cwg@mcc.com Microelectronics and Computer Technology Corporation +1 512 338 3328 3500 West Balcones Center Fax +1 512 338 3838 Austin, TX 78759-5398 USA From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Oct 3 13:38:25 1994 Date: Mon, 3 Oct 94 18:44:21 +0100 To: genweb@UCSD.EDU, svpafug@rahul.net Subject: Re: Genealogy Data Base Search Concept A spider traversing the database would indeed have to open each URL, wait for the database report, and the HTML conversion. This would be very inefficient, but a very simple solution. But what if I change my database? I would have to remember to tell the spider which URLs have become invalidated, and ask it to redo at least part of the indexing. Would I remember to do so? Perhaps not.... I would rather believe in some kind of search/index mechanism giving a possibility to search on a fixed set of tags. I guess people will need to be able to search based on: Names, Locations, Time periods, Parents, Spouse(s), Children. Any other vital tags? Of course, these search mechanisms should be able to perform some kind of 'fuzzy' search as well. 
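The idea raised in this exchange of a plain text index per server, merged into a master index, can be sketched in a few lines of Python. The file format (one "key<TAB>URL" entry per line, with "#" comment lines), the example.org host names, and the function names are all assumptions made for illustration; nothing here describes WAIS or an agreed GenWeb format.

```python
def parse_index(text):
    """Parse lines of the assumed form 'key<TAB>URL' into (key, url) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, url = line.split("\t", 1)
        entries.append((key.strip(), url.strip()))
    return entries


def merge_indexes(*texts):
    """Combine several per-server indexes into one sorted master index.

    Duplicate entries collapse because the pairs are collected in a set.
    """
    merged = set()
    for text in texts:
        merged.update(parse_index(text))
    return sorted(merged)


def lookup(master, key):
    """Return every URL filed under `key`, ignoring case."""
    return [url for k, url in master if k.lower() == key.lower()]


# Hypothetical index files as two servers might publish them.
SERVER_A = (
    "Hale, William\thttp://a.example.org/p/1\n"
    "Stone, Mary\thttp://a.example.org/p/2\n"
)
SERVER_B = "Hale, William\thttp://b.example.org/people/17\n"

master = merge_indexes(SERVER_A, SERVER_B)
print(lookup(master, "hale, william"))
```

Because each server regenerates its own index file when its database changes, a merge like this only has to be rerun; no spider needs to be told which pages have become invalid.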
The next question is: Should searching be done through a central server, redistributing the search to known databases? Birger From UCSD.EDU!list-relay@netcomsv.netcom.com Mon Oct 3 13:38:31 1994 To: svpafug@rahul.net (Bill Minnick) Cc: genweb@UCSD.EDU, cwg@mcc.com Subject: Re: Genealogy Data Base (INDEXING Question) Date: Mon, 03 Oct 1994 13:08:39 -0500 In message , Bill Minnick typed: > I invite a more exact definition of the URL in seasoned Internet > lingo from anyone who can expand on this. We are all at some stage learning; > a key value of the GenWEB mail list is to help get these questions answered. A URL (Universal Resource Locator) is a string which may be used to describe a resource available on the internet no matter what access method is required to access the resource. It was invented for use in WWW, but is being used elsewhere now because it's a useful concept. The format is an access method followed by a colon followed by an accessor string. The accessor string is a sequence of tags separated by slashes. The semantic meaning of the tags is defined on a per access method basis. Examples: An html document available via http: http://www.biotech.washington.edu/WebCrawler/WebQuery.html A binhexed file available via anonymous FTP: ftp://ftp-boi.external.hp.com/pub/printers/djet_pjet_dwriter/mac/dwgx10.hqx etc. There are also URLs defined for gopher and a few other protocols as well. Chris Chris Garrigues (MIME capable) cwg@mcc.com Microelectronics and Computer Technology Corporation +1 512 338 3328 3500 West Balcones Center Fax +1 512 338 3838 Austin, TX 78759-5398 USA
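The structure described above (an access method, a colon, then slash-separated tags) can be seen by taking the two example URLs apart with Python's standard urllib.parse module. This is a small illustrative sketch; the function name describe and the dictionary keys are chosen for the example and are not standard terminology.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into its access method, host, and slash-separated tags."""
    parts = urlsplit(url)
    return {
        "access_method": parts.scheme,  # the part before the colon
        "host": parts.netloc,           # the machine that holds the resource
        "tags": [segment for segment in parts.path.split("/") if segment],
    }


EXAMPLES = [
    "http://www.biotech.washington.edu/WebCrawler/WebQuery.html",
    "ftp://ftp-boi.external.hp.com/pub/printers/djet_pjet_dwriter/mac/dwgx10.hqx",
]

for url in EXAMPLES:
    print(describe(url))
```

For the http and ftp access methods the tags read as directories followed by a file name; other access methods are free to give them a different meaning.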