The GenWeb list grew rapidly and carried over 120 messages in October,
1994. This is part I of the October 1994 archive file.


          Sunday, October 2, 1994 3:17:23 PM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Re:  Your Genealogy Data Base Search Concept
  To:             GenWeb
TO: Bill Minnick
FROM:  Tom Wetmore
SUBJECT:  Re: GenWeb Search Concept

>GenWeb will need a powerful ... search capability of finding "a needle in
>a haystack" ... we are going to need to be able to specify proximity range
>of words; date ranges; boolean combinations of dates, places, events,
>etc; "Fuzzy" matches to accommodate misspelling, possibly even common
>name substitutes ... does Lifelines address the data base search question???

There are two search targets: a GEDCOM record while still "inside" a
database, LifeLines or otherwise; and an HTML file about a person.  I
assume that spiders and robots search HTML files only.  So an important
issue seems to be whether HTML files are to be kept available at all times,
or only on demand when a user is browsing through the family.

To LifeLines quickly.  At present the basic LifeLines system indexes
persons only by name.  However, report programs can be written that can
generate, as one or more files, any kind of index of a database you could
want.  These index files could then be searched, rather than the more
costly searches of a database, for specific records, which could then be
HTML'ized on the spot and presented.

>... I'd like an option which would be ... a search sequence, starting with
>the precise individual name, date, place name information, and if no
>match, a progressive combination of "Fuzzy" searches on each word of the
>sequence.  The search would ... slowly relax its match criteria ... This
>... would take the burden of defining search strategy off the user ...  

Good idea.  This could all be done based on index files generated by report
programs off a database.
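
A minimal sketch of the relaxing search sequence discussed above, assuming the index is simply a list of name strings. The function name, the stage labels, and the cutoff values are illustrative assumptions, and difflib's similarity ratio stands in for whatever fuzzy matching GenWeb might eventually adopt.

```python
import difflib


def progressive_search(query, names, cutoffs=(0.9, 0.8, 0.6)):
    """Return (stage, matches), trying the strictest criteria first."""
    # Stage 1: the precise string as typed.
    exact = [n for n in names if n == query]
    if exact:
        return "exact", exact

    # Stage 2: ignore case and surrounding whitespace.
    folded = query.strip().lower()
    loose = [n for n in names if n.strip().lower() == folded]
    if loose:
        return "case-insensitive", loose

    # Stage 3: fuzzy matching, lowering the similarity cutoff step by step.
    by_folded = {}
    for n in names:
        by_folded.setdefault(n.strip().lower(), []).append(n)
    for cutoff in cutoffs:
        close = difflib.get_close_matches(folded, list(by_folded), n=5, cutoff=cutoff)
        if close:
            return "fuzzy", [orig for key in close for orig in by_folded[key]]
    return "none", []
```

The caller supplies only the query; the order in which criteria are relaxed is fixed inside the function, which is the sense in which the burden of search strategy is taken off the user.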

>Tom,  Hope you have a good trip to Jolly old England.

Bill, Thanks.  Got back yesterday.  Returned to Avebury (largest henge in
England), Stonehenge, and saw the Roman baths in Bath.

Tom Wetmore, ttw@beltway.att.com



          Sunday, October 2, 1994 5:22:59 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re:  Genealogy Data Base Search Concept
  To:             GenWeb
>Subject: Re:  My Genealogy Data Base Search Concept

>FROM: Bill Minnick
>TO:  Tom Wetmore & Birger Wathne
>SUBJECT:  Re: GenWeb Search Concept

Glad you've returned from England safely.  I have the following comment on 
your response (below)

I WROTE:
>>GenWeb will need a powerful ... search capability of finding "a needle in
>>a haystack" ... we are going to need to be able to specify proximity range
>>of words; date ranges; boolean combinations of dates, places, events,
>>etc; "Fuzzy" matches to accommodate misspelling, possibly even common
>>name substitutes ... does Lifelines address the data base search question???
YOU WROTE:
>There are two search targets: a GEDCOM record while still "inside" a
>database, LifeLines or otherwise; and an HTML file about a person.  I
>assume that spiders and robots search HTML files only.  So an important
>issue seems to be whether HTML files are to be kept available at all times,
>or only on demand when a user is browsing through the family.

Correct me if I'm wrong, but wouldn't the "Spider" issue one URL at a time, 
which would cause the GEDCOM-to-HTML converter to deliver up one HTML page 
at a time, which the Spider would then analyze, indexing every word it saw?  
In other words, I think the spider will work with the generated HTML pages 
just as you or I would.  Though this may seem inefficient, it would be an easy 
way to take advantage of existing, parallel technology developments.  I'd like 
to hear your thoughts on this.

I would also ask Birger Wathne for his thoughts on how a "Spider" utility 
might "see" the genealogy material in a lifelines data base.  Would it not 
appear as generated HTML pages to the Spider?

I'll ask one of the "Spider" development teams to index Birger's Royal line to 
see what happens. 

TOM: 
By the way have you checked out Birger's implementation of the Royal 
genealogy line at URL:   
http://www.vest.sdata.no/skrivervik/employees/birger/genealogy.html

We await your comment on this breakthrough.   Regards,  Bill Minnick


          Sunday, October 2, 1994 5:51:49 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: Genealogy Follow-up
  To:             GenWeb
TO:  Brian Pinkerton
FROM: Bill Minnick
SUBJECT:  Spider Indexing of Demo Royalty Data Base

Have a favor to ask.  Can you direct the "Spider" to index the HTML pages 
which begin at URL: 

http://www.vest.sdata.no/skrivervik/employees/birger/genealogy.html

Will the Spider seek out and index URLs included on this first page?  And 
if each page points to other URLs, can you direct the spider to track down all 
the referenced URLs?  

At the above URL, we have a genealogy of Royalty lines appearing as HTML pages 
to WWW users.  We want to test the concept of Spider-indexing of the Royal 
line. Are you game?   

Of course our success depends on whether you index only the first (home) page, 
or all embedded URL references within  pages.  Looking forward to your reply 
(which will automatically go to our GenWEB mail list) .   Appreciate your 
support to this experiment.   -- Bill Minnick, GenWEB project team member. 



          Monday, October 3, 1994 6:41:42 AM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Re:  Genealogy Data Base
  To:             GenWeb
Bill Minnick (>):

>... wouldn't the "Spider" issue one URL at a time which would cause the
>GEDCOM-to-HTML converter to deliver up one HTML page at a time, which the
>Spider would ... index ... ? ... I think the spider will work with the
>generated HTML pages just as you or I would.

Excuse the dark ages response, but I don't know what a URL is.  From
context I am guessing that a URL allows operations to be performed at a
site that equate to operations that humans could perform.  If this is the
case, then a LifeLines database could be used to automatically and
sequentially generate an HTML file for each member of the database.  The
file could then be indexed, presumably, and then either retained or
removed.  I am assuming that in a large database one would not keep around
thousands to hundreds of thousands of HTML files.

>... though this may seem inefficient, it would be an easy way to take
>advantage of existing, parallel technology developments.

For something like this, efficiency does not seem a major issue!

Tom Wetmore, ttw@beltway.att.com



          Monday, October 3, 1994 7:33:52 AM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re:  Genealogy Data Base  (INDEXING Question)
  To:             GenWeb
TO: TOM WETMORE
FROM: BILL MINNICK
SUBJECT: INDEXING

>Bill Minnick (>):
>>... wouldn't the "Spider" issue one URL at a time which would cause the
>>GEDCOM-to-HTML converter to deliver up one HTML page at a time, which the
>>Spider would ... index ... ? ... I think the spider will work with the
>>generated HTML pages just as you or I would.

>Tom Wetmore (>):
>Excuse the dark ages response, but I don't know what a URL is.  From
>context I am guessing that a URL allows operations to be performed at a
>site that equate to operations that humans could perform.  If this is the
>case, then a LifeLines database could be used to automatically and
>sequentially generate an HTML file for each member of the database.  The
>file could then be indexed, presumably, and then either retained or
>removed.  I am assuming that in a large database one would not keep around
>thousands to hundreds of thousands of HTML files.

As I refer to it, the URL (Universal Resource Locator) is simply the address 
of the HTML page on the World Wide Web (www).  It would normally point 
uniquely to the computer, the directory, and  the HTML file being requested.  
My assumption is that when a GenWEB computer, using Lifelines and a GEDCOM to 
HTML converter, receives one of its URLs, it will generate the HTML page 
instead of delivering up a fixed file from its memory. 
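
The idea of generating a page when its URL arrives, rather than storing a fixed file, might look roughly like this. It is only a sketch: the RECORDS dictionary, the /person/ URL layout, and the function name page_for are hypothetical stand-ins, not part of LifeLines or of any GEDCOM to HTML converter.

```python
from html import escape

# Hypothetical stand-in for a genealogy database keyed by record id.
RECORDS = {
    "I1": {"name": "Ann Example", "born": "1801", "children": ["I2"]},
    "I2": {"name": "Ben Example", "born": "1825", "children": []},
}

PREFIX = "/person/"  # assumed URL layout: /person/<record id>


def page_for(path):
    """Build the HTML page for one person on request, or return None."""
    if not path.startswith(PREFIX):
        return None
    record = RECORDS.get(path[len(PREFIX):])
    if record is None:
        return None
    items = []
    for child_id in record["children"]:
        child = RECORDS.get(child_id)
        if child is not None:
            # Each child link is itself a URL that triggers another page build.
            items.append('<li><a href="%s%s">%s</a></li>'
                         % (PREFIX, escape(child_id), escape(child["name"])))
    name = escape(record["name"])
    return ("<html><head><title>%s</title></head><body>"
            "<h1>%s</h1><p>Born: %s</p><ul>%s</ul></body></html>"
            % (name, name, escape(record["born"]), "".join(items)))
```

A spider following the links in each generated page would reach every record this way, one request at a time, without any HTML file being kept on disk.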

I've asked Brian Pinkerton, who manages the "Spider" (URL:  
http://www.biotech.washington.edu/WebCrawler/WebQuery.html   )
to attempt a Spider  indexing  of the Royalty line just put on the WEB by 
Birger Wathne.  Let's see how that turns out.

I invite a more exact definition of the URL in seasoned Internet 
lingo from anyone who can expand on this.   We are all at some stage learning; 
a key value of the GenWEB mail list is to help get these questions answered.
-- -- Regards,  Bill Minnick 



          Monday, October 3, 1994 9:45:12 AM
          GenWeb Item
  From:           Gene Stark,starkhome!gene@sbstark.cs.sunysb.edu,Internet
  Subject:        Re:  Genealogy Data Base  (INDEXING Question)
  To:             GenWeb
Bill Minnick writes:

>I've asked Brian Pinkerton, who manages the "Spider" (URL:  
>http://www.biotech.washington.edu/WebCrawler/WebQuery.html   )
>to attempt a Spider  indexing  of the Royalty line just put on the WEB by 
>Birger Wathne.  Let's see how that turns out.

I missed some of this "Spider" discussion due to a mail outage, but
it strikes me that the usefulness of a Spider would be to form a unified
index of data residing on a number of different hosts.  It would generally
be pretty inefficient to index data residing on one single host, since that
could be done very quickly by whatever database software is maintaining
the data on that host.

For example, my ged2html program, which Birger is using for the demo,
can read and index the 30,000 line, 3000 individual royal92.ged file in
about 15 seconds.  Or, I am sure that the LifeLines program he is also using
could do this as well.  Any sort of "Spider" program that tries to
follow all the links and generate and index will take much longer than this
(I would estimate on the order of hours) and generate a great deal of load
on the network subsystem of the host in the meantime.

							- Gene Stark


          Monday, October 3, 1994 10:49:05 AM
          GenWeb Item
  From:           Birger A. Wathne,Birger.Wathne@vest.sdata.no,Internet
  Subject:        Re:  Genealogy Data Base Search Concept
  To:             GenWeb

A spider traversing the database would indeed have to open each URL,
wait for the database report, and the HTML conversion. This would be very
inefficient, but a very simple solution. But what if I change my
database? I would have to remember to tell the spider which URL's
have become invalidated, and ask it to redo at least part of the indexing.
Would I remember to do so? Perhaps not....

I would rather believe in some kind of search/index mechanism
giving a possibility to search on a fixed set of tags.

I guess people will need to be able to search based on:
Names
Locations
Time periods
Parents
Spouse(s)
Children

Any other vital tags? Of course, these search mechanisms should be able
to perform some kind of 'fuzzy' search as well.
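
A search over a fixed set of tags could be sketched as follows. The Person fields mirror the list above; the class, the function names, and the substring matching rule are assumptions made for illustration, and a real engine would presumably substitute a fuzzy comparison for the simple substring test.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    place: str = ""
    birth_year: Optional[int] = None
    parents: List[str] = field(default_factory=list)
    spouses: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)


def _has(value, wanted):
    # An unspecified criterion (None) always matches.
    return wanted is None or wanted.lower() in value.lower()


def _any_has(values, wanted):
    return wanted is None or any(wanted.lower() in v.lower() for v in values)


def search(people, name=None, place=None, years=None,
           parent=None, spouse=None, child=None):
    """Return the people matching every criterion that was supplied."""
    results = []
    for p in people:
        if years is not None:
            first, last = years
            if p.birth_year is None or not first <= p.birth_year <= last:
                continue
        if (_has(p.name, name) and _has(p.place, place)
                and _any_has(p.parents, parent)
                and _any_has(p.spouses, spouse)
                and _any_has(p.children, child)):
            results.append(p)
    return results
```

Criteria combine with a logical AND, so search(people, place="bath", years=(1800, 1810)) narrows by location and time period together.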

The next question is: Should searching be done through a central
server, redistributing the search to known databases?


Birger



          Monday, October 3, 1994 11:13:08 AM
          GenWeb Item
  From:           Chris Garrigues,cwg@mcc.com,Internet
  Subject:        Re: Genealogy Data Base (INDEXING Question)
  To:             GenWeb
In message , Bill Minnick typed:
> I invite a more exact definition of the URL in seasoned Internet 
> lingo from anyone who can expand on this.   We are all at some stage learning; 
> a key value of the GenWEB mail list is to help get these questions answered.

A URL (Universal Resource Locator) is a string which may be used to describe a 
resource available on the internet no matter what access method is required to 
access the resource.  It was invented for use in WWW, but is being used 
elsewhere now because it's a useful concept.

The format is an access method followed by a colon followed by an accessor 
string.  The accessor string is a sequence of tags separated by slashes.  The 
semantic meaning of the tags is defined on a per access method basis.

Examples:

An html document available via http:
http://www.biotech.washington.edu/WebCrawler/WebQuery.html

A binhexed file available via anonymous FTP:
ftp://ftp-boi.external.hp.com/pub/printers/djet_pjet_dwriter/mac/dwgx10.hqx

etc.

There are also URLs defined for gopher and a few other protocols as well.
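
To make the format concrete, here is a small sketch that splits a URL into the access method and the slash-separated tags described above. It uses urlsplit from Python's standard urllib.parse module; the helper name describe and the dictionary keys are illustrative choices only.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into its access method, host, and path tags."""
    parts = urlsplit(url)
    return {
        "method": parts.scheme,      # text before the first colon
        "host": parts.hostname,      # machine named after the double slash
        "tags": [t for t in parts.path.split("/") if t],
    }
```

Applied to the two examples above, the method comes back as "http" and "ftp" respectively, and the last tag is the file being requested.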

Chris


Chris Garrigues                              (MIME capable) cwg@mcc.com
Microelectronics and Computer Technology Corporation    +1 512 338 3328
3500 West Balcones Center	                    Fax +1 512 338 3838
Austin, TX  78759-5398          USA




          Monday, October 3, 1994 11:28:38 AM
          GenWeb Item
  From:           Chris Garrigues,cwg@mcc.com,Internet
  Subject:        Re: Genealogy Data Base (INDEXING Question)
  To:             GenWeb
In message <199410031610.MAA01325@starkhome.cs.sunysb.edu>, Gene Stark typed:
> For example, my ged2html program, which Birger is using for the demo,
> can read and index the 30,000 line, 3000 individual royal92.ged file in
> about 15 seconds.  Or, I am sure that the LifeLines program he is also using
> could do this as well.  Any sort of "Spider" program that tries to
> follow all the links and generate and index will take much longer than this
> (I would estimate on the order of hours) and generate a great deal of load
> on the network subsystem of the host in the meantime.

I would think that a more useful approach would be to define an index for each 
server (or at least each genweb tree; possibly several per server) and then 
have tools which collect together these indexes into a master index.

If the index is a simple format such as a text file containing keys and URLs, 
the tool to merge them wouldn't have to be much more than a concatenation of 
these files.
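
As a rough sketch of such a merge, assume each server publishes its index as text with one "key<TAB>URL" entry per line. That line format, the function names, and the example.org style hosts used when trying it out are all assumptions, not an agreed GenWeb format.

```python
def merge_indexes(texts):
    """Concatenate index texts, dropping blank lines and duplicate entries."""
    entries = set()
    for text in texts:
        for line in text.splitlines():
            line = line.strip()
            if not line or "\t" not in line:
                continue  # skip blank or malformed lines
            key, url = line.split("\t", 1)
            entries.add((key.strip(), url.strip()))
    # Sorting by key keeps the master index easy to scan and to search.
    return sorted(entries)


def lookup(entries, key):
    """Return every URL filed under `key`, ignoring case."""
    wanted = key.lower()
    return [url for k, url in entries if k.lower() == wanted]
```

The merge is little more than concatenation plus a sort; the only extra work is discarding duplicates when two servers list the same entry.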

However, I suspect the real answer for how to do this right would be to use 
WAIS.

WAIS (Wide Area Information Server) is an indexing tool which is often used in 
the back end for web servers.  It can certainly index items located at a 
variety of different sites by a variety of different indices.

Unfortunately, I don't know much more about just how it works.  Hopefully, 
someone else on this list knows more than I do; otherwise we've got some 
reading to do.

Using the search facility at http://galaxy.einet.net/galaxy.html (which uses 
WAIS as a backend), I find more references than I'm willing to deal with right 
now.

If nobody else on this list knows more than I do, I'll do some research.

Chris



Chris Garrigues                              (MIME capable) cwg@mcc.com
Microelectronics and Computer Technology Corporation    +1 512 338 3328
3500 West Balcones Center	                    Fax +1 512 338 3838
Austin, TX  78759-5398          USA




          Monday, October 3, 1994 9:04:50 PM
          GenWeb Item
  From:           Gary Hoffman,ghoffman@ucsd.edu,Internet
  Subject:        Fwd: WWW for Genealogists
  To:             GenWeb
Forwarded with permission from sender, 
Hylton Tuckett,HYLTON@csc.cit.ac.nz

27 September 1994 (NZST)

Hi Gary,
Some time ago I read of your GenWeb proposal with some interest. I am 
the Director of the Computing Services Centre of the Central 
Institute of Technology, a leading tertiary education provider in New 
Zealand. I am also a genealogist and have been heavily involved in 
genealogy since the late 1970s. In 1983 I founded the Hutt Valley 
Branch of the New Zealand Society of Genealogists (NZSG) and was the 
Chairman for 4 years. In 1986 I was elected to the national Council 
of the NZSG and then became firstly the Vice-President and then the 
President for the next 2 years of the Genealogical Research Institute 
of New Zealand (GRINZ). In 1993 I published my own family history 
book which won the Kevin McAnulty Award for the best family history 
book published in New Zealand. 

I have had published quite a number of papers on genealogy and 
computing including my most recent one "WWW for Genealogists". The 
potential of the WWW for genealogy is quite clear and my paper went 
through the rudiments of the Web and ended up with an outline of your 
proposal. At the time of writing I did not have my own developed Web 
server but over recent weeks have set up a server running on a 
Windows NT box. The server has been set up for development and use in 
the educational sector but I am going to put together some personal 
genealogical pages. When I get something together I will email you. 
My Web server is http://cscnt.cit.ac.nz and my email address is 
hylton@csc.cit.ac.nz. I have had people all over the world connect 
onto my server and I have also connected onto your site so the 
cyberspace really is out there. 

At the end of October I am putting on an afternoon seminar for the 
Genealogical Computing Group of the NZSG and next May I am presenting 
2 papers to the annual conference of the NZSG - subject in all cases 
is the  WWW for genealogists. I have not come across any one else 
developing in this area yet but like the WWW itself I expect to see 
more people get on board. 

Constructing pages with hyperlinks is now straight forward (I use 
"ant", a program that works pretty well with Word for Windows v6 as 
it has all the necessary wizards etc to insert html constructs in to 
your text files).

The only difficulty I have with your proposal is the placement/access 
 of personal information servers in the web. If personal WWW servers 
can be put into the web then there are all sorts of possibilities but 
at present the web is held together mostly by educational and  
commercial sites. I know that in New Zealand there are, currently, no 
personal web sites. My web site, for example, while it may put up 
personal/genealogical pages is still essentially an educational site. 
What are your thoughts on personal web servers?

I deliberately chose the Windows NT platform to experiment with for 
this reason as I believe that this box/architecture will be the 
simplest and most cost effective web server  option for 
individuals/genealogical societies etc. 

This is enough from me as a first up communication and I am very 
interested to hear back from you.

Regards
Hylton Tuckett 




          Monday, October 3, 1994 9:04:52 PM
          GenWeb Item
  From:           Gary Hoffman,ghoffman@ucsd.edu,Internet
  Subject:        Fwd: GenWeb
  To:             GenWeb
Forwarded with permission from
Hylton Tuckett,HYLTON@csc.cit.ac.nz


Thanks for your prompt reply.
Yes, I would like to be put on to the Genweb mailing 
list and yes you may forward my message to other members on the list.
I have just returned from a "Technology in Education" conference 
which had some very interesting papers presented - the speakers had 
all learnt Powerpoint!

Of particular interest were ways of getting the 
necessary bandwidth to the home eg fibre cable and dishes(a USA 
solution for schools etc we were told).

With the increase in graphics, sound, video etc the bandwidth becomes 
a major issue. At our site I manage a dedicated 48Kb link to our 
nearest Internet hub which then joins a 512Kb frame relay connection 
to the world. At home a 14.4Kb modem and at the moment a straight 
connection, via PPP, to one of my work UNIX hosts. Next step is to 
develop a Web connection from home with Mosaic and RAS to my work NT 
server. 

There are about 6 commercial (non-educational) Internet hubs in New 
Zealand and they are expensive to use - all NZ sites charge per byte 
of IP traffic which currently means that publicly available resources 
such as WWW, ftp etc are limited.

Hylton



          Tuesday, October 4, 1994 7:36:08 AM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
TO: CHRIS GARRIGUES
FROM: BILL MINNICK

In article  Chris Garrigues  writes:
>In your page, you talk about wanting "one host on the internet with good 
>connectivity, and guaranteed lasting dedication to genealogy".

>I might be able to provide this soon.

>My home system is now on the internet over a bonded BRI line giving me a 128kb 
>connection.  I'm getting it registered in the DNS as deepeddy.com.  I don't 
>yet have an http server running on it, but I certainly will.  It's a lower-end 
>Sun (an IPX), but I don't really intend for it to be used as much other than a 
>permanent internet address.  (I do my own hacking, programming, writing, and 
>netsurfing from my Mac which is on the same ISDN link.)

>Once I have the setup fully registered and have a working http server (and am 
>doing regular backups :-{), we should talk again about this possibility.  I 
>expect deepeddy.com to exist permanently, and since I own it and it's in a 
>house that I own, it can have a lasting dedication to genealogy.  (I'd 
>probably create an alias for genealogy.deepeddy.com pointing to the same 
>machine, so it could even move to its own system if need be.)

I haven't seen any responses to this item you wrote last week.  This is a 
generous offer.  I think your offer will be acted on as soon as we get a few 
pilot nodes working well enough to start really building up GenWEB data bases.

James Jones and I are working to put up an Alpha Test system using the 6700 
person Richard Austin data base from the Austin Families Association of 
America.   This data base offers very complete Source Citations, timelines and 
biographies as well as over 100 early photos, some going back to 1848.  The 
photos are all scanned and await a scheme to incorporate them in Lifelines.
We'll be using  a 486 system James Jones is putting together here in Silicon 
Valley.  We hope to enlist the experience of Birger Wathne and Tom Wetmore as 
necessary to get on the air in the next few weeks with the Richard Austin 
(1638, Charlestown, MA; English emigrant) data base.   As we progress, we 
hope this experiment will spill over onto your equipment.  

Would appreciate your thoughts on this.   (and thanks for your recent 
explanation of URLs)    -- -- Regards,  Bill Minnick, Cupertino, CA



          Tuesday, October 4, 1994 8:45:06 PM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Changing Rules for LifeLines Database Access
  To:             GenWeb
CONCURRENT DATABASE ACCESS IN LIFELINES -- 4 OCT 1994
=====================================================
LifeLines was written with the assumption that no database would be open
by more than one LifeLines user at a time.  Thus multiple accesses to the
same database were not checked, and a few experimenting users have
discovered that ugly things can happen when multiple accesses to LifeLines
databases are tried.

Responding to requests from a few LL users, and in preparing LL for
experimental use as a GenWeb engine, I have tightened up LL behavior in the
face of multiple database requests.  Here is a summary of the new rules.

A LifeLines database can now be open by one "writer" program or by any
number of "reader" programs, but never by reader and writer programs
simultaneously.

By default LifeLines attempts to open a database for writing.  If the
database is already open (for either reading or writing), the database
cannot be opened for writing so LifeLines will automatically "revert" to
trying to open the database for reading.  If the database cannot be open
for reading either, LifeLines quits and tells you why.

There is a new LifeLines command line option, -r, to indicate that the
database should only be open for read access.  If the database cannot be
open for reading, LifeLines quits and tells you why.

If a LifeLines database is open for read access only, the main screen will
place the notation "(readonly)" after the name of the database.

Implementation note, for those interested.  Every LL database contains a
file named "key."  Until now this file has held two entries, the key of the
master index in the database, and the key of the next index or data block
to be created in the database.  I have added a third entry, an integer.
This integer holds the "open status" of the database.  A value of 0 means
the database is currently unopen.  A positive value of n indicates that the
database is open for reading by n programs.  A negative value indicates
that the database is open for writing.  Because the format of the key file
has changed, there is an incompatibility between databases created before
and after these changes.
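
The open-status rules can be modelled in a few lines. This sketch tracks only the integer described in the implementation note and is not the LifeLines source: the class name, the choice of -1 to mean "open for writing", and the use of exceptions are assumptions, and a real implementation would also have to update the key file safely when several programs start at once.

```python
class OpenStatus:
    """Model of the open-status integer: 0 unopen, n > 0 readers, < 0 writer."""

    def __init__(self):
        self.value = 0

    def open(self, read_only=False):
        """Return "write" or "read", or raise if neither is possible."""
        if not read_only and self.value == 0:
            self.value = -1  # assumed encoding for a single writer
            return "write"
        # Either -r was given or the write attempt failed: revert to reading.
        if self.value >= 0:
            self.value += 1
            return "read"
        raise RuntimeError("database is open for writing")

    def close(self, mode):
        """Undo a successful open(); `mode` is the value open() returned."""
        if mode == "write":
            if self.value != -1:
                raise RuntimeError("database is not open for writing")
            self.value = 0
        else:
            if self.value <= 0:
                raise RuntimeError("database is not open for reading")
            self.value -= 1
```

Note that a default open on a database that already has readers falls back to read access, matching the "revert" behavior described above.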

Tom Wetmore, ttw@beltway.att.com



          Saturday, October 8, 1994 11:49:39 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re:  Genealogy Data Base  (INDEXING Question)
  To:             GenWeb
TO: GENE STARK
FROM: BILL MINNICK

In article  starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark) writes:
>I missed some of this "Spider" discussion due to a mail outage, but
>it strikes me that the usefulness of a Spider would be to form a unified
>index of data residing on a number of different hosts.  It would generally
>be pretty inefficient to index data residing on one single host, since that
>could be done very quickly by whatever database software is maintaining
>the data on that host.

>For example, my ged2html program, which Birger is using for the demo,
>can read and index the 30,000 line, 3000 individual royal92.ged file in
>about 15 seconds.  Or, I am sure that the LifeLines program he is also using
>could do this as well.  Any sort of "Spider" program that tries to
>follow all the links and generate and index will take much longer than this
>(I would estimate on the order of hours) and generate a great deal of load
>on the network subsystem of the host in the meantime.

Gene,
Thanks for your thoughts on this "Spider" concept.  You have the benefit of 
visibility to Lifelines and the GEDCOM to HTML conversion software, and the 
most efficient ways to use these items.  Please consider my thoughts below and 
let us know what level of search "power" we'll need, and how you would 
recommend we develop or obtain our GenWEB search capability.

I'm thinking ahead about a year or two when we have the GenWEB operating on 25 
separate nodes.  A new user will want to make one query that scans the 
individuals found on all 25 nodes.  The choices seem to be: 1) have an 
independent spider program crawl thru the 25 GenWEB nodes and continuously 
update a master index file; 2) have one master indexing node receive index 
updates automatically from all 25 GenWEB nodes when any change is made on the 
given node; or 3) some other scheme you may suggest.  However and wherever we 
create the indices, I believe our search algorithm will need to go well beyond 
simple indexing.  The thing that attracted me about the "Spiders" currently 
being developed is that they are developing many of the features we'll be 
needing -- -- such as the "fuzzy" searches that will find misspelled names, 
places, etc.  The price in computer time may or may not be worth using (and 
modifying)  a free spider program, such as Lycos offered by CMU.   

I recently came across a brochure for a commercial product called 
LaserFiche by Compulink, Torrance, CA.  Let me quote to you from their brochure:

"FAST SEARCH AND RETRIEVAL:  Imagine being able to remember every single word 
inside each of your documents.  Now imagine what could happen if you could 
retrieve any one of those documents by any word, in a matter of seconds!  That 
is the search power that you get with LaserFiche.

*  Boolean operators and wildcards help narrow or expand your full text 
     searches.
*  Fuzzy search capability can locate words with misspelled, transposed or 
    incorrect letters, even if they occur at the beginning of the word. 
*  Proximity search reveals when one word or phrase occurs near another.
*  Indexed database fields provide traditional search capabilities
*  Search results can be saved in a folder for later examination."

I wish we could afford this product, or we could get it donated for GenWEB 
use (if it could be used with lifelines).  I don't know how we'll get this 
kind of search power; but I believe we'll need this kind of power to 
effectively search for one ancestor in a GenWEB containing 50 million or more 
individuals.   
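
Of the features quoted from the brochure, proximity search is perhaps the simplest to sketch. The version below is an illustration only and makes no claim about how LaserFiche works: the function name near, the word-splitting rule, and the default gap of five words are all assumptions.

```python
import re


def near(text, first, second, max_gap=5):
    """True if `first` and `second` occur within `max_gap` words of each other."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_at = [i for i, w in enumerate(words) if w == first.lower()]
    second_at = [i for i, w in enumerate(words) if w == second.lower()]
    # Compare every pair of positions; fine for short texts like one biography.
    return any(abs(a - b) <= max_gap for a in first_at for b in second_at)
```

For a sample sentence such as "Richard Austin arrived in Charlestown in 1638", near(text, "austin", "charlestown", 3) holds, while "richard" and "1638" are six words apart and fail the same test.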

 I guess I don't think a simple index concept is going to be worth much past 
the first year of GenWEB operation.  I'd like to hear any thoughts any GenWEB 
maillist members have on how we should best go about getting a full search 
capability implemented.  Those of you who know Lifelines and GED2HTML 
programs, do you see a good solution to the search problem?  What can the rest 
of us do to help you implement the solution?     -- -- Bill Minnick, 
Cupertino, CA





          Sunday, October 9, 1994 5:00:47 AM
          GenWeb Item
  From:           Gene Stark,starkhome!gene@sbstark.cs.sunysb.edu,Internet
  Subject:        Re:  Genealogy Data Base  (INDEXING Question)
  To:             GenWeb
>I'm thinking ahead about a year or two when we have the GenWEB operating on 25 
>separate nodes.  A new user will want to make one query that scans the 
>individuals found on all 25 nodes.  The choices seem to be: 1) have an 
>independent spider program crawl thru the 25 GenWEB nodes and continuously 
>update a master index file; 2) have one master indexing node receive index 
>updates automatically from all 25 GenWEB nodes when any change is made on the 
>given node; or 3) some other scheme you may suggest.

The spider concept is a fine idea, it's just that I don't think it should
work at the level of individuals.  Instead, each node should produce an
index file of the individuals located at that node, and the spider will
collect and process the index files.  An example of the kind of thing
that currently works this way on the Web is the "Unified CS Technical
Report Index".  An index is maintained at a central site, and it is updated
weekly by simply looking in anonymous FTP areas known to hold technical
reports and looking for files in various formats.  The main problem with
it, in my opinion is that it does not impose any standard index format,
but rather tries to "understand" whatever it sees.  For GenWEB, we could
probably afford the luxury of requiring that each node produce an index
file in a standard format (say, a certain type of HTML file).
For availability, it would probably be worthwhile to have multiple indexing
sites, not just one, but they need not be highly specialized nodes.

The way I would like to see GenWEB work is to have it be a loosely
federated collection of individual nodes, each owned and managed by an
individual who wishes to contribute his or her data.  The data could be
maintained in GEDCOM format or whatever format the individual finds most
convenient, however, GenWEB would provide conversion software to present
the data to the Web in a convenient form (e.g. HTML or any successor).
The software would also be capable of producing an index (in standard format)
on demand, for use by automatic indexing nodes or any individual requesting
it.  Birger Wathne's demonstration works this way.
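
A node-side index generator along these lines might be sketched as below. It assumes the data is GEDCOM text, reads only the level-0 INDI lines and the first level-1 NAME line of each individual, and writes "Surname, Given<TAB>URL" entries. The output format, the one-page-per-individual URL pattern, and the function name are hypothetical; this is not how ged2html or LifeLines actually produce their indexes.

```python
def gedcom_index(gedcom_text, base_url):
    """Return sorted "Surname, Given<TAB>URL" lines for each individual."""
    entries = []
    current_id = None
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        if level == "0":
            # Individual records start with a line like "0 @I1@ INDI".
            if len(parts) == 3 and parts[2].strip() == "INDI":
                current_id = second.strip("@")
            else:
                current_id = None
        elif level == "1" and second == "NAME" and current_id and len(parts) == 3:
            # GEDCOM marks the surname with slashes: "Given /Surname/".
            given, _, rest = parts[2].partition("/")
            surname = rest.split("/")[0]
            entries.append("%s, %s\t%s%s.html"
                           % (surname.strip(), given.strip(), base_url, current_id))
            current_id = None  # index only the first NAME of each individual
    return sorted(entries)
```

An indexing node would then only need to fetch this one file from each participating node instead of visiting every individual's page.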

In this scheme, none of the nodes (except maybe for a few indexing nodes)
will need to be machines with special dedication to GenWEB other than the
fact that they are reasonably often available on the Internet and that
there is a user of that machine who is interested in making his/her data
available and is able to have GenWEB server software hooked into their
Web system.  The strength of the Web is its distributed nature, where the
actual information is maintained locally at each node by an individual who
has an interest in that data, rather than having some constipated centralized
administration.  In my opinion, the number of participants in GenWEB will
be maximized if each individual can set up their own site with a minimum
of fuss and bother.

>  However and whereever we 
>create the indices, I believe our search algorithm will need to go well beyond 
>simple indexing.  The thing that attracted me about the "Spiders" currently 
>being developed is that they are developing many of the features we'll be 
>needing -- -- such as the "fuzzy" searches that will find misspelled names, 
>places, etc.  The price in computer time may or may not be worth using (and 
>modifying)  a free spider program, such as Lycos offered by CMU.   

>I recently came across a brochure for a commercial product called 
>LaserFiche by Compulink, Torence, CA.  Let me quote to you from their brochure:

>I wish we could afford this product, or we could get it donated for GenWEB 
>use (if it could be used with lifelines).  I don't know how we'll get this 
>kind of search power; but I believe we'll need this kind of power to 
>effectively search for one ancestor in a GenWEB containing 50 million or more 
>individuals.   

Yes, I believe that the search capabilities you describe will be very
useful, or perhaps even essential.  There are published algorithms for doing
the types of fuzzy searches you describe.  I am not an expert on this type
of computing, but as a Computer Science professor, I know that I could go
to the library and locate relevant literature in a few hours.  The harder
part is to figure out what type of search capabilities are feasible and
desirable for this project.  I don't think that 50 or 100 million individuals
presents a very difficult problem, and I believe it would be well within the
capability of a high-end PC-class machine (running a decent operating system,
not MS-DOS), to serve as an indexing server.  Most of the work would be done
in the background, by preprocessing collected index files into various kinds
of data structures to enable efficient searching in response to user queries.
If the rate of queries makes the load too high, redundant indexing servers
could be introduced.
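
The preprocessing step described above can be sketched roughly as follows.
This is only an illustration: it uses Python's standard difflib module for
approximate matching as a stand-in for the published fuzzy-search algorithms
mentioned, and the function names and example URLs are made up.

```python
import difflib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_surname_index(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group record locations by upper-cased surname (done once, in the background)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for surname, location in entries:
        index[surname.upper()].append(location)
    return dict(index)


def fuzzy_lookup(index: Dict[str, List[str]], surname: str,
                 cutoff: float = 0.8, limit: int = 5) -> Dict[str, List[str]]:
    """Return the index entries whose surname is spelled like the query."""
    close = difflib.get_close_matches(surname.upper(), list(index),
                                      n=limit, cutoff=cutoff)
    return {key: index[key] for key in close}


# Hypothetical collected index data: (surname, location of the record).
entries = [
    ("Austin", "http://site-a.example/I51.html"),
    ("Austen", "http://site-b.example/I7.html"),
    ("Adams", "http://site-a.example/I3.html"),
]
index = build_surname_index(entries)
hits = fuzzy_lookup(index, "Auston")   # a misspelled query
```

Note that get_close_matches scans every key, so this shows the idea rather
than a structure that would scale to tens of millions of individuals; a real
indexing server would need something smarter for the lookup itself.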

In my opinion, GenWEB can (and should) start right now, and evolve from there.
All that is needed is to do the following:

(1)  Designate one node on the Web with an HTTP server to hold the
	GenWEB "home page", which will point to indexing nodes and places
	where data will be stored.

	It would be useful to have an administrator at this node to
	register new GenWEB nodes, much as CERN maintains an index
	of new Web servers.

	This site could also serve as the repository of GenWEB software,
	for access by anonymous FTP.

(2)  Specify *minimal* standards for the presentation of GenWEB data in
	HTML format, including the format of the index file which a
	GenWEB site is supposed to provide.  These standards need not
	be cast in stone, and would be expected to evolve, but they
	should not be very stringent.  They are needed simply to ensure
	that GenWEB can be traversed automatically, if necessary, to
	accomplish various tasks.

(3)  Make some sample software available so people can take their current
	data (say, in GEDCOM format) and present it to GenWEB in HTML
	format.  Various examples of this kind of software already exist.

(4)  Advertise that GenWEB has started, and begin registering GenWEB nodes.

Once this is done, and some data starts to become available, an interested
group can work out an "index server" implementation.  The best motivation
for this work will be a growing base of GenWEB nodes which one would like
to access efficiently.

							- Gene Stark



          Sunday, October 9, 1994 9:49:07 AM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
TO: GENE STARK
FROM: BILL MINNICK

Your lucid response is much appreciated; I believe it will be considered by 
the GenWEB community as a basis document for getting GenWEB kicked off.
I have a few questions/comments.  

In article  starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark) writes:

>In my opinion, GenWEB can (and should) start right now, and evolve from there.
>All that is needed is to do the following:

>(1)  Designate one node on the Web with an HTTP server to hold the
>        GenWEB "home page", which will point to indexing nodes and places
>        where data will be stored.

>        It would be useful to have an administrator at this node to
>        register new GenWEB nodes, much as CERN maintains an index
>        of new Web servers.

>        This site could also serve as the repository of GenWEB software,
>        for access by anonymous FTP.


GENE:   This is a key piece of GenWEB we can use now.  Do you have a 
preference as to who provides this site?  Perhaps this could be a good use for 
Chris Garrigues' computer.  Recently, he wrote:
    >>From: 
    >>In your page, you talk about wanting "one host on the internet with good 
    >>connectivity, and guaranteed lasting dedication to genealogy".
    >>I might be able to provide this soon.

>(2)  Specify *minimal* standards for the presentation of GenWEB data in
>        HTML format, including the format of the index file which a
>        GenWEB site is supposed to provide.  These standards need not
>        be cast in stone, and would be expected to evolve, but they
>        should not be very stringent.  They are needed simply to ensure
>        that GenWEB can be traversed automatically, if necessary, to
>        accomplish various tasks.

>(3)  Make some sample software available so people can take their current
>        data (say, in GEDCOM format) and present it to GenWEB in HTML
>        format.  Various examples of this kind of software already exist.

GENE:
How do we deal with the millions of PCs (DOS and Mac)?  Can we get a version 
of Lifelines and your GEDCOM-to-HTML program which runs on a Windows based PC? 
This would work for limited size data bases, and open up GenWEB now to many 
more data bases.

>(4)  Advertise that GenWEB has started, and begin registering GenWEB nodes.

>Once this is done, and some data starts to become available, an interested
>group can work out an "index server" implementation.  The best motivation
>for this work will be a growing base of GenWEB nodes which one would like
>to access efficiently.

Our Austin Families Association of America currently has over 50,000 
U.S. and Canadian Austins and allied surnames in a data base ready to go on 
line as soon as we have a way to do it.  I'm sure there are many other such 
data bases available.  Most of us are DOS/PC based at present.  We are looking 
for guidance as to how to include these types of machines in GenWEB.  Do we 
have to wait for Windows 95 (the newest name for MS Chicago op system)?  Or 
are we faced with switching to UNIX-based operations?  

Thanks again for your well stated thoughts.  Let us know how we can help get 
GenWEB rolling.   Bill Minnick,  Cupertino, CA





          Sunday, October 9, 1994 9:55:05 AM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Availability of prior GenWEB Mail
  To:             GenWeb
TO: Gary Hoffman
FROM: Bill Minnick

A number of people have asked if newcomers to the GenWEB Mail List can get 
prior mail.  Is there an easy source for this at ucsd.edu?   



          Sunday, October 9, 1994 12:57:55 PM
          GenWeb Item
  From:           Phillip Akers,freyr!pakers@netcom.com,Internet
  Subject:        Prior GenWeb Mail
  To:             GenWeb
I believe I have everything that has been put out so far on the GenWeb
list-server and the mailing which led up to the creation of the list-
server... I will be happy to make it available via ucsd or netcom...
I guess I should check the file size before promising I can put it up
on netcom... do any of the people who were in on the original mailings
have any objection to that email being made public?  Please send me
email if there is a problem...

Phil


          Sunday, October 9, 1994 1:56:30 PM
          GenWeb Item
  From:           Gene Stark,starkhome!gene@sbstark.cs.sunysb.edu,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
(Regarding a "home site" for GenWEB)

>GENE:   This is a key piece of GenWEB we can use now.  Do you have a 
>preference as to who provides this site?  Perhaps this could be a good use for 
>Chris Garrigues' computer.

If he can provide the machine, and has good network connectivity with
minimal downtime, then I would say he is as good a choice as anyone else.

>GENE:
>How do we deal with the millions of PCs (DOS and Mac)?  Can we get a version 
>of Lifelines and your GEDCOM-to-HTML program which runs on a Windows based PC? 
>This would work for limited size data bases, and open up GenWEB now to many 
>more data bases.

As far as PC's running DOS and/or Windows are concerned, I can make available
executables of my program that will run on these machines.  However, I foresee
one fairly substantial problem:  the GEDCOM "standard" is not very clear on
some points, and it is not adhered to rigidly by systems that produce GEDCOM
output.  So far, two problems I have encountered are:

	(1)  Use of tags that are not mentioned in the GEDCOM standard.
	(2)  Empty data fields where the GEDCOM standard seems to imply
		there should be data.

The looseness in the GEDCOM standard makes it somewhat difficult to create one
set of executables that will be guaranteed to work on GEDCOM input from a
variety of sources.  I could try to handle the variety of GEDCOM's that are
produced by the major packages; however, to do this I need to have a set of
substantially sized, representative GEDCOM files from each of these packages
that I can use in lieu of definitive answers from the GEDCOM standard.
Right now I only have access to PAF, which is what I use for my own data.
If people using different packages would be willing to make test files
available to me, I would modify my program so that it will process these
test files.  At that point, I could make DOS/Windows executables that could
reasonably be expected to work on most people's data.
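
One way to cope with the two problems listed above is to parse leniently:
accept any tag, allow an empty value, and simply record what was not
recognized.  The sketch below assumes the usual GEDCOM line layout of
"level [@xref@] TAG [value]".  KNOWN_TAGS is a small illustrative subset,
not the full list from the standard, and none of this is taken from the
actual GEDCOM-to-HTML program discussed here.

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple

# Small illustrative subset of standard tags (assumption, not the full list).
KNOWN_TAGS = {"HEAD", "INDI", "NAME", "SEX", "BIRT", "DEAT", "DATE", "PLAC",
              "FAM", "HUSB", "WIFE", "CHIL", "MARR", "NOTE", "SUBM", "TRLR"}


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_gedcom_line(line: str) -> GedLine:
    """Split one GEDCOM line; a missing value becomes an empty string."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record header such as "0 @I51@ INDI": the id comes before the tag.
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        return GedLine(level, parts[1], rest[0], rest[1] if len(rest) > 1 else "")
    tag = parts[1] if len(parts) > 1 else ""
    value = parts[2] if len(parts) > 2 else ""
    return GedLine(level, None, tag, value)


def scan(lines: Iterable[str],
         known: Set[str] = KNOWN_TAGS) -> Tuple[List[GedLine], Set[str]]:
    """Parse every line, collecting unknown tags instead of failing on them."""
    parsed, unknown = [], set()
    for raw in lines:
        if not raw.strip():
            continue
        item = parse_gedcom_line(raw)
        if item.tag not in known:
            unknown.add(item.tag)
        parsed.append(item)
    return parsed, unknown
```

Running scan over test files from several packages would give a list of
the non-standard tags each one emits, which is the kind of information the
request for sample files above is after.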

A somewhat different problem with PC's running DOS and Windows is that
they are single-user systems which would generally either be on the net
or in use by a user at the console, but not both.  Unix or Unix-like
operating systems for PC's (e.g. FreeBSD, NetBSD, or Linux) provide the
substantial advantage of being able to serve requests from the net at the
same time as somebody is using them to do other useful work.  However, I
am quite well aware that for the average PC user the installation and
maintenance of a Unix-like system can be somewhat daunting.

>Our Austin Families Association of America currently has over 50,000 
>U.S.and Canadian Austins and allied surnames in a data base ready to go on 
>line as soon as we have a way to do it.  I'm sure there are many other such 
>data bases available.  Most of us are DOS/PC based at present.  We are looking 
>for guidance as to how to include these types of machines in GenWEB.  Do we 
>have to wait for Windows 95 (the newest name for MS Chicago op system)?  Or 
>are we faced with switching to UNIX-based operations?  

If you can provide me with a substantial sample (say a few MB) of GEDCOM
from this database, I will be happy to produce a DOS or Windows executable
that is capable of turning GEDCOM files like that into HTML.  Actually,
it might be an interesting project to process the entire database.
You say 50,000 surnames?  If this database were output in GEDCOM, how many
megabytes would it be?  The "royal92.ged" file is 468kbytes, and it has
3010 individuals.  This was no problem to process on my system, and I could
probably do the same under DOS or Windows.  On my own system, I am pretty sure
I could process files of 10 to 20 times that size, say 10MB files with 50,000
individuals, without inordinate difficulty (say processing time of about an
hour, though I may eat my words).  My equipment is a 486DX/33 with 16MB of
RAM and a bit shy of 1GB of disk, running the FreeBSD operating system.
I can't guarantee being easily able to make a DOS or Windows executable
that would process such a large file, since it depends on whether I can get
the program to make effective use of the available RAM.  If I can get a sample
file, I will give it a shot, first under FreeBSD, then under DOS and Windows.

							- Gene Stark
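
As a rough check on the size estimate in the message above, the figures
quoted for royal92.ged can be scaled up by simple proportion.  The sketch
assumes that GEDCOM size grows linearly with the number of individuals,
which ignores differences in the amount of notes and source citations per
person; the function name is hypothetical.

```python
ROYAL92_KBYTES = 468        # size of royal92.ged as quoted above
ROYAL92_INDIVIDUALS = 3010  # individuals in royal92.ged as quoted above


def estimate_gedcom_kbytes(individuals: int) -> float:
    """Scale the royal92.ged size linearly to another number of individuals."""
    return individuals * ROYAL92_KBYTES / ROYAL92_INDIVIDUALS


bytes_per_person = ROYAL92_KBYTES * 1000 / ROYAL92_INDIVIDUALS
estimate = estimate_gedcom_kbytes(50_000)
print(f"about {bytes_per_person:.0f} bytes per individual")
print(f"about {estimate / 1000:.1f} MB for 50,000 individuals")
```

On these assumptions 50,000 individuals come to a little under 8 MB, which
is in line with the "10MB files with 50,000 individuals" figure used above.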



          Sunday, October 9, 1994 9:18:36 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
In article  starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark) writes:

>If you can provide me with a substantial sample (say a few MB) of GEDCOM
>from this database, I will be happy to produce a DOS or Windows executable
>that is capable of turing GEDCOM files like that into HTML.  Actually,
>it might be an interesting project to process the entire database.
>You say 50,000 surnames?  If this database were output in GEDCOM, how many
>megabytes would it be?  The "royal92.ged" file is 468kbytes, and it has
>3010 individuals.  This was no problem to process on my system, and I could
>probably do the same under DOS or Windows.  On my own system, I am pretty sure
>I could process files of 10 to 20 times that size, say 10MB files with 50,000
>individuals, without inordinate difficulty (say processing time of about an
>hour, though I may eat my words).  My equipment is a 486DX/33 with 16MB of
>RAM and a bit shy of 1GB of disk, running the FreeBSD operating system.
>I can't guarantee being easily able to make a DOS or Windows executable
>that would process such a large file, since it depends on whether I can get
>the program to make effective use of the available RAM.  If I can get a sample
>file, I will give it a shot, first under FreeBSD, then under DOS and Windows.

GENE:  This is exciting!  The data base sample I have is the Richard Austin 
data base,  6482 persons.  It is about 2.4 MBytes, most of which is a 
complete set of source citations and many biographical notes.  It was done 
with PAF 2.3.  You will find 2 marriages are empty; ignore them.  You may ftp 
the data base in a gzip or a pkzip compressed file via anonymous ftp at:  
ftp.rahul.net://pub/svpafug.

We want to run two experiments:  James Jones plans to put this data base on a 
486 PC running a commercial UNIX version with Lifelines, and he would need your 
GEDCOM-HTML converter.  I would like to try to put another data base (my 
ancestry data base) on a PC running under Windows, which will link into the 
Richard Austin data base at James Jones' site.  I'll prepare my family data 
base in a few days, but let's work with the Richard Austin project first.  
You may direct specific questions to James Jones on his setup at:
jjones@nas.nasa.gov   .

Thanks again for putting in a lot of effort in the past few days to get us 
pointed in the right direction.  Regards,  Bill Minnick, Cupertino, CA





          Sunday, October 9, 1994 10:06:30 PM
          GenWeb Item
  From:           Gary Hoffman,ghoffman@ucsd.edu,Internet
  Subject:        Prior GenWeb Mail
  To:             GenWeb

The UCSD listserver is homebrew software, not majordomo or one of the other
listservers running around on major unix sites, and therefore does not have
all the features of those products, like digests and archives. However, it
does support file service. We can upload files to ucsd.edu and they can be
downloaded via a command to the listserver. I will look into the details.

Meanwhile, I have the messages in a "mailbox" on my mailing system, but not
in a downloadable file format.  Phil Akers was a "charter" subscriber. If
he has all the postings in a handy form, we can probably make something
work here.

FYI: At last count, there are over 100 subscribers to this list, mostly
lurkers. Most have logged on after seeing my demo html page and the blurb I
put there for the genweb list.

Gary



          Sunday, October 9, 1994 10:31:19 PM
          GenWeb Item
  From:           Gary Hoffman,ghoffman@ucsd.edu,Internet
  Subject:        GenWeb Spider
  To:             GenWeb
I do not believe we want to support a master GenWeb personal index site; we
could end up indexing all the persons that have ever lived (including all
duplicate entries). What a task!

Instead, each GenWeb archive site (let's standardize this term) should have
its own index in some agreed-upon format that can be easily searched by a
spider or other type agent. 

The function of a GenWeb master index will be to keep track of all the
known GenWeb archive sites and thus direct spiders to their indexes. 

I believe this approach will minimize network traffic while maximizing
usefulness of the system.

It could work like this: I have an ancestor named Reuben Webster in my local
GenWeb archive (Site A) with no known parentage. I would send out a spider
to search the GenWeb archive sites to see if there is a Reuben Webster
anywhere with a birthplace/birthdate close to those of my ancestor. The
spider checks with the GenWeb index which sends it to the various sites.
After querying each site, it reports back to me a compiled list of possible
matches, in html format, of course. I would then check each entry by hand
to see which is the most likely candidate and the one I want to
link up to, say Site B. (I believe this manual step will be necessary no
matter how smart the spider gets.) Then I modify my entry for Reuben
Webster to include a link to the Reuben Webster on Site B.
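
The search step in this workflow can be sketched as below.  Everything here
is an assumption made for illustration: the Entry record, the fetch_index
callable that stands in for retrieving one site's index over the network,
the five-year birth tolerance, and the choice to skip a site that cannot
be reached.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    name: str
    birth_year: Optional[int]
    url: str


def find_candidates(sites: Iterable[str],
                    fetch_index: Callable[[str], List[Entry]],
                    name: str, birth_year: int,
                    tolerance: int = 5) -> List[Entry]:
    """Ask each known archive site for its index and collect possible matches."""
    wanted = name.casefold()
    found = []
    for site in sites:
        try:
            entries = fetch_index(site)
        except OSError:
            continue  # site unreachable: skip it rather than abort the search
        for entry in entries:
            if entry.name.casefold() != wanted:
                continue
            # An entry with no birth year stays in the list for manual review.
            if (entry.birth_year is not None
                    and abs(entry.birth_year - birth_year) > tolerance):
                continue
            found.append(entry)
    return found
```

The list of sites would come from the master index of archive sites; the
returned candidates are what the researcher then checks by hand before
adding a link.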

Further, the GenWeb archivist/sysop/manager (we need a title here) of Site
B must do the same for all the deadend lines in the archive. That is, seek
out the continuations of the lines in Site B by sending out spiders to link
up with other archives. This will create the most efficient set of links.
If a Site B archivist does not link further to sites C, D, & E, etc. then
all the Site A's will be tempted to patch around Site B by downloading Site
B's data and linking directly to C, D & E. Thus an inactive Site will
quickly go unused.

The trend we want to encourage is to push the links farther away rather
than bring them closer.

Cheers,
Gary



          Monday, October 10, 1994 5:12:22 AM
          GenWeb Item
  From:           Gene Stark,starkhome!gene@sbstark.cs.sunysb.edu,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
>GENE:  This is exciting!  The data base sample I have is the Richard Austin 
>data base,  6482 persons.  It is about 2.4 MBytes, most of which is a 
>complete set of source citations and many biographical notes.  It was done 
>with PAF 2.3.  You will find 2 marriages are empty; ignore them.  You may ftp 
>the data base in a gzip or a pkzip compressed file via anonymous ftp at:  
>ftp.rahul.net://pub/svpafug.

Thanks for the data file.  As the GEDCOM was produced by PAF (an LDS product)
it adheres closely to the GEDCOM standard, and I was able to parse the
entire file after making only one minor change to my program (to allow
empty SUBM records).  It takes 44 seconds of wall clock time on my system
to parse the 82874 lines and construct the database and sorted HTML index
of 6482 individuals.

What do you want the output to look like?  Currently my program will
generate one HTML file for each individual, plus one index file.
Try Birger Wathne's demonstration to see how they look when displayed
on the Web.  What would you like done with the notes?  Currently
my program just copies them verbatim into the HTML output, without
trying to interpret cross-references, etc.

							- Gene Stark

Here is a sample of the INDEX file:

Index of Persons

Index of Persons

Eda Alice ABRAHAM (12 FEB 1876 - 26 DEC 1948 )
Grace ABRAHAM (1808 - )
Agnes ADAMS (4 FEB 1709/1710 - 2 OCT 1754 )
Alice ADAMS
Alle ADAMS
Charles ADAMS ( - )
Dorothy ADAMS (26 JUN 1679 - 26 JUN 1772 )

A sample individual file is below:

I51: Aaron AUSTIN

Aaron AUSTIN

  • BIRTH: 17 JAN 1792, Fairfield,,NY
  • DEATH: 21 MAY 1804, Fairfield,,NY
Father: John AUSTIN
Mother: Sarah HEATH

Notes

EDITOR: #ACP/0146 Lorraine Norlund, 9/93.
RESEARCHER: #ACP/0104 John Austin Pell, 7/92.
!BIRTH-DEATH: EAM Ref: p 82. NOTE: Fairfield, NY is now called Luzerne.
!BIRTH-DEATH: Austin, Sybil Epps, GENEALOGY OF JAMES WESLEY AUSTIN; 1598-1973;
Privately published by the author; p25; photocopy in poss of AFAOA. CONFLICT:
Death 21 May 1809
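
Output of the kind shown above (one HTML file per individual plus one sorted
index file, with notes copied verbatim) could be produced along the
following lines.  This is a hedged sketch, not the actual GEDCOM-to-HTML
program: the write_site function, the dictionary layout of the input and
the file naming are all assumptions.

```python
import html
from pathlib import Path
from typing import Dict


def write_site(people: Dict[str, dict], out_dir: str) -> int:
    """Write <id>.html for each person plus index.html; return the page count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    # Sort by surname, then given name, so the index comes out alphabetical.
    ordered = sorted(people.items(),
                     key=lambda kv: (kv[1]["surname"], kv[1]["given"]))
    for pid, person in ordered:
        name = html.escape(f'{person["given"]} {person["surname"]}')
        # Notes are copied as-is (only escaped), without interpreting them.
        notes = html.escape(person.get("notes", ""))
        page = (
            f"<html><head><title>{pid}: {name}</title></head>\n"
            f"<body><h1>{name}</h1>\n"
            f"<h2>Notes</h2>\n<pre>{notes}</pre>\n"
            "</body></html>\n"
        )
        (out / f"{pid}.html").write_text(page, encoding="utf-8")
        links.append(f'<li><a href="{pid}.html">{name}</a></li>')
    index = (
        "<html><head><title>Index of Persons</title></head>\n"
        "<body><h1>Index of Persons</h1>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )
    (out / "index.html").write_text(index, encoding="utf-8")
    return len(links)
```

Because the pages are plain static files, they can be packed up and served
by any HTTP server without further software on the serving machine.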



          Monday, October 10, 1994 9:22:49 AM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
TO: GENE STARK
FROM: BILL MINNICK

This is *really* getting exciting.  To think I could get the equivalent of
5 pounds of paper (450 pages, if printed) in your hands (Cupertino, CA to
LI, New York) on a holiday, in seconds, for no additional cost, between two
different operating systems is amazing.  We seem very close to getting the
Richard Austin data base on Internet.

As a side note, what we hope to do with this Richard Austin Descendants data
base is to set an example of *Good Source Documentation*, which is included
in this data base with each individual.  In the end, it will be the Source
Documentation that makes the GenWEB information truly valuable.  In future
mail-articles, I'll tell our subscribers the story behind how we achieved
high quality source documentation in such a large data base.

In article starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark) writes:

>Thanks for the data file.  As the GEDCOM was produced by PAF (an LDS product)
>it adheres closely to the GEDCOM standard, and I was able to parse the
>entire file after making only one minor change to my program (to allow
>empty SUBM records).  It takes 44 seconds of wall clock time on my system
>to parse the 82874 lines and construct the database and sorted HTML index
>of 6482 individuals.

>What do you want the output to look like?  Currently my program will
>generate one HTML file for each individual, plus one index file.
>Try Birger Wathne's demonstration to see how they look when displayed
>on the Web.  What would you like done with the notes?  Currently
>my program just copies them verbatim into the HTML output, without
>trying to interpret cross-references, etc.

In the interest of simplicity, let's proceed with the html page format you
have now.  It appears readable, useful and quite adequate to get started.
I will be interested in varying the format later when/if it becomes obvious
that user interface will be easier/better with specific page format changes.

The NOTES are best handled exactly as you stated above.  They are designed
to be part of the individual records; the source citations have been
repeated for each individual to which they apply, so the presentation will
be complete and therefore useful.  At some future date, we can experiment
with a separate source citation data base with references to it placed on
individual pages.  In the future I'm sure we'll be talking about several
changes that will make sense at that time.

When you are finished with your formatting, what files will I ftp back from
you?  Will your files be targeted to James Jones' 486/Unix system?  How will
your effort differ if we wanted to put this data base on a DOS-based system?

**********************************************************************
COMMENT ON YOUR PC ADEQUACY STATEMENT ONE OR TWO LETTERS BACK:

My current computer is a 486/40 w/20 MByte RAM, 420 Mbyte HD, MS Windows
3.11.  I am networked to a 386/33 w/1.7Gbyte HD.  Using the following
shareware/freeware, I am currently running NCSA Mosaic (requires install of
MS WIN32S which NCSA provides), and am using Peter Tattam's Dial-up TCP/IP
(I'm running at 19,200 Baud on a SLIP connection - local call).  A few
nights ago, I had two Mosaic windows receiving different pages in Birger's
Royalty line GenWEB demo, and a third Mosaic window downloading a 16M color,
1280x1024 photo of an F-18 airplane from a site at Edwards AFB.  This all
worked without a hitch, which leads me to believe that if I set up an HTTP
server in another window I could handle external http requests.  I am
guessing at this right now; perhaps you would have further insight on my
observations.
Though it is not practical now for me to tie up a phone line 24 hrs/day,
our area is supposed to be able to get ISDN lines (combined 64kbps digital
link and std audio phone line for $18/month) installed to homes soon.  This
may be the break we need to get our PCs on line.

Looking forward to your reply and thanking you for your efforts to make
GenWEB successful,

Regards,  Bill Minnick



          Monday, October 10, 1994 9:29:29 AM
          GenWeb Item
  From:           Chris Garrigues,cwg@MCC.COM,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
In message <199410092055.QAA03254@starkhome.cs.sunysb.edu>, Gene Stark typed:
> (Regarding a "home site" for GenWEB)
>
> >GENE:   This is a key piece of GenWEB we can use now.  Do you have a
> >preference as to who provides this site?  Perhaps this could be a good
> >use for Chris Garrigues' computer.
>
> If he can provide the machine, and has good network connectivity with
> minimal downtime, then I would say he is as good a choice as anyone else.

I've been busy doing other things, but my name appeared in bold, so I
thought I'd better pipe up.

I believe I can do this.  The system isn't quite yet configured as I'll
want it to be.  Today it can be reached as deepeddy.aus.sig.net, but
eventually it will be known as deepeddy.com.  I'm also in the process of
getting the site secured, and I'll want to finish this process before
advertising it very far.  If someone wants to work on the index, I can
provide an account which they would telnet into.

As this project moves forward, it might be interesting to register
genweb.net.  I could even see putting pointers to other gedcom sites in
this domain.  We'd at least want to put pointers to any other "starting
points" in it.

As pointed out in mail from Gary Hoffman, we probably don't want an index
of all genweb records, but instead need some sort of index of indexes.
This "master index" would need to pass queries off to another system as
fast as possible to keep from being overloaded as thousands of genealogists
start using the net.

genweb.net would certainly have a list of all known genweb sites, and it
would probably make sense to have an index mapping soundex codes to sites,
so if someone is looking for a "John Stark b. late 1800s", they'd first
query gedcom.net for S362 indexes, and would get a pointer to your system
as well as anybody else who has any S362s.  They would then query those
systems directly for "J500 S362 b. late 1800s", and would pull those
records, hopefully ordered by probability of a match.

Our protocols will need to be designed in such a way that they behave
appropriately if a system is down.  The query for S362 from the master
index might point to a dozen systems of which only ten are up at that time.
The researcher should be able to query the ones which are up immediately,
but keep a cache of the unresolved queries to the other two systems.

Chris



          Monday, October 10, 1994 11:51:09 AM
          GenWeb Item
  From:           Birger A. Wathne,Birger.Wathne@vest.sdata.no,Internet
  Subject:        Re: Lifelines and ged2html
  To:             GenWeb
>From: Herbert Stoyan
>To: genweb@UCSD.EDU
>Subject: Lifelines and ged2html
>
>Birger's presentation is really nice (Birger, can you send more details?
>How did you implement the parts?).
>I really don't understand why ged2html is necessary. I understood Tom Wetmore
>that he is able to produce any output -- html would be a special case only.
>Herbert
>

The reason I use ged2html is that I could write a report to do it directly
from LifeLines, but the internal database format is GEDCOM, and I would
have to write a report program that scanned through each tag, and wrote
HTML output.  This is indeed what my first demo did.
But I decided that it would be a waste of effort to do it in the LifeLines
report language, when:

- Outputting a GEDCOM record directly is extremely easy in LifeLines
- A general filter to convert GEDCOM to HTML would benefit everyone, and
  could be polished by a much bigger user base.

My implementation consists of:

- NCSA httpd: The Web server software from NCSA
- A CGI gateway program: This gateway checks the request, extracts the
  arguments, and then runs a LifeLines report and ged2html.
- LifeLines: I have 3.00b. This version is not generally available yet,
  I think. 2.x should work just as well. The main missing thing is a
  read-only mode. This has been implemented, but I don't have that
  version yet....
- ged2html: I have a slightly modified version of ged2html. I'll mail it
  back to Gene when I'm finished with it. It's a bit unstable (core dumps
  on some of my records).

Birger



          Monday, October 10, 1994 3:32:22 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: LifeLines -> WWW gateway
  To:             GenWeb
TO: GENE STARK
FROM: BILL MINNICK

In article starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark) writes:

>The file is 993236 bytes, and it is a gzipped tar file, made up of
>several thousand HTML files.

Gene:  I need to clarify in my mind what you have done with the Richard
Austin GEDCOM file which I handed over to you last night.  Is it correct to
say that you have used your GEDCOM-HTML converter on the Richard Austin
data base to produce 6482 linked HTML pages and an index page, each HTML
page being a file?  And all of these pages (files) take a total disk space
just under 1 Mbyte?  Then using an HTTPD server, I can make these pages
publicly available on www, right?

My local internet provider is a2i, which runs SunOS on Sun computers.  I
would probably put your set of HTML files there and reference it in our
current SVPAFUG home page.
My next question is, how does this compare, disk-storagewise, to the system
Birger Wathne is using (Lifelines, with your GEDCOM-HTML converter
translating on the fly)?  I know Lifelines stores the data base in GEDCOM
format.  Is this a compressed format, or does the Lifelines storage take up
about the same 1 Mbyte disk space?  If Birger Wathne's setup takes
significantly less disk space, I expect I should implement his method as
soon as possible.



          Monday, October 10, 1994 4:49:57 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Genealogy Documentation Guidelines
  To:             GenWeb
TO: GenWEB Mail List
FROM: Bill Minnick
SUBJECT: Genealogy Documentation Guidelines

Based on interest by several GenWEB mail list members in the Source
Documentation Guidelines we used in our Richard Austin data base, our
Silicon Valley PAF User Group (svpafug) Exec Staff has decided to make our
Genealogy Documentation Guidelines, 4th Edition, available at no cost at our
anonymous ftp site as follows:

     ftp.rahul.net://pub/svpafug/docguid4.zip

This is a MS Word for Windows file (about 50kb) in PKZIP 2.04g format.  You
may give this document to others, but we ask that you 1) give credit to the
SVPAFUG when quoting from the document, and 2) do not republish sections
of, or otherwise sell part or all of this document for financial gain.

This version of the SVPAFUG Documentation Guidelines was released on
1/26/94.  Note that while the Guidelines are targeted to users of PAF
2.2/2.3, the information contained is very useful for anyone documenting
sources in any genealogy format.

At the same anonymous ftp location, we have a copy of the Richard Austin
data base (file name: wwwra.zip, in GEDCOM format), which has been
documented over the past three years by a team of 15 researchers, who have
held as closely as possible to the Documentation Guidelines as they
documented the 6482 individuals in this data base.
We will attempt in the next month, or so, to put out the Documentation
Guidelines in HTML on WWW for general access to those who can't use the
Word for Windows format.  For those of you who want a shiny new update of
the Documentation Guidelines in a handy booklet form, they'll be available
in November for $2 by contacting Mary Nordin, SVPAFUG Guidelines, 4417
Pitch Pine Ct., San Jose, CA 95136-2410 (E-MAIL ADDRESS:
fnordin@ix.netcom.com).

Please let us know if these documents are helpful.  We are also taking
suggestions for changes/corrections.

Regards to all,  Bill Minnick, VP, Silicon Valley PAF Users Group



          Tuesday, October 11, 1994 8:17:52 AM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        LifeLines Versions
  To:             GenWeb
The current official version of LifeLines is 2.3.6.  Its source is
available via ftp.

The current beta version of LifeLines is 3.0.0.  There have been about four
"releases" of 3.0.0 to beta testers.  The source of 3.0.0 is only available
from me.

Major changes in 3.0.0 are:

1.  More vanilla use of curses, and ifdef's to handle BSD versus System V
    derived versions of curses.
2.  New support for event, source and user defined records.
3.  More information provided in the display windows, and a few more
    operations.
4.  User defined options have been added -- currently included are the
    edit templates for new persons, sources and events.
5.  Program can now run in read only mode, and in write access required
    mode.
6.  Database block size changed from 1024 to 4096 -- not upward compatible.
7.  Many improvements, a few bug fixes, and programming language extensions.

Tom Wetmore, ttw@beltway.att.com



          Tuesday, October 11, 1994 9:34:52 AM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Re: lifelines
  To:             GenWeb
Herbert,

There is no automatic merge feature in LifeLines.  The usual way to merge
is to use the j commands from the tandem display modes.  The quick
reference has a short section on merging.
I think there is a LifeLines program that suggests lists of persons who
should maybe be merged.  You could ask the LINES-L mailing list.  I may
have it, but my directory of report programs is huge and disorganized.

Automatic merging is, in general, an intractable problem.  Many people
have suggested many heuristics that could be employed, but, in my opinion,
the problem requires user intervention.

Merging is an area that requires much thought.  The merging feature
provided by LifeLines is almost an "assembly language" level feature.
Much more could be done, given we know what.

Tom Wetmore, beltway.att.com


          Tuesday, October 11, 1994 10:02:17 AM
          GenWeb Item
  From:           Birger A. Wathne,Birger.Wathne@vest.sdata.no,Internet
  Subject:        Re: Lifelines and ged2html
  To:             GenWeb
The royal92.ged file is 472Kbytes.
The royal gedcom base is 712Kbytes.

I guess LifeLines doesn't compress data, and you have indexes, etc.  So a
database file will always be bigger than the raw data.

If you intend to work on the same database, then you'll need the base
anyway, and HTML files will be a waste of space.

For the royal92 base it may make more sense to store them as separate
files, as they are static (I'm not maintaining them).  Access could get
faster, etc.  But for my other base, where I'm supposed to start working
on my own family as soon as I get enough time, it's vital for me to have
the data in LifeLines.  Having a separate set of HTML files would be a
waste of space, and I would have to worry about maintaining them as I
worked on my base.

Birger


          Tuesday, October 11, 1994 12:23:09 PM
          GenWeb Item
  From:           Herbert Stoyan,hstoyan@faui80.informatik.uni-erlangen.de,Internet
  Subject:        spiders
  To:             GenWeb
Spiders should not visit html-pages.  They should use the ftp interface
directly to lifelines.


          Tuesday, October 11, 1994 12:26:02 PM
          GenWeb Item
  From:           Herbert Stoyan,hstoyan@faui80.informatik.uni-erlangen.de,Internet
  Subject:        database size under lifelines
  To:             GenWeb
I made an experiment (with version 2.3.6):

     gedcom file: 94252 bytes
     database:    163 blocks (82KB).


          Wednesday, October 12, 1994 9:11:55 AM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        LifeLines 3.0.1 Available
  To:             GenWeb
LifeLines 3.0.1 is now available in source form.  Because there are so few
LL users, it's hard to give LL a good "soak" before making versions
available.  Because of this and because there are major differences
between 2.3.6 and 3.0.1, I am postponing putting 3.0.1 on ftp sites until
a few guinea pigs have used it awhile.  To get 3.0.1 contact me and I will
email it.

Warning!  3.0.1 databases are incompatible with 2.3.6 databases.  You will
have to use a version 2.x.x LL to save your database in a GEDCOM file, and
then use 3.0.1 to create a new database from that file.

Special note to beta testers: 3.0.1 databases are also incompatible with
3.0.0 databases, though I can supply a patch program to do the database
conversion for you (the difference is just 4 extra bytes in the key file
in 3.0.1).

There are significant differences between 3.0.1 and past versions, though
the look and feel is similar; you'll be able to use 3.0.1 right out of the
box.  I am preparing both release notes to cover the differences, and a
new version of the Quick Reference Guide.

Tom Wetmore, ttw@beltway.att.com


          Wednesday, October 12, 1994 9:18:31 AM
          GenWeb Item
  From:           Chris Garrigues,cwg@mcc.com,Internet
  Subject:        Re: Lifelines and ged2html
  To:             GenWeb
In message <9410101654.AA07205@sdvest>, Birger A. Wathne typed:
> My implementation consists of:
>
> - NCSA httpd: The Web server software from NCSA
> - A CGI gateway program: This gateway checks the request, extracts the
>   arguments, and then runs a LifeLines report and ged2html.
> - LifeLines: I have 3.00b.
>   This version is not generally available yet, I think.
>   2.x should work just as well.  The main missing thing is a read-only
>   mode.  This has been implemented, but I don't have that version yet....
> - ged2html: I have a slightly modified version of ged2html.  I'll mail it
>   back to Gene when I'm finished with it.  It's a bit unstable (core dumps
>   on some of my records).

I'm just about ready to set up my own database so that I can study
indexing issues and see if WAIS is useful for GENWEB.  I'd like to use
your work as a starting point.  Do you have anything right now that feels
stable enough to pass out?  If not, do you have an ETA on when you might
be ready to do so?

Chris

Chris Garrigues (MIME capable)                          cwg@mcc.com
Microelectronics and Computer Technology Corporation    +1 512 338 3328
3500 West Balcones Center                           Fax +1 512 338 3838
Austin, TX 78759-5398  USA


          Wednesday, October 12, 1994 9:39:27 AM
          GenWeb Item
  From:           Gary Hoffman,ghoffman@ucsd.edu,Internet
  Subject:        How to unsubscribe
  To:             GenWeb
If you want to unsubscribe from this list, please do not send the message
to genweb@ucsd.edu.  The proper format is to send your message to
listserv@ucsd.edu with the subject line blank.  In the body of the message
put only the words:

     UNSUB genweb

Send that message and the listserv will take care of you, and you won't
bother the other readers of the list.

Thanks,
Gary


          Wednesday, October 12, 1994 10:30:23 AM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: lifelines (tracking public changes to data bases)
  To:             GenWeb
In article "Rich @ (614) 427-5121" writes:
>Bill, If I should get into your open data and "correct" a marriage date,
>given that you have a huge database, how will you ever know that I've done
>it?  Will you also maintain some sort of edit.log?

RICH:  There are numerous software methods for comparing a master data
base with a "public" data base.  GEDCOM versions can be compared using
MS WIN WORD or other word processors or text editors.
Using these existing tools, one could manually review and merge changes
from a public data base into their private master data base.

OR, you could convert the GEDCOM to PAF data base; Ann Turner's PAFSPLIT
program now has a data base compare option that puts out a "delta data
base" which represents only the changes between the "Master" and "Public"
data bases.  I know there are still other data base compare programs,
which I am not familiar with.

I would like to go further in the future.  Consider having available a
transaction log of changes made to the data base.  One could select the
transactions of interest and apply them to their private data base.  One
could reverse out public transactions which are clearly wrong.  I'd like
to see an individual transaction log kept in the data base with each
individual, so "cousins" can argue out, put in, reverse out information.
Furthermore, I'd also like to see E-Mail between researchers and cousins
relating to an individual kept with that individual's file for posterity,
to be kept until primary sources are located (if ever) and put the
conflicts to rest.

Thanks for your interest in the concept of the public data base.

--Bill Minnick, Cupertino, CA


          Wednesday, October 12, 1994 10:47:40 AM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Request for HTML Example File
  To:             GenWeb
I would like to see what it is like to write a LifeLines program for
generating HTML files.  Would someone (Birger?) please send me a fairly
complex GEDCOM person record, and the HTML version of the same record.
There is already an ll2html LL program, but I've not run it.  It might be
useful to learn from the two.

Tom Wetmore, LifeLines author, ttw@beltway.att.com


          Wednesday, October 12, 1994 11:26:42 AM
          GenWeb Item
  From:           Chris Garrigues,cwg@mcc.com,Internet
  Subject:        Re: lifelines (tracking public changes to data bases)
  To:             GenWeb
In message , Bill Minnick typed:
> I would like to go further in the future.
> Consider having available a
> transaction log of changes made to the data base.  One could select the
> transactions of interest and apply them to their private data base.  One
> could reverse out public transactions which are clearly wrong.  I'd like
> to see an individual transaction log kept in the data base with each
> individual, so "cousins" can argue out, put in, reverse out information.
> Furthermore, I'd

These all seem like excellent ideas.

Let me say a little bit about why I was assuming that others wouldn't
make changes to my database.  The model that I had was that I'd continue
to maintain my database in PAF (or something else) on my Mac.  I'd then
occasionally export a GEDCOM file which I'd load into GENWEB and export as
my database from my Sun.  Using this model, every time I did a new export,
their changes would be lost.

If I had an automatic way of (a) using this audit trail to update my
master database where I thought it needed updating, and (b) extracting my
deltas and merging them into the public database with appropriate
additions to the audit trail, I might not object to the idea much.

The remaining concern is that I still don't want other people adding
individuals to my database, although adding pointers to their own
individuals is fine with me.  This is because I don't particularly want
anybody other than myself to decide to double the size of my database.  On
the other hand, if they add pointers, that is in the spirit of the web.

> also like to see E-Mail between researchers and cousins relating to an
> individual kept with that individual's file for posterity, to be kept
> until primary sources are located (if ever) and put the conflicts to rest.

Part of this I would imagine would be to define an email alias for every
individual in the genweb database which would send to everybody who has an
interest in that individual.  At sites where the genweb manager has
control over the mail aliases database as well, this should be fairly easy
(it's a SMOP).
However, I'm not sure that'll be true at all sites.  I could see a mail-to
URL in each record which would point at the appropriate alias.

Using forms, there'd probably also be a way to automatically add yourself
as an interested party.  Doing this would add you to the alias list for
that individual and send automatic email to those who are already on the
list letting them know that they have yet another cousin.  I suppose it
would propagate to that individual's ancestors as well.

Chris


          Wednesday, October 12, 1994 11:39:27 AM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Notes on GEDCOM, HTML and LifeLines
  To:             GenWeb
Scott McGee mentions hyperlinks from genealogical HTML files to files (and
programs?) of many types (eg, links to other persons, to pictures, to
audio, ...).

If HTML files are created by some special program from some special
database with the conventions needed to hold the extra information, or
references to that extra info, no sweat.  Just do it.

However, if the raw data begins in GEDCOM format, or in some database
system that generates GEDCOM format records on demand, and some other
process must first convert the GEDCOM into HTML, well, unless there are
some conventions for storing references to these other kinds of things in
the original GEDCOM, things could be a bit sticky.  Could.

There is GEDCOM and then there is GEDCOM.

GEDCOM is both a syntactic standard and a semantic standard.  I think the
current "official" semantic standard is 4.6, though most developers are
working to 5.3.  These standards do allow references to information in
external places, though I don't recall how (or if) you can specify what
kind of info it is.  A quick check of the standard will tell.  At worst,
naming conventions handle it.

GEDCOM is also a syntactic standard.  In this form GEDCOM is simply a set
of rules for structuring information, but says nothing about how
particular types of information should be structured.
At the syntactic level you can therefore just make up your own conventions
for structuring novel kinds of data.  You can essentially make up
conventions for anything you want, including making references to any kind
of external data, places or programs.

I would recommend that the semantic standard be used whenever possible, of
course, but new conventions be employed if they prove necessary.

In terms of LifeLines.  By and large LifeLines takes the syntactic
approach to GEDCOM.  There are some "core semantics" that LifeLines does
assume, but these are the most basic of the most basic.  In all other ways
LL uses GEDCOM only in syntactic mode.  Because of this LL can read GEDCOM
generated by any program, and in any standard, and LL programs can be
written to write out LL records for anybody else's GEDCOM.

Thus LifeLines easily supports (though I am not recommending it) the
notion of arbitrary link types out of records to be converted into HTML
hyperlinks upon record extraction.  In addition, special standalone
programs for converting those generated GEDCOM files into HTML files are
not required -- the normal LL report generation system can do the
translations.

This is a more important point than it may sound.  LifeLines allows you to
both create new conventions, and then use them in the generation of
outputs, without any modification to the LifeLines program itself being
made.

Tom Wetmore, Your Programming Pal, ttw@beltway.att.com


          Wednesday, October 12, 1994 11:41:25 AM
          GenWeb Item
  From:           Jane Pryor,jpryor@systema.westark.edu,Internet
  Subject:        Translation
  To:             GenWeb
Is genweb and lifelines just for programmers?  Is there somewhere that I
could get simple explanations of all this.  I'm not an internet expert but
I am willing and would like to learn.  Is there a beginners class out
there?
Thank you
Jane Pryor
jpryor@systema.westark.edu


          Wednesday, October 12, 1994 11:42:17 AM
          GenWeb Item
  From:           baud@research.att.com,Internet
  Subject:        Re: lifelines (tracking public changes to data bases)
  To:             GenWeb
I really wish that genserve was up and running for two reasons (yes, this
is posted with the right subject and to the right list):

1.  I am anxious to try it out as yet another way to disseminate and
    gather genealogical information on line.

2.  I think that a lot of these questions about how genweb should be
    structured could be answered with ``well, the genserve way is ...''

3.  I don't really understand why genweb needs to be anything except an
    alternative interface to genserve (of course, html-generating reports
    would be used rather than ascii-text generating ones, but hey :^) with
    genserve databases located at multiple sites (since this is the
    web-way).

kurt :-)


          Wednesday, October 12, 1994 11:55:36 AM
          GenWeb Item
  From:           Chris Garrigues,cwg@mcc.com,Internet
  Subject:        Re: Translation
  To:             GenWeb
In message , Jane Pryor typed:
> Is genweb and lifelines just for programmers?  Is there somewhere that I
> could get simple explanations of all this.  I'm not an internet expert but
> I am willing and would like to learn.  Is there a beginners class out
> there?

(whoops, sorry about the one that got away; I gotta change my default exmh
key bindings)

I'll let Tom answer for lifelines, but right now genweb is mostly an idea
and this mailing list is for people who are interested in focusing and
implementing this idea.

Some of the people on the list are working on prototypes for parts of
genweb; some are throwing out ideas; and some are merely lurking.  I
started as a lurker, have moved on to throwing out ideas, and am hoping to
borrow enough of other people's work to try prototyping some things.
Today there isn't really anything that a "beginner" can use, but the idea
is to get things to a point where all a beginner will need is a starting
point on the web and they'll be able to query other people's data and add
their own data.

You can certainly participate by keeping your mind and mouth open and
suggesting to the rest of us ways to solve some of the (many) unsolved
problems that we're all looking at.  Or you could just lurk and try out
people's prototypes as they get announced.  If you do this, at some point
you'll discover that you are using genweb.

Chris

Chris Garrigues (MIME capable)                          cwg@mcc.com
Microelectronics and Computer Technology Corporation    +1 512 338 3328
3500 West Balcones Center                           Fax +1 512 338 3838
Austin, TX 78759-5398  USA


          Wednesday, October 12, 1994 12:02:31 PM
          GenWeb Item
  From:           Chris Garrigues,cwg@mcc.com,Internet
  Subject:        Re: lifelines (tracking public changes to data bases)
  To:             GenWeb
In message <199410121840.OAA03036@bone.research.att.com>, Kurt Baudendistel
typed:
> I really wish that genserve was up and running for two reasons (yes, this
> is posted with the right subject and to the right list):
>
> 1.  I am anxious to try it out as yet another way to disseminate and
>     gather genealogical information on line.

I agree.  I was in the midst of querying various things when it went away.

> 2.  I think that a lot of these questions about how genweb should be
>     structured could be answered with ``well, the genserve way is ...''

Yes and no.  genserv is very much mail oriented and as such can be very
slow to find what you want.  I expect that all the functionality of
genserv will eventually be available via genweb, and we will need to
remember what was done there, but what most of us are interested in is
distributed information; that's not what genserv is.

> 3.
>     I don't really understand why genweb needs to be anything except an
>     alternative interface to genserve (of course, html-generating reports
>     would be used rather than ascii-text generating ones, but hey :^) with
>     genserve databases located at multiple sites (since this is the
>     web-way).

Well, since genserv isn't anything except a mail interface to lifelines, I
suppose if genweb turns into an html based interface to lifelines, then
you'll have what you want.

If both genserv and genweb use lifelines as their engine, then it should
be easy to have a mail interface to all of genweb as well as an html
interface to the genserv database.  I do hope that the genserv database
gets put in genweb.

Chris


          Wednesday, October 12, 1994 4:58:39 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: lifelines (Why Go Public With Data Bases)
  To:             GenWeb
In article Gary Steiner writes:
>What if the person who sees an "error" in your database is wrong?  What if
>they "correct" your database with their supposedly correct information?
>I don't want anyone changing my database without proving to me why their
>information is correct and mine is not.  There is a lot of shoddy research
>out there; much published material containing incorrect data that people
>take as gospel simply because it is in print.  After spending the time and
>effort to do the research to prove/disprove a fact found in someone else's
>material, I don't want some newbie to come along and "correct" that data
>back to the erroneous data that I just corrected!

>It is a mistake to allow people to change other's databases.  People should
>be able to flag data to say that they have something that is different, but
>they should not be able to supersede what is in the original database.

GARY:  One can start from the premise that "my data base is perfect;
others may look at it, but do not touch!"  My experience says that that
attitude misses the great opportunity that GenWEB presents.
I don't care if you think you have everything documented about your
parents, grandparents, and so on, someone you don't know out there has
some additional info on people in "your" data base, and I believe in
making it easy for anyone to attach that information to the individual
record.

I agree that there will have to be limits in the size of data files
appended to the individual record; perhaps a simple reference could be
made to a new source of a lengthy biography or extensive photos or movies
that exist elsewhere regarding the individual.

Certainly we'll request a source citation be entered for any change in
vital information on a person.  If current vital info is from primary
sources (birth, marriage, death certificates), then we would like to have
a provision to lock those elements of the record.  But I say that any
record supported only by secondary or tertiary sources should be left open
for correction when better information is found (by anyone).

The strategy here is to use a *copy* of your data base on line to get the
most people possible working to improve it.  When someone makes an error
and reenters erroneous info, the other cousins will in due course set the
person straight.  Perhaps a button on the HTML page to back out the last
change is in order.  It's certainly possible!  The more restrictive you
are with public access, the less the public will contribute to the
knowledge base on your ancestors.

I will admit that to achieve my aspirations for GenWEB, we have a fair
amount of software development ahead of us.  My hope is that these
discussions will fire up the software experts among us to create or adapt
existing software tools we need to do the job.
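
The rules described in this message (a source citation required for every
change, locked elements where primary sources are on file, and a way to
back out the last change) can be sketched in a few lines.  This is only an
illustration under stated assumptions, not an existing GenWEB or LifeLines
feature: the PersonRecord and Change classes and their field names are
hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    """One logged edit to a single element of a person record."""
    field_name: str
    old_value: Optional[str]   # None if the element did not exist before
    new_value: str
    source: str                # citation supplied by the submitter
    submitter: str


@dataclass
class PersonRecord:
    """Hypothetical public copy of one individual, with its own change log."""
    values: dict
    locked: set = field(default_factory=set)   # elements backed by primary sources
    log: list = field(default_factory=list)

    def change(self, field_name, new_value, source, submitter):
        """Apply a change, refusing locked elements and uncited changes."""
        if field_name in self.locked:
            raise PermissionError(f"{field_name} is locked (primary source on file)")
        if not source.strip():
            raise ValueError("a source citation is required for every change")
        old = self.values.get(field_name)
        self.log.append(Change(field_name, old, new_value, source, submitter))
        self.values[field_name] = new_value

    def back_out_last(self):
        """Reverse the most recent change and return it.

        Raises IndexError if the log is empty.
        """
        last = self.log.pop()
        if last.old_value is None:
            self.values.pop(last.field_name, None)
        else:
            self.values[last.field_name] = last.old_value
        return last


# Made-up example: the birth is locked, the marriage is open to correction.
person = PersonRecord(values={"BIRT": "1 JAN 1700", "MARR": "1725"},
                      locked={"BIRT"})
person.change("MARR", "12 JUN 1724", "parish register (hypothetical)", "a cousin")
person.back_out_last()   # the marriage date is "1725" again
```

Keeping the log with the individual, rather than in one global file, is
what lets a single bad change be backed out without touching anyone else's
record.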
Regards, Bill Minnick, Cupertino, CA


          Wednesday, October 12, 1994 6:34:57 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: Request for HTML Example File
  To:             GenWeb
In article ttw@beltway.att.com (T.T.Wetmore) writes:
>I would like to see what it is like to write a LifeLines program for
>generating HTML files.  Would someone (Birger?) please send me a fairly
>complex GEDCOM person record, and the HTML version of the same record.
>There is already an ll2html LL program, but I've not run it.  It might be
>useful to learn from the two.

TOM:  I have exactly what you are looking for, and probably to excess.  Go
to the following anonymous ftp site:

     ftp.rahul.net://pub/svpafug

and get the following compressed files:

     wwwra.gez  (a GZIP of the 6482 person Richard Austin Data Base)
                (James Jones has successfully loaded this into LifeLines;
                there are warnings on two empty marriages which you can
                ignore)
     wwwra.taz  (a GZIP/TAR file of all 6482 HTML pages generated earlier
                this week by Gene Stark using his GEDCOM-HTML program)

These two files should be an adequate challenge for your lifelines tests,
Tom.

Regards, Bill Minnick


          Wednesday, October 12, 1994 8:25:49 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: Genealogy Documentation Guidelines (ASCII Versions)
  To:             GenWeb
>TO:  GenWEB Mail List
>FROM:  Bill Minnick
>SUBJECT:  Genealogy Documentation Guidelines (ASCII Versions)

>Based on interest by several GenWEB mail list members in the Source
>Documentation Guidelines we used in our Richard Austin data base, our
>Silicon Valley PAF User Group (svpafug) Exec Staff has decided to make our
>Genealogy Documentation Guidelines, 4th Edition, available at no cost at
>our anonymous ftp site in plain ASCII as follows:

I have added an ASCII version in two compressed formats:

PKZIP compression
>ftp.rahul.net://pub/svpafug/dguidtxt.zip

GZIP compression:
>ftp.rahul.net://pub/svpafug/dguidtxt.txz

>Please let us know if these documents are helpful.
>We are also taking suggestions for changes/corrections.

>Regards to all, Bill Minnick, VP, Silicon Valley PAF Users Group


          Wednesday, October 12, 1994 8:27:08 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: lifelines
  To:             GenWeb
>In message , Bill Minnick typed:
>> On the other hand, the GenWEB concept is tailor-made for submitters to
>> both merge individuals and make corrections and additions of missing
>> information and sources.  We will have to figure out a way to enable
>> "writes" into GenWEB files for any individual not fully documented by
>> primary sources or otherwise "blessed" and "frozen" by an official
>> family organization.  We must set up GenWEB to let thousands of people
>> spend a few hours merging and correcting information, rather than
>> requiring a few "data base owners" to spend thousands of hours doing
>> the same tasks.

In article Chris Garrigues writes:
>Be careful here.  I don't particularly want you modifying the records in my
>database; and you probably don't want me modifying the records in yours.

CHRIS:  I have a very different concept for GenWEB.  My concept is to, in
effect, give copies of my data bases to the public, and let any and all
people have at the process of making copies, inputs, corrections, changes.
I will keep a master of all my data bases privately, and plan to back up
the WEB versions fairly often.  I'll also update my private master
versions of data bases from time-to-time.

Of course, every GenWEB site owner can make his/her own rules, so we can
track the successes and failures of each strategy.  I personally want
people to be able to enter corrections and additions when they see the
need.

My concern would be that someone might maliciously destroy the data base
through open public access.  I'd replace any corrupted data with the last
good backup, should someone decide to corrupt the data.
I'm going to try this open concept in any case for all the Austin Data
Bases that we have on hand (about 50,000 individuals).  I believe people's
ability to interact with a "living" data base will draw the most interest
from around the world and provide the greatest potential for improvement
of the data bases.

Regards, Bill Minnick, Cupertino, CA


          Wednesday, October 12, 1994 8:27:15 PM
          GenWeb Item
  From:           Chris Garrigues,cwg@mcc.com,Internet
  Subject:        Re: Quick Thoughts on LifeLines a GENWEB Program
  To:             GenWeb
In message <9410112059.AA24294@beltway.att.com>, T.T.Wetmore typed:
. . .
> There are database issues.  LL never had to worry about multiple programs
> accessing its databases, so the original b-tree database did not have
> secure access control.  This has now changed; the b-tree database now
> controls multiple access by allowing a single writer or multiple readers,
> but never simultaneous writers or simultaneous readers and writer.
. . .
> Other requirements exist for a GENWEB database of course -- it must be
> maintained.  But the current LL program, run in write access required
> mode, addresses this just fine.  Note that write access required mode
> means that the database could not be open by LL if any other GENWEB
> program had the database open for report generation, and that if LL had
> the database open for maintenance, all other GENWEB programs would be
> locked out of the database until the maintenance session were over.
> Sounds good to me.

From the above, I assume that you're saying that you do file locking
rather than record locking?  It would be nice if locks could be handled on
a per-record basis, but if the present structure of LL doesn't permit
that, I can certainly understand.  This is probably acceptable in the
medium term.

BTW, in the earlier discussion of merging, I missed the beginning of the
conversation somehow, but in the context of GENWEB, it seems to me that
merging is something to stay away from.
Instead of merging two databases, we'd want to merely insert URLs in the
HTMLs to point to the other database record with a note that "this is
apparently the same person" or whatever is appropriate.

I haven't had a chance to look at your code yet; can I assume that you've
got a way to put a URL to another record in the GEDCOM so that it will be
clickable from my web browser?  (BTW, I don't think this would only be
used to point into other GENWEB records: for my own GENWEB entry, I'd like
a pointer to my personal home page; for a musician, I'd like a pointer to
their on-line discography; etc.)

Chris


          Wednesday, October 12, 1994 8:27:21 PM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Quick Thoughts on LifeLines a GENWEB Program
  To:             GenWeb
LifeLines (LL) is built from four major components:

  o  b-tree database allowing variable-length records
  o  curses-based user interface based on windows and commands
  o  report generation system based on a programming language
  o  infrastructure of genealogical routines and functions

LL was written as a single live user, genealogical database and report
generation system for UNIX systems.  For three main reasons:

  o  it runs on UNIX
  o  it can be programmed to generate essentially any output products
  o  it can flexibly store GEDCOM data better than other systems

LL is used as the GENSERVE engine and is being used in GENWEB experiments.

The requirements for a single-user, live system, and a remote-service,
genealogical engine system, are not the same.  Though LL can be used as a
remote engine, there are some wrinkles.

There are database issues.  LL never had to worry about multiple programs
accessing its databases, so the original b-tree database did not have
secure access control.  This has now changed; the b-tree database now
controls multiple access by allowing a single writer or multiple readers,
but never simultaneous writers or simultaneous readers and writer.
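
The access rule just described (one writer or many readers, never both at
once) is the classic shared/exclusive lock.  The sketch below illustrates
the rule with a whole-file advisory lock on a UNIX system; it is not taken
from the LifeLines source, and the open_database name and the lock-file
approach are assumptions made only for this example.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def open_database(path, writer=False):
    """Hold a whole-file lock: shared for readers, exclusive for a writer.

    Raises BlockingIOError at once if the lock cannot be granted, instead
    of waiting for the other party to finish.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        mode = fcntl.LOCK_EX if writer else fcntl.LOCK_SH
        fcntl.flock(fd, mode | fcntl.LOCK_NB)   # non-blocking request
        yield fd
    finally:
        os.close(fd)   # closing the descriptor also releases the lock
```

A report generator would wrap its work in "with open_database(path):",
while a maintenance session would use "with open_database(path,
writer=True):" and be refused while any reader still holds the file.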
Second, LL assumed that its curses-based user interface was how it would
be used.  Use of LL in automated situations is based on preparing scripts
that mimic a user's actions and send the user interface output to the bit
bucket.  It does work.  That's all one can say.

It has been my view for a long time that the current LL program is one
member of a possible family of programs, all centered on the same b-tree
database mechanism and genealogical data structuring semantics.

For the GENWEB, I imagine the proper engine would be a system that can
generate indices on demand, generate HTML files on demand, and generate
reports and charts on demand.  Actually, the current LL can do all these
things quite well, but another program devoted to just these tasks would
be easier to put to use.  Such a program would be built from:

  o  b-tree database allowing variable-length records
  o  simple message/command interpreter
  o  report generation system based on a programming language
  o  infrastructure of genealogical routines and functions

Three of these four pieces are identical to those already in the current
LL program and would be reused unchanged.  The difference would be a
switch out of the curses user interface to a command based scripting
interface.  For example, the following set of commands seem necessary:

     index -- index database into file
     html -- generate html file for person key to file
     report * -- generate report from program using arguments to file

Actually, the index and html commands are just special cases of the more
basic report command.

Note that the view taken here is that the basic automated GENWEB use for
LL-type programs would be the generation of files (index, html, reports,
charts, ...) on demand.  The report generation features of LL are designed
precisely for this purpose.

Other requirements exist for a GENWEB database of course -- it must be
maintained.  But the current LL program, run in write access required
mode, addresses this just fine.
Note that write access required mode means that the database could not be
open by LL if any other GENWEB program had the database open for report
generation, and that if LL had the database open for maintenance, all
other GENWEB programs would be locked out of the database until the
maintenance session were over.  Sounds good to me.

Tom Wetmore, ttw@beltway.att.com


          Wednesday, October 12, 1994 8:36:48 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: lifelines
  To:             GenWeb
In article ttw@beltway.att.com (T.T.Wetmore) writes:
> Automatic merging is, in general, an intractable problem.  Many
>people have suggested many heuristics that could be employed, but, in my
>opinion, the problem requires user intervention.

Regarding this "MERGING" issue, Floyd Nordin and/or Mike Andrews should
comment on the experience of the LDS church with the Ancestral File.  They
have indicated to me that many individuals in the current Ancestral File
(on CD ROM at LDS Family History Centers) are duplicates, and should be
merged.  But no effective means of merging has been devised.  The
person-hours cost for a few trained people to do this is out of the
question.  The Ancestral File submission procedure gives no means to the
submitter to do the merge.

On the other hand, the GenWEB concept is tailor-made for submitters to
both merge individuals and make corrections and additions of missing
information and sources.  We will have to figure out a way to enable
"writes" into GenWEB files for any individual not fully documented by
primary sources or otherwise "blessed" and "frozen" by an official family
organization.  We must set up GenWEB to let thousands of people spend a
few hours merging and correcting information, rather than requiring a few
"data base owners" to spend thousands of hours doing the same tasks.
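
Since the thread agrees that merging needs a person in the loop, one
modest aid is a program that only lists likely duplicates for people to
review.  The sketch below is one such heuristic, not the LifeLines report
program mentioned earlier and not anything the Ancestral File uses: the
merge_candidates function, its thresholds, and the sample records are all
made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def merge_candidates(persons, threshold=0.85, max_year_gap=2):
    """Return (key_a, key_b, score) for pairs that look like duplicates.

    `persons` maps a record key to a (name, birth_year) tuple.  Nothing is
    merged here; the pairs are only suggestions for a human to review.
    """
    pairs = []
    for (key_a, (name_a, year_a)), (key_b, (name_b, year_b)) in combinations(
            persons.items(), 2):
        if abs(year_a - year_b) > max_year_gap:
            continue   # birth years too far apart to be the same person
        score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
        if score >= threshold:
            pairs.append((key_a, key_b, round(score, 2)))
    return pairs


# Made-up sample data: I1 and I2 differ by one letter and one year.
sample = {
    "I1": ("John Carpenter", 1701),
    "I2": ("John Carpentar", 1702),
    "I3": ("Mary Carpenter", 1740),
}
for key_a, key_b, score in merge_candidates(sample):
    print(key_a, key_b, score)
```

Comparing every pair is quadratic in the number of persons, so for a data
base of thousands of individuals one would first group records by surname
or by decade of birth and compare only within each group.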


          Wednesday, October 12, 1994 8:55:22 PM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        LifeLines Storage Requirements
  To:             GenWeb
There have been a couple questions about the size of a LifeLines database.
LifeLines stores records in straight GEDCOM format - no compression.  A
LifeLines database has the following contents:

1. The person, family, source, event and other records, all in GEDCOM
   format, that are entered by the user or imported from files.

2. The name index records required to quickly find persons in the database
   by name.

3. The indexing overhead of a B-Tree database.

4. A few "special records" that hold abbreviations, user options and
   character translation tables.

In summary, a LifeLines database will be larger than the sum of all the
GEDCOM records it contains, but not appreciably so.

Tom Wetmore


          Wednesday, October 12, 1994 8:58:46 PM
          GenWeb Item
  From:           Phillip Akers,freyr!pakers@NETCOM.COM,Internet
  Subject:        Prior Mail Available
  To:             GenWeb
I have uploaded a file to my anonymous ftp directory on netcom called
genweb.doc which contains the pre-GenWeb list-server ponderings which led
to the creation of the GenWeb list-server (at least I think that's how it
happened) and everything I have received from the list-server since its
creation to today, October 11, 1994.

It can be downloaded from netcom by ftping to ftp.netcom.com.  Login as
anonymous and use your email address as the password, then cd to
/pub/pakers and do a get for genweb.doc.  It is an ascii text file.  I have
run it through unix2dos so it should be readable for dos machines.

Phil
--


          Wednesday, October 12, 1994 9:14:22 PM
          GenWeb Item
  From:           Bill Minnick,svpafug@rahul.net,Internet
  Subject:        Re: Lifelines and ged2html
  To:             GenWeb
TO: Birger Wathne
FROM:  Bill Minnick
SUBJECT:  GEDCOM Compression in Lifelines???

Birger,

Does Lifelines compress the GEDCOM files; if so, could you tell us the size
of the Royalty GEDCOM file as an uncompressed ASCII file, and as a
compressed file in Lifelines?

Thanks and Regards from Bill Minnick, Cupertino, CA


          Wednesday, October 12, 1994 10:03:59 PM
          GenWeb Item
  From:           T.T.Wetmore,ttw@beltway.att.com,Internet
  Subject:        Re: Quick Thoughts on LifeLines a GENWEB Program
  To:             GenWeb
Chris (>):

>From the above [description of LifeLines database access rules], I assume
>that you're saying that you do file locking rather than record locking?

It's even larger grained than that -- entire database locking!

>It would be nice if locks could be handled on a per-record basis, but if
>the present structure of LL doesn't permit that, I can certainly understand.
>This is probably acceptable in the medium term.

I agree with the "it would be nice" part.  There would be some significant
technical hurdles to overcome.  For example, LL caches the last so many
persons and other records accessed.  Keeping cached records up to date with
real records being changed by other programs is not something I am excited
to work on.  Also, recall that the new rules allow any number of
simultaneous readers.

>... in the context of GENWEB, it seems to me that merging is something to
>stay away from.  Instead of merging two databases, we'd want to merely
>insert URLs in the HTMLs to point to the other database record with a note
>that "this is apparently the same person" or whatever is appropriate.

I think the merging issue has more to do with database maintenance than it
has to do with GENWEB access to the database.  One example of the merging
problem occurs when you get new data, say in GEDCOM file format, that
contain a substantial number of persons already in your database.  After
you import the data you are faced with a database with a substantial number
of duplicates.  The merging problem.

>I haven't had a chance to look at your code yet; can I assume that you've
>got a way to put a URL to another record in the GEDCOM so that it will be
>clickable from my web browser?

A GEDCOM record may contain references to other files, and conventions can
be used to indicate what kinds of files they are.  Then LL programs can be
written to translate those records with their file references into HTML
files with the correct URL indications.  Given the conventions, there's
nothing to it.

>(BTW, I don't think this would only be used to point into other GENWEB
>records: for my own GENWEB entry, I'd like a pointer to my personal home
>page; for a musician, I'd like a pointer to their on-line discography; etc.)

I agree; the pointers could be to all kinds of things.

Tom Wetmore, ttw@beltway.att.com


          Wednesday, October 12, 1994 10:44:00 PM
          GenWeb Item
  From:           Amelia Painter,apainter@coyote.csusm.edu,Internet
  Subject:        Re: Translation
  To:             GenWeb
Thank You Jane!!!!!!!!

Ditto!

Amelia Chapman Painter
apainter@san_marcos.csusm.edu
PO Box 154
San Luis Rey, CA 92068

On Wed, 12 Oct 1994, Jane Pryor wrote:

> Date: Wed, 12 Oct 1994 13:40:58 -0500 (CDT)
> From: Jane Pryor
> To: genweb@UCSD.EDU
> Subject: Translation
>
> Is genweb and lifelines just for programmers?  Is there somewhere that I
> could get simple explanations of all this.  I'm not an internet expert but
> I am willing and would like to learn.  Is there a beginners class out there?
> Thank you
> Jane Pryor
> jpryor@systema.westark.edu
>