Subject: Re: ...Unique ID
To: Todd Fries
Date: Wed, 17 Jul 1996 10:40:21 +0100 (BST)
From: Ben Laurie
Cc: ghoffman@UCSD.EDU, genweb@UCSD.EDU, todd@miango.com
In-Reply-To: <199607170036.AAA18887@lighthouse.umr.edu> from "Todd Fries" at Jul 16, 96 07:36:34 pm
Reply-To: ben@algroup.co.uk
X-Mailer: ELM [version 2.4 PL24 PGP2]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID: <9607171040.aa29392@gonzo.ben.algroup.co.uk>

Todd Fries wrote:
>
> > Todd Tyrone Fries,todd@miango.com,Internet writes:
> > Thus, I suggest 'giving up' on trying to include enough personal data about
> > a person to guarantee a unique id, and instead focus on guaranteeing a
> > unique id which references the person in a database. This would require a
> > central (or perhaps distributed) authority that assigns numbers..
> >
> > ------------------
> > Todd,
> > This brings us back full circle. This discussion began with the concern
> > that a "number" or "ID" that consists of a server name plus a RIN-like
> > number could break if either the server must change names (very common on
> > the Internet) or the sponsor recompiles the HTML from an original database
> > with the result that the record number changes.
>
> Hrm, I'll read up as you suggest, but my suggestion is to have 1 authority
> that starts with zero, 1, etc... and numbers people. The number should not
> have any bearings based on the server name or anything just the person. Why?
>
> Because the person is the only thing unique to the person. Should be obvious,
> but it's not. Some people think that just because a person was 'entered' into
> a server should forever tie their records with that server. But this doesn't
> allow a very distributed caching scheme of data or any of that. When I said
> similar to dns I meant very similar.
> Consider the fallacies of what you
> said:
>
>     id = server + server-database-id
>
> If the server in this equation changes anything, including itself, or if the
> server goes out of business, or the database needs to change locations to
> another server, or someone wants to download the database and then serve the
> data from a local machine, what happens to the above equation? It breaks.
> horribly.
>
> The problem at hand is:
>
> We have, globally, a lot of data about a lot of people. How can we give a
> unique id to every person known to (have) exist(ed) ?
>
> Well, let's see, we assign them... a unique id?
>
> Then the problem becomes:
>
> How do we choose a unique id? Where do we store it? Who will assign it?
>
> Well, the most logical way to assign a unique id seems to me to be to
> start counting. Just number the people. Perhaps use something other than
> base 10 to make the 'string' for each unique person short, but assign each
> person a number nevertheless.
>
> Where the id is stored is related to how it is assigned.
>
> Allow me to side-track a moment. Currently, we have a vast number of computers
> on internet. Each one has a unique id. Each machine is a separate entity that
> must be kept track of, for we all must be able to 'reach' it. The id's are
> assigned by first a central authority who gives authority to groups who are
> given numeric ranges with which to work. If the group needs more numbers, it
> gets another numeric range. There is never any duplication on internet.
>
> So why is it so hard to decide, hrm, we must give a unique id/number to each
> person so we can easily refer to them across all databases, so gendex, you get
> 1-1,000,000 and genweb, you get 1,000,001 - 2,000,000 , etc... so start
> counting. And if you discover you have the same person 'enumerated' twice?
> Well, somehow agree on a way to have 'alias' numbers such that even though
> a person receives two numbers, this actually refers to the same individual.
> It is perhaps 'not good' that a database should have multiple id numbers
> referring to an individual, but IMHO it is a tradeoff: multiple ids per person
> or multiple people per id, I think it is obvious which is preferred...because
> no matter how perfect a system you devise, there are always going to be those
> two people who submit a 'new' individual to two different databases, and that
> new individual happens to be the same person.
>
> I guess I must be missing something. What would be wrong with such a plan?

There are two things wrong with this particular variant. The first is that
it requires a central authority - this leads to (at least) two problems -
funding and politics.

The second is that just plain numbers give you no clue as to where the
record corresponding to the number may reside. This problem
also afflicts the Internet's numbering scheme, and is one of the driving
forces behind the move to IPv6.

The first can be solved by using an existing authority, hence my proposal
which piggybacks on DNS. To some extent this also solves the second.

Cheers,

Ben.

>
> --
> Todd Fries .. todd@miango.com

--
Ben Laurie                  Phone: +44 (181) 994 6435
Freelance Consultant and    Fax:   +44 (181) 994 6472
Technical Director          Email: ben@algroup.co.uk
A.L. Digital Ltd,           URL: http://www.algroup.co.uk
London, England.            Apache Group member (http://www.apache.org)

From: Brian Tompsett
Date: Wed, 17 Jul 96 11:59:30 BST
Message-Id: <931.9607171059@olympus.dcs.hull.ac.uk>
To: andrsn@andrsn.stanford.edu, genweb@UCSD.EDU
Subject: Re: Excluding Robots

Several people have followed up from my tangential comments, both privately
and to this list, making the presumption that abuse by robots is caused by
being ignorant of the protocols on the part of the web page provider or
system manager.

Let me just stem this rising tide of comment. As George quite rightly says,
abuse by robots is a topic in itself and not suitable for this list.
If your site has not been abused by robots yet then either you have not
noticed, or some robots haven't found you yet! It is not solved by the simple
presence of a robots.txt file. I now spend as much of my time adding more
lines to my exclusion file to contain the beasts as I do in adding
genealogical data.

Systems are being attacked by robots in the same way as their users are
"mail spammed", usenet is being spammed and so on. It's just everyday
business from another outfit who think the 'net is their sure fire way to a
quick buck....

[Damn, must resist these tangential comments...] :-)

Brian

From: "W. Wesley Groleau (Wes)"
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
Date: Wed, 17 Jul 96 7:51:36 EST
In-Reply-To: <199607170036.AAA18887@lighthouse.umr.edu>; from "Todd Fries" at Jul 16, 96 7:36 pm
Mailer: Elm [revision: 70.85]

Folks thinking about "unique IDs" for persons should also look at
the Social Security system (for a BAD example) and the "UUID" concept
of the DCE - Distributed Computing Environment (for an example I don't
have an opinion on).

--
---------------------------------------------------------------------------
W. Wesley Groleau (Wes)                                Office: 219-429-4923
Magnavox - Mail Stop 10-40                               Home: 219-471-7206
Fort Wayne, IN 46808                 elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

Date: Wed, 17 Jul 1996 06:24:08 -0700
Message-Id: <199607171324.AA19820@relay.interserv.com>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
In-Reply-To: <9607171040.aa29392@gonzo.ben.algroup.co.uk>
X-Mailer: SPRY Mail Version: 04.00.06.21

Since we are about full circle I resubmit my two cents worth.

Each site/archive be given a virtual name. The GenWeb "authority"
maintains a DNS-like index of all archive names and the
underlying urls for the site.
The site maintains an index of the individuals there including
the record number assigned by that site (either via the genealogy
software or by the site-maintainer).

When I include a link at my site to someone else, I simply
include the GenWeb site name + that site's number for the
individual.

Responsibilities:
GenWeb maintains the master site index.
Individual sites assign individual numbers and notify GenWeb if
the url of their site changes.

--

Notice I have not dealt with the functionality of clicking on a
link and having the other site's individual info appear
automagically.

Date: Wed, 17 Jul 1996 11:00:51 -0500 (CDT)
From: Todd Tyrone Fries
Sender: tfries@umr.edu
Reply-To: Todd Tyrone Fries
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
In-Reply-To: <9607171040.aa29392@gonzo.ben.algroup.co.uk>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII

> Todd Fries wrote:
> There are two things wrong with this particular variant. The first is that
> it requires a central authority - this leads to (at least) two problems -
> funding and politics.

I'll concede this is a problem. But how else could we possibly have
organization without a 'secretary' so to speak?

> The second is that just plain numbers give you no clue as to where the
> record corresponding to the number may reside. This problem
> also afflicts the Internet's numbering scheme, and is one of the driving
> forces behind the move to IPv6.

Last I checked IPv6 is necessary because there are not enough internet numbers
to go around for computers. The internet numbering scheme was set up for a
limited number of computers. Just because the conventions used to designate
IPv6 look a little more 'human' readable (i.e. having an edu string as part
of the ip address) has no bearing on the fact that it is being used to
represent more ip address namespace.

> The first can be solved by using an existing authority, hence my proposal
> which piggybacks on DNS. To some extent this also solves the second.
This is exactly the thing I have a problem with. How can you use dns to
create a unique id? Internet is a changing beast. No one can guarantee that
next month, let alone next year, their server will be up and profitable, or
that they will forever serve genealogical records.

While it is a nice thought to use an existing service, I guess my big problem
is that I cannot see a solution to the situation where a server ceases to exist
... does its data go to limbo or what?

Perhaps I'm thinking the wrong way about this. Perhaps you mean to use the
ip address of a database server as a starting point for the number of the
person's unique id, and that we could have an 8 digit hex number to which
a unique id is assigned per site or some such variant, so that everyone refers
to that person by the new id no matter WHERE the person's data resides
on any server on the planet. Key point: the data's id can be used outside the
server from which it originated. Thus allowing for data migration, caching,
etc...

Thanks for humoring me on this, I guess I finally 'got the picture'. Ok, so
back to square one.

How to generate a unique id?

Suggestion:

<8-digit hex ip|dns name|dotted decimal>:

Where the site id is an arbitrarily long string that is hopefully kept to
a close minimum in length that is necessary to guarantee a unique person's
id.

My $.02 only..

--
Todd Fries .. todd@miango.com

Date: Wed, 17 Jul 1996 11:23:45 -0500 (CDT)
From: Todd Tyrone Fries
Sender: tfries@umr.edu
Reply-To: Todd Tyrone Fries
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
In-Reply-To: <199607171324.AA19820@relay.interserv.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII

> Since we are about full circle I resubmit my two cents worth.

:-)

> Each site/archive be given a virtual name. The GenWeb "authority"
> maintains a DNS-like index of all archive names and the
> underlying urls for the site.

Sounds reasonable enough.
> The site maintains an index of the individuals there including
> the record number assigned by that site (either via the genealogy
> software or by the site-maintainer).

I could envision this happening. At some sites this is already occurring.

> When I include a link at my site to someone else, I simply
> include the GenWeb site name + that site's number for the
> individual.

This is already being done.

> Responsibilities:
> GenWeb maintains the master site index.
> Individual sites assign individual numbers and notify GenWeb if
> the url of their site changes.

Heavy responsibilities. But a start.

> Notice I have not dealt with the functionality of clicking on a
> link and having the other site's individual info appear
> automagically.

So basically, a distributed data model where each 'virtual archive' stores
information about the individuals in its own archive. Genweb is thus an
archive of its own special nature into individual archives rather than for
individuals.

Pardon me while I think aloud...

Hrm, well, one problem. This gives a unique id to a person at a specific
virtual archive. What happens if that person is at more than one virtual
archive? What happens if archive a collects information from archive b and c.
Are there then considered to be three unique id's for each person simply
because they are in three archives?

I may be wrong, but I think 'getting to a person easily' is being confused with
'identifying a person uniquely'.

Since things appear to be generally 'moving' towards site+site-id, let me
ask... Can I take data from gendex, genweb, etc, and move parts of it to my
own server and have the id's be the same? Are we talking about a virtual
archive at

    http://site/data/site-id

or about data at

    http://anysite/data/virtual-archive_site-id

I guess what I'm trying to say is, once you have found a person's id number,
you should be able to go to any archive anywhere, look for that id number,
and find the same person if they have data about that person.
Therefore, we can't really guarantee in any way the ability to 'point' using
the id number directly to any specific archive. But if we have unique id's
that are used throughout all databases, it should then be possible to go to
whatever search engines are available searching for a person...locate the
unique id, and then query all databases for the presence of that unique id to
locate any additional information about that individual.

Am I making any sense?

Todd Fries .. todd@miango.com

Date: Wed, 17 Jul 1996 16:49:01 +0100
To: Brian Tompsett
From: "Harold A. Driscoll"
Subject: Re: Robots Searching GenWeb Sites
Cc: genweb@UCSD.EDU, ghoffman@UCSD.EDU

At 09:14 16/7/96 BST, Brian Tompsett wrote:
> My nine (or so) genealogical databases are perpetually being
> indexed by robots.
> This causes interesting questions of database design and page format,
> Robots also cause a fair bit of problems to my server. Whatever the
> implementors think some of them are a nuisance. One in particular tries
> to fill the partition that contains my access log.

Security attacks are an entirely different matter, and deserve to be treated
as the nasty attacks which they are. I trust that you've been in appropriate
contact with their service provider's management (or legal counsel).

> I get about 6000 hits a day for genealogical requests. I log everything,
>and I study the log to learn about how people use the resource - that is
>in fact why I have the data there! I study how the data is accessed.

I'm envious of your persistence on this.

I'd suggest that genealogical databases consider including indexing META
elements in the page headers. These allow well-behaved search engines to know
which keywords you want to have indexed (or at least given preference), as
well as the page description to be provided when a search match is made.
See the help page at AltaVista for a good description of this,
http://www.altavista.com/ -- some other search engines are supporting these
META elements, and we can expect more to follow.

Header META elements help search engines to index the pages and sites. The
robots.txt file (URL in prior message of this thread) helps to limit the range
of the indexing. This is something which benefits not only the site, but also
their guests and those who use the search engines.

/Harold

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Harold A. Driscoll mailto:harold@driscoll.chi.il.us
#include http://homepage.interaccess.com/~driscoll/

Subject: Re: ...Unique ID
To: todd@miango.com
Date: Wed, 17 Jul 1996 17:33:14 +0100 (BST)
From: Ben Laurie
Cc: genweb@UCSD.EDU
In-Reply-To: from "Todd Tyrone Fries" at Jul 17, 96 11:00:51 am
Reply-To: ben@algroup.co.uk
X-Mailer: ELM [version 2.4 PL24 PGP2]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID: <9607171733.aa00517@gonzo.ben.algroup.co.uk>

Todd Tyrone Fries wrote:
>
> > Todd Fries wrote:
> > There are two things wrong with this particular variant. The first is that
> > it requires a central authority - this leads to (at least) two problems -
> > funding and politics.
>
> I'll concede this is a problem. But how else could we possibly have
> organization without a 'secretary' so to speak?

In this context, it seems to me that there are two reasons for having an
organisation:

1. To define standards.
2. To allocate IDs (or some component of them).

It seems to me that both of these already exist in a form which is usable
for our purposes, without any need to invent new ones. The first can be
satisfied by going through the usual Internet standards process, a course I
would hope this group will pursue in the fullness of time, and the second
can be piggybacked on existing ID allocators.
>
> > The second is that just plain numbers give you no clue as to where the
> > record corresponding to the number may reside. This problem
> > also afflicts the Internet's numbering scheme, and is one of the driving
> > forces behind the move to IPv6.
>
> Last I checked IPv6 is necessary because there are not enough internet numbers
> to go around for computers. The internet numbering scheme was set up for a
> limited number of computers. Just because the conventions used to designate
> IPv6 look a little more 'human' readable (i.e. having an edu string as part
> of the ip address) has no bearing on the fact that it is being used to
> represent more ip address namespace.

There are actually two driving forces, one is the shortage of numbers, but
the other is the size of routing tables, which is what I was referring to.

>
> > The first can be solved by using an existing authority, hence my proposal
> > which piggybacks on DNS. To some extent this also solves the second.
>
> This is exactly the thing I have a problem with. How can you use dns to
> create a unique id? Internet is a changing beast. No one can guarantee that
> next month, let alone next year, their server will be up and profitable, or
> that they will forever serve genealogical records.
>
> While it is a nice thought to use an existing service, I guess my big problem
> is that I cannot see a solution to the situation where a server ceases to exist
> ... does its data go to limbo or what?
>
> Perhaps I'm thinking the wrong way about this. Perhaps you mean to use the
> ip address of a database server as a starting point for the number of the
> person's unique id, and that we could have an 8 digit hex number to which
> a unique id is assigned per site or some such variant, so that everyone refers
> to that person by the new id no matter WHERE the person's data resides
> on any server on the planet. Key point: the data's id can be used outside the
> server from which it originated.
> Thus allowing for data migration, caching,
> etc...
>
> Thanks for humoring me on this, I guess I finally 'got the picture'. Ok, so
> back to square one.

Aha! You have seen the light!

>
> How to generate a unique id?
>
> Suggestion:
>
> <8-digit hex ip|dns name|dotted decimal>:

Yep, though I would say that the IP number is dangerous - at the current
state of play your allocated IP (if you have one at all - DHCP means more
and more people don't) is tied to your service provider, and so can end up
in someone else's hands.

Unfortunately, DNS is also reassignable (though this happens much less
often), so perhaps my suggestion of DNS is less than ideal.

A much more reliable unique ID can be got from the 6 byte MAC address of an
Ethernet card. These are globally unique, and unlikely to change hands
(and, indeed, cheap enough so that you could buy one for the purpose, then
burn it to ensure it was never reused).

I would be inclined to allow all three methods, at the discretion of the
user.

A fourth method, often suggested and almost never used, is to generate a
large random number.

>
> Where the site id is an arbitrarily long string that is hopefully kept to
> a close minimum in length that is necessary to guarantee a unique person's
> id.
>
> My $.02 only..

No, no. That was _my_ $.02. Give it back! ;-)

Cheers,

Ben.

> --
> Todd Fries .. todd@miango.com
>

--
Ben Laurie                  Phone: +44 (181) 994 6435
Freelance Consultant and    Fax:   +44 (181) 994 6472
Technical Director          Email: ben@algroup.co.uk
A.L. Digital Ltd,           URL: http://www.algroup.co.uk
London, England.            Apache Group member (http://www.apache.org)

From: "W. Wesley Groleau (Wes)"
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
Date: Wed, 17 Jul 96 13:22:45 EST
In-Reply-To: ; from "Todd Tyrone Fries" at Jul 17, 96 11:23 am
Mailer: Elm [revision: 70.85]

:> What happens if archive a collects information from archive b and c.
:> Are there
:> then considered to be three unique id's for each person simply because they
:> are in three archives?
:>
:> I may be wrong, but I think 'getting to a person easily' is being confused with
:> 'identifying a person uniquely'.

Apparently we HAVE been confusing two topics. But I submit that "getting to a
[record about a] person easily" is of more interest to most of us. And identifying
a person uniquely is neither particularly useful nor particularly likely to
happen.

There are MANY ways to generate "unique" IDs. The problem is mapping them
one-to-one with "real" people. There is merit in the idea that when two
persons are discovered to be the same, one ID is aliased to the other
and neither is re-used. To extend this to when "one person" is discovered
to be two or more, how would you do that? Say I prove that ID xxxxx is really
two people. What happens to all references to xxxxx ?

If I have John Doe in my database, and you have John Doe in yours, and they happen
to be the same John Doe, what mechanism will ensure that they are assigned
the same ID? For that mechanism to be reliable, it will have to have enough
information about each record to be sure of identity. If my record contains
that much information, then I have no need for your record, so the unique ID
is too late to help me.

If we both have enough info to equate the two John Does, what if my standards
of proof are higher than those of the person coding the ID assigner? I disagree
that they are the same person, so I refuse to use the ID. Or the opposite:
they are assigned distinct IDs, but my standards of "proof" are not as high, so
I insist on using the same ID as yours. What GenWeb censor is going to raid
my server and prevent it?

:> Am I making any sense?
:> Todd Fries .. todd@miango.com

Yes, you are, at least as far as your idea and rationale being understood.
It's a good attempt, but IMHO will not pan out to something workable.
---------------------------------------------------------------------------
W. Wesley Groleau (Wes)                                Office: 219-429-4923
Magnavox - Mail Stop 10-40                               Home: 219-471-7206
Fort Wayne, IN 46808                 elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

Subject: Re: ...Unique ID
To: "W. Wesley Groleau"
Date: Wed, 17 Jul 1996 19:55:51 +0100 (BST)
From: Ben Laurie
Cc: genweb@UCSD.EDU
In-Reply-To: <9607171822.AA09540@most> from "W. Wesley Groleau" at Jul 17, 96 01:22:45 pm
Reply-To: ben@algroup.co.uk
X-Mailer: ELM [version 2.4 PL24 PGP2]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID: <9607171955.aa00829@gonzo.ben.algroup.co.uk>

W. Wesley Groleau wrote:
>
> :> What happens if archive a collects information from archive b and c. Are there
> :> then considered to be three unique id's for each person simply because they
> :> are in three archives?
> :>
> :> I may be wrong, but I think 'getting to a person easily' is being confused with
> :> 'identifying a person uniquely'.
>
> Apparently we HAVE been confusing two topics. But I submit that "getting to a
> [record about a] person easily" is of more interest to most of us. And identifying
> a person uniquely is neither particularly useful nor particularly likely to
> happen.

I think that's fair enough. But what we need to do is to identify the record
uniquely. So, making this (useful) distinction does not make the need for
unique IDs go away.

>
> There are MANY ways to generate "unique" IDs. The problem is mapping them
> one-to-one with "real" people. There is merit in the idea that when two
> persons are discovered to be the same, one ID is aliased to the other
> and neither is re-used. To extend this to when "one person" is discovered
> to be two or more, how would you do that? Say I prove that ID xxxxx is really
> two people. What happens to all references to xxxxx ?
In a completely sane world, they'd point to a record which indicated that in
fact this was two people (or records), each of which would have a new ID.

>
> If I have John Doe in my database, and you have John Doe in yours, and they happen
> to be the same John Doe, what mechanism will ensure that they are assigned
> the same ID? For that mechanism to be reliable, it will have to have enough
> information about each record to be sure of identity. If my record contains
> that much information, then I have no need for your record, so the unique ID
> is too late to help me.

It is clear that there is no way that the same person isn't going to get
more than one ID. If you can thread your way through those negatives you'll
agree that what is then needed is a way of mapping IDs onto each other.

>
> If we both have enough info to equate the two John Does, what if my standards
> of proof are higher than those of the person coding the ID assigner? I disagree
> that they are the same person, so I refuse to use the ID. Or the opposite:
> they are assigned distinct IDs, but my standards of "proof" are not as high, so
> I insist on using the same ID as yours. What GenWeb censor is going to raid
> my server and prevent it?

Interesting point. One way of dealing with that is to use cryptographic
techniques to "sign" the ID, thus proving legitimate ownership. Another is
to tie the ownership of the ID to a server which is authoritative for the
location of the (record corresponding to the) ID. The former lends itself
more naturally to distributed systems, IMO.

>
> :> Am I making any sense?
> :> Todd Fries .. todd@miango.com
>
> Yes, you are, at least as far as your idea and rationale being understood.
> It's a good attempt, but IMHO will not pan out to something workable.

I think some people will disagree, me amongst them. History will be the
judge.

Cheers,

Ben.
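
The "way of mapping IDs onto each other" raised in this exchange, together with the case where one ID turns out to cover two people, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `AliasRegistry`, its methods, and the ID strings are all hypothetical and were not proposed by anyone on the list. A retired ID is never reused; it either points at a surviving ID (merge) or at the new IDs that replace it (split).

```python
class AliasRegistry:
    """Hypothetical table that maps retired record IDs onto current ones."""

    def __init__(self):
        self._merged_into = {}  # retired id -> id it was found to duplicate
        self._split_into = {}   # retired id -> list of ids that replace it

    def _is_retired(self, record_id):
        return record_id in self._merged_into or record_id in self._split_into

    def merge(self, duplicate_id, surviving_id):
        """Record that duplicate_id is the same person as surviving_id."""
        if self._is_retired(duplicate_id):
            raise ValueError("id already retired: " + duplicate_id)
        if duplicate_id in self.resolve(surviving_id):
            raise ValueError("merge would point an id at itself")
        self._merged_into[duplicate_id] = surviving_id

    def split(self, old_id, new_ids):
        """Record that old_id was really several people, now new_ids."""
        if self._is_retired(old_id):
            raise ValueError("id already retired: " + old_id)
        for new_id in new_ids:
            if old_id in self.resolve(new_id):
                raise ValueError("split would point an id at itself")
        self._split_into[old_id] = list(new_ids)

    def resolve(self, record_id):
        """Return the current id(s) that record_id stands for."""
        if record_id in self._merged_into:
            return self.resolve(self._merged_into[record_id])
        if record_id in self._split_into:
            current = []
            for new_id in self._split_into[record_id]:
                current.extend(self.resolve(new_id))
            return current
        return [record_id]


# Usage with made-up ids: two archives entered the same person, and the
# surviving record was later shown to describe two different people.
registry = AliasRegistry()
registry.merge("genweb:17", "gendex:1000042")
registry.split("gendex:1000042", ["gendex:2000001", "gendex:2000002"])
print(registry.resolve("genweb:17"))
```

Old references keep working because nothing is deleted: a lookup of a retired ID simply follows the chain to whatever currently stands in its place. Who is allowed to record a merge or a split is the "standards of proof" question, which this sketch does not address.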
--
Ben Laurie                  Phone: +44 (181) 994 6435
Freelance Consultant and    Fax:   +44 (181) 994 6472
Technical Director          Email: ben@algroup.co.uk
A.L. Digital Ltd,           URL: http://www.algroup.co.uk
London, England.            Apache Group member (http://www.apache.org)

From: TomRaynor@aol.com
Received: by emout16.mail.aol.com (8.6.12/8.6.12) id PAA27292 for genweb@ucsd.edu; Wed, 17 Jul 1996 15:45:08 -0400
Date: Wed, 17 Jul 1996 15:45:08 -0400
Message-ID: <960717154507_362828795@emout16.mail.aol.com>
To: genweb@UCSD.EDU
Subject: Re: ...Unique ID

In a message dated 96-07-17 09:39:58 EDT, mavrogeorge writes:

<< Responsibilities:
 GenWeb maintains the master site index.
 Individual sites assign individual numbers and notify GenWeb if
 the url of their site changes. >>

This is good. Kind of best of both worlds. Local site assigns a local unique
ID, while central site only tracks the relationship "local-site-name to
current-url".

Isn't this sort of like what Todd is saying? The "central authority" hands
out the first part of the ID, and the local site adds on the remainder. If
you don't see it, look at it this way. How about, instead of handing out the
range "1,000,000 to 1,999,999", the central site handed out ranges like "1 to
2". The local site, instead of having only 999,999 numbers to assign, has an
infinite range (1.01, 1.56, 1.03746, 1.999999999, etc.) Notice that if the
central site handed out an "alias" (or "site ID") instead of the "1", these
two plans look a lot alike.

Unfortunately, we may still end up with circular logic. What if the "site ID"
had to change? Say, company A buys out company B, and merges all their data?
Or I pass on my data to my children? A "site ID" may be more stable than a
DNS name or URL, but will it really be long-term?

Also, as more individuals are recorded, you don't have to go back very many
generations before duplicates become the norm, not the exception. This adds
an absolute requirement that duplicates be dealt with.
Then there are the really big questions, like what about 10 years from now
when the technology has changed in ways we can't imagine now? How about 50
years? How long do we REALLY need this data to be static? What/who will be
here 50 years from now to keep it current?

Maybe we should just stick with "URL.local-unique-id". It'll be around for as
long as the URL, which may be the best we can do in these fast-changing
times. Surely it wouldn't be too hard to have a robot run through the list of
URLs from time to time and delete any that don't answer after some set number
of tries?

From: Todd Tyrone Fries
To: ben@algroup.co.uk
Subject: Re: ...Unique ID
Date: Wed, 17 Jul 1996 13:28:14 -0500 (CDT)
In-Reply-To: <9607171733.aa00517@gonzo.ben.algroup.co.uk>
Message-Id:
ReSent-Date: Wed, 17 Jul 1996 15:27:39 -0500 (CDT)
ReSent-From: Todd Tyrone Fries
ReSent-Reply-To: Todd Tyrone Fries
ReSent-Subject: Re: ...Unique ID
ReSent-To: genweb@UCSD.EDU
ReSent-Message-ID:

> > My $.02 only..
>
> No, no. That was _my_ $.02. Give it back! ;-)

Ok, ok! If you insist. But I'm going to get as much mileage out of it as I can
before you can have it back.....hehehe..

> > I'll concede this is a problem. But how else could we possibly have
> > organization without a 'secretary' so to speak?
> In this context, it seems to me that there are two reasons for having an
> organisation:
>
> 1. To define standards.
> 2. To allocate IDs (or some component of them).
> > How to generate a unique id?
> > Suggestion:
> > <8-digit hex ip|dns name|dotted decimal>:
>
> Yep, though I would say that the IP number is dangerous - at the current
> state of play your allocated IP (if you have one at all - DHCP means more
> and more people don't) is tied to your service provider, and so can end up
> in someone else's hands.

So there is the small risk that the numeric ip gets assigned to someone else.
If they are actually serious about serving a genealogical database, they will
be 'smart' enough to 'continue' assigning unique id's where the last 'tenant'
of that ip left off. DHCP does not come into play, because if someone is
serious enough to set up a database, it is most likely not going to be on a
server where the ip is dynamically assigned every time the machine is
rebooted. Archive servers, and such, I would hope are 'stable' enough that
they intend to be reliable connections for at least some period of time. If
not, perhaps they need to 'borrow' a subset of namespace from another machine
that is.

DHCP is used in an ethernet'ed environment where machines need to be able to
'plug' into the network, and get an unused ip. It allows system
administrators a way to bypass temporarily some extra work. But if a machine
boots on a network and gets an ip and then decides to set up shop as a web
server, etc, either someone needs to design a DHCP server to change the
nameserver dynamically (possible, but an ugly idea), or the administrator
needs to just give a bootp record to that particular machine so it gets a
'static' ip address.

Perhaps the URN system will help in this regard. Perhaps an ip is not what we
want to use because of the upcoming ipv6 and thus the ip addresses we use now
will not necessarily be unique. I really should read up on the specs, but
haven't yet, so I am not sure of all the options here.

I maintain that for now, an ip-scheme will help us get started with unique
id's for a person. It doesn't matter if a person is given a unique id on
server a, which then moves the database to server b, which then adds more
unique id's based on its ip address, and then the database is split and goes
to server c and d. Wherever the data lies, each person has a unique id. If
the new ipv6 makes the ip addresses non-unique, then we design a new scheme
for assigning new unique id's, while the 'older' id's remain unduplicated.
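
The behaviour described here (an id is minted once from the prefix of the server where the person was first entered, and then travels with the record wherever the data moves) might look like the following Python sketch. It is a minimal illustration under assumptions: the `Archive` class, the `prefix:serial` string format, and the hex prefixes are all made up for the example and are not part of any scheme agreed on the list.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Hypothetical archive that mints ids from its own site prefix."""
    prefix: str                # e.g. the site's ip address as 8 hex digits
    next_serial: int = 1       # next unused local number
    records: dict = field(default_factory=dict)  # id -> record data

    def add_new(self, data):
        """Enter a person with no id yet: mint one under this prefix."""
        record_id = f"{self.prefix}:{self.next_serial}"
        self.next_serial += 1
        self.records[record_id] = data
        return record_id

    def import_from(self, other, record_id):
        """Copy a record from another archive, keeping its original id."""
        self.records[record_id] = other.records[record_id]


# Server a mints an id; the record later moves to server b, which keeps
# the old id and mints new ones only under its own prefix.
server_a = Archive(prefix="c0a80001")
server_b = Archive(prefix="c0a80002")
moved_id = server_a.add_new({"name": "John Doe"})
server_b.import_from(server_a, moved_id)
local_id = server_b.add_new({"name": "Jane Doe"})
print(moved_id, local_id)
```

Because imported records never get a fresh number, two ids can collide only if two sites mint under the same prefix, which is the case a reassigned ip address would create.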
> Unfortunately, DNS is also reassignable (though this happens much less often), so perhaps my suggestion of DNS is less than ideal.

Domain names are human readable, but take up a lot more storage space than a 4 byte ip address. You cannot have a 4 byte fully-qualified hostname, because a fully-qualified dns name uses a dot at the end, thus '.cz.' is 4 bytes before we get to even the 1st domain, let alone a hostname. Nit picky, I know, but just a thought to keep in mind.

> A much more reliable unique ID can be got from the 6 byte MAC address of an Ethernet card. These are globally unique, and unlikely to change hands (and, indeed, cheap enough so that you could buy one for the purpose, then burn it to ensure it was never reused).

But what about the isdn people? To my knowledge they have an ip address accessed over 1 to 2 serial ports. This doesn't equal a hardware ethernet address. I would suggest this is not a good method; if everyone in the world had nothing but ethernet cards, it would be different. Too bad, because it is a good idea to use the globally unique ethernet card address.

> I would be inclined to allow all three methods, at the discretion of the user.
>
> A fourth method, often suggested and almost never used, is to generate a large random number.

Personally, the only thing I care about is that a given site uses 'some' unique identifier, be it ip address, hardware address, random number, SOMETHING to prefix site-specific id's, so when they go belly-up and need to dump their database somewhere else, they can do so without worrying about duplicate id numbers.

So, after all this discussion, perhaps it all boils down to:

Specifications for Genealogical Database Uniqueness

1. Each virtual archive shall be responsible for registering with genweb an id prefix for the purposes of guaranteeing that it is not duplicated elsewhere.

2. An archive must relate at the very least one unique id to every individual it contains information on.
This provides for:

a) for an individual being entered into the database without an already assigned unique id, one is assigned prefixed by the archive's id.

b) for an archive that contains some set of data from a second archive, the first archive must retain the original unique id's without re-assigning new id's just because it is in a new database.

c) 'at the very least one' provides for the ability to relate multiple id's to a single person. (This allows a person to be entered in two separate databases, assigned two separate id's, yet be considered the same person.)

This would allow genweb to point to specific archives, given an id, because the 'assigned archive' part of the id would mean a specific virtual archive wherever it might be located. This would also allow me, for instance, to go out and search for all Fries's in all databases, and include them in my own database, and if I had some new cousins, etc, I could add them to either my database directly (after obtaining an archive id from genweb) or to someone else's archive and still add it to my own database, each person having a unique id.

I now return your two, well used, cents, Ben...

--
Todd Fries .. todd@miango.com

Date: Wed, 17 Jul 1996 14:46:59 -0600
From: smcgee@sol.slcc.edu (Scott McGee (Personal))
Message-Id: <9607172046.AA24580@sol.slcc.edu.>
To: genweb@UCSD.EDU
Subject: Re: ...Unique ID

Todd,

A few questions (or points to ponder) regarding your thoughts on ID's:

1. I currently have a couple of hundred thousand names on my server in a couple of dozen databases. What if I lose my account?

1. a. I lose my account there, and set up the same info on another account.

1. b. I lose my account there and the info, but am able to get some of the info and set up a new server elsewhere with partly old, and partly new data (some data missing that was on the old site)

1. c. I say "to heck with it" and don't try to recreate the site elsewhere at all.

2. Many of the databases I host are from other people.
There are known overlaps between them. What about these duplicate individuals?

3. Many of the databases I host exist in either identical or modified form elsewhere. What of those databases and the individuals in them?

4. I have one ancestor, Ezekiel Johnson, about whom little is known. We have identified two possible people who might be him. I'll refer to them as ej1750 and ej1754 (the number being the year they were born). My own database ends with what little is known about Ezekiel, and links to the two databases with info on the ancestors of the two possibilities. What if I identify ej1754 as my ancestor? I would likely fold the ej1754 database into my mcgee database and merge the two Ezekiel Johnsons. What does this do to your id?

I have thought about id's a lot. I have never found a way to try to assign any permanent ID to an individual that is not as fraught with problems as the current lack of such an ID. I don't say one doesn't exist, just that I can't seem to find one, nor have I seen anyone else propose one.

It is my opinion that the very uncertainty of genealogical research makes such ID's unworkable. ID's that map to certain attributes known about a person have the benefit of making matches easier, but that seems to be the only benefit. Databases move, get duplicated, modified, etc, and individuals in databases may be merged, divided, deleted, added, or changed. All make any kind of lasting ID pretty meaningless.

I hate to give up on a good idea, but for over a year I have thought about this issue without ever getting further than this. Do any of the rest of you suspect that we are trying to solve an insoluble problem?

Scott

GENEALOGY | Do you know who your ancestors are? | Scott McGee
-----------+---------------------------------------+---------------------
email: smcgee@genealogy.org | What? Me speak for
web: http://genealogy.org/~smcgee/homepage.html | someone else? Nah!
---------------------------------------------------+---------------------
See my genealogy page at http://genealogy.org/~smcgee
and my GenWeb page at http://genealogy.org/~smcgee/genweb

Date: 17 Jul 96 17:13:54 EDT
From: N Oughtibridge <100020.1117@CompuServe.COM>
To: GENWEB List
Cc: Marthe Arends, JOHN R BANCROFT <74601.420@CompuServe.COM>, "Charles A. Barker", "C.W.Trowbridge" <100013.3356@CompuServe.COM>, garland clark, Jack Clements, "John A. Colburn" <72057.1504@CompuServe.COM>, Elaine J Cornell, John G Cowan, "Jack Cross", "N. Linn Hendershot", Stam Hill, Hodgman, Rick D Ingersoll, "INTERNET:BMarlatt@aol.co", "M. Johnson", Tom Ledoux, "Peter Middleton", "Ted Weeks / N.C." <74106.2627@CompuServe.COM>, "Mark R. Schears", Gregg Schlaudecker, Dennis Thrush, Daniel Vulkan
Subject: uFTi Version 1.1 is now available
Message-ID: <960717211354_100020.1117_EHV98-1@CompuServe.COM>

Just a quick note - I have included support for Gene Stark's GENDEX.TXT index files in uFTi as a minor version release. Full details of Gene's GENDEX service are at http://www.gendex.com

Those of you who used Version 1.0 can get the replacement exe file from http://ourworld.compuserve.com/homepages/oughtibridge/ufti32ug.zip

Penn State will be serving it very shortly at ftp://ftp.cac.psu.edu/genealogy/windows/ufti32ug.zip

The full installation is going to be at ftp://ftp.cac.psu.edu/genealogy/windows/ufti32.zip with a July date stamp on the file.

The Windows 3.1 version is going to be at ftp://ftp.cac.psu.edu/genealogy/windows/ufti16.zip

All of these files are also available in the Genealogy Forum of CompuServe.

A number of bugs have also been fixed, including one which caused problems with Netscape.

Nicholas

----------
Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files.
See HTTP://ourworld.compuserve.com/homepages/oughtibridge
Email 100020.1117@compuserve.com

Date: Wed, 17 Jul 1996 15:20:10 -0600
From: smcgee@sol.slcc.edu (Scott McGee (Personal))
Message-Id: <9607172120.AA24641@sol.slcc.edu.>
To: genweb@UCSD.EDU, todd@miango.com
Subject: Re: ...Unique ID

Todd,

For some of the same reasons enumerated in my last post, and some others, I also feel that a unique personal ID is either currently or ultimately unworkable.

Let's again take my ancestor Ezekiel Johnson. Many genealogies I have seen assume that ej1750 is this Ezekiel. There is evidence that one branch of the family, at one point, simply "decided" this was the case. Nobody has ever found any evidence that he is my ancestor. Other genealogies list ej1754, but usually note that it is uncertain. So, we obviously have at least two individuals. Let's just say ej1750 is assigned that as an ID, and similarly for ej1754. What about my ancestor? Is he ej1750, ej1754, or something else? Whose data do we use to make the determination?

I have another ancestor who got merged in someone's database with his own grandfather with the same name. This data was submitted to the LDS Ancestral Files. Thus, most databases show this one person married to one woman, and then many years later, married to another. Let's assume that someone obtains (how?) an ID for this person, and then, looking at the dates and historical info, finds the mistake. We now have two people with one ID. Do we give one that ID and make a new one for the other (which gets the old one?) or do we make new ID's for both? What of old data pointing to one of them?

What about when someone finds that through mis-interpretation of data, a fictitious individual has been created and decides to remove them? What of the ID assigned to them? There are many other scenarios where one person is thought to be the same as another and then found not to be, or one record is found to be a combination of two (or more) others.
This happens in a single database and between databases. Often, genealogists disagree on which case is correct. Sometimes two individuals will have been merged, separated, merged, separated so many times that sorting it out becomes hopeless.

We deal with a very subjective subject (no pun intended) here, and trying to nail down an ID for someone is the very point of it. Makes trying to assign one rather difficult.

Scott

GENEALOGY | Do you know who your ancestors are? | Scott McGee
-----------+---------------------------------------+---------------------
email: smcgee@genealogy.org | What? Me speak for
web: http://genealogy.org/~smcgee/homepage.html | someone else? Nah!
---------------------------------------------------+---------------------
See my genealogy page at http://genealogy.org/~smcgee
and my GenWeb page at http://genealogy.org/~smcgee/genweb

Date: 17 Jul 96 17:13:59 EDT
From: N Oughtibridge <100020.1117@CompuServe.COM>
To: "Harold A. Driscoll"
Cc: GENWEB List
Subject: Re: Robots Searching GenWeb Sites
Message-ID: <960717211358_100020.1117_EHV98-2@CompuServe.COM>

I shall endeavour to include the option to produce the Alta Vista META tags in a future version of uFTi, however Brian has previously raised a very valid point - how do you know what is interesting? If you are interested in occupations or locations then the full page is interesting, however when you then search for all smiths, you can't dig through all of the data that Alta Vista spews back.

My solution will eventually be to allow the user to choose the types of file to index, for example my full surname index, each individual surname's index of first names and each person's page. I will then, probably at a later date, have a keywords field. If I remember correctly, Alta Vista also allows you to specify a different title for the pages.

Nicholas

-----------
Nicholas Oughtibridge is the author of uFTi, a Windows program to generate World Wide Web pages from GEDCOM files.
See HTTP://ourworld.compuserve.com/homepages/oughtibridge
Email 100020.1117@compuserve.com

Subject: Re: ...Unique ID
To: Scott McGee
Date: Wed, 17 Jul 1996 22:21:41 +0100 (BST)
From: Ben Laurie
Cc: genweb@UCSD.EDU
In-Reply-To: <9607172046.AA24580@sol.slcc.edu.> from "Scott McGee" at Jul 17, 96 02:46:59 pm
Reply-To: ben@algroup.co.uk
X-Mailer: ELM [version 2.4 PL24 PGP2]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Message-ID: <9607172221.aa01178@gonzo.ben.algroup.co.uk>

Scott McGee wrote:
>
> Todd,
>
> A few questions (or points to ponder) regarding your thoughts on ID's:
>
> 1. I currently have a couple of hundred thousand names on my server in a couple of dozen databases. What if I lose my account?
>
> 1. a. I lose my account there, and set up the same info on another account.

Existing records retain their existing ID, new records use an ID derived from a new base. Of course, there's no reason that your IDs should be tied to your account (they could be tied to an Ethernet card you own, a subdomain issued by genweb.org, or whatever).

> 1. b. I lose my account there and the info, but am able to get some of the info and set up a new server elsewhere with partly old, and partly new data (some data missing that was on the old site)

See above.

> 1. c. I say "to heck with it" and don't try to recreate the site elsewhere at all.

Then the IDs become stale.

> 2. Many of the databases I host are from other people. There are known overlaps between them. What about these duplicate individuals?

We need a mechanism to map IDs to each other. This is one of the harder problems, but not insoluble, IMHO. I'm already doing it in my own databases.

> 3. Many of the databases I host exist in either identical or modified form elsewhere. What of those databases and the individuals in them?

Again, tricky. Perhaps we also need version numbers to go with the IDs.
Or perhaps modified versions should be given new IDs and the old and new IDs mapped to each other.

> 4. I have one ancestor, Ezekiel Johnson, about whom little is known. We have identified two possible people who might be him. I'll refer to them as ej1750 and ej1754 (the number being the year they were born). My own database ends with what little is known about Ezekiel, and links to the two databases with info on the ancestors of the two possibilities. What if I identify ej1754 as my ancestor? I would likely fold the ej1754 database into my mcgee database and merge the two Ezekiel Johnsons. What does this do to your id?

Just a mapping.

> I have thought about id's a lot. I have never found a way to try to assign any permanent ID to an individual that is not as fraught with problems as the current lack of such an ID. I don't say one doesn't exist, just that I can't seem to find one, nor have I seen anyone else propose one.
>
> It is my opinion that the very uncertainty of genealogical research makes such ID's unworkable. ID's that map to certain attributes known about a person have the benefit of making matches easier, but that seems to be the only benefit. Databases move, get duplicated, modified, etc, and individuals in databases may be merged, divided, deleted, added, or changed. All make any kind of lasting ID pretty meaningless.
>
> I hate to give up on a good idea, but for over a year I have thought about this issue without ever getting further than this. Do any of the rest of you suspect that we are trying to solve an insoluble problem?

I don't think it is insoluble. I do think it is hard. I think we have outlined a line of attack:

1. Assign unique IDs to each record.
2. Locate data given the unique ID.
3. Map unique IDs to each other.

If we could all agree that 1 is necessary, and at least vaguely how to do it, then we could proceed to the more interesting problems, 2 and 3.

Cheers,

Ben.
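
Step 3 of the line of attack above, mapping unique IDs to each other, can be sketched as follows. This is a minimal illustration and not anyone's actual implementation: the class name AliasMap and the example ids are hypothetical, Python is an arbitrary choice, and a union-find structure is swapped in as one simple way to record "these two ids refer to the same individual".

```python
class AliasMap:
    """Hypothetical alias table: groups record ids that are claimed
    to describe the same individual (union-find, illustrative only)."""

    def __init__(self):
        self._parent = {}

    def _find(self, rid):
        """Return the representative id of the group containing rid."""
        self._parent.setdefault(rid, rid)
        root = rid
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every id on the walk straight at the root.
        while self._parent[rid] != root:
            self._parent[rid], rid = root, self._parent[rid]
        return root

    def same_as(self, a, b):
        """Record the claim that ids a and b are the same individual."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra

    def is_same(self, a, b):
        """True if a and b have been mapped to each other, directly or not."""
        return self._find(a) == self._find(b)

    def aliases(self, rid):
        """All known ids in the same group as rid, sorted."""
        root = self._find(rid)
        return sorted(r for r in list(self._parent) if self._find(r) == root)


if __name__ == "__main__":
    ids = AliasMap()
    # Scott decides ej1754 is his ancestor and folds the databases together.
    ids.same_as("mcgee:ezekiel", "ej1754:1")
    print(ids.is_same("mcgee:ezekiel", "ej1754:1"))
    print(ids.is_same("mcgee:ezekiel", "ej1750:1"))
    print(ids.aliases("ej1754:1"))
```

A structure like this only ever merges. It has no way to take a mapping back, so the cases raised in this thread where two records are merged and later separated would need something more, for example keeping each claim as a separate dated note.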
--
Ben Laurie Phone: +44 (181) 994 6435
Freelance Consultant and Fax: +44 (181) 994 6472
Technical Director Email: ben@algroup.co.uk
A.L. Digital Ltd, URL: http://www.algroup.co.uk
London, England.
Apache Group member (http://www.apache.org)

From: "W. Wesley Groleau (Wes)"
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
Date: Wed, 17 Jul 96 17:20:26 EST
In-Reply-To: <9607171955.aa00829@gonzo.ben.algroup.co.uk>; from "Ben Laurie" at Jul 17, 96 7:55 pm
Mailer: Elm [revision: 70.85]

Ben, you're confusing me again.

You talked about a unique ID for a person, and we thought you were talking about the stale links problem. (Which is why I added [record about a] when quoting you.)

Then I quoted what made me think you actually did mean "unique ID for a PERSON" and said that this was not as important as a non-volatile link to data.

But you responded that the stale links problem requires a RECORD have a unique ID that doesn't go stale. But as your response progressed, you drifted back to PERSON. But your last comment says that the ID should be tagged -- you know, I'm so confused I can't even remember which was my objection to that last paragraph.

What exactly ARE we talking about? The details, pros, cons, and ramifications are very different according to whether we are talking about

1. unique non-volatile IDs for specific records that help us get to a record we want to link to and therefore are not re-used for a DIFFERENT record about the SAME person.

OR

2.
a unique ID for a PERSON that is to be re-used in every record that relates to that person, and which is of no value to Jennie Ologist if she doesn't agree with the "standard" identification of that person.

Can't use arguments FOR (against) item two to support (shoot down) item one.

Item 2 makes me remember my objection:

:> Interesting point. One way of dealing with that is to use cryptographic techniques to "sign" the ID, thus proving legitimate ownership. Another is to tie the ownership of the ID to a server which is authoritative for the location of the (record corresponding to the) ID. The former lends itself more naturally to distributed systems, IMO.

If each RECORD is to have a unique ID, then this would indeed prevent repeating an ID from confusing the system. Such repetition might be from malice, or it could be that the "guilty" party thought the ID referred to the PERSON.

If each PERSON is to have a unique ID, then a crypto tag that makes the ID "not-to-be-trusted" when seen on the "wrong" server would defeat the purpose of allowing people to mark that person as THAT person.

In one of your posts, you mentioned being able to search the Web for the ID in order to find more info on the person. That would be option two, which I am trying to say won't work because it requires you to already have the info you're looking for just to find out what the ID string IS.

---------------------------------------------------------------------------
W. Wesley Groleau (Wes) Office: 219-429-4923
Magnavox - Mail Stop 10-40 Home: 219-471-7206
Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

From: "W.
Wesley Groleau (Wes)"
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
Date: Wed, 17 Jul 96 17:29:36 EST
In-Reply-To: <9607172046.AA24580@sol.slcc.edu.>; from "Personal)" at Jul 17, 96 2:46 pm
Mailer: Elm [revision: 70.85]

:> I hate to give up on a good idea, but for over a year I have thought about this issue without ever getting further than this. Do any of the rest of you suspect that we are trying to solve an insoluble problem?

I suspect that BOTH problems are unsolvable. But if I'm wrong, solving either one of them will be made all the more difficult if no one's sure WHICH problem the other guy is talking about!

By BOTH I mean

1. Identifying a RECORD so that it can be found again
2. Identifying a PERSON so that records about him/her can be found.

---------------------------------------------------------------------------
W. Wesley Groleau (Wes) Office: 219-429-4923
Magnavox - Mail Stop 10-40 Home: 219-471-7206
Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com
---------------------------------------------------------------------------

Date: Wed, 17 Jul 1996 16:09:07 -0700 (PDT)
From: Annelise Anderson
To: N Oughtibridge <100020.1117@compuserve.com>
cc: "Harold A. Driscoll", GENWEB List
Subject: Re: Robots Searching GenWeb Sites
In-Reply-To: <960717211358_100020.1117_EHV98-2@CompuServe.COM>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Before you jump to conclusions about what's useful and what's not to index, try altavista.digital.com as a genealogy search engine perhaps using advanced search--try a surname and add as search terms "mother" and "father," eg

   Anderson AND mother AND father

Anderson, Smith, etc., might be a little too common to get you genealogy pages in response, but anything a little less common will provide immediate hits.
Annelise

Date: Wed, 17 Jul 1996 18:04:34 -0500 (CDT)
From: Todd Tyrone Fries
Sender: tfries@umr.edu
Reply-To: Todd Tyrone Fries
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
In-Reply-To: <9607172120.AA24641@sol.slcc.edu.>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII

> Todd,

Gene,

Being of the computer science type (typical or not) I tend to try to explore a possibility to its furthest if it seems like a good idea to me, until convinced otherwise. Please excuse me while I try to seek possible fixes for the dilemmas you profess make a unique id approach unworkable....ok?

> Let's again take my ancestor Ezekiel Johnson. Many genealogies I have seen assume that ej1750 is this Ezekiel. There is evidence that one branch of the family, at one point, simply "decided" this was the case. Nobody has ever found any evidence that he is my ancestor. Other genealogies list ej1754, but usually note that it is uncertain. So, we obviously have at least two individuals. Let's just say ej1750 is assigned that as an ID, and similarly for ej1754. What about my ancestor? Is he ej1750, ej1754, or something else? Whose data do we use to make the determination?

If I were making the call (and for the sake of my discussion, I am), I would say that 'ej1750' and 'ej1754' were both separate id's, if not individuals. Then I would say that if you want to be related to ej1754 in your database, and I want you to be related to ej1999 in my database, then we both at least know what we are talking about. The point is, once every 'unique entity identified as a person' is assigned an id, every database in the world could have a random number generator make family relationships, but we would still all be talking about the same individuals. This is precisely why I find a unique id useful.

> I have another ancestor who got merged in someone's database with his own grandfather with the same name. This data was submitted to the LDS Ancestral Files.
> Thus, most databases show this one person married to one woman, and then many years later, married to another. Let's assume that someone obtains (how?) an ID for this person, and then, looking at the dates and historical info, finds the mistake. We now have two people with one ID. Do we give one that ID and make a new one for the other (which gets the old one?) or do we make new ID's for both? What of old data pointing to one of them?

As someone pointed out, it would be useful to allow not only a set of id's to refer to one person, but a single id to state there was perhaps a mixup, and thus the two new id's could be used to describe each of the supposed two new individuals that perhaps were once thought to be one.

So, in sync with your chronological description, there could be jf1 as the grandfather as originally thought. Then, oh my gosh, we have two people instead of one, oops, somebody messed up. The real grandfather could be jf2, and the ancestor could be jf3. So jf1 still refers to the mixed-up original person, for anybody with an old database. But someone with a newer database would have jf2 and jf3, and everybody knows that jf1 is a mistake and jf2 and jf3 are correct. But if someone wants to be stubborn in their database and refer back to jf1 as the correct person, because it was that way when it was submitted, they can do so if they wish.

Clear enough so far?

> What about when someone finds that through mis-interpretation of data, a fictitious individual has been created and decides to remove them? What of the ID assigned to them? There are many other scenarios where one person is thought to be the same as another and then found not to be, or one record is found to be a combination of two (or more) others. This happens in a single database and between databases. Often, genealogists disagree on which case is correct. Sometimes two individuals will have been merged, separated, merged, separated so many times that sorting it out becomes hopeless.
If a fictitious individual were assigned an id, an id is an id, and imho with a unique id system, an id once given out is set-in-stone 'data'. However, if someone realizes, hrm, this person never existed, then I would say to put a note in their database information about that individual that the person does not exist. If others believe the new data, they'll make note of it as well.

Other scenarios... p1 is an id, p2 is an id, and someone says p1 and p2 are the same, so we have a note, record, pointer, what-have-you that says p1 is the same as p2. Then someone else says, no, really, they're two separate individuals. So another note is made. If genealogists disagree on what id is the same person and what id is not, they can disagree all they want. We each have our own personal databases that we believe are correct, don't we?

> We deal with a very subjective subject (no pun intended) here, and trying to nail down an ID for someone is the very point of it. Makes trying to assign one rather difficult.

It really depends on how you define an id. If each id has to have a one-to-one relationship to a single person, it would have to be a very good morning in the idealistic world before that ever happened, I will agree with you.

However, without computers, how do you keep track of genealogical data? You find a paper trail, a record of some kind, you try to go back to the primary sources, and find out who existed and how much information you can gather that is authoritative, etc, etc.

Ok, so you start out with some data about a person. Then you decide the data is inaccurate. Do you trash and burn your original data? No, you modify it, you add to it. In an ideal genealogical record system, you write down all the data you have. Then if you need to change or modify it any, you simply add to what is already there. You don't wipe anything out, you don't erase, or smite out information because on a whim you decided it was inaccurate.
Perhaps it is, but it was data you spent hard work getting, so why should you throw it away?

I know this is getting a little absurd, but bear with me :-) ... if the unique id system were implemented correctly, it would start out by everybody enumerating their unique individuals at their respective archives, erring on the side of extra id's where messes (like the ones you describe above) are found. Then we start looking over everyone else's data. And we say, hrm, it appears that that person in your data is this person in my data. So I inform you of my observations, and since I'm convinced, I mark some note in my data that I (Todd Fries, on July 17, 1996) presume to have found a relationship between 'todd1' and 'mcgee1', be it father-son, same-person, etc. You can add it in your database as correct information, add it as possible but conflicting, or however you wish to denote it.

After all, is that not what we do anyway? I say hrm, your great uncle looks like he's the same person as my great uncle. So we discuss it, exchange information, and perhaps exchange data on other relatives in the process. When I find another person who also seems to be related to this great uncle, I let them know about the family ties I found in your database, plus what I have in my database. We all know whose database the information came from, which individuals were in what databases originally, but we all refer to the same people.

But suppose Johnny Slicktalk comes along and tries to convince this 'other' person that in fact I and they are not related. They write down the supporting evidence in their database, perhaps sharing it with me, and we now have more information. Perhaps not accurate, but at least we know what we are talking about.

The point of all of this is so that from an 'archive:person' unique id system can grow a set of data that is the collective opinions of the genealogists.
Every one is going to have a separate view of at least one or two parts of the data 'out there', so they keep their own local copies. Your example of the LDS is exactly this. The cdrom is burned with inaccurate information. So you have your own data that corrects this, and you point out to as many people as you can what is wrong.

The same with unique id's. Some massive says 'archive1:person1 = archive2:person2 because Johnny Slicktalk said so'. But you say 'archive1:person1 is not archive2:person2 because you cannot be your own grandfather'. At least we all know the 'personal data' being referred to.

Perhaps that is a good point. If you think of an id as being a person, it doesn't really make sense. But if an id refers to a set of 'personal data' about what is thought to be an individual, then all sorts of weird relationships can be recorded in that one set of 'personal data'. Thus we can let the computer take care of what it is good at dealing with: data. And we can worry about how the data fits together.

I am sorry for being so long winded. My mind thinks and my fingers type and I get a lot of words out in the process. Comments please....

--
Todd Fries .. todd@miango.com

Date: Wed, 17 Jul 1996 23:15:05 EDT
Message-ID:
To: beaur@cam.org
CC: GenWeb@UCSD.EDU
In-reply-to: (message from Denis Beauregard on Tue, 16 Jul 1996 11:32:16 -0400 (EDT))
Subject: Re: Robots Searching GenWeb Sites
From: "Michael A. Patton"
Reply-To: "Michael A. Patton, genealogy"

   Date: Tue, 16 Jul 1996 11:32:16 -0400 (EDT)
   From: Denis Beauregard

   On Tue, 16 Jul 1996, Michael A. Patton wrote:

   > My theory on how to approach this is to use the robot control stuff(*) to limit the robot to specific pages. These would include some kind of overview and also a local index. I'd make these pages well linked

   There is no "local index".

I don't know about you, but on my genealogy pages (not presently accessible from the net), there _is_ an index.
That's the page I meant, an index that is local to that genealogy collection.

   From what I read, there is a list of files to read or not

Actually, you can specify whole trees... I'd tell it not to index the URLs that are detail pages, all are under the same prefix in my setup. Since the aforementioned index is not in that part of the tree, it _will_ be indexed.

   for the whole server,

You are right, you need your server administrator to do this for you. I usually ignore that since on the server my genealogy is on, *I'm* the admin (after all, it's in my den :-).

   > into the descriptive info. If the index (etc.) pages that you let the robot look at are sufficiently well constructed, this should allow most searchers to find any of your pages, without the need for the

   This may work for rare surnames, but if you are looking for a smith, it won't help at all...

I'm not sure what your point is, if the index can't usefully be used for searches, it's not "sufficiently well constructed" by my definition. After all, that's what it's for. On the other hand, I don't see how having all the individual pages addresses this question, either. In fact it's even worse, rather than a reference to my one index page where you can quickly look down the Smiths (I do have some on my charts :-) and see if any look likely, you instead get references to potentially hundreds of pages and have to go look at each one. I'm quite aware of this class of problems, I recently played around with looking for _my_ name in Altavista, many many hundreds of pages to wade through.

   -MAP

Date: Wed, 17 Jul 1996 22:54:16 -0700 (PDT)
From: Annelise Anderson
To: "Michael A. Patton, genealogy"
cc: beaur@cam.org, GenWeb@UCSD.EDU
Subject: Re: Robots Searching GenWeb Sites
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 17 Jul 1996, Michael A. Patton wrote:

> each one.
I'm quite aware of these class of problems, I recently > played around with looking for _my_ name in Altavista, many many > hundreds of pages to wade through. > > -MAP I tried an advanced altavista search with Patton AND mother AND father with Patton mother father in the sort criteria field and got lots of genealogy pages mentioning (presumably) Patton before I started getting stuff on General Patton of WWII fame-- Annelise Date: Wed, 17 Jul 1996 20:46:25 -0700 To: GENWEB List From: Jeff Murphy Subject: Re: Robots Searching GenWeb Sites At 05:13 PM 7/17/96 EDT, N Oughtibridge wrote: >I shall endeavour to include the option to produce the Alta Vista META tags in a >future version of uFTi, however Brian has previously raised a very valid point - >how do you know what is interesting? If you are interested in occupations or >locations then the full page is interesting, however when you then search for I do not think the META tags are a good answer. They place the responsibility for defining the key words on the page developer, rather than on the search engine. This is the wrong way to solve any problem. One does not make the user (in this case the developer) responsible for satisfying the software (the search engine), one makes the software develop a solution which meets the needs of the user. Jeff Murphy 735 NW 8th Redmond, Oregon 97756 h. (541) 548-4478 Specializing in the genealogy of Muhlenberg Co., KY and cat-herding USA GenWeb Project: http://www.teleport.com/~jmurphy/states.html http://www.dsenter.com/lists/states.html to subscribe to mailing lists "Where there is no vision, the people perish" (Prov. 29:18) Date: Wed, 17 Jul 1996 20:46:32 -0700 To: genweb@UCSD.EDU From: Jeff Murphy Subject: Re: ...Unique ID At 02:46 PM 7/17/96 -0600, Scott McGee (Personal) wrote: >It is my opinion that the very uncertainty of genealogical research makes such >ID's unworkable. 
ID's that map to certain attributes known about a person have >the benefit of making matches easier, but that seems to be the only benefit. >Database move, get duplicated, modified, etc, and individual in databases may >be merged, divided, deleted, added, or changed. All make any kind of lasting >ID pretty meaningless. > >I hate to give up on a good idea, but for over a year I have thought about this >issue without ever getting further than this. Do any of the rest of you suspect >that we are trying to solve an insoluable problem? John Rigdon, Pam Carey, and I have been working on a test case of 110,000 names, where we can test various approaches to the index. It doesn't really matter, as others have said, what the id is, as long as it can be tied to the individual, and pointed toward the database. The test case involves 4 databases on three different providers, with known duplicates. If we can find a way to process these successfully - matching the ones that should, rejecting the others - we will then be ready to try it on a larger sample. I would hope we could use all of your databases to test the larger sample. The problem here has always seemed to be that there is too much concentration on the theoretical, and not enough trying. Too much talk, too little action. Once we have the test case software in place, we can try different approaches to the development of a unique index, as far as that is possible. Now, what happens if we can't make the id unique? Well, let's cross that bridge when we come to it. But I don't think the problem is insoluable. It's just a matter of a little work. If we can get the cooperation of those with major projects, like yours and Cliff Manis', once the testing is done we should find ourselves able to index the entire world. Granted, it is one thing to index, and another to hit against that index as the html is being generated (or before in a preprocessing pass, as I have suggested earlier). 
But establishing a master index seems to be a concensus at this point - and frankly, even if it weren't we're going to go ahead with our project. :-) I should say here that your work to date has been what has inspired me to think about all this, and to do what I've done on the web to this end. So don't think your time has been wasted. Jeff Murphy 735 NW 8th Redmond, Oregon 97756 h. (541) 548-4478 Specializing in the genealogy of Muhlenberg Co., KY and cat-herding USA GenWeb Project: http://www.teleport.com/~jmurphy/states.html http://www.dsenter.com/lists/states.html to subscribe to mailing lists "Where there is no vision, the people perish" (Prov. 29:18) Subject: Re: ...Unique ID To: "W. Wesley Groleau" Date: Thu, 18 Jul 1996 08:21:13 +0100 (BST) From: Ben Laurie Cc: genweb@UCSD.EDU In-Reply-To: <9607172220.AA13626@most> from "W. Wesley Groleau" at Jul 17, 96 05:20:26 pm Reply-To: ben@algroup.co.uk X-Mailer: ELM [version 2.4 PL24 PGP2] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9607180821.aa02119@gonzo.ben.algroup.co.uk> W. Wesley Groleau wrote: > > Ben, you're cunfusing me again. You talked about a unique ID for a person, > and we thought you were talking about the stale links problem. (Which is why I > added [record about a] when quoting you.) Then I quoted what made me think > you actually did mean "unique ID for a PERSON" and said that this was not as > important as a non-volatile link to data. OK, to make it perfectly clear. I am advocating unique IDs for RECORDs, not for PERSONs. I agree that the latter is impossible. > > But you responded that the stale links > problem requires a RECORD have a unique ID that doesn't go stale. But as your > response progressed, you drifted back to PERSON. But your last comment says that > the ID should be tagged -- you know, I'm so confused I can't even remember which > was my objection to that last paragraph. > > What exactly ARE we talking about? 
The details, pros, cons, and ramifications > are very different according to whether we are talking about > > 1. unique non-volatile IDs for specific records that help us get to a record > we want to link to and therefore are not re-used for a DIFFERENT record about > the SAME person. > > OR > > 2. a unique ID for a PERSON that is to be re-used in every record that relates to > that person, and which is of no value to Jennie Ologist if she doesn't agree > with the "standard" identification of that person. We are talking about 1. > > Can't use arguments FOR (against) item two to support (shoot down) item one. > > Item 2 makes me remember my objection: > > :> Interesting point. One way of dealing with that is to use cryptographic > :> techniques to "sign" the ID, thus proving legitimate ownership. Another is to > :> tie the ownership of the ID to a server which is authoritative for the location > :> of the (record corresponding to the) ID. The former lends itself more naturally > :> to distributed systems, IMO. > > If each RECORD is to have a unique ID, then this would indeed prevent repeating > an ID from confusing the system. Such repetition might be from malice, or it > could be that the "guilty" party thought the ID referred to the PERSON. > > If each PERSON is to have a unique ID, then a crypto tag that makes the ID > "not-to-be-trusted" when seen on the "wrong" server would defeat the purpose > of allowing people to mark that person as THAT person. In one of your posts, > you mentioned being able to search the Web for the ID in order to find more > info on the person. that would be option two, which I am trying to say won't > work because it requires you to already have the info you're looking for just > to find out what the ID string IS. Agreed. But since we are talking about RECORDs and not PERSONs are you now happy? My apologies for not making this clear. 
Having realised that this is the way to go some considerable time ago I have almost stopped distinguishing the terms in my mind (that is, often when I say PERSON I really mean "one particular archivist's record of one particular person". But indeed, even that is inaccurate [as even an archivist can make mistakes and combine two people into one, et al]). I'll try to be clearer in future. Keep nagging me! Cheers, Ben. > > --------------------------------------------------------------------------- > W. Wesley Groleau (Wes) Office: 219-429-4923 > Magnavox - Mail Stop 10-40 Home: 219-471-7206 > Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com > --------------------------------------------------------------------------- -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. Apache Group member (http://www.apache.org) From: "W. Wesley Groleau (Wes)" Subject: Re: Robots Searching GenWeb Sites To: genweb@UCSD.EDU Date: Thu, 18 Jul 96 7:36:09 EST In-Reply-To: ; from "Michael A. Patton" at Jul 17, 96 11:15 pm Mailer: Elm [revision: 70.85] :> definition. After all, that's what it's for. On the other hand, I :> don't see how having all the individual pages addresses this question, :> either. In fact it's even worse, rather than a reference to my one :> index page where you can quickly look down the Smiths (I do have some :> on my charts :-) and see if any look likely, you instead get :> references to potentially hundreds of pages and have to go look at :> each one. I'm quite aware of these class of problems, I recently :> played around with looking for _my_ name in Altavista, many many :> hundreds of pages to wade through. OK, would you like to find INDEXes that have Smith in them? 
I'm too lazy to try this myself, but what would Alta Vista do if you asked it to find pages with Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND Smith AND .... --------------------------------------------------------------------------- W. Wesley Groleau (Wes) Office: 219-429-4923 Magnavox - Mail Stop 10-40 Home: 219-471-7206 Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com --------------------------------------------------------------------------- From list-relay@UCSD.EDU Thu Jul 18 06:02:50 1996 Received: from UCSD.EDU (mailbox1.ucsd.edu [132.239.1.53]) by fuji.ucsd.edu (8.6.9/8.6.9) with ESMTP id GAA02807 for ; Thu, 18 Jul 1996 06:02:49 -0700 Received: from none.at.helo (gw1.hughes-defense-comm.com [151.168.2.3]) by UCSD.EDU (8.7.5/8.6.9) with SMTP id FAA17382 for ; Thu, 18 Jul 1996 05:59:47 -0700 (PDT) Received: by most.fw.hac.com (4.1/SMI-4.1) id AA24159; Thu, 18 Jul 96 07:59:42 EST Received: from unknown(151.168.254.82) by gw1 via smap (V1.3mjr) id smI024069; Thu Jul 18 07:57:48 1996 Received: by most (4.1/SMI-4.1) id AA02866; Thu, 18 Jul 96 07:57:47 EST Message-Id: <9607181257.AA02866@most> Received: from pseserv3.fw.hac.com(151.168.254.223) by most via smap (V1.5khhunt) id sma002848; Thu Jul 18 07:56:43 1996 Received: by pseserv3 (1.37.109.4/16.2) id AA02711; Thu, 18 Jul 96 07:56:58 -0500 From: "W. Wesley Groleau (Wes)" Subject: Re: ...Unique ID To: genweb@UCSD.EDU Date: Thu, 18 Jul 96 7:56:55 EST In-Reply-To: <9607180821.aa02119@gonzo.ben.algroup.co.uk>; from "Ben Laurie" at Jul 18, 96 8:21 am Mailer: Elm [revision: 70.85] :> Agreed. But since we are talking about RECORDs and not PERSONs are you now :> happy? Sort of. :-) having read some other recent posts, and doing some more thinking, I also suspect that a solution is difficult but possible. As long as we remember we're identifying the records, not the actual people. 
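An illustrative aside, not part of the original message: the record-versus-person distinction can be sketched as a small data structure. IDs name records only, and "same person" is kept as a separate, attributed claim that others remain free to dispute. The class names, field names, and archive names below are hypothetical; the two claims reuse the example given earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordId:
    archive: str   # hypothetical archive name, e.g. "archive1"
    local_id: str  # ID that archive assigned to ONE record, not to a person


@dataclass(frozen=True)
class Claim:
    left: RecordId
    right: RecordId
    same_person: bool  # True: "these records describe one individual"
    asserted_by: str
    reason: str


def groups_according_to(claims, who):
    """Group the records that `who` asserts describe the same individual."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for claim in claims:
        if claim.asserted_by != who:
            continue
        left_root, right_root = find(claim.left), find(claim.right)
        if claim.same_person:
            parent[left_root] = right_root

    groups = {}
    for record in parent:
        groups.setdefault(find(record), set()).add(record)
    return list(groups.values())


a1 = RecordId("archive1", "person1")
a2 = RecordId("archive2", "person2")
claims = [
    Claim(a1, a2, True, "Johnny Slicktalk", "said so"),
    Claim(a1, a2, False, "you", "you cannot be your own grandfather"),
]
merged = groups_according_to(claims, "Johnny Slicktalk")
separate = groups_according_to(claims, "you")
```

Under this sketch the record IDs never change when opinions about identity change; only the list of claims grows.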
Before I drop the identifying people problem altogether, let me repeat one thing: Suppose XXX is the ID identifying a specific record. Then it will NOT be of any use at all to someone wanting to search the net for a specific record. And (if the person problem could be solved) suppose ZZZ is the ID for a specific person in my database. Then it would be of limited use in searching the net for more data on that person, because in order for another record to obtain the ID ZZZ, it would have to have a LARGE amount of data identical to my record. This is why it is important to make the distinction. It may be that it was obvious to the rest of you what you were talking about. Maybe I just take things too literally. But I would urge us all to think when we say 'record' or 'person' which we are really talking about. I don't generally save these posts, but I am sure that at least one of the posts in this thread (the one that mentioned searching the net for a PERSON's ID) was in fact talking about IDs for PERSONS. :> My apologies for not making this clear. Having realised that this is the way :> to go some considerable time ago I have almost stopped distinguishing the :> terms in my mind (that is, often when I say PERSON I really mean "one :> particular archivist's record of one particular person". But indeed, even that :> is inaccurate [as even an archivist can make mistakes and combine two people :> into one, et al]). :> :> I'll try to be clearer in future. Keep nagging me! Apology accepted. I understand now. Even the best of us--which may or may not inlude me :-) --can mis-use words. Someone could conceivably flame me for using "say" instead of "type" above. :-) And I will not 'nag' about the 'wrong word' if I think the meaning is clear. --------------------------------------------------------------------------- W. 
Wesley Groleau (Wes) Office: 219-429-4923 Magnavox - Mail Stop 10-40 Home: 219-471-7206 Fort Wayne, IN 46808 elm (Unix): wwgrol@pseserv3.fw.hac.com --------------------------------------------------------------------------- To: genweb@UCSD.EDU From: mbr@dadd.ti.com (Martin Roberts) Subject: Re: Robots Searching GenWeb Sites Date: Thu, 18 Jul 1996 10:48:54 Message-ID: In article Annelise Anderson writes: >I tried an advanced altavista search with >Patton AND mother AND father >with Patton mother father in the sort criteria field and got >lots of genealogy pages mentioning (presumably) Patton before >I started getting stuff on General Patton of WWII fame-- >Annelise Hi, Annelise. I agree that there is a lot to find in terms of genealogy web sites. But I don't think altavista is an answer for specific name search. I have tried to use it to locate cousins, business associates, and other individuals. I discovered that there are some very large association files that have been indexed. The worst one is some dog owners association. Another is the various university student enrollments. There are also various state government lists. These associations seem to have a million or more members. When I do name searches I find many matches in these files, and too many of the total matches. It would be nice to have a generic tag for genealogy sites so you could search for "name" AND "gentag". This would avoid all the extraneous large lists. Because of the way search engines work, this tag would need to appear in every page that had searchable name records. 
Martin Roberts Date: Thu, 18 Jul 1996 10:57:53 -0700 (PDT) From: Annelise Anderson To: Martin Roberts cc: genweb@UCSD.EDU Subject: Re: Robots Searching GenWeb Sites In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII On Thu, 18 Jul 1996, Martin Roberts wrote: > In article Annelise Anderson writes: > > > > >I tried an advanced altavista search with > >Patton AND mother AND father > >with Patton mother father in the sort criteria field and got > >lots of genealogy pages mentioning (presumably) Patton before > >I started getting stuff on General Patton of WWII fame-- > > >Annelise > > Hi, Annelise. I agree that there is a lot to find in terms of genealogy web > sites. But I don't think altavista is an answer for specific name search. I > have tried to use it to locate cousins, business associates, and other > individuals. I discovered that there are some very large association files > that have been indexed. The worst one is some dog owners association. Another > is the various university student enrollments. There are also various state > government lists. These associations seem to have a million or more members. > When I do name searches I find many matches in these files, and too many of > the total matches. > > It would be nice to have a generic tag for genealogy sites so you could search > for "name" AND "gentag". This would avoid all the extraneous large lists. > Because of the way search engines work, this tag would need to appear in > every page that had searchable name records. > > Martin Roberts Martin, the point I've been trying to make is that "mother" and "father" function as generic tags to for genealogy pages, excluding virtually all other pages. Even works very well with Anderson. 
Annelise

Date: 18 Jul 96 17:20:33 EDT
From: N Oughtibridge <100020.1117@CompuServe.COM>
To: Martin Roberts
Cc: GENWEB List
Subject: Re: Robots Searching GenWeb Sites
Message-ID: <960718212032_100020.1117_EHV75-2@CompuServe.COM>

Martin Roberts wrote:
----------
It would be nice to have a generic tag for genealogy sites so you could search for "name" AND "gentag".
----------

I propose GENWEB as that tag, or perhaps GENEALOGY, since GENWEB could be confused with genetics. All that is needed is for the tag to be included in the pages.

Date: 18 Jul 96 17:20:38 EDT
From: N Oughtibridge <100020.1117@CompuServe.COM>
To: Jeff Murphy
Cc: GENWEB List
Subject: Re: Robots Searching GenWeb Sites
Message-ID: <960718212037_100020.1117_EHV75-3@CompuServe.COM>

Jeff Murphy wrote:
----------
I do not think the META tags are a good answer. They place the responsibility for defining the key words on the page developer, rather than on the search engine. This is the wrong way to solve any problem. One does not make the user (in this case the developer) responsible for satisfying the software (the search engine), one makes the software develop a solution which meets the needs of the user.
----------

It's some time since I checked AltaVista's use of META tags, but I think there is one for "leave this page alone".

Surely there are different levels of openness:

   Let them see everything
   Let them see a selection of pages only, for example those for which you know something more than just name
   Let them see every page but only use keywords
   Let them see all the index pages
   Let them see top level (surname only) indices
   Let them see nothing

where "them" refers to anyone wanting to search a GENWEB site.

I have also checked out the text file option. I will implement this as far as I can; however, all I can generate is a list of files to look at or ignore, and an administrator needs to publish it (which is a problem if you use CompuServe!).
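
An illustrative aside, not part of the original message: the "text file option" is presumably the robots exclusion file (robots.txt) served from the top of a site. A minimal sketch of the "index pages only" level, assuming a hypothetical layout in which detail pages live under /genealogy/people/ and the surname index sits at /genealogy/index.html. It uses Python's standard urllib.robotparser to check what a well-behaved robot would be allowed to fetch; the paths and the robot name are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep robots out of the per-person detail pages,
# leaving the overview and surname index pages open to indexing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /genealogy/people/
"""


def build_parser(robots_text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser, path, agent="ExampleBot"):
    """Return True if a robot honouring the rules may fetch `path`."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)
index_allowed = may_fetch(parser, "/genealogy/index.html")
detail_allowed = may_fetch(parser, "/genealogy/people/i0042.html")
```

Because the rule is a path prefix, a single Disallow line covers the whole tree of detail pages, which is why generating a list of individual files is not strictly necessary when the detail pages share one prefix.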
Nicholas

Date: Thu, 18 Jul 1996 22:31:58 -0600
From: smcgee@sol.slcc.edu (Scott McGee (Personal))
Message-Id: <9607190431.AA00541@sol.slcc.edu.>
To: genweb@UCSD.EDU, todd@miango.com
Subject: Re: ...Unique ID

OK, Todd, I start to follow you a bit better, but now help me out with this:

I have a database that I serve. I get it from Gene E Ologist and call it Gene. Now, Gene just sent me the latest copy of his GEDCOM. I serve it, and assign ID's. Now, Gene has talked with his cousins and found all these nifty new relatives, doubling his database. Wanting his new-found relatives on the net, he sends me a new GEDCOM.

Do I try to map the individuals in the first GEDCOM to those in the second, or just reassign from scratch? If you want the first case, I have failed to find a way to do it, and frankly, it devolves back to the matching problem. In the second case, I end up invalidating (making stale) all id's from the first GEDCOM, and I no longer see the value of these Unique ID's.

Now, on the other hand, for my own database, I can and should implement some sort of unique ID scheme and maintain it as data changes. In that case, the ID's are highly valuable to both me and others. The problem only occurs where I receive updated data from someone who does NOT keep such ID's. (well, it does reflect on the value of that data, but right now, how many ID'd databases are there?)

Scott

GENEALOGY  | Do you know who your ancestors are?   | Scott McGee
-----------+---------------------------------------+---------------------
email: smcgee@genealogy.org                        | What? Me speak for
web: http://genealogy.org/~smcgee/homepage.html    | someone else? Nah!
---------------------------------------------------+--------------------- See my genealogy page at http://genealogy.org/~smcgee and my GenWeb page at http://genealogy.org/~smcgee/genweb Date: Fri, 19 Jul 1996 00:44:32 -0500 (CDT) From: Todd Tyrone Fries Sender: tfries@umr.edu Reply-To: Todd Tyrone Fries Subject: Re: ...Unique ID To: genweb@UCSD.EDU In-Reply-To: <9607190431.AA00541@sol.slcc.edu.> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; CHARSET=US-ASCII > OK, Todd, I start to follow you a bit better, but now help me out with > this: > > I have a database that I serve. I get it from Gene E Ologist and call it > Gene. Now, Gene just sent me the latest copy of his GEDCOM. I serve it, > and assign ID's. Now, Gene has talked with his cousins and found all these > nifty new relatives, doubling his database. Wanting his new-found relatives > on the net, he sends me a new GEDCOM. > > Do I try to map the individuals in the first GEDCOM to those in the second, > or just reassign from scratch. If you want the first case, I have failed > to find a way to do it, and frankly, it devolves back to the matching > problem. In the second case, I end up invalidating all (making stale) all > id's from the first GEDCOM, and I no longer see the value of these Unique > ID's. I would suggest that all people submitting data to databases that assign id's should inform the people who have given them the data of the id the people being submitted are being assigned. Thus, a note in the gedcom file of some sort I would hope is possible. Then, when he comes back, you simply assign the ones that aren't assigned, and correlate the ones he does have id's for with the ones you have and see if he has any new information. That sounds like a good plan to me, but that doesn't mean it is the best. > Now, on the other hand, for my own database, I can and should implement > some sort of unique ID scheme and maintain it as data changes. 
> In that case, the ID's are highly valuable to both me and others. The problem only
> occurs where I receive updated data from someone who does NOT keep such
> ID's. (well, it does reflect on the value of that data, but right now, how
> many ID'd databases are there?)

If you indeed have someone who submits gedcom data who refuses to add id type information to their database, I would perhaps record, for them or perhaps for everyone submitting data, where it came from. Then, when they submit their data again, you can match the name, dates, etc, and other information you have against his database. Anything that doesn't match from your database to his, I would consider a different person unless he explicitly states 'I corrected this date, or the middle name was wrong, my mother would be furious, but I made a typo.'

On a side note, I have been wondering about storage of data in general. It is probably up to the database maintainers, but would it be wise to have a history of all changes to a particular id kept?

When I work on a programming project, I generally use a revision control system; cvs is my preference. When I screw up, or someone wants to see why I made a change, or even who made the change (if it is a multiple person project) then I can show that information because cvs keeps track of it. In short, with cvs, any change only adds information to the repository, never taking away. (unless 'rm -rf $CVSROOT' is issued, but that is another matter entirely).

Should this same practice be kept for genealogical records: record changes, rather than re-do old records? (unless re-doing static web pages from a database, of course)

Just wondering...
-- Todd Fries ..
todd@miango.com Subject: Re: ...Unique ID To: Scott McGee Date: Fri, 19 Jul 1996 13:40:24 +0100 (BST) From: Ben Laurie Cc: genweb@UCSD.EDU, todd@miango.com In-Reply-To: <9607190431.AA00541@sol.slcc.edu.> from "Scott McGee" at Jul 18, 96 10:31:58 pm Reply-To: ben@algroup.co.uk X-Mailer: ELM [version 2.4 PL24 PGP2] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9607191340.aa05913@gonzo.ben.algroup.co.uk> Scott McGee wrote: > > OK, Todd, I start to follow you a bit better, but now help me out with this: > > I have a database that I serve. I get it from Gene E Ologist and call it Gene. > Now, Gene just sent me the latest copy of his GEDCOM. I serve it, and assign > ID's. Now, Gene has talked with his cousins and found all these nifty new > relatives, doubling his database. Wanting his new-found relatives on the net, > he sends me a new GEDCOM. > > Do I try to map the individuals in the first GEDCOM to those in the second, > or just reassign from scratch. If you want the first case, I have failed > to find a way to do it, and frankly, it devolves back to the matching problem. > In the second case, I end up invalidating all (making stale) all id's from the > first GEDCOM, and I no longer see the value of these Unique ID's. > > Now, on the other hand, for my own database, I can and should implement some > sort of unique ID scheme and maintain it as data changes. In that case, the > ID's are highly valuable to both me and others. The problem only occurs where > I receive updated data from someone who does NOT keep such ID's. (well, it > does reflect on the value of that data, but right now, how many ID'd databases > are there?) I think the idea is that everyone should generate IDs. Those who don't will be unable to fit into GenWeb in a fully functional way. I fear that this is just life. 
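
An illustrative aside, not part of the original message: the resubmission problem discussed here becomes simple once IDs travel with the data. A minimal sketch, assuming each record is a plain dictionary with an optional "id" field. The field name, the "R" prefix, and the counter scheme are hypothetical and are not any GEDCOM or GenWeb convention. Records that already carry an ID keep it, and only new records receive fresh ones.

```python
from itertools import count


def assign_ids(records, used_ids=(), prefix="R"):
    """Return copies of `records` where every record has an ID.

    Existing IDs are never changed; new IDs never collide with an ID in
    `used_ids` or with one already present in the submission.
    """
    used = set(used_ids)
    used.update(rec["id"] for rec in records if rec.get("id"))
    counter = count(1)
    result = []
    for rec in records:
        rec = dict(rec)  # copy, so the caller's data is left untouched
        if not rec.get("id"):
            new_id = f"{prefix}{next(counter)}"
            while new_id in used:
                new_id = f"{prefix}{next(counter)}"
            used.add(new_id)
            rec["id"] = new_id
        result.append(rec)
    return result


# First submission: nothing has an ID yet.
first = assign_ids([{"name": "Ann Ologist"}, {"name": "Bob Ologist"}])

# Later submission: the returned IDs came back with the data, plus one new person.
second = assign_ids(
    first + [{"name": "Cara Ologist"}],
    used_ids=[rec["id"] for rec in first],
)
```

Passing the previously issued IDs as `used_ids` keeps an ID from being handed to a different record later, which is what would otherwise make old links go stale.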
Ideally, I guess, what would happen would be that Gene (being in the unfortunate position of using a non-GenWeb compliant package) would send you an un-IDed database, which you would ID and send back. Gene would then use that IDed version as the basis for new versions. So, when he updated you, he would send you data which already had IDs for the existing people, and you can simply generate IDs for those that haven't (and send it back again). Of course, if all this takes off, hopefully vendors will build in support, and for those who don't, surely we can come up with a small package which IDs GEDCOM which can then be reimported into the package? Cheers, Ben. > > Scott > > GENEALOGY | Do you know who your ancestors are? | Scott McGee > -----------+---------------------------------------+--------------------- > email: smcgee@genealogy.org | What? Me speak for > web: http://genealogy.org/~smcgee/homepage.html | someone else? Nah! > ---------------------------------------------------+--------------------- > See my genealogy page at http://genealogy.org/~smcgee > and my GenWeb page at http://genealogy.org/~smcgee/genweb -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. Apache Group member (http://www.apache.org) Subject: Re: ...Unique ID To: todd@miango.com Date: Fri, 19 Jul 1996 13:57:17 +0100 (BST) From: Ben Laurie Cc: genweb@UCSD.EDU In-Reply-To: from "Todd Tyrone Fries" at Jul 19, 96 00:44:32 am Reply-To: ben@algroup.co.uk X-Mailer: ELM [version 2.4 PL24 PGP2] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Message-ID: <9607191357.aa05960@gonzo.ben.algroup.co.uk> Todd Tyrone Fries wrote: > > > OK, Todd, I start to follow you a bit better, but now help me out with > > this: > > > > I have a database that I serve. I get it from Gene E Ologist and call it > > Gene. 
Now, Gene just sent me the latest copy of his GEDCOM. I serve it, > > and assign ID's. Now, Gene has talked with his cousins and found all these > > nifty new relatives, doubling his database. Wanting his new-found relatives > > on the net, he sends me a new GEDCOM. > > > > Do I try to map the individuals in the first GEDCOM to those in the second, > > or just reassign from scratch. If you want the first case, I have failed > > to find a way to do it, and frankly, it devolves back to the matching > > problem. In the second case, I end up invalidating all (making stale) all > > id's from the first GEDCOM, and I no longer see the value of these Unique > > ID's. > > I would suggest that all people submitting data to databases that assign id's > should inform the people who have given them the data of the id the people > being submitted are being assigned. Thus, a note in the gedcom file of > some sort I would hope is possible. Then, when he comes back, you simply > assign the ones that aren't assigned, and correlate the ones he does have > id's for with the ones you have and see if he has any new information. > That sounds like a good plan to me, but that doesn't mean it is the best. > > > Now, on the other hand, for my own database, I can and should implement > > some sort of unique ID scheme and maintain it as data changes. In that > > case, the ID's are highly valuable to both me and others. The problem only > > occurs where I receive updated data from someone who does NOT keep such > > ID's. (well, it does reflect on the value of that data, but right now, how > > many ID'd databases are there?) > > If you indeed have someone who submits gedcom data who refuses to add id type > information to their database, I would perhaps for them, perhaps for everyone > submitting data, where it came from. Then, when they submit their data again, > you can match the name, dates, etc, other information you have against his > database. 
Anything that doesn't match from your database to his, I would > consider a different person unless he explicitly states 'I corrected this > date, or the middle name was wrong, my mother would be furious, but I made > a typo.' > > On a side note, I have been wondering about storage of data in general. It is > probably up to the database maintainers, but would it be wise to have > a history of all changes to a particular id kept? > > When I work on a programming project, I generally use a revision control > system, cvs is my preference. When I screw up, or someone wants to see why > I made a change, or even who made the change (if it is a multiple person > project) then I can show that information because cvs keeps track of it. In > short, with cvs, any change only adds information to the repository, never > taking away. (unless 'rm -rf $CVSROOT' is issued, but that is another matter > entirely). Of course, CVS also opens up the interesting possibility of databases which are _shared_ between archivists. We use CVS to maintain Apache, with the programmers spread all around the world, and have had no problems doing this kind of stuff (though it does take some discipline). > > Should this same practice be kept for genealogical records, record changes, not > re-do old records? (unless re-doing static web pages from a database, of > course) I intend to use CVS for externally contributed databases, at least. Its just too useful for problem tracking to miss. (For those who don't know, CVS is a freeware version control system which allows multiple users to freely edit files and manages the merging of overlapping changes [with some human intervention at times]). > > Just wondering... > -- > Todd Fries .. todd@miango.com > -- Ben Laurie Phone: +44 (181) 994 6435 Freelance Consultant and Fax: +44 (181) 994 6472 Technical Director Email: ben@algroup.co.uk A.L. Digital Ltd, URL: http://www.algroup.co.uk London, England. 
Apache Group member (http://www.apache.org)

Subject: Re: ...Unique ID, changes
To: genweb@UCSD.EDU
In-Reply-To:
X-Mailer: SPRY Mail Version: 04.00.06.21

Some of the existing genealogy software lets you print records depending on the "last modified" date. What I do is keep printed family group sheets of everything in my database. After I have completed updating I print all family group sheets affected by modifications made since xx/xx/xx. It prints out that select group and I can easily file them.

From list-relay@UCSD.EDU Fri Jul 19 07:19:55 1996
Received: from UCSD.EDU (mailbox2.ucsd.edu [132.239.1.54]) by fuji.ucsd.edu (8.6.9/8.6.9) with ESMTP id HAA06878 for ; Fri, 19 Jul 1996 07:19:55 -0700
Received: from puma.sirinet.net (puma.sirinet.net [198.203.196.67]) by UCSD.EDU (8.7.5/8.6.9) with SMTP id HAA20522 for ; Fri, 19 Jul 1996 07:15:18 -0700 (PDT)
Received: from ppp2.sirinet.net (ppp2.sirinet.net [207.3.80.2]) by puma.sirinet.net (8.6.11/8.6.9) with SMTP id JAA21734 for ; Fri, 19 Jul 1996 09:15:15 -0500
Message-Id: <199607191415.JAA21734@puma.sirinet.net>
X-Sender: fsnyelib@mail.sirinet.net
X-Mailer: Windows Eudora Version 1.4.4
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 19 Jul 1996 09:00:59 -0500
To: genweb@UCSD.EDU
From: fsnyelib@sirinet.net (Ft. Sill Nye Library)
Subject: REMOVE FROM LIST

Please remove this e-mail address from your list. Should you receive another request for mail from this address, please disregard request. Thank you.

Mr. C Mayse
U.S. ARMY FIELD ARTILLERY CENTER
ADP, SYSTEM ADMIN
FORT SILL, OK 73503
--
Registered ICC User
check out http://www.usefulware.com/icc.html

Date: Fri, 19 Jul 96 14:34:43 -0400
From: "Crist, Dave"
Sender: "Crist, Dave"
Organization: USIA
To: genweb@UCSD.EDU
Subject: Wiles / Umphlet
X-mailer: Connect2-SMTP 4.00 MHS to SMTP Gateway

If anyone has information on the following, I would appreciate an email.
dcrist@juno.com

=========================================================================
Husband: Isreal T. Wiles
-------------------------------------------------------------------------
      Born:                   in:
  Baptized:                   in:
      Died:                   in:
    Buried:                   in:
     Other:                   in:
       Ref:
Occupation:
    Father:
    Mother:
=========================================================================
   Wife: Nancy Savage Umphlet
Married:                      in:
Marr. Ceremony? Y/N:     Divorced/Annulled/Separated:     End Year:
-------------------------------------------------------------------------
      Born: 16 MAY 1812       in: North Carolina
  Baptized: 1906              in:
      Died:                   in:
    Buried:                   in:
     Other:                   in:
       Ref: Full Choctow
Occupation:
    Father:
    Mother:
=========================================================================
1  Elizabeth Ann Wiles
F     Born: 27 JUL 1837       in: Clinton Co., Ohio
      Died: 22 SEP 1907       in: Russell Co., Kansas
=========================================================================

Date: Fri, 19 Jul 1996 15:36:13 -0500
To: , genweb@UCSD.EDU
From: Beau Sharbrough
Subject: ANNOUNCEMENT: Rochester Tech Session

ANNOUNCEMENT

GENTECH Tech Session
Thursday, August 15
7:30 pm
Genessee Room, Holiday Inn, Rochester, NY

OVERVIEW

Reports will be presented on the status of the Lexicon Working Group, the
GEDCOM Test Book, and other topics of interest. The second half of the
meeting will be a breakout session for volunteers to team up to work on
projects due at GENTECH97. The Volunteers will choose the projects. What
would you like to do?

AGENDA (Presenters are tentative at this time)

Announcements - Beau Sharbrough
Lexicon Update - Robert Charles Anderson
GEDCOM Compatibility Update - Larry Ledden
GEDCOM Registration - a representative of the Family History Department
Breakout session - All interested participants

MORE DETAILS

For more information about the Lexicon Working Group, or the GEDCOM Test
Book projects, please see the GENTECH web site at
http://gentech.org/~gentech/

For more information about the meeting, email Beau Sharbrough at
beau@connect.net.
Be there or be square.

CAUTIONARY NOTE:

The Tech Sessions at GENTECH each year are probably the largest aggregation
of family history developers that there is. These sometimes become very
technical sessions. If you have an interest in technical issues in family
history, and haven't been able to attend GENTECH in Texas because of the
distance, be sure to attend this meeting.

-----------------------------------------------------------------------
Beau Sharbrough                 The Aggie Players -
beau@connect.net                50 years of theater at
http://www.connect.net/beau     Texas A&M University

Date: Fri, 19 Jul 1996 18:18:13 -0500 (CDT)
From: Todd Tyrone Fries
Sender: tfries@umr.edu
Reply-To: Todd Tyrone Fries
Subject: Re: ...Unique ID
To: genweb@UCSD.EDU
In-Reply-To: <9607191357.aa05960@gonzo.ben.algroup.co.uk>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII

> Todd Tyrone Fries wrote:
> Of course, CVS also opens up the interesting possibility of databases which
> are _shared_ between archivists. We use CVS to maintain Apache, with the
> programmers spread all around the world, and have had no problems doing
> this kind of stuff (though it does take some discipline).
> > Should this same practice be kept for genealogical records, record
> > changes, not re-do old records? (unless re-doing static web pages from
> > a database, of course)
> I intend to use CVS for externally contributed databases, at least. It's
> just too useful for problem tracking to miss.
>
> (For those who don't know, CVS is a freeware version control system which
> allows multiple users to freely edit files and manages the merging of
> overlapping changes [with some human intervention at times]).

Hrm, I was only using cvs as an example of something that keeps records of
the program being worked on. I thought perhaps the databases could internally
use some sort of history mechanism so as to date all additional information..
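
To make the idea of an internal history mechanism concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions: the
names Change, History, record, changes_for and current are hypothetical, the
sample values are made up, and nothing here comes from cvs or from any
genealogy program mentioned on the list. The one property it tries to show is
the one described for cvs: a change only adds a dated entry and never
overwrites an old one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    """One dated entry in a person's history; entries are never edited."""
    person_id: int      # hypothetical numeric id assigned to the person
    field: str          # e.g. "birth_date"
    value: str
    changed_on: date
    changed_by: str
    note: str = ""


class History:
    """Append-only log: a correction adds an entry, nothing is overwritten."""

    def __init__(self):
        self._log = []

    def record(self, change):
        self._log.append(change)

    def changes_for(self, person_id):
        """Every change ever recorded for one id, oldest first."""
        return [c for c in self._log if c.person_id == person_id]

    def current(self, person_id):
        """Latest value of each field, rebuilt by replaying the log."""
        state = {}
        for c in self.changes_for(person_id):
            state[c.field] = c.value
        return state


# Made-up sample data: an original entry and a later correction.
history = History()
history.record(Change(1, "birth_date", "1 JAN 1800", date(1996, 7, 1), "alice"))
history.record(Change(1, "birth_date", "2 JAN 1800", date(1996, 7, 19), "bob",
                      note="corrected a typo"))

print(history.current(1))           # latest values only
print(len(history.changes_for(1)))  # both entries are still on record
```

Because the current record is rebuilt by replaying the log, "who changed this
date, and when?" can always be answered from changes_for(), which is roughly
the question cvs answers for source files.
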
I don't know if I would personally go so far as to use cvs for a database,
but the concept is the same I guess...

--
Todd Fries .. todd@miango.com

Date: Tue, 23 Jul 1996 18:01:59 -0400 (EDT)
Message-Id: <199607232201.SAA13165@mh004.infi.net>
X-Sender: tooger@roanoke.infi.net
X-Mailer: Windows Eudora Light Version 1.5.2
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: genweb@UCSD.EDU
From: T D AKER
Subject: AKER,BOONE,PERDUE OR FERGUSON FAMILIES

I'm new to the list, and I was wondering if anyone has any information on the
AKER, BOONE, PERDUE or FERGUSON families. We live in Roanoke, VA, but the
families are from FRANKLIN COUNTY, VIRGINIA. I would appreciate any help I
can get.

TIM AKER (tooger@roanoke)

Date: Fri, 26 Jul 1996 13:59:31 +0900
To: genweb@UCSD.EDU
From: Helen Robinson
Subject: Ged2html

Hi, I'm new on the list.

1) I downloaded this file, Ged2html Windows Version, on to my hard drive
C but can't open it. What have I done wrong????? Do I need to run it
through MS Works?
Please explain in simple English; I'm no computer expert, just a
learner genealogist.

2) I thought being part of Genweb may turn up some of those ancestors that
seem to have gone missing. Also looking for any connection to any Robinson
families in BROMLEY in KENT, UK, particularly if they migrated to Australia
in the 1800s.

Helen Robinson

From: Herbert Stoyan
Message-Id: <9607261041.AA02521@immd8.informatik.uni-erlangen.de>
Date: Fri, 26 Jul 1996 12:41:01 +0200
To: genweb@UCSD.EDU
Subject: java as script language
X-Sun-Charset: US-ASCII

Does anybody have experience using Java instead of CGI scripts to handle
forms or to access C code (like LifeLines)?

Date: Fri, 26 Jul 1996 12:58:00 -0700 (PDT)
From: Annelise Anderson
To: Helen Robinson
cc: genweb@UCSD.EDU
Subject: Re: Ged2html
In-Reply-To: <1.5.4.32.19960726045931.0067c7c8@mailhost>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 26 Jul 1996, Helen Robinson wrote:

> Hi, I'm new on the list.
>
> 1) I downloaded this file, Ged2html Windows Version, on to my hard drive
> C but can't open it. What have I done wrong????? Do I need to run it
> through MS Works?
> Please explain in simple English; I'm no computer expert, just a
> learner genealogist.

Have you gotten it unzipped, or is the file ged2html.zip? If it has the
zip extension you will have to unzip it. You can try whatever versions of
unzip or pkunzip you may already have. If you don't have these or they
don't work, you can download an unzip utility from the same place you got
ged2html. When I did this the unzip utility was a self-extracting file,
i.e., it had an exe extension and one typed the part preceding the .exe
to get additional files. One of the resulting files with an .exe extension
will unzip ged2html.zip.

Then you'll have ged2html.exe and g2h.exe, as I recall. You can run either
of these from the Windows file manager or File Run. One of them (ged2html,
I think) brings up a dialog box in which you enter the name of the gedcom
file you want to process and any other options you want. The other one
needs the options entered on the File Run command line, the one essential
option being the name of the gedcom file, e.g., anderson.ged.

I don't know anything about MS Works, but you don't need to process this
through anything like that.

These programs also run with Windows 95 in the same way other Windows
programs run, and also from Windows 3.1 in OS/2.

Annelise

Subject: Bagwell-Adair-Laster-York
Date: Fri, 26 Jul 96 15:33:01 -0500
x-mailer: Claris Emailer 1.1
From: Pat Bagwell
To:
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"

Hi, I'm new to the list.
I have a Wiley Bagwell, b. 1800 SC that m. 1823, Jefferson Co., AL to Mary
"Polly" York, b. c 1806 Old Cherokee Nation, dau. of (Chief?) Emanuel and
Elizabeth Adair York, half blood Cherokee.

Wiley and Mary's son, George L. Baylis? Bagwell, b. c 1834, m. c1853
Jefferson Co., AL to Nancy Laster. This family may have first moved to
MO>Ark> then to Indian Territory (now OK)>TX.

I'm looking for info on any of the above surnames. I have just started a
BAGWELL group discussion, for anyone researching that surname. Please email
me for instructions to join us.

***********************************************************************
Pat Bagwell     Icezena@texas.net
BAGWELL List Owner: send to: MAISER@rmgate.pop.indiana.edu
Type SUB BAGWELL in the body of your email note.
Wanted: All Bagwell information

        &&&&&
      &&&&&&&&&
     &&|~_~_~|&&
     &&(\0-0/)&&
 ---ooOO--(_)--OOoo---

Rooting for: Adair, Bailey, Bain, Bagwell, Belunek, Bogar, Boswell, Boyd,
Boykin, Bryan, Chism/Chisholm, Cogburn, Hightower, Hill, Hutcheson, Jay,
Johnson, Jones, Konarik, Kutra, Laster, Manak, Mlcak, Moutray, Page, Reed,
Rushing, Sliva, Smith, Sula, Tomanek, Thompson, White, Wilson/Willson,
Woolley, York.
**************************************************************************

Date: Fri, 26 Jul 1996 23:09:56 -0400
From: Alexis Carrington
X-Mailer: Mozilla 3.0b5aGold (Win95; I)
MIME-Version: 1.0
To: genweb@UCSD.EDU
Subject: GenWeb Info
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi All,

I'm not sure this is the correct forum for this question, buuuuuuttt...

Can someone please tell me how one goes about taking on a County Web Page
under the GenWeb Project?

--
Alexis Kinney Carrington
E-mail
If you cannot get rid of the family skeleton, you may as well make it dance.
--G. B. Shaw

To: Alexis Carrington
From: "David C. Crane"
Subject: Re: GenWeb Info
Cc: genweb@UCSD.EDU

At 11:09 PM 7/26/96 -0400, Alexis Carrington wrote:

>Can someone pleas tell me how one goes about taking on County Web Page,
>under the GenWeb Project?

That information should be available on the state genweb page in question.
There are state pages for all but OK at the moment. What county/state
interests you?

To: "David C. Crane"
From: Dataman
Subject: Re: GenWeb Info
Date: Sat, 27 Jul 1996 11:50:45 -0700
X-BeyondMail-Priority: 1
Message-Id:
Conversation-Id: <1.5.4.16.19960727092439.1d972b94@hal-pc.org>
In-Reply-To: <1.5.4.16.19960727092439.1d972b94@hal-pc.org>
Reply-To: Dataman
Cc: genweb@UCSD.EDU

> That information should be available on the state genweb page in question.
> There are state pages for all but OK at the moment. What county/state
> interests you?

Are you saying there is no state page for Oklahoma? I saw something earlier
that said there was. I sent an email to the person named but never received
a reply. Whom would I contact to get information on how to set up the
Oklahoma state page?

Gene Phillips
Send me Mail! mailto:dataman@thor.net

To: Dataman
From: "David C. Crane"
Subject: Re: GenWeb Info
Cc: "David C. Crane" , genweb@UCSD.EDU

Yes, last I looked, OK was the only state/territory/district unclaimed.
Point a browser to http://www.teleport.com/~jmurphy/states.html and follow
the instructions there. If I typed it correctly, that is the URL for the
USA GenWeb page that points to all the state pages.

At 11:50 AM 7/27/96 -0700, Dataman wrote:
In reply to a posting by dcrane@hal-pc.org:
>
>> That information should be available on the state genweb page in question.
>> There are state pages for all but OK at the moment. What county/state
>> interests you?
>
>Are you saying there is no state page for Oklahoma? I saw something earlier
>that said there was. I sent an email to the person named but never received a
>reply. Who would I contact to get information on how to set up the Oklahoma
>state page.
>
> Gene Phillips
>Send me Mail! dataman@thor.net

Date: Sat, 27 Jul 1996 14:01:06 -0700
From: Joyce Youngblood
X-Mailer: Mozilla 2.0 (Win16; I)
MIME-Version: 1.0
To: genweb@UCSD.EDU
Subject: RUSHING DEASON LYTLE
X-URL: http://users.aol.com/johnf14246/gen_mail_general.html#GEN-MEDIEVAL
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Looking for info on these families. Lived in TN in the early 1700s and
1800s, Bedford Co. Also lived in Independence Co., AR, from 1852 through
1857. Em