Wednesday, January 26, 2011

Pruning Your Family Tree

Cruft is a term used in computer programming circles for the useless code that accumulates in a program over time. Cruft is the stuff you added at one point that might have been important then but is now irrelevant and, worse, causes the rest of your program to slow down. You might once have needed to support what is now an obsolete computer platform, for example, but the code for that shouldn't still be in your program today.

Family trees also accumulate cruft over time, and just as in computer programs, those extra people and extra information can slow you down. There are a number of ways that bad information can enter your tree, but the most common and most problematic is importing a GEDCOM from a relative without first checking whether everyone in it is actually related to you. If you get a GEDCOM file from a relative with 2000 people in it and only 200 of them are actually related to you, you've just added 1800 people who are irrelevant to your tree. Moreover, if you upload your family tree to a site like Ancestry.com or MyHeritage.com, which can automatically match your tree against other trees and against records on the site, you're going to get all kinds of matches for people who are not actually related to you. Following up these false leads is a big waste of time.

I recently uploaded a family tree to one of these web sites and started getting matches to people not related to me. It showed me that when I imported a tree from a relative a couple of years ago, I did not properly check the tree first. When I share a tree with someone, I usually export only those people who are related to the person I'm sending the tree to, plus spouses. This ensures the person doesn't get a lot of records that are not relevant to them. When you receive a GEDCOM from someone else, you should check it out as well: create a test file, import the GEDCOM into it, add yourself to the file, and then see whether everyone in the file is related to you. I obviously forgot to do this with this particular file, and ended up with about 300 extra people in my tree that I was not related to, which is what was causing these false hits in the matching program (technically they're not false hits, okay, but from my perspective they're just as annoying even if they are my fault).
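For readers who like to look under the hood, the first step of checking a received GEDCOM is simply reading out who is in it and how they are linked. The following is a minimal sketch, not how any particular genealogy program does it. It assumes a well-formed file that uses the standard INDI, FAM, HUSB, WIFE, and CHIL tags; the function name parse_gedcom_links and the sample people are made up for illustration.

```python
def parse_gedcom_links(text):
    """Return (individuals, families) read from GEDCOM text.

    individuals is a set of ids such as "@I1@".
    families maps a family id to {"spouses": [...], "children": [...]}.
    """
    individuals, families = set(), {}
    current_family = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        third = parts[2] if len(parts) > 2 else ""
        if level == "0":
            # A level-0 line starts a new record: "0 @I1@ INDI" or "0 @F1@ FAM".
            current_family = None
            if third == "INDI":
                individuals.add(second)
            elif third == "FAM":
                current_family = families.setdefault(
                    second, {"spouses": [], "children": []})
        elif level == "1" and current_family is not None:
            # Level-1 lines inside a FAM record name its members.
            if second in ("HUSB", "WIFE"):
                current_family["spouses"].append(third)
            elif second == "CHIL":
                current_family["children"].append(third)
    return individuals, families


# Hypothetical three-person file, for illustration only.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Ann /Example/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Bob /Example/
1 FAMS @F1@
0 @I3@ INDI
1 NAME Cy /Example/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
1 CHIL @I3@
0 TRLR
"""

individuals, families = parse_gedcom_links(SAMPLE)
print(len(individuals), "people in", len(families), "family records")
```

A count like this is a quick sanity check before importing: if a relative sends 2000 people when you expected a couple of hundred, that is a sign the file needs a closer look.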

After receiving quite enough of these messages from the web site, I decided it was time to remove the incorrect records from my family tree file. My initial guess about that GEDCOM file was correct, and it was indeed the source of most of the incorrect records, but I also discovered something else interesting: there were other people in my family tree who were not related to me, some of whom I wanted to keep. The important thing here is that while most genealogy programs will let you select all your relatives (and their spouses), it's not so simple to select your relatives and delete everyone else. The issue of the spouses, by the way, is a simple one. If you had the program select only your actual relatives, your sibling's spouse would not be chosen. Your sibling's kids would be chosen, but they would be missing a parent, since strictly speaking that spouse is not your blood relative. Thus you need your genealogy program to select spouses as well.
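To make the spouse issue concrete, here is a rough sketch of what "select relatives and their spouses" means. It is an illustration under my own assumptions, not the method any specific genealogy program uses: a blood relative is taken to be you, your ancestors, and every descendant of those ancestors. The function names and the tiny sample family are hypothetical.

```python
from collections import defaultdict


def blood_relatives(root, parents):
    """Return root, root's ancestors, and all descendants of those ancestors.

    parents maps each child to a list of that child's known parents.
    """
    children = defaultdict(set)
    for child, folks in parents.items():
        for parent in folks:
            children[parent].add(child)

    # Step 1: walk upward from root to collect root and every ancestor.
    ancestors, stack = set(), [root]
    while stack:
        person = stack.pop()
        if person not in ancestors:
            ancestors.add(person)
            stack.extend(parents.get(person, ()))

    # Step 2: walk downward from each ancestor to collect their descendants.
    relatives, stack = set(), list(ancestors)
    while stack:
        person = stack.pop()
        if person not in relatives:
            relatives.add(person)
            stack.extend(children.get(person, ()))
    return relatives


def with_spouses(relatives, marriages):
    """Add anyone married to a relative; marriages is a list of pairs."""
    keep = set(relatives)
    for a, b in marriages:
        if a in relatives:
            keep.add(b)
        if b in relatives:
            keep.add(a)
    return keep


# Hypothetical family: "bil" is the sister's husband, "bil_dad" his father.
parents = {
    "me": ["mom", "dad"],
    "sis": ["mom", "dad"],
    "niece": ["sis", "bil"],
    "bil": ["bil_dad"],
}
marriages = [("mom", "dad"), ("sis", "bil")]

related = blood_relatives("me", parents)
selected = with_spouses(related, marriages)
print(sorted(related))
print(sorted(selected))
```

In this sample the niece is selected as a blood relative, but her father only makes it in through the spouse step, and his own father is not selected at all.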

The people I found in my tree that were not related to me fell into a few categories. Most were from GEDCOM imports, with most of those from that one GEDCOM I suspected, but also a few others here and there.

Some of the people were really cruft, in that they belonged to small sections that were somehow isolated from the rest of the tree. I suspect they were descendants of someone I deleted at some point. They should probably have been deleted a long time ago, but were somehow still in my tree, probably due to a bug in the genealogy program.
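Isolated sections like these can be found mechanically by grouping people into clusters that are connected by any parent, child, or spouse link. The sketch below is only an illustration of that idea; the function name islands and the sample names are hypothetical, and the links are assumed to be available as simple pairs of people.

```python
from collections import defaultdict


def islands(people, links):
    """Group people into clusters connected by any family link.

    links is a list of (person, person) pairs, covering both
    parent/child and spouse relationships.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(people):
        if start in seen:
            continue
        # Collect everyone reachable from this starting person.
        group, stack = set(), [start]
        while stack:
            person = stack.pop()
            if person not in group:
                group.add(person)
                stack.extend(neighbours[person] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical tree with one stray pair and one person linked to nobody.
people = {"me", "mom", "sis", "stray1", "stray2", "loner"}
links = [("mom", "me"), ("mom", "sis"), ("stray1", "stray2")]

orphaned = [g for g in islands(people, links) if "me" not in g]
for group in orphaned:
    print(sorted(group))
```

Any cluster that does not contain you has no connection to your part of the tree at all, which makes it a good candidate for review before deletion.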

Then there were the parents of spouses. I sometimes like to add information on the parents of spouses that I add to the tree. This is mainly so that if I want to research the spouse at some point, I know a bit more about them to help me with the research. Knowing the names of a person's parents can be very important when doing research. The problem, of course, is that if I do a standard selection of relatives and their spouses, these parents get left out, yet I still want them in the tree. The solution here is not simple. There is no automated way to include these people. The answer is probably (and I have not done this yet) to flag those parents in some way. Some genealogy programs let you define custom flags and then assign them to people. If you carefully check all the non-relatives in your tree and decide which ones you want to keep, you can then flag them for future reference. Each time you add non-relative parents, you can flag them. In the future, if you go to prune your tree again, you can do a standard selection of relatives and spouses, and then add the flagged people. Anyone left over can then be removed from your tree.
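The flag idea boils down to simple set arithmetic: everyone in the tree, minus the selected relatives and spouses, minus the flagged people. Here is a minimal sketch of that logic. The flag name "keep-nonrelative", the function name, and the sample people are all hypothetical; how flags are actually stored depends on your genealogy program.

```python
KEEP_FLAG = "keep-nonrelative"  # hypothetical custom flag name


def people_to_remove(everyone, relatives_and_spouses, flags):
    """Return the people who are neither selected nor flagged to keep.

    flags maps a person to the set of flag names assigned to them.
    """
    flagged = {person for person, names in flags.items() if KEEP_FLAG in names}
    keep = set(relatives_and_spouses) | flagged
    return set(everyone) - keep


# Hypothetical tree: the brother-in-law's parents are flagged to keep.
everyone = {"me", "sis", "bil", "bil_dad", "bil_mom", "stranger"}
selected = {"me", "sis", "bil"}  # result of "relatives plus spouses"
flags = {"bil_dad": {KEEP_FLAG}, "bil_mom": {KEEP_FLAG}}

print(sorted(people_to_remove(everyone, selected, flags)))
```

It is worth reviewing the resulting list by eye before deleting anyone, since a person you forgot to flag will show up there too.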
