Download


Important notice


Many curious genealogists download the data only to discover, in frustration, that they have wasted their time. Please note that the FamiLinx data is very large (about 16 GB uncompressed) and cannot be processed with standard tools such as Word or Excel; you need to know how to program to benefit from it. The data is also de-identified: it contains no names or surnames, so if you are looking for a specific family or specific records, you will not be able to find them. Finally, the Terms of Use prohibit re-identifying individuals.

If you are simply interested in your own genealogy, you will benefit much more from going directly to Geni.com or MyHeritage.com and starting your tree there.

Don't waste your time


Description


This page contains a dataset which was compiled with permission from public genealogical profiles uploaded to Geni.com, a MyHeritage company.

Thanks to MyHeritage's commitment to supporting scientific research, we were able to release redacted data containing the basic family tree structures and demographic information. The data does not contain names or other explicit identifiers of Geni.com users.

See the Data page for citation information.


Terms of Use


To download the data, you must agree to and accept the Terms of Use below.

  • 1. COPYRIGHT:
    FamiLinx is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
  • 2. PRIVACY:
    You agree not to use the dataset, either alone or in concert with any other information, in any manner which may expose the identity of individuals. You agree not to use FamiLinx to promote hate, discrimination, or violence towards groups or individuals based on race, ethnicity, religion, gender, age, or family heritage or to use FamiLinx for any illegal activity. You further agree that any future work based on the FamiLinx database will be posted under the same privacy restriction above.
  • 3. GENERAL:
    You agree to report any violation of the Terms of Use to the authors of the study. FamiLinx and Geni.com take no responsibility or liability for the accuracy of any content in the database.

Get the Data


We collect your contact information only to document the distribution of the data. We will not use this information to contact you for updates or offers or give it to any third party.

Please enter a VALID email address: an email with download instructions will be sent to the provided address.







Datasets


The FamiLinx database

Once your access is approved, you'll receive an email with a link to the dataset archive. Download this file and extract it on your computer.

Warning: the datasets require a large amount of storage (2 GB compressed, ~16 GB uncompressed).
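Before downloading, it may help to confirm that the target disk has room for both the archive and the extracted files. The following is a minimal POSIX shell sketch, not part of the FamiLinx distribution; 'has_free_space' is a hypothetical helper name. It assumes the POSIX output format of 'df -P -k', where the fourth column of the second line is the available space in 1024-byte blocks (a filesystem name containing spaces would shift the columns), and it treats 1 GB as 1024 x 1024 KB.

```shell
# Hypothetical helper (not shipped with FamiLinx):
# succeed if DIR has at least REQUIRED_GB gigabytes of free space.
has_free_space() {
    dir=$1
    required_gb=$2
    # 'df -P -k' prints a header line, then one line per filesystem;
    # column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    # Convert the requirement from GB to KB (1 GB = 1024 * 1024 KB here).
    required_kb=$((required_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$required_kb" ]
}
```

For the archive plus its extracted contents, a check such as 'has_free_space . 18 || echo "not enough space"' covers the roughly 2 GB compressed and 16 GB uncompressed sizes given in the warning.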

The archive is compressed in the XZ format. The required tools are commonly available on Unix-like systems (e.g., macOS and Linux). To extract the archive on Windows, download the 7-Zip program.

# Download the archive (using the URL given in the approval email)
# Be sure to use single-quote characters around the URL
$ wget -O familinx.tar.xz 'https://server.com/familinx.tar.xz?AWSKey=XXXXX'


# If the 'wget' program is not available, use 'curl' instead:
$ curl 'https://server.com/familinx.tar.xz?AWSKey=XXXXX' > familinx.tar.xz


# Optional: verify data integrity.
# You should see the exact same value as printed below
# (on macOS, use 'shasum -a 256' instead of 'sha256sum'):
$ sha256sum familinx.tar.xz
a336cd271168ed5dc90efc84d25a12256e767f591a63634d174edfccdc4f1c6a  familinx.tar.xz
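Comparing 64 hexadecimal digits by eye is error-prone, so the comparison can be automated. This is a minimal sketch; 'verify_sha256' is a hypothetical helper, and the fallback to 'shasum -a 256' assumes a system (such as macOS) where GNU 'sha256sum' is missing but 'shasum' is installed.

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 digest equals EXPECTED.
verify_sha256() {
    file=$1
    expected=$2
    if command -v sha256sum >/dev/null 2>&1; then
        # GNU coreutils prints "<digest>  <filename>"; keep only the digest.
        actual=$(sha256sum "$file" | awk '{ print $1 }')
    else
        # Fallback for systems without GNU coreutils (e.g. macOS).
        actual=$(shasum -a 256 "$file" | awk '{ print $1 }')
    fi
    [ "$actual" = "$expected" ]
}
```

With the checksum published above: 'verify_sha256 familinx.tar.xz a336cd271168ed5dc90efc84d25a12256e767f591a63634d174edfccdc4f1c6a && echo OK'.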


# Extract the data files
$ tar -xf familinx.tar.xz



# The files are in a 'familinx' subdirectory
$ cd familinx
$ ls -lh
-rw-r--r-- 1 user users  15G Feb  8 18:10 profiles-anon.txt
-rw-r--r-- 1 user users 2.1K Feb  7 16:03 profiles-field-list.txt
-rw-r--r-- 1 user users 1004 Feb  8 18:23 README
-rw-r--r-- 1 user users 877M Feb  8 12:14 relations-anon.txt
-rw-r--r-- 1 user users  259 Feb  8 18:37 sha256sum-anon.txt
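The 'sha256sum-anon.txt' file presumably holds checksums for the extracted files. Assuming it uses the standard 'sha256sum' layout (digest, two spaces, file name), which the README should confirm, the files can be rechecked in one step. 'check_extracted' is a hypothetical wrapper name; it must be run from inside the 'familinx' directory so the listed relative file names resolve.

```shell
# Hypothetical wrapper: recheck every file listed in a checksum file
# (default: sha256sum-anon.txt, assumed to be in "<digest>  <name>" format).
# Prints "<name>: OK" per matching file and returns non-zero on any mismatch.
check_extracted() {
    sums=${1:-sha256sum-anon.txt}
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum -c "$sums"
    else
        # macOS fallback: 'shasum' also supports '-c'.
        shasum -a 256 -c "$sums"
    fi
}
```

Expect this to take a while, since it reads the full 15 GB profile file.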


# The 'relations-anon.txt' file contains parent/child relations
$ head relations-anon.txt
parent    child
1002      2044
1002      2045
1004      2045
1006      2046
...
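Because each line of 'relations-anon.txt' is one parent/child edge, simple family queries reduce to filtering on one column. The sketch below uses awk; the helper names 'parents_of' and 'children_of' are hypothetical, and it assumes the first line is the 'parent child' header shown above. awk's default field splitting treats runs of tabs or spaces as a single separator, so it should work whichever delimiter the file actually uses.

```shell
# Hypothetical helpers for relations-anon.txt: header line, then one
# "parent child" pair per line.

# Print the parents of profile ID (argument 2), one per line.
parents_of() {
    awk -v id="$2" 'NR > 1 && $2 == id { print $1 }' "$1"
}

# Print the children of profile ID (argument 2), one per line.
children_of() {
    awk -v id="$2" 'NR > 1 && $1 == id { print $2 }' "$1"
}
```

For example, 'parents_of relations-anon.txt 2045' lists both parents of profile 2045 from the sample above. Each call scans the whole 877 MB file, so for many lookups it is better to load the edges once in a program.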


# The 'profiles-anon.txt' file contains information about each profile
# See 'profiles-field-list.txt' for the complete field list
$ cut -f1,2,3,15,16,17,20,21,22 profiles-anon.txt | head
id    gender  is_alive birth_year birth_city    birth_state  birth_country
1002  male    0        1917       Cleveland     Ohio         United States
1005  *       0        1908       *             *            *
1009  male    0        1942       Philadelphia  PA           US
1010  female  0        1904       Guxhagen      Hesse        DE
1011  male    0        1942       *             *            IN
...
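The '*' entries appear to mark unknown values, and the 'cut' command above (which splits on tabs by default) suggests that the profile file is tab-separated, so values such as "United States" stay in one column. Under those two assumptions, the sketch below tallies the distinct values of one column. 'tally_column' is a hypothetical helper, and column numbers should be checked against 'profiles-field-list.txt'.

```shell
# Hypothetical helper: count the distinct values of column N (1-based) in a
# tab-separated file with a header line, skipping '*' (assumed "unknown").
# Output lines are "count<TAB>value", most frequent first.
tally_column() {
    awk -F '\t' -v col="$2" '
        NR > 1 && $col != "*" { n[$col]++ }
        END { for (v in n) print n[v] "\t" v }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

For example, 'tally_column profiles-anon.txt 2' counts profiles by gender, since column 2 is 'gender' in the sample output above. awk streams the file, so memory use grows only with the number of distinct values.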