Many curious genealogists download the data only to discover, frustrated, that they have wasted their time. Please note that the FamiLinx data is very large (~16 GB uncompressed) and cannot be processed with standard office tools such as Word or Excel. You need programming skills to benefit from the data. The data is also de-identified: it contains no names or surnames, so if you are looking for a specific family or record, you will not be able to find it in the data. In addition, the terms of use prohibit re-identifying individuals.
If you are simply interested in your own genealogy, you will benefit much more from going directly to Geni.com or MyHeritage.com and starting your tree there.
This page contains a dataset which was compiled with permission from public genealogical profiles uploaded to Geni.com, a MyHeritage company.
Thanks to MyHeritage's commitment to supporting scientific research, we were able to release redacted data containing the basic family tree structures and demographic information. The data does not contain names or other explicit identifiers of Geni.com users.
See the Data page for citation information.
To download the data, you must agree to and accept the Terms of Use below.
We collect your contact information only to document the distribution of the data. We will not use this information to contact you with updates or offers, nor will we share it with any third party.
Please enter a VALID email address: an email with download instructions will be sent to the provided address.
Once your access is approved, you'll receive an email with a link to the dataset archive.
Download this file and extract it on your computer.
Warning: the datasets require a large amount of storage (2 GB compressed, ~16 GB uncompressed).
The archives are compressed using the XZ format.
The required programs are commonly available on Unix-like systems (e.g. macOS and Linux).
To extract the archives on Windows, download the 7-Zip program.
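Before downloading, it may help to confirm that the needed tools are installed. The shell function below is a minimal sketch; `check_tools` is a hypothetical helper name, and which tool names you pass to it is up to you. Note that GNU tar calls an external `xz` program to unpack .xz archives.

```shell
# Minimal sketch: report any missing command-line tools.
# check_tools is a hypothetical helper; pass the tool names you need.
check_tools() {
  status=0
  for tool in "$@"; do
    # 'command -v' succeeds only if the name resolves to a command
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_tools tar xz sha256sum` prints one line per missing tool and returns a non-zero status if any are missing. Running `df -h .` shows how much free space is left on the file system you plan to extract to.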
# Download the archive (using the URL given in the approval email)
# Put single quotes around the URL so the shell does not interpret special characters such as '?'
$ wget -O familinx.tar.xz 'https://server.com/familinx.tar.xz?AWSKey=XXXXX'
# If the 'wget' program is not available, use 'curl' instead:
$ curl -o familinx.tar.xz 'https://server.com/familinx.tar.xz?AWSKey=XXXXX'
# Optional: verify data integrity.
# You should see exactly the same value as printed below
# (on macOS, 'shasum -a 256' can be used if 'sha256sum' is missing):
$ sha256sum familinx.tar.xz
a336cd271168ed5dc90efc84d25a12256e767f591a63634d174edfccdc4f1c6a  familinx.tar.xz
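Rather than comparing the digest by eye, you can let `sha256sum -c` do the comparison. This is a minimal sketch with a hypothetical helper name, `verify_sha256`; on macOS, `shasum -a 256 -c` accepts the same input if `sha256sum` is not installed.

```shell
# Minimal sketch: verify a file against an expected SHA-256 digest.
# verify_sha256 is a hypothetical helper: $1 = file, $2 = expected hex digest.
# 'sha256sum -c -' reads "DIGEST  FILENAME" lines from standard input and
# exits non-zero if any file does not match.
verify_sha256() {
  printf '%s  %s\n' "$2" "$1" | sha256sum -c -
}
```

For the archive, `verify_sha256 familinx.tar.xz a336cd271168ed5dc90efc84d25a12256e767f591a63634d174edfccdc4f1c6a` prints `familinx.tar.xz: OK` when the download is intact.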
# Extract the data files
$ tar -xf familinx.tar.xz
# The files are in a 'familinx' subdirectory
$ cd familinx
$ ls -lh
-rw-r--r-- 1 user users 15G Feb 8 18:10 profiles-anon.txt
-rw-r--r-- 1 user users 2.1K Feb 7 16:03 profiles-field-list.txt
-rw-r--r-- 1 user users 1004 Feb 8 18:23 README
-rw-r--r-- 1 user users 877M Feb 8 12:14 relations-anon.txt
-rw-r--r-- 1 user users 259 Feb 8 18:37 sha256sum-anon.txt
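The extracted files can be checked in a similar way. This sketch assumes that `sha256sum-anon.txt` uses the standard `DIGEST  FILENAME` format written by `sha256sum` (its 259-byte size is consistent with one such line for each of the other three .txt files listed above); `verify_extracted` is a hypothetical helper name.

```shell
# Minimal sketch, assuming sha256sum-anon.txt holds standard "DIGEST  FILENAME"
# lines. verify_extracted is a hypothetical helper; $1 = the extracted
# directory (defaults to the current directory).
verify_extracted() {
  # Run in a subshell so the caller's working directory is unchanged
  ( cd "${1:-.}" && sha256sum -c sha256sum-anon.txt )
}
```

Run it with no argument from inside the `familinx` directory. Hashing roughly 16 GB of data takes a while; each file that matches is reported as `<file>: OK`.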
# The 'relations-anon.txt' file contains parent/child relations
$ head relations-anon.txt
parent child
1002 2044
1002 2045
1004 2045
1006 2046
...
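Because each row is a single parent/child link, a child with two recorded parents appears on two rows (child 2045 above has parents 1002 and 1004). The sketch below counts how many parents each child has. `parents_per_child` is a hypothetical name, and the sketch assumes the one-line header and whitespace-separated columns shown above. It relies on `sort` rather than an in-memory table, so memory use stays bounded on the full 877 MB file.

```shell
# Minimal sketch: histogram of parents per child in relations-anon.txt.
# parents_per_child is a hypothetical helper; it assumes a one-line header
# followed by whitespace-separated "parent child" columns.
parents_per_child() {
  tail -n +2 "$1" |                 # drop the header line
    awk '{ print $2 }' |            # keep the child column
    LC_ALL=C sort | uniq -c |       # rows (= parents) per child
    awk '{ print $1 }' |            # keep just the per-child counts
    sort -n | uniq -c |             # children per parent count
    awk '{ print $2, $1 }'          # output: <parents> <children>
}
```

Each output line is `<number of parents> <number of children>`. On just the four data rows shown above, it prints `1 2` and then `2 1`.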
# The 'profiles-anon.txt' file contains information about each profile
# See 'profiles-field-list.txt' for the complete field list
$ cut -f1,2,3,15,16,17,20,21,22 profiles-anon.txt | head
id gender is_alive birth_year birth_city birth_state birth_country
1002 male 0 1917 Cleveland Ohio United States
1005 * 0 1908 * * *
1009 male 0 1942 Philadelphia PA US
1010 female 0 1904 Guxhagen Hesse DE
1011 male 0 1942 * * IN
...
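The `*` entries in the sample output appear to mark missing values. The sketch below tallies the values of one column, looking the column up by its header name rather than by position. It assumes that `profiles-anon.txt` is tab-separated (the default delimiter of `cut`, used above) with a header line, and that `*` means missing; the helper name `column_counts` is hypothetical.

```shell
# Minimal sketch: count the values of one named column in a tab-separated file
# with a header line. column_counts is a hypothetical helper; '*' is assumed to
# mark a missing value and is reported as "(missing)".
column_counts() {
  # $1 = file, $2 = column name as it appears in the header (e.g. gender)
  awk -F '\t' -v name="$2" '
    NR == 1 {
      # Find the position of the requested column in the header
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
      next
    }
    { n[($col == "*") ? "(missing)" : $col]++ }
    END { for (v in n) print v "\t" n[v] }
  ' "$1" | LC_ALL=C sort
}
```

For example, `column_counts profiles-anon.txt gender` streams through the 15 GB file once and keeps only one counter per distinct value in memory.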