Gramps Performance

The advice on this page was written for older versions of Gramps, so it may not work for you. Please update as needed.



Comparison of performance on large datasets between different Gramps versions

Performance tests

It is important that Gramps performs well on datasets in the 10k to 30k range. A good benchmark is to test Gramps on a 100k range dataset, and keep track of performance with every new version.

Furthermore, this page can serve as proof to users that the present version of Gramps is not slow. From version 2.2.5 onwards, special attention has been given to performance, so that it does not deteriorate due to changes.

If you want to work with a large database, read Tips for large databases.

General setup

To be fair, comparisons should be made on equal hardware and on the same datasets. The optimal representation may be chosen, so for Gramps, tests are done in the native database format: the GRAMPS GRDB format or the GRAMPS XML format.

Should somebody want to publish results of commercial software under Windows, this is allowed, but it should be fair: use the same hardware and dataset (so test on a dual-boot machine), and use the internal format of the program.

A table with datasets is given below. Pay attention to the copyright.

The second table lists hardware configurations. Add your machine to this list if you run some tests and want to add the results to this article.

The third table gives the test results, which are subjective. Please don't use other software while running the tests.

The Test Results

Genealogical datasets

Warning

Private datasets will not be shared for any reason.

Free datasets are provided under the following terms: use for testing of genealogical programs only, no publication, no sharing. They have been created from freely available information on the net, where the posting author explicitly stated that the dataset may be redistributed freely.

However, should you feel certain data is misplaced, or that the original posting author did not have the right to distribute the data, please contact us to remove any information as necessary.

FAQ

  • Why does my computer hang and eat memory on open? These are LARGE datasets, so do NOT open them directly. In Gramps, proceed as follows: create a new Family Tree, open it, then go to the import menu and import the dataset (see the sketch after this list).
  • What is tar.bz? This is a compression format. You must uncompress the file before importing it.
  • Can you provide the GEDCOM? No. Offering a GEDCOM sample would tend to attract excessive traffic to this site that is not related to Gramps. If you must have GEDCOM, you can install Gramps, import the dataset, and then choose "Export to GEDCOM".
  • What is in these files? See summary at the bottom of this page.
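
The whole FAQ workflow can also be scripted. The following is a minimal sketch only, assuming the gramps executable is on your PATH and supports the documented -C (create family tree), -i (import), -O (open), and -e (export) command-line options; the archive, dataset, and tree names are hypothetical.

  # Minimal sketch of the FAQ workflow: uncompress the archive, import
  # into a new Family Tree, then (optionally) export to GEDCOM.
  import subprocess
  import tarfile

  # 1. Uncompress the downloaded archive (see the tar.bz FAQ item).
  with tarfile.open("testdb80000.tar.bz2", "r:bz2") as archive:  # hypothetical name
      archive.extractall(path="dataset")

  # 2. Do NOT open the large file directly: create a new Family Tree
  #    and import the dataset into it.
  subprocess.run(["gramps", "-C", "Benchmark Tree",
                  "-i", "dataset/testdb80000.gramps"], check=True)

  # 3. If you really need GEDCOM, export it from the imported tree.
  subprocess.run(["gramps", "-O", "Benchmark Tree",
                  "-e", "testdb80000.ged"], check=True)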
Test Code | File Name | Download size | People | Size (MB) | License
d01 | Doug's test GEDCOM | - | 100993 | 32MB | Private
d02 | testdb80000 | 11.2MB | 82688 | 70MB | Testing only, no sharing, no publication *** NOTE: THIS FILE IS MISSING. IF ANYONE HAS A COPY, PLEASE CONTACT webmaster@gramps-project.org ***
d03 | testdb120000 | 14.8MB | 124032 | 88MB | Testing only, no sharing, no publication
d03_alternate | test_2011-09-07.gramps | 11.9MB | 124032 | 88.4MB | Testing only, no sharing, no publication (d03 for Gramps 3.3.x)
d04 | Jean-Raymond's test GEDCOM (french forum) | - | 52699 | 13.6MB | Private
d05 | places.gramps | 2.5MB | 65598 place objects | 15.3MB | Testing only, no sharing, no publication
d06 (same as d05, but gramps42 format) | Media:Places-2.gramps.zip | 2.8MB | 65598 place objects (expanded) | 22MB | Testing only, no sharing, no publication

Hardware configurations

Hardware Code | Processor | Clock | RAM | Storage | OS | User
H01 | Pentium 4 | 2.66 GHz | 512 MB | HDD | Linux | ?
H02 | ? | 1.7 GHz | 512 MB | HDD | Linux | ?
H03 | AMD Athlon64 X2 | 2x2.1 GHz | 1 GB | HDD | Kubuntu 6.06 | ?
H04 | Intel Centrino Duo | 2x1.66 GHz | 2 GB | HDD | Ubuntu 9.04 | User:Duncan
H05 | Intel Centrino Duo | 2x1.66 GHz | 2 GB | HDD | Ubuntu 8.10 | User:Duncan
H06 | AMD Phenom 9500 Quad Core | 2.2 GHz | 3 GB | HDD | Windows Vista | Jean-Raymond Floquet
H07 | Intel Pentium 4 | 2.80 GHz | 512 MB (*) | HDD | Ubuntu 9.04 | User:Romjerome
H08 | Intel Celeron Dual Core | 2.60 GHz | 2 GB | HDD | Ubuntu 10.04 | User:Romjerome
H09 | Intel i5-2520M | 2.50 GHz | 8 GB | SSD | Ubuntu 14.04.3 | User:Sam888

(*) + 80MB of swap used on import

Tests table legend

Test Code | Test Description
T01 | Time to import GEDCOM/GRAMPS into empty native file format (GRDB)
T01_a | Time to import GEDCOM/GRAMPS XML into empty native file format (BSDDB)
T02 | Size of native file format (GRDB)
T03 | Time to open native file format (GRDB) for clean/non-clean start on people view (*)
T04 | Time to open edit person dialog
T05 | Time to delete/undelete person
T06 | Open event view clean/after T03 (*)
T07 | Sort on date in event view
T08 | Overall editing responsiveness

(*) A clean start means the computer was restarted (so Python modules must also be loaded and started). Non-clean means you have opened Gramps with the .grdb file before and open it again; parts will still be in memory and access will be faster, as Python is also already in memory.
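
To make T01-style timings reproducible, a small harness can wrap the command-line import. This is a minimal sketch, assuming the gramps executable is on PATH with the -C/-i options; the tree and dataset names are hypothetical.

  import subprocess
  import time

  def time_import(tree_name: str, dataset: str) -> float:
      """Create an empty tree, import the dataset, return elapsed seconds (T01)."""
      start = time.perf_counter()
      subprocess.run(["gramps", "-C", tree_name, "-i", dataset], check=True)
      return time.perf_counter() - start

  # Run once after a reboot for a "clean" figure, then again for "non-clean".
  print(f"T01: {time_import('Benchmark d03', 'test_2011-09-07.gramps'):.1f}s")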

Performance results

General remark:

Tests are done with transactions enabled in the Gramps preferences, unless indicated otherwise with notrans. This gives a performance boost. For safety, only change this setting on an empty database -- you have been warned!

Hardware Code | Gramps | Data | T01 | T02
H03 | 2.2.4 notrans | d01 (xml) | 2h | 542.6MB (v11)
H03 | 2.2.4 | d01 (xml) | 24min | 544.5MB
H03 | 2.2.4 | d02 (xml) | 20min | 323MB
H03 | 2.2.4 | d03 (xml) | 25min | 527MB
H03 | 2.2.6 | d02 (xml) | 15min | 332MB
H03 | 2.2.6 | d03 (xml) | 23min | 528MB (v12)
H04 | 2.2.10 (trans?) | d03 (xml) | 1h:56min | ?
H05 | 3.0.4 | d03 (xml) | 1h:56min | ?
H06 | 3.1.2 | d04 (gedcom) | 8min | 937MB
H07 | 3.1.90 - 2009-7-20 (trans?) | d03 (xml) | 2h:44min | 2GB (*)
H08 | 3.3.0 (+ DB upgrade v13 ? + v14 + v15) | d03 (xml) | 1h:47min | 547MB (v15)
H08 | 3.3.0 | d03_alternate (xml) | 1h:46min! | 543MB (v15)

(*) 1520MB log files - 480MB tables


Hardware Code | Data | Gramps | T03 | T04 | T05 | T06 | T07 | T08
H02 | d01 | 2.2.4 | 4m17s | ? | ?/? | ? | ? |
H03 | d03 | 2.2.4 | 2m37s/4m3s | 3s | 43s/23s | 1m23s/12s | 20s | very bad
H03 | d01 | 2.2.4 | 2m22s/2m | 3s | 33s | 1m9s/10s | 18s | very bad
H02 | d01 | 2.2.5 | 12s | ? | ?/? | ? | ? |
H03 | d03 | 2.2.6 | /17s | 1s | 20s/18s | ?/9s | 21s | Excellent
H03 | d02 | 2.2.6 | ?/24s | 1s | 17s/13s | ?/11s | 17s | Excellent
H05 | d03 | 2.2.10 | 1m15s/16s | 1s | 16s/13s | 11s/1s | 26s | good after loading each view once
H06 | d04 | 3.1.2 | 1m30/? | 10s | ?/? | ? | 19s | 11s not bad
H07 | d03 | 3.1.90 2009-7-20 | Cannot allocate memory (also python-2.6) | \ | \ | \ | \ | size limitation on 3.0.x, 3.1.x and trunk ...
H08 | d03 | 3.3.0 | 16s (flat) / 30s (tree) | 1s | 1s/1s | 15s | 30s | 1s Excellent
? | db | version | ?/? | ? | ?/? | ? | ? | description

Dataset summaries

For every test dataset, create a Database Summary Report (a command-line sketch follows):
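
A hedged sketch of generating the report from the command line, assuming the gramps executable is on PATH and that the Database Summary Report is registered under the report id "summary" (the -a report / -p option syntax follows the Gramps command-line documentation; the tree name is hypothetical):

  import subprocess

  # Generate a plain-text Database Summary Report for an imported tree.
  # "summary" as the report id is an assumption; check your Gramps version.
  subprocess.run(
      ["gramps", "-O", "Benchmark Tree", "-a", "report",
       "-p", "name=summary,off=txt,of=summary_d03.txt"],
      check=True,
  )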

Database Summary Reports

Summary of database test d01

Number of individuals: 100993
Males: 53046
Females: 47947
Individuals with incomplete names: 324
Individuals missing birth dates: 42726
Disconnected individuals: 19
Number of families: 36554
Unique surnames: 15308

Summary of database test d02

Number of individuals: 82688
Males: 44736
Females: 37952
Individuals with incomplete names: 17120
Individuals missing birth dates: 31528
Disconnected individuals: 880
Number of families: 32256
Unique surnames: 13957

Summary of database test d03

Number of individuals: 124032
Males: 67104
Females: 56928
Individuals with incomplete names: 25680
Individuals missing birth dates: 47292
Disconnected individuals: 1320
Number of families: 48384
Unique surnames: 20695

Summary of database test d04

Number of individuals: 52699
Males: 26420
Females: 26279
Individuals with incomplete names: 2
Individuals missing birth dates: 16427
Disconnected individuals: 0
Number of families: 24604
Unique surnames: 5822

Summary of database test d05

Number of individuals:   2132
Number of families:       749
Number of events:        4981
Number of places:       65598
Number of sources:          9
Number of media paths:      7
Number of repositories:     5
Number of notes:         1509

User Stories

Running the tests can be slow, so here are some user testimonials about Gramps performance.

Robert 2012-10, version 3.3.1

I currently work with a database of 141,000+ names without difficulty (Gramps 3.3.1-1 on Fedora 16). Initial start is fairly slow though. The first load of each view is slow, but subsequent visits to views are almost immediate. Initial view load times:

  • people 11 to 12 secs
  • relationship abt 7 secs
  • family 3 to 4 secs
  • events 7 to 8 secs
  • places 3 to 4 secs
  • notes 11 to 12 secs
  • ancestry view abt 1 sec or less
  • Media abt 2 secs (although I only have about 1000 media in database)
  • Repositories almost immediate
  • sources about 1 sec (the time to select a source varies according to the number of references for that source - my worst case is a civil registry which has about twice as many references as people in my database).

JohnBoyTheGreat 2019-12, version 5.1.1

Import tested with the GRANDMA Mennonite database (https://mhss.sk.ca/FH/GRANDMA.shtml) of 1.4 million people, reported by a user on Reddit: https://www.reddit.com/r/gramps/comments/dzevcl/database_size_limit_for_gramps/fb6hdbj/

FOLLOWUP...

It's been a few weeks since I asked whether anyone had attempted to use GRAMPS with the GRANDMA Mennonite database of 1.4 million people.

Based upon the suggestion above, I tried to load the Catalog of Life database... it took several days and it seemed to be working, but it eventually locked up GRAMPS. Since I was only loading the Catalog of Life to test it, I decided not to waste time trying again.

In the meantime, I had ordered the GRANDMA database, but the organization selling it somehow gave me a bad download code and I couldn't reach them for several days (holiday and weekend). It was more than a week later before I was able to try loading the GRANDMA database into GRAMPS.

The result was SUCCESS!

It took about three days to load the GRANDMA database into GRAMPS, after giving GRAMPS a realtime priority in Windows. Setting GRAMPS to realtime sped up the loading considerably. My computer is rather fast compared to many, so anyone who wants to do the same thing should consider that it could take a week or longer to load up a huge database like GRANDMA.

At this point I have the GRANDMA database loaded into GRAMPS and running okay. It's really slow to switch to various functions, but it works okay once you get to each part of the GRAMPS program. One difficulty I ran into was scrolling through the 1.4 million records. It moves through the records quickly, but there are so many that you can't just use your cursor to pull to the surname you want to explore. Instead, I have to move the slider to a surname as close as possible, then scroll repeatedly until I find it. That can be 10-20 spins of the mouse wheel, so the process can be exhausting when you are looking for many different names.

My next step is going to be to copy the individual profiles which I need to a second family tree database that will be much, much smaller.

CONCLUSION: GRAMPS can handle 1.4 million records in a database. It's slow and takes several days to load, but it works.

Possible Future Optimizations

One can fine-tune some things to obtain better results. An overview:

See if Gramps can pass these:

  • Two Huge GEDCOM Files: http://www.tamurajones.net/TwoHugeGEDCOMFiles.xhtml
  • GedFan (http://www.tamurajones.net/GedFan0.4.0.0.xhtml) - creates GEDCOM files, so-called fan files, which are used to test genealogy applications and thus determine the capacity of those applications, expressed as a fan value.

See also

  • GEPS 016: Enhancing Gramps Processing Speed