Filters, proxies, generators and iterators

July 3rd, 2009

Dear devs,

As a follow-up to my earlier work with iterators and context managers, I am taking a serious look at the way we iterate over the database. Typically we do something like:


my_list = db.get_obj_handles()

where “obj” is people, places, events, etc.


for handle in my_list:
    # do something with the handle

Sometimes the very first thing in the loop is:

my_obj = db.get_obj_from_handle(handle)

In this case, the better approach is:


with db.get_obj_cursor() as my_cursor:
    for handle, obj in cursor:
        #I already have the handle and the data

There are times when this approach is not applicable, e.g. in Check.py where we might be modifying the data we’re iterating over (at least without major surgery on Check.py and there’s no compelling reason to do so.)

Sometimes though, the list is passed to other methods for further processing. The most obvious of these are the various filtering mechanisms, including the proxy databases. I’m experimenting with adding iterator methods to these classes. e.g. if a class has a get_person_handles() method that returns a list of person handles, I’m adding an iter_person_handles() method that returns an iterator over the handles. (For the analytically inclined, the first is an O(n) algorithm and uses O(n) memory and the second is O(1) to set up and uses O(1) memory.) The advantage of this approach shows up when you combine filters.

As a test case, I modified (locally) Narrative Web (very slightly) to take advantage of this as follows:

~/gramps32$ svn diff src/plugins/webreport/NarrativeWeb.py
Index: src/plugins/webreport/NarrativeWeb.py
===================================================================
--- src/plugins/webreport/NarrativeWeb.py (revision 12753)
+++ src/plugins/webreport/NarrativeWeb.py (working copy)
@@ -4186,8 +4186,8 @@


# gets the person list and applies the requested filter
- ind_list = self.database.get_person_handles(sort_handles=False)
- self.progress.set_pass(_('Applying Filter...'), len(ind_list))
+ ind_list = self.database.iter_person_handles()
+ self.progress.set_pass(_('Applying Filter...'), self.database.get_number_of_people())
ind_list = self.filter.apply(self.database, ind_list, self.progress)

return ind_list

Note that the apply() method still returns a list, which is fine for the moment. When you generate a Narrative Web Site, you can specify that you want neither living people nor private records included. When you do so without my patches, here’s what happens:

1. All the people in the database are passed through the “living” filter (via the living proxy db) and a list of dead people is returned.
2. All the people in the list returned from the first step are looked at to see if their records are marked private and removed from the list if that is so.
3. The updated list is passed to the filter.apply method for a third pass and updated once again.

In effect, we pass the data by the filters. A more intelligent approach is to pass the filters by the data. This is what happens with my patch in place:

1. An iterator is set up to filter out living people
2. An iterator is set up to filter out records marked private
3. The filter.apply method reads from the second iterator, which reads from the first iterator. In fact, it’s very much like piping data through the shell:

cat my_data | remove_living | remove_private | apply_filter > my_final_data

A pleasant surprise for me was that the “filter.apply()” method invoked above is just as happy with an iterator as with a list. That means that we do not need to modify any filters to use this approach.

So, what’s next? Extend this idea to the other objects and use iterators instead of list builders wherever possible.

What might go wrong? Well, the biggest immediate hassle is that you cannot write:

len(iterator)

Since the length is unknown until the the iterator is exhausted. There is one easy way around that. There is a get_number_of_people() method available for use that gets the count directly from bsddb. It might be high if there are lots of filters involved, since it returns the count of items in the database, unfiltered but it is usable.

When does this not help as much? When you are going to sort or index the data.

Metrics:

To see the advantage of this approach I worked up some metrics comparing 3.1.2 with the 3.2 trunk in svn with my patches. I modified Narrative Web to include this code:


from timeit import Timer
self.progress.set_pass(_('Applying Filter...'), self.database.get_number_of_people())
def t1():
    ind_list = self.database._person_handles()
    ind_list = self.filter.apply(self.database, ind_list, self.progress)
print "time:", Timer(t1).timeit(100)

and inserted "get" for in 3.1.2 and "iter" in 3.2. Then I ran the report on both 3.1.2 and 3.2 using the sample.ged in svn as the database. This is a small database with less than 50 people. I specified that living people and records marked private should be excluded, in order to trigger the corresponding filters. As you can see, I did 100 repetitions.

3.1.2 results:

time: 76.0091130733 or about .76 seconds per repetition
size: 120

3.2 results:

time: 35.1433758736 or about .35 seconds per repetition
size: 24

Thus, not only was the storage reduced (no surprise there) but the time was cut in half. Next, I tried the same with the large sample.gramps in svn which has about 2000 people. I set the repetitions to just one because I'm an impatient fellow:

3.1.2 results:

time: 80.1852030754 or about 80 seconds
size: 5748

3.2 results:

time: 39.8106660843 or about 40 seconds
size: 24

Again, the time was cut approximately in half.

Look, without GUI!

June 19th, 2009

Yes,

The time has come. GRAMPS can finally run as a standalone command line application, without the need for a GUI to be present or the GTK libraries to be installed. It took quite a larger axe to achieve than I hoped when I started. But I like the result. So now you can have an apache server with GEDCOMS and convert them to gramps xml files with a simple:

gramps -i JohnDoe.ged -f ged -e JohnDoe.xml -f gramps

If the GRAMPS mime types are installed, you can drop the -f flag.

For reference, the startup scheme is now:

  1. General configuration (python present, basic logging, …)
  2. Parse command line arguments
  3. Decide based on 2. if user wants GUI or CLI version of GRAMPS
    1. if CLI, start a cli sessionmanager, a cli databasemanager, and handle the arguments, finish.
    2. if GUI, start a GTK loop, and run a Gramps() process in it.
      This process sets up the error report hook, starts up the viewmanager that will manage the main window you see, then starts handling any cli arguments given (open and import are possible). When the arguments are acted upon, control goes to the viewmanager, the interface is shown with the family tree loaded, or with the family tree manager open

I hope this flow is considered a natural way of doing things. I hoped to have time to look at how other projects handle this, but my free time was again too limited to actually have time for it.

My main hope is that now other developers use this work, to clean up some code (it is mostly reshuffling of code following GEPS 8, not a code rewrite), and make the GRAMPS reports also workable when GTK is not installed. That would make GRAMPS usefull to other webbased projects (think a website with a button to create a GRAMPS detailed descendant report on the fly when requested by the user).

Does it work? I works on my box, but gtk is installed here. In theory it should work, but it will depend how the plugins are written. Eg, I see in ImportGedcom the statement: import ErrorDialog…. Somebody has to spend time to make ErrorDialog do something usefull if GTK is not present or should rewrite ImportGedcom to use print if import of ErrorDialog fails. That would be for a different developer though, I have other irons in the fire :-)

Benny

Cursors, Iterators and Context Managers, on my!

June 17th, 2009

Lately I’ve been looking at how various gramps modules access the database. We tend to have patterns like:


some_list = self.db.get_some_handles(sort_handles=False)
for handle in some_list:
    some_data = \
    self.db.get_something_from_handle(handle)

This works just fine of course, but while looking at these patterns I started to get worried about the performance implications of this approach.  If you drill down to find out what “get_some_handles()” actually does, you will arrive at a line in base.py like this:


return table.keys()

which calls the database keys() function.  What does that do? Quoting from the bsddb documentation:

Return the list of keys contained in the DB file. The order of the list is unspecified and should not be relied on. In particular, the order of the list returned is different for different file formats.

Well, OK, but really happens is that the method retrieves all the keys in the database and builds a Python list, which it then returns to the caller.  However, looking at our “for” loop, above, we go through the database again, this time extracting the data that we really want. Thinking of the complexity of this approach, we have Omega(n) operations to retrieve the keys and Omega(n) operations to retrieve the data associated with the keys.  On top of that, we have a list that is — you guessed it — Omega(n) in size.

Cursors:

Wouldn’t it be nice if we could do both things at the same time and maybe not even have to build a list of keys? Well, we can!  You see, there is another method, using a database cursor.  I could rewrite the for loop above like this:


mycursor = self.db.get_some_cursor()
for key, data in mycursor:
# do something with the key and data

The complexity of this algorithm is Omega(n) < 2*Omega(n) and the storage used is only O(1) < Omega(n).

Note 1: This approach will not work if you need the data sorted or filtered. Obviously if you need the data sorted, you need to build some kind of list and sort it, which is the default behavior of get_some_handles().  Also, if you need to filter the data, there is currently no way to avoid the first approach, but I’m thinking about that…

Note 2: If you rely on attributes of a list (such as len() or indexing), you can’t use them on an iterator.  If you still need to know how many elements there are in some table, you can generally get that with a call to:


self.db.get_number_of_something()

Iterators:

Actually the modified “for” loop using the cursor also uses a new method added to the GrampsCursor class: an iterator.  That is, there is a __iter__ method in the class that does the work that allows you do say:


for key, data in mycursor:

Without the iterator, you need to use a longer approach:


data = mycursor.first()
while data:
    # use the data
    data = mycursor.next()
cursor.close()

Context Managers:

Did you notice the call to the close() method?  Did you also see that the revised “for” loop has no call to this method?  In practice, this would leave an active cursor with possibilities for deadlocks. However, thanks to another method (two, actually) added to GrampsCursor, it is now possible to code it like this:


with self.db.get_some_cursor() as mycursor:
    for key, data in mycursor:
        # use the data

If you haven’t used the “with” statement before, spend a few minutes reviewing its use in the Python documentation.  Basically it establishes a context inside which you can work.  It also guarantees that certain things are done when you leave the context.  In this case, when you leave the cursor context, the cursor is closed automagically.

To Do:

The next step is to look at proxy databases and filters and see if we can turn some of their functions into iterators as well.  Stay tuned.

Html output for textreports

June 12th, 2009

I’ve finally ticked off some items of my todo list. I wanted to add styled notes to the html output of GRAMPS. I added it to pdf before the 3.1.1 release, but doing it in html meant I needed to understand css better.

In the meantime Gerald wrote a new class to write Html files easily, and Brian complained about the textdoc api being encumbered by style related things. Oh, and the filestructure of the basedoc classes was a mess, and template based html output should be deprecated. It is often like that, you want to do a little thing, but the standards of OSS are sometimes that high that you can only achieve that if you at the same time tackle 6 not directly related things.

That is what I ended up doing though. First reorganizing the code, then factoring the style related stuff in their own classes, then removing the template code replacing it with the general style sheets we already have, and finally adding the note markup to the html output. Time for a picture to display the result, the detailed descendant report, with Nebraska css and some markup notes:

DDR with Nebraska style

DDR with Nebraska style

Note the error here with the preformatted note that runs into the picture. A minor problem I believe. Not sure it can be fixed due to the nature of the pre tag in html. The example also shows clearly that preformatted text can contain style just as any other note. Whitespace is retained however.

The next picture shows the kinship report with Basic-Ash css, and headers that have been edited via the style editor of the report (add borders and center).

Kinship report with Basic-Ash style

Kinship report with Basic-Ash style

So what I set out to do works. I realize that the fact you can select the css has little influence on the report itself. Every text report has it’s own generated css classes, overriding the main css files. However, this does lay the foundation for others to improve the layout:

  • The styles of the report are in a css file, so one can generate the text report, and replace one css file with a custom made one
  • In the main style sheets, like Nebraska, support for the style id’s and classes of the most common reports can be built in, allowing the report output to fully integrate with a web site one has up.
  • Having html, one can add code to have hyperlinks linking the people. In the detailed descendant report, this could really improve the usability. Perhaps some other developer is interested in this. One could link to the citation section too
  • One could use these html reports on the download page of a generated narrative website

Well, I hope some people use this revamped html output, and also hope some developers run away with the idea to improve the layout and other things.

So, please test, test, test this. If you have the development version at hand, try out some text reports, and see if all works. The output can often be improved with some minor changes to the way the report is generated:

  1. A paragraphstyle for a header should use the set_header_level function. This will cause the header to be written with a h1 to h6 tag or a <div id=’grampsheading’> tag, instead of just a <p> tag, allowing nice style integration with the css style where the h1/6 tags are already defined.
  2. All textreports write out some fixed id’s, and some classes based on the defined style of the report. The new id’s are:
    1. id=”grampstextdoc” : en spans the entire text report
    2. id=”grampsheading” : a small defined heading, used for header_levels > 6
    3. id=”grampsstylednote” : start of part with a styled note, divided in paragraphs

    So if all css stylesheets define these styles, together with h1 to h6, the layout of all text reports can be improved.

  3. There should be at least one style with header_level =1 and with a stylename that contains ‘Title’. That will be used as the title of the html page.

So, any webhackers out there, test it, and make it better.

[Update] To show how integration can be achieved, for the above kinship report, the subtitle style has been given a level of 3, so style.set_report_level(3). The consequence is an immediate better integration of the report with the Basic-Ash style:

Kinship with header support

Kinship with header support

It is better without the borders added, which is the default for this report.

Benny

3.1.1 was released

April 6th, 2009

9 march saw the release of GRAMPS 3.1.1, an evolutionary release over the previous 3.0.x series. The release notes say it all.

I’m quite happy with this release. It shows we are able to produce a minor upgrade with new features on a yearly basis. So if all goes well, spring 2010 should see a 3.2.x release. The roadmap for that version is still under construction, if you plan on developing on 3.2 in 2009, add your plans now.

I however will still be busy a while with bug fixes in the 3.1.x series. I updated the pdf/print output of the reports for 3.1, but some issues still remain. I like to think I improved it a lot over 3.0, but I want it to be better than good. Next up is installing a beta of ubuntu 9.04, as some people are submitting bugs in that version that where not present before. Those are normally complicated issues with base libraries we depend on. It is important though that GRAMPS keeps working on the fast moving linux platform, so although I don’t like it, we need to spend a large amount of our developing time on supporting our moving base (new gtk libs, new distutils, new python, ….). It can be annoying though to be working on GRAMPS all the time, but never get a new feature in. Here is to hoping no new issues come up after I fixed the present ones.

Benny

What are Gramplets?

December 26th, 2008

Ever since my first few moments using GRAMPS I was struck by two points:

  1. the main Views are great for for seeing and moving about a family tree
  2. the Reports have a wealth of information

After using GRAMPS for a while, I began to wish that these two points could be merged. That is, I wanted Reports that acted like Views: I wanted to be able to click on items in a report to move about my data, and that these reports would change dynamically (without having to run the report again). Furthermore, I wanted to be able to click on summary data, such as “Individuals with Unknown Gender”, and see exactly who these individuals were.

One possible solution to this dilemma was to add more views to GRAMPS. I think most of the developers felt that we already have too many Views, and I didn’t wanted to suggest adding dozens more. Another possible solution was to add a new method of “printing” a report—one that would go to the screen and be clickable rather than go to a file or paper. I worked on the latter for a bit, but that was going to take a lot of work and restructure of all of the reports. And it didn’t solve all of the issues that I was thinking about. For example, you’d still have to “run” the report.

Almost a year ago today, I suggested to my GRAMPS developer-colleagues a new view that would contain tiny report/view hybrids. The working name for this view was “My Gramps” and I called the tiny views a “Gadget”. After a good discussion we decided on the more playful term “Gramplet”, which became the term for the view as well.

No one really understood how this view would be received, or how it fit into the rest of GRAMPS, including the Tools, Reports, and Quick Views. But Brian allowed the view in to see what people thought of it. The real usefulness wasn’t truly appreciated (at least not by me) until Benny pointed out that a detached Gramplet can be moved to the side of another View. This allows users to create their own View of sorts, by surrounding their screen with Gramplets. At that point, I (and others) began to see how Gramplets fit into Gramps.

Today, there are nearly 20 Gramplets, although some are purely experimental. I think the idea of a Gramplet also helps to better define the other parts of GRAMPS. Tools should be little programs that process data, reports are for printing, and Quick Views are lists tied to a particular object and don’t update with a change to the active person (or other object).

In my next post, I’ll introduce a couple of new gramplets for GRAMPS 3.1.

-Doug Blank

Gramps is not a web browser …. yes?

December 10th, 2008

I am currently working at integrating the GeoView Serge contributed inside of GRAMPS. GeoView is a way of displaying your genealogy information on a map. This has been tried before in GRAMPS with MapView, which never got past some initial coding efforts (before me time with GRAMPS, I don’t really know what the issues where). MapView shows already however that the GRAMPS developers are very aware that geographical information can help a researcher in his analysis.

So, in the beginning of 2008, Serge contributed GeoView, a way of showing Google Maps inside of GRAMPS, driven from data in your family tree. Serge went a long way in improving his code. He added OpenStreetMap due to concern of the core GRAMPS contributers, got consent of the BSD style library driving the map abstraction layer, … . So, some weeks ago I decided to jump in and get this code into GRAMPS as experimental code. If people like it, and want it, this can be shipped as an addition to version 3.0.1, if not, well, I’m sure I’ll have fun trying this. I worked on PlaceUtils, the Place Completion tool and an export to KML to see data in Google Earth, so I guess I’m a bit interested in seeing this effort succeed.

The first thing I did was add Serge’s code to subversion, it works after all, even if some polish and care is needed. Now I’m in the process of reorganizing this code and make sure it is easy to maintain. The code was one large class, so some refractoring is in order. The first thing I did was abstract away the difference between the two backends, WebKit or Mozilla. Perhaps a window developer can now jump in and add explorer ;-) ?

Next, I decided to split off all the code that drives the webpage display into it’s own class, HtmlView. Like that, GeoView can inherit from HtmlView, and bother only about creating the javascript needed to drive

Gramps website inside of GRAMPS

Gramps website inside of GRAMPS

the map display. I committed HtmlView today, a picture says more than 1000 words :-)

HtmlView is meant as a base class for GeoView, not for a real view, but this is fun nevertheless. You can type a web address at the top, and GRAMPS dutifully will display it. The bugs of the renderer you get for free (see the image in the top left missing). I don’t plan to add anything fancy to HtmlView, if somebody is interested in adding backward, forward, … then be my guess. I’ll now change GeoView to inherit from this class.

Quick info on how you can try this out yourself:

  • get the latest SVN version and make it
  • edit your ~/.gramps/keys.ini file: Look up your data-views key, and add HtmlView. For me this means I now have:

    data-views=GrampletView,PersonView,RelationshipView,FamilyListView,
    PedigreeView,EventView,SourceView,PlaceView,MediaView,RepositoryView,NoteView,
    GeoView,HtmlView

  • Make sure you have one of the backends. I have gtkmozembed installed, Serge Webkit. Webkit does currently not work in Ubuntu Hardy, you are warned.
  • restart Gramps and try it out.

It would be interesting if somebody on windows can find out if he can make this work on that platform.

Open source: the speed of light…

October 7th, 2008

There has been calls for a few more ’stories’ for the gramps blog, so I figured I’d toss this one into the mix…

Though I live in New Zealand, most all of my own genealogy work relates to an area out to the west of Houston, TX, USA…

In some of my recent additions and consolidations, I’ve managed to end up with many of my various notes becoming ‘disconnected’ – they referred to people who, after I had imported from other sources, I may have removed from my own database.

After checking out the tools that are used to identify unused objects, it seemed to me that somehow, in the development, no one had ever thought to add notes in: it happily corrected problems with unused records of other sorts, but nothing for notes.

I decided I could either:
* go through a delete the extras manually, or
* try out the system!

I registered the situation on the gramps ‘bug tracker’, describing it as clearly as I could.

Then, pretty much in the time it took me to catch the bus in to work (and this isn’t a very big town, so that doesn’t take long!) it had some action! I logged on to the gramps IRC channel, since I like to ‘listen in’ on what is being said there, and saw that the bug had been assigned to someone, picked up and within only about 2 hours of the time I reported it, the svn ‘up to date’ files for gramps had the fix incorporated!

You can’t beat responsiveness and speed like that outside of the open source community, eh?

Now, if only someone could tell me more about Tom (?) Anderson, one of my GGGrandfathers, I’d be *really* happy!

Nick Wallingford
Tauranga, New Zealand
(Wallingford, Ogg, Hegar, Loyd/Lloyd, Robertson, Campbell – Waller and Montgomery Co, TX)

Gramplets

February 14th, 2008

I looked into the Gramplet code Doug wrote for 3.0 for the first time. Some really complicated stuff is in there, but it does have some elegance. Mutual referral (A points to B points to A) is something I like to avoid myself, but we must all agree the Gramplet system is a wonderful pluggable addition to GRAMPS.

My foray into this code was due to requests on the user mailing list for better navigation on the pedigree view, later turned into a feature request. We often get requests to change the main views, but this is a very difficult balance. Too much information and many users are put off and want a simpler design, to little, and many users complain they don’t see what they want easily.

Hence my idea to see if the issue could not be solved with a floating gramplet, and two hours of hacking and trying to understand the code brought up a functional design. Things can be improved, but I like the result. An analogue idea can be used for other viewing purposes, giving more power to the user, without having tens of options to change a standard view. Eg a person gramplet, with an overview of information about the active person, visible on screen without opening the person editor and without the need to add a new view to achieve it. The user who wants the extra info just places an appropriate gramplet on his screen.

It was fun coding this, now back to reality and encoding bugs, database bugs and relative path problems…

Benny

3.0 artwork

January 22nd, 2008

Away with web 2.0, here comes 3.0!

As I’ve now been involved with GRAMPS development for a while, it is time to take up the burden of doing some writing on this blog. Let me write about one of my achievements for 3.0, the artwork. I’m really pleased with it, and hope you are too. You’ll have to be looking at it for an awful long time if you use GRAMPS regularly.

The call to rework the icons came from Don perhaps two years ago. He wanted to shift from the semi realistic look of 2.0.x to the new freedesktop/GNOME Tango style, and partially started that for 2.2.x in 2006. He is no graphical artist though, so this was only partially done. Now, it must be known that I cannot draw, nor paint, well, actually, I barely can hold a pen straight. However, I don’t mind to try, and try again, and again. The beauty with PC is you can mess up as much you want, you have UNDO, and ZOOM and whatnot to help along while you stumble. So that is how most of the new icons came about. In all fairness, most are shameless ripoffs of other Tango Icons, others are based on Gnome icons and for 2 or 3 some people send in improvements (eg the family view icon). You can see the changes here and a somewhat outdated page with the new icons here.

The consequence of all this work is that only the most generic icons are dependant of the base system, things like edit, add, list, … . All the rest will look the same whether you are on Windows or GNOME, … . This should help users as it means the screenshots on the manual will actually look mostly as they see them before them. The system is furthermore set up in such a way that one can override the base GRAMPS icons with icons from an icon set, provided they add icons for gramps. Well, that probably needs some testing, although 2007-04-01 did see it in action (see attachment there).

Is this it? Probably not. Some icons are still missing, eg the import and export. And styles keep changing. So if you are itching, tune in, and send in some suggestions, especially if you are a better artist than I am!