GEPS 032: Database Backend API

GEPS Closed

This GEPS (Gramps Enhancement Proposal) is closed and available in the version of Gramps indicated below.
Do not edit this page. Submit bugs/features to https://gramps-project.org/bugs/.

This GEP is complete and has been implemented for Gramps 5.0, which is now in the GitHub master branch (Aug 14, 2015).

This proposal defines a complete Database Backend API so that BSDDB can be replaced by plug-in backends, allowing Gramps to use other databases.

This idea is a refinement of GEPS 010: Relational Backend, but without the relational components.

Plan

Step 1

Create a Database Backend plugin, and a functioning BSDDB plugin that is used by default. Identify which plugin to use via a file in the database directory.

This was developed in the branch geps/gep-032-database-backend:

https://github.com/gramps-project/gramps/tree/geps/gep-032-database-backend

It has now been committed to gramps50 (also known as master at the time of writing).
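The Step 1 task of identifying the plugin through a file in the database directory can be sketched as below. This is a minimal sketch, not the code that was committed. The file name `database.txt`, the `"bsddb"` default and both helper names are assumptions for illustration.

```python
import os

# Hypothetical name of the marker file kept in each family-tree directory.
BACKEND_FILE = "database.txt"
# Trees without a marker file are assumed to be BSDDB, the default backend.
DEFAULT_BACKEND = "bsddb"


def write_backend_id(dirpath, backend_id):
    """Record which backend plugin created the tree stored in dirpath."""
    with open(os.path.join(dirpath, BACKEND_FILE), "w", encoding="utf-8") as f:
        f.write(backend_id)


def read_backend_id(dirpath):
    """Return the backend id for dirpath, falling back to the default.

    Trees created before the marker file existed have no such file,
    so they are treated as BSDDB databases.
    """
    path = os.path.join(dirpath, BACKEND_FILE)
    if not os.path.exists(path):
        return DEFAULT_BACKEND
    with open(path, encoding="utf-8") as f:
        return f.read().strip() or DEFAULT_BACKEND
```

Falling back to BSDDB when the file is missing lets existing family trees open unchanged. Only newly created trees need to carry the marker.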

Identify the items that:

  • need to be fixed
  • are too BSDDB-specific (some tools?)
  • could be abstracted away from implementation details

Step 1 is complete.

Step 2

Develop alternative plugins that support the database API.

What a class must provide to implement a fully functional Gramps database can only be determined by examining the existing BSDDB class. This involves the following components:

  1. data and metadata update, add, and delete
  2. transactions for batch or atomic changes
  3. signal handling
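The three components above could be expressed as an abstract base class. In this hypothetical sketch, signal handling is shared code, while storage and transaction primitives are left to each backend. The class names, method names and signal strings (modelled on names like "person-add") are illustrative assumptions, not the actual Gramps API. A toy in-memory subclass is included so the sketch runs.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager


class DbBackend(ABC):
    """Hypothetical minimal interface covering data, transactions, signals."""

    def __init__(self):
        self._callbacks = {}  # signal name -> list of callables

    # 3. Signal handling: concrete, shared by every backend.
    def connect(self, signal, callback):
        self._callbacks.setdefault(signal, []).append(callback)

    def emit(self, signal, *args):
        for callback in self._callbacks.get(signal, []):
            callback(*args)

    # 1. Data add, update and delete: each backend stores data its own way.
    @abstractmethod
    def add_person(self, handle, data): ...

    @abstractmethod
    def update_person(self, handle, data): ...

    @abstractmethod
    def remove_person(self, handle): ...

    # 2. Transactions: backends supply begin/commit/abort primitives.
    @abstractmethod
    def begin(self): ...

    @abstractmethod
    def commit(self): ...

    @abstractmethod
    def abort(self): ...

    @contextmanager
    def transaction(self):
        """Group changes so they are applied together or not at all."""
        self.begin()
        try:
            yield self
        except Exception:
            self.abort()
            raise
        else:
            self.commit()


class MemoryBackend(DbBackend):
    """Toy backend: buffers changes and applies them on commit."""

    def __init__(self):
        super().__init__()
        self.people = {}
        self._pending = None  # list of changes while a transaction is open

    def begin(self):
        self._pending = []

    def commit(self):
        for op, handle, data in self._pending:
            if op == "delete":
                del self.people[handle]
            else:
                self.people[handle] = data
            self.emit("person-" + op, [handle])  # signals fire after commit
        self._pending = None

    def abort(self):
        self._pending = None  # discard buffered changes

    # These must be called inside transaction().
    def add_person(self, handle, data):
        self._pending.append(("add", handle, data))

    def update_person(self, handle, data):
        self._pending.append(("update", handle, data))

    def remove_person(self, handle):
        self._pending.append(("delete", handle, None))
```

Emitting signals only after a successful commit means the GUI never receives updates for changes that were later rolled back.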

Once a full Gramps Database class is created, there needs to be a way of:

  1. selecting which backend to use for new databases [DONE]
  2. selecting the database to load (Family Tree Manager) [DONE]

We will keep the existing directory structure. Each directory must identify the type of database it contains. This could be done in two ways:

  1. well-defined database backend types. These could be registered, like any plugin/addon. [DONE]
  2. is there really any reason for Gramps itself to contain the code for the database backend? All that is necessary is for the backend to create the Database instance. [DONE: backend code lives either in gramps/plugins/database or in ~/.gramps/gramps50/plugins]
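Option 1, registering well-defined backend types like any other plugin, amounts to a lookup table from backend id to a class that creates the Database instance. The sketch below is hypothetical. Real Gramps plugins register through Gramps' own plugin system (`.gpr.py` files), which is not reproduced here, and `register_backend`, `get_backend` and this stub `DictionaryDb` are illustrative names only.

```python
# Hypothetical registry mapping backend id -> class that builds the database.
_BACKENDS = {}


def register_backend(backend_id):
    """Class decorator that makes a backend available under backend_id."""
    def decorator(cls):
        if backend_id in _BACKENDS:
            raise ValueError("backend %r already registered" % backend_id)
        _BACKENDS[backend_id] = cls
        return cls
    return decorator


def get_backend(backend_id):
    """Look up a registered backend class, e.g. from a tree's marker file."""
    try:
        return _BACKENDS[backend_id]
    except KeyError:
        raise LookupError("no database backend named %r" % backend_id) from None


@register_backend("dictionarydb")
class DictionaryDb:
    """Stub backend; a real one would implement the full database API."""

    def __init__(self, directory):
        self.directory = directory
```

Usage: `get_backend("dictionarydb")("/path/to/tree")` builds the database object without the caller knowing which module defines it.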

Since backends will be reused and shared, we should use option 1 and develop a database backend plugin type.

Because the database layer has so many functions, an extensive test suite should cover all of them to ensure that a backend works correctly.
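One way to make such tests reusable is to write them once against the abstract API and rerun them for every backend. This is a sketch of that pattern using the standard `unittest` module. `DictBackend` and its methods are hypothetical stand-ins so the example runs.

```python
import unittest


class DictBackend:
    """Stand-in backend used only to make this sketch runnable."""

    def __init__(self):
        self._people = {}

    def add_person(self, handle, data):
        self._people[handle] = data

    def get_person(self, handle):
        return self._people.get(handle)

    def remove_person(self, handle):
        del self._people[handle]


class BackendConformance:
    """Backend-independent tests.

    This is a plain mixin (not a TestCase) so unittest does not try to
    run it on its own without a backend.
    """

    def make_db(self):
        raise NotImplementedError

    def test_add_then_get(self):
        db = self.make_db()
        db.add_person("H1", {"surname": "Smith"})
        self.assertEqual(db.get_person("H1"), {"surname": "Smith"})

    def test_remove(self):
        db = self.make_db()
        db.add_person("H1", {})
        db.remove_person("H1")
        self.assertIsNone(db.get_person("H1"))


# One short subclass per backend reuses every test above.
class TestDictBackend(BackendConformance, unittest.TestCase):
    def make_db(self):
        return DictBackend()
```

Running `python -m unittest` on the module runs the whole suite for each backend subclass. Adding a new backend then costs one `make_db` method.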

We should factor out all BSDDB dependencies, and make BSDDB the first database backend plugin. The plugin API should include functions for:

  • making a new database, given a directory [DONE]
  • loading the database, given the directory [DONE]

Other things that might need to be changed:

Listing the databases (-l and -L). That might include changing the current listing: [DONE]

Family Tree "test_family, gramps40":
   Bsddb version: (5, 1, 29)
   Last accessed: 11/17/2013 08:47:12 AM
   Locked?: no
   Number of people: Unknown
   Path: /home/dblank/.gramps/grampsdb/528787d0
   Schema version: Unknown
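With more than one backend, a listing like the one above would report the backend instead of a BSDDB version. A minimal sketch, assuming each tree directory holds a `name.txt` with the tree name and a `database.txt` with the backend id. Both file names and the function names are assumptions for illustration.

```python
import os


def _read(dirpath, filename, default):
    """Return the stripped contents of dirpath/filename, or default."""
    try:
        with open(os.path.join(dirpath, filename), encoding="utf-8") as f:
            return f.read().strip() or default
    except FileNotFoundError:
        return default


def list_trees(grampsdb_dir):
    """Yield (name, backend, path) for each family-tree directory."""
    for entry in sorted(os.listdir(grampsdb_dir)):
        path = os.path.join(grampsdb_dir, entry)
        if not os.path.isdir(path):
            continue
        yield (_read(path, "name.txt", entry),       # hypothetical file
               _read(path, "database.txt", "bsddb"),  # hypothetical file
               path)


def print_listing(grampsdb_dir):
    """Print a summary in the style of the -l/-L output above."""
    for name, backend, path in list_trees(grampsdb_dir):
        print('Family Tree "%s":' % name)
        print("   Database backend: %s" % backend)
        print("   Path: %s" % path)
```

Backend-specific details, such as the BSDDB version or the number of people, could then be supplied by asking each backend plugin for its own summary.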

DjangoDb, DictionaryDb, and DBAPI are complete.

Step 3

Develop a fully-tested alternative to BSDDB.

The proposal is to develop a DB-API 2.0 database backend, testing with sqlite, postgresql, and mysql. [COMPLETE]

The database directory will contain a small initialization program that creates the database and returns a class that can create the connection.
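For SQLite, that initialization step can be sketched with the standard `sqlite3` DB-API 2.0 module. The sketch is hypothetical. `initialize`, `SQLiteConnectionFactory`, the `sqlite.db` file name and the one-table schema are assumptions, not the committed DB-API backend.

```python
import os
import sqlite3


class SQLiteConnectionFactory:
    """Hypothetical: opens DB-API 2.0 connections to one tree's database."""

    def __init__(self, path):
        self.path = path

    def connect(self):
        return sqlite3.connect(self.path)


def initialize(directory):
    """Create the tree's database if needed; return a connection factory."""
    path = os.path.join(directory, "sqlite.db")
    factory = SQLiteConnectionFactory(path)
    conn = factory.connect()
    try:
        with conn:  # commits on success, rolls back on error (does not close)
            conn.execute(
                "CREATE TABLE IF NOT EXISTS person "
                "(handle TEXT PRIMARY KEY, blob_data BLOB)"
            )
    finally:
        conn.close()
    return factory
```

Only the factory would differ for PostgreSQL or MySQL drivers. Note that drivers also differ in their query parameter style: `sqlite3` uses `?` placeholders, while psycopg2 uses `%s`.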

Transactions

Because BSDDB did not support transactions natively but required a two-step transactional system, Gramps implemented this support in Python. Most database systems have transactions built in, so most other database backends will probably not use transactions the way BSDDB does.
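With a database that has native transactions, a Gramps-style transaction object can simply delegate to the database. The hypothetical sketch below uses `sqlite3` in autocommit mode (`isolation_level=None`) so that its explicit BEGIN/COMMIT/ROLLBACK statements are the only transaction control. `NativeTxn` is an illustrative name, not the Gramps `DbTxn` class.

```python
import sqlite3


class NativeTxn:
    """Hypothetical transaction wrapper that delegates to the database.

    Expects a connection opened with isolation_level=None, so that the
    statements issued here fully control the transaction.
    """

    def __init__(self, conn, description):
        self.conn = conn
        self.description = description  # e.g. shown in an undo history

    def __enter__(self):
        self.conn.execute("BEGIN")
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.conn.execute("COMMIT")
        else:
            self.conn.execute("ROLLBACK")  # database discards all changes
        return False  # let any exception propagate
```

The Python-side bookkeeping that BSDDB needed disappears. The database guarantees that either every statement in the block takes effect or none does.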

However, the transactions could still be useful. For example, we can use them as an abstraction for the History undo/redo. The current system is limited and only lasts for the current session; we can probably create a better method with more features (such as diffs between versions, lifetime changes, etc.).
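Such an undo/redo history can be kept independent of the backend by recording, for each transaction, the data before and after every change. This is a hypothetical sketch: `History`, `record` and the `(handle, old, new)` change format are assumptions, and `store` can be any dict-like mapping.

```python
class History:
    """Hypothetical backend-independent undo/redo log.

    Each entry holds a description and, per changed handle, the data
    before and after the change (None meaning "did not exist").
    """

    def __init__(self, store):
        self.store = store  # any dict-like mapping: handle -> data
        self.undo_stack = []
        self.redo_stack = []

    def record(self, description, changes):
        """changes: list of (handle, old_data, new_data) tuples."""
        self.undo_stack.append((description, changes))
        self.redo_stack.clear()  # a new change invalidates the redo path

    def _apply(self, changes, use_old):
        for handle, old, new in changes:
            value = old if use_old else new
            if value is None:
                self.store.pop(handle, None)
            else:
                self.store[handle] = value

    def undo(self):
        description, changes = self.undo_stack.pop()
        self._apply(reversed(changes), use_old=True)  # undo in reverse order
        self.redo_stack.append((description, changes))
        return description

    def redo(self):
        description, changes = self.redo_stack.pop()
        self._apply(changes, use_old=False)
        self.undo_stack.append((description, changes))
        return description
```

Because every entry stores both versions, a diff between versions is a comparison of `old` and `new`. Persisting the stacks would let the history outlive a single session.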

COMPLETE

Complications

Some systems, like Django, use a single database per Python session. That makes sense, given that Django is designed for web applications with fixed settings. However, what should you do if you want to use the Django ORM and switch settings?

I have implemented a solution based on the following:

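import sys


class ModulesCheckpoint(object):
    """Snapshot sys.modules so that modules imported later can be unloaded."""

    def __init__(self):
        self.original = sys.modules.copy()

    def reset(self):
        # Forget every module imported since the checkpoint was taken
        # (never touches modules such as sys that existed beforehand):
        for key in list(sys.modules):
            if key not in self.original:
                del sys.modules[key]
        # Restore the module objects that existed at the checkpoint:
        sys.modules.update(self.original)


checkpoint = ModulesCheckpoint()
# do stuff (for example, load Django with one set of settings)
checkpoint.reset()
# do stuff again (Django can now be imported afresh)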

However, this has to be done in the right place, and (as far as I can see) cannot be embedded in the imported database plugin itself. [DONE]

It would be nice to do away with this hack, but it is the only method I can find to unload Django. So far, it works fine.

Progress

The first step is to separate all of the gramps.gen.db code into reusable and extendable components. [DONE]

This has begun with the DictionaryDB 4972, an in-memory replacement for BSDDB. It still needs the indexes and metadata support (gender names, bookmarks, etc.). Also, DictionaryDB does not yet implement transactions.
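The missing indexes for an in-memory store amount to secondary dictionaries that must be kept in sync on every write. A minimal sketch, assuming each record carries a `gramps_id` field. The class and method names are hypothetical, not DictionaryDB's actual code.

```python
class DictionaryStore:
    """Hypothetical in-memory table with one secondary index."""

    def __init__(self):
        self.by_handle = {}  # primary storage: handle -> data
        self.id_index = {}   # secondary index: gramps_id -> handle
        self.metadata = {}   # e.g. bookmarks, kept outside the table

    def put(self, handle, data):
        """Add or update a record, keeping the index consistent."""
        old = self.by_handle.get(handle)
        if old is not None:
            del self.id_index[old["gramps_id"]]  # the ID may have changed
        self.by_handle[handle] = data
        self.id_index[data["gramps_id"]] = handle

    def delete(self, handle):
        data = self.by_handle.pop(handle)
        del self.id_index[data["gramps_id"]]

    def get_by_gramps_id(self, gramps_id):
        handle = self.id_index.get(gramps_id)
        return None if handle is None else self.by_handle[handle]
```

Lookups by Gramps ID then take constant time instead of a scan over all records, at the cost of updating the index in `put` and `delete`.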

Currently, the best working replacement backend is "dictionarydb". [DONE]

We can develop backends that work directly on exported formats; candidates include GEDCOM and CSV. These would probably use a DbDictionary and simply import on load and export on close. They would be as fast as any in-memory database, but slow to start and stop, and they would lose data in a power outage.
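A CSV-backed variant of that idea can be sketched as follows. It is hypothetical: the class name, the three columns and the single-table layout are assumptions. Data lives in a dictionary while open, is loaded from the file on open, and is written back on close.

```python
import csv
import os


class CsvBackedDb:
    """Hypothetical backend: in-memory dict, loaded from / saved to CSV.

    Everything lives in memory while open, so access is fast, but
    changes are lost if the process dies before close().
    """

    FIELDS = ["handle", "gramps_id", "surname"]  # illustrative columns

    def __init__(self, path):
        self.path = path
        self.people = {}

    def open(self):
        """Import the whole file into memory (slow for large trees)."""
        if os.path.exists(self.path):
            with open(self.path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    self.people[row["handle"]] = row
        return self

    def close(self):
        """Export everything, writing a temporary file first."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=self.FIELDS)
            writer.writeheader()
            writer.writerows(self.people.values())
        os.replace(tmp, self.path)  # swap in the new file in one step
```

Writing to a temporary file and then replacing the original avoids leaving a half-written CSV behind if the export itself is interrupted. It does not protect changes that were never exported.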

Branch geps/gep-032-backend-database is complete.

Other Backends

Other Backend plugins to consider developing:

  • MongoDB
  • CouchDB
  • CSV - spreadsheet
  • SQLHeavy - probably not; SQLite is a thin, robust layer with guaranteed backward compatibility, while SQLHeavy has too much that we do not need.
  • Libgda

See Also