Talk:GEPS 010: Relational Backend

From Gramps
Jump to: navigation, search

I tried to keep these in order and make a little more sense out of them but it's still a bit rough. --AaronS 18:33, 26 March 2009 (UTC)

Databases

I suspect that we would have something like SQLite as a default, but allow experts to move to more sophisticated backends. -- ?

From previous GRAMPS discussions:

http://osdir.com/ml/genealogy.gramps.devel/2005-02/msg00092.html

From the GRAMPS archives:

> Now, sometimes we get a request for a major architectural change that we
> will accept. A good example is the new database backend for the upcoming
> GRAMPS 2.0. The request came in to support a real database backend so we
> could support larger databases. We analyzed the request, and felt that
> it matched the goals of the project and would provide a significant step
> forward in the usability of the program. The result was a major redesign
> effort that will soon be released.

I think I and few others are the ones that impacted this decision.  Having an 
850,000 person database tends to be deadly to the XML architecture that we 
were with.  I've been the main person to test the integrity of the system 
with my Gedcom file importing.  When I found that I couldn't import my file 
without extensive data loss, I came to Don and Alex and we all sought for 
solutions.  We found that the XML interface was taking huge amounts of 
memory, and we looked for database backends that would handle the load.  Don 
and Alex came through with the BSDDB backend, and ever since 1.1.3, I've been 
happy as a clam with the Gramps project, because I'm one step closer to 
killing Windows.

I personally want to do away with it, but I need it for other applications.  
I've also come to the realization that both Windows and Linux are good, but 
in their own realms.  I don't want this to become a huge flame war about 
Linux and Windows. so if you have other questions as to why I feel this way, 
email me.

> So, would we accept a mySQL database backend? There is a good chance we
> would (depending on the implementation), as long did not impact Aunt
> Martha. We have even architected the backend to support this, since we
> can see that higher end databases could provide additional functionality
> such as versioning and multiuser support.

We could accept mySQL because of this, but I agree with Don.  If it negatively 
impacts the end user, why would we want to proceed with it?  I have a friend 
that wondered about mySQL interaction, but he can see the impact that BSDDB 
has had on my database, and he has sided with me as well as the rest of the 
team.  Not to say that this is not a possibility, but we need to remain 
focused on the tasks at hand.

> So, in summary, the project is going in a direction that seems to meet
> the needs of our users. If we changed directions, we might or might not
> be able to reach a larger audience, but numbers are not our goal. We
> fully support others submitting patches and other contributions, but
> they will be weighed on how they match the goals of the project (and
> most of the patches we've received to date do match the goals). If
> someone wants us take the project in a different direction, we may or
> may not be receptive depending if the direction matches our goals.
> However, we will support your efforts if you decide to fork the project.
> Who knows, maybe a remerge will occur in the future, or a forked project
> will make us irrelevent.

I agree with Don on this, numbers don't matter as long as the users are happy.  
Getting things appropriately nailed down and ready for the end user's use is 
what is paramount.  After all, if there were no users, why would we even have 
a project with which to collaborate in the first place?

We are here for the users, especially Aunt Martha, because of the fact that 
many people are just moving over to Linux and having something familiar to 
them, like a genealogical program is what matters to them.  Making the 
transition to Linux is hard, don't get me wrong.  But we are making it one 
step easier by not complicating the user's experience in their move.

Like I said before, I'm just a bug finder.  I'm not really a Python 
programmer, or anything, but I like to find bugs.  Even if that's all I do on 
this project, I'm rather content.  Everyone else that wants to port over to 
other toolkits and whatnot is free to do so.

But also as an end user that's still a greenie to Linux in general, I can say 
that this program has helped my move over to Linux that much easier.  Even if 
I have only contributed a little in the way of feedback (mostly from the 
end-user perspective).

-Jason

From http://osdir.com/ml/genealogy.gramps.user/2004-06/msg00078.html:

Alex said:

    SQLite might be better or it might not, we haven't tried it. A great factor
    speaking for BSDDB is that it is supported by a standard Python module,
    bsddb. 


Don said:

This is an important factor here - ease of setup and use. GRAMPS is
difficult enough to get installed on some platforms (especially
KDE-centric systems). Requiring someone to get an SQL database up and
running to try out the program is probably too much effort. What I've
discovered is that GRAMPS is one of the first programs that a lot of
new users want to get running - usually before they have a lot of
Linux experience. So we can't make the barriers to entry too high.
"Requiring someone to get an SQL database up and
running to try out the program is probably too much effort." This simply isn't true of sqlite
at all. The program would simply write to the db file. No server setup, no user accounts, no
 connection settings. Just a file name. users wouldn't even know. The embeded version of MySQL
may be similar but I haven't tried it out. This might be true of MySQL though. However, I believe
it's possible to use scripts and or code to manage launching and stoping the server. It might
be possible to make it seamless for the user but would depend on the implementation.
--AaronS


It's good to see this discussion on gramps and is actually why I'm thinking of giving it another try depending on how hard it is to implement this. Yes I know it will be hard but probably much easier and productive than starting my own project. I'm a developer my self and when it came time to evaluate gramps the lack of a relational db backend was one of the main reasons I decided to keep looking.

Don't discount MySQL over SQLite. While I haven't tried it out yet there is an embeddable version of MySQL which might overcome some of sqlites advantages. If a database abstraction layer is used both could be easily supported. They both have their advantages and disadvantages.

Personally I think SQLite makes more sense for genealogical software. but mysqls tools and the fact that it's a "real" enterprise level relational db are serious advantages. -- AaronS

Database Abstraction

It is quite powerful, but perhaps more sophisticated than what we need. I think we want to find the right balance between power and dependencies. --?


What you looking for here is called a Database Abstraction Layer they are indeed quite powerful and are exactly what you need. if your going to bother switching the back end don't waste your time and not use one. you'll kick yourself later if you don't. just be careful which one you choose. I know that in php every web framework seems to have their own. I suspect the same for python. Django has their own but allows for the use of others (if that tells you anything). might be a place to check for alternatives. While their framework might be for websites that shouldn't matter for the DB Abstraction layer.

What to look for in a db Abstraction Layer is which dbs it can use. sqlite and mysql are musts, you may even find one that can talk to BSDDB but probably not. Oracle and PostgreSQL are pluses but will probably never be used but who knows what will happen in 5 or 10 yrs. who knows maybe oracle would get fed up with mysql and release the db open source charging for service. stranger things have happened. ease of use, readablity and outer joins are also important. don't worry too much about how complex of sql queries its supposed to allow you to create since complex queries through a db layer tend to be difficult to create, read and predict. ie sub queries and the like. usually those queries are far easier to just build as a query.

in my experience a db abstraction layer is good for most of the db io. however, for the complex stuff a sort of localization object (or even file) is a good bet with named queries. this would work similar to how different languages are usually supported in projects. with a different object or file per db. I'd recommend an actual object with a function per query over a file of constants/variables since some db's might require a little more manipulation than others. again this would only be for the most complex queries. a good rule of thumb would be if you had to start writing parts of the query as strings move it to the db localization object. This db localization object isn't used for all queries because you only want to have to tweak the minimum amount of queries across dbs --Aarons

Recomendations

AaronS

Let me preface this by restating that I've never actually used any of these abstraction layers and I'm not yet familiar with the gramps code and developers strengths. Other people with more knowledge should be the ones making the decision. Also any decisions need to be revistable after we actually start coding in case they just don't work.

I've spent the last few days trying to look at the current options for db abstraction. From what I currently know I think I'm going to recommend we use sqlalchemy with sqlite.

sqlite. no server to manage and single file db's will make them easy to share and manage multiple dbs at the same time. also make merging simpler. will allow websites to be developed that will work directly from the db. As long as gramps doesn't switch focus to be some kind of mass user website for editing large trees I think sqlite will fit the bill.

sqlalchemy. this seems to have a large following and good documentation. It should allow us to support different db back ends easier in the future. at least some people think it's the best python orm available. it seems to provide good tools for when the ORM starts to get in the way.

Reasons I don't recommend the other options include:

MySQL. probably not as user friendly and since gramps isn't a client / server sort of program I don't think it's necessary.

DB-API. with sqlite3 It sounds as if the DB-API in practice doesn't support the changing of dbs as much as might be thought. If we commit to sqlite though this might be an option.

SQLObject. this seems like a viable alternative to sqlalchemy but slqlalchemy seems to have more documentation and user acceptance. Also the ORM layer might not step out of the way very nicely. the website says it will but I wasn't quite buying it from the examples.

Storm. while this project looks promising and may be easier to use than sqlalchemy it's only a year old and as I was recently burned by picking a fringe tech for my tech stack I'm a bit skittish of anything that doesn't have wide acceptance and use.

Additional notes: I was originally advocating for database abstraction not an orm layer. I've never used a true orm and can't fully say how they work in practice. While I'm not solidly on the orm badwagon I do think an orm layer might do gramps some good. There will be situations where simply writing queries will be far easier. Our implementation model should take that into account. from the website sqlalchemy sounds like it will provide both abstraction and an orm and we'll be able to use both as the needs determine. While I don't fully agree with the severity of this article he does make some valid points. There is a reason that true object databases haven't caught on. I guess I'm advocating for something like "Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access ... to carry them past those areas where an O/R-M would create problems." article

--AaronS 00:52, 26 March 2009 (UTC)

Amendment What ever abstraction we use it would appear that they almost all sit on top of a DB-API module. sqlAlchemy's intro does a good explanation of the layers. Therefore we will at least be using one of those. One option would be to start coding around a couple of DB-API modules first, one for each database (sqlite and mysql) and see how the abstraction would work for each case. If we start the testing development side by side than we will be able to see how interchangeable they will be. Note that there are multiple DB-API modules available for each database. There appear to be leaders but their might be choices for a reason. If we pick an ORM then the choice will be made for us but if we don't than a choice will have to be made. Personally I'm more familiar with a model (mvc) approach where each table or basic construct has an object that handles the db io but isn't actually trying to be that table. I only suggested sqlAlchamey because that appears to be what the current consensuses is for abstraction. --AaronS 19:07, 26 March 2009 (UTC)

Python Anydbm