On database systems

Topic: On database systems (Read 7259 times) previous topic - next topic

0 Members and 1 Guest are viewing this topic.

On database systems

May 17, 2013, 09:19:13 am

We just merged a PR which removed SQLite dependencies from Elk core.
https://github.com/elkarte/Elkarte/pull/399

My proposal on this, has several aspects:

database layer implementations for PostgreSQL and SQLite were already (of course) separated in their own files. But there are exceptions, some core code has been written specifically for them. Those dependencies, should be factored out and removed, from the 'bare bones' Elk core codebase.
for SQLite, they seemed to be few, though I might have missed stuff (I didn't quite try hard). If other dependencies are found, they're supposed to be removed and/or factored out and removed.
for PostgreSQL, there are more than that. Postgres compatibility is IMO important, and I don't think it's a good way to just break it. For example: specific queries for personal messages. The way I see here, is to move those to .subs, then, at a later date, we can split .subs for PostgreSQL, and make it available to people as part of PostgreSQL extension. This is what I meant by factoring out the dependencies.
I was starting off extensions for other database systems in the tools repository, i.e.: https://github.com/elkarte/tools/pull/8.
That PR is out of date now, as SQLite changed during database layer changes; please ignore that, and just consider it as example of what I mean to propose here: tools/extensions/sqlite/ is meant to simply have the last good known SQLite files (*snickers*), before removing them from Elk core repository completely.
the extension files can contain also .subs files. Since all queries are meant to be in .subs, any query to be executed, which doesn't fit for a system, can be rewritten by its own .subs file.
there is currently no mechanism in Elk core, to load specific-database-systems .subs files, and to execute their specific functions. If we move forward in this direction, we will add that. The simplest way to think of it is like templates are loaded: loadSubs(). (not suggesting, just exemplifying.)
I make no assumption (and no proposal yet), on the format these extensions are to be delivered to users. Examples: webinstaller asks what db system to install, and grabs appropriate files for PostgreSQL; or, specific distribution, with its installer/maintenance/upgrader, which contains the files for PostgreSQL, including the .subs it needs, and not those it does not.

Notes:
With the PR on database layer, which removes SQLite dependencies, it's no longer a matter of procedure to update SQLite install.sql or upgrade.sql when making changes to core, or rather, not the same way we did. They will of course need updated (assuming SQLite 2.8 is of any interest, which in fact I believe it's not; SQLite 3.x may be, but that's another issue, it requires updating the old files to it first anyway).
My suggestion is: when we make changes of relevance for installer/upgrade or otherwise other database systems, inform the core extensions of it. If people want to update the support, they need to know what breaking changes to address. For example, file an issue or a PR to /tools, pointing out to "my awesome follow-ups feature meant to shut off Teh Cat" which will require update. Yes, they can figure that out from install.sql history, but you can help.

Extensions are meant to be open to the community, to empower people to drive them to whatever they want them to perform as. (just like core is, for that matter). Under the project guidelines, subprojects simply need a way for people to take up, communicate, know how it works, and make it work.
I know we might have "inherited" a bit of SMF-type tendency to 'centralization', as in "all or nothing", or as in "if the core developers aren't doing it, then it doesn't exist", result (IMO) of its proprietary history in a significant measure. But lets not get into that now: simple thing is, database systems layers and support, need to be more modular, "distributed", available for interested people to step up and make it work better. Both technically, and process-wise.

The outline of the intentions above, refers more at the technical (well, design-wise) level. But I didn't even get into other details (such as them pesky differences between 1 and false in a query heh), or the need to change implementation to PDO-based, etc.
I'd appreciate any feedback on this. It's still on basics only, but it's here to explain why I was doing things the way I did in the PRs, and to open for discussion how we go in the future, on multiple database systems support.

Re: On database systems

Reply #1 – May 17, 2013, 09:22:42 am

Surely the whole point of adhering to MVC is that all the business logic is separate from the database handling, and that creating a new DB backend is essentially just to create a new set of Models?

As in the Controllers should have absolutely no idea how the data is gathered, merely that it is the data they are expecting to find.

Re: On database systems

Reply #2 – May 18, 2013, 06:16:08 pm

Fully agree on controllers. Queries and such, how the data is gathered, indeed, are absolutely not the job of the controllers. That's one of the most important aspects of the separation we're targeting, and this is one of the many results of.

OTOH, I cannot support exactly the statement that: switching the db backend has to mean "a new set of Models". Conceptually, or architecture-wise, this statement is not true. It shouldn't change models either. :)

But there are at least two things that matter for Elk here:
1 - we don't have a database abstraction layer. We have a database layer, with some level of abstraction but I wouldn't call it abstraction layer. :) Sure, we could say it abstracts away an INET_ATON for PostgreSQL, for example, but that's not yet an abstraction layer. It does a job, but indeed I expect that for better behavior of a database backend, more than this layer needs change in Elk: some .subs need it or can benefit from it, too.
2 - we don't have exactly models yet. Not model classes - so an object oriented model. Nor a set of functions, "self-containing" enough (such as, a clear set for CRUD only). Not exactly; the files contain much more: any database work, built on our layer.

If we'd have at least one of two or both, then models would be self-contained units of behavior which use at their turn, some functions to trigger database updates; but models wouldn't use raw queries which may need change (or queries with still a bit too much raw SQL; ours isn't raw, but it isn't abstracted away either).
In that case, models would not need change, when implementing even a very different database backend.

In Elk, I'd like to see at least the second part of point 2. (and probably at some point, point 1). Personally, I disagree to even target object oriented models (at the very least, for timeline-related reasons), but the second part is becoming true with the current work: from .subs, I expect we'd have a set of functions which do the basic creation/reading/update/removal of some kind of 'entities'. Like createPost() exists since long. It's not a simple insert a row in the messages table, it contains its own logic: a model logic, a processing to create a thing we know as a post. (a complete enough post.)

There is already (and I expect it will be more) a difference, in .subs, between a simple utility function implemented around some query, and those functions which serve to build and manipulate the entities of the application: posts, topics, personal messages (with all updates, not just a row) etc. The latter become models. (in a more strict sense.)
For the purpose of this discussion, I'd say we can think of .subs as having at least two (or more) parts, conceptually. Some are [becoming] main functionality of models, others are serving specific database-work needs.

Why do I say this? Well, two reasons: on one side, personally I should be careful to call yet our .subs Models. I did on the wiki (well, I set quotes around the role of "model", lol, and maybe I should slightly update), and of course they're on the model-side of Elk. Yes: they're not routing, controllers, templating, sessions handling, etc. They're the model-type part of Elk. But they're different than standard strict design "Models", and that's ok, just need to make a difference.

And, on the other hand, models shouldn't change with db backend. That's partly the job of database abstraction, partly the job of implementing models such as they don't have (ideally) those queries which may change. (I'd argue they shouldn't have any queries at all, actually, but we're not aiming that.)
There are already proposals and PRs to refactor some .subs functions, to use the simpler .subs functions available. That might be a good idea, and go in the same direction: some more complex Model-ish behavior, may be implemented with the help of the simpler atomic pure-database-work functions.

Back to the database backend: I think some .subs functions (i.e. some simply-database-work functions in the subs files, such as the PostgreSQL example above or others) will always need to be re-implemented. Perhaps because of some SQL detail, perhaps because of performance. Even with a better abstraction layer, we will not get (IMO) to entirely abstract away the complexities of some queries. Not if we want to have the best behavior of the forum on another db system. For a forum, there are complex queries for which abstraction doesn't quite cut it. It isn't a blog - content, comments, users, pingback, done. O:-)
But not all, not even most. One shouldn't need to reimplement a new set of models. Along, of course, with controllers, templates, and the rest.

Re: On database systems

Reply #3 – May 18, 2013, 06:30:56 pm

QuoteOTOH, I cannot support exactly the statement that: switching the db backend has to mean "a new set of Models". Conceptually, or architecture-wise, this statement is not true. It shouldn't change models either.

I respectfully disagree. I see where you're coming from, but I can give you a brilliant example of where I'm coming from.

Some time ago I was asked about building MS SQL Server support for SMF, and I came to the conclusion that it isn't possible, at least not in the way PGSQL/SQLite are supported.

MS SQL Server doesn't support LIMIT x,y the way it is used. It's not an OLTP server by design, so it doesn't do OLTP style queries, but OLAP data warehousing by design - it eschews LIMIT x,y and expects to offer only TOP x or SET ROWCOUNT.

I doubt you'll be targeting MS SQL Server, but it's an example of what I mean: the only way to support it is to provide an alternate set of models.

In your refactoring, yes, you'll probably phase out FIND_IN_SET in favour of alternative structures that use conventional table joins, but having a complete set of alternate models allows you to alter the queries per backend for performance. Abstraction is all well and wonderful, but it's not exactly ideal in performance terms, it's really down to how important it is to you to offer that level of flexibility.

Personally I don't give two hoots about DBs that aren't MySQL, and that's not 'just' because I'm more familiar with it (there's the one maintenance branch aspect, the fact that its performance characteristics are well understood, it's more available etc.) but that's just me.

I see where you're going and in an ideal world I'd agree with you that you shouldn't need a complete set of models. In an ideal world, all DBs would use the same queries and abstraction would just magically fix the differences, but the reality just isn't quite like that. As long as it's possible for a specific backend to replace specific models if and when needed, for peculiarities to that backend, you'll cover what's actually needed.

Re: On database systems

Reply #4 – May 19, 2013, 02:11:05 am

This is the fun of abstraction - you can abstract everything and not be right or wrong. It is all preference. You can add a data abstraction layer and then add a NoSQL layer and a SQL layer then add a Redis driver or an Oracle driver. You can add a class for every object and then add a class to access a specific table then add a class for tables that link to tables. You can add a SQL abstraction layer so you don't write out "SELECT * FROM table" anymore, now you write $db->getSQL()->select($columns->(a, b, c), $where->('1=1'));. You can do all of this and not be wrong in any of it. It is all "right" because you're removing coupling. In the end though, how much more development does that add and how much harder is it for you to find WTF you're looking for?

One tip: in removing joins, you make abstraction a hell of a lot easier. I like ZF2's table abstraction when you're not dealing with joins. I like having a function or a couple of functions in a central location that deal with the database. Then I can use those in different models.

I think it is stupid to abstract SQL. Takes way too much development and you never really test on the other platforms. I don't think anyone that says "supports MySQL, PostgreSQL, Oracle, and MSSQL" really supports it as well as whatever the developers usually use. If you really want to support them, stick with the ANSI standards and don't use any performance enhancements. Let the performance enhancements happen at the driver level (maybe give a hint if you can).

My models usually look something like this class myModel { constructor($dic); getFoo() { if !$dic->get('cache')->get('foo') ? $dic->get('db')->getTable('foo')->getAllById($id); $cache->set('foo', $result); } }

Re: On database systems

Reply #5 – May 19, 2013, 02:19:24 am

I see no distinction between "database abstraction layer" and "database layer." If you're implying what I think you are, I would call that a SQL abstraction layer.

The subs are far from being "models" yet. They are what I would still consider helper functions. Some are farther than others, but I wouldn't consider them a model until they handle an "object" (as any noun, not a class) as a single entity. So, you can have OOP without classes and you can have classes without OOP. You can't have OOP without dealing with a singular object. It can be a global variable (nasty) but it needs to have be in a single variable and not have 100 different ways it is editable in different locations. The easiest way to contain it is in a class then have that class delivered as a service. Some standard services would be: user (current), theme (current), database, cache, settings.

Re: On database systems

Reply #6 – May 19, 2013, 03:14:35 am

Arantor,
You're right. But I think we don't disagree at all.

I actually had in mind SQL Server example too, on this.
I just don't use the concept 'models' for database-system-dependent classes or functions. Or I try not to use it, as much as possible.

Allow me to put it this way: Elk 1.0+
./sources/controllers
./sources/controllers/PersonalMessages.controller.php
./sources/subs
./sources/subs/PersonalMessages.subs.php
   createPM()
   deleteMessages()
   markMessages()
   convoMode()
   ...

Code: [Select]

function convoMode()
{
  // init stuff
  // query
  // gather data, return it
}

./sources/database
// strictly necessary layer

Elk 2.5+
./sources/controllers
./sources/controllers/PersonalMessages.controller.php
./sources/subs/
./sources/subs/PersonalMessage.php
   createPM()
   deletePMs()
   updatePMs()
   listPMs($mode)
./sources/subs/PersonalMessages.subs.php
   addMessages()
   deleteMessages()
   markMessages()
   convoMode()
   ...

Code: [Select]

function convoMode()
{
  // init stuff
  // query
  // gather data, return it
}

Elk 3.0+
./sources/controllers
./sources/controllers/PersonalMessages.controller.php
./sources/models/
./sources/models/PersonalMessage.class.php
class PersonalMessage
{
   function create()
   function delete()
   function update() // mark, unmark, label, etc
   function list($mode)
}
./sources/subs
./sources/subs/PersonalMessages.subs.php
   addMessages()
   deleteMessages()
   markMessages()
   convoMode()
   ...

Code: [Select]

function convoMode()
{
  // init stuff
  // query
  // gather data, return it
}

// /sources/models uses /subs. It contains the logic to update an entire PM, or to retrieve it according to $criteria, and uses /subs for raw data gathering/update/etc.
./sources/database/
// abstraction layer improvements
// example of potential addition to the database layer (to .subs):
https://github.com/elkarte/Elkarte/pull/206/files#L0R470
Note that the function knows about tables and data, not about posts, topics, boards. No content. Looks to me like Ema needed a database abstraction layer, and made that.

Now. If I switch the database backend to PostgreSQL or SQL Server:
Do I need to create a new set of models? No.
Do I need to update some /subs functions? Possible.
Does a better abstraction layer shield me from updating /subs? Yes, more than today, but not enough for tricky stuffs or performance wants and needs, or some very specific SQL. And that's ok: /subs is still there.

What I am saying, is that from subs, we will eventually detach more abstracted functions, which are becoming models. Their implementation might not depend forever on database backend.

Does switching database layer has to mean "a new set of models"? No.
The interface of PM model is not dependent on implementation.
The implementation of PM model comes to deal with internal logic of updating/creating/etc a PM (no raw SQL or semi-raw SQL), and uses both db abstraction layer and/or ./subs lower-level database functions as needed. Even, business logic can be part of a simple PM model class. (not permissions or stuff, but logic as in logic of triggering updates after an operation for example.)

ETA: whether this off-hand Elk roadmap will entirely be workable (entirely = with no relevant exceptions) or a good way to do it will be acceptable considering all trade-offs, it's a different question IMHO. If not, then I will simply resume to not exactly calling .subs Models by the book, but model-side of Elk, or, that part of Elk that plays in a way the role of "model". Please consider it small and tricksy personal choice on terminology. I don't think we disagree at all on the content of the issue here.

Re: On database systems

Reply #7 – May 19, 2013, 03:32:33 am

What's the difference between PersonalMessage::delete() and deleteMessages()? Seems like a huge waste to break them up. Seems like the model classes are just an interface with CRUD methods but not really where the object body is at.

PersonalMessage should be PersonalMessages since it is an object pool (even though you don't want it to be individual classes for each message, each message can still be considered an object).

Another way to make db abstraction really easy is by naming all of the requests and making sure everything is parameterized. Then anyone can use hooks to change the request/result.

Re: On database systems

Reply #8 – May 19, 2013, 04:01:39 am

Yes, I agree. But it's just an example off-hand, Josh. If there is no difference in practice, there won't be that particular set of methods/functions. I'm not writing the real-world interfaces nor implementation of Elk 3.0 right now, and not in a post.

Can we go back to 1.0 now?

Re: On database systems

Reply #9 – May 19, 2013, 04:23:54 am

Can the next version have a PSR compliant architecture - drop the .controller.php and other garbage?

Re: On database systems

Reply #10 – May 19, 2013, 05:23:24 am

Well, I don't consider our specific model garbage, and I don't consider PSR and JavaBeans the answer to all prayers.

.controller.php, .subs.php, .template.php are named that way for a reason (well, more). This software has .template.php since forever, and it had -Subs.php.
=> The Elk naming for files makes those consistent, and consistent with a MVC-type of architecture of core code design.
Yes, we need a custom autoloader, but we need a custom autoloader anyway.

But don't ask only me on this. (I'm actually flexible, or at least I have been: after all we have tried PSR-0 on smCore/3.0 previous work. Only one developer liked it, as far as I am aware. Liked enough to feel comfortable to work with it, I mean.)

Check out: http://www.elkarte.net/index.php?topic=74.0
https://github.com/elkarte/Elkarte/issues/356. Fair note: I intend to reject Inter's request now. (when I get to it, eventually after we have addons and file another for web folder, which is a good idea). We can re-open/re-discuss it for next versions.

On a another note. Check out also years-history of SMF pre-2010 intentions and implementations on sm.org site. This direction is embedded in the codebase, and deeply related to the identity of the software, if you ask me. Of course, YMMV.