Database Type

Languages have type systems... arrays, integers, strings, etc.

But we don't have a type that we can declare which is a Database.... The database schema is duplicated across several programs and we end up in a horrible mess. People abuse lists, hashes, associate arrays, pascal records, C structs etc. as their databases. People map their databases to their hashes and lists... and even objects. Yes people even abuse OOP and map objects to their database... which is duplication.

It would be nice if the database was an actual type that we could declare and use in our program. Or at least it would be nice if the database schema were maintained in one single place instead of all this mapping and DoubleDipping. Mapping duplicates and violates OnceAndOnlyOnce. Automatic mapping is better than manual mapping at times, but it would be better to reduce the mapping to nothing... and just declare the schema and maintain the schema as if it were a native type (with the server being able to modify the schema through TCP/IP in one place too). The problems are: database lifetime.. who creates the database? The program or a person somewhere else? How do all the programs get updated with the same database schema? (global schema would be good...)


Here's the concept in code:

  unit schema;
  // Everyone who has access to the database using SQL also modifies this schema
  // this schema can be controlled remotely... not just in a text editor locally
  // This schema is global in that two programs do not redeclare it over and over
  var
    d: databaseSchema (
         customers:
           id: int;
           name: string
         otherTable:
           foo: int;
           bar: int
    );

end.

program accessTheDb; uses schema, // we declared the schema once.. othermodule; begin // now we access a customer print(customers.name where customers.id = 123); end.
Every time the database schema is modifed on the SQL server (or some form of database server), the schema in our database type module (unit schema) is updated. Our type may/could even have more detailed information such as all the foreign keys, indexes, etc. We need to access schemas more robustly than our current solutions that are really just error prone maps.

The above schema could be the database schema itself on the server... and our programs query for this unit when we use it. The real problem with today's technology is the database has a separate life than the program... so we create maps to the database and duplicate our schemas in several programs.

The problem may be impossible to solve perfectly (we can dream, and make compromises) and the above may be an idealistic pipe dream or reinventing of existing mapping techniques...

It may also be that a unified DatabaseLanguage? built into the program must be used... which means every programming language will have to use some standard built-in database language that isn't so possible to implement in each and every language natively. i.e. reinventing SQL.

This page is here to brainstorm ways of how to get the database into our programs? Currently, the database is too far away from us and too hard to use. When I say too far away from us... I mean it is like pulling teeth just trying to get databases connected to our programs. Whereas if I want to create an Array or a Hashlist it is really easy. That's because arrays and hashlists are types... hence the page title of DatabaseType.

Toward what you're suggesting, DateAndDarwen's abstract "D" language specification in TheThirdManifesto is not just about creating replacements for SQL, but about defining the desired characteristics of general-purpose programming languages that incorporate persistent relations (via relation-valued variables, aka relvars) as first class objects. There's Microsoft's LINQ, which is like Embedded SQL done better. There is also ExtendedSetTheory, which implies unifying containers, collections, sets, persistent constructs such as files, and even external resources under one set-oriented algebra. Implementations based on these ideas, in one way or another, can easily overcome simple database integration problems within a given language, but not a broader issue that is certainly not unique to database application development: Any form of distributed application development or maintenance becomes problematic when changes to, or definitions of, a centralised, structurally-mutable service need to be deployed to multiple clients or peers. This problem is, of course, faced when developing database applications, but also exists in any environment based on client/server architectures such as CORBA, J2EE or WebServices. It centers around the need to maintain and/or deploy live applications from multiple code bases. In short, there typically must exist separate sets of client code and server code, which must be maintained and kept in agreement with each other over the entire application lifecycle. This is awkward. I believe a step in the right direction would be full-featured general-purpose programming languages designed for a distributed environment that: We should extend Codd's 11th Rule for Relational Databases ("The distribution of portions of the database to various locations should be invisible to users of the database.") to distributed systems in general: The distribution of portions of the applications to various locations should be invisible to users and developers of the application. In short, developing and deploying distributed applications needs to be indistinguishable from developing and deploying non-distributed apps. Except where proven impossible or impractical, of course...

I suspect the third bullet point, above, will be the most difficult one to get right.

-- DaveVoorhis


I just realized recently that a RelVar is actually an instance of a DatabaseType (or, maybe more specifically, a Table Type)... for example, all variables are instances of a type (for those who think in types). So in fact a relvar is really an instance of some sort of grander type.. or RelType?. The specifications of the RelType? or DatabaseType are the schema, the constraints, etc. The types that determine attribute restraints (column restraints) are just like how a struct or record has sub fields with more type specifications per each struct or record field.

No, a relvar is an instance of a variable. A variable may possess a type which constrains the values it may contain, but a variable is not a type.

{Well it is a 'rel'var, not just any old 'var'. That implies it is named at the top level of a relational database, and that it names a mutable relation (if not mutable, not really a 'variable'). I would note that relation does not equal table... except in practice. Tables can't be infinite. Relations - subsets of cartesian products of potentially infinite domains - certainly can be.}

It is, in fact, "any old 'var'" that happens to contain a value which is a relation, and which is often but not necessarily persistent, as a relvar (short for "relation-valued variable" or "relation variable") may be temporary or local. A relvar is not necessarily at the top level of a relational database (i.e., it may be a local relvar), though the top level of a relational database consists of persistent relvars. Variables are named, relations are not. See RdbRelVar. By the way, the cardinality of a relation held by a relvar is immaterial to the definition of "relvar".

{A 'var' that necessarily identifies a relation isn't just "any old 'var'" - the proof being trivial, that I can point at a 'var' at random (e.g. one identifying an 'int') and it isn't a relvar. As far as the 'naming' goes, from my viewpoint variables never 'contain' values (values being immaterial and entirely un-containable). Variables do identify (or 'point at') and thereby give name to values - generally a temporally limited name based upon the lifetime of an object containing a representation of said value and an associated decodec. And I do know that the cardinality of a relation held by a relvar is immaterial to the definition of "relvar". That's why it would be a 'wrong answer' to say that a relvar has some sort of 'table type', because tables necessarily have finite cardinality, they being defined in terms of specifying exactly what they contain. A relvar identifies a relation, which could potentially have infinite cardinality, which is sufficient to say that it is not a 'table type'. (The statement to which you're responding was targeted originally at someone else rather than you, of course.)}


Re: "The main problem has been that most languages lack a very flexible type-system" [emph. added] -- It's one of the reasons why "tag free" languages seem more suited to databases in my opinion: you don't have to think that much about mapping the language's types to the database's types except when you do transformations. I'd estimate that typically 80% of the "cells" retrieved are simply passed along as-is to I/O or to back to the same or other tables. If the API's allow them to go out the way they came in, then we don't have to think about types much: we just marshal them around based on marshaling business logic and don't mess with most of them on a value level. (The custom biz apps I see tend to be marshal-heavy and transform/compute-light.) We don't really need a MirrorModel of the schema, which is arguably a violation of OnceAndOnlyOnce. One could argue this is fragile per schema change, but even in a type-heavy compiled language we'd probably have an app crash anyhow if the schema changes. Typically a DBA adds columns rather than changes their type or removes them. Typically both types of apps would be "blind" to column additions. Changes in the semantics or usage of columns is fairly common, but I'm not sure heavy-typing on the app side will detect significantly more of these than a tag-free approach. Usually these kinds of errors are detected when the app expects something a certain way but doesn't get it, and either pukes or produces garbage. --top

What are "tag free" languages, and how are they more suited to databases?

I've seen shops design databases under a policy that only the 'text' data type is allowed. That delivers a certain flexibility until you need to share the database among many users and applications and expect to maintain database integrity. Implementing constraints and database-side processing is onerous and fraught with peril when the only type is a character string, and it results in poor performance when the database grows.


I'm not sure allowing a column of type "database" is a good idea; it can create a nested mess. A reference to a table, I'm okay with (link). A column or "cell" with a database "inside" strikes me as playing with fire. It's putting fire in the match instead of using matches to create (external) fires. I'd like to see demonstrations of the benefits first. It risks creating a HierarchicalDatabase when we want set-oriented relational. Relationships are typically treated on an as-needed basis in relational. Nesting tends to force a hierarchical relationship that is more difficult to de-nest our view of when we don't want a nested view.

A column of type "database" is no different from a column of any other type. If we allow a column of type "image", or type "Excel file", or type "integer", why not type "database"? You don't have to use it if you don't need it, but if you do need it, what's the alternative?

In relational the default view is more or less "flat" and we add projections to give us specific data-structure-like views, not the other way around. Starting from a "flat" base is a better starting point in my opinion, because you will often have to make two transformations if you don't want a tree view: first flatten your view, and then re-project it into your desired structure. Think of the hub of a wheel where the "flat" view is the center. If you allow or encourage starting at the "tree" spoke, then to get a different view, you'll often have to first drive to the hub, and then drive to the desired spoke. Most transformations will be a shorter drive if you start at the hub. --top

In the RelationalModel there is no such thing as a "default view" nor is there any notion of "flat" or non-"flat". The RelationalModel consists of relations and a RelationalAlgebra for manipulating them. The types of the attributes of the relations can be any type, which implies that they can be relations.


See also RelationalLanguage, DoesRelationalRequireTypes, TighterAppAndDatabaseIntegration, EmbraceSql

EditText of this page (last edited November 11, 2014) or FindPage with title or text search