In the previous posts we introduced a JavaScript database class and a JSON-based SQL variant, json-fetch. We can now implement a dynamic CRUD layer, but there is an important piece missing.
Let's recap: we made the case for a JSON-based alternative to SQL, and json-fetch is that query language.
So what is the missing piece?
I'm sure you have guessed it by now. Now that we have SQL for the web, we need a database definition for the web: a schema.
We shall first discuss why we need a schema at all.
Then we will make the case for a web-friendly schema.
Finally, we shall look at the code used to translate the schema into an actual set of database tables.
The JavaScript world obviously likes JSON, and historically the no-sql (schema-less) storage systems had great JSON support.
Over the years the SQL engines have introduced first-class JSON support, but not everywhere. Queries have to be written in SQL, which is not exactly developer friendly. And a strict schema means we can't simply add a property to a JSON object and store it in the database; we need to update the schema first. That is yet another language to learn: DDL. Often the database is "owned" by another engineer or team, and changing it becomes a whole dance of tickets, emails, approvals, voodoo and animal sacrifice.
As a JavaScript developer I just want to store and retrieve a bunch of JSONs. Going with a relational database means I have to learn SQL and DDL (or rely on and wait for someone else). Then I need to write code that turns 'store this JSON' into DDL updates and SQL commands, while making sure I don't introduce SQL injection and other vulnerabilities. Looking at the problem from this angle, I can understand the lure of no-sql: you don't need any of that code, drama and politics.
But there is huge power behind an explicit schema, and there are massive performance benefits to a proper schema, i.e. the relational model. Building joins and calculating aggregations in no-sql is complex, and even done right it is slow (you have to read all the data over the network, parse it and load it into memory).
There has to be a better way. And there is.
For a database schema to be web-friendly it has to be JSON, so that it can be easily manipulated with JavaScript, serialized and deserialized, and even read and written by hand.
The job of the schema API (often called the metadata API) is to reduce the schema code a developer needs to write down to a single API call: this JSON is our new schema, go make it happen.
The job of the schema is to express what objects (tables) exist in our database, what attributes (columns) each object has, and what the relationships between objects are (each customer object can be associated with multiple invoices, for example; this is a 1:N relationship).
We can also add details to our schema that do not exist at the SQL level (such as whether an empty string is an allowed value, or ranges for numeric and date attributes). This way the schema deals with business requirements instead of technical SQL details; instead of tables, it deals with business objects.
A business object has a name, a set of attributes (e.g. a customer has a name, address and phone number) and relationships (a customer has invoices). The schema is a rich description, yet it translates naturally to the SQL relational model of tables and columns.
An object represents a business object and will be backed by a database table.
We will enforce some additional useful properties for all objects.
An object will always have a set of system attributes: an id primary key, a name, createdon/createdby and modifiedon/modifiedby audit fields, statuscode, statecode and a versionnumber (the code below adds them automatically).
Object attributes have only two required properties: a logicalName and a type.
Depending on the type, more properties are available: max for strings, precision for decimals, targets for lookups, and options for option sets.
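As a concrete (hypothetical) example, here is a tiny schema with a customer object and an invoice object linked by a lookup. The shapes mirror the IMetaObject/IMetaProperty interfaces defined later in the post; the numeric type codes follow the MetaPropertyType enum (String = 0, Money = 4, DateTime = 5, Lookup = 7), and 2048 is the Phone flag.

```typescript
// Loose local types so the sketch stands alone; the real interfaces appear later.
interface Prop { logicalName: string; type: number; flags: number; max?: string; targets?: string[]; }
interface Obj { logicalName: string; properties: Prop[]; }

const schema: Obj[] = [
  {
    logicalName: "customer",
    properties: [
      { logicalName: "phonenumber", type: 0, flags: 2048, max: "50" }, // String + Phone flag
    ],
  },
  {
    logicalName: "invoice",
    properties: [
      { logicalName: "total", type: 4, flags: 0 },   // Money
      { logicalName: "duedate", type: 5, flags: 0 }, // DateTime
      // The lookup attribute is what creates the 1:N customer -> invoice relationship.
      { logicalName: "customerid", type: 7, flags: 0, targets: ["customer"] },
    ],
  },
];
```

Note there is no id, name, createdon and so on: the system attributes are added automatically.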
So far this has been theoretical, but how do we get from a JSON schema to a database of tables and columns? And even more importantly, how do we update an existing database structure when the JSON schema changes?
How do we implement this mythical single API call that makes it all happen?
The good news is that this is a straightforward process.
Sidenote: we could synthesize (recover) the JSON schema from the database structure. All SQL engines provide a way to query (introspect) what tables, columns (indices, procedures, etc.) a database contains. However, there is great value in a schema format that is independent of the database, fully controlled by us, a richer description of the data constraints, and extensible. This richness means the json -> db schema translation is lossy. Also keep in mind that our schema is not a super-set of the SQL schema: we don't support all SQL types and options, and we don't care about composite keys, stored procedures and a whole lot more that can be used in SQL.
The code below implements the translation of the JSON schema. It generates the commands needed to create the database tables and columns for a given JSON schema (this command language is the DDL, the data definition language).
export class Schema {
    public generate(objs: IMetaObject[], options?: { sqlite: boolean }): string {
        let sb = "";
        for (let o of objs) {
            sb += Schema.generateObject(o, options);
        }
        return sb;
    }

    private static generateObject(o: IMetaObject, options?: { sqlite: boolean }): string {
        let sb = "CREATE TABLE " + o.logicalName + "(\n";
        Schema.ensureSystemFields(o);
        sb += Schema.generateProps(o, o.properties);
        sb = sb.substring(0, sb.length - 2) + "\n"; // remove the trailing comma and newline
        if (options && options.sqlite && o.type === "many") {
            // a "many" (N:N link) table has exactly two lookup columns
            const fk = o.properties.filter(x => x.type === MetaPropertyType.Lookup);
            sb += ",UNIQUE(" + fk[0].logicalName + "," + fk[1].logicalName + ")\n";
        }
        sb += ");\n\n";
        return sb;
    }

    private static generateProps(o: IMetaObject, props: IMetaProperty[]): string {
        let sb = "";
        for (let p of props) {
            sb += "\t" + p.logicalName + " " + Schema.mapPropType(p)
                + (p.logicalName === o.primaryKeyName ? " PRIMARY KEY DEFAULT newid()" : "") + ",\n";
            if (p.type === MetaPropertyType.Lookup) {
                // a lookup stores the target record id; the extra column stores the target table name
                sb += "\t" + p.logicalName + "Target NVARCHAR(200),\n";
            }
        }
        return sb;
    }

    private static ensureSystemFields(o: IMetaObject) {
        let pos = 0;
        const tryAdd = (props: IMetaProperty[], prop: IMetaProperty) => {
            if (props.findIndex(x => x.logicalName === prop.logicalName) < 0) {
                props.splice(pos++, 0, prop);
            }
        };
        if (o.type === "many") {
            tryAdd(o.properties, { logicalName: "versionnumber", type: MetaPropertyType.VersionNumber, flags: 0 });
            return;
        }
        tryAdd(o.properties, { logicalName: "id", type: MetaPropertyType.Guid, flags: 0 });
        tryAdd(o.properties, { logicalName: "name", type: MetaPropertyType.String, flags: 0, max: "200" });
        tryAdd(o.properties, { logicalName: "createdon", type: MetaPropertyType.DateTime, flags: 0 });
        tryAdd(o.properties, { logicalName: "createdby", type: MetaPropertyType.Lookup, flags: 0, targets: ["systemuser"] });
        tryAdd(o.properties, { logicalName: "modifiedon", type: MetaPropertyType.DateTime, flags: 0 });
        tryAdd(o.properties, { logicalName: "modifiedby", type: MetaPropertyType.Lookup, flags: 0, targets: ["systemuser"] });
        tryAdd(o.properties, { logicalName: "statuscode", type: MetaPropertyType.Integer, flags: MetaPropertyFlags.OptionSet, options: ["1", "2"] });
        tryAdd(o.properties, { logicalName: "statecode", type: MetaPropertyType.Integer, flags: MetaPropertyFlags.OptionSet, options: ["0", "1"] });
        tryAdd(o.properties, { logicalName: "versionnumber", type: MetaPropertyType.VersionNumber, flags: 0 });
    }

    private static mapPropType(p: IMetaProperty): string {
        switch (p.type) {
            case MetaPropertyType.DateTime:
                return "DATETIMEOFFSET";
            case MetaPropertyType.Money:
                return "MONEY";
            case MetaPropertyType.Decimal:
                return "DECIMAL(23,10)";
            case MetaPropertyType.Float:
                return "FLOAT";
            case MetaPropertyType.Integer:
                return "INT";
            case MetaPropertyType.Boolean:
                return "BIT";
            case MetaPropertyType.Guid:
            case MetaPropertyType.Lookup:
                return "UNIQUEIDENTIFIER";
            case MetaPropertyType.String:
                return p.max ? "NVARCHAR(" + (+p.max) + ")" : "NVARCHAR(max)";
            case MetaPropertyType.Binary:
                return "BINARY";
            case MetaPropertyType.VersionNumber:
                return "ROWVERSION";
            default:
                throw Error("Unknown property type: " + p.logicalName + " type: " + p.type);
        }
    }
}
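To make the output concrete, here is roughly what the generator emits for a customer object that declares a single phonenumber string attribute (assuming the object's primaryKeyName is "id"; every other column is a system field added by ensureSystemFields):

```sql
CREATE TABLE customer(
	id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT newid(),
	name NVARCHAR(200),
	createdon DATETIMEOFFSET,
	createdby UNIQUEIDENTIFIER,
	createdbyTarget NVARCHAR(200),
	modifiedon DATETIMEOFFSET,
	modifiedby UNIQUEIDENTIFIER,
	modifiedbyTarget NVARCHAR(200),
	statuscode INT,
	statecode INT,
	versionnumber ROWVERSION,
	phonenumber NVARCHAR(50)
);
```

Note the extra createdbyTarget/modifiedbyTarget columns: every lookup stores both the target record id and the name of the table it points to.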
The initial creation is only the first step. For real production work we need a way to non-destructively update the database schema, to add and remove objects and attributes.
The great thing about having the schema in JSON is that we can easily manipulate it in code. Conceptually, going from the old schema to the new schema is a new - old subtraction. The result of this operation is a list of CREATE, ALTER and DROP DDL statements.
Clearly we must require that the resulting database schema matches exactly what would be generated if the new schema were translated from scratch (by the code above).
Let's look at the various situations that can occur when updating old to new. First of all, it should be obvious that we can look at each object separately. The idea of the algorithm is: for each object in the new schema, if it also exists in the old schema, diff its attributes; if not, create it. Any object that exists only in the old schema is dropped.
Comparing attributes is really the same idea.
You can visualize this process as closing a zipper: if a tooth is missing on the new side we remove it from the old side, and if the new side has an extra tooth we add it to the old side.
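The attribute comparison can be sketched on its own, independent of any DDL generation (the names here are illustrative, not from the real code):

```typescript
// Given the old and new property name lists, compute what to add and what to drop.
// Names present in both lists are unchanged and need no DDL.
function diffNames(oldNames: string[], newNames: string[]): { toAdd: string[]; toDrop: string[] } {
  const toAdd = newNames.filter(n => !oldNames.includes(n));  // extra tooth on the new side -> ADD
  const toDrop = oldNames.filter(n => !newNames.includes(n)); // missing on the new side -> DROP
  return { toAdd, toDrop };
}

const { toAdd, toDrop } = diffNames(
  ["id", "name", "phonenumber"],
  ["id", "name", "email"],
);
// toAdd = ["email"], toDrop = ["phonenumber"]
```

The real implementation below does the same subtraction, but also detects type changes (drop and re-add) and NVARCHAR size changes (ALTER).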
export class Schema {
    public static generateUpdate(oldArr: IMetaObject[], newArr: IMetaObject[]) {
        let sb = "";
        oldArr = oldArr.slice();
        newArr = newArr.slice();
        for (let o of newArr) {
            let oldObjIdx = oldArr.findIndex(x => x.logicalName === o.logicalName);
            if (oldObjIdx >= 0) {
                Schema.ensureSystemFields(o);
                // diff the table
                let oldObj = oldArr[oldObjIdx];
                let oldProps = oldObj.properties.slice();
                let newProps: IMetaProperty[] = [];
                let alterProps: IMetaProperty[] = [];
                for (let p of o.properties) {
                    let oldIdx = oldProps.findIndex(x => x.logicalName === p.logicalName);
                    if (oldIdx >= 0) {
                        let oldProp = oldProps[oldIdx];
                        if (oldProp.type !== p.type) {
                            // type changed: leave in oldProps -> will be dropped and re-added
                        } else {
                            // unchanged: remove from oldProps
                            if (oldProp.max !== p.max && p.type === MetaPropertyType.String) {
                                // NVARCHAR size changed
                                alterProps.push(p);
                            }
                            oldProps.splice(oldIdx, 1);
                            continue;
                        }
                    }
                    newProps.push(p);
                }
                if (oldProps.length > 0) {
                    sb += "ALTER TABLE " + o.logicalName + " DROP COLUMN\n";
                    let dropColumnNames = oldProps.map(p => p.logicalName);
                    // lookups carry an extra Target column that has to be dropped as well
                    dropColumnNames = dropColumnNames.concat(oldProps.filter(p => p.type === MetaPropertyType.Lookup).map(p => p.logicalName + "Target"));
                    sb += dropColumnNames.join(',');
                    sb += ";\n\n";
                }
                if (alterProps.length > 0) {
                    for (const altProp of alterProps) {
                        sb += "ALTER TABLE " + o.logicalName + " ALTER COLUMN " +
                            Schema.generateProps(o, [altProp]);
                        sb = sb.substring(0, sb.length - 2); // remove the trailing comma and newline
                        sb += ";\n";
                    }
                    sb += "\n\n";
                }
                if (newProps.length > 0) {
                    sb += "ALTER TABLE " + o.logicalName + " ADD\n";
                    sb += Schema.generateProps(o, newProps);
                    sb = sb.substring(0, sb.length - 2); // remove the trailing comma and newline
                    sb += "\n;\n\n";
                }
                oldArr.splice(oldObjIdx, 1);
            } else {
                sb += Schema.generateObject(o);
            }
        }
        // anything left in oldArr no longer exists in the new schema
        for (let o of oldArr) {
            sb += "DROP TABLE " + o.logicalName + " ;\n";
        }
        return sb;
    }
}
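To illustrate: if the only differences between the old and the new schema are that phonenumber grew from NVARCHAR(50) to NVARCHAR(100) and an old (hypothetical) fax attribute disappeared, the update generator emits roughly:

```sql
ALTER TABLE customer DROP COLUMN
fax;

ALTER TABLE customer ALTER COLUMN
	phonenumber NVARCHAR(100);
```

Drops come first, then ALTERs, then ADDs, matching the order of the three blocks in the code above.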
And here are a couple of definitions.
export enum MetaPropertyType {
    String,
    Integer,
    Decimal,
    Float,
    Money,
    DateTime,
    Guid,
    Lookup,
    Boolean,
    Binary,
    VersionNumber,
}
export enum MetaPropertyFlags {
    None = 0,
    MultiLine = 1, // string is multiline, display as a large text area and allow new-lines
    DateOnly = 2, // only store the date part of a DateTime value
    UTC = 4, // do not convert the UTC stored DB value to the client timezone
    Required = 8,
    Nullable = 16,
    OptionSet = 32, // the options property contains the available options
    MultiSelect = 64,
    WriteIn = 128, // to be combined with OptionSet
    Json = 256,
    Url = 512, // string is a url
    Email = 1024, // string is an email
    Phone = 2048, // string is a phone number
    Image = 4096, // string is a base64 encoded image
    File = 8192,
    Duration = 0x4000,
    AutoNumber = 0x8000, // options has the name of the auto-number record or entityName + '.' + propertyName
    Age = 0x10000, // format date as age since now
    Radio = 0x20000, // show options as radio buttons
}
export const MetaPropertyFlagMax = 18;
export interface IRichOption {
    value: string;
    label?: string;
    icon?: string;
    color?: string;
}
export interface IMetaProperty {
    logicalName: string;
    type: MetaPropertyType;
    flags: MetaPropertyFlags;
    min?: string;
    max?: string; // length for strings
    options?: string[];
    richOptions?: IRichOption[];
    targets?: string[];
    precision?: string; // decimal places
    defaultValue?: string;
}
export interface IMetaObject {
    logicalName: string;
    type?: "normal" | "many";
    primaryKeyName?: string; // name of the primary key column, typically "id"
    properties: IMetaProperty[];
}
export interface IMetadata {
    objects: IMetaObject[];
    partial?: boolean; // for importing: whether to only add & update or replace all objects
}
export interface ICurrency {
    id: string;
    name: string;
    symbol: string;
    code: string;
    precision: number;
    mode: number; // 0 - symbol|value, 1 - value|symbol, 2 - code|value, 3 - value|code
    exchangerate: number;
    isdefault: number;
    formatstring?: string;
}
Over the last couple of blogs we've built a JSON-based layer over a (SQL) relational database. This layer can describe and alter the database structure with JSON, and it can query arbitrary data with a JSON-based query language.
Now we can work with a SQL relational database without ever having to write SQL. Moreover, since we can access the schema and build queries in code, we can write functions and UI components that work with any business object!
For example we can build a generic permission layer that uses the schema information to grant or deny access to data records or to dynamically narrow a query (to return only records owned by the user or her teams).
Or we can build a generic grid component that given the schema and json-fetch query can display, sort, group, filter and even edit the resulting records.
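As a small taste of what such schema-driven code looks like, here is a sketch (all names hypothetical) of a helper a generic grid could use to pick default columns for any business object, hiding bookkeeping fields:

```typescript
// Minimal property shape, mirroring IMetaProperty from the definitions above.
interface Prop { logicalName: string; type: number; flags: number; }

// Hypothetical helper: default grid columns for any business object,
// skipping system bookkeeping fields the user rarely wants to see.
function defaultColumns(props: Prop[]): string[] {
  const hidden = ["versionnumber", "statecode"];
  return props.map(p => p.logicalName).filter(n => !hidden.includes(n));
}

defaultColumns([
  { logicalName: "name", type: 0, flags: 0 },
  { logicalName: "statecode", type: 1, flags: 32 },
  { logicalName: "versionnumber", type: 10, flags: 0 },
]);
// -> ["name"]
```

The point is that nothing in this code mentions customer or invoice: it works for every object the schema describes.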
Can we build a generic detail component, a calendar, chart or map component?
Can we build a generic query builder component?
In the following blogs, we shall build just that.