June 26, 2024

Structured Query Language for the Web - 2

Wouldn't it be nice to query your SQL database with json?
But, how can we translate a json base query to SQL?
Let's explore together.

WebSQL

In the previous blog we introduced the json based structure query language, json-fetch.
Now let's look at how we can translate it to SQL.

Definitions

Let's quickly repeat the definitions from the previous blog.

export interface Fetch {
	entity: FetchEntity;
	page?: number;
	pageSize?: number;
	count?: number;
}
export interface FetchEntity {
	name: string;
	attributes?: FetchAttribute[];
	filter?: FetchFilter;
	links?: FetchLink[];
	orders?: FetchOrder[];
	allattrs?: boolean;
}
export interface FetchAttribute {
	attribute: string;
	entityName?: string;
	alias?: string;
	aggregate?: "count" | "countdistinct" | "min" | "max" | "sum" | "average";
	groupBy?: "column" | "year" | "quarter" | "month" | "week" | "day";
	// quarter, month, week -> will group by year first implicitly.
}
export interface FetchOrder {
	attribute: string;
	alias?: string;
	descending?: boolean;
}
export class FetchFilter {
	type?: 'and' | 'or';
	conditions?: FetchCondition[];
	filters?: FetchFilter[];
}
export interface FetchCondition {
	attribute: string;
	entityName?: string;
	operator: FetchConditionOperator;
	value?: string;
	values?: string[];
	valueLabel?: string;
}
export type FetchConditionOperator =
	"eq" |
	"ne" |
	"gt" |
	"lt" |
	"ge" |
	"le" |
	"on" |
	"null" |
	"not-null" |
	"like" |
	"not-like";

export interface FetchLink extends FetchEntity {
	from: string;
	to: string;
	type?: "inner" | "outer";
	alias: string;
}

And, let's look an example query we can make.

let q = {
	entity: {
		name: "customers",
		attributes: [{ attribute: "name" }],
		filter: {
			conditions: [{ attribute: "name", operator: "like", value: "inu %" }]
		}
	},
	orders: [{ attribute: "name" }],
	links: [{
		name: "contact", from: "primarycontactid", to: "id", type: "outer", alias: "k",
		attributes: [{ attribute: "name" }, { attribute: "emailaddress" }, { attribute: "phonenumber" }],
	}
}

Query Validation

Since we are going to generate SQL it is vital we validate the json fetch.
We could make the validation even tougher, setting limits on each type of object (attributes, filters, links, etc.).
And we could also check whether the tables and columns that the fetch refers to actually exists.
However, we leave this to the SQL engine, for simplicity.

First, we will introduce a couple of convenience methods.
This provide a quick check for whether a value is an integer, a (short) text string, or is one of supported values.

const isInteger = (x: any) => {
	if (x === undefined || x === null)
		return false;
	if (x === 0)
		return true;
	return Number.isInteger(+x);
}

const isString = (s: string | undefined, maxLength: number = 100, isOptional: boolean = false) => {
	if (s === undefined || s === null)
		return isOptional;
	if (typeof s !== "string")
		return false;
	if (s.length > maxLength)
		return false;
	return true;
}

const checkInList = (s: string|undefined, list: string[], err: string, isOptional: boolean = false) => {
	if (s === undefined || s === null) {
		if (isOptional)
			return true;
		throw new Error(err);
	}
	if (typeof s !== "string" || list.findIndex(x => x === s) < 0) {
		throw new Error(err);
	}
	return true;
}

The isIdent method checks whether a string is an identifier. Identifiers are entity (table), attribute (column) and alias names.
We enforce a rather strict rule of lower case ascii chars, numbers and underscore.
It is better to be paranoid, than allow an SQL-injection attack through.

const identRegex = new RegExp("^[a-z][a-z0-9_]*$");
const isIdent = (identifier: string) => {
	if (!isString(identifier)) {
		return false;
	}
	return identRegex.test(identifier);
}
const checkIdent = (x: string, msg: string) => {
	if (!isIdent(x))
		throw new Error("Fetch Error: " + msg + " is invalid");
}	

Now the validation main method, will check all the json fetch properties and throw Error if there is something wrong.

The checkEntity function checks

  1. name - whether it is a valid isIdent
  2. filter - calls the checkFilter method
  3. attributes - checks the attribute name to we a valid isIdent
  4. orders - checks the attribute name to we a valid isIdent
  5. links - checks the link properties to be isIdent and calls checkEntity recursively.

The checkFilter function checks whether the recursive filtes are valid.

  1. attribute - checks the attribute name to we a valid isIdent
  2. condition - must be in the list of valid conditions.
  3. we do not check the value - it will be passed as parameter to the SQL engine, which will do a much stricter check.
export const validateFetch = (fetch: Fetch) => {

	if (fetch.count !== undefined && !isInteger(fetch.count))
		throw new Error("Fetch Error: count must be an integer");
	if (fetch.pageSize !== undefined && !isInteger(fetch.pageSize))
		throw  Error("Fetch Error: pageSize must be an integer")
	if (fetch.page !== undefined && !isInteger(fetch.page))
		throw  Error("Fetch Error: page must be an integer")
	if (!fetch.entity || !fetch.entity.name)
		throw new Error("Fetch Error: entity is required");
	
	const checkFilter = (o: FetchFilter) => {
		if (o.type !== undefined && o.type !== "and" && o.type !== "or")
			throw new Error("Fetch Error: filter type is invalid");
		if (o.conditions !== undefined) {
			if (!Array.isArray(o.conditions))
				throw new Error("Fetch Error: conditions are invalid");
			for (const c of o.conditions) {
				checkIdent(c.attribute, "condition attribute");
				//checkIdent(c.operator, "condition operator");
				if (!c.operator || typeof c.operator !== "string" || !/^[a-z\-]+$/.test(c.operator))
					throw new Error("Fetch Error: condition operator invalid");
				if (c.entityName !== undefined)
					checkIdent(c.entityName, "condition entity name");
			}
		}
		if (o.filters != undefined) {
			if (!Array.isArray(o.filters))
				throw new Error("Fetch Error: filters are invalid");
			for (const childfilter of o.filters) {
				checkFilter(childfilter);
			}
		}
	}
	const checkEntity = (e: FetchEntity) => {
		checkIdent(e.name, "entity name");
		if (e.filter !== undefined)
			checkFilter(e.filter);
		if (e.attributes !== undefined) {
			if (!Array.isArray(e.attributes))
				throw new Error("Fetch Error: attributes are invalid");
			for (const a of e.attributes) {
				checkIdent(a.attribute, "attribute name");
				if (a.alias !== undefined)
					checkIdent(a.alias, "attribute alias");
				if (a.aggregate !== undefined)
					checkIdent(a.aggregate, "attribute aggregate");
		
				checkInList(a.groupBy, ["column", "year", "quarter", "month", "week", "day"], "Fetch Error: groupBy is invalid", true);
				checkInList(a.aggregate, ["count", "countdistinct", "min", "max", "sum", "average"] , "Fetch Error: aggregate is invalid", true);
	
				if (a.entityName !== undefined)
					checkIdent(a.entityName, "attribute entityName");
			}
		}
		if (e.allattrs !== undefined && e.allattrs !== true && e.allattrs !== false)
			throw new Error("Fetch Error: allattrs invalid"); 
		if (e.orders !== undefined) {
			if (!Array.isArray(e.orders))
				throw new Error("Fetch Error: orders are invalid");
			for (const ord of e.orders) {
				if (ord.descending !== undefined && ord.descending !== false && ord.descending !== true)
					throw new Error("Fetch Error: order type is invalid");
				checkIdent(ord.attribute, "order attribute");
				if (ord.alias !== undefined)
					checkIdent(ord.alias, "order alias");
			}
		}
		if (e.links !== undefined) {
			if (!Array.isArray(e.links))
				throw new Error("Fetch Error: links are invalid");
			for (const link of e.links) {
				if (link.type !== undefined && link.type !== "inner" && link.type !== "outer")
					throw new Error("Fetch Error: link type is invalid");
				
				checkIdent(link.to, "link to");
				checkIdent(link.from, "link from");
				checkIdent(link.alias, "link alias");
				checkEntity(link);
			}
		}
	}
	checkEntity(fetch.entity);
}

Fetch to SQL translation

Now that we are reasonably sure that query is valid, we can proceed with the translation.

Entry Point

Let's get started with main entry point.
As you can see translate method takes a json fetch and returns the SQL and parameters for the SQL engine.

The inner code is rather simple, it delegates the work to dedicated functions and just concatenates the pieces.

  1. generateSelect - generates the list of columns the SQL query shall return
  2. generateFrom - generates the FROM and JOIN SQL parts
  3. generateWhere - generates the WHERE conditions
  4. generateOrder - generates the ORDER BY ordering of results.
  5. generateGroup - generates the GROUP BY groupping for aggregate queries.
export interface ValueParam {
	name: string;
	value: any;
	entityName: string;
}

export interface IDictionary<V> {
	[index: string]: V;
}

export class ValueList {
	public sqliteParams: boolean = false;
	public values: ValueParam[] = [];
	public makeParam(paramName: string) {
		return this.sqliteParams ? "?" : "@" + paramName;
	}
}

export class FetchToSql {

	public static translate(fetch: Fetch, options: {sqlite?: boolean}):
		{ sql: string, values: ValueList } {

		const condValues = new ValueList();
		condValues.sqliteParams = options.sqlite === true;
		const selectPart = FetchToSql.generateSelect(fetch);
		const fromPart = FetchToSql.generateFrom(fetch, condValues);
		const wherePart = FetchToSql.generateWhere(fetch.entity, "__PRIMARY__", condValues);
		const orderPart = FetchToSql.generateOrder(fetch);
		const groupPart = FetchToSql.generateGroup(fetch.entity);
		const keyword = fetch.distinct ? "SELECT DISTINCT " : "SELECT ";

		let sql = keyword + selectPart + fromPart + wherePart + groupPart + " " + orderPart;
		if (fetch.pageSize) {
			const skip = fetch.page && fetch.page > 1 ? (fetch.page - 1) * fetch.pageSize : 0;
			if (options.sqlite)
				sql += ` LIMIT ${fetch.pageSize} OFFSET ${skip}`;
			else
				sql += ` OFFSET ${skip} ROWS FETCH NEXT ${fetch.pageSize} ROWS ONLY`;
		}
		return { sql: sql, values: condValues };
	}
}

GenerateSelect

This method will take the attributes from the fetch main entity and all child links and generate a string of database colums with aliases.

The final alias for an attribute (eg, the name that will come back from SQL) depends on

  1. Attribute's explicit alias. If set, this is the final name. Must be unique for the query.
  2. Attribute is on the primary (root) fetch entity. In this case the alias is just the name of the column (that means no aliasing)
  3. Attribute is on a linked entity. The final alias is the alias of the link and the column name joined by __ (two underscores).

Json fetch provides higher level groupping operators for date field: year, quarter, month, week, day.
These have to be translated to low-level SQL.
Observe how we add the year value to the group by value and how we keep the week ordered (adding 0 prefix for weeks 1-9).

export class FetchToSql {
	private static generateSelect(fetch: Fetch) {
		let sb = "";

		FetchToSql.visitLinks(fetch.entity, "", (e, alias) => {
	
			if (e.attributes) {

				for (let a of e.attributes) {
					if (sb.length > 0) {
						sb += ", ";
					}
					const prefix1 = (alias || "__PRIMARY__") + ".";
					const prefix2 = alias ? alias + "__" : "";
					const attrAlias = (a.alias || (prefix2 + a.attribute));

					let expr = prefix1 + a.attribute;
					if (a.aggregate) {
						let method = "COUNT(";
						switch (a.aggregate) {
							case "average": method = "AVG("; break;
							case "min": method = "MIN("; break;
							case "max": method = "MAX("; break;
							case "sum": method = "SUM("; break;
							case "countdistinct": method = "COUNT( DISTINCT "; break;
						}
						expr = method + expr + ")";

					} else if (a.groupBy) {
						expr = FetchToSql.makeGroupByExpression(a, expr);
					}
					
					attr = expr + " AS " + attrAlias;
					sb += attr;
				}
			}
		});

		return sb;
	}

	private static makeGroupByExpression(a: FetchAttribute, expr: string) {
		switch (a.groupBy) {
			case "year": expr = "DATEPART(YEAR," + expr + ")"; break;
			case "quarter": expr = "CONCAT(DATEPART(YEAR," + expr + "),'_',DATEPART(QUARTER," + expr + "))"; break;
			case "month": expr = "FORMAT(" + expr + ", 'yyyy_MM')"; break;
			case "week": expr = "CONCAT(DATEPART(YEAR," + expr + "),'_',RIGHT(CONCAT('00', DATEPART(WEEK, "+expr+")), 2))"; break;
			case "day": expr = "FORMAT(" + expr + ", 'yyyy_MM_dd')"; break;
		}
		return expr;
	}
	private static visitLinks(e: FetchEntity, alias: string, visitor: (entity: FetchEntity, alias: string) => void) {
		visitor(e, alias);
		if (e.links) {
			for (const link of e.links) {
				this.visitLinks(link, link.alias, visitor);
			}
		}
	}
}

GenerateFrom

Here we need to tell the SQL engine from which table(s) to return rows.
Observe that the primary fetch entity has the special alias __PRIMARY__.
This is needed because of self-joins.

You can also see in the code that the filter-conditions on a linked entity are generated inside the JOIN ON () statement, by calling generateWhere.

export class FetchToSql {
	private static generateFrom(fetch: Fetch, condValues: ValueList) {
		const entityName = fetch.entity.name;
		let sb = " FROM " + entityName + " AS __PRIMARY__ ";

		const handleLinks = (primaryAlias: string, links: FetchLink[]) => {
			for (const link of links) {
			
				const isOuter = (link.type === "outer");
					
				sb += " " + (isOuter ? "LEFT" : "INNER") + " JOIN " + link.name + " AS " + link.alias +
					" ON (" + link.alias + "." + link.to + "=" + primaryAlias + "." + link.from;
				
				if (link.filter) {
					sb += FetchToSql.generateWhere(link, link.alias, condValues, " AND ");
				}
				
				sb += ")";
			}
			for (const link of links) {
				if (link.links) {
					handleLinks(link.alias, link.links);
				}
			}
		};

		if (fetch.entity.links) {
			handleLinks("__PRIMARY__", fetch.entity.links);
		}
		return sb;
	}
}

GenerateWhere

This is the real meat of the translation. Since efficiently retrieving only the data we need is the reason we turn to SQL.

  1. The code will first check whether the filter is empty and bail out if so.

  2. Then it will go over the list of conditons and generate a list of SQL conditons (name LIKE 'abc%') joined by the filter type (and, or).

  3. Finally it will call itself recursively for each sub-filter and again join the generated SQL by the filter type.

export class FetchToSql {
	private static generateWhere(entity: FetchEntity, alias: string, condValues: ValueList, keyword = " WHERE ") {

		const isEmptyFilter = (f: FetchFilter | undefined): boolean => {
			return (!f || ((!f.conditions || f.conditions.length === 0) && (!f.filters || f.filters.every(isEmptyFilter))));
		}

		let f = entity.filter;
		if (!f || isEmptyFilter(f)) return "";

		let sb = keyword + " (";

		const handleFilter = (f: FetchFilter) => {
		
			let fop = (f.type && f.type === "or") ? " or " : " and ";
			let first = true;
			if (f.conditions && f.conditions.length > 0) {
				for (const cond of f.conditions) {
					if (!first) {
						sb += fop;
					}
					first = false;
					sb += "(" + FetchToSql.generateCondition(cond, entity.name, alias, condValues) + ")";
				}
			}
			
			if (f.filters && f.filters.length > 0) {
				for (const ff of f.filters) {
					if (isEmptyFilter(ff)) continue;
					if (!first) {
						sb += fop;
					}
					first = false;
					sb += "(";
					handleFilter(ff);
					sb += ")"
				}
			}
		}
		
		handleFilter(f);

		sb += ")";
		return sb;
	}
}

Now let's look at what the generateCondition method is doing.

We switch over the list of operators and generate the corresponding SQL equivalent.
We have to ensure we have the correct prefix for the condition attribute (for conditions on links).

And we use the ValueList.makeParam to generate the SQL parameter placeholder.
(MSSQL uses explicit parameter names, while SQLite works with '?' and simple indexing).

As a convenience we generate not-equal conditons as (not-equal or NULL) as this is what we humans intuitively assume should happen.

export class FetchToSql {
	private static generateCondition(cond: FetchCondition, entityName: string, alias: string, condValues: ValueList) {

		const getParam = () => {
			const pname = "P" + condValues.values.length;
			const meta = condValues.meta[cond.entityName || entityName];
			if (!meta)
				throw new Error("Object not found: " + entityName);
			
			condValues.values.push({ name: pname, entityName: meta.logicalName, value: value });
			return condValues.makeParam(pname);
		};

		const op = cond.operator;
		const prefix = cond.entityName || alias;
		const attrName = prefix ? (prefix + "." + cond.attribute) : cond.attribute;
		let sb = attrName;
		switch (op) {
			case "null": sb += " IS NULL"; break;
			case "not-null": sb += " IS NOT NULL"; break;
			case "eq": sb += "=" + getParam(); break;
			case "ne": sb += "<>" + getParam() + " OR " + attrName + " IS NULL"; break;
			case "lt": sb += "<" + getParam(); break;
			case "le": sb += "<=" + getParam(); break;
			case "gt": sb += ">" + getParam(); break;
			case "ge": sb += ">=" + getParam(); break;
			case "like": sb += " LIKE " + getParam(); break;
			case "not-like": sb += " NOT LIKE " + getParam() + " OR " + attrName + " IS NULL"; break;
			default:
				throw new Error("Unsupported operator: " + op);
		}
		return sb;
	}
}

GenerateOrder

Ordering of results is a simple afair. The only complication is if we want to sort by a linked entity attribute.

We will automatically sort by id for non-aggreate queries.

export class FetchToSql {
	private static generateOrder(fetch: Fetch) {
		let orders = fetch.entity.orders || [];

		const isAggregate = fetch.entity.attributes && fetch.entity.attributes.findIndex(x => !!x.aggregate) >= 0;
		if (isAggregate && orders.length === 0) {
			return " ";
		}

		if (!isAggregate && orders.findIndex(o => o.attribute === "id") < 0) {
			orders = orders.concat({ attribute: 'id' });
		}

		let sb = " ORDER BY ";

		for (const o of orders) {
			if (o.alias) {
				let aliasedOrder = o.alias;
				sb += aliasedOrder;
			} else {
				let name = "__PRIMARY__." + o.attribute;
				sb += name;	
			}
			sb += o.descending ? " DESC," : ",";
		}
		sb = sb.substring(0, sb.length - 1);

		return sb;
	}
}

GenerateGroup

Groupping of results is useful for aggregate queries. This is a very straightforward translation.

export class FetchToSql {
	private static generateGroup(entity: FetchEntity, sqlite: boolean = false, entityAlias: string = "__PRIMARY__.") {
		let sb = "";
		for (const attr of entity.attributes || []) {
			if (attr.groupBy) {
				if (!sb)
					sb += " GROUP BY "
				else 
					sb += ", "
				// Remember the SELECT alias link_alias + "__" + column_name is not available in group by. We have to fully qualify the name
				sb += FetchToSql.makeGroupByExpression(attr, entityAlias + attr.attribute);
			}
		}
		if (entity.links) {
			for (const link of entity.links) {
				sb += " " + this.generateGroup(link, sqlite, link.alias + ".");
			}
		}
		return sb;
	}
}

Wrap

As you can see a json-fetch language can be both simple, easy to use and straightforward to translate to SQL.
Of course, the code above is just the start, but we kept it simple, so you'll be able to adapt it to your needs.

If you have any questions, suggestions or corrections, please let us know!

Happy hacking!