May 29, 2024

Run code on schedule

Do you want to automate common workflows?
Do you need to send weekly reminders?
Do you require monthly sales totals?
Is it something else for each customer?

Scheduled action

Motivation

Every service will eventually evolve to the point where you need to run code at a specific time, usually repeatedly.

There are many reasons why this is needed. One example is synchronizing data from other systems, such as fetching the daily exchange rates from the central bank.

Another example is calculating various statistics daily, weekly or even yearly, and then sending them by email to managers. This can be extended with thresholds, e.g. running a specific action (such as sending an email) when a calculated number is above or below a certain value.

One of the most common scenarios is automation. For example, if a client started shopping, but the shopping-cart record is still in the open state and hasn’t been modified since yesterday, let’s send the client an email. Granted, this is just an illustration. There are ready-made services for sales and marketing automation with much more sophisticated trigger-action workflows.

Finally, a great use of scheduled actions is reminders. Whatever your business processes are, it is certain that there are deadlines. You can use scheduled actions to catch missed deadlines and notify the user (and/or the manager). For example: has the case been open for one day without a response? Was the AAA-rated client contacted in the last 30 days?

The next step on the evolution ladder is to allow a customer to define the schedule and the action to be taken. For SaaS products the added complication is that the action must be properly secured. It mustn’t interfere with other customers, and it must not compromise the availability of the service. We have explored the topic of untrusted code safety and security in detail in this blog.
Quick recap: we want our customers to be able to upload and execute virtually any JavaScript, but constrained in time, memory and access, so that only the customer’s own data is reachable.

In this blog we shall look at how we can run code, trusted or untrusted, on schedule.

Defining the schedule

Conceptually this is really straightforward.
The schedule is a list of actions: when to execute and what to execute.
All we need is a runScheduledAction method that reads the list and runs the actions.

As always, the devil is in the detail.

Sidenote
You could use an external service or an OS component for the above (like Unix cron or Windows Task Scheduler).
We believe implementing the functionality within the application is more flexible, easier to maintain, cleaner and more portable.
(There might be limits on the number of OS tasks, which OS user can create tasks, how the OS task scheduler can call our service, etc.)

Our implementation will be in Node.js and TypeScript.

The list of actions must be durable, so we store it in a SQL table. Other options are a NoSQL document or even a file, but multiple services might access the list in parallel, so some form of locking will be needed down the road.

Each action has a scheduled-date and code to execute. We also want to mark when we execute the action and the result of the code.
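As a rough sketch, an action record could look like the interface below. The field names follow the code used later in this post, but the exact types and the pendingActions helper are assumptions made for illustration, not the actual schema.

```typescript
// Hypothetical shape of one row in the scheduledactions table.
// Field names mirror the snippets in this post; the types are assumptions.
interface IAction {
	id: string;
	scheduleId?: string;   // set only for recurring actions
	scriptid: string;      // identifies the code to execute
	scheduledate: string;  // ISO 8601 timestamp: when to run
	status?: "done";       // undefined (NULL in SQL) until a service claims the action
	startdate?: string;    // when execution started
	enddate?: string;      // when execution finished
	result?: string;       // result of the code
	error?: string;        // error message if the code threw
}

// Pending actions, oldest first: an in-memory equivalent of
// "WHERE status IS NULL ORDER BY scheduledate".
const pendingActions = (actions: IAction[]): IAction[] =>
	actions
		.filter(a => !a.status)
		.sort((a, b) => Date.parse(a.scheduledate) - Date.parse(b.scheduledate));
```

In the real service the filtering and sorting happen in the database; the helper only illustrates which records count as "not yet run".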

The runScheduledAction method has a few steps.
Figure out what to run:

  1. Read the list of actions, keep only those not yet run, and sort them by scheduled-date.
  2. Take the oldest action.

Then decide:

  1. The scheduled date is in the past - execute now.
    1. Was this a repeatable action? Add the next occurrence to the list of actions.
    2. Set a timeout of 1 ms - to handle the next action in the list.
  2. The scheduled date is in the future - set a timeout for that time.
  3. There is no action. We are done.

In code:

const runScheduledAction = async (orgName: string) => {
	let reschedule = false;
	let rescheduleTimeout = 1;

	try {
		const actions = await readNextAction(); // SELECT * FROM scheduledactions WHERE status IS NULL ORDER BY scheduledate
		const action = actions && actions[0];
		if (!action)
			return; 

		reschedule = true;
		const scheduled = new Date(action.scheduledate).valueOf(); // the scheduled date (the column the query sorts by), not the creation date
		const now = Date.now();
		if (scheduled > now) {
			rescheduleTimeout = scheduled - now;
			if (rescheduleTimeout > 86400 * 1000 * 10) // must fit into int32
				rescheduleTimeout = 86400 * 1000 * 10;
		} else {			
			action.startdate = new Date().toISOString();
			action.status = "done";
			const recordsModified = await updateActionRecord(orgName, action);
			if (recordsModified === 0) {
				// race condition: somebody else modified the record and is executing the action.
				// move on to the next action; the 1 ms timeout is set in the finally block
				return;
			}

			try {
				const scriptResult = await executeAction(orgName, action);
				
				action.enddate = new Date().toISOString();
				action.result = "ok " + scriptResult;
				await updateActionRecord(orgName, action);
			}
			catch (e) {
				action.enddate = new Date().toISOString();
				action.error = "Exception: " + e;
				await updateActionRecord(orgName, action);
			}
		}
	}
	catch (e) {
		logger("ERROR", "runScheduledAction", e);
	}
	finally {
		reschedule = reschedule || ORG_LOCK[orgName] > 1;
		ORG_LOCK[orgName] = 0;
		if (ORG_TIMER[orgName]) {
			clearTimeout(ORG_TIMER[orgName]);
			ORG_TIMER[orgName] = 0;
		}

		if (reschedule) {
			ORG_TIMER[orgName] = setTimeout(() => scheduleActions(orgName), rescheduleTimeout);
		}
	}
}

const ORG_LOCK: { [name: string]: number } = {};
const ORG_TIMER: { [name: string]: any } = {};

const scheduleActions = (orgName: string) => {

	// async code still executing?
	if (ORG_LOCK[orgName]) {
		ORG_LOCK[orgName]++;
		return;
	}
	ORG_LOCK[orgName] = 1;
	/* no await */ runScheduledAction(orgName);
}
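The decision part of runScheduledAction can also be isolated as a pure function, which makes it easy to unit test without a database or timers. The following is only a sketch: the names decideNextStep, NextStep and MAX_WAIT_MS are made up for this illustration. The ten-day cap matches the code above; it keeps the delay well below 2^31 - 1 ms, the largest delay setTimeout accepts (Node.js silently replaces a larger delay with 1 ms).

```typescript
// Cap for a single wait: ten days, the same limit used in runScheduledAction.
// It stays far below 2**31 - 1 ms, the maximum delay setTimeout supports.
const MAX_WAIT_MS = 10 * 24 * 60 * 60 * 1000;

type NextStep =
	| { kind: "idle" }                    // nothing is scheduled
	| { kind: "run" }                     // due now or overdue
	| { kind: "wait"; delayMs: number };  // due in the future

// Hypothetical helper: decides what to do with the oldest pending action.
const decideNextStep = (scheduledAt: Date | undefined, now: Date): NextStep => {
	if (!scheduledAt) return { kind: "idle" };
	const diff = scheduledAt.getTime() - now.getTime();
	if (diff <= 0) return { kind: "run" };
	// After a capped wait the scheduler wakes up early, sees the action is
	// still in the future, and simply waits again.
	return { kind: "wait", delayMs: Math.min(diff, MAX_WAIT_MS) };
};
```

Waking up early is harmless, because each wake-up re-reads the list and recomputes the remaining delay.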

To get the ball rolling, this method has to be called on server startup. In a SaaS multi-tenant scenario, it has to be run for each tenant (if you store the action table per tenant, which you should ;).

We also have to be really careful with running the code on multiple service instances (multiple servers) at the same time. Here we have two choices.

  1. Either always run the code (for a specific customer-tenant) only on one service instance (server, machine).
  2. Or make the first step, taking the oldest action, atomically remove the action from the list. In SQL we can use optimistic locking: update the action record to ‘done’ right away, but only if the modified timestamp is the same as when we read the record. If the update modifies no record, we know another service got there first, so we can skip this action and try the next one.
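To make the optimistic lock more concrete, here is a small in-memory simulation. It is a sketch, not the service's actual data layer: the ActionRow type, the modifiedon column and the tryClaim function are assumptions made for this illustration. In a database the same idea is a single UPDATE statement whose WHERE clause includes the timestamp that was read earlier.

```typescript
// Hypothetical row shape; modifiedon is the timestamp used as the lock token.
interface ActionRow {
	id: string;
	status: string | null;
	modifiedon: string;
}

// In-memory stand-in for a conditional update such as (column names assumed):
//   UPDATE scheduledactions SET status = 'done', modifiedon = @now
//   WHERE id = @id AND modifiedon = @expected
// Returns the number of modified rows: 1 if we claimed the action, 0 otherwise.
const tryClaim = (
	table: Map<string, ActionRow>,
	id: string,
	expectedModifiedOn: string,
	now: string
): number => {
	const row = table.get(id);
	if (!row || row.modifiedon !== expectedModifiedOn) {
		return 0; // somebody else modified (claimed) the record first
	}
	row.status = "done";
	row.modifiedon = now;
	return 1;
};
```

Two service instances that read the same record both hold the same modifiedon value, but only the first conditional update matches it. The second one modifies zero rows and moves on to the next action.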

Sleep, work, eat, repeat.

You might say, this is all nice and simple, but what about repeatable actions? How do the actions get on the list? Where in the runScheduledAction method is the code that schedules the next occurrence? Here we go.

First we need to add an identifier to each action - to connect it with its schedule - the scheduleId.

You have to decide (or let your customers decide) what to do if an occurrence is missed.
This can happen because of an error, service unavailability, or because the action (or another one) ran so long that it missed the deadline. For sales statistics, it is important to run the missed occurrences. For reminders, it is better not to spam users. Thus one size does not fit all, and it might be best to leave it configurable.
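One possible way to make this configurable is a per-schedule policy that decides from which date the search for the next occurrence starts. The sketch below is an assumption for illustration: the MissedPolicy type and the nextSearchStart function are hypothetical and are not part of the code shown in this post.

```typescript
// Hypothetical policy: "catch-up" runs every missed occurrence,
// "skip" jumps straight to the first occurrence after the current time.
type MissedPolicy = "catch-up" | "skip";

// Returns the date after which the next occurrence should be searched.
const nextSearchStart = (policy: MissedPolicy, lastScheduled: Date, now: Date): Date =>
	policy === "catch-up"
		// Continue from the last scheduled date: occurrences that are already
		// in the past are created too and will execute immediately.
		? lastScheduled
		// Never search before "now": missed occurrences are dropped.
		: new Date(Math.max(lastScheduled.getTime(), now.getTime()));
```

With "catch-up", a daily sales report that was down for three days produces three runs in quick succession; with "skip", a daily reminder produces only the next regular one.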

Then we need to define the recurrence pattern - when to repeat the actions. What are the options?

  1. Roll your own. Make no mistake, what looks like a simple frequency choice (daily, weekly,…) is actually very complex.
  2. Cron (format) - familiar to most admins, as it is the Unix task scheduler.
  3. RRule - the calendaring standard - familiar to most users who have ever created a repeating calendar event in Outlook or a similar tool. https://datatracker.ietf.org/doc/html/rfc5545#section-3.8.5.3

Because RRule is very flexible and hopefully familiar to non-admin users too, we use option 3.

Warning: RRule is complex enough to let you define a rule that will never occur! Make sure you catch this so that you don’t introduce an endless loop into your system.
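One simple guard, also used in the code further down, is to cap open-ended rules with an UNTIL date so that a search for the next occurrence always has an end. The helper below is a sketch under assumptions: capRule and toRRuleUtc are made-up names, and the input is assumed to be a plain rule string such as FREQ=WEEKLY;BYDAY=MO with the rule parts at the end of the string.

```typescript
// Formats a Date as an RFC 5545 UTC date-time, e.g. 20340101T000000Z.
const toRRuleUtc = (d: Date): string =>
	d.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");

// Appends UNTIL=<maxDate> to a rule that has no end yet.
const capRule = (rulePattern: string, maxDate: Date): string => {
	// RFC 5545 does not allow UNTIL and COUNT in the same rule, so a rule
	// that already has either one is returned unchanged. Note that a COUNT
	// rule that never matches would need a different guard.
	if (/(^|[;:])(UNTIL|COUNT)=/i.test(rulePattern)) {
		return rulePattern;
	}
	return `${rulePattern};UNTIL=${toRRuleUtc(maxDate)}`;
};
```

For example, capRule("FREQ=DAILY", new Date("2034-01-01T00:00:00Z")) yields FREQ=DAILY;UNTIL=20340101T000000Z.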

Instead of creating our own parser, we can rely on existing packages. For Node.js we can install and use the rrule package. https://www.npmjs.com/package/rrule/v/2.1.0

The author of the package helpfully provides a playground where you can experiment with various rules and see how they are serialized: https://jkbrzt.github.io/rrule/

Creating a good UX for recurrence rules is a topic for another blog. If you ever tried to create a more advanced repeatable calendar event you might remember it can get confusing really fast. Thus it is best to limit the options or to reveal them gradually.
A nice feature of the rrule package is the human-readable rule description, which the user can use as a mental check that the configured rule matches the intention.
Another friendly UX idea is to show the next date(s) the rule defines, again to ensure that what the user configured matches their intention.
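A preview of the next dates can be built on top of any object that offers an after(date) method, which is the method the code in this post calls on an RRule. The sketch below is library-agnostic on purpose: the OccurrenceSource interface and the previewOccurrences function are assumptions for illustration, and the preview is bounded by count so it cannot loop forever.

```typescript
// Minimal shape this sketch needs: returns the first occurrence strictly
// after the given date, or null if there is none.
interface OccurrenceSource {
	after(date: Date): Date | null;
}

// Collects up to `count` upcoming occurrences, starting after `from`.
const previewOccurrences = (rule: OccurrenceSource, from: Date, count: number): Date[] => {
	const dates: Date[] = [];
	let cursor = from;
	for (let i = 0; i < count; i++) {
		const next = rule.after(cursor);
		if (!next) break; // the rule has ended
		dates.push(next);
		cursor = next;    // continue searching after the occurrence just found
	}
	return dates;
};
```

Showing three to five dates next to the rule is usually enough for the user to spot a misconfigured weekday or interval.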

With the help of the rrule package, the next-occurrence method is only a couple of lines. We will keep the recurrence rule (RRule) in a SQL table called schedules.


export const scheduleNextAction = async (db: DatabaseConnection, scheduleId: string, startDateStr?: string) => {
	try {
		const startDate = (startDateStr && new Date(startDateStr)) || new Date();

		if (!startDateStr) {
			// the schedule itself changed: remove any pending (not yet run) actions for it
			await db.executeNonQuery("DELETE FROM scheduledactions WHERE scheduleId=@P0 AND status IS NULL",
				[{ name: "P0", type: TYPES.NVarChar, value: scheduleId }]);
		}

		const schedules = await db.executeQuery("SELECT * FROM schedules WHERE id=@P0 FOR JSON PATH", true, 
			[{ name: "P0", type: TYPES.UniqueIdentifier, value: scheduleId }]) as DataTrigger[];
		const schedule = schedules && schedules[0];
		if (!schedule)
			return; // schedule was deleted. we are done.

		let rulePattern = schedule.rule;
		if (rulePattern && rulePattern.indexOf("UNTIL=") < 0)
			rulePattern += ";UNTIL=20340101T000000Z"; // add a maximum date!
		const rule = RRule.fromString(rulePattern); // parse the capped pattern, not the original schedule.rule
		const nextDate = rule.after(startDate);

		if (nextDate) {
			await createActionRecord(db, SYS_ADMIN, scheduleId, nextDate.toISOString());
		}
	}
	catch (ex) {
		logger("ERROR", "scheduleNextAction", ex);	
	}
}

const createActionRecord = (db: DatabaseConnection, user: {id: string}, scheduleId?: string, scheduledDate?: string) => {
	// 1. write action record to db
	// ...
	
	// 2. ask to run the schedule method again:
	scheduleActions(db.orgName);
}

Setting the clock

We will call scheduleNextAction whenever the schedule changes and whenever an action completes.

Whenever a record is modified, the service will call this generic method. If a schedules record is modified (created, updated or deleted), we will call scheduleNextAction.

export const onRecordChanged = async (db: DatabaseConnection, user: {id: string}, operation: string, 
				objectName: string, id: string, prevRecord: any, record: any) => {

	if (objectName === "schedules") {
		await scheduleNextAction(db, id, undefined);
	}
	/// rest of the method, handling other situations...

This method is used to update action records. Completed actions have an enddate attribute set.

export const updateActionRecord = async (orgName: string, action: IAction) => {
	// NOTE: db is assumed to be the tenant's DatabaseConnection, resolved from orgName (lookup not shown).
	const params = convertActionToSqlParams(action);
	// Assumption: updateRowWithId resolves to the number of modified rows, which runScheduledAction checks.
	const recordsModified = await db.updateRowWithId("scheduledactions", [{ name: "id", type: TYPES.UniqueIdentifier, value: action.id }], params);
	
	// The enddate check ensures we don't reschedule when we mark the action record as "done" before the code has executed.
	// If the action has a schedule -> add a new occurrence.
	if (action.enddate && action.scheduleId) {
		await scheduleNextAction(db, action.scheduleId, action.startdate);
	}
	return recordsModified;
}
				

End of times

Every mature service needs to provide a run-action-at-date feature.
It can be as simple as a choice of predefined code and frequency.
Or it can be a customer-parametrized query with results sent by email.
Or it can be as complex as dynamic, customer-defined code.

In this blog we've implemented the method that chooses what to run and when.
And we explored the options for declaring repeated (recurring) actions.

We identified and discussed the common pitfalls and edge cases that came up along the way.

I hope the arguments and/or the code will be useful, and if there is anything you'd add, remove or change, please let me know.

Happy hacking!