January 19, 2024

Blobs for fun and profit

A picture is worth a thousand words.
But it is also worth a million bytes.
Let's look at how we can utilize Azure Blobs for efficient multimedia and document storage.

Blob Storage

Motivation

Imagine you are building an app.
It wouldn't be a very interesting app if there was no data :)
In the broadest terms we have two basic types of data.
And you will use both of them.

First is the structured data.

For example you will have a list of users.
That is a collection of items where each item represents a user that has a name, email, phone, birthday, etc.
You might have a collection of groups of users or a collection of bikes per user.
The common theme here is that we have items (objects, tables, entities, whatever you want to call them).
They can be related to each other (group-user, user-bike) and can have internal fields (name, email, etc.).

Second is the unstructured data.

For example you will want to have a profile picture for your users.
Now of course the image data has an internal structure, but from the point of view of your app it is just a bunch of bytes.
We will take these bytes and put them in an <img /> tag.

Because we are not officers of the Bureau of Data Classification, we don't call it unstructured data.
Instead we call it a BLOB - Binary Large OBject.

Database

Databases were invented to help us manage structured data: to store and find it in an efficient and performant way.
These days all databases have a way to deal with BLOBs, so you could in theory put all your data in a db.
But there are good reasons not to:

  1. Money - cloud managed databases have expensive storage. Storing a photo in a DB can be 10x more expensive.
  2. Performance - BLOBs will grow your DB fast, making caching harder. DB engines try to work around this, but it requires tuning.
  3. Backup - You might not want or need to back up your blobs as often as the rest of your data, and keeping them in the DB makes your backups needlessly complex.
  4. Access - For displaying images or documents we usually need a link. If the BLOB is in the database, you will have to put it in some temp storage first.

So what is the alternative?
We will store BLOBs outside of the db and store a link to the BLOB in the database instead.
Easy peasy lemon squeezy.

Blob Storage

All cloud providers offer a blob (or file, bucket, bin) storage solution.
For this blog we will look at Azure Blob Storage.
But the concept is the same whether you want to use Amazon S3 or Google Cloud Storage.

Plan

We will build a trivial web app with a JS frontend and NodeJS backend.
The only functionality will be that a user can upload and view their profile info.

We assume that users have to log in. And of course their profile images must not be publicly accessible.
For simplicity we will not deal with the authentication code.

The frontend will upload a user record with structured and unstructured data as JSON to the backend.
The backend will remove the blob from the JSON, store it in Azure Blob Storage, and store a link to the BLOB in the database.

The frontend will use the BLOB link to display the user's profile image in an <img> tag.

Frontend upload

Very simplified upload code:

const fileToBase64 = async (file) => {
	const promise = new Promise((resolve, reject) => {
		const reader = new FileReader();
		reader.readAsDataURL(file);
		reader.onload = () => resolve(reader.result);
		reader.onerror = error => reject(error);
	});
	return promise; 
}

const onUploadFile = async (e) => {
	const files = e.target.files;
	if (files && files.length > 0) {
		const f = files[0];
		const base64body = await fileToBase64(f);
		
		const user = { userName: "chuliomartinez", profileImage: base64body, fileName: f.name };

		await fetch("/api/uploaduser", {
			method: "POST",
			body: JSON.stringify(user),
			headers: {
				"Content-Type": "application/json"
			}
		});
	}
}
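
Sending the file as base64 inside JSON keeps the example short, but it has a cost: base64 encodes every 3 bytes as 4 characters, so the request body is roughly a third larger than the file itself. A small sketch to estimate that (the helper name base64Length is made up for illustration):

```typescript
// Length of the padded base64 text produced for `byteLength` raw bytes.
// Every group of 3 bytes becomes 4 characters; a partial last group is padded.
const base64Length = (byteLength: number): number => {
	return Math.ceil(byteLength / 3) * 4;
};

// A 3 MB photo ends up as about 4 MB of text in the JSON body.
const photoBytes = 3 * 1024 * 1024;
console.log(base64Length(photoBytes)); // 4194304
```

This is why the JSON body limit on the backend has to be noticeably higher than the largest file you want to accept.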

Backend upload to Azure Blob Storage

Here is the handling of the call in NodeJS.
To work with Azure we import the @azure/storage-blob package.
BLOBs are stored in the profiles container.
Each blob will have a generated name and a content-based extension.
An alternative is to use the original fileName and add some random bytes (because many users might try to upload a file called profile.jpg).
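
A rough sketch of that alternative naming scheme, in case you prefer readable blob names. The helper makeUniqueBlobName is hypothetical and is not used in the code below; the 8 random bytes and the character whitelist are arbitrary choices:

```typescript
import { randomBytes } from "crypto";

// Hypothetical helper: keep the user's file name readable, but add a random
// prefix so two users uploading "profile.jpg" do not overwrite each other.
const makeUniqueBlobName = (fileName: string): string => {
	// Drop any path part and replace characters that are awkward in a URL.
	const base = fileName.split(/[\\/]/).pop() || "file";
	const safe = base.replace(/[^A-Za-z0-9._-]/g, "_");
	const random = randomBytes(8).toString("hex"); // 16 hex characters
	return random + "-" + safe;
};

console.log(makeUniqueBlobName("profile.jpg")); // e.g. "<16 hex chars>-profile.jpg"
```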

Note that while this is not shown in the code below, the profiles container is private.

import { BlobServiceClient, ContainerClient, ContainerSASPermissions } from "@azure/storage-blob";
import crypto from "crypto";

const connectToAzureBlob = () => {
	const connectionString = "DefaultEndpointsProtocol=https;AccountName=********";

	const blobServiceClient = BlobServiceClient.fromConnectionString(connectionString);
	return blobServiceClient;
}

export const AZURE_BLOB_PROTO = "azblob://";

const isBlobName = (name: string) => {
	return name && name.startsWith(AZURE_BLOB_PROTO)
}

const getBlobName = (name: string) => {
	if (isBlobName(name)) {
		return name.substring(AZURE_BLOB_PROTO.length);
	}
	return null;
}

const uploadBlobToAzure = async (cont: ContainerClient, data: string) => {
	
	const b = crypto.randomBytes(50);
	const s = b.toString("hex");
	let blobName = s;
	const typePart = (data || "").substring(0, 1000);
	if (typePart.indexOf("image/") > 0)
		blobName += ".image";
	else if (typePart.indexOf("application/pdf") > 0)
		blobName += ".pdf";
	else if (typePart.indexOf("application/vnd.openxmlformats-officedocument.presentationml.presentation") > 0)
		blobName += ".pptx";
	else if (typePart.indexOf("application/vnd.openxmlformats-officedocument.wordprocessingml.document") > 0)
		blobName += ".docx";
	else if (typePart.indexOf("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet") > 0)
		blobName += ".xlsx";
	
	const buffer = Buffer.from(data.substring(data.lastIndexOf(',') + 1), "base64");

	const client = cont.getBlockBlobClient(blobName);
	await client.uploadData(buffer);
	return AZURE_BLOB_PROTO + blobName;
}

interface IApplicationUser {
	userName: string;
	profileImage: string;
}

const saveBlobsToAzure = async (user: IApplicationUser) => {
	const containerName = "profiles";
	const top = connectToAzureBlob();
	const cont = top.getContainerClient(containerName);
	
	const data = user.profileImage;

	if (data && !isBlobName(data)) { // we have base64 data -> save it
		user.profileImage = await uploadBlobToAzure(cont, data);
	}
}

// might be useful for testing
app.use(express.json({limit:"50mb"}));

app.post("/api/uploaduser", async (req, res) => {
	const user = (req.body as IApplicationUser);

	await saveBlobsToAzure(user);

	await saveUserToDatabase(user);

	res.send("ok");
});
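
The extension detection in uploadBlobToAzure searches the first 1000 characters of the data for a MIME type. A slightly stricter variant, shown here only as a sketch, parses the data URL header explicitly. The helper names parseDataUrl and extensionForMimeType are made up, and the sketch assumes the plain `data:<type>;base64,<payload>` shape that FileReader.readAsDataURL produces:

```typescript
// Hypothetical helper: split a data URL such as "data:image/png;base64,iVBOR..."
// into its MIME type and its base64 payload. Returns null for anything else.
const parseDataUrl = (data: string): { mimeType: string; base64: string } | null => {
	const match = /^data:([^;,]+);base64,([\s\S]*)$/.exec(data);
	if (!match) return null;
	return { mimeType: match[1].toLowerCase(), base64: match[2] };
};

// Same mapping as in uploadBlobToAzure, keyed by the parsed MIME type.
const extensionForMimeType = (mimeType: string): string => {
	if (mimeType.startsWith("image/")) return ".image";
	switch (mimeType) {
		case "application/pdf":
			return ".pdf";
		case "application/vnd.openxmlformats-officedocument.presentationml.presentation":
			return ".pptx";
		case "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
			return ".docx";
		case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":
			return ".xlsx";
		default:
			return ""; // unknown type: no extension, as in the original code
	}
};
```

With this, the payload is parsed.base64 and the blob name becomes the random hex string plus extensionForMimeType(parsed.mimeType).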

Frontend display image BLOB

The code below checks if it got an azure blob link.
If yes, the link is transformed into a link that we can pass to the browser.

Now the raw link to the Azure Blob Storage is not visible to the public, because the container is private.
To solve this issue we will use a SAS Token.
You can read more about them here: SAS Tokens
We will append the token to the raw link and voila the browser can load the image.

Here is the NodeJS code to get the access token.


export const getAccessTokenForContainer = async (container: string) => {
	const top = connectToAzureBlob();
	const cont = top.getContainerClient(container);
	await cont.createIfNotExists();
	const sas = await cont.generateSasUrl({
		"permissions": ContainerSASPermissions.from({ read: true, write: false }),
		"expiresOn": new Date(new Date().valueOf() + (1000 * 86400))
	});
	return sas.substring(sas.indexOf("?") + 1);
}

app.get("/api/sas/profiles", async (req, res) => {
	// FIXME: at this point we should have an authenticated session!
	const token = await getAccessTokenForContainer("profiles");
	res.type("text/plain");
	res.send(token)
});

Now the image display part.

export const BLOB_PREFIX = "azblob://";

export const getFileNameFromBlobUrl = (url: string) => {
	const prefix = BLOB_PREFIX;
	if (url && url.startsWith(prefix)) {
		return url.substring(prefix.length);
	}
	return null;
}

export const getAzureBlobLink = (container: string, sasToken: string, url: string) => {
	const fileName = getFileNameFromBlobUrl(url);
	if (fileName) {
		url = "https://STORAGE_ACCOUNT_NAME.blob.core.windows.net/" + container + "/" + fileName + "?" + sasToken;
	}
	return url;
}

const showUserProfile = async (user: IApplicationUser) => {

	const sasResp = await fetch("/api/sas/profiles");
	const sasToken = await sasResp.text();

	const link = getAzureBlobLink("profiles", sasToken, user.profileImage);
	if(link) {
		(document.getElementById("userProfile") as HTMLImageElement).src = link;
	}
}
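
showUserProfile asks the backend for a token on every call, although the token is valid for a whole day. If you want to reuse it, one option is to read the expiry from the token itself: a SAS token is a query string, and its se field carries the expiry time in ISO 8601 UTC format. A sketch under those assumptions; isSasTokenUsable is a made-up helper, the one minute safety margin is arbitrary, and the token in the example is fake:

```typescript
// Hypothetical helper: true if the SAS token is still valid for more than
// `marginMs` milliseconds, judged by its "se" (signed expiry) field.
const isSasTokenUsable = (sasToken: string, now: Date, marginMs: number = 60000): boolean => {
	const expiry = new URLSearchParams(sasToken).get("se");
	if (!expiry) return false;
	const expiresAt = Date.parse(expiry);
	if (isNaN(expiresAt)) return false;
	return expiresAt - now.getTime() > marginMs;
};

// Fake token, only the "se" field matters here.
const exampleToken = "sv=2023-11-03&se=2024-01-20T10%3A00%3A00Z&sr=c&sp=r&sig=fake";
console.log(isSasTokenUsable(exampleToken, new Date("2024-01-20T09:00:00Z"))); // true
```

The frontend could keep the last token in a variable and only call /api/sas/profiles again when this check fails.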

Bonus code

Images are widely used, but they are still only one type of BLOB we might want to use.
What if we want to display a document?

To show a PDF we only need a nice iframe.


const showPdfDocument = async (invoice: Invoice) => {

	const sasResp = await fetch("/api/sas/invoices");
	const sasToken = await sasResp.text();

	const link = getAzureBlobLink("invoices", sasToken, invoice.reportPdf);
	if(link) {
		(document.getElementById("documentIFrame") as HTMLIFrameElement).src = link;
	}
}

But what about all those Microsoft Office files? Can we do something about them?
Yes, we can. We can use online services provided by Microsoft or Google for rendering office files.

The code below appends the document's Azure blob link to the MS or Google service URL.
These services fetch the file themselves, so the link has to include the SAS token.
Then we load an iframe with the service URL.

const getOfficeFileViewerUrl = (src: string, useMicrosoft: boolean) => {
	const viewer = useMicrosoft ?
		"https://view.officeapps.live.com/op/embed.aspx?src=" :
		"https://docs.google.com/gview?embedded=true&url=";
	const u = viewer + encodeURIComponent(src); // use the src parameter (there is no props object here)
	return u;
}

const showOfficeDocument = async (invoice: Invoice) => {

	const sasResp = await fetch("/api/sas/invoices");
	const sasToken = await sasResp.text();

	const link = getAzureBlobLink("invoices", sasToken, invoice.reportWord);
	if(link) {
		const previewLink = getOfficeFileViewerUrl(link, true);
		(document.getElementById("documentIFrame") as HTMLIFrameElement).src = previewLink;
	}
}
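
Since the backend puts the content type into the blob name's extension (.image, .pdf, .docx, .pptx, .xlsx), the frontend can decide how to display a blob from the link alone. A sketch of that dispatch; pickViewer and the ViewerKind values are made up for illustration:

```typescript
type ViewerKind = "img" | "iframe" | "office" | "download";

// Hypothetical helper: choose how to display a blob based on its extension.
const pickViewer = (blobUrl: string): ViewerKind => {
	// Ignore a query string, for example an appended SAS token.
	const path = blobUrl.split("?")[0].toLowerCase();
	if (path.endsWith(".image")) return "img";
	if (path.endsWith(".pdf")) return "iframe";
	if (path.endsWith(".docx") || path.endsWith(".pptx") || path.endsWith(".xlsx")) return "office";
	return "download"; // unknown type: offer it as a plain link
};

console.log(pickViewer("azblob://abc123.pdf")); // "iframe"
```

The "img" case maps to showUserProfile, "iframe" to showPdfDocument and "office" to showOfficeDocument.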

Sassy BLOBs

I hope I convinced you that BLOBs need their space and should not live in the database.
Do it for the money saved, for the performance, or for the interesting features that BLOBs in the cloud allow.

Enjoy your SAS(sy) tokenized BLOBs.

Happy hacking!