May 11, 2025

Sql OR conditions on joined columns

Queries with OR conditions are notorious for being hard to make fast.
Avoid them if you can, read on if you cannot.

WebSQL

Problem

Let’s backtrack to the problem. Our database is modeling a simple family - person (1:N) situation. A person belongs to a single family (household). To make it a little more real-world, we have another table person_relationship, which models how one person is related to another. (Sidenote, we only store relationships between people from different families (households).)

Now the client would like to choose a family and get a list of persons, who belong to the family or are related to someone from the family.

Solution?

The straightforward way to get the answer is… actually in the title of the post.

We query the person table and add an outer join on the relationship and inner join on the (other) person. Finally we check if the (root/from) person’s or the joined (other) person’s family is the one we look for.

SELECT p.id, p.name
FROM person p
LEFT JOIN person_relationship r ON r.from_person_id = p.id
LEFT JOIN person other ON other.id = r.to_person_id
WHERE p.familyid = @familyId OR other.familyid = @familyId;

When I run the query, it is not only slow, it also produces too many records. Argh.

Performance tip 1

You can safely always do this one. Push the relevant where condition(s) into the join on-condition.

We copy the family condition from the where condition onto the join. Then we replace the where condition with an existence check: id of the joined (other) person is not null.

SELECT p.id, p.name
FROM person p
LEFT JOIN person_relationship r ON r.from_person_id = p.id
LEFT JOIN person other ON other.id = r.to_person_id AND other.familyid = @familyId
WHERE p.familyid = @familyId OR other.id IS NOT NULL;

Performance tip 2

Both joins in our query are left (outer). The second join however could be an inner join relative to the relationship.

But, if we change the second join to inner, this will change the semantics! It effectively makes the first join inner too.

How do we tell sql we want to nest joins? This one is really not obvious, i found the proper syntax on stackoverflow thank you Gajus.

  1. Add brackets around the join. Open at just after the join keyword. Close at the end of the second join.
  2. Move the first join’s ON condition just after the closing bracket.
SELECT p.id, p.name
FROM person p
LEFT JOIN (
    person_relationship r
    INNER JOIN person other ON other.id = r.to_person_id AND other.familyid = @familyId
) ON r.from_person_id = p.id
WHERE p.familyid = @familyId OR other.id IS NOT NULL;

Too many records - use the Distinct Luke

Why too many records? I’m sure you already know this one. The person and person_relationship are 1:N. If Jane is related to Peter and Paul (grandma to grandsons) then we would get both rows in the output.
Ok, ok, yes we can slap a Distinct on it and its done.

SELECT DISTINCT p.id, p.name
FROM person p
LEFT JOIN person_relationship r ON r.from_person_id = p.id
LEFT JOIN person other ON other.id = r.to_person_id AND other.familyid = @familyId
WHERE p.familyid = @familyId OR other.id IS NOT NULL;

Too many records - apply yourself

Can we do better? Is there another way to tell sql we only care whether at least one relationship exists? The more obscure way is to use APPLY. This allows as to “join” with a subquery and control how many record the “join” produces. It is very useful if we need

  1. One particular result (for example the newest order for a customer)
  2. More then one, less then all (for example up get up-to three latest order's for a customer)

For our particular scenario, the next fix is much more obvious.

SELECT p.id, p.name
FROM person p
OUTER APPLY (
    SELECT TOP 1 r.from_person_id
    FROM person_relationship r
    INNER JOIN person other ON other.id = r.to_person_id
    WHERE r.from_person_id = p.id AND other.familyid = @familyId
) rel
WHERE p.familyid = @familyId OR rel.from_person_id IS NOT NULL;

Too many records - better solution exists

Experienced db admins are rolling their eyes by now, so let’s get to (one of the) point(s). If you care whether some row exists, then say so and use Exists (duh, i know).

The join becomes a subquery.

Just keep in mind that

  1. Exists cannot add any columns from the subquery to the results. Whatever you put into the select is ignored.
  2. If you need columns from the “join” then use apply.
SELECT p.id, p.name
FROM person p
WHERE p.familyid = @familyId
   OR EXISTS (
       SELECT 1
       FROM person_relationship r
       INNER JOIN person other ON other.id = r.to_person_id
       WHERE r.from_person_id = p.id AND other.familyid = @familyId
   );

Performance tip 3

The “exists subquery" tip made our real-life query fast. From 15 seconds to sub second. Until we added an order-by and top. The top is added automatically by our generic paging code.

Sql server decided it will again use the index for the order-by, because surely we will find the top 50 rows fast.
Nope…

Since you and me live in the real world, we know that the result of the query will be at most tens of people, usually less. The wast majority of people have a couple of siblings, two parents and a few grandparents.

That means to get the 50 results, we will go over the whole table. Which is not optimal to say the least.

Hint 1.

There is a trick we can use to nudge sql server towards the better execution plan.

OPTION(USE HINT('DISABLE_OPTIMIZER_ROWGOAL'))). 

This tells sql server to disregard TOP from optimizations.

However, adding a large NVARCHAR column to the select and we are back to scanning the whole table. Since you followed us this far the rabbit hole, is there a way to hint more forcefully? Not really, but we can put the large columns on a self join, thus hiding them from the optimizer.

Seeing how brittle this is, we can only advise this “fix” as the very last resort solution.

SELECT p.id, p.name, s.long_text_column_1, s.long_text_column_2
FROM person p 
	INNER JOIN person s ON (p.id=s.id)
WHERE p.familyid = @familyId
   OR EXISTS (
       SELECT 1
       FROM person_relationship r
       INNER JOIN person other ON other.id = r.to_person_id
       WHERE r.from_person_id = p.id AND other.familyid = @familyId
   );
ORDER BY p.name, p.id
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY
OPTION(USE HINT('DISABLE_OPTIMIZER_ROWGOAL'))

The real McCoy fix?

If we go back to the start and examine what the client asks for? Give me a list of

  1. All persons from family
  2. All persons related to someone from the family

This is not one query, but two…, of course. Sadly, any time you have an OR condition on different and especially joined columns, you have to leave the declarative world behind and imperatively tell sql what to do. And that is to write the query as a union of the first and second case.

-- First: Persons in the family
SELECT p.id, p.name
FROM person p
WHERE p.familyid = @familyId

UNION

-- Second: Persons related to someone in the family (but not in the family themselves)
SELECT other.id, other.name
FROM person_relationship r
INNER JOIN person from_p ON r.from_person_id = from_p.id
INNER JOIN person other ON other.id = r.to_person_id
WHERE from_p.familyid = @familyId -- AND (other.familyid IS NULL OR other.familyid <> @familyId);

Since it is much faster to use union all (does not remove duplicates), we should make the results non-overlapping (disjoined).

  1. For the second query, we can add the negation of the first query condition: person’s familyid is not equal to family or familyid is null. Sidenote: In our data we don’t add relationships within the family, which means these sets are disjoined already.
  2. The second query will again return too many results. So try one of the too-many-fixes and see which one works the best for your case. It could very well be that UNION will be the best one.

Our experience is with SQL Server, but seems like Postgresql behaves similarly.

Long story short

  1. Premature optimization is the root of all evil. Log slow queries, profile and then optimize the real world problem.
  2. Joins can bring too many rows we don’t care about. Apply or Exists are there for you.
  3. A query with OR is logically two or more queries. Most of the time you will have to tell sql this fact. It might feel like giving up, but UNION can be your friend. And surely we all have better things to do than arguing with the sql optimizer.

Happy hacking!

Continue reading

Make an appointment and book a meeting room

April 13, 2025
Make an appointment and book a meeting room

Do I live in the moment or is my past self controlling my life? I do things, he put in the calendar, call people he arranged meetings with. It makes me wonder, for a short while. Then I remember I need to start planning work for my future self.

CSV file - the data mover

March 09, 2025
CSV file - the data mover

Are you looking for a good way to move data from and to your app? You know, export and import? Do you wish for maximum compatibility? Nothing beats a good ol’ CSV.

Live Learning: A Shared Text Editor for Collaboration

March 02, 2025
Live Learning: A Shared Text Editor for Collaboration

Recently we have started on a journey of teaching programming. The setup is a little bit unusual, the students are beginners and the lessons will be online. To make things easier for everyone, we need a simple environment for students to write and test code and for teachers to see it and correct mistakes.

©2022-2025 Inuko s.r.o
Privacy PolicyCloud
ENSKCZ
ICO 54507006
VAT SK2121695400
Vajanskeho 5
Senec 90301
Slovakia EU

contact@inuko.net