sql server - Which order of joins is faster? -
i'm looking @ ms sql server database developed company expert @ database design (or i'm told) , noticed curious pattern of joins/indexes. it's upside down have done, wonder if has performance benefits (the db big).
the table structure (simplified pseudocode) is:
table jobs (about 1k rows):
- job_id [int, primary key]
- server_id [int, foreign key]
- job_name [string]
table job_history (about 17m rows):
- history_id [int, primary key]
- job_id [int, foreign key]
- server_id [int, foreign key]
- job_start [datetime]
- job_duration [int]
note denormalization server_id
in both tables.
what did is:
select t1.job_name, t2.job_start, t2.job_duration jobs t1 inner join job_history t2 on (t1.job_id = t2.job_id , t1.server_id = t2.server_id) t1.server_id = @param_server_id , t2.job_start >= @param_from , t2.job_start <= @param_to
and have indexes:
- jobs => (server_id)
- job_history => (job_id, server_id, job_start)
in other words, when select rows, first filter jobs jobs
table , relevant job_history
entries. db forced do, because of indexes.
what have done bottom-up version:
select t1.job_name, t2.job_start, t2.job_duration job_history t2 inner join jobs t1 on (t1.job_id = t2.job_id) t2.server_id = @param_server_id , t2.job_start >= @param_from , t2.job_start <= @param_to
and single index:
- job_history => (server_id, job_start)
so, basically, directly select relevant rows large job_history
, attached data jobs
table.
is there reason prefer 1 on other?
well, bit bored thought i'd re-create you. first setup (i'm using numbers table generate 1k , 17m rows, of course, random data , doesn't represent system :) i'm assuming theres clustered index on each table, though imply wouldn't have one.
use tempdb; go drop table if exists #jobs; drop table if exists #job_history; create table #jobs ( job_id int identity primary key ,server_id int ,job_name varchar(50) ); create table #job_history ( history_id int identity primary key ,job_id int ,server_id int ,job_start datetime default sysdatetime() ,job_duration int default abs(checksum(newid())) % 5000 ); go insert #jobs select server_id = n.n ,job_name = convert(varchar(50), newid()) dba.dim.numbers n n < 1000; insert #job_history ( job_id ,server_id ) select job_id = j1.job_id ,server_id = j1.server_id #jobs j1 cross join dba.dim.numbers n n < 17000;
now, case 1 (their way)
drop index if exists idx_job_hist on #job_history; create nonclustered index idx_job_hist on #job_history (job_id, server_id, job_start); dbcc freeproccache dbcc dropcleanbuffers declare @param_server_id int = 1234 declare @param_from int = 500 declare @param_to int = 1000 select t1.job_name, t2.job_start, t2.job_duration #jobs t1 inner join #job_history t2 on (t1.job_id = t2.job_id , t1.server_id = t2.server_id) t1.server_id = @param_server_id , t2.job_start >= @param_from , t2.job_start <= @param_to;
and case 2 (your way)
drop index if exists idx_job_hist on #job_history; create nonclustered index idx_job_hist on #job_history (server_id, job_start); select t1.job_name, t2.job_start, t2.job_duration #job_history t2 inner join #jobs t1 on (t1.job_id = t2.job_id) t2.server_id = @param_server_id , t2.job_start >= @param_from , t2.job_start <= @param_to;
and (totally non-conclusive, because system isn't system...) results:
the costs plan higher overall.
but rather artificial exercise prove point - run plans, answer - depends.
(thanks excuse play this, fun :)
Comments
Post a Comment