Guys,
I really need to know how to do a recursive join in SQL.
Imagine any recursive data-structure such as a file directory system.
I could have a ParentDirectory table defined as:
PKey, PName
1 , DirA
2 , DirB
3 , DirC
I could then have a ChildDirectory defined as:
FKey, CName
1, DirB
1, DirD
2, DirC
2, DirE
Given a specific directory from ParentDirectory such as DirA i want to know all its direct and indirect sub-directories, such that i get the following set:
DirA, DirB
DirA, DirC
DirA, DirE
DirA, DirD
As DirA includes DirB then DirB includes DirC and DirE so transitively they are sub-directories of DirA.
This is quite simple to do in a procedural language but its mind boggling to do in SQL for me. Can someone please help.
Note: Although im using directories for this example, the items in the child table could have multiple parents in the parent table.
Thanks
The first thing that I might suggest is to change your table structure to something more like this:
declare @.mockup table
( PKey integer,
parent integer,
dirName varchar(5)
)
insert into @.mockup
select 1, null, 'DirA' union all
select 2, 1, 'DirB' union all
select 3, 2, 'DirC' union all
select 4, 1, 'DirD' union all
select 5, 2, 'DirE'
select * from @.mockup/*
PKey parent dirName
-- -- -
1 NULL DirA
2 1 DirB
3 2 DirC
4 1 DirD
5 2 DirE
*/
This structure provides a lot of flexibility in that not only can it handle the output from your request like this:
;with subDirectory as
( select PKey as rootKey,
PKey as currentKey,
dirName as rootDir,
dirName as currentDir
from @.mockup
where parent is nullunion all
select a.rootKey,
b.PKey,
a.rootDir,
b.dirName
from subDirectory a
inner join @.mockup b
on a.currentKey = b.parent
)
select rootDir + ', ' + currentDir as [base/Subdirectory]
from subDirectory
where rootKey != currentKey/*
base/Subdirectory
--
DirA, DirB
DirA, DirD
DirA, DirC
DirA, DirE
*/
This table structure allows for more.
|||Another request that we have seen at this forum is to show the directory path. This can be done something like this:
|||declare @.mockup table
( PKey integer,
parent integer,
dirName varchar(5)
)
insert into @.mockup
select 1, null, 'DirA' union all
select 2, 1, 'DirB' union all
select 3, 2, 'DirC' union all
select 4, 1, 'DirD' union all
select 5, 2, 'DirE'
select * from @.mockup;with subDirectory as
( select PKey,
cast(right(' ' + dirName,5) + ',' as varchar(60)) as pathBasis
from @.mockup
where parent is null
union all
select b.PKey,
cast(a.pathBasis + right(' ' + b.dirName,5) + ',' as varchar(60))
from subDirectory a
inner join @.mockup b
on a.Pkey = b.parent
)
select reverse(substring (reverse(replace (replace (pathBasis, ' ', ''), ',', ', ')), 3, 70))
as tree
from subDirectory
order by pathBasis/*
tree
-
DirA
DirA, DirB
DirA, DirB, DirC
DirA, DirB, DirE
DirA, DirD
*/
Thanks for your input Kent. Its going to take me a while to understand that, as it is far removed from my initial simplistic schema.
Plus SQL just does not come easy or naturally to me.
The problem is a given item in the child table can have multiple parents in the parent table.
For example an employee who reports to multiple bosses, or a subclass that multiple inherits.
Can you give an example with the original schema, or are you saying like that it is not possible?
I cannot easily change the table structure.
Again is it possible to recurse on having to first join two tables - Its this i cannot get my head round?
Thanks for your help.
|||Yes, I can certainly put this together; however, I am afraid that I don't understand a metaphor. To me a sub-directory has a single parent. Moreover, I think that with the schema you have provided that it is possible to have circular mobius structures. Hang on, and I will put together your original request. Maybe somebody that already has it can post their version too.
I think the mobius structure issue is mute; I think either scheme can have mobius loops. Ignore that particular comment.
I have reached my end-of-day and I must leave ontime today; perhaps someone else can help you with this. What I can see from this that I don't like is that in some cases I seem to be joining based on the PKey and FKey but in other cases joining based on names -- and I think that is funky.
Hopefully someone else can complete this. Also, I would like other opinions on this joining by names.
|||Kent
Thanks Kent, it possible it could have loops - or the transitive closure as i like to call it.
Where say A -> A could come out. But yes lets not worry about this at the moment.
And Yes the directories is not a great example, just an attempt to map it onto an everyday familar object.
And you are right the parent has its direct set of child relations via the joining of the PKey with the FKey.
But it just so happens that the name in the same child table rows can then be joined with the name in the parent table - and so on.
In effect this is the indirect recursion.
If i could just break through on this - id be eternally grateful...
Thanks.
|||A very good short article on Recursive Common table Expressions (CTEs) which helped me understand them is:
http://www.sqlservercentral.com/columnists/sSampath/recursivequeriesinsqlserver2005.asp
This is the construct that Kent is using (and is only available in SQL Server 2005).
|||Thanks that throws a bit more light on it.
But it seems to me to only support the case of a single parent, and consequently the table that references itself
simply by having something like the following two member columns
ID
ParentID
When a problem domain has multiple parents or dependencies then the table structure needs to change under the
rules of normalisation. Consequently we end up with something like the schema i have shown here.
ParentTable
ID
ChildTable
ID
ParentID
Where ParentID in ChildTable is a foreign key on ID in ParentTable.
ID and ParentID in ChildTable would form a primary composite key or a unique pair.
Is there any other way of representing it so that we could use recursive CTE ?
I noticed the restriction in the article that the recursive member query could only reference one CTE.
All the examples are only recursing over a single table implying a single parent hierarchy.
Are we saying its not possible currently to recurse over a structure with multiple parents using SQL ?
So many real world problems have this property.
I still believe it is possible but just cannot see it yet. Else if it is not possible then i can give up.
|||"All the examples are only recursing over a single table implying a single parent hierarchy."
I'm not sure why you say this.
I suspect the recursive solution will work even if you have multiple records for a single child, e.g., to have "multiple managers/bosses" for the same employee.
I think the main thing is that you must specify a limit on the number of recursions for your recursion relationship, so that you will not be harmed terribly by loops within the data.
Your recursion limit would basically be slightly larger than the maximum depth you anticipate for your tree.
Your table would have columns like
P_Key,
Child_ID,
Parent_ID
Each child could have as many parents as you like (one record for each parent). Any root(s) of the tree would either have a NULL Parent_ID, or a bogus Parent_ID that never occurs in the Child_ID column.
Hope that helps,
Dan
|||Thanks Dan.
So with my orginal schema can you provide an example of a recursive CTE that would work?
Kent, what happened to you?
I thought you were the one who was going to shine the light! :-)
|||Sorry Dan, i think i see what you are suggesting now. Using a link table.
The problem is that the Child table is not the same entity as the parent table although some entries from the child table can join with the rows in the parent table others will not.
Imagine a file that depends on other files.
FileA
use FileB
use FileC
FileB
use FileX
the dependencies are references only and not actual files, unless a match by names is found in the parent table.
The parent table will have alot more properties(columns) that the child(references) table does not.
Its this aspect that allows it to become useful as i can recursively find all the unresolved references etc.
Sky is the limit - if i could just work out with all your help a working example.
Or at least understand why it cannot work this way.
Please help...
|||PR,
Here is what I have in mind, based on the data you provided in your first post. It relies on a #temp table based on the design in my previous post. (You have my apologies for the spacing in the code below: I cut and paste from Notepad, as has been suggested for this forum, but the cut/paste operation seems to trash the spacing. The spacing is fine if I paste it into another text editor, Word, etc. -- anywhere but in this Forum. I don't know what I'm doing wrong in that regard.)
Dan
[code language = 'SQL']
-- Create a table to hold parent-child data for recursion.
create table #parent_child
(
child varchar(20),
parent varchar(20)
)
-- Populate the table from existing tables.
insert into #parent_child
(
child,
parent
)
select distinct
cname,
pname
from ParentDirectory pd
inner join ChildDirectory cd
on pd.PKey = cd.FKey
-- Index the table. Both indexes are useful if you intend to make CTEs that go "from parent down" and "from child up".
create index parent_child_ndx1
on #parent_child (child, parent)
create index parent_child_ndx2
on #parent_child (parent, child)
-- Create a CTE. (I'm not sure if you need separate "parent down" and "child up" situations, since this CTE seems to provide all the combinations for either approach, I think.)
with prnt_chld_cte (parent, child, dist_from_parent) AS
(
select
parent,
child,
1 as dist_from_parent
from #parent_child
union all
select
pcc.parent,
pc.child,
dist_from_parent + 1
from #parent_child pc
inner join prnt_chld_cte pcc
on pc.parent = pcc.child
)
-- Create a query that employs the CTE. Limit recursion to 20. (You can change that number to whatever you feel is appropriate to your data situation.)
select
parent,
child,
dist_from_parent
from prnt_chld_cte
order by 1, 3, 2
OPTION (MAXRECURSION 20);
[\code]
|||Wow Dan, thanks for all that. It will take some time to work through all that but will give it a go and let you know the outcome.
I will need to pay you soon for working for me
Even though you are using directories as an example I assume it will work with multiple parents?
Yes i will want to traverse any way i can up down sideways inside out etc. But if i could just get one to work then I would begin to understand things alot better. Iv already ordered some books on SQL Server and T SQL as im obviously weak in this area.
Certainly i would want to say give me all the transitive dependents of x
And then all the trasitive dependencies of x.
Other questions would be give me all the dependencies of x that are not defined in the parent table etc etc etc.
Iv posted this on 3 of the major SQL Server forums now and still seeking a solution.
Im thinking of turning it into a competition where the person who provides a workable solution I will send a £50 cheque
What do you think?
Dans the man.
Ok here is my version
Ok created a view of the following ParentChild
SELECT dbo.Table1.Name1 AS parent, dbo.Table2.Name2 AS child
FROM dbo.Table1 INNER JOIN
dbo.Table2 ON dbo.Table1.PKey = dbo.Table2.FKey
with prnt_chld_cte (parent, child, dist_from_parent) AS
(
select
parent,
child,
1 as dist_from_parent
from ParentChild
where parent = 'A''
union all
select
pcc.parent,
pc.child,
dist_from_parent + 1
from ParentChild pc
inner join prnt_chld_cte pcc
on pc.parent = pcc.child
)
select
parent,
child,
dist_from_parent
from prnt_chld_cte
order by 1, 3, 2
OPTION (MAXRECURSION 0);
And its working, at last thanks for everyone's help.
I have a new set of problems now, Loops, with the real dataset!
How to detect loops?
Thanks again.
|||PR,
Maybe you can overcome the effects of LOOPS by putting a
WHERE NOT EXISTS (select * from prnt_child_cte pcc1 where pcc1.parent = pcc.parent and pcc1.child = pc.child)
into the "bottom" SELECT in the CTE.
I'm glad that this seems to be working for you.
I really would recommend against MAXRECURSION 0. Maybe 100, 1000, or something, but not 0.
Dan
P.S. I just tried this sort of thing with some data of my own, and found that you cannot refer to the "recursive member" in this fashion (WHERE NOT EXISTS).
Maybe you merely use a "select distinct parent, child" in a query using the CTE.
No comments:
Post a Comment