I need to shred the xml data to retrieve BrandIDs based on the following business rules.
/**************************************************************************************************************************
(1) Not every instance of xml would contain BrandIDs node
(2) Ignore BrandIDs whenever its a descendant of AlternativeState
(3) We are interested in the data stored under MarketSize whenever the CurrentEvent node is MarketSize.
(4) We are interested in the data stored under OtherEvent whenever the CurrentEvent node is not MarketSize.
****************************************************************************************************************************/
While shredding the xml data, I have noticed that out of 200,000 xml rows there are only 1000 BrandIDs nodes that do actually have data in them (e.g. <BrandIDs> 123, 234</BrandIDs>. Others are just blank in the form of </BrandIDs>. I would like to modify XQuery given below so that I could filter out such rows where even though BrandIDs node exist but it has no scalar value for me to retrieve. From the sample query result given at the end you would notice that the third row is empty. I would like to avoid such rows in result set.
I am open to any suggestions here if anyone out there could come up with a better solution.
declare @.xml xml
set @.xml =
'
<State>
<StatsState>
<CurrentState>
<BrandIDs>2698741</BrandIDs>
</CurrentState>
</StatsState>
</State>
<State>
<StatsState>
<CurrentState>
<OtherEvents>
<BrandIDs>160603,160737</BrandIDs>
</OtherEvents>
<CurrentEvent>BrandShare</CurrentEvent>
</CurrentState>
</StatsState>
</State>
<State>
<StatsState>
<CurrentState>
<MarketSize>
<BrandIDs />
<AlternativeState>
<CurrentEvent>None</CurrentEvent>
<BrandIDs>25630,8956201</BrandIDs>
</AlternativeState>
<CompanyIDs />
</MarketSize>
<CurrentEvent>MarketSize</CurrentEvent>
</CurrentState>
</StatsState>
</State>
<State>
<StatsState>
<CurrentState>
<OtherEvents>
<BrandIDs>2001,2002,2003,2004,2005,2006</BrandIDs>
</OtherEvents>
<CurrentEvent>BrandShare</CurrentEvent>
<MarketSize>
<BrandIDs>40666,71788,201225</BrandIDs>
</MarketSize>
</CurrentState>
</StatsState>
</State>
'
SELECT
Element.Val.query(
'for $s in self::node()
where $s//*/BrandIDs[not(parent::AlternativeState)]
return
if (data(($s/StatsState/CurrentState/CurrentEvent)[1]) = "MarketSize")
then $s/StatsState/CurrentState/MarketSize/BrandIDs/text()
else (
if (data(($s/StatsState/CurrentState/CurrentEvent)[1]) != "MarketSize")
then $s/StatsState/CurrentState/OtherEvents/BrandIDs/text()
else $s//BrandIDs/text())
') AS BrandIDs
FROM @.xml.nodes('/State') AS Element(Val)
GO
BrandIDs
-
2698741
160603,160737
2001,2002,2003,2004,2005,2006
If an element is empty then it does not have any child nodes meaning you can check with e.g. BrandIDs[node()] for BrandIDs elements that are not empty.
So your query could be written as
Code Snippet
SELECT
Element.Val.query(
'for $s in self::node()
return
if (data(($s/StatsState/CurrentState/CurrentEvent)[1]) = "MarketSize")
then $s/StatsState/CurrentState/MarketSize/BrandIDs/text()
else (
if (data(($s/StatsState/CurrentState/CurrentEvent)[1]) != "MarketSize")
then $s/StatsState/CurrentState/OtherEvents/BrandIDs/text()
else $s//BrandIDs/text())
') AS BrandIDs
FROM @.xml.nodes('/State[.//*/BrandIDs[node() and not(parent::AlternativeState)]]') AS Element(Val)
|||Hi marton,
Thanks once more for helping me out here. Your proposed solution does solve my problem. Is there any way that perhaps you could use the 'where' clause in FLWOR to apply the same condition? I actually have a requirement to use @.xml.nodes('/State'). I do have different set of FLWOR queries to read values for different nodes and for each the root node is always /State and I would need to combine all of them in one statement. So preferably I would like to keep the nodes clause pointing to root.
Hope you get my point.
thanks again
|||The problem is that the nodes methods shreds the xml variable into rows that you then query with the query method. So for the original example you got four rows in the result set as the nodes method yields four rows, independent of the query applied later to the each row. If you want to eliminate nodes to not yield rows at all then I think it as to be done with the nodes method.|||Hi Martin,
Thanks for clarification. I understand your point.
I actually works with xml where each of the node goes to its own relational table and I wanted to have only one select statement where each xml row is read only once and I get all of the values for nodes by specifying any business rules within FLWOR there. Hence I hesitate to specify any specific node such as BrandIDs in nodes() because it wouldn't leave an option for me to work with other nodes. I think I would have to use function for each node and call them from my select statement.
Many thanks
No comments:
Post a Comment