Hi all,
Am trying to setup a SSIS package between a sql2000, sql2005 source and a sql2005 destination.
I have 2 concerns, firstly, due to performance reasons (we have 2 huge legacy databases):
After 1st run,
Source table has: 1 - 1000 records
Destination table has: 1 - 1000 records
For 2nd run,
Source table has: 1 - 1500 records
Destination table has: 1 - 1500 records
How I insert only the 1001th record - 1500th record, without touching the 1st to 1000th record?
Secondly, if there are any changes in values in the records 1st to 1000th record, how to I compare and only update the value that has changed? Is there any particular configuration setting in sql that I can use?
Many thanks for any help provided.
This article explains how to decide whether a row already exists in the destination or not and then filter accordingly: http://www.sqlis.com/default.aspx?311
Get that but working first and then we'll tackle how to look for changes (tip: You can use a LOOKUP transform)
-Jamie
|||I have a really good method fully documented at the following URL.
http://www.mathgv.com/sql2005docs/SSISTransformScriptETL.htm
It uses the script component instead of the Lookup component. The Lookup component is really problematic. The bottom of this page
discusses just some of the problems you will run into with the Lookup
component. Let me know what you think!!!
Thanks,
Greg Van Mullem|||
Greg Van Mullem wrote:
I have a really good method fully documented at the following URL.
http://www.mathgv.com/sql2005docs/SSISTransformScriptETL.htm
It uses the script component instead of the Lookup component. The Lookup component is really problematic. The bottom of this page discusses just some of the problems you will run into with the Lookup component. Let me know what you think!!!
Thanks,
Greg Van Mullem
Copied from another thread
Hi Greg,
Fascinating stuff. You've got some really valuable code to share up there.
I'm slightly sceptical as to why this is actually necassary though. Your justification for doing it all in code seems to be that using lookups "needlessly fills up the destination databases transaction log with hoards of update commands" and "It prevents counting the records that actually needed to be updated." Well did you explore using LOOKUPs to find out whether a row that already exists has actually changed or not? Or even a derived column/conditional split component subsequent to your LOOKUp that compares the values in the pipeline with the values in the LOOKUP dataset? That is eminently possible and will solve the two problems that you mention here.
Great work though.
-Jamie
|||
Jamie,
Thanks for the
reply. I should have been more specific about the problems with the Lookup
component. I wrote that web page and this forum post because I wanted to open up a serious
discussion about best practices for solving this common problem. I have books
and lots of documentation on how the SSIS components work. But I have no good docs on how
to use them to implement common algorithms like this!
There are 2 main problems in
trying to use a Lookup component to detect record changes between a source and
destination databases. We tried and ran into all of them.
(1)
If the source record contains a NULL value in a field then the Lookup component
will send the following where clause to the SQL server
engine:
WHERE
MyDestinationTable.MyField = NULL
This syntax is
invalid but you will not get error. Everything will appear to work properly but
the record will not get updated. Of course this can be worked around using a
bunch of ISNULL() logic but what a pain that is. It's got to be slow
too.
(2) What if one of the fields changes in case only?
For example if the customer name field was changed from "kEVIN hARVICK" to"Kevin
Harvick". This is an obvious "caps lock" error fix that needs to be changed in
the destination database. With the lookup component the where clause will look
like this:
WHERE MyDestinationTable.CustomerName =
'Kevin Harvick'
Because the vast majority of databases are
set to do case insensitive comparisons, the existing value in the table "kEVIN hARVICK" is equal
to the new value "Kevin Harvick". Because of this the change is not detected. I
don't have a reasonable solution to this problem!
Thanks,
Greg Van Mullem
|||Greg,
You're absolutely right. Case insensitive comparisons are definitely a problem when using LOOKUP and that's where your solution really does come into its own.
Is using ISNULL() within a data-flow really a pain though? I guess its a question of taste. I for one would rather write a bunch of SSIS expressions than a whole chunk of code. I know one thing for sure though, it is NOT slow.
In the meantime, I've written a friendly retort here: http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx
-Jamie
|||
Jamie,
Your technique for using the Lookup component and the
Conditional Split component together is completely different from the way that
we were attempting to do it with just Lookup components only. It looks really
good. I'm going to add a link from my page to you page shortly. I might even
start using your technique in my packages after a little testing.
This is
the first time I have seen this concept. I have seen a lot of blogs talking
about using just the lookup component and glossing over the 2 problems that I
mentioned before.
It looks like your technique might solve the case
sensitivity problem I talked about earlier? It sure looks like it might solve
it?
Thanks,
Greg Van Mullem
Greg Van Mullem wrote:
Jamie,
It looks like your technique might solve the case sensitivity problem I talked about earlier? It sure looks like it might solve it?
Thanks,
Greg Van Mullem
Greg,
Unfortunately not. You're still left with the problem that the LOOKUP does case-sensitive lookups so it could, as you know, wrongly determine that a record is a new record when in fact it isn't. Once it goes down the "New record" route there isn't much you can do with it other than redo the lookup in a different way (perhaps using your technique or a MERGE JOIN).
Great discussion though. Its great to get these issues out in the open.
-Jamie
|||Jamie,
Actually it might work prefectly for my needs if the Conditional Split component does a case sensitive compare?
All of my primary / surrogate keys that I compare to determine if a record exists or not are integers. Using your technique these are the only values that the lookup componment would compare. The Conditional Split component would compare all the other varchar values.
Later,
Greg Van Mullem|||
Greg Van Mullem wrote:
Jamie, Actually it might work prefectly for my needs if the Conditional Split component does a case sensitive compare?
All of my primary / surrogate keys that I compare to determine if a record exists or not are integers. Using your technique these are the only values that the lookup componment would compare. The Conditional Split component would compare all the other varchar values.
Later,
Greg Van Mullem
Remember though that it is the LOOKUP that determines whether the row is new or existing.
The CONDITONAL SPLIT determines, if a row already exists, whether it has been changed or not.
Hence, your LOOKUP is still being used to decide on whether or not the row is new or not and hence is susceptible to case-sensitivity. In your case it sounds as though the case-sensitivity issue only affects you when you are seeing whether an existing row has changed or not - in which case this technique WILL help. Conditional Split CAN do case-insensitive lookups.
Lots of options. Lots of considerations. That's what I love about SSIS though - there's usually more than one way of achieving something.
-Jamie
|||
thanks guys...I need to try it out and will feedback here for updates on my situation.
I really appreciate the help I get here. :)
-Daren
|||i try the method at here
http://www.mathgv.com/sql2005docs/SSISTransformScriptETL.htm
but i got two errors
Error 1 Validation error. Pump Currency Data: DTS.Pipeline: input column "CodeISOnum" (1039) has lineage ID 422 that was not previously used in the Data Flow task. Package1.dtsx 0 0
Error 2 Validation error. Pump Currency Data: Test for insert or update [1911]: System.InvalidCastException: Unable to cast COM object of type 'System.__ComObject' to class type 'System.Data.SqlClient.SqlConnection'. Instances of types that represent COM components cannot be cast to types that do not represent COM components; however they can be cast to interfaces as long as the underlying COM component supports QueryInterface calls for the IID of the interface. at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e) at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.AcquireConnections(Object transaction) at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostAcquireConnections(IDTSManagedComponentWrapper90 wrapper, Object transaction) Package1.dtsx 0 0
anyone can help?
|||Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
connMgr = Me.Connections.Connection1
sqlConn = CType(connMgr.AcquireConnection(Nothing), SqlConnection) -> error 2
End Sub
i think error 2 which i encountered has something to do with the line above, but how to resolve it?
|||I have seen error #2 before. Something is wrong with the connection manager and/or the connection. Verify that #6 on my list of steps is configured correctly.
Thanks,
Greg
I also got these errors in my Script component...I search around microsoft.support pages but cannot find anything useful. sighed.
Warning 1 The dependency 'EnvDTE' could not be found.
Warning 2 The dependency 'Microsoft.SqlServer.VSAHosting' could not be found.
Warning 3 The dependency 'Microsoft.SqlServer.DtsMsg' could not be found.
Warning 4 The dependency 'Microsoft.SqlServer.VSAHostingDT' could not be found.
No comments:
Post a Comment