Sunday, March 11, 2012

Conditional Split Question

I have a package which has a conditional split that directs rows to their respective OLE DB Commands. The records are sorted from the source system in chronological order. The problem I am experiencing is that some of the operations do not seem to be occurring in that order. An example of this would be someone inserting a record, deleting the record, and reinserting the record, in that order. When we run the package we can see the records coming down in chronological order, but the delete from the split seems to occur after the inserts. Has anyone else experienced this? Is there anything I might be missing to ensure things happen in the order they should? Any advice would be greatly appreciated. Thank you.

|||

Can you provide more details surrounding your data flow setup? Try to illustrate how you have the data flow built, from the source all of the way through to the destinations (or OLE DB Commands).

|||

Bagles1 wrote:

I have a package which has a conditional split that directs rows to their respective OLE DB Commands. The records are sorted from the source system in chronological order. The problem I am experiencing is that some of the operations do not seem to be occurring in that order. An example of this would be someone inserting a record, deleting the record, and reinserting the record, in that order. When we run the package we can see the records coming down in chronological order, but the delete from the split seems to occur after the inserts. Has anyone else experienced this? Is there anything I might be missing to ensure things happen in the order they should? Any advice would be greatly appreciated. Thank you.

If you have two OLE DB Command tasks in the same data flow, you should not rely on the rows being actioned in the order that they enter the pipeline. There is no synchronisation between the two paths. Once the rows are in different paths they are two separate streams of data and will be treated as such. If you want to ensure that the deletes happen after the insertions, push the data for deletion into a raw file and issue the deletes from another data flow.
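For illustration, the OLE DB Command in that second data flow would typically issue a single parameterized statement per incoming row, along these lines (the table and column names here are hypothetical):

    DELETE FROM dbo.TargetTable WHERE BusinessKey = ?

Because the second data flow does not start until the first has finished (assuming a precedence constraint between them), every insert is guaranteed to have completed before the first delete fires.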

-Jamie

|||

Jamie Thomson wrote:


If you have two OLE DB Command tasks in the same data flow, you should not rely on the rows being actioned in the order that they enter the pipeline. There is no synchronisation between the two paths. Once the rows are in different paths they are two separate streams of data and will be treated as such. If you want to ensure that the deletes happen after the insertions, push the data for deletion into a raw file and issue the deletes from another data flow.

-Jamie

Yep, or create separate data flows, with precedence enforced at the control flow level.

|||

I have a data reader that pulls data from a staging environment, where the data is sorted in chronological order. It goes immediately into a conditional split on a field that holds one of three values: I, U, D (Insert, Update, Delete). From there, each condition has a Row Count transformation to count the rows as they pass through, and then an OLE DB Command that performs the necessary operation. The stream stops at the OLE DB Command, and that is all there is.

|||

Ugggghhhhh! That hurts.

|||
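For illustration, the three OLE DB Commands in a setup like the one described above would typically contain parameterized statements along these lines (all table and column names here are hypothetical):

    INSERT INTO dbo.TargetTable (BusinessKey, SomeColumn) VALUES (?, ?)
    UPDATE dbo.TargetTable SET SomeColumn = ? WHERE BusinessKey = ?
    DELETE FROM dbo.TargetTable WHERE BusinessKey = ?

Each statement fires once per row that reaches it, which is why the relative timing between the three data paths matters so much.

|||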

Bagles1 wrote:

I have a data reader that pulls data from a staging environment, where the data is sorted in chronological order. It goes immediately into a conditional split on a field that holds one of three values: I, U, D (Insert, Update, Delete). From there, each condition has a Row Count transformation to count the rows as they pass through, and then an OLE DB Command that performs the necessary operation. The stream stops at the OLE DB Command, and that is all there is.

As I said above, there is no guarantee of the order in which rows get processed, especially when you send them to different data paths.

There is also no guarantee that data will actually enter the pipeline from the staging environment in the order that you think it does. There is no concept of a set of data being ordered - there are a lot of things that can influence the order in which rows appear in the pipeline.

-Jamie

|||

Jamie Thomson wrote:

Bagles1 wrote:

I have a data reader that pulls data from a staging environment, where the data is sorted in chronological order. It goes immediately into a conditional split on a field that holds one of three values: I, U, D (Insert, Update, Delete). From there, each condition has a Row Count transformation to count the rows as they pass through, and then an OLE DB Command that performs the necessary operation. The stream stops at the OLE DB Command, and that is all there is.

As I said above, there is no guarantee of the order in which rows get processed, especially when you send them to different data paths.

There is also no guarantee that data will actually enter the pipeline from the staging environment in the order that you think it does. There is no concept of a set of data being ordered - there are a lot of things that can influence the order in which rows appear in the pipeline.

-Jamie

Jamie,

Are you saying that, even if I have an ORDER BY statement in my data reader, there is no guarantee that the rows will actually be in that order? Or have I misunderstood your statement?

|||

Bagles1 wrote:

Jamie Thomson wrote:

Bagles1 wrote:

I have a data reader that pulls data from a staging environment, where the data is sorted in chronological order. It goes immediately into a conditional split on a field that holds one of three values: I, U, D (Insert, Update, Delete). From there, each condition has a Row Count transformation to count the rows as they pass through, and then an OLE DB Command that performs the necessary operation. The stream stops at the OLE DB Command, and that is all there is.

As I said above, there is no guarantee of the order in which rows get processed, especially when you send them to different data paths.

There is also no guarantee that data will actually enter the pipeline from the staging environment in the order that you think it does. There is no concept of a set of data being ordered - there are a lot of things that can influence the order in which rows appear in the pipeline.

-Jamie

Jamie,

Are you saying that, even if I have an ORDER BY statement in my data reader, there is no guarantee that the rows will actually be in that order? Or have I misunderstood your statement?

Well, YES, the order will be retained until some downstream data flow component rearranges it. Surely you can't expect a Union All to maintain order, for instance.

|||

Bagles1 wrote:

Jamie,

Are you saying that, even if I have an ORDER BY statement in my data reader, there is no guarantee that the rows will actually be in that order? Or have I misunderstood your statement?

In that case then yes, the data will enter the pipeline in the order decreed by the ORDER BY statement, but thereafter you should not rely on the ordering within the pipeline. And you should DEFINITELY not rely on rows reaching a destination/OLE DB Command in any particular order when those rows are in different data paths.
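For illustration, a source query along these lines guarantees only the order of entry into the pipeline (the table and column names here are hypothetical):

    SELECT BusinessKey, OperationCode, SomeColumn, ChangeDateTime
    FROM dbo.StagingChanges
    ORDER BY ChangeDateTime

Once the conditional split sends those rows down separate data paths, each path proceeds at its own pace.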

-Jamie

|||

Thank you for your explanation. A very painful lesson to learn this late in the project. Unfortunately, in our case we have to process the rows in chronological order, so it looks like it will be the script component once again.

|||

Bagles1 wrote:

Thank you for your explanation. A very painful lesson to learn this late in the project. Unfortunately, in our case we have to process the rows in chronological order, so it looks like it will be the script component once again.

If this is true, then it sounds as though you have some sort of procedural logic going on - that's not really possible with the standard components. A script component may help, though.

I don't actually know your requirement, but wouldn't it just make more sense to count the number of inserts and deletes per "thing"? If there are more inserts than deletes then you insert the "thing"; otherwise you don't.
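For illustration, that count could even be done set-based against the staging table before the data flow starts; a minimal sketch, assuming hypothetical names like those above:

    SELECT BusinessKey,
           SUM(CASE WHEN OperationCode = 'I' THEN 1 ELSE 0 END) AS Inserts,
           SUM(CASE WHEN OperationCode = 'D' THEN 1 ELSE 0 END) AS Deletes
    FROM dbo.StagingChanges
    GROUP BY BusinessKey

Keys where Inserts exceed Deletes get inserted; the rest are skipped, and the chronological ordering problem disappears.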

Good luck with it anyway.

-Jamie
