multiple rows insert: How to do this SQL analysis...

With SQL 2005 Standard: I have records in a table that show how long it
takes to complete a task...
ID_KEY DATETIME (nearest minute) DURATION
When a task is started, a record is added with a "null" duration, and when
the task is finished, the duration is updated to represent the number of
seconds the task took to complete. So, here's sample values:
133541 2006-02-27 16:51:00 NULL
133540 2006-02-27 16:51:00 60
133539 2006-02-27 16:50:00 NULL
133538 2006-02-27 16:50:00 153
133537 2006-02-27 16:50:00 30
The above data shows that 2 tasks are not complete and 3 tasks finished in
those respective #seconds.
My question: I would like to know how to determine the number of
simulataneous tasks in process for any given period of time, or by the hour
of the day. Right this moment, I can determine that by doing SELECT
COUNT(*) FROM TABLE WHERE DURATION IS NULL, but once you get into the
historical perspective (where the nulls are all real duration values), you
have to figure our which duration values overlap with other duration values.
Yeah, I know my full-minute rounding will screw up the analysis, but coming
close would be nice right now. For example, the 153 second task was
simultaneous with the two tasks that started one minute later. But how do
you run some kind of query against the table for periods of time where the
durations are all now complete (no nulls for yesterday, for example, because
those tasks are complete). How do I determine those "max simultaneous tasks
were in progress" during hour by hour windows of time? Or something like
that? I really need this to determine staffing, because if we had 100
simulatenous tasks between 8 and 8:10 AM yesterday morning, I know we need
to hire more people."HK" <replywithingroup@.notreal.com> wrote in message
news:8kGMf.2874$xS5.921@.tornado.socal.rr.com...
> With SQL 2005 Standard: I have records in a table that show how long it
> takes to complete a task...
> ID_KEY DATETIME (nearest minute) DURATION
> When a task is started, a record is added with a "null" duration, and when
> the task is finished, the duration is updated to represent the number of
> seconds the task took to complete. So, here's sample values:
> 133541 2006-02-27 16:51:00 NULL
> 133540 2006-02-27 16:51:00 60
> 133539 2006-02-27 16:50:00 NULL
> 133538 2006-02-27 16:50:00 153
> 133537 2006-02-27 16:50:00 30
> The above data shows that 2 tasks are not complete and 3 tasks finished in
> those respective #seconds.
>
How about doing a DATEADD on the seconds for completed tasks as a time2 in
the query, then select BETWEEN time1 and time2.
Rick Sawtell
MCT, MCSD, MCDBA|||"Rick Sawtell" <Quickening@.msn.com> wrote in message
news:%23BssXF8OGHA.3728@.tk2msftngp13.phx.gbl...
> "HK" <replywithingroup@.notreal.com> wrote in message
> news:8kGMf.2874$xS5.921@.tornado.socal.rr.com...
> > With SQL 2005 Standard: I have records in a table that show how long it
> > takes to complete a task...
> >
> > ID_KEY DATETIME (nearest minute) DURATION
> >
> > When a task is started, a record is added with a "null" duration, and
when
> > the task is finished, the duration is updated to represent the number of
> > seconds the task took to complete. So, here's sample values:
> >
> > 133541 2006-02-27 16:51:00 NULL
> > 133540 2006-02-27 16:51:00 60
> > 133539 2006-02-27 16:50:00 NULL
> > 133538 2006-02-27 16:50:00 153
> > 133537 2006-02-27 16:50:00 30
> >
> > The above data shows that 2 tasks are not complete and 3 tasks finished
in
> > those respective #seconds.
> >
> How about doing a DATEADD on the seconds for completed tasks as a time2 in
> the query, then select BETWEEN time1 and time2.
> Rick Sawtell
> MCT, MCSD, MCDBA
>
>
>
Although I understand the concept and how to use date functions, I still
don't know how to put this into production in a real query (the selecting
count between time1 and time2 is still not clear when I try), and also how
to group by time intervals. Sample syntax would be much appreciated.
Thanks.|||> When a task is started, a record is added with a "null" duration, and when
> the task is finished, the duration is updated to represent the number of
> seconds the task took to complete. So, here's sample values:
> 133541 2006-02-27 16:51:00 NULL
> 133540 2006-02-27 16:51:00 60
> 133539 2006-02-27 16:50:00 NULL
> 133538 2006-02-27 16:50:00 153
> 133537 2006-02-27 16:50:00 30
> The above data shows that 2 tasks are not complete and 3 tasks finished in
> those respective #seconds.
I assume you realize why this task is so difficult. For those who don't,
this is what happens when the schema is not properly normalized. In this
case, I am assuming that there is a 1-1 relationship between the "start" row
of a given task and the "end" row of a given task - both are related by a
common value for ID_KEY. Comments below are predicated on these
assumptions.
> My question: I would like to know how to determine the number of
> simulataneous tasks in process for any given period of time, or by the
> hour
> of the day. Right this moment, I can determine that by doing SELECT
> COUNT(*) FROM TABLE WHERE DURATION IS NULL, but once you get into the
> historical perspective (where the nulls are all real duration values), you
> have to figure our which duration values overlap with other duration
> values.
> ID_KEY DATETIME (nearest minute) DURATION
The basic query that provides the data you seek is:
select t1.ID_KEY, t1.DATETIME as STARTDT,
DATEADD(mi, t2.DURATION, t1.DATETIME) as ENDDT,
t2.DURATION
from mytable as t1
left outer join mytable as t2
on t1.ID_KEY = t2.ID_KEY
and t1.DATETIME is null
and t2.DATETIME is not null
order by '
Using the above as a derived table, the logic you need is something along
the lines of:
declare @.target datetime
set @.target = '20060227 13:45:00.000'
select ....
from [this is where the derived table goes, assume an alias of durdata]
where STARTDT <= @.target
and coalesce (ENDDT, @.target) >= @.target
order by '
This gives you information as of a particular moment in time. You should be
able to expand that logic to get information for a particular time period.
> Yeah, I know my full-minute rounding will screw up the analysis, but
> coming
> close would be nice right now. For example, the 153 second task was
> simultaneous with the two tasks that started one minute later. But how
> do
> you run some kind of query against the table for periods of time where the
> durations are all now complete (no nulls for yesterday, for example,
> because
> those tasks are complete). How do I determine those "max simultaneous
> tasks
> were in progress" during hour by hour windows of time? Or something like
> that? I really need this to determine staffing, because if we had 100
> simulatenous tasks between 8 and 8:10 AM yesterday morning, I know we need
> to hire more people.
If you want hour-by-hour information, just create (and populate) a
"calendar" table. This can be done in different ways - one way is
dynamically with a table-valued function. With that table, you join to the
derived table and substitute the variable boundary values with the
associated columns from the calendar table.|||"Scott Morris" <bogus@.bogus.com> wrote in message
news:%23SUSI38OGHA.2124@.TK2MSFTNGP14.phx.gbl...
> > When a task is started, a record is added with a "null" duration, and
when
> > the task is finished, the duration is updated to represent the number of
> > seconds the task took to complete. So, here's sample values:
> >
> > 133541 2006-02-27 16:51:00 NULL
> > 133540 2006-02-27 16:51:00 60
> > 133539 2006-02-27 16:50:00 NULL
> > 133538 2006-02-27 16:50:00 153
> > 133537 2006-02-27 16:50:00 30
> >
> > The above data shows that 2 tasks are not complete and 3 tasks finished
in
> > those respective #seconds.
> I assume you realize why this task is so difficult. For those who don't,
> this is what happens when the schema is not properly normalized. In this
> case, I am assuming that there is a 1-1 relationship between the "start"
row
> of a given task and the "end" row of a given task - both are related by a
> common value for ID_KEY. Comments below are predicated on these
> assumptions.
> > My question: I would like to know how to determine the number of
> > simulataneous tasks in process for any given period of time, or by the
> > hour
> > of the day. Right this moment, I can determine that by doing SELECT
> > COUNT(*) FROM TABLE WHERE DURATION IS NULL, but once you get into the
> > historical perspective (where the nulls are all real duration values),
you
> > have to figure our which duration values overlap with other duration
> > values.
> > ID_KEY DATETIME (nearest minute) DURATION
> The basic query that provides the data you seek is:
> select t1.ID_KEY, t1.DATETIME as STARTDT,
> DATEADD(mi, t2.DURATION, t1.DATETIME) as ENDDT,
> t2.DURATION
> from mytable as t1
> left outer join mytable as t2
> on t1.ID_KEY = t2.ID_KEY
> and t1.DATETIME is null
> and t2.DATETIME is not null
> order by '
> Using the above as a derived table, the logic you need is something along
> the lines of:
> declare @.target datetime
> set @.target = '20060227 13:45:00.000'
> select ....
> from [this is where the derived table goes, assume an alias of durdata]
> where STARTDT <= @.target
> and coalesce (ENDDT, @.target) >= @.target
> order by '
> This gives you information as of a particular moment in time. You should
be
> able to expand that logic to get information for a particular time period.
> > Yeah, I know my full-minute rounding will screw up the analysis, but
> > coming
> > close would be nice right now. For example, the 153 second task was
> > simultaneous with the two tasks that started one minute later. But how
> > do
> > you run some kind of query against the table for periods of time where
the
> > durations are all now complete (no nulls for yesterday, for example,
> > because
> > those tasks are complete). How do I determine those "max simultaneous
> > tasks
> > were in progress" during hour by hour windows of time? Or something
like
> > that? I really need this to determine staffing, because if we had 100
> > simulatenous tasks between 8 and 8:10 AM yesterday morning, I know we
need
> > to hire more people.
> If you want hour-by-hour information, just create (and populate) a
> "calendar" table. This can be done in different ways - one way is
> dynamically with a table-valued function. With that table, you join to
the
> derived table and substitute the variable boundary values with the
> associated columns from the calendar table.
>
Thanks for the "join to itself" example. I'm not understanding the
calendar table, sorry. Are you able to give a quick example of that? Are
you meaning to create a temp table?|||> Thanks for the "join to itself" example. I'm not understanding the
> calendar table, sorry. Are you able to give a quick example of that? Are
> you meaning to create a temp table?
Perhaps. There are many uses for a calendar table (or, more simply, a table
of sequential numbers). You can search the newsgroup for reasons and
examples of their usage. Whether there is value in making this information
permanent is something you must determine based on its useful as well as
other factors that may not be apparent to outsiders like us. Most people
discover that there are special ways of handling information based on
whether a date is a "work day" (as opposed to a holiday, snow day, or other
non-working day)
Assume a temp table contains this info. At its simplest, it is just a table
of dates. Simplying your request some more, lets assume that we want hourly
numbers for the current date. Given the previously posted queries, what
specifically do we need in order to generate the appropriate information?
Answer - we need the boundaries of our timeperiods. Back to the example, we
need a table of two columns that define our boundary values for each period
of interest. The rows would look like the following:
'20060227 00:00:00.000' , '20060227 01:00:00.000'
'20060227 01:00:00.000' , '20060227 02:00:00.000'
'20060227 02:00:00.000' , '20060227 03:00:00.000'
Did you notice the overlap of time from the end of one period to the start
of the next. By usage / convention, we'll prevent the overlap from altering
our numbers. Why overlap - for clarification / presentation purposes only.
To get your data, join this table using the previous derived table example -
replace the variable usage with the appropriate columns and add the group by
information. Focus particularly on the coalesce usage. Note that a task
applies to a period if the startdate of the task precedes the end date of
the period (not inclusive) and the end date of the task falls on/after the
start date of the period (inclusive - this is the "usage convention"
mentioned earlier). Note that a task that has not completed applies to all
periods that end after the task start date.
There are tricks to generating the timeperiod information. For now, it is
probably easiest to focus on the logic and ignore whatever tricks might be
used for generation. It probably makes most sense, from a developing
perspective, to create a temp table and populate it with insert statements
in a script. It's only 24 insert statements and you can copy/paste the
information quickly.
Play with the concept a bit. I'll work up an example if you still have
questions.|||"Scott Morris" <bogus@.bogus.com> wrote in message
news:u1knRF%23OGHA.3728@.tk2msftngp13.phx.gbl...
> > Thanks for the "join to itself" example. I'm not understanding the
> > calendar table, sorry. Are you able to give a quick example of that?
Are
> > you meaning to create a temp table?
> Perhaps. There are many uses for a calendar table (or, more simply, a
table
> of sequential numbers). You can search the newsgroup for reasons and
> examples of their usage. Whether there is value in making this
information
> permanent is something you must determine based on its useful as well as
> other factors that may not be apparent to outsiders like us. Most people
> discover that there are special ways of handling information based on
> whether a date is a "work day" (as opposed to a holiday, snow day, or
other
> non-working day)
> Assume a temp table contains this info. At its simplest, it is just a
table
> of dates. Simplying your request some more, lets assume that we want
hourly
> numbers for the current date. Given the previously posted queries, what
> specifically do we need in order to generate the appropriate information?
> Answer - we need the boundaries of our timeperiods. Back to the example,
we
> need a table of two columns that define our boundary values for each
period
> of interest. The rows would look like the following:
> '20060227 00:00:00.000' , '20060227 01:00:00.000'
> '20060227 01:00:00.000' , '20060227 02:00:00.000'
> '20060227 02:00:00.000' , '20060227 03:00:00.000'
> Did you notice the overlap of time from the end of one period to the start
> of the next. By usage / convention, we'll prevent the overlap from
altering
> our numbers. Why overlap - for clarification / presentation purposes
only.
> To get your data, join this table using the previous derived table
example -
> replace the variable usage with the appropriate columns and add the group
by
> information. Focus particularly on the coalesce usage. Note that a task
> applies to a period if the startdate of the task precedes the end date of
> the period (not inclusive) and the end date of the task falls on/after the
> start date of the period (inclusive - this is the "usage convention"
> mentioned earlier). Note that a task that has not completed applies to all
> periods that end after the task start date.
> There are tricks to generating the timeperiod information. For now, it is
> probably easiest to focus on the logic and ignore whatever tricks might be
> used for generation. It probably makes most sense, from a developing
> perspective, to create a temp table and populate it with insert statements
> in a script. It's only 24 insert statements and you can copy/paste the
> information quickly.
> Play with the concept a bit. I'll work up an example if you still have
> questions.
>
So you were talking about creating a truly separate table, with each row
being a period of time, like hourly. I understand now, thanks.
But one thing still escapes me. So you have the thousands of raw records
(the example records I gave in the beginning of this post) in each day. If
I said tell me the busiest time of day yesterday, and how many simultaneous
tasks were being executed at that moment, how would you determine the
answer? Would you create 24*60 rows in your calendar table, one for each
minute? And if so, how would you do this in an automated way, to both
create the calendar table rows AND insert the count value into each row? I
certainly don't want to hand-type 24*60 entries for each day. I can
imagine writing code that looks through a recordset, row by row, with
counters as it goes. But I would also think that there should be a way to
do this from query analyzer.|||> So you were talking about creating a truly separate table, with each row
> being a period of time, like hourly. I understand now, thanks.
yes - based on your examples and questions only. Obviously, the solution
depends on what "answer" you seek.
> But one thing still escapes me. So you have the thousands of raw records
> (the example records I gave in the beginning of this post) in each day.
> If
> I said tell me the busiest time of day yesterday, and how many
> simultaneous
> tasks were being executed at that moment, how would you determine the
> answer? Would you create 24*60 rows in your calendar table, one for each
> minute?
Good requirements often lead you to a good solution. Define "busiest time
of the day". This raises the issue of how accurate an answer you seek (and
how accurate your data is). If the data is only accurate to the nearest
minute, then the answer is equally limited. Knowing that, we need to
calculate the outstanding task counts on a minute basis throughout the
period of interest (yesterday) and get the largest count. This involves 1
day x 24 hours x 60 minutes invidividual calculation periods; this then
drives the content of the temp table discussed earlier. BTW - a table of
1440 (24 x 60) rows is a very, VERY, small table.
> And if so, how would you do this in an automated way, to both
> create the calendar table rows AND insert the count value into each row?
> I
> certainly don't want to hand-type 24*60 entries for each day.
You said "automated", followed by "hand-type". Obviously, anything that you
do on a regular basis can be automated and there is no reason why you would
need to "hand-type" anything more than once. Is it possible to write a
script to populate a temp table with the hourly intervals for a specified
date? Of course. There is a looping construct in tsql. There are also
tricks, as mentioned earlier. Steve Kass has posted a UDF that generates a
table of sequential numbers in the .programming NG, if you want to see one
such technique.
> I can imagine writing code that looks through a recordset, row by row,
> with
> counters as it goes.
Yes - you do this because you are most familiar with conventional
programming. Good relational solutions involve set manipulations. Very
rarely is a "row-by-row" solution appropriate; often such a solution is far
from "best". Search the newsgroups for the term "cursor" to see the oft
repeated discussions about such things.
> But I would also think that there should be a way to do this from query
> analyzer.
tsql code is tsql code. How you execute it is your decision. QA is but one
tool that can do this.|||> select t1.ID_KEY, t1.DATETIME as STARTDT,
> DATEADD(mi, t2.DURATION, t1.DATETIME) as ENDDT,
> t2.DURATION
> from mytable as t1
> left outer join mytable as t2
> on t1.ID_KEY = t2.ID_KEY
> and t1.DATETIME is null
> and t2.DATETIME is not null
> order by '
This is wrong. Replace the datetime column in the join conditions with the
duration column. I should point out that there is an assumption regarding
the datetime column (nice name, btw) - specifically that all rows for a task
(identified by ID_KEY) have the same value for the DATETIME column.|||A working example
set nocount on
go
if object_id('tempdb..#rawdata') is not null
drop table #rawdata
go
if object_id('tempdb..#period') is not null
drop table #period
go
create table #rawdata (
id_key int not null
, tstamp datetime not null
, duration int null
)
insert #rawdata (id_key, tstamp, duration)
select 31, '20060227 16:51:00', NULL
union all
select 31, '20060227 16:51:00', 181
union all
select 28, '20060227 08:03:00', null
union all
select 28, '20060227 08:03:00', 600
union all
select 12, '20060227 00:00:00', null
union all
select 900, '20060227 00:00:00', null
union all
select 901, '20060227 12:00:00', null
union all
select 88, '20060227 23:00:00', null
union all
select 88, '20060227 23:00:00', 60
select id_key, tstamp, duration from #rawdata order by id_key, tstamp,
duration
select t1.id_key, t1.tstamp as STARTDT,
dateadd (mi, t2.duration, t1.tstamp) as ENDDT,
t2.DURATION
from #rawdata as t1
left outer join #rawdata as t2
on t1.id_key = t2.id_key
and t1.duration is null
and t2.duration is not null
where t1.duration is null
create table #period (
pstart datetime not null
, pend datetime not null
)
declare @.temp int, @.start datetime
set @.start = '20060227'
while @.start < '20060228'
begin
insert #period (pstart, pend) values (@.start, dateadd (hour, 1, @.start))
set @.start = dateadd (hour, 1, @.start)
end
select * from #period order by pstart
select *
from (
select t1.id_key, t1.tstamp as startdt,
dateadd (mi, t2.duration, t1.tstamp) as enddt,
t2.duration
from #rawdata as t1
left outer join #rawdata as t2
on t1.id_key = t2.id_key
and t1.duration is null
and t2.duration is not null
where t1.duration is null
) as normal
select #period.*, normal.*
from (
select t1.id_key, t1.tstamp as startdt,
dateadd (mi, t2.duration, t1.tstamp) as enddt,
t2.duration
from #rawdata as t1
left outer join #rawdata as t2
on t1.id_key = t2.id_key
and t1.duration is null
and t2.duration is not null
where t1.duration is null
) as normal
inner join #period
on normal.startdt < #period.pend
and coalesce (normal.enddt, #period.pend) > #period.pstart
order by normal.id_key, #period.pstart
select #period.pstart, #period.pend, count(*)
from (
select t1.id_key, t1.tstamp as startdt,
dateadd (mi, t2.duration, t1.tstamp) as enddt,
t2.duration
from #rawdata as t1
left outer join #rawdata as t2
on t1.id_key = t2.id_key
and t1.duration is null
and t2.duration is not null
where t1.duration is null
) as normal
inner join #period
on normal.startdt < #period.pend
and coalesce (normal.enddt, #period.pend) > #period.pstart
group by #period.pstart, #period.pend
order by #period.pstart|||Thank you for spending the time on the examples.
"Scott Morris" <bogus@.bogus.com> wrote in message
news:OBIYqVHPGHA.140@.TK2MSFTNGP12.phx.gbl...
> A working example
> set nocount on
> go
> if object_id('tempdb..#rawdata') is not null
> drop table #rawdata
> go
> if object_id('tempdb..#period') is not null
> drop table #period
> go
>
> create table #rawdata (
> id_key int not null
> , tstamp datetime not null
> , duration int null
> )
>
> insert #rawdata (id_key, tstamp, duration)
> select 31, '20060227 16:51:00', NULL
> union all
> select 31, '20060227 16:51:00', 181
> union all
> select 28, '20060227 08:03:00', null
> union all
> select 28, '20060227 08:03:00', 600
> union all
> select 12, '20060227 00:00:00', null
> union all
> select 900, '20060227 00:00:00', null
> union all
> select 901, '20060227 12:00:00', null
> union all
> select 88, '20060227 23:00:00', null
> union all
> select 88, '20060227 23:00:00', 60
> select id_key, tstamp, duration from #rawdata order by id_key, tstamp,
> duration
> select t1.id_key, t1.tstamp as STARTDT,
> dateadd (mi, t2.duration, t1.tstamp) as ENDDT,
> t2.DURATION
> from #rawdata as t1
> left outer join #rawdata as t2
> on t1.id_key = t2.id_key
> and t1.duration is null
> and t2.duration is not null
> where t1.duration is null
> create table #period (
> pstart datetime not null
> , pend datetime not null
> )
> declare @.temp int, @.start datetime
> set @.start = '20060227'
> while @.start < '20060228'
> begin
> insert #period (pstart, pend) values (@.start, dateadd (hour, 1,
@.start))
> set @.start = dateadd (hour, 1, @.start)
> end
> select * from #period order by pstart
> select *
> from (
> select t1.id_key, t1.tstamp as startdt,
> dateadd (mi, t2.duration, t1.tstamp) as enddt,
> t2.duration
> from #rawdata as t1
> left outer join #rawdata as t2
> on t1.id_key = t2.id_key
> and t1.duration is null
> and t2.duration is not null
> where t1.duration is null
> ) as normal
> select #period.*, normal.*
> from (
> select t1.id_key, t1.tstamp as startdt,
> dateadd (mi, t2.duration, t1.tstamp) as enddt,
> t2.duration
> from #rawdata as t1
> left outer join #rawdata as t2
> on t1.id_key = t2.id_key
> and t1.duration is null
> and t2.duration is not null
> where t1.duration is null
> ) as normal
> inner join #period
> on normal.startdt < #period.pend
> and coalesce (normal.enddt, #period.pend) > #period.pstart
> order by normal.id_key, #period.pstart
>
> select #period.pstart, #period.pend, count(*)
> from (
> select t1.id_key, t1.tstamp as startdt,
> dateadd (mi, t2.duration, t1.tstamp) as enddt,
> t2.duration
> from #rawdata as t1
> left outer join #rawdata as t2
> on t1.id_key = t2.id_key
> and t1.duration is null
> and t2.duration is not null
> where t1.duration is null
> ) as normal
> inner join #period
> on normal.startdt < #period.pend
> and coalesce (normal.enddt, #period.pend) > #period.pstart
> group by #period.pstart, #period.pend
> order by #period.pstart
>

Wednesday, March 7, 2012

How to do this SQL analysis...

No comments:

Post a Comment

multiple rows insert

Blog Archive

About Me