Thursday, March 29, 2012

configuration/load balancing

Hi

On our SQL Server (2000) we can have queries that take at least 10-20 minutes
(they require a full table scan of a billion-row table). When one of these
queries is running, it substantially slows down very quick queries (sub-second
queries take several seconds).

I believe there is no way to set priorities, but I was wondering if there
are other configuration settings that could help. The server is dual
processor, so maybe setting MAXDOP to 1 will help. Memory size is
dynamically assigned up to 4 GB, but as the DB size is > 1 TB I'm not sure
if allowing this much memory is actually decreasing performance when the
quick query tries to get through.

Any suggestions?

Thanks
Mike|||Mike Read wrote:
> Hi
> On our SQL Server (2000) we can have queries that take at least
> 10-20 minutes (they require a full table scan of a billion-row table).
> When one of these queries is running, it substantially slows down very
> quick queries (sub-second queries take several seconds).

I don't know your env and its requirements but to me the difference
between sub 1 sec and several seconds doesn't sound worth the effort
changing anything - unless, of course, you have lots of these short
queries executed in sequence and the difference sums up dramatically.

> I believe there is no way to set priorities, but I was wondering if there
> are other configuration settings that could help. The server is dual
> processor, so maybe setting MAXDOP to 1 will help. Memory size is
> dynamically assigned up to 4 GB, but as the DB size is > 1 TB I'm not
> sure if allowing this much memory is actually decreasing performance
> when the quick query tries to get through.
> Any suggestions.

A random list that comes to mind:

- scheduling: make sure the long runners are done during the night or at
other times when the DB is mostly idle.

- distribution of data: either via some form of replication or by moving
data from one DB to a completely different system

- optimizing SQL: additional indexes, different query conditions etc.

Cheers

robert|||Hi Robert

> I don't know your env and its requirements but to me the difference
> between sub 1 sec and several seconds doesn't sound worth the effort
> changing anything - unless, of course, you have lots of these short
> queries executed in sequence and the difference sums up dramatically.

Yes there could well be a lot of the small queries.

> - scheduling: make sure the long runners are done during the night or
> other time when the DB is mostly idle.

I'm trying to write some sort of queue to help with this but the chances
are there will always be a long running query executing at a given time.

> - distribution of data: either via some form of replication or by moving
> data from one DB to a complete different system

We're looking at getting another server to handle the long queries,
so this might ultimately be the answer.

> - optimizing SQL: additional indexes, different query conditions etc.

We've pretty much done what we can but some queries will always need a
full table scan.

As all queries run at the same priority I was kind of expecting a
0.1 sec query to take approx 0.2 sec (rather than 10 secs as is happening)
if another (long) query is running.

As this isn't the case, I presume there's some sort of
overhead/cache/swapping occurring that I might have been able to
reduce somehow.

Thanks
Mike|||Mike Read wrote:
> Hi Robert
>> I don't know your env and its requirements but to me the difference
>> between sub 1 sec and several seconds doesn't sound worth the effort
>> changing anything - unless, of course, you have lots of these short
>> queries executed in sequence and the difference sums up dramatically.
>>
> Yes there could well be a lot of the small queries.
>>
>> - scheduling: make sure the long runners are done during the night or
>> other time when the DB is mostly idle.
>>
> I'm trying to write some sort of queue to help with this but the
> chances are there will always be a long running query executing at a
> given time.
>> - distribution of data: either via some form of replication or by
>> moving data from one DB to a completely different system
>>
> We're looking at getting another server to handle the long queries,
> so this might ultimately be the answer.
>> - optimizing SQL: additional indexes, different query conditions etc.
>>
> We've pretty much done what we can but some queries will always need a
> full table scan.
> As all queries run at the same priority I was kind of expecting a
> 0.1 sec query to take approx 0.2 sec (rather than 10 secs as is
> happening) if another (long) query is running.
> As this isn't the case, I presume there's some sort of
> overhead/cache/swapping occurring that I might have been able to
> reduce somehow.

My guess would be that your DB is IO bound during these phases, i.e. the
long running table scans eat up all the IO bandwidth and that's slowing
you down. I'd do some measurements to verify that before you change
anything.
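
For example, something along these lines (a rough sketch; the wait statistics
in SQL Server 2000 are cumulative since the last clear, so reset them and then
sample while a long scan and the short queries are running together):

  -- clear the cumulative wait statistics (SQL Server 2000)
  DBCC SQLPERF(WAITSTATS, CLEAR)

  -- ...let the mixed workload run for a while, then look at the waits
  DBCC SQLPERF(WAITSTATS)

If PAGEIOLATCH_* waits dominate, that would support the IO-bound theory.
Performance Monitor counters such as PhysicalDisk: Avg. Disk sec/Read tell a
similar story from the OS side.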

Kind regards

robert|||"Mike Read" <mar@.roe.ac.uk> wrote in message
news:Pine.OSF.4.63.0601251140480.472688@.reaxp06.roe.ac.uk...
> Hi Robert
> > - distribution of data: either via some form of replication or by moving
> > data from one DB to a completely different system
> We're looking at getting another server to handle the long queries
> so this might ultimately be the answer

This may ultimately be your best answer. But...

> > - optimizing SQL: additional indexes, different query conditions etc.
> We've pretty much done what we can but some queries will always need a
> full table scan.

Why? I'd suggest perhaps posting some DDLs here. Some folks here can
sometimes do some amazing work.

> As all queries run at the same priority I was kind of expecting a
> 0.1 sec query to take approx 0.2 sec (rather than 10 secs as is happening)
> if another (long) query is running.
> As this isn't the case, I presume there's some sort of
> overhead/cache/swapping occurring that I might have been able to
> reduce somehow.

Well, generally more RAM is good.

But keep in mind SQL Server 2000 Standard is limited to 2 gig of RAM.

So make sure you're using Enterprise on an OS that will permit use of more
RAM.
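
If you do end up on Enterprise with more than 4 GB in the box, enabling AWE
looks roughly like this (just a sketch, not tuned for your setup; it assumes
/PAE is set in boot.ini, the service account has the "Lock Pages in Memory"
right, and the 6144 MB figure is only an example):

  EXEC sp_configure 'show advanced options', 1
  RECONFIGURE
  -- SQL Server 2000 Enterprise: use AWE to address memory above 4 GB
  EXEC sp_configure 'awe enabled', 1
  -- with AWE on, memory is no longer managed dynamically, so cap it explicitly
  EXEC sp_configure 'max server memory', 6144
  RECONFIGURE
  -- 'awe enabled' only takes effect after the SQL Server service is restarted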

I'd highly suggest at least Windows 2003 for your OS and ideally moving to
SQL 2005 to boot.

For example, SQL 2005 Enterprise on Windows 2003 Enterprise can support up to
64 GB of RAM. (If you really have money to burn, go to Enterprise for
Itanium systems... 1 TB of RAM. Oh, and send a few checks my way. :-)

Also, if you haven't already, you may want to get more disks and partition
tables accordingly.

For example, if it's only one large table that gets scanned, move it to its
own set of disks. This will isolate the disk I/O.
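
Something along these lines would do it (a sketch only; the database, file,
path and index names are made up, and it assumes the big table's clustered
index is a plain index rather than one backing a PRIMARY KEY constraint):

  -- add a filegroup on its own set of disks
  ALTER DATABASE MyBigDB ADD FILEGROUP BigTableFG
  ALTER DATABASE MyBigDB
      ADD FILE (NAME = BigTableData1,
                FILENAME = 'E:\SQLData\BigTableData1.ndf',
                SIZE = 200GB)
      TO FILEGROUP BigTableFG

  -- rebuilding the clustered index on the new filegroup moves the data with it
  CREATE CLUSTERED INDEX IX_BigTable_ObjID
      ON dbo.BigTable (ObjID)
      WITH DROP_EXISTING
      ON BigTableFG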

> Thanks
> Mike|||Hi Greg

>>
>> We've pretty much done what we can but some queries will always need a
>> full table scan.
>>
> Why? I'd suggest perhaps posting some DDLs here. Some folks here can
> sometimes do some amazing work.

The main table is 1 billion rows of about 60 columns. We've indexed on the
most common attributes users might select on, but we allow them to mine
the data using arbitrary SQL, so they might decide to look for an arithmetic
combination of parameters (again, some of which we have anticipated and
materialized) or do stats on the columns, etc.
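
For what it's worth, the sort of materialization I mean can be done in SQL 2000
as an indexed computed column, roughly like this (table and column names here
are invented, and the index is only allowed when the expression is
deterministic and precise, with the right SET options such as ARITHABORT ON in
force when it is created and used):

  -- persist a commonly requested arithmetic combination so queries can seek on it
  ALTER TABLE dbo.SourceTable ADD colourIndex AS (magG - magR)
  CREATE INDEX IX_SourceTable_colourIndex ON dbo.SourceTable (colourIndex)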

> Well, generally more RAM is good.
> But keep in mind SQL Server 2000 Standard is limited to 2 gig of RAM.
> So make sure you're using Enterprise on an OS that will permit use of more
> RAM.
> I'd highly suggest at least Windows 2003 for your OS and ideally moving to
> SQL 2005 to boot.

We're running on Windows 2003 with 4 GB.

> For example, SQL 2005 Enterprise on Windows 2003 Enterprise can support up to
> 64 GB of RAM. (If you really have money to burn, go to Enterprise for
> Itanium systems... 1 TB of RAM. Oh, and send a few checks my way. :-)

A bit out of our price range, though we could almost cache the table :)

> Also, if you haven't already, you may want to get more disks and partition
> tables accordingly.
> For example, if it's only one large table that gets scanned, move it to its
> own set of disks. This will isolate the disk I/O.

Currently we spread all databases/tables across 4 RAID volumes to increase
aggregate IO. Initially most queries were accessing the main large table,
but the new project DB is currently much smaller, so there is scope for
some separation.

Thanks
Mike|||Mike Read (mar@.roe.ac.uk) writes:
> On our SQL Server (2000) we can have queries that take at least 10-20 minutes
> (they require a full table scan of a billion-row table). When one of these
> queries is running, it substantially slows down very quick queries (sub-second
> queries take several seconds).
> I believe there is no way to set priorities, but I was wondering if there
> are other configuration settings that could help. The server is dual
> processor, so maybe setting MAXDOP to 1 will help. Memory size is
> dynamically assigned up to 4 GB, but as the DB size is > 1 TB I'm not sure
> if allowing this much memory is actually decreasing performance when the
> quick query tries to get through.

Setting MAXDOP to 1 will avoid the CPUs being monopolized by a huge query,
but a two-way box sounds a bit thin for that amount of data.
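
For reference, the instance-wide setting is this (alternatively, leave the
server setting alone and put OPTION (MAXDOP 1) on just the big scans):

  EXEC sp_configure 'show advanced options', 1
  RECONFIGURE
  -- stop any single query from grabbing both processors
  EXEC sp_configure 'max degree of parallelism', 1
  RECONFIGURE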

However, the main killer here is probably memory. As you scan that 1 TB
table, the cache is turned over a number of times, and all those small
queries must read from disk. 4 GB also sounds a bit thin; then again, 8 or
12 GB is not going to make that much difference anyway.

A separate server, or at least a separate instance for those mean queries,
would be a better bet.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx|||Hi Erland

> A separate server, or at least a separate instance for those mean queries,
> would be a better bet.

Thanks, this sounds like a reasonable way forward.

Mike|||Hello,
It sounds like you are creating a "data warehouse." Basically, there
is transactional data, where you are adding, updating, and deleting
data. For this, a more normalized schema is vastly preferred.

However, someday someone wants to pull that data out, and the users are
NOT very sophisticated about learning inner and outer joins to get it.
Mostly they are looking at querying the data and reporting.

If you have a billion rows, you probably have fairly worthwhile
reporting requirements. How would you like to take your 18-minute
queries and turn them into sub-second response times? How about turning
ALL queries against a billion rows into sub-second responses? Look into
OLAP.

OLAP requires a different mindset. It does not solve all business
requirements, but it can take the vast majority and make them REALLY
fast.
I've implemented several OLAP solutions for various companies. It takes
a large data set to be worthwhile, and at a billion rows you are
probably approaching critical mass where you can't provide data
solutions to your data customers without it.

For grins, create a cube off your main table. You will have something
along the lines of "names of stuff" in your denormalized table. Make
3 or 4 of these "dimensions." Hopefully these "names of stuff" have no
more than 200-300 variations: human-understood product names, or
countries, or something. Dates are another great dimension.

You will have numbers. These are frequently dollars, or counts, or
something like that. Make no more than two of these "measures."

It will process once every time you change something, so it takes some
patience to learn how to set it up.
Once you have it set up, it is a LOT faster to query.

Remember how slow it was to do table queries before you had any
indexes? OLAP is a LOT bigger step towards efficiency and speed than
indexes were over raw tables.

Drop me a note if you want some help getting it going. I haven't done
one in a while, and I'm getting a little rusty, and would appreciate
being able to help someone get it going.
drmiller 100 at hotmail com

remove spaces!
