TSQL Tuesday: Why Are DBA Skills Necessary

November 2, 2010 at 8:00 am (Misc, SQLServerPedia Syndication)

 

Quote: “Database stuff, all this programming stuff, is easy. Anyone can do it. That’s why everyone in the company has sa privileges.”

For nine months, I worked in an environment where everyone, from developers to QA to the sales people to the receptionist, had SA privileges. You know what? DBA skills are necessary.

I speak from the point of view of someone who has had to recover a server after a sales person helpfully “cleaned up the temporary stuff on the server” by dropping tempdb, causing a late deployment for a client. I speak from the point of view of the guy who kept a window open on his desktop with the database restore script ready to run, all day long, because of “accidents” that stopped development until I could get the database restored. I speak from the point of view of someone who had to explain, multiple times, “No, you can’t hit the undo button to get that data back” in client systems. And yes, I could go on and on about this nine month gig, but I won’t. You might consider it an extreme case, mainly because it was.

Instead, I’ll talk about another job. It was a free-wheeling dot com. We were doing some amazing stuff. We were running well over 300GB of data in SQL Server 7.0 in a 24/7 environment… until the day one of the managers of the organization said “You database guys are spending too much time working on maintenance of the server. SQL Server manages itself. You will stop all maintenance, immediately, and don’t work on it any more.” It only took two weeks for the server to crash. It took us three days (72 straight hours) to rebuild the system and recover the data, with no web site available for our thousands of clients during that entire time. Funny enough, a memo came down from management, not thanking us for putting in all the effort, no, but instructing us that we should spend adequate time performing maintenance routines on the server.

And before you ask, yes, I can keep going. I have tons of examples. How about the application development team (different company) that built the entire database out of multi-statement table valued UDFs that called other UDFs that called other UDFs that called other UDFs… which led to a delayed production roll-out.

What I’m saying is, these are not isolated or extreme examples (OK, the first one was extreme, but only a little). This is real world stuff, occurring on a regular basis, every day, all over the world, and it negatively impacts the businesses that we support. It’s not that people are stupid. They’re absolutely not. It’s just that databases and database servers, and data for that matter, are still pernicious and difficult. It’s not rocket science, not by any stretch of the imagination. Why do I say that? Because I can do it. In fact, I’ve been quite successful as a DBA and Database Developer. If I can do it, it’s not that hard. But it is a specialized set of knowledge, and one that is quite extensive. There’s just so much to learn about how data is stored, retrieved, and managed within SQL Server that it’s crazy to assume you wouldn’t need a specialist as you move deeper and deeper into managing more and more information. The lack of that knowledge can, and does, have an impact on the bottom line.

If you’re a business person or a developer (and by developers, I’m not talking about the exceptional Brainiac who can really do it all, I’m talking about a normal human being), ask yourself, do you know how to restore a database to a point in time? Just in case you don’t know, this means recovering the database from full backups and log backups up to a point, usually just before a failure or a bad update or some other problem that will, again, cost your company money. Ask yourself, do you know how to set up backups so you can do that point-in-time recovery? Do you know how to tell why a query is running really slow and keeping customers from buying your stuff? Do you know how to tell why you’re getting deadlocks and losing transactions? Do you know how to tell if a process is blocked or just taking a long time? I’m clearly belaboring the point, and this is still fairly basic information. I haven’t even started delving into configuring systems for mirroring, or gathering metrics for a performance tuning effort, or any of the hundreds of other tasks that need to be performed to build large scale data management systems that keep your business in business. If you don’t know what all this stuff is, or you don’t know why it’s needed, you might just need someone in your organization who can help out.
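Just to make that first question concrete, here is a minimal sketch of a point-in-time restore. The database name, backup file paths, and STOPAT time are invented for illustration; the pattern, restoring the full backup WITH NORECOVERY and then rolling the log forward to a specific moment, is the part that matters:

-- Restore the most recent full backup, leaving the database able to accept log restores.
RESTORE DATABASE SalesDB
FROM DISK = 'D:\Backups\SalesDB_Full.bak'
WITH NORECOVERY, REPLACE;

-- Roll the transaction log forward, stopping just before the bad update hit.
-- With multiple log backups, restore each one WITH NORECOVERY and use STOPAT only on the last.
RESTORE LOG SalesDB
FROM DISK = 'D:\Backups\SalesDB_Log.trn'
WITH STOPAT = '2010-11-02T07:55:00', RECOVERY;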

And yeah, there are tools out there now, like Object Relational Mapping (ORM) tools, that take away vast amounts of the labor that used to go into building databases. But, having worked with teams building applications with an ORM, I can tell you that while they’ve successfully removed the database from their thoughts and designs by using the ORM tool, they’re still storing data in a database system. Guess what? The problems, such as deadlocks, index scans, and blocking, still exist. Sometimes these problems are even exacerbated by the ORM tool. Now what? You need a specialist.
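What does that specialist actually do? Often, the first step is something as simple as this sketch, a generic check against the dynamic management views available in SQL Server 2005 and later to see who is blocking whom (the query is illustrative, not tied to any particular system):

-- Show every request that is currently blocked, who is blocking it, and what it is running.
SELECT  r.session_id,
        r.blocking_session_id,
        r.wait_type,
        r.wait_time,
        st.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.blocking_session_id <> 0;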

Fine, what if you toss all the mess, get rid of relational storage, and go for one of the NoSQL database systems? Yeah, that might work, a bit. But guess what? You’re still storing data. It’s still going to, in most circumstances, outlive the application that built it. It’s still going to need to be reported against, backed up, recovered, protected, performance tuned… In other words, you’re going to run into situations where, despite the fact that you’ve eliminated the things that irritate you, like SQL, like constraints, you’re still dealing with pernicious data, and you’re going to need someone who has been working out how to deal with that.

Does every single company with any installation of SQLExpress need to hire a DBA? No, of course not. But as soon as your business starts to rely on having data around, all the time, you just might need the skills of someone who has been working with this stuff for a while.


Spools in Execution Plans

September 9, 2009 at 2:16 pm (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

I got the question the other day: when are you likely to see a spool in an execution plan? Easy, whenever SQL Server needs to walk through the data multiple times, usually in JOIN operations… Yeah, well, once again, my flip answers are not quite the entire story.

Spool operations are temporary storage of data for later reuse within a query plan. There are two types of spool operations: the eager spool and the lazy spool. A spool is basically a temporary table created during the execution of the query, used when it’s likely that data will be needed again, and again, as the query runs. This is not an explicit #temp table, but a work table for operations within the processing necessary for a given query’s behavior. A spool is created when the optimizer thinks it can work better with a semi-permanent subset of the data rather than having to perform multiple seeks or scans against a table or index, or in other places where data reuse is important (more in a bit).

So how does this work? Take a look at a simple query:

UPDATE Person.Person
SET FirstName = 'Ted'
WHERE FirstName = 'Ted';

When the execution plan for this query is generated, it looks like this:

[Figure: execution plan for the UPDATE, showing a Table Spool (Eager Spool) operator]

In this case, an eager spool is used as part of the rollback mechanism and to prevent the Halloween scenario, where an update could otherwise re-read rows it has already modified, through the very index it’s using to find them, and process them more than once. An eager spool is one where the data is retrieved immediately, all of it, as soon as the spool is called.

It’s possible to see the other type of spool in a query that looks like this (straight out of the Books Online):

WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS 
(
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID 
)
SELECT ManagerID, EmployeeID, EmployeeLevel 
FROM DirectReports ;

Which would result in this execution plan:

[Figure: execution plan for the recursive CTE, showing a Table Spool (Lazy Spool) operator]

Now you see a table spool that is called a lazy spool. This means that it only loads data as that data is requested. This makes a lot of sense, because the lazy spool is operating as the means for gathering the recursive data together. So it’s not going to go and get all the data available, like an eager spool. Instead, it’s going to load the data only as it’s needed. Lazy.

These two scenarios are much more likely than the typical join to show a table spool. Yes, it can, and does, appear in join operations, but as I said at the beginning, that’s such a flip answer. Much better to try to be complete.


Unpacking the View

April 24, 2009 at 9:16 am (SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

A view is simply a query that behaves something like a table. Most people know this. Most people also know that a view is simply a mask on top of what might be a very complex query. It all seems really simple. You call the view inside a simple query, the view runs the underlying complex query. Most people might not know that when a view is called and it gets sent to the optimizer, the optimizer unpacks the view and binds the component parts of the query necessary to create an execution plan that will return the data requested. What I didn’t know until recently was that the optimizer is VERY smart. Not only does it unpack the query of the view, but it will change the query that the view uses. Let’s take an example (using AdventureWorks2008):


CREATE VIEW dbo.vTest
AS
SELECT soh.AccountNumber
    ,sod.LineTotal
    ,(SELECT COUNT(pcc.BusinessEntityId)
      FROM Sales.PersonCreditCard AS pcc
      WHERE pcc.CreditCardID = soh.CreditCardID
     ) AS PersonCreditCard
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON soh.SalesOrderID = sod.SalesOrderID;
GO

Whether or not this query is a good one (I’d say it isn’t), and whatever you might think about it, the view definition here is pretty straightforward. Run a SELECT that pulls all the columns from this view and take a look at the execution plan.
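For example, a query as simple as this (any SELECT that references all three of the view’s columns behaves the same way):

SELECT vt.AccountNumber
,vt.LineTotal
,vt.PersonCreditCard
FROM dbo.vTest AS vt;

The plan looks like this: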

[Figure 1: execution plan for the SELECT against dbo.vTest]

As you can see, the behavior is completely as defined above. The view has been resolved out to its component parts in order to build the query and arrive at the data requested. What happens if, instead of selecting all the columns, only a couple are selected? Well, the query that defines the view is not changing, so the optimizer will arrive at the same execution plan as the one shown in Figure 1, right? Run this query and get the execution plan:

SELECT
 vt.AccountNumber
,vt.LineTotal
FROM
 dbo.vTest AS vt;

[Figure: execution plan for the two-column SELECT against dbo.vTest]

What the heck just happened?

I told you, the optimizer is smart. Compare that execution plan with the one generated from this query:

 SELECT soh.AccountNumber
,sod.LineTotal
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID ;

You’ll find that they’re identical. As stated before, views are just a query stored off to the side. The optimizer is smart enough to figure out that it only needs to generate a plan as if the query requested were the two-table join just shown. But what happens when you start nesting views? Yes, yes, Yeah… I… OK… But… YES! That’s not a good idea. So? When has that slowed people down for a New York second? I’ve seen views nested three and four layers deep. I’ve seen table-valued user-defined functions nested six and seven layers deep. The question is, how well does the optimizer handle something like this? This script sets up a new situation:

CREATE VIEW dbo.vSalesInfo
AS
SELECT soh.AccountNumber
    ,sod.LineTotal
    ,soh.CreditCardID
    ,sod.ProductID
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON soh.SalesOrderID = sod.SalesOrderID;
GO

CREATE VIEW dbo.vTest3
AS
SELECT vsi.AccountNumber
    ,vsi.LineTotal
    ,(SELECT COUNT(pcc.BusinessEntityId)
      FROM Sales.PersonCreditCard AS pcc
      WHERE pcc.CreditCardID = vsi.CreditCardID) AS PersonCreditCard
    ,(SELECT COUNT(v.BusinessEntityId)
      FROM Purchasing.Vendor AS v) AS VendorCount
    ,(SELECT COUNT(a.AddressId)
      FROM Person.Address AS a) AS AddressCount
    ,(SELECT COUNT(be.BusinessEntityID)
      FROM Person.BusinessEntity AS be) AS BusinessEntityCount
FROM dbo.vSalesInfo AS vsi;
GO

From the script you can see that I’ve nested one view inside another and complicated the outer view a bit. What happens when you run the query that would eliminate tables from the view this time?

SELECT vt.AccountNumber
,vt.LineTotal
FROM dbo.vTest3 AS vt;

[Figure: execution plan for the two-column SELECT against dbo.vTest3]

See? The optimizer is very smart. Ah, but before you get all comfy and crack a bottle of some frothy beverage, we need to look at the compile times. Compare the STATISTICS TIME output from a query that pulls from the full view, the limited pull from the view, and the pull from the tables:
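If you want to reproduce the comparison, here is a rough sketch of how to capture those numbers; the object names are the ones defined above, and the exact timings will vary from system to system:

SET STATISTICS TIME ON;

-- Everything the nested view exposes
SELECT *
FROM dbo.vTest3 AS vt;

-- The limited, two-column pull from the nested view
SELECT vt.AccountNumber
,vt.LineTotal
FROM dbo.vTest3 AS vt;

-- The same two columns pulled straight from the base tables
SELECT soh.AccountNumber
,sod.LineTotal
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
    ON soh.SalesOrderID = sod.SalesOrderID;

SET STATISTICS TIME OFF;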

--View 1
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.

--Table
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.

--View 2
SQL Server parse and compile time:
CPU time = 8 ms, elapsed time = 8 ms.

These are small views and easy queries, so the compile times are not large. But with one consistently 30% larger than the others in elapsed parse and compile time, and actually requiring CPU cycles where the others require none, you begin to see that, despite the extreme intelligence of the optimizer, there is a cost. TANSTAAFL still applies. Keep all this in mind if you think that nested views are a nifty way to write your code, or if you’re trying to convince someone that nested views are not a nifty way to write their code.
