A Lack of Excitement

December 23, 2009 at 9:16 am (PASS, PowerShell, SCOM, SQL Server 2005, SQL Server 2008, SQLServerPedia Syndication, TSQL)

I usually use all the problems, crashes, and issues that I run into at work as grist for my mill, a.k.a. material to blog about. But lately, we haven’t been crashing & burning much <knock wood, turn three times, throw salt over my left shoulder, spit>. It was suggested that maybe I should mention why that is.

The fact of the matter is that I’ve been spending a lot more time working on methods for monitoring our systems so that we avoid more of the stupid stuff: full disks, failed backups, long-running agent jobs, etc. I’ve blogged before about our use of Microsoft’s Operations Manager for monitoring our servers and how we’ve built custom rules and monitors to keep an eye on things. I’ve also mentioned how we use Idera’s Diagnostic Manager as a drill-down mechanism to keep an eye on SQL Server internals that OM just didn’t do as nifty a job on. What I haven’t mentioned is that, after attending the PASS Summit (and if you didn’t go, you sure missed out) and sitting through Buck Woody’s session on monitoring using Policy Based Management, “More Servers Less Control,” I’ve implemented PBM within our organization.

If you have yet to explore Policy Based Management (PBM), get on it. I know that others have blogged about their experiences with it, and there’s a reason that people are excited. Track down Buck Woody’s session over at the PASS site (it’s worth paying for, trust me). You too can get this stuff up and running in no time. That’s the good news. Now for the bad news: PBM only works with SQL Server 2008. But I’ve got some good news about that too. There’s a great utility published over at CodePlex (and isn’t that a great place to browse on occasion) called the Enterprise Policy Management Framework, EPM (you can never have enough acronyms, especially three-letter ones). The EPM Framework allows you to run policies against 2000 and 2005 servers. Sweet!
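
If you want a quick peek at what PBM has been recording, here’s a minimal sketch (mine, not from Buck’s session) that pulls recent failed policy evaluations out of the PBM tables in msdb on a 2008 instance. Double-check the view and column names against your own install:

-- List recent failed policy evaluations straight out of msdb
SELECT p.name AS policy_name,
       h.end_date,
       h.result,
       h.exception_message
FROM msdb.dbo.syspolicy_policies AS p
JOIN msdb.dbo.syspolicy_policy_execution_history AS h
    ON h.policy_id = p.policy_id
WHERE h.result = 0 -- 0 = the evaluation failed
ORDER BY h.end_date DESC;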

All of this work has been part of a concerted effort within our organization to get really and truly proactive, to prevent errors before they occur. Between the modifications I’ve made to OM, the implementation of PBM through EPM, and reports against the OM data through SSRS, SQL just isn’t having as many problems these days (I love talking in acronyms sometimes just to watch people’s toes curl). I’ve been busy as all get out, but it’s on pretty mundane, non-exciting stuff. Hence fewer blog posts.


Record of a PSS Call

November 17, 2009 at 7:36 am (SQL Server 2008, SQLServerPedia Syndication)

Not everyone has the opportunity to call Microsoft Premier Support. For those who have not yet had this experience, I’ll document my most recent non-critical call. Critical calls are a different critter entirely.

We were experiencing a very odd error on one server in our system. When a particular procedure was run with a particularly large set of data, it would produce an error, but only when called from the application. The exact same call with exactly the same data, run from SSMS, did not produce an error. We went through a very extensive set of tests and were unable to fix the problem on the server. After moving the production system that was experiencing the issue to a different server, we decided to contact PSS.

8:48 AM, Tuesday: I made the initial call and went through the process of validating my identity, our company, etc. I had to explain everything that we had done, which versions of all the software involved we were running, and so on. This took 10 minutes. Since I had called early in the day and this was a non-critical call, although labeled as important since we were talking about a production system, I had to wait for the MS guys to come in.

9:47 AM, Tuesday: I received an email from PSS asking me for the SQL Server logs. I zipped up the most recent log with the error and sent it in to the guy. This took me about five minutes to get everything together and zip it up for him.

10:53 AM, Tuesday: I received my first phone call on the process. I re-explained the issue to the support person. They suggested that we set up a server-side trace, get the query to error out when run from the app and to run successfully from Management Studio, and see what differences there might be, if any, in the basic trace. It took me about 30 minutes to set up the server-side trace, test it, generate the errors, run the query successfully, and then zip it all up with sample code and sample data to ship off to MS.
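
For anyone who hasn’t built one by hand, a server-side trace along those lines looks roughly like this. It’s a minimal sketch, not the exact trace we sent to PSS; the file path is made up, and the event and column IDs are the standard ones for RPC:Completed, SQL:BatchCompleted, and SQL:BatchStarting:

DECLARE @TraceID int;
DECLARE @on bit;
DECLARE @maxfilesize bigint;
SET @on = 1;
SET @maxfilesize = 50; -- MB per rollover file

-- Create the trace writing to a file (option 2 = file rollover; path is made up)
EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\Traces\PSSCapture', @maxfilesize;

-- Columns: TextData (1), SPID (12), Duration (13), StartTime (14)
EXEC sp_trace_setevent @TraceID, 10, 1, @on;  -- RPC:Completed
EXEC sp_trace_setevent @TraceID, 10, 12, @on;
EXEC sp_trace_setevent @TraceID, 10, 13, @on;
EXEC sp_trace_setevent @TraceID, 10, 14, @on;
EXEC sp_trace_setevent @TraceID, 12, 1, @on;  -- SQL:BatchCompleted
EXEC sp_trace_setevent @TraceID, 12, 12, @on;
EXEC sp_trace_setevent @TraceID, 13, 1, @on;  -- SQL:BatchStarting
EXEC sp_trace_setevent @TraceID, 13, 12, @on;

EXEC sp_trace_setstatus @TraceID, 1; -- start it
SELECT @TraceID AS TraceID; -- hang on to this to stop (0) and close (2) the trace later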

Does it feel like this is going to take a long time? You’re right. It continues.

12:10 PM, Tuesday: I get an email back with some initial thoughts. Unfortunately, it looks like we don’t have complete information. I respond in the email, providing more.

12:26 PM, Tuesday: Another email back from MS, confused about the trace. They had me use the default trace template, which captures RPC:Completed, SQL:BatchStarting, and SQL:BatchCompleted. They claimed that the app wasn’t run and must have errored before calling the database. But that’s because the app errored, which means no RPC:Completed event was ever written; we didn’t capture RPC:Starting. I pointed this out in a response. Nothing back yet.
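
If you end up in the same spot, adding RPC:Starting to a trace like the sketch above is one sp_trace_setevent call per column (it’s event ID 11), with the caveat that the trace has to be stopped first:

EXEC sp_trace_setstatus @TraceID, 0;          -- stop; you can't add events to a running trace
EXEC sp_trace_setevent @TraceID, 11, 1, @on;  -- RPC:Starting, TextData
EXEC sp_trace_setevent @TraceID, 11, 12, @on; -- RPC:Starting, SPID
EXEC sp_trace_setstatus @TraceID, 1;          -- start it back up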

1:32 PM, Tuesday: I get my response back: am I sending a new trace? Nuts. I knew, when I realized what they were looking for, that I should have restarted the trace. There’s an hour shot.

2:35 PM, Tuesday: I’ve finished rebuilding our trace, retesting, and sending everything off for evaluation, again. It took longer this time because I was waiting on a developer who was away from his desk. PSS calls require everyone to be available, all the time. When we do critical calls, we just jam everyone into the same room. It makes things easier.

6:39 PM, Tuesday: The data wasn’t sufficient. It doesn’t show when the error occurs. PSS sends me a new trace template to try out. Unfortunately, I’m not at work anymore, and what’s more, the developers whose support I need are long gone. I can’t do anything with this until morning.

DAY 2

8:08 AM, Wednesday: I’m setting up the trace to see what they’re having me gather while I wait for the developers to show up. Odin’s all-seeing eye! They’re capturing quite a few events here. Just about every single event in “Errors and Warnings.” OLEDB Errors. Auto Stats, Degree of Parallelism, Showplan All, and Showplan XML Statistics Profile from “Performance.” Server Memory Change. PreConnect:Completed and PreConnect:Starting from “Sessions” (I’ve never even seen these before). RPC:Completed, RPC:Starting, SP:Completed, SP:Recompile, SP:Starting, SP:StmtCompleted (I knew that one was coming), SP:StmtStarting from “Stored Procedures.” Finally SQL:BatchCompleted, SQL:BatchStarting, SQL:StmtCompleted, SQL:StmtRecompile, SQL:StmtStarting from “TSQL.” A full load. This should be fun.

I’m converting it to a script and adding a filter so it only captures data from the database I’m having trouble with. I’m also converting it to a script for another reason, which I’ll cover in a separate little blog post.
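
The filter itself is a single call once you have the trace ID from the sketch earlier. DatabaseID is trace column 3, and the database name here is just a placeholder:

-- Restrict the trace to one database (logical operator 0 = AND, comparison 0 = equal);
-- like events, filters can only be changed while the trace is stopped
DECLARE @DBID int;
SET @DBID = DB_ID(N'ProblemDB'); -- hypothetical name, use yours
EXEC sp_trace_setfilter @TraceID, 3, 0, 0, @DBID;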

11:11 AM, Wednesday: The joys of developing software. The developer I was working with yesterday wasn’t in, so I got a different developer to help out. He didn’t know how to run the code the way the other guy ran it, so it was run in a different way. I captured everything and shipped it off to Microsoft. While waiting for their response, I read through the data gathered by the trace. No error. In fact, everything ran successfully. We did a bunch of tests and found that as long as we were running the big trace, we didn’t get the error. It was almost like the error was caused by having too many resources.

12:00 PM, Wednesday: The PSS person comes back and says that the data contained no error. Yep. They don’t have a single suggestion. All the indications are that the faster, more powerful server is causing the problem.

DAY 3

I get a response back from PSS. They’re asking if I’m suggesting that I’m supposed to run the trace all the time. This is the point where I decide to bail on PSS. I’m leaving the case open, but I’ve been talking to some other resources and have a troubleshooting scheme from them that we’re going to try out.

More when I know more.


Microsoft SQL Server Premier Field Engineers

June 8, 2009 at 8:12 am (SQL Server 2005, SQL Server 2008, TSQL)

Joe Sack has started a new team blog for the Microsoft SQL Server Premier Field Engineers. If you don’t know who they are, you should. The first post is just introductory, but this blog is likely to become a great resource. These are the guys that MS zip-lines into tough situations with the expectation that they’ll improve them. I’d strongly suspect these are fellows worth listening to.


Help Make Connect Work Better

May 4, 2009 at 9:31 am (Misc)

I’ve posted several times about how useful I find Microsoft Connect. I’ve seen issues introduced there get resolved through hot fixes and service packs. I sincerely believe it’s a very important tool in your arsenal to get the kind of service you need from Microsoft. Here’s a post on Connect that’s trying to make Connect itself easier to use. I’d strongly recommend you swing by and vote positively for this.

And for those who are unclear, click on the stars to submit your vote.


Microsoft Links to Help Your Career

March 25, 2009 at 7:13 am (Misc)

I received this list of links from my Microsoft rep. It was really an impressive list, so I asked if I could post it to the blog. Figures, it was already out there: Buck Woody had compiled it. It’s worth a look through. There really is a lot of information available from Microsoft that focuses on you and your career. Who knew that a big company like that could be so helpful? Also, how great is it that Buck Woody pulled it all together for convenience?


Database.sqlpermissions

March 19, 2009 at 2:29 pm (Visual Studio)

Raise your hand if you think this is a real pain-in-the-bottom method for editing user permissions. Yeah, me too. Visual Studio Team System Database Edition is far too fine a tool to make us edit XML to set database user permissions. A co-worker has posted a change request on MS Connect. Connect works really well as long as people vote for what you report; I’ve seen several things in SQL Server change or get fixed primarily because of reports on Connect. So if doing this:

<PermissionStatement Action="GRANT">
    <Permission>EXECUTE</Permission>
    <Grantee>UserRole</Grantee>
    <Object Name="dbo" Type="SCHEMA"/>
</PermissionStatement>

Makes you crazy and you would rather type this:

GRANT EXECUTE ON SCHEMA::dbo TO UserRole;

Then click on the link and get the word out to Microsoft.


Microsoft Concept Center

January 14, 2009 at 10:47 am (Misc)

During my recent visit to the Microsoft Technology Center in Waltham, Rich Crane gave me a tour of the facility. It included a room he called, I think, the Concept Center: a little theatre-type arrangement around a series of work areas, or work styles. Microsoft uses the room for demos that go WAY beyond some silly PowerPoint slide show. Here are a few pictures I took while I was there.


SQL Server 2008 Upgrade Lab at Microsoft

January 14, 2009 at 9:11 am (SQL Server 2008, Tools)

I got an invite to take part in a lab at the Microsoft Technology Center in Waltham. I took advantage of it. I’ll use this post to describe the experience so that anyone else with the opportunity will know what to expect. 
(Photo: They knew I was coming)

First, you receive a very explicit set of prerequisites. You need to install the SQL Server Upgrade Assistant, a tool that Microsoft licensed Scalability Experts to create for them. You have to run this against a small database (under 25 GB). The tool backs up all the databases from the server (so you need to put it on a test box, rather than try to move an entire production system’s worth of databases). It then starts a trace that captures all the calls made to the database. I spent two days working with one of my application teams to get a server set up, the app connected, and a good set of tests run on the server to capture about an hour’s worth of trace data. It was at no point hard to meet the requirements; it just took time to get everything set up just right. They recommend you single-thread the trace, meaning just have one user run all the tests. This is because, when run without any extra work, Profiler, which replays traces, is single-threaded. Anything else can lead to unrealistic errors, especially blocking and deadlocks.

Once I had everything, I went to Waltham (a two-hour commute… the less said the better) to run the tests. The lab setup was impressive. They had a separate room for each of the four companies that sent someone to the testing facility. We had a solid workstation (running Windows 7, by the way, fun stuff) and a set of servers on a separate LAN inside each room. The servers were running on Hyper-V, Microsoft’s virtual server software. Unfortunately, we did run into a snag here. Each server was supposed to have 100 GB of space to accommodate the copy of the database as well as a restore of it, with some more room besides. The virtual machines were configured to run on a system that only had 140 GB of storage to start with. I filled that with my database during the initial setup (I ran the processes on three servers simultaneously). That put us out of commission while the MS techs went about fixing everything (which they did, quickly). It was just the pain of being first.

(Photo: MS provided nice workstations)

The documentation on the labs was very complete. Few steps were left to the imagination, and anywhere there was ambiguity, a second set of documentation cleared it up nicely. With one exception: they did want you to restore the system databases. It made sense to do it, but I checked both sets of documentation and it wasn’t there, so I thought, hey, what do I know, MS is on top of this… Wrong. Had to restart, again.

Once all the initial configuration issues were resolved, it was simply a matter of walking through the lab. The first step was to establish a baseline, so I played back the trace on a 2000 server. Then I did an in-place upgrade to 2008 and replayed the trace, and then an upgrade to a fresh install using a restore and replayed the trace there. All the results could then be compared.

Overall, it was a good session. Rich Crane, Rob Walters, and Sumi Dua from Microsoft were very helpful. I picked up a few tips on doing upgrade testing and got to do it away from managers and developers, making quite a few mistakes along the way. Now maybe I can do it in front of them with fewer mistakes. I liked the Upgrade Assistant tool, since I’m pretty lazy, but it didn’t do anything earth-shattering that you couldn’t do on your own.

One tip worth repeating: if you’re using the Upgrade Assistant to capture a trace, it doesn’t put filtering in place. You can open the trace file, filter out the databases, by ID, that you don’t need, and then save a new copy of the trace file just for the database you’re interested in. Thanks for that one, Rich.
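
Rich’s tip works through the Profiler GUI. If you’d rather stay in T-SQL, a rough sketch of the same idea reads the file back through fn_trace_gettable and filters on DatabaseID (the file path and database name are placeholders). You end up with a table rather than a new trace file, but it gets you the same filtered data:

-- Load the captured trace file and keep only one database's rows
SELECT *
INTO dbo.UpgradeTraceFiltered
FROM sys.fn_trace_gettable(N'C:\Traces\Upgrade.trc', DEFAULT)
WHERE DatabaseID = DB_ID(N'ProblemDB'); -- hypothetical name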

Gentlemen, you did a nice job. I appreciate your time and your help. Sumi, nice to meet you. Rob, good to see you again. Rich, thanks again for everything, great chatting and good to see you again as well. Rich and I used to work at a dot com “back in the day.”


SNESSUG Speaker

June 11, 2008 at 7:20 am (PASS, SNESSUG)

Tonight we have Rob Walters of Microsoft speaking on Visual Studio 2008 Reporting Services. This should be a good one. Rob also recently published a book, Accelerated SQL Server 2008. I’m going to a party at the MS building in Waltham next week to celebrate the release of Rob’s book. Hopefully that means I can finagle a copy too.
