Best of SQL Server Central Vol 2
The Best of SQLServerCentral.com - Vol. 2
Essays and Ideas from the SQL Server Community
In April 2001 six geeks banded together to form a more perfect site. Three years
and 140,000+ members later, SQLServerCentral.com is one of the premier SQL
Server communities in the world. We’ve got over 1,000 articles, 100s of scripts
and FAQs, everything you would need as a SQL Server DBA or developer, and all
at a great price: free.
This book contains the best material published on the site from 2003. It’s a
variety of topics from administration to advanced querying. XML to DTS, security
to performance tuning. And of course, the famous White Board, Flip Chart, or
Notepad debate.
So why print a book containing material you can get for free? Take a minute,
read the introduction and find out!
Andy Jones, Andy Warren, Bob Musser, Brian Kelley, Brian Knight, Bruce Szabo, Chad Miller,
Chris Cubley, Chris Kempster, Christopher Duncan, Christoffer Hedgate, Dale Elizabeth Corey,
Darwin Hatheway, David Poole, David Sumlin, Dinesh Asanka, Dinesh Priyankara, Don Peterson,
Frank Kalis, Gheorghe Ciubuc, Greg Robidoux, Gregory Larsen, Haidong Ji, Herve Roggero,
James Travis, Jeremy Kadlec, Jon Reade, Jon Winer, Joseph Gama, Joseph Sack, Kevin Feit,
M Ivica, Mike Pearson, Nagabhushanam Ponnapalli, Narayana Raghavendra, Rahul Sharma,
Ramesh Gummadi, Randy Dyess, Robert Marda, Robin Back, Ryan Randall, Sean Burke,
Sharad Nandwani, Stefan Popovski, Steve Jones, Tom Osoba, Viktor Gorodnichenko
Book printing partially sponsored by
The Best of SQLServerCentral.com - Vol. 2
Andy Jones
Andy Warren
Bob Musser
Brian Kelley
Brian Knight
Bruce Szabo
Chad Miller
Chris Cubley
Chris Kempster
Christoffer Hedgate
Christopher Duncan
Dale Elizabeth Corey
Darwin Hatheway
David Poole
David Sumlin
Dinesh Asanka
Dinesh Priyankara
Don Peterson
Frank Kalis
Gheorghe Ciubuc
Greg Robidoux
Gregory Larsen
Haidong Ji
Herve Roggero
James Travis
Jeremy Kadlec
Jon Reade
Jon Winer
Joseph Gama
Joseph Sack
Kevin Feit
M Ivica
Mike Pearson
Nagabhushanam Ponnapalli
Narayana Raghavendra
Rahul Sharma
Ramesh Gummadi
Randy Dyess
Robert Marda
Robin Back
Ryan Randall
Sean Burke
Sharad Nandwani
Stefan Popovski
Steve Jones
Tom Osoba
Viktor Gorodnichenko
The Central Publishing Group
3186 Michaels Ct
Green Cove Springs, FL 32043
U.S.A
Copyright Notice
Copyright 2004 by The Central Publishing Group. All rights reserved. Except as permitted
under the Copyright Act of 1976, no part of this publication may be reproduced in any form or by any
means or by a database retrieval system without the prior written consent of The Central
Publishing Group. The publication is intended for the audience of the purchaser of the book.
This publication cannot be reproduced for the use of any other person other than the
purchaser. Authors of the material contained in this book retain copyright to their respective
works.
Disclaimer
The Central Publishing Group, SQLServerCentral.com, and the authors of the articles
contained in this book are not liable for any problems resulting from the use of techniques,
source code, or compiled executables referenced in this book. Users should review all
procedures carefully, test first on a non-production server, and always have a good backup
before using on a production server.
Trademarks
Microsoft, SQL Server, Windows, and Visual Basic are registered trademarks of Microsoft
Corporation, Inc. Oracle is a trademark of Oracle Corporation.
Editors
Steve Jones and Andy Warren
Cover Art
Sylvia Peretz of PeretzDesign.com
Table of Contents
Introduction 8
Administration 15
Auto Close and Auto Shrink - Just Don't Mike Pearson 16
Tips for Full-Text Indexing/Catalog Population/Querying in SQL 7.0 and 2000 Jon Winer 25
Getting Rid of Excessive Files and Filegroups in SQL Server Chad Miller 27
SQL Server Upgrade Recommendations and Best Practices - Part 1 Jeremy Kadlec 41
DTS 51
Auditing DTS Packages Haidong Ji 52
Comparison of Business Intelligence Strategies between SQL and Oracle Dinesh Priyankara 58
Security 67
Block the DBA? Robert Marda 68
Performance 85
Cluster That Index Christoffer Hedgate 86
Squeezing Wasted Full Scans out of SQL Server Agent Bob Musser 97
T-SQL 100
A Lookup Strategy Defined David Sumlin 101
Create Maintenance Job with a Click without using a Wizard Robin Back 104
Understanding the Difference Between IS NULL and =NULL James Travis 139
Replication 147
Altering Replicated Tables (SQL 2000) Andy Warren 148
XML 153
Is XML the Answer? Don Peterson 154
Miscellaneous 164
A Brief History of SQL Frank Kalis 165
INTRODUCTION
Welcome to The Best of SQLServerCentral.com – Vol. 2!
Once again SQLServerCentral.com had a fantastic year, and we decided to reprint
some of the best, most popular, and most read articles in dead tree format. We wanted to
give all our authors a chance to see their names in print as well as give you an offline resource
that you can take with you wherever you may need it. Most likely at your bedside to help you
drop off at night, but also for commutes, holding your coffee cup, whatever.
And Red-Gate Software has once again sponsored the book and worked with us to bring you
this great reference.
We would also like to thank everyone for their support both on the website as well as by
purchasing this book. Your visits to the site, clicking through to advertisers, purchasing
products, registering for PASS, all help us continue this community and provide you with a
valuable resource that hopefully helps you learn, perform better at your job, and grow your
career. We’d like to encourage all of you to submit an article in 2005! This is a community and
we aren’t looking for only the gurus to contribute. We love hearing about the real world you all
live in and deal with on a daily basis. We plan to get at least one article from each author and
send you a couple copies of the book. Great for your bookshelf and they make a great
Mother’s Day present.
Once again, thanks so much for your support and we look forward to 2005.
Andy Warren
Brian Knight
Steve Jones
About The Authors
Andy Jones
I am currently employed at a large UK software house and am working as a SQL Server 2000 DBA within a development
environment. After previously working with Visual Basic and Oracle for three years, I chose to move over to solely
concentrate on database development. I have been in my current position working with SQL Server for the past two years, my
role encompasses daily administrative tasks like managing backups and users through to my main job of database design and
development. I also have extensive experience of reporting against RDBMS using such tools as Crystal Reports and Visual
Basic Reports.
Initial Installation of the Production Database – pg. 33
Andy Warren
Altering Replicated Tables – pg. 164
White Board, Flip Chart, or Notepad? – pg. 213
Bob Musser
Bob Musser is the President of Database Services, Inc., an Orlando based vertical market software provider. His company, in
business since 1988, provides software and support primarily for process servers and couriers. They also run an exchange
system for process servers to trade work with built on SQL Server.
Squeezing Wasted Full Scans out of SQL Server Agent – pg. 107
Brian Kelley
Brian is currently an Enterprise Systems Architect with AgFirst Farm Credit Bank (http://www.agfirst.com) in Columbia, SC.
Prior to that he served as a senior DBA and web developer with AgFirst. His primary responsibilities include overhauling the
current Windows NT infrastructure to provide for a highly available, network-optimized framework that is Active Directory
ready. Brian assumed his Architect role in December 2001. He has been at AgFirst since January of 2000 when he originally
came on-board as an Intranet web developer and database programmer.
In addition to his role at AgFirst Farm Credit Bank, Brian heads Warp Drive Design Ministries
(http://www.warpdrivedesign.org), a Christian ministry devoted to using technology for encouraging Christians in their faith
as well as presenting the Gospel in a non-confrontational manner. The main focus is an email devotional ministry currently
being penned by Michael Bishop (mbishop@clemson.edu), a computer engineering student at Clemson University
(http://www.clemson.edu). Prior to AgFirst, Brian worked as a system administrator and web developer for BellSouth's Yellow
Pages group and served on active duty as an officer with the US Air Force. He has been a columnist at SQL Server Central
since July 2001.
Brian is the author of the eBook, Start to Finish Guide to SQL Server Performance Monitoring, on sale here at SQL Server
Central: http://www.netimpress.com/shop/product.asp?ProductID=1
SQL Server Security: Login Weaknesses – pg. 71
SQL Server Security: Why Security is Important – pg. 81
Brian Knight
Brian Knight, MCSE, MCDBA, is on the Board of Directors for the Professional Association for SQL Server (PASS) and
runs the local SQL Server users group in Jacksonville. Brian is a contributing columnist for SQL Magazine and also
maintains a weekly column for the database website SQLServerCentral.com. He is the author of Admin911: SQL Server
(Osborne/McGraw-Hill Publishing) and co-author of Professional SQL Server DTS (Wrox Press). Brian is a Senior SQL
Server Database Consultant at Alltel in Jacksonville and spends most of his time deep in DTS and SQL Server.
Gathering Random Data – pg. 131
Bruce Szabo
VBScript Class to Return Backup Information – pg. 212
Cade Bryant
Chad Miller
Getting Rid of Excessive Files and Filegroups in SQL Server – pg. 27
Chris Cubley
Chris Cubley is an MCSD with over four years of experience designing and implementing SQL Server-based solutions in the
education, healthcare, and telecommunications industries. He can be reached at ccubley@queryplan.com.
Using Exotic Joins in SQL Part 1 – pg. 155
Using Exotic Joins in SQL Part 2 – pg. 158
Chris Kempster
Chris has been working in the computer industry for around 8 years as an Application Development DBA. He began his
career as an Oracle AP then moved into a DBA role using Oracle. From there he has specialised in DBA technical consulting
with a focus to both Oracle and SQL Server. Chris has been actively involved with SQL Server since 1999 for a variety of
clients and is currently working with a range of large scale internet based development projects. Visit
www.chriskempster.com for further information and his ebook titled "SQL Server 2k for the Oracle DBA".
Change Management – pg. 186
Christoffer Hedgate
I work in Lund, Sweden, at a company called Apptus Technologies. Apptus are specialized in database research and
consulting, including development of search engines. Most of the time I work with SQL Server as an architect, administrator
and developer, but I have also done some work with other DBMS such as Oracle and TimesTen. I also do some
programming, mainly in Visual Basic, C# and Java (plus a couple of scripting languages). I am also the co-owner of sql.nu
(http://www.sql.nu/) where you can find more articles from me.
Cluster That Index – pg. 93
Cluster That Index – Part 2 – pg. 96
Christopher Duncan
Christopher Duncan is an author, musician, veteran programmer and corporate troublemaker, and is also President of Show
Programming of Atlanta, Inc. Irreverent, passionate, unconventional and sometimes controversial, his focus has always been
less on the academic and more on simply delivering the goods, breaking any rules that happen to be inconvenient at the
moment.
Pro Developer: This is Business – pg. 205
Darwin Hatheway
Two Best Practices! – pg. 210
David Poole
David Poole has been developing business applications since the days of the Commodore Pet. Those were the days when 8K
was called RAM not KEYBOARD BUFFER. He specialised in databases at an early stage of his career. He started
developing marketing applications using Ashton-Tate dBase II/Clipper, progressing through a variety of PC database
applications before working on HP Image/3000 systems. He has spent 5 years as DBA within the Manchester (UK) office of
the world's No. 1 advertising agency, where he focussed on data warehousing and systems integration. At present he is
working within the web development department of a company specialising in information delivery.
Lessons from my first project as a Project Manager – pg. 202
David Sumlin
David Sumlin has run his own contracting business for the last 6 years and has been involved in SQL development for the last
8 years. He and his team of skilled developers have tried to stay up to date with Microsoft development technologies when it
comes to SQL and .net. For the last few years, HeadWorks Inc. has been focused on data warehouses and marts and the
reporting applications that interact with them for large financial institutions. When David isn't coding, he's out looking for
birdies!
A Lookup Strategy Defined – pg. 113
Dinesh Asanka
I started my career in 1993 as an implementation officer at Jagath Robotics Pvt Ltd. After around three months I started my
programming life in Cobol, Dbase 3+, and FoxBase. I was involved in developing accounting, plantation, audit, and HR systems.
I then graduated from the University of Moratuwa, Sri Lanka, as an electrical engineer in 2001. Currently I am working as a
software engineer for Eutech Cybertics Pte. Ltd. I am involved in software development and database design. These
days I am following an MBA course in IT. In the field of databases I have experience in Dbase 3+, FoxBase, Clipper,
MS Access, Oracle and SQL Server. My SQL Server career started in 2000. I’m still learning, and there is a lot more to learn
in SQL Server. I’m a cricket lover and I would like to continue sharing knowledge about SQL Server.
Date Time Values and Time Zones – pg. 126
Find Min/Max Values in a Set – pg. 128
Dinesh Priyankara
Comparison of Business Intelligence Strategies between SQL and Oracle – pg. 142
Reusing Identities – pg. 57
Don Peterson
Is XML the Answer? – pg. 171
Frank Kalis
Codd’s Rules – pg. 176
A Brief History of SQL – pg. 184
Gheorghe Ciubuc
Importing and Analyzing Event Logs – pg
Greg Robidoux
Who Needs Change Management? – pg. 42
Gregory Larsen
Currently a SQL Server DBA. I've been working with SQL Server since 1999. I'm an old-time mainframe DBA. My DBA
career started in 1985. Currently studying to obtain MCDBA.
Sequential Numbering – pg. 146
Haidong Ji
I was a developer, working with VB, SQL Server, Access and lots of other Microsoft stuff. I am currently a SQL Server DBA
in my company in the Chicago area. I am MCSD and MCDBA certified. In my spare time, if I have any, I like to do Linux, C
and other open source projects. I can be reached at Happy_Haidong@yahoo.com
Auditing DTS Packages – pg. 48
Automate DTS Logging – pg. 50
Herve Roggero
Herve Roggero (MCSD, MCDBA) is an Integration Architect with Unisys and works on Windows Datacenter and SQL
Server running on 32 processors (and it rocks!). Herve has experience in several industries including retail, real estate and
insurance. First contact with RDBMS was in the early 90's. Herve is a member of the Chicago SQL Server user group.
Hobbies: Piano and Star Gazing.
Managing Max Degree of Parallelism – pg. 98
James Travis
I currently work for a major US Bank in their internal support service area. I develop tools and applications used by the call
center, division, and corporate employees for various tasks. I work with the following development platforms: ASP, Visual
Basic, ActiveX, COM/DCOM, .NET, Crystal Reports, VBScript, JavaScript, XML/XSL, HTML, DHTML, VBA with Office
97/2000/XP, SQL Server, Oracle, Informix, Visual C++, and Photoshop. I am admin over several servers. Platforms include: SQL 7 &
2000, IIS 4 & 5, and Windows NT 4 & 2000. Currently I have developed or am developing applications for: project and employee
time tracking, Financial Center/ATM information, web-based password resets for various systems, call center scheduling and
adherence, and inventory.
Understanding the Difference Between IS NULL and = NULL – pg. 152
Jeremy Kadlec
Jeremy Kadlec is the Principal Database Engineer at Edgewood Solutions, (www.edgewoodsolutions.com) a technology services
company delivering full spectrum Microsoft SQL Server Services in North America. Jeremy can be reached at 410.591.4683
or jeremyk@edgewoodsolutions.com.
SQL Server Upgrade Recommendations and Best Practices – Part 1 – pg. 37
Jon Reade
Based in the UK at present, SQL Server DBA since early 1996 (SQL Server 6.0, so a relative novice!), prior to that a
database developer for ten years on various Microsoft based systems. Started out with 6502 assembler on the Acorn Atom
back in '78, I now work as a SQL Server database administrator / troubleshooter for various financial organisations, mainly
speeding up very, very slow databases, which for some inexplicable reason I get enormous gratification from :)
Jon Winer
Joseph Gama
José (Joseph) Gama is a software engineer currently working with .NET and SQL Server 2000. He has contributed over
10,000 lines of source code to the public domain.
"Open source is the future, a prosperous future based on trust, truthfulness and cooperation. Freedom is not only
the right to choose but also the right to have unconditional access to tools and information.
Learning, creating, communicating and leisure are rights bound to every human being."
TSQL Virus or Bomb? – pg. 87
Joseph Sack
Joseph Sack is a SQL Server consultant based in Minneapolis, Minnesota. Since 1997, he has been developing and
supporting SQL Server environments for clients in financial services, IT consulting, and manufacturing. He is a Microsoft
Certified Database Administrator (MCDBA). Joseph has written for SQL Server Magazine, and recently wrote a book called
“SQL Server 2000 Fast Answers for DBAs and Developers”. For questions or comments, feel free to contact him at
www.joesack.com.
AWE Adventures – pg. 18
Troubleshooting SQL Server with the Sysperfinfo Table – pg. 109
Kevin Feit
I have been involved with software development for the past 13 years. I started working with SQL Server in 1990, when
Microsoft and Sybase were development partners. One major project I have worked on was development of custom middleware
for communications between OS/2 and mainframe using SQL Server as the data store for the directory service. (This was in
1991-2, before the term middleware was even coined.) More recently, I was the project manager and database architect for
the development of the Intercommercial Markets web site (www.intercommercial.com), an online exchange for the green
coffee industry using a SQL Server 7 database. I am currently working for a large financial services company. Recent
activities have included upgrading my division's servers from SQL Server 7 to 2000, and developing several complex DTS
packages required for an accounting reconciliation process.
Portable DTS Packages – pg. 60
M Ivica
Creating a PDF from a Stored Procedure – pg. 123
Mike Pearson
Auto Close and Auto Shrink - Just Don't – pg. 13
Nagabhushanam Ponnapalli
Using Built in Functions in User Defined Functions – pg. 151
Narayana Raghavendra
Multiple Table Insert – pg. 140
Rahul Sharma
Rahul Sharma is a senior database administrator for Manhattan Associates, Inc., and a columnist for databasejournal.com,
dbazine.com and SQLServerCentral.com. He has a bachelor's and a master's degree in engineering and has been working with
Microsoft SQL Server since the release of SQL Server 6.5 and is currently working with both SQL Server 2000 and Oracle
9i. He is a Microsoft Certified Professional with more than six years of experience in database development and
administration. He is published (Publishers: Addison Wesley) and his book's title is: Microsoft SQL Server 2000: A Guide to
Enhancements and New Features (ISBN: 0201752832).
Scheduling SQL Server Traces – Part 2 – pg. 35
Ramesh Gummadi
Design Using an Entity-Relationship Diagram – pg. 178
Randy Dyess
I have been working with SQL Server for over 5 years as both a development and production DBA. Before SQL Server I
spent time as both a Visual Basic developer and Microsoft Access developer. Numerous projects upsizing Access to SQL
Server led me to become a full-time SQL Server DBA. Currently I have the privilege of working on one of the world's
largest SQL Server "real-world" production installations at Verizon Communications for Verizon's billing project. We have
11 main databases totaling over 9 Terabytes of data with the largest single database over 2.2 Terabytes. My current position
is as a development DBA, developing new Transact-SQL code and enhancing existing code. Before working at Verizon, I
worked at one of the largest advertising firms in America: Rapp Collins. There I supported up to 60 SQL Server web
databases at a time, with some Oracle thrown in, doubling as both a development DBA and production DBA. Clients before
Rapp Collins include: Auto One (a leading auto loan lender), Realpage, Inc. (leader in multi-housing management software)
and BlueCross BlueShield of Texas (a large insurance company). You can find out more about me and my works by visiting
my website.
Managing Jobs Using T-SQL – pg. 138
Robert Marda
I have worked for bigdough.com since 18 May 2000 as an SQL Programmer. My duties include backup management for all
our SQL Servers, mentoring junior SQL Programmers, and serving as DBA while our DBA is on vacation. I develop, test,
and deploy stored procedures and DTS packages as well as manage most major SQL projects. Our offices are located in
Bethesda, Maryland.
I have been married to Leoncia Guzman since 23 Jul 1994. We met in the Dominican Republic where I lived for about 2
years as a missionary. We have 4 children, Willem (age 8), Adonis (age 6), Liem (age 4 and a half), and Sharleen (age 3 and
a half).
My hobbies include spending time with our 4 children (we play chess, dominos, mancala, and video or computer games
together), keeping tropical freshwater fish, breeding and training parakeets, coin collecting (US and foreign), and genealogy.
I have a 55 gallon tank and 20 gallon tank. I have many kinds of fish (such as a pleco, tiger barbs, mollies, cichlids, tetras,
and guppies) I also have a small aquatic turtle.
Block the DBA – pg. 68
Robin Back
Create Maintenance Job with a Click without using a Wizard – pg. 117
Ryan Randall
Aged 28, from London. Just left my first job after Uni after a 6 year stint, most recently managing the development team for a
group of financial companies. Currently having a much needed rest teaching myself some new stuff before getting back into
it.
Creating a Script from a Stored Procedure – pg. 125
Sean Burke
Sean has been working with databases for over 15 years, and has developed many custom solutions for dozens of high profile
companies. While he thoroughly enjoys working with the latest and greatest SQL Server version, he still has a strange affinity
for early DOS based database products like R:Base. As an intermediate VB programmer, he is a staunch advocate for
understanding the fundamentals of the database platform upon which your application relies for its data. He is currently the
CIO for Hancock Information Group in Longwood, FL, and is passively pursuing MCDBA certification.
Introduction to English Query and Speech Recognition – pg. 198
Sharad Nandwani
Best Practices in an Adhoc Environment – pg. 20
Stefan Popovski
Replacing BCP with SQLBulkLoad – pg. 64
Steve Jones
My background is I have been working with computers since I was about 12. My first "career" job in this industry was with
network administration where I became the local DBA by default. I have also spent lots of time administering Netware and
NT networks, developing software, managing smaller IT groups, making lots of coffee, ordering pizza for late nights, etc.,
etc. For those of you interested (or really bored), you can check out my resume.
Tom Osoba
Building Business Intelligence Data Warehouses – pg. 54
Viktor Gorodnichenko
Monitoring Performance – pg. 102
ADMINISTRATION
This is what we do, administer servers and databases. Everyone has their own set of tricks, tips, scripts
tailored to the quirks of their own systems. We can each get sidetracked into dealing with our own
systems and miss out on understanding some other sections of SQL Server that we don’t work with.
Here’s a selection of articles to impart a little more information about the server, Autoclose, AWE,
Traces and more. In 2003, Microsoft has a very mature product that is advancing as hardware grows,
loads increase, and different stresses occur. Nothing earthshattering here, just some good information
that might help you save the day.
Tips for Full-Text Indexing/Catalog Population/Querying in SQL 7.0 and 2000 Jon Winer 24
Getting Rid of Excessive Files and Filegroups in SQL Server Chad Miller 27
SQL Server Upgrade Recommendations and Best Practices - Part 1 Jeremy Kadlec 37
Auto Close and Auto Shrink - Just Don't
Mike Pearson
5/5/2003
I was on-site with a client, who had a server which was performing very sluggishly. It was a beefy brute with
heaps of memory and processing power, so clearly something was just not what it should have been. For me step
1 in doing any sort of trouble-shooting is to look at the logs. Yup, always a good place to start because the
problem was ticking over at 5-8 entries per second…
So, what’s the problem then? Well, before I answer that, you can find these properties either by looking at the
‘Options’ tab of your database properties, or by running
SELECT DATABASEPROPERTYEX('DatabaseName','IsAutoShrink')
GO
SELECT DATABASEPROPERTYEX('DatabaseName','IsAutoClose')
GO
If the option is ‘True’ (the T-SQL statement will return 0=false or 1=True), then there’s a performance hit just
looking for a place to happen.
Auto_Close
When Auto_Close is set to ON/TRUE, the database is closed and shut down when all processes in the database
complete and the last user exits the database, thereby freeing up the resources held by that database. When a
new connection calls the database again, it automatically reopens. This option is set to ON/TRUE when using the
SQL Server Desktop Edition, but is set to OFF/FALSE for all other editions of SQL Server.
The problem is that most servers sit behind applications that are repeatedly opening and closing connections to
your databases, so the overhead of closing and reopening the databases between each connection is, well,
“performance abuse”. The amount of memory that is saved by this is insignificant, and certainly does not make up
for the cost of repeatedly initializing the database.
Admittedly, this option may have advantages on personal desktop scenarios as (when they are closed) you can
treat these database files as any other files. You can move them and copy them, or even e-mail them to other
users. However, when it comes to a proper server environment these points are fairly irrelevant.
So as far as Auto_Close is concerned, don’t even be tempted. Just Don’t.
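Switching the option off is a one-liner. The following is a minimal sketch, assuming SQL Server 2000; 'DatabaseName' is a placeholder for one of your own databases, and the commented-out sp_dboption call is the older equivalent for servers where ALTER DATABASE ... SET is not available.

```sql
-- 'DatabaseName' is a placeholder; substitute a real database name.
ALTER DATABASE [DatabaseName] SET AUTO_CLOSE OFF
GO

-- Older equivalent using the system stored procedure:
-- EXEC sp_dboption 'DatabaseName', 'autoclose', 'false'

-- Confirm the change: 0 means Auto_Close is now off.
SELECT DATABASEPROPERTYEX('DatabaseName', 'IsAutoClose') AS IsAutoClose
GO
```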
Auto_Shrink
The auto_shrink option has its uses in scenarios such as development servers and the like where disk space
resources are usually limited and hotly contested, but (there’s always a ‘but’) there is a performance cost.
Shrinking a database hogs the CPU and takes a long time. Plus, any indexes on heaps (tables without a
clustered index) affected by the shrink must be adjusted because the row locators will have changed. More work
for the CPU. Like Auto_Close, this option is set to ON/TRUE for all databases when using SQL Server Desktop
Edition, and OFF for all other editions, regardless of operating system.
When this option is set to ON/TRUE, all of a database's files are marked for shrinking, and will be automatically
shrunk by SQL Server. This option causes files to be shrunk automatically when more than 25 percent of the file
contains unused space. Not a wise option for your production systems which would suddenly suffer a
performance hit when SQL decides it’s shrink-time. So, again – just don’t.
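If disk space really does have to be reclaimed, one alternative is to turn the automatic behaviour off and shrink by hand at a time of your choosing. This is only a sketch under assumptions: 'DatabaseName' is a placeholder, and the target of 10 percent free space is an arbitrary illustrative figure, not a recommendation.

```sql
-- 'DatabaseName' is a placeholder; substitute a real database name.
ALTER DATABASE [DatabaseName] SET AUTO_SHRINK OFF
GO

-- Shrink deliberately during a quiet period instead of letting the
-- server pick the moment. The 10 is an illustrative target: leave
-- roughly 10 percent free space in the files after the shrink.
DBCC SHRINKDATABASE (DatabaseName, 10)
GO
```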
Here is a quick script which will run on SQL2000 giving you a list of your databases with the status of these
options.
SET NOCOUNT ON
-- List every database with its recovery model and its Auto_Close / Auto_Shrink settings
SELECT [name] AS DatabaseName
     , CONVERT(varchar(100), DATABASEPROPERTYEX([name], 'Recovery')) AS RecoveryType
     , CONVERT(varchar(10), DATABASEPROPERTYEX([name], 'IsAutoClose')) AS AutoClose
     , CONVERT(varchar(10), DATABASEPROPERTYEX([name], 'IsAutoShrink')) AS AutoShrink
FROM master.dbo.sysdatabases
ORDER BY DatabaseName
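Once you know which databases have these options switched on, the same catalog query can be turned around to generate the fix. The sketch below is an assumption about how you might do it, not part of the original script: it only prints the ALTER DATABASE statements so that you can review them before running anything.

```sql
SET NOCOUNT ON
-- Generate (but do not execute) one statement per database that still
-- has Auto_Close or Auto_Shrink switched on. Review the output, then
-- paste it into a new query window and run it.
SELECT 'ALTER DATABASE ' + QUOTENAME([name]) + ' SET AUTO_CLOSE OFF' AS FixStatement
FROM master.dbo.sysdatabases
WHERE DATABASEPROPERTYEX([name], 'IsAutoClose') = 1
UNION ALL
SELECT 'ALTER DATABASE ' + QUOTENAME([name]) + ' SET AUTO_SHRINK OFF'
FROM master.dbo.sysdatabases
WHERE DATABASEPROPERTYEX([name], 'IsAutoShrink') = 1
```

Printing rather than executing keeps a human in the loop, which seems sensible on a server you do not know well.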
Introduction
Let's start with an easy one: what is Autoclose?
Autoclose is one of the options that you can set for a database, along with autoshrink, auto create statistics, auto
update statistics, etc. This option basically "closes" the database file whenever the last user disconnects from the
database. The resources are freed up, but when a user connects to the server, the database is reopened.
Hmmmm, he said "never". As a general rule, I'm distrustful of someone who says "never". In this case, however,
take my word for it. You never want to set this on a production database. In fact, I'm struggling to find a reason
why you would ever want to set this, but let's take a look at what this option does:
Normally when SQL Server boots, it opens each .mdf and .ldf file for all the databases, the databases are
checked and some small amount of resources are consumed to keep track of the database files, handles, etc. I
decided to then set the database option for Northwind to autoclose (SQL Server 2000 Standard). I next checked
the SQL Server error log and found that there were a bunch of entries that all said "Starting up database
'Northwind'". Next I ran sp_who2 to verify there were no users in Northwind, waited a minute, and connected to the
server with Query Analyzer. Even though I have the Object Browser open, no queries are made of Northwind until
I actually select the database.
I next select the Northwind database in the drop down and re-check the errorlog in Enterprise Manager (requires a
refresh). I see 4 more entries that say "Starting up database 'Northwind'". I disconnect and recheck the errorlog,
no entries. I had expected a "close" message, but none appeared. I checked the error logs and no entries were
there either. I next ran Handle from SysInternals to check for file handles open. I saw all my .mdf and .ldf files
open by the sqlservr.exe process. I reconnect and select Northwind, re-run Handle and sure enough, there are the
Northwind files being held open by sqlservr.exe. I repeat this a few times and verify that when I disconnect, the file
handles are closed. I even open two connections and verify that the database opens on the first connection
(nothing happens on the 2nd) and stays open when I close the first until I close the 2nd connection.
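For anyone who wants to repeat the experiment, the T-SQL side of it can be sketched roughly as follows. This assumes a test server with the Northwind sample database installed; xp_readerrorlog is an undocumented extended procedure, so treat that call as an assumption and fall back to the Enterprise Manager log viewer if it is not available to you. The file handle checks with Handle happen outside SQL Server and are not shown.

```sql
-- Run on a TEST server only.
ALTER DATABASE Northwind SET AUTO_CLOSE ON
GO

-- Check the DBName column to confirm nobody is connected to Northwind.
EXEC sp_who2
GO

-- Touching the database should reopen it...
USE Northwind
GO

-- ...and the error log should show a new "Starting up database" entry.
-- Undocumented procedure: an assumption, not a supported interface.
EXEC master.dbo.xp_readerrorlog
GO

-- Put things back the way they were.
USE master
GO
ALTER DATABASE Northwind SET AUTO_CLOSE OFF
GO
```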
If you search Books Online, you will get 6 results, all of which have to do with setting or checking the option. No
guidance is given to strategies for using this option. A search on Technet returns the same BOL options along with
a few bugs. One thing to note is that in the desktop edition, this option is true by default ( set in model), but false in
other editions. I guess this is to save resources on a desktop. It also allows you to easily move and copy the file
when the database is not in use (mentioned in Ch 4 - Pocket Admin Consultant)(1). Course, don't know about you,
but on my servers, if I move a db file (mdf, ldf), I usually have problems when I start the server back up or access
the database.
This is a strange option to me and I find myself wondering why Microsoft ever implemented it, every time I find it
set. After all, how many resources can an open database hold? Since SQL Server tends to grab a large amount of
memory, it's hard to see if memory changes with this option being set. I decided to run a few experiments.
On my test server, the SQL Server process, sqlservr.exe, has 154,732kb in use. This is the steady state in general,
with a few other things on this server. If I set Northwind to Autoclose on, then the memory usage drops to
154,672kb immediately. When I connect with QA to master, I see memory usage jump to 154,680. 8kb added for
my connection, which is what I expect. I then select the "Northwind" database. Memory moves to 154,884, but
when I change back to master, the memory is still in use by SQL Server. I disconnect and memory drops back to
my baseline of 154,672kb. I repeat this a few times, adding some queries in there and while the memory values
change (seem to fluctuate by about 20kb as a starting point), I don't see the memory usage increase when I select
Northwind.
I know this isn't the most scientific test, but I don't see that many resources being used. I wonder if a large
database, > 1GB, would show similar results and I hope to get some time on a production system to test this over
the next few months along with some more in depth analysis, but for now, I'll repeat my advice. DO NOT SET
THIS ON A PRODUCTION SYSTEM.
In addition, there were some issues in SQL Server 7. Q309255 confirms that Autoclose may cause a stack dump
in SQL Server 7. The fix? Turn it off. I did find a Q&A at
http://www.microsoft.com/sql/techinfo/tips/administration/autoclose.asp that gave a mention that it is used for
databases that are rarely used, but in general it should not be used, I guess. If the database isn't used much, it
probably isn't taking many resources and isn't worth setting this. If you search Google, you'll find quite a few
people who have recommended you avoid this option entirely. I concur and remind you to double check all your
servers and shut this option down.
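One way to do that double check is to ask each server which databases have the option set. The following is a minimal sketch, assuming SQL Server 2000, where DATABASEPROPERTYEX and ALTER DATABASE ... SET AUTO_CLOSE are available; Northwind only stands in for whatever database the first query reports.

```sql
-- List every database on this server that has Autoclose turned on.
SELECT name
FROM master..sysdatabases
WHERE DATABASEPROPERTYEX(name, 'IsAutoClose') = 1
GO
-- Turn the option off for one of the databases reported above
-- (Northwind is only an example name).
ALTER DATABASE Northwind SET AUTO_CLOSE OFF
GO
```

On SQL Server 7 the same change can be made with sp_dboption and its 'autoclose' option.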
Steve Jones
©dkRanch.net January 2003
Introduction
I was answering a question posed by a trainee DBA, who asked about an odd error she was getting
when trying to create a new database – one I'd not experienced before. The error was:
Server: Msg 1807, Level 16, State 2, Line 1
Could not obtain exclusive lock on database 'MODEL'. Retry the operation later.
Looking this up in Books Online doesn't help much, nor was there much out on the web.
So I started investigating…
As you may know, when SQL Server creates a new database, it uses the model database as a "template", which
determines the data structures, initial file sizes and a few other things, for the new database that is being created.
Whether you use Enterprise Manager, or the T-SQL CREATE DATABASE command (which is what executes in
the background when you use the Enterprise Manager GUI to create a new database), SQL Server attempts to
obtain an EXCLUSIVE lock on the model database. Presumably this is to prevent other processes from updating
the model database's schema whilst the new database is being created, so that the new database's schema is in
a known, consistent state.
installations, but execute a select name, dbid from master..sysdatabases if you want to
check this)
You'll also get the Error 1807 message which sparked off this article.
However, through trial and error, I found that if you have even a single connection open to the model database, it
is not possible for SQL Server to obtain this exclusive lock. This can be caused by something as simple as having
the model database selected in the database drop-down in Query Analyzer. This prevents the CREATE
DATABASE command from creating a new database.
Another reason is that if you have previously opened the model database in Enterprise Manager,
then closed it, the connection to the database remains open, which means that the Create
Database command cannot acquire the exclusive access lock that it requires to execute
successfully. Not so bad if you've just done it, but how about if it was opened and closed three
months back?
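To see who is holding model open, the connections can be listed from the sysprocesses system table. This is a small sketch, assuming SQL Server 2000 and sufficient rights to read master..sysprocesses; it is a diagnostic only and changes nothing.

```sql
-- Show every connection whose current database is model.
SELECT spid, loginame, hostname, program_name
FROM master..sysprocesses
WHERE dbid = DB_ID('model')
GO
```

Any connection it returns can then be closed from its client tool, or ended with KILL as a last resort, before retrying CREATE DATABASE.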
Solution?
What has this got to do with Auto Close option? Well, if you have configured model to 'Auto Close' after opening it,
then it will close, at least in Enterprise Manager, and prevent you from experiencing this error. So it might be very
tempting to set Auto Close just on model to avoid encountering error 1807.
But don't reach for that mouse just yet. Here's the real gotcha : Remember we said that SQL Server uses model
as a template for every new database? Well, that includes all of the database options – including Auto Close. So
if you set the Auto Close option to be on for the model database, every new database you create will inherit the
Auto Close option – which as Steve Jones' original article pointed out, is not what we want.
Conclusion
If you experience error 1807, remember that it's probably an open connection to the model db that's causing it.
Drop the connection and try again. But don't be tempted to set the Auto Close option on model – at some point
you'll forget you did it and all of your new databases will have it set, unless you manually reset it for each of them.
As Steve said in his original article : "If the database isn't used much, it probably isn't taking many resources
and isn't worth setting this."
Jon Reade
© Norb Technologies, March 2003.
AWE Adventures
Joseph Sack
4/16/2003
Introduction
Recently, 4GB of physical RAM was added to a SQL Server 2000 Enterprise edition instance I support. This
brought the total physical RAM available on the machine up to 8GB. By using Windows 2000 Address Windowing
Extensions (AWE), with SQL Server 2000 Enterprise or Developer Edition, on Windows 2000 Advanced Server or
Windows 2000 Data Center, SQL Server can take advantage of physical memory exceeding 4GB of physical
RAM.
Although I had read a few articles about the AWE configuration process, this was the first time I had ever actually
enabled this feature. After I completed the configuration, I discovered a few behaviors I had not read about, as
well as configurations that could have caused issues had they not been addressed. In this article I will detail how I
enabled the AWE functionality, as well as what behaviors I believe one should be aware of.
The scope of this article is the configuration of AWE for SQL Server 2000 Enterprise on a Windows 2000
Advanced Server machine. Configuring AWE on Windows 2000 Data Center, I’m assuming, is quite similar to
configuring it on Windows 2000 Advanced Server, but as I have not performed such an operation, I will not
address it here. Also, this article assumes you are using a single instance machine. Multiple instances and AWE
settings require special planning not discussed here.
Why use AWE?
Prior to adding the additional 4GB of memory, the application running against this particular SQL Server instance
was experiencing significant I/O spikes throughout the day, and was running under maximum memory conditions.
The buffer cache and procedure cache utilization was always 100%, with the procedure cache often being starved
for memory.
After adding the additional memory, and enabling AWE, I saw the I/O spikes decrease significantly. The extra
memory allowed both the buffer cache and procedure cache to grab a sufficient amount of memory needed for the
application queries (I’ll be writing another article describing how you can monitor such information). The bigger
buffer decreased the number of times that read and write operations needed to read from disk.
Keep in mind that extra memory will not solve all I/O and memory issues. The performance outcome after
configuring AWE will vary depending on your application activity, read/write ratio, network throughput, hardware
components (CPU, RAID settings), and database size.
Configuration Steps
1. Assuming 8GB of physical memory, after adding the extra RAM, and prior to rebooting the server, your boot.ini
should contain both the “/3GB /PAE” switches. Not having /3GB in your boot.ini will translate to 2GB of RAM
reserved for the operating system, instead of 1GB remaining free with the “/3GB” switch. The “/PAE” switch is
required if you want SQL Server to support more than 4GB of RAM.
2. Make sure that the SQL Server service account has been granted the “Lock Pages in Memory” privilege. Just
because your service account is a member of the administrators group does NOT mean that it has this policy
setting already. I configured this setting by selecting Start | Run and typing gpedit.msc. I selected OK to launch
the Group Policy editor. I expanded Computer Configuration, Windows Settings, Security Settings, and
Local Policies, and then clicked User Rights Assignment. In the Policy pane (on the right), I double clicked “Lock
pages in memory”, and added the SQL Server service account used to run the SQL Server service. For Domain
member machines, be sure that no security policies at the site, domain, or organization unit overwrite your policy
change. Also, the policy change does not affect permissions of the service account until the SQL Server service
is restarted. But do not restart the service yet!
3. In Query Analyzer, connect as a sysadmin to your SQL Server instance and enable AWE by executing the
following script:
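The script amounts to the following sketch, which assumes the standard sp_configure option names 'show advanced options' and 'awe enabled' on SQL Server 2000:

```sql
-- 'awe enabled' is an advanced option, so expose advanced options first.
sp_configure 'show advanced options', 1
RECONFIGURE
GO
-- Switch AWE on; the value is only picked up at the next SQL Server restart.
sp_configure 'awe enabled', 1
RECONFIGURE
GO
```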
This setting does not take effect until you restart the SQL Server instance – but do not do it yet – there is more!
4. Once AWE is enabled, SQL Server will no longer dynamically manage memory. SQL Server will grab all
available physical memory, leaving 128MB or less for the OS and other applications to use. This underscores the
importance of setting a max server memory amount that SQL Server should be allowed to consume. Determine
this upper limit based on memory consumption of other applications on your server. Also note that a lower limit
(min server memory) is no longer relevant in the context of AWE.
In this example, to enable 7GB as the maximum SQL Server memory allowed to be consumed, issue the following
command:
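sp_configure 'show advanced options', 1
RECONFIGURE
GO
-- 7GB expressed in MB
sp_configure 'max server memory', 7168
RECONFIGURE
GO
sp_configure 'show advanced options', 0
RECONFIGURE
GO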
5. NOW reboot your machine (assuming you have not rebooted since reconfiguring the boot.ini file). If you have
already rebooted after configuring the boot.ini file, you need only restart the SQL Server instance.
6. After the restart, check the SQL Log in Enterprise Manager right away. The most recent startup log should
contain the words “Address Windowing Extensions enabled” early in the log. If you didn’t do it right, the log should
say, “Cannot use Address Windowing Extensions because…”. The reasons for this message will be noted, such
as not assigning “lock pages in memory”.
After the configuration
AWE awareness is not built in to all Windows 2000 tools, so here are a few areas you should be aware of when
monitoring memory utilization…
The Windows Task Manager’s “Processes” tab tells a misleading tale about how much memory the
SQLSERVR.EXE process is using. I was alarmed to see that, after a few hours, the process was still just
consuming 118MB, versus the maximum 6.5GB I configured it for. For a reality check, within the Windows Task
Manager, switch to the Performance tab and check out the Available physical memory. This amount should be
the total memory available less the maximum amount you set for SQL Server, along with other applications
running on your instance.
If you use Performance Monitor (System Monitor), keep in mind that for the SQLServer:Memory Manager object,
the Target Server Memory (KB) and Total Server Memory (KB) counters will display the same number. This is
because with AWE, SQL Server no longer dynamically manages the size of the memory used. It will consume the
value of your ‘max server memory’. This memory will be made up of physical RAM only, not the paging
file. AWE memory can be monitored in Performance Monitor (System Monitor) using the several AWE-related
counters of the “SQLServer:Buffer Manager” performance object.
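The same counters can also be read from inside SQL Server. The query below is a sketch, assuming SQL Server 2000, where the master..sysperfinfo table exposes the SQL Server performance counters; the LIKE patterns are my own way of matching both default and named instances.

```sql
-- Read the memory and AWE counters from a query window instead of System Monitor.
SELECT object_name, counter_name, cntr_value
FROM master..sysperfinfo
WHERE (object_name LIKE '%Memory Manager%'
       AND counter_name LIKE '%Server Memory%')
   OR (object_name LIKE '%Buffer Manager%'
       AND counter_name LIKE 'AWE%')
GO
```

Note that the per-second AWE counters are stored here as raw cumulative values, so two samples are needed to derive a rate.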
One last note on memory configuration… If you have not left enough RAM for other processes on the machine, do
consider lowering the max server memory setting. Keep in mind that this change will not take effect until you
restart the SQL Server service.
In a future article, I will review how to take a look at the memory utilization internals, so you can better monitor how
the allocated memory is actually being used.
In an environment where developers have free access to the production servers, they can make unintentional
mistakes that degrade the server and cause performance to drop drastically over time.
The DBA needs to be aware of these common mistakes, monitor for them, rectify them, and report them
back to the developers so that they are not repeated.
DATABASE CREATION
The developer may choose to create a database with the default options provided by SQL Server. By default, the
initial size is taken from the model database, which may be very small, so the database
file has to be expanded every few transactions once in production. The DBA should make sure that the
developers who have admin access to the SQL Server are aware of the implications this can have on the production
environment. The developer should be able to estimate the initial size of the database, so that it does not start
overloading the server with file growth soon after deployment.
The default creation of the database also results in the file having unrestricted growth, which leaves a lot of scope
for fragmentation. Always ask your developers to set a maximum size for the database file; this will help in
avoiding fragmentation. Set a maximum size and a small percentage for the growth increment.
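As an illustration, a CREATE DATABASE statement that states the initial size, maximum size and growth increment explicitly might look like the sketch below. The database name, file paths and every number are hypothetical and would come from your own size estimate.

```sql
-- Explicit sizing instead of inheriting the small defaults from model.
CREATE DATABASE SalesDemo
ON PRIMARY
( NAME = SalesDemo_data,
  FILENAME = 'D:\MSSQL\Data\SalesDemo_data.mdf',
  SIZE = 500MB,        -- estimated initial size
  MAXSIZE = 2000MB,    -- upper limit rather than unrestricted growth
  FILEGROWTH = 10% )   -- small percentage increment
LOG ON
( NAME = SalesDemo_log,
  FILENAME = 'E:\MSSQL\Log\SalesDemo_log.ldf',
  SIZE = 100MB,
  MAXSIZE = 500MB,
  FILEGROWTH = 10% )
GO
```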
The recovery model is full by default, which may result in very large transaction logs over time if transaction log
backups are not scheduled on a regular basis. The recovery model should be set to
simple, or as appropriate to your environment.
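A quick way to review and change the recovery model is sketched below, assuming SQL Server 2000; SalesDemo is a hypothetical database name.

```sql
-- Report the recovery model of every database on the server.
SELECT name, DATABASEPROPERTYEX(name, 'Recovery') AS recovery_model
FROM master..sysdatabases
GO
-- Switch a database that does not need point-in-time recovery to simple.
ALTER DATABASE SalesDemo SET RECOVERY SIMPLE
GO
```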
The developer or designer should prefer the 'varchar' data type over 'char' for columns whose values vary in length.
This saves a lot of storage and reduces traffic across the network.
Although it sounds very basic, one does come across many tables and database structures which do not have a
primary key associated with them. Make sure that the Primary Keys always exist.
The database designer has to strike a balance between normalization and the denormalized form of a design. At
times the Database has to have the performance of a RDBMS and the flexibility of a warehouse.
Once the database is in use, it will be good if a trace on Profiler can be used and the events be recorded, in order
to fine tune the indexes, using Index Tuning Wizard. Make sure that the trace is done during the peak time and
the Index Tuning Wizard is used in non Peak time. Developers often write stored procedures which have dynamic
SQL. The developers should always try and avoid using dynamic SQL.
The 'DROP' command in a stored Procedure should be avoided for dropping a table and should either be
replaced by a 'truncate' command or an inline table operator for same. Another alternative can be a temporary
table.
The foreign key relation should exist for data accuracy and also to ensure that the attributes share the same data
type across tables. A query which runs on separate data types can kill the system.
The developer should be aware of code that can result in deadlocks. Objects should be accessed in the same
order in different stored procedures or triggers. The transaction isolation level should be kept as low as
possible.
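The ordering rule can be illustrated with two procedures that modify the same pair of tables. This is only a sketch: the tables Orders and OrderTotals, their columns and both procedure names are hypothetical, and error handling is left out.

```sql
-- Both procedures touch Orders first and OrderTotals second, so two
-- concurrent calls queue behind each other instead of deadlocking.
CREATE PROCEDURE usp_AddOrderLine @OrderID int, @Amount money
AS
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
    UPDATE Orders SET LastChanged = GETDATE() WHERE OrderID = @OrderID
    UPDATE OrderTotals SET Total = Total + @Amount WHERE OrderID = @OrderID
COMMIT TRANSACTION
GO
CREATE PROCEDURE usp_CancelOrder @OrderID int
AS
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
    UPDATE Orders SET LastChanged = GETDATE() WHERE OrderID = @OrderID
    UPDATE OrderTotals SET Total = 0 WHERE OrderID = @OrderID
COMMIT TRANSACTION
GO
```

Had the second procedure updated OrderTotals before Orders, each call could end up holding the lock the other one is waiting for.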
Do you remember the difference between SQL Server 6.5 and SQL Server 2000 when creating a procedure that calls another
procedure that doesn't exist? SQL Server 6.5 would not allow the procedure to be created when it depends upon a
non-existing procedure. On the other hand, SQL Server 7.0 and 2000 will allow the procedure to be created, and the
SP_DEPENDS system procedure will not report correct results. If we run the following script:
USE Northwind
go
CREATE PROCEDURE proc1
AS exec proc2
GO
CREATE PROCEDURE proc2
AS exec proc3
GO
CREATE PROCEDURE proc3
AS exec proc4
GO
CREATE PROCEDURE proc4
AS exec proc5
GO
CREATE PROCEDURE proc5
AS exec proc6
GO
We receive sql messages:
Cannot add rows to sysdepends for the current stored procedure because it depends
on the missing object 'proc2'.
The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends
on the missing object 'proc3'.
The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends
on the missing object 'proc4'.
The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends
on the missing object 'proc5'.
The stored procedure will still be created.
Cannot add rows to sysdepends for the current stored procedure because it depends
on the missing object 'proc6'.
The stored procedure will still be created.
The sysdepends table will contain no dependencies of the form "proc(i) - proc(i+1)". We can check that with this statement,
which should return zero rows.
select * from sysdepends where object_name(id) like 'proc%'
So I can't trust the sysdepends system table. However, sometimes I need real information about dependencies,
especially between stored procedures, to get the real processing flow. So I developed a SQL statement to show
stored procedure dependencies in one database by searching the sysobjects and syscomments system tables. First
I create a recursive function that returns the stored procedure text without comments. This function erases up to 160 line
comments and 160 block comments: it can nest to 32 levels of recursion and makes five replacements in every
function call. We can increase this number if we need to.
begin
IF charindex('--',@Input) > 0 and charindex(char(13),@Input,charindex('--',@Input))
- charindex('--',@Input) +2 > 0
BEGIN
SET @Input =
REPLACE( @Input,
substring(@Input ,
charindex('--',@Input),
charindex(char(13),@Input,charindex('--',@Input)) - charindex('--',@Input) +2 )
, '')
END
set @i = @i+1
end
SET @Output = dbo.funProcWithoutComments (@Input)
END
RETURN @Output
END
Then I find all the dependencies in the database with the following statement:
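A simplified sketch of such a statement is given here. It is my own reduction rather than the original listing: it searches the raw syscomments text instead of the comment-stripped text, so a procedure name that appears only in a comment, or that is a substring of another name, will produce a false match.

```sql
-- For every stored procedure, list the other stored procedures whose
-- names appear anywhere in its source text.
SELECT DISTINCT caller.name AS calling_proc,
                callee.name AS called_proc
FROM sysobjects caller
JOIN syscomments c ON c.id = caller.id
JOIN sysobjects callee
  ON callee.xtype = 'P'
 AND callee.id <> caller.id
 AND c.text LIKE '%' + callee.name + '%'
WHERE caller.xtype = 'P'
ORDER BY calling_proc, called_proc
GO
```

Against the proc1 to proc5 script above, this should report the proc1 to proc2 through proc4 to proc5 links that sysdepends misses; proc6 is not reported because it does not exist in sysobjects.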
Tips for Full-Text Indexing/Catalog Population/Querying in SQL 7.0 and 2000
Jon Winer
9/25/2003
This article is a brief summary of several tips & tricks I have learned through working with the Full-Text features in
SQL Server.
To fix the problem, I changed the SQL Server behavior to allow modifications to the system catalogs. I looked in
the sysFullTextCatalogs table in the current database and changed the 'path' field value to the new location. (If the
value is NULL, it means a path was not given at setup and the default installation path was used.) This allowed me
to modify the Full-Text Indexing on the new machine. (Remember to change the server behavior back to its
original setting.)
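In outline, the change looks like the sketch below. It assumes the 'allow updates' server option is the behavior being switched, and the catalog name and path are hypothetical. Direct updates to system tables are risky, so take a backup first.

```sql
-- Allow direct updates to system tables (server-wide setting).
sp_configure 'allow updates', 1
RECONFIGURE WITH OVERRIDE
GO
-- Point the catalog at its new location (hypothetical name and path).
UPDATE sysfulltextcatalogs
SET path = 'E:\FTData'
WHERE name = 'MyCatalog'
GO
-- Put the server back the way it was.
sp_configure 'allow updates', 0
RECONFIGURE WITH OVERRIDE
GO
```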
(Helpful hint: Make sure you have a field of type TimeStamp in your table. Without one, you cannot properly run an incremental population.)
Issue 1
There are some differences between SQL 7.0 and 2000 in their Full-Text Querying capabilities. SQL 7.0 has
limitations in the number of 'CONTAINS' clauses it can process at any one time (in my testing, it is around 16). I
have not been able to find any specific documentation on this issue, but SQL 2000 does not seem to have this
limit. Below is a brief reference from Microsoft on this issue:
If you are using multiple CONTAINS or FREETEXT predicates in your SQL query and are experiencing poor full-text search query
performance, reduce the number of CONTAINS or FREETEXT predicates or using "*" to use all full-text indexed columns in your query.
Issue 2
In SQL 7.0, if there are excessive grouping parentheses (even though they match up), the query will hang. Even
when the command timeout property is set, the query will hang past the command timeout value assigned, and
you will receive an error message of 'Connection Failure -2147467259'. When the extra parentheses are
removed, the query executes fine. In SQL 2000 the original query runs with no problems.
Issue 3
When a Full-Text query in SQL 7.0 contained a single noise word, I would receive the error 'Query contained only
ignored words'. SQL 2000 handled the noise words and returned the query results. In SQL 7.0, I had to remove
all noise words from the query for it to run successfully. Here is a recommendation from Microsoft pertaining to
this issue:
You also may encounter Error 7619, "The query contained only ignored words" when using any of the full-text predicates in a full-text query,
such as CONTAINS(pr_info, 'between AND king'). The word "between" is an ignored or noise word and the full-text query parser considers
this an error, even with an OR clause. Consider rewriting this query to a phrase-based query, removing the noise word, or options offered in
Knowledge Base article Q246800, "INF: Correctly Parsing Quotation Marks in FTS Queries". Also, consider using Windows 2000 Server:
There have been some enhancements to the word-breaker files for Indexing Services.
For more information on Full-Text Indexing and Querying, visit Microsoft MSDN.
Server Properties:
sysfulltextcatalogs table:(6)
Getting Rid of Excessive Files and Filegroups in SQL Server
Chad Miller
2/11/2003
Recently I began supporting a database with 16 filegroups, which in and of itself is not an issue. However, this
particular database is only 7GB in total used size and all of the 16 files were located on a single EMC Symmetrix
volume. Because of the use of filegroups, the database had expanded to a total of 15GB, unnecessarily doubling
its size. Although there are legitimate reasons to use filegroups, in this situation 16 filegroups were clearly
excessive and did not create substantial value since all of the files were located on a single volume.
Although it could be argued that filegroups can aid in recovering certain tables without restoring the entire
database, this type of recoverability was not needed. If you buy quality physical disks and backup/recovery
software you can avoid using filegroups entirely. So, I set out to remove 15 filegroups in favor of a single
PRIMARY filegroup.
I wanted to remove files/filegroups by using scripts, so I began by creating scripts to move all objects to the
PRIMARY filegroup.
I began by backing up and restoring the existing production database to a QA environment, setting the recovery
mode to simple, setting the default filegroup to primary and expanding the primary filegroup to be large enough to
hold all database objects and accommodate index rebuilds/future growth:
ALTER DATABASE MyDB
SET RECOVERY SIMPLE
GO
ALTER DATABASE MyDB
MODIFY FILEGROUP [PRIMARY]
DEFAULT
GO
ALTER DATABASE MyDB
MODIFY FILE
(NAME = MYDB_fg_primary,
SIZE = 10000MB,
MAXSIZE = 20000MB,
FILEGROWTH=500MB)
GO
Once this had been accomplished I scripted out all clustered indexes and non-clustered indexes for
those tables with clustered indexes and then rebuilt those indexes on the PRIMARY filegroup. Tables without a
clustered index or without any indexes must be handled differently.
Drop the non-clustered indexes of tables with clustered indexes:
DROP INDEX TblWithClustIDX.IX_NonClustCol
Drop the clustered index, which in this case is also the primary key constraint.
ALTER TABLE TblWithClustIDX
DROP CONSTRAINT PK_TblWithClustIDX
GO
Once the non-clustered and clustered indexes have been dropped rebuild the clustered and non-clustered
indexes on the PRIMARY filegroup:
ALTER TABLE TblWithClustIDX
ADD CONSTRAINT PK_TblWithClustIDX
PRIMARY KEY CLUSTERED (ClustCol)
ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_NonClustCol
ON TblWithClustIDX(NonClustCol)
ON [PRIMARY]
GO
For tables without a clustered index, but with a non-clustered index you can move the data pages to the primary
filegroup by dropping the existing non-clustered index and recreating the index as clustered. Then you can return
the index to its original, nonclustered state by dropping and recreating again on the PRIMARY filegroup:
For example, TblWithClustIDXOnly does not have a clustered index, only nonclustered indexes. Recreate the existing
nonclustered index, which in this case is the primary key, as a clustered index on the PRIMARY filegroup.
CREATE UNIQUE CLUSTERED
INDEX PK_TblWithClustIDXOnly ON dbo.TblWithClustIDXOnly (NonClustCol)
WITH
DROP_EXISTING
ON [PRIMARY]
GO
Drop the clustered index you’ve just created.
ALTER TABLE
TblWithClustIDXOnly DROP CONSTRAINT PK_TblWithClustIDXOnly
GO
Recreate the non-clustered index on the primary filegroup.
ALTER TABLE TblWithClustIDXOnly
ADD CONSTRAINT PK_TblWithClustIDXOnly
PRIMARY KEY NONCLUSTERED (NonClustCol)
ON [PRIMARY]
GO
For tables without indexes, you can simply move their data pages to the primary filegroup by selecting them into
another database, dropping the existing table from the original and then selecting the table back into the original
database.
SELECT * INTO DBObjCopy.dbo.NoIndexesTable FROM NoIndexesTable
GO
DROP TABLE NoIndexesTable
GO
SELECT * INTO dbo.NoIndexesTable FROM DBObjCopy.dbo.NoIndexesTable
GO
At this point I thought I was done and could safely drop the files. However when I attempted to drop several of the
filegroups SQL Server returned an error message indicating the file could not be dropped because it was not
empty (SQL Server will not let you drop a file or filegroup if it is not empty). So I set out to determine which
objects were still located on a filegroup other than the primary group. The undocumented stored procedure
sp_objectfilegroup will list the filegroup for an object, provided you pass it the object_id, but I did not know the
object_id, and I also wanted to run an object-to-filegroup query for all objects. Using sp_objectfilegroup as a starting
point and building on the query it uses, I came up with a query to show all of the table/object
names that are located on a filegroup other than primary:
--Listing 1: T-SQL to Display objects and filegroups not on Primary Filegroup
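A query along these lines can be sketched as follows. This is a minimal reconstruction from the SQL Server 2000 system tables sysindexes, sysobjects and sysfilegroups, not necessarily the original listing; because statistics are also stored as rows in sysindexes, they show up in the result alongside tables and indexes.

```sql
-- Every table, index or statistics entry stored on a filegroup other than PRIMARY.
SELECT o.name AS object_name,
       i.name AS index_or_stat_name,
       i.indid,
       fg.groupname
FROM sysindexes i
JOIN sysobjects o ON o.id = i.id
JOIN sysfilegroups fg ON fg.groupid = i.groupid
WHERE fg.groupname <> 'PRIMARY'
ORDER BY o.name, i.indid
GO
```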
I could simply drop all the system-generated statistics, however I didn’t want to take the performance hit of re-
generating these statistics during production hours. So I created another script to drop all of the statistics and re-
create them. When they were recreated they were then located on the primary filegroup.
Listing 2: T-SQL to Drop and recreate all statistics
SET NOCOUNT ON
GO
create table #stat
(stat_name sysname,
stat_keys varchar(1000),
table_name varchar(100))
Go
Since I had moved all tables, indexes and statistics to the primary filegroup by using scripts, I thought it would be
nice to do the same with the text/image columns. Using profiler I traced the process of using Enterprise Manager
to change text/image column filegroup. Unfortunately I discovered that SQL Server would drop and recreate the
table in order to move the text/image column to primary filegroup. Although I could have accomplished the move
by backing up the table via “select into…” and dropping and recreating it, I felt it would be easier to just let
Enterprise Manager do this, and since it was only 8 tables I didn’t mind manually changing these columns via
Enterprise Manager.
Once the text/image columns had been moved, I ran the query in Listing 1 and finally, there were no objects
located on any filegroup other than the primary filegroup. Once again I attempted to drop the files and SQL Server
again returned an error message indicating the files were not empty. I then shrank the files using DBCC
SHRINKFILE and was finally able to drop the files and filegroups.
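That final step can be sketched as below. The logical file name and filegroup name are hypothetical, and the EMPTYFILE option is my assumption, since the exact DBCC SHRINKFILE options used are not stated.

```sql
USE MyDB
GO
-- Flag the now-unused file as empty so that it can be removed
-- (MyDB_fg2_data is a hypothetical logical file name).
DBCC SHRINKFILE (MyDB_fg2_data, EMPTYFILE)
GO
-- Remove the file, then the filegroup that held it.
ALTER DATABASE MyDB REMOVE FILE MyDB_fg2_data
GO
ALTER DATABASE MyDB REMOVE FILEGROUP FG2
GO
```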
Mission accomplished! You’re probably wondering if this all was worth it? Yes, I could have used Enterprise
Manager entirely instead of scripts, however going through 170 tables manually changing filegroups did not appeal
to me. Also, because of production refreshes of QA, ongoing testing of filegroup moves and production
implementation I would have to go through this whole process at least a half dozen more times. So, it did save
me time. I would rather use testable, repeatable scripts to move changes into production instead of error prone
labor-intensive processes utilizing Enterprise Manager.
Most DBAs also take on many tasks belonging to the System Administrator (SA) in a Windows 2K network, either because there
is a single person in the IT Department or because the company is small. One of these tasks is to read and
analyze the Event Viewer Logs daily to see if there are any problems, errors etc. As we know, the operating
system has a way of announcing to the SA when a special event appears in the system. Moreover, if we want to
keep a history of events we can save these logs in a text file (example: open Event Viewer, click Security Log,
Action, Save Log File As…).
The maximum log size is 512 K in Windows 2K, which makes a text file with ~2,500 rows to read. Let’s imagine a
scenario: The company has 10 Windows 2K file servers. The network works but the logs fill up in 1 day. In this
case, the SA has to read a file with ~25,000 rows to reach a conclusion about how the machines are working.
As SQL Server DBAs we can use SQL Server 2000 tools to build a picture of this repository of Event Viewer events.
The steps for this goal are:
1. To copy the text file log into a SQL Server database, insert a Transform Data Task: Text File with Source = log file
text (Ap2.txt) and Destination = SQL Server table (Ap2) with the following design:
CREATE TABLE [Ap2Rez] (
[Col001] [varchar] (255), -- Date in Event Viewer Log
[Col002] [varchar] (255), -- Time in Event Viewer Log
[Col003] [varchar] (255), -- Source in Event Viewer Log
[Col004] [varchar] (255), -- Type in Event Viewer Log
[Col005] [varchar] (255), -- Category in Event Viewer Log
[Col006] [varchar] (255), -- EventID in Event Viewer Log
[Col007] [varchar] (255), -- User in Event Viewer Log
[Col008] [varchar] (255), -- Computer in Event Viewer Log
[Col009] [varchar] (456) -- Description in Event Viewer Log
)
2. To adjust the resulting SQL Server table (Ap2), which has an anomaly (Col009 is too big and a part of it spills into
Col001), insert an Execute SQL Task that runs a script (or a procedure) to append the rows to a table Ap2Rez2 with the
following design:
CREATE TABLE [Ap2Rez2] (
[IDRow] [int] IDENTITY (1, 1) NOT NULL,
[_Date] [datetime] NULL, -- Col001 + Col002
[_Source] [varchar] (255),
[_Type] [varchar] (255),
[_Category] [varchar] (255),
[_EventID] [int] NULL,
[_User] [varchar] (255), -- type assumed to match Col007
[_Computer] [varchar] (255),
[_Description] [varchar] (1000) -- Col009 + Col001, just in case
)
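The append script run by the Execute SQL Task could look roughly like the sketch below. It assumes the imported rows sit in the Ap2Rez table designed above, and it simply skips rows whose first two columns do not form a date (the spilled-over description fragments) rather than stitching them back together.

```sql
-- Append well-formed event rows to the fact table.
INSERT INTO Ap2Rez2
    (_Date, _Source, _Type, _Category, _EventID, _User, _Computer, _Description)
SELECT CONVERT(datetime, Col001 + ' ' + Col002),
       Col003,
       Col004,
       Col005,
       -- Convert only values made up entirely of digits; anything else becomes NULL.
       CASE WHEN Col006 <> '' AND Col006 NOT LIKE '%[^0-9]%'
            THEN CONVERT(int, Col006) END,
       Col007,
       Col008,
       Col009
FROM Ap2Rez
WHERE ISDATE(Col001 + ' ' + Col002) = 1
GO
```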
3. Run a scheduled DTS Package to reprocess an Incrementally Updating OLAP cube made in following way:
- Cube called EventViewer.
- The Fact Table Source :Ap2Rez2 .
- The Measure:Event Id with Aggregate Function Count.
The structure of dimensions:
[Dimension table not recoverable; the surviving fragments mention Ap2Rez2 and Time.]
Normally, only the SA (DBA) can browse the data. The DBA can use this cube to see whether they can balance the SQL
Server activity. For example: in replication, a Distributor can be put on the Windows server with the lowest activity, or
an unsafe Windows server that could affect SQL Server databases can be identified.
I think that some questions can be asked about using this cube in a network:
1. How can we build an OLAP cube to track an illegal attack in a network?
(I suppose it can be linked to the order of events.)
2. If a whole set of tools based on the OLAP cube engine can be developed, can it be attached to a new version of
the Windows operating system?
Introduction
Your software has passed all (your only?!) testing phase(s) and it is time to install your database into production. I
will outline below how I accomplish this task. This article is concerned with an evolving system i.e. you will perform
an initial installation, but subsequent installations may be required for such things as customer change requests
(no faults – your testing was perfect!) while retaining all data inserted since the application began use.
Scripts
I create all my database objects from scripts and not the other way around. I never use Enterprise Manager (EM)
to create a stored procedure then reverse engineer a script. If you perform unit testing against a database where
you have created objects via EM, how can you guarantee that your scripts are consistent and that when you install
to an external site you won’t introduce a bug? Aside from this, reverse engineering can sometimes produce scripts
with ugly formatting which have poor readability. After unit testing the script, we then copy it to Visual SourceSafe
(VSS) from where all version control is governed.
Testing
Our software has the following testing cycle
• Factory acceptance testing (FAT) (in-house test team)
• Site acceptance testing (SAT) (external test team)
For all test phases after unit testing I perform a full installation. The point is that your testing process is not only
testing your software, but its installation too. Again, if you simply start FAT testing against your development
database, you cannot install to SAT with any confidence in your mechanism (objects missing from the build,
necessary lookup tables not populated, etc.).
Initial installation
After developing my first system using SQL Server, I installed to production by simply restoring a backup of the
test database. I now use a Windows command file to perform all installations, following the template from a
previous excellent article by Steve Jones (Migrating Objects to Production); the file simply executes multiple scripts
using the command-line utility OSQL. I will outline below why I believe restoring a backup is the wrong approach.
Your library
This is the main reason why I use this method. If you install from a backup you cannot guarantee you are installing
what is in your library under source code control. What happens if you restore a backup, then for your first patch
release you need to change a stored procedure? You check it out, make the change, test then install. Problem is,
your script under version control was inconsistent with the version in the database you restored, and you have
introduced a bug which causes another part of the system to fail with consequent down time. If you install from
your scripts in the first place then test against that, you will eliminate any potential errors like these.
Reproducible
You will need to perform the same installation time and again for the test phases outlined above, and multiple
client sites may have different versions of the same database. Surely it's better to have one command file
that facilitates a completely reproducible build, which can be performed by anyone and has been pre-tested? If
multiple people are performing a number of different installations by restoring a backup, can you be sure all
installations are identical?
Documentation / Consistency
Going back to the example above where you perform an initial installation, the system gets used for a bit, then one
stored procedure needs to change following a change note. Presumably most people would perform this patch
release by executing the one script into the database (via command file or not) – you cannot restore a backup in
this case, as the customer would lose all the data they have entered already. If you had performed the initial
release by the restore method, you would now have the situation where your two installations were by two different
means. I prefer to have one consistent way to do things, which also makes documenting your procedures simpler
if your company/client requires this.
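As an illustration of the kind of script such a command file might execute, the sketch below can be run both for an initial installation and for a later patch release without losing data already entered. The object names (dbo.OrderStatus, dbo.usp_GetOrderStatus) are hypothetical and only serve the example.

```sql
-- Hypothetical re-runnable install script (names are assumptions for illustration).

-- Create the lookup table only if it is missing, so existing data is retained.
IF OBJECT_ID('dbo.OrderStatus') IS NULL
    CREATE TABLE dbo.OrderStatus (
        StatusId int NOT NULL CONSTRAINT pk_OrderStatus PRIMARY KEY,
        StatusName varchar(50) NOT NULL
    )
GO

-- Populate the lookup row only when it is not already there.
IF NOT EXISTS (SELECT 1 FROM dbo.OrderStatus WHERE StatusId = 1)
    INSERT INTO dbo.OrderStatus (StatusId, StatusName) VALUES (1, 'Open')
GO

-- Procedures hold no data, so they can simply be dropped and re-created.
IF OBJECT_ID('dbo.usp_GetOrderStatus') IS NOT NULL
    DROP PROCEDURE dbo.usp_GetOrderStatus
GO
CREATE PROCEDURE dbo.usp_GetOrderStatus
    @StatusId int
AS
    SET NOCOUNT ON
    SELECT StatusId, StatusName
    FROM dbo.OrderStatus
    WHERE StatusId = @StatusId
GO
```

Because every step checks the current state first, the same file can be executed against an empty database or an existing one, which keeps the initial release and later patches on one mechanism.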
Size of build
I have found that, in a lot of cases, all the scripts required to produce the build will fit on a floppy disk, whereas
taking a backup to install usually involves burning a CD. Not a great benefit here but it does make your life slightly
simpler.
Commenting
Using a command file allows you to add comments. This makes traceability better as you can document such
things as who produced the build and the reason for it, etc.
Disadvantages
The greatest disadvantage involved in this method is the overhead of creating the command file to execute the
build. It’s less effort just to restore a backup of your test database straight into production. I believe the benefits
outlined above offset this minimal effort which is required.
Conclusion
This article outlines the methodology I use to perform my initial database release into production. How does
everybody else perform this task? It’s always interesting to see how other people do things.
Scheduling SQL Server Traces - Part 2
Rahul Sharma
9/16/2003
This is the second part of the article on how to schedule traces using stored procedures in SQL Server 2000. The
previous article was for SQL Server 7.0.
SQL Profiler uses system stored procedures to create traces and send the trace output to the appropriate
destination. These system stored procedures can be used from within your own applications to create traces
manually, instead of using SQL Profiler. This allows you to write custom applications specific to the needs of your
enterprise. In the case of SQL Server 2000, the server side traces are not done using the extended stored
procedures anymore (as in SQL Server 7.0) but through system procedures which expose the underlying
architecture used to create these traces. You can read more on that in BOL.
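As a rough illustration of these system procedures, the sketch below creates a server-side trace, adds one event with three data columns, starts it, and then stops and closes it. The file path c:\temp\demo_trace and the choice of event 12 (SQL:BatchCompleted) with columns 1 (TextData), 12 (SPID) and 13 (Duration) are assumptions made for the example.

```sql
-- Minimal sketch of a server-side trace in SQL Server 2000.
-- Assumptions: c:\temp exists on the server and demo_trace.trc does not yet exist.
DECLARE @TraceId int, @MaxFileSize bigint, @On bit, @rc int
SET @MaxFileSize = 5      -- maximum trace file size in MB
SET @On = 1

-- Create the trace definition; SQL Server appends .trc to the file name.
-- Option 2 = TRACE_FILE_ROLLOVER.
EXEC @rc = sp_trace_create @TraceId OUTPUT, 2, N'c:\temp\demo_trace', @MaxFileSize, NULL
IF @rc <> 0
BEGIN
    PRINT 'sp_trace_create failed with return code ' + CAST(@rc AS varchar(10))
    RETURN
END

-- Capture event 12 (SQL:BatchCompleted) with TextData, SPID and Duration.
EXEC sp_trace_setevent @TraceId, 12, 1, @On
EXEC sp_trace_setevent @TraceId, 12, 12, @On
EXEC sp_trace_setevent @TraceId, 12, 13, @On

EXEC sp_trace_setstatus @TraceId, 1    -- 1 = start the trace
SELECT * FROM ::fn_trace_getinfo(@TraceId)

-- ... the workload to be traced runs here, typically from other connections ...

EXEC sp_trace_setstatus @TraceId, 0    -- 0 = stop the trace
EXEC sp_trace_setstatus @TraceId, 2    -- 2 = close the trace and delete its definition
```

The @MaxFileSize and @On values are held in variables because sp_trace_create and sp_trace_setevent expect bigint and bit arguments respectively.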
In this article, I will walk you through some sample scripts that illustrate how you can add:
a) Tracing maintenance stored procedures to your DBA toolkit, and/or
b) Tracing capabilities to your application.
There are so many events and data columns in SQL Server 2000 that it is easy to lose track of what you really
want to trace for a particular scenario. One way to handle this is to maintain a trace table that holds the events and
data columns for each trace type, and then select the trace type at run time; the stored procedure reads the
specified values and creates the trace for you.
b) The trace table is maintained in the tempdb database. You can keep it in a user database instead if you wish.
Otherwise, whenever you restart SQL Server, it will need to be re-created in tempdb (this can be done with
start-up scripts as well).
c) The USP_Trace_Info stored procedure does all the trace work and generates the trace file for a specified trace
type. You can specify your filter criteria, trace names, and other parameter values as you would otherwise do
through the SQL Profiler GUI tool.
d) After running the scripts shown below, you can get an explanation of the commands and the available options
by executing exec usp_trace_info '/?' from Query Analyzer. This displays all the choices that are available to you.
/**********************************
Start of Scripts.
**********************************/
/*****************************************************************************
Trace_Scenario table: Contains the events and the data columns for the different
Trace Scenarios
Creating it in TEMPDB since this is not an application table and don't want this
to hang around...
Can be created in the User Database(s) as well so that even when the service is
re-started, it is available or it will need to be re-created every time the service
is re-started.
If more events and Data-Columns are needed, we can add/modify the values in here
without even touching the trace templates.
*****************************************************************************/
IF OBJECT_ID('TEMPDB.DBO.Trace_Scenario') IS NOT NULL
DROP TABLE TEMPDB.DBO.Trace_Scenario
GO
/**********************************************************************************
*******************
Different Trace Types:
Trace_Type Description
1 Slow running Queries.
2 General Performance.
3 Table/Index Scans.
4 Table/Index Usage.
5 Basic Trace for capturing the SQLs.
6 Locking/Deadlocking Issues.
7 Detailed Performance.
***********************************************************************************
******************/
CREATE TABLE tempdb.dbo.Trace_Scenario (Trace_Type int, Trace_Description
varchar(50), Events varchar(300), Data_Columns varchar(300), constraint
pk_trace_scenario primary key (trace_type))
GO
/**********************************************************************************
********************
NOTE: modify these entries as per the finalized trace events and data columns
***********************************************************************************
********************/
--Slow running queries
insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns)
values (1, 'Slow Running Queries', '10,11,12,13,17,51,52,68',
'1,2,3,6,8,10,11,12,13,14,15,16,17,18,22,25,26,40')
--General Performance
insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns)
values (2, 'General Performance',
'75,76,16,21,22,33,67,69,55,79,80,61,25,27,59,58,14,15,81,17,10,11,34,35,36,37,38,39,50,11,12',
'1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,')
--Table/Index Scans
insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns)
values (3, 'Table/Index Scans', '10,11,12,13,17,51,52,68',
'1,2,3,6,8,10,12,13,14,15,16,17,18,20,21,22,25,26,31,40')
--Table/Index Usage
insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns)
values (4, 'Table/Index Usage', '10,11,12,13,17,48,58,68',
'1,2,3,6,8,10,12,13,14,15,16,17,18,20,21,22,24,25,26,31,40')
--Basic Trace for capturing the SQLs
insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns)
values (5, 'Basic Trace for capturing the SQLs',
'10,11,12,13,16,17,23,24,26,27,33,51,52,55,58,60,61,67,68,69,79,80',
'1,2,3,6,8,9,10,11,12,13,14,15,16,17,18,20,21,22,24,25,26,31,32,35,40')
--Locking/Deadlocking Issues
insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns)
values (6, 'Locking/Deadlocking Issues',
'10,11,14,15,17,23,24,25,26,27,33,44,45,51,52,59,60,68,79,80',
'1,2,3,8,10,12,13,14,15,16,17,18,22,24,25,31,32')
--Detailed Performance
insert into tempdb.dbo.Trace_Scenario (Trace_Type, Trace_Description, Events,
Data_Columns)
values (7, 'Detailed Performance',
'53,75,76,60,92,93,94,95,16,21,22,28,33,67,69,55,79,80,61,25,27,59,58,14,15,81,17,10,11,34,35,36,37,38,39,50,11,12,97,98,18,100,41',
'1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,')
GO
/**********************************************************************************
****************
Stored Procedure: USP_TRACE_INFO
***********************************************************************************
****************/
USE master
GO
IF OBJECT_ID('TEMPDB.DBO.USP_TRACE_QUEUE') IS NOT NULL
DROP TABLE TEMPDB.DBO.USP_TRACE_QUEUE
GO
IF OBJECT_ID('USP_TRACE_INFO') IS NOT NULL
DROP PROC USP_TRACE_INFO
GO
CREATE PROC USP_TRACE_INFO
@OnOff varchar(4)='/?',
@file_name sysname=NULL,
@TraceName sysname='Sample_Trace',
@Options int=2,
@MaxFileSize bigint=4000,
@StopTime datetime=NULL,
@TraceType int=0,
@Events varchar(300)=
-- Default values
'11,13,14,15,16,17,33,42,43,45,55,67,69,79,80',
@Cols varchar(300)=
-- All columns
'1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,',
@IncludeTextFilter sysname=NULL,
@ExcludeTextFilter sysname=NULL,
@IncludeObjIdFilter int=NULL,
@ExcludeObjIdFilter int=NULL,
@IncludeObjNameFilter sysname=NULL,
@ExcludeObjNameFilter sysname=NULL,
@IncludeHostFilter sysname=NULL,
@ExcludeHostFilter sysname='%Query%',
@TraceId int = NULL
AS
BEGIN
SET NOCOUNT ON
SET @OnOff=UPPER(@OnOff)
IF (@OnOff='LIST') BEGIN
IF (OBJECT_ID('tempdb..USP_trace_Queue') IS NOT NULL) BEGIN
IF (@TraceId IS NULL) BEGIN
DECLARE tc CURSOR FOR SELECT * FROM tempdb..USP_trace_Queue FOR READ ONLY
DECLARE @tid int, @tname varchar(20), @tfile sysname
OPEN tc
FETCH tc INTO @tid, @tname, @tfile
IF @@ROWCOUNT<>0 BEGIN
WHILE @@FETCH_STATUS=0 BEGIN
SELECT TraceId, TraceName, TraceFile FROM tempdb..USP_trace_Queue WHERE
TraceId=@tid
SELECT * FROM ::fn_trace_getinfo(@tid)
FETCH tc INTO @tid, @tname, @tfile
END
END ELSE PRINT 'No traces in the trace queue.'
CLOSE tc
DEALLOCATE tc
END ELSE BEGIN
SELECT TraceId, TraceName, TraceFile FROM tempdb..USP_trace_Queue WHERE
TraceId=@TraceId
SELECT * FROM ::fn_trace_getinfo(@TraceId)
END
END ELSE PRINT 'No traces to list.'
RETURN 0
END
-- Declare variables
DECLARE @OldQueueHandle int -- Queue handle of currently running trace queue
DECLARE @QueueHandle int -- Queue handle for new running trace queue
DECLARE @On bit
DECLARE @OurObjId int -- Used to keep us out of the trace log
DECLARE @OldTraceFile sysname -- File name of running trace
DECLARE @res int -- Result var for sp calls
SET @On=1
IF @@ROWCOUNT<>0 BEGIN
EXEC sp_trace_setstatus @TraceId=@OldQueueHandle, @status=0
EXEC sp_trace_setstatus @TraceId=@OldQueueHandle, @status=2
PRINT 'Deleted trace queue ' + CAST(@OldQueueHandle AS varchar(20))+'.'
PRINT 'The trace output file name is: '+@OldTraceFile+'.trc.'
DELETE tempdb..USP_trace_Queue WHERE TraceName = @TraceName
END
END ELSE PRINT 'No active traces named '+@TraceName+'.'
END ELSE PRINT 'No active traces.'
IF @OnOff='OFF' RETURN 0 -- We've stopped the trace (if it's running), so exit
-- Append the datetime to the file name to create a new, unique file name.
IF @file_name IS NULL
begin
SELECT @file_name = 'c:\TEMP\tsqltrace_' + CONVERT(CHAR(8),getdate(),112) +
REPLACE(CONVERT(varchar(15),getdate(),114),':','')
end
else
begin
SELECT @file_name = 'c:\TEMP\' +@tracename + CONVERT(CHAR(8),getdate(),112) +
REPLACE(CONVERT(varchar(15),getdate(),114),':','')
end
END
-- Set filters (default values avoid tracing the trace activity itself)
-- Specify other filters like application name etc. by supplying strings to the
-- @IncludeTextFilter/@ExcludeTextFilter parameters, separated by semicolons
SET @OurObjId=OBJECT_ID('master..USP_TRACE_INFO')
EXEC sp_trace_setfilter @TraceId=@QueueHandle, @columnid=1, @logical_operator=0,
@comparison_operator=7, @value=N'EXEC% USP_TRACE_INFO%'
IF @ExcludeTextFilter IS NOT NULL EXEC sp_trace_setfilter @TraceId=@QueueHandle,
@columnid=1, @logical_operator=0, @comparison_operator=7, @value=@ExcludeTextFilter
-- Record the trace queue handle for subsequent jobs. (This allows us to know
-- how to stop our trace.)
IF OBJECT_ID('tempdb..USP_trace_Queue') IS NULL BEGIN
CREATE TABLE tempdb..USP_trace_Queue (TraceId int, TraceName varchar(20),
TraceFile sysname)
INSERT tempdb..USP_trace_Queue VALUES(@QueueHandle, @TraceName, @file_name)
END ELSE BEGIN
IF EXISTS(SELECT 1 FROM tempdb..USP_trace_Queue WHERE TraceName = @TraceName)
BEGIN
UPDATE tempdb..USP_trace_Queue SET TraceId = @QueueHandle, TraceFile=@file_name
WHERE TraceName = @TraceName
END ELSE BEGIN
INSERT tempdb..USP_trace_Queue VALUES(@QueueHandle, @TraceName, @file_name)
END
END
RETURN 0
Help:
PRINT 'USP_TRACE_INFO -- Starts/stops a Profiler-like trace using Transact-SQL
server side stored procedures.'
DECLARE @crlf char(2), @tabc char(1)
SET @crlf=char(13)+char(10)
SET @tabc=char(9)
PRINT @crlf+'Parameters:'
PRINT @crlf+@tabc+'@OnOff varchar(4) default: /? -- Help'
PRINT @crlf+@tabc+'@file_name sysname default: c:\temp\YYYYMMDDhhmissmmm.trc --
Specifies the trace file name (SQL Server always appends .trc extension)'
PRINT @crlf+@tabc+'@TraceName sysname default: Sample_Trace -- Specifies the name
of the trace'
PRINT @crlf+@tabc+'@TraceType int default: 0 -- Specifies the type of trace to
run (obtained from the Trace table: tempdb.dbo.Trace_Scenario)'
PRINT @crlf+@tabc+'@Options int default: 2 (TRACE_FILE_ROLLOVER)'
PRINT @crlf+@tabc+'@MaxFileSize bigint default: 4000 MB'
PRINT @crlf+@tabc+'@StopTime datetime default: NULL'
PRINT @crlf+@tabc+'@Events varchar(300) default: SP-related events and
errors/warnings -- Comma-delimited list specifying the events numbers to trace.
(Obtained from the Trace table: tempdb.dbo.Trace_Scenario)'
PRINT @crlf+@tabc+'@Cols varchar(300) default: All columns -- Comma-delimited
list specifying the column numbers to trace. (obtained from the Trace table:
tempdb.dbo.Trace_Scenario)'
PRINT @crlf+@tabc+'@IncludeTextFilter sysname default: NULL -- String mask
specifying what TextData strings to include in the trace'
PRINT @crlf+@tabc+'@ExcludeTextFilter sysname default: NULL -- String mask
specifying what TextData strings to filter out of the trace'
PRINT @crlf+@tabc+'@IncludeObjIdFilter int default: NULL -- Specifies the id
of an object to target with the trace'
PRINT @crlf+@tabc+'@ExcludeObjIdFilter int default: NULL -- Specifies the id
of an object to exclude from the trace'
PRINT @crlf+@tabc+'@TraceId int default: NULL -- Specifies the id of the trace
to list when you specify the LIST option to @OnOff'
PRINT @crlf+'Examples: '
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO -- Displays this help text'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'' -- Starts a trace'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''OFF'' -- Stops a trace'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'', @file_name=''E:\log\mytrace'' --
Starts a trace with the specified file name'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@Events=''37,43'' -- Starts a
trace that traps the specified event classes'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@Cols=''1,2,3'' -- Starts a trace
that includes the specified columns'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@IncludeTextFilter=''EXEC% FooProc%''
-- Starts a trace that includes events matching the specified TextData mask'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@tracename=''Receiving_50_Ctns''
-- Starts a trace using the specified name'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''OFF'',@tracename=''Receiving_50_Ctns''
-- Stops a trace with the specified name'
PRINT @crlf+@tabc+'EXEC USP_TRACE_INFO ''ON'',@file_name = ''E:\log\mytrace'',
-- Starts a trace with the specified parameters'
PRINT @tabc+@tabc+'@TraceName = ''Receiving_50_Ctns'','
PRINT @tabc+@tabc+'@Options = 2, '
PRINT @tabc+@tabc+'@TraceType = 0,'
PRINT @tabc+@tabc+'@MaxFileSize = 500,'
PRINT @tabc+@tabc+'@StopTime = NULL, '
PRINT @tabc+@tabc+'@Events =
''10,11,14,15,16,17,27,37,40,41,55,58,67,69,79,80,98'','
PRINT @tabc+@tabc+'@Cols = DEFAULT,'
PRINT @tabc+@tabc+'@IncludeTextFilter = NULL,'
PRINT @tabc+@tabc+'@IncludeObjIdFilter = NULL,'
PRINT @tabc+@tabc+'@ExcludeObjIdFilter = NULL'
PRINT @crlf+@tabc+'To list all the traces currently running:'
PRINT @crlf+@tabc+@tabc+'USP_TRACE_INFO ''LIST'''
PRINT @crlf+@tabc+'To list information about a particular trace:'
PRINT @crlf+@tabc+@tabc+'USP_TRACE_INFO ''LIST'', @TraceId=n -- where n is the
trace ID you want to list'
PRINT @crlf+@tabc+'To stop a specific trace, supply the @TraceName parameter
when you call USP_TRACE_INFO ''OFF''.'
RETURN 0
SET NOCOUNT OFF
END
GO
/**********************************
End of Scripts.
**********************************/
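One way the comma-delimited lists stored in Trace_Scenario could be turned into a trace definition is sketched below. This is an illustrative loop under stated assumptions, not necessarily the logic USP_TRACE_INFO uses: the file path c:\temp\scenario_trace and the variable names are made up for the example, and the Trace_Scenario table created by the script above is assumed to exist.

```sql
-- Sketch: build a trace from one Trace_Scenario row by splitting its
-- comma-delimited Events and Data_Columns lists.
-- Assumptions: c:\temp exists on the server; return codes of sp_trace_setevent are ignored.
DECLARE @TraceType int, @TraceId int, @MaxFileSize bigint, @On bit, @rc int
DECLARE @Events varchar(301), @Cols varchar(301), @ColList varchar(301)
DECLARE @EventId int, @ColId int, @pos int

SET @TraceType = 1          -- 1 = 'Slow Running Queries'
SET @MaxFileSize = 50
SET @On = 1

SELECT @Events = Events, @Cols = Data_Columns
FROM tempdb.dbo.Trace_Scenario
WHERE Trace_Type = @TraceType

IF @Events IS NULL OR @Cols IS NULL
BEGIN
    PRINT 'Unknown trace type.'
    RETURN
END

-- Make sure both lists end with a comma so that every item is terminated.
IF RIGHT(@Events, 1) <> ',' SET @Events = @Events + ','
IF RIGHT(@Cols, 1) <> ',' SET @Cols = @Cols + ','

EXEC @rc = sp_trace_create @TraceId OUTPUT, 2, N'c:\temp\scenario_trace', @MaxFileSize, NULL
IF @rc <> 0 RETURN

-- Outer loop: one pass per event id. Inner loop: one pass per data column id.
WHILE CHARINDEX(',', @Events) > 0
BEGIN
    SET @pos = CHARINDEX(',', @Events)
    SET @EventId = CAST(LEFT(@Events, @pos - 1) AS int)
    SET @Events = SUBSTRING(@Events, @pos + 1, 301)

    SET @ColList = @Cols
    WHILE CHARINDEX(',', @ColList) > 0
    BEGIN
        SET @pos = CHARINDEX(',', @ColList)
        SET @ColId = CAST(LEFT(@ColList, @pos - 1) AS int)
        SET @ColList = SUBSTRING(@ColList, @pos + 1, 301)
        EXEC sp_trace_setevent @TraceId, @EventId, @ColId, @On
    END
END

EXEC sp_trace_setstatus @TraceId, 1    -- start the trace
```

The trace keeps running until it is stopped with sp_trace_setstatus (status 0, then status 2), so the trace id should be recorded somewhere, as the USP_trace_Queue table does.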
Using these scripts, you can add tracing maintenance stored procedures to your DBA toolkit. If you wish to add
tracing capabilities to your application, you can pass a filter value for the SPID that you want to trace. You can
find the calling application's SPID by using @@SPID and pass it in as one of the filter values to the stored
procedure, so that the trace is generated only for the activity done by that SPID. When you do that, make sure
you also provide a means of switching the trace off, by making another call to the stored procedure after the trace
has been generated. The steps in that case are:
a) Call the stored procedure with the appropriate trace type, passing in the @@SPID value as the filter.
b) Perform the application steps to be traced; a trace file will be generated.
c) Call the stored procedure again to turn off the trace.
Using this approach, you can trace events and their data columns for different trace scenarios and add run-time
tracing to your application as well.
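At the system-procedure level, restricting a trace to the calling connection could look like the sketch below. It filters on data column 12 (SPID) with sp_trace_setfilter; the file path and the traced event (41, SQL:StmtCompleted) are assumptions for the example, and how the filter value would be passed through USP_TRACE_INFO is not shown here.

```sql
-- Sketch: trace only the calling connection (steps a to c).
-- Assumptions: c:\temp exists on the server and spid_trace.trc does not yet exist.
DECLARE @TraceId int, @MaxFileSize bigint, @On bit, @Spid int, @rc int
SET @MaxFileSize = 5
SET @On = 1
SET @Spid = @@SPID          -- SPID of the calling connection

-- a) Define the trace and restrict it to this SPID.
EXEC @rc = sp_trace_create @TraceId OUTPUT, 2, N'c:\temp\spid_trace', @MaxFileSize, NULL
IF @rc <> 0 RETURN
EXEC sp_trace_setevent @TraceId, 41, 1, @On     -- SQL:StmtCompleted, TextData
EXEC sp_trace_setevent @TraceId, 41, 12, @On    -- SQL:StmtCompleted, SPID
-- Column 12 = SPID, logical operator 0 = AND, comparison operator 0 = equal.
EXEC sp_trace_setfilter @TraceId, 12, 0, 0, @Spid
EXEC sp_trace_setstatus @TraceId, 1             -- start the trace

-- b) The application work to be traced goes here.
SELECT COUNT(*) FROM master.dbo.sysdatabases

-- c) Switch the trace off and remove its definition.
EXEC sp_trace_setstatus @TraceId, 0
EXEC sp_trace_setstatus @TraceId, 2
```

The filter value is passed in an int variable because sp_trace_setfilter expects the value's data type to match the filtered column.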
This article is the first of a multi-part series detailing the SQL Server Upgrade process from the technical, logistical
and business perspective. In the coming weeks, expanded articles will be published in the following areas:
Needless to say, upgrading to SQL Server 2000 can be a daunting task based on the criticality of the systems,
level of coordination and technical planning. As such, the series of articles will provide valuable explanations,
charts and graphics to best illustrate the points to assist you in the project. With this being said, be prepared to
work with new team members, wear new hats and resolve challenging issues in the course of upgrading to SQL
Server 2000.
The motivation for this article is the realization that in many companies applications are in place, but the right tool
for the job is not being leveraged. Too often, piece-meal applications are supporting business critical functions
that cannot be leveraged to save time nor generate revenue. To further elaborate:
• Companies are still running SQL Server 6.5 and limping along by having IT staff spending hours resolving
server down, corruption and data integrity problems with minimal user productivity
• Microsoft Access has grown from a desktop database to a department of users that are severely stressing
the database ultimately leading to corruption and frustration
• 3rd party Applications need to be upgraded in order to leverage new functionality released by the vendor
and needed for the business
• Microsoft Excel is being used to run business critical functions and important data is scattered across the
organization and is sometimes mistakenly lost
The bottom line contribution by the DBAs for the business is to improve efficiency and accuracy for the user
community as well as save time and money for the business. The DBAs win by being able to focus on more
challenging IT projects on the latest and greatest technology. I am sure you can agree this is a WIN-WIN
scenario for everyone involved.
BUSINESS JUSTIFICATION
ID | JUSTIFICATION | SUPPORTING INFORMATION
1 | Total Cost of Ownership [3] | 1. Total Cost of Ownership (TCO) lower than any other DBMS in the market
2 | System Performance [3] | 2. Unprecedented system performance for both OLTP and OLAP environments
  | | 3. Improved ability to scale up and out by leveraging expanded hardware resources: as much as 64 GB of memory and 32 processors
3 | Microsoft Support | 4. As SQL Server 6.5 ages, Microsoft is providing less support for the product and will eventually have few Support Engineers available to address critical needs
  | | 5. Currently, if you have a business critical issue with SQL Server 6.5, the typical Microsoft Support recommendation is to ‘Upgrade to SQL Server 2000’
4 | Regulated Industry Requirements | 6. Upgrading to SQL Server 2000 becomes especially important for companies in regulated industries that may require a several year data retention period
  | | 7. Relying on SQL Server 6.5 for the short term may not be an issue because staff is familiar with the technology
5 | DBA Support | 8. In five years, finding individuals to administer SQL Server 6.5 will be difficult and not attractive to DBAs who are typically interested in the latest and greatest technologies
6 | Level of Automation | 9. The level of automation from the SQL Server tool set: Enterprise Manager, Query Analyzer, Profiler, Data Transformation Services (DTS)
7 | New Capabilities [2] | 10. Analysis Services
  | | 11. DTS
  | | 12. XML Integration
  | | 13. Optimizer Enhancements
  | | 14. Functions
  | | 15. DBCC’s
  | | 16. Log Shipping
  | | 17. New Replication Models
  | | 18. Full Text Indexing
  | | 19. Database Recovery Models
  | | 20. Linked Servers
8 | Third Party Products | 21. SQL LiteSpeed – Compressed and Encrypted backups – www.sqllitespeed.com [1]
  | | 22. Lumigent Entegra – Enterprise Auditing Solution – www.lumigent.com/products/entegra/entegra.htm
  | | 23. Lumigent Log Explorer – Review and Rollback Database Transactions – www.lumigent.com/products/le_sql/le_sql.htm
  | | 24. Precise Indepth for SQL Server – Performance Tuning – www.precise.com/Products/Indepth/SQLServer/
  | | 25. NetIQ SQL Management Suite – Enterprise Monitoring and Alerting – www.netiq.com/products/sql/default.asp
• Determine time frame and responsibility per task [1]
• Incorporate meetings, sign-off and hyperlinks to existing information into the plan [1]
• Leverage a Project Management tool like Microsoft Project 2002 – For more information refer to –
http://www.microsoft.com/office/project/default.asp
The next section of the article provides a fundamental outline of the Upgrade Project Phases for the SQL Server
2000 project which can serve as a starting point for the Project Plan.
how to prevent management’s biggest fear during systems upgrades with a redundant architecture. Be sure to
check it out!
The Story
You’ve spent thousands of dollars on that cool technology; clustering, redundant controllers, redundant disks,
redundant power supplies, redundant NIC cards, multiple network drops, fancy tape backup devices and the latest
and greatest tape technology. You’re all set. There’s no way you’re going to have downtime.
But one day something does go wrong. Is it the disks? No way, you’ve implemented RAID with hot swappable hard
drives. Is it the server? Can’t be, you’ve got your servers clustered and any downtime due to failover would be so
small that hardly anyone would even notice it. Well, if it’s not your stuff it must be the network. Those guys are
always making changes and not letting you know about it until there’s a problem. No, you checked with them: no
changes. What’s left? It must be the application. Once again that application group rolled out a new version and
no one informed the DBAs. Foiled again: the application team says no changes went out.
A little investigation on the database and you’ve noticed that some of the create dates on a few stored procedures
have yesterday’s date. I guess they could have been recompiled, but you didn’t do it. You take a look at one of
the procedures and, lo and behold, someone was kind enough to actually put in comments. It was Dave, one of
the DBAs in your group and guess what? He’s on vacation this week. It turns out he created a scheduled job to
run this past Sunday and forgot to let you know about it. He changed a bunch of procedures for a load that occurs
each month. You don’t know much about the load except that it runs and he is responsible. You have no idea
why the change was made nor, if you undo the change, what effect it might have. To make things worse you try to
find the old code, but you can’t find it anywhere.
The heat starts to rise as the phones keep ringing with people wondering why the database is not responding.
You start to panic trying to figure out what to do. This is a pretty critical database and a lot of people in your
company need it to do their job. The last time something like this happened you caught hell from your boss,
because her boss was breathing down her neck. Now you wish you were the one on vacation instead of Dave.
You take a deep breath and think things through.
You remember Dave sent you an email about the load, around this time last year when he went on vacation. You
quickly do a search and you find the email. The email gives you steps on how to undo the load and what, if any,
consequences you may face by undoing things. You go to a previous night's backup, do a database restore and
script out the procedures. You’re taking a gamble that you’ve got the right procedures, but that’s your only course
of action. After five or six hours of user complaints and a lot of sweating you’ve got the database back to normal
again or at least you think so. You say to yourself, “I need a vacation and Dave’s dead meat when he gets back.”
The Solution
Have you ever found yourself in this situation? Something gets changed and you don’t know about it until there’s
a problem. Or someone makes a change and says “Don’t worry, it’s a small change. No one will even notice.” I
think we have all found ourselves in these situations. The only way to fix things like this is to bolt down your
servers and make the users sign their life away if they want to use your server. Not too likely, but it’ll work if you
could get it implemented.
I think we need to look for a solution in the middle of the road. Something that works for you as a DBA and
something that works for the rest of the world. People just don’t understand how stressful your job really is.
You’re the bottom of the totem pole, well maybe the NT Engineers are the bottom, but still you’re pretty close. All
types of changes occur outside of your control and the only time you are aware is when something goes wrong.
Well you might not be able to fix the things outside of your control, but you are the DBA, the master of the
databases. In order to implement change control company-wide it takes a lot of effort, coordination, and buy-in
from a lot of people. But that doesn’t mean you can’t start with your own domain and put control mechanisms in
place for the databases. So where do you begin?
Start Simple
For most changes to take effect and have a quick payback, implementing things slow and steady is the way to go.
Identify a critical database, kind of like the one Dave screwed up, and start to create guidelines around how you
would handle changes for this database. If you try to make a master plan to solve all of the problems, you will
never get it implemented. Create a small project plan or a task list of things you need to accomplish and take care
of one item at a time. If something doesn’t work, adjust the plan accordingly.
Instead of making production changes on the fly, make changes on a periodic, controlled basis. If people know
that changes are made on a schedule you set, they can adjust their own schedules accordingly. It doesn’t
mean that you can never put changes into production outside of this schedule; it just means that somebody better
have a really good reason why they need something immediately, instead of waiting for the scheduled release.
This may sound like it will slow down the development process and you know your users need those changes right
away, but having a more controlled approach will actually benefit your end users as well. The users will know
what’s coming and when it’s coming instead of finding out about a change after something that used to work no
longer works.
Always have a back out plan
Whenever you move objects and configurations into production, always make sure you have a way to back out
your changes, even if it is something as small as adding or changing one stored procedure. Your back out plan
can range from something as simple as restoring a backup to complex back out scripts and a coordinated effort with
other key groups (e.g. Applications, NT Engineering, Networking). The back out plan is just as important as your
roll out plan. You may never have to use it for an implementation, but the one time you do you’ll be glad you took
the time to put it together.
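For the smallest case, a change to a single stored procedure, one possible back out aid is to save the current definition before the new version goes in. The sketch below copies the procedure text from the system tables into a backup table; dbo.ProcBackup and usp_MonthlyLoad are hypothetical names chosen for the example.

```sql
-- Sketch of a simple back out aid: keep a copy of a procedure's current
-- definition before a change is rolled out.
-- Assumptions: dbo.ProcBackup and usp_MonthlyLoad are hypothetical names.
IF OBJECT_ID('dbo.ProcBackup') IS NULL
    CREATE TABLE dbo.ProcBackup (
        ProcName sysname NOT NULL,
        ColId smallint NOT NULL,        -- order of the text chunks
        ProcText nvarchar(4000) NULL,   -- one chunk of the definition
        SavedAt datetime NOT NULL
    )
GO

-- Copy the current definition, chunk by chunk, with a timestamp.
INSERT INTO dbo.ProcBackup (ProcName, ColId, ProcText, SavedAt)
SELECT o.name, c.colid, c.text, GETDATE()
FROM sysobjects o
JOIN syscomments c ON c.id = o.id
WHERE o.type = 'P'
  AND o.name = 'usp_MonthlyLoad'
GO

-- To back out, read the saved text back in chunk order.
SELECT ProcText
FROM dbo.ProcBackup
WHERE ProcName = 'usp_MonthlyLoad'
ORDER BY SavedAt, ColId
```

A full database backup taken before the release covers the same need more broadly; a copy like this is only a quick way to see exactly what a procedure looked like before the change.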
Create a process that you can use over and over again for either the same application upgrade or for future
projects. Take a look at all of the documents, processes, emails, etc… that you have used in the past and create
a set of reusable documents that can be used for future projects. If you can recycle and reuse what you have
already put together, it will simplify and streamline your procedures.
If you really want to have more control over changes to your database, you need to first look at what you can do to
get this done. You can’t keep blaming all those Developers or Users if you haven’t set the guidelines. After you
have done your homework, then you can start to involve others. Take a look at the things that you have control
over or the things you can bring to the surface that someone has to be aware of and manage. Face it, if you don’t
do it, who will?
Convince others
As a DBA this probably sounds good to you, but how do you convince others that will have to change the way they
do things? Good question!
• Past mistakes – Take a look at past mistakes and how a process like this will prevent the issues from
happening again.
• Management – Find someone above you that will champion the cause and take it to the next level.
• Find others that want a change – Find other people that see there has to be a better way and get them
to join you in your effort.
• Collaborate – Talk to other people in an informal manner. Get feedback from them, so you can address
their concerns early in the process. This is a sure way to get them to feel like part of the solution instead
of part of the problem.
Summary
It may seem like a daunting task to put a change management process in place for your environment, but it’s not
impossible. Many companies are already doing it through years of trial and error. The only way to get a better
process in place is to start making changes to your existing process. You might need to just formalize what you
already have or you may need a complete overhaul. Whatever it may be, start small and think big.
References
Change Management for SQL Server Presentation by Greg Robidoux
Published 01.10.2003 – Greg Robidoux – Edgewood Solutions. All rights reserved 2003
Who Owns That Database?
Steve Jones
3/4/2003
Introduction
Recently I was testing a new backup product, SQL Litespeed, and while setting up a stored procedure to back up
my databases I ran into an interesting issue. Whenever I ran this statement:
insert #MyTempTable exec sp_helpdb
I received an error about not being able to insert a null value into the table. This was slightly confusing, as I'd
declared the temp table to accept null values. So I decided to dig in a bit further and execute only the sp_helpdb
section. Surprisingly enough, this returned the same error: unable to insert a NULL value, column does not allow
NULL values. Specifically, it was the owner column.
Hmmmmmm.
After a moment of sitting there with a puzzled look on my face, I executed a simple:
select * from master.dbo.sysdatabases
to see where I might be having an issue. No NULLs in the sid column, which should map to the db owner. Now I
was more annoyed than confused and needed to dig in further. Fortunately Microsoft allows us to examine the
code behind the system stored procedures. Actually maybe they couldn't figure out a way to protect that code
either (since no one I know has either), so you can look at it. In Query Analyzer, I browsed to the master database
and selected the sp_helpdb procedure from the Object Browser. A quick right click and a script as create gave me
the text of the procedure. Very quickly I zeroed in on this section of the code:
/*
** Initialize #spdbdesc from sysdatabases
*/
insert into #spdbdesc (dbname, owner, created, dbid, cmptlevel)
select name, suser_sname(sid), convert(nvarchar(11), crdate),
dbid, cmptlevel
from master.dbo.sysdatabases
where (@dbname is null or name = @dbname)
Since this is the area where the temp table is being populated and there is a column called "owner", I was
guessing this was the problem area.
That didn't seem right. Check master, that has "sa" as the owner. Check a couple other SQL 2000 servers and
they have "sa" as the owners. There is a "SID" for this column, so what is the issue? The issue is the
suser_sname() function, which returns the name from the domain controller for this SID. In my case, however, the
domain account was deleted, so there is no matching SID. As a result, the function returns NULL.
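A quick way to see which databases are affected is to apply the same function to every row of sysdatabases and keep the rows where it returns NULL. This is only a sketch of that check, using nothing beyond the table and function already discussed:

```sql
-- List databases whose owner SID no longer resolves to a login name
select name, sid, suser_sname(sid) as owner
from master.dbo.sysdatabases
where suser_sname(sid) is null
```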
OK, kind of interesting. How do I fix this? Well it turns out to be very simple. Though not as simple as I expected.
My first thought was to use sp_changedbowner to change the owner to "sa". This procedure is run from the
database and takes the name of an account to change to. No big deal, give it a try. It runs and returns
Server: Msg 15110, Level 16, State 1, Procedure sp_changedbowner, Line 46
The proposed new database owner is already a user in the database.
Not exactly what I was hoping for. A quick dig through Books Online confirmed that this is expected and either
there is no easy workaround, or I'm not very good at searching Books Online. I'll let you decide which is more
likely. I suppose I could have dropped sa, which is mapped to dbo for the databases, but that seemed risky to me
and I wasn't really looking for a server outage to fix this little annoyance. Instead I decided to try a simple, albeit
probably not always recommended technique. I know that the SID for the "sa" account is always 0x01, and I know
that I can run a simple command that will allow me to update the system tables. My first test was on model
because, well, it's not a critical database, and I know I can always grab model from another server and attach it
here. I ran:
sp_configure 'allow updates', 1
reconfigure with override
update sysdatabases
set sid = 0x01
where name = 'model'
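After an update like this, it seems prudent to confirm that the owner now resolves and to switch direct system table updates back off. The following is a sketch of that follow-up, offered as an assumption about sensible practice rather than as the author's recorded steps:

```sql
-- Confirm that the owner of model now resolves to a login name
select name, suser_sname(sid) as owner
from master.dbo.sysdatabases
where name = 'model'
go
-- Switch direct updates to the system tables back off
exec sp_configure 'allow updates', 0
reconfigure with override
go
```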
DTS, ETL, and BI
The acronym chapter dealing with the movement of data and large scale analysis, a world most SQL
Server DBAs deal with relatively little. DTS, Data Transformation Services, first introduced in SQL
Server 7.0, changed the way the ETL, Extraction, Transformation, and Load, industry built products
forever. Once exclusively for large enterprises with big pockets, ETL moved down to every desktop
developer and administrator working with SQL Server. A radical shift for the entire BI industry. BI,
Business Intelligence, usually deals with OLAP, Analysis Services, Cubes, and various other
mechanisms for examining large amounts of data and drawing conclusions.
Auditing DTS Packages
Haidong Ji
10/6/2003
I received quite a few emails from readers who read my article on managing and scheduling DTS packages. Many
readers like the idea of using a SQL Server login so that all packages are owned by the same account. Therefore
a group working on the same project can edit each other's packages. However, many also asked about
auditing. Sometimes we want to know which person in a group that shares the same SQL login edited a package. I
think this can be useful in some situations.
In this article, I will show you how we can create an audit trail for a DTS package. This method can provide
information on who modified a package, when, and from what workstation. It is not only good for auditing DTS
package changes, it can also be modified for auditing changes to other tables. In a lot of databases we manage,
we all know that some tables are more important than others, such as tables for salary, social security number,
etc. With very little modification, you can use this method to audit changes to those tables as well.
Background information
When we are searching for ways to do auditing, we naturally turn to triggers. Triggers are a special class of stored
procedure defined to execute automatically when an UPDATE, INSERT, or DELETE statement is issued against a
table or view. Triggers are powerful tools that sites can use to enforce their business rules automatically when
data is modified. We will use triggers to create an audit trail for DTS packages.
As most of you know, DTS packages are stored in the MSDB database. Whenever a DTS package is created, a
new record is created in the sysdtspackages table. Likewise, when a package is modified, a new record will be
inserted also. The only difference is that this updated record retains the id of the package and generates a new
versionid. Therefore we will just use the INSERT trigger. SQL Server automatically saves the old version of a
package when it is being updated and saved. This gives you great flexibility of going back to the old version if
needed.
Since SQL Server keeps old versions of a package, there is no need for us to keep the before-and-after states of
the packages. Therefore, to keep track of who made modifications at what time from where, we need to use data
stored in the sysprocesses table in the master database. Among the many columns in the table, the following are of
particular interest: spid (SQL Server process ID), hostname (name of the workstation), program_name (name of the
application program), cmd (command being executed, though not the full SQL statement), nt_domain (Windows
domain for the client), nt_username (Windows user name), net_address (unique identifier assigned to the
network interface card on each user's workstation), and loginame (SQL or Windows login
name).
How do we get that information from sysprocesses, you may ask. Fortunately, SQL Server provides a global
variable of @@SPID. Based on @@SPID, we can find out who-is-doing-what-when-from-where from the
sysprocesses table.
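To see what this lookup returns, the query below can be run from any connection; it selects the columns listed above for the current session only. It is a small illustration rather than part of the audit solution itself:

```sql
-- Show who and where the current connection is, according to sysprocesses
select spid, hostname, program_name, cmd,
       nt_domain, nt_username, net_address, loginame
from master..sysprocesses
where spid = @@SPID
```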
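-- Opening line of this statement reconstructed from the standard generated script
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[DTSAuditTrail]')
and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[DTSAuditTrail]
GO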
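A table definition along the following lines would satisfy the trigger that follows. It is only a sketch inferred from the trigger's column list: the data types mirror master..sysprocesses in SQL Server 2000, and the AuditDate column (name and default) is an assumption made so that each row carries a timestamp.

```sql
-- Sketch of the audit table, inferred from the trigger's column list.
-- Data types follow master..sysprocesses in SQL Server 2000; the
-- AuditDate column and its name are assumptions.
CREATE TABLE [dbo].[DTSAuditTrail] (
    [DTSAuditID]   int IDENTITY (1, 1) NOT NULL,
    [PackageName]  varchar (100) NULL,
    [cmd]          nchar (16) NULL,
    [dbid]         smallint NULL,
    [hostname]     nchar (128) NULL,
    [net_address]  nchar (12) NULL,
    [nt_domain]    nchar (128) NULL,
    [nt_username]  nchar (128) NULL,
    [program_name] nchar (128) NULL,
    [spid]         smallint NULL,
    [status]       nchar (30) NULL,
    [loginame]     nchar (128) NULL,
    [AuditDate]    datetime NOT NULL DEFAULT (getdate())
)
GO
```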
Below is the trigger script on the sysdtspackages table within msdb database. One technique I want to highlight is
the use of SCOPE_IDENTITY() function. SCOPE_IDENTITY() returns the last IDENTITY value inserted into an
IDENTITY column in the same scope. Since the audit table has an identity field, I will use that to update the record
with package name information. Using SCOPE_IDENTITY() makes the code look cleaner and simpler. We also
save a few lines of code. Please note that, in my case, the audit table is in the DBA database. As I said earlier,
you can put this table into the msdb database. In that case, a slight modification of this trigger is needed. In any case,
you want to make sure that the login that modifies the DTS package has INSERT and UPDATE permission on the newly created audit
table, since the trigger both inserts a row and then updates it.
CREATE TRIGGER [PackageChangeAudit] ON [dbo].[sysdtspackages]
FOR INSERT
AS
--Declare a variable for package name
declare @PackageName varchar(100)
--Insert values into the audit table. These values come from master..sysprocesses
--based on @@SPID
insert into dba..DTSAuditTrail (cmd, dbid, hostname, net_address, nt_domain,
nt_username, program_name, spid, status, loginame)
select cmd, dbid, hostname, net_address, nt_domain, nt_username, program_name,
spid, status, loginame
from master..sysprocesses where spid = @@SPID
--Get the package name
select @PackageName=name from inserted
--Update audit table with package name. Note SCOPE_IDENTITY( ) function is
--used here to make the code cleaner
update dba..DTSAuditTrail set PackageName = @PackageName
where dba..DTSAuditTrail.DTSAuditID = SCOPE_IDENTITY( )
After the trigger is created, all modifications will be recorded into the audit table. You can search that table using
package name, login ID, workstation name, and timestamp. Hopefully it can provide you with a good idea of
changes made to the packages you manage.
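A search of the audit table might look like the sketch below. It uses only the columns that the trigger populates; 'MyPackage' is a placeholder package name.

```sql
-- 'MyPackage' is a placeholder package name
select DTSAuditID, PackageName, loginame, nt_domain, nt_username,
       hostname, program_name
from dba..DTSAuditTrail
where PackageName = 'MyPackage'
order by DTSAuditID desc
```

Because DTSAuditID is an identity column, ordering by it in descending order lists the most recent changes first.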
Conclusion
In this article, I showed you how to use @@SPID to create an audit trail for DTS packages. If you are interested in
DTS automation, such as automating DTS logging, you can read the article I wrote last month.
Automate DTS Logging
Haidong Ji
9/9/2003
Many DTS packages are written by developers who may not know much about SQL and/or SQL Server. With the
popularity of DTS as an ETL tool increasing every day, many SQL Server DBAs are called on to debug and
troubleshoot DTS packages that were poorly written and organized. One important tool to help with this is DTS logging.
A DBA can use the package log to troubleshoot problems that occurred during the execution of a DTS package. The
DTS package log, unlike the SQL Server error log and the DTS exception log, contains information about the success
or failure of each step in a package and can help determine the step at which a package failure occurred. Each
time a package executes, execution information is appended to the package log, which is stored in msdb tables in
SQL Server or in SQL Server Meta Data Services. You can save package logs on any server running an instance
of SQL Server 2000. If a package log does not exist, the log will be created when a package is run.
An executing package writes information to the package log about all steps in the package, whether or not an
individual step runs. If a step runs, it will retain start and end times, and the step execution time. For steps that do
not run, the log lists the steps and notes that the step was not executed. In addition, with the proliferation of
packages on a server or servers, you can use DTS logging records to determine which packages are no longer
used and get rid of orphaned packages. I'll probably write about this in a different article.
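When the log is kept in msdb, it can be read with ordinary queries. The sketch below assumes the SQL Server 2000 log table msdb.dbo.sysdtspackagelog and its name, starttime, errorcode and errordescription columns as I recall them; check the table on your own server before relying on it.

```sql
-- Most recent logged execution of each package; packages that have not
-- run for a long time sort to the top
select name, max(starttime) as last_run, count(*) as logged_runs
from msdb.dbo.sysdtspackagelog
group by name
order by last_run

-- Executions that ended with an error
select name, starttime, errorcode, errordescription
from msdb.dbo.sysdtspackagelog
where errorcode <> 0
order by starttime desc
```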
You can turn on DTS logging manually. However, it can be tedious and time-consuming, especially if you have
many packages to manage. In this article, I will show you how to turn on DTS logging programmatically. Package
logging is only available on servers running an instance of SQL Server 2000. As such, this article only applies to
SQL Server 2000.
Manually turn on DTS logging
One way to turn on DTS logging is to open the package in DTS Designer. With no object selected within the
package, click the property icon, or go to the Package menu and select Properties. The package property window
(not the property window of any individual component) will pop up. You can click on the Logging tab and fill out the
relevant information to start DTS logging. See the attached image.
However, if you have many packages to manage, manually turning on each package logging can be tedious and
time-consuming. That is why I wrote the following scripts to accomplish this task.
Use ActiveX scripts to turn on DTS package logging automatically
With DTS package automation, we naturally turn to SQL-DMO. Using SQL-DMO, you can pretty much automate
anything that is SQL Server related. The following code uses SQL-DMO to turn on package logging for a given
server. You will need to create a package. Within the package, create an ActiveX Script task containing the code
below. You then need to create 3 global variables (data type string) within this package: ServerName,
SQLLoginID, and SQLLoginPassword. The variable names explain their purpose. After you give appropriate
values to the three global variables, you are good to go.
The key concept used here is the PackageInfos collection. The EnumPackageInfos method returns a
PackageInfos collection containing information about all the packages stored in SQL Server 2000. We then use
PackageInfos.Next method to walk through every package within the collection and turn on the logging property of
that package.
After running this task, logging will be turned on for all packages. However, if you create this package on the same
server as the other packages, this ActiveX package's logging property will not be turned on because it is in use.
It cannot flip the logging button while it is open. Another thing you will notice is that the visual layout of various
package components will change after this is run, but the components remain the same.
'**********************************************************************
' Author: Haidong "Alex" Ji
' Purpose: To turn on package execution logging for each and every DTS
'          package on a given server. This is especially useful
'          when there are many (hundreds of) packages to handle.
' Note: 1. This script uses DTS global variables called ServerName,
'          SQLLoginID and SQLLoginPassword;
'       2. ServerName defines the server whose DTS packages'
'          execution logging you want to change. The other 2 DTS global
'          variables' names explain their purposes. Change those
'          variables' values to suit your specific needs
'       3. It seems that the layout of various Package component will
'          change after this is run, but the components remain the same
'************************************************************************
Function Main()
Dim oApplication ' As DTS.Application
Dim oPackageSQLServer ' As DTS.PackageSQLServer
Dim oPackageInfos ' As DTS.PackageInfos
Dim oPackageInfo ' As DTS.PackageInfo
Dim oPackage ' As DTS.Package
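'Connect to the server and list its packages. These four statements are not in
'the printed listing; they are reconstructed from the declarations above and the
'EnumPackageInfos description in the text.
Set oApplication = CreateObject("DTS.Application")
Set oPackageSQLServer = oApplication.GetPackageSQLServer( _
    DTSGlobalVariables("ServerName").Value, _
    DTSGlobalVariables("SQLLoginID").Value, _
    DTSGlobalVariables("SQLLoginPassword").Value, DTSSQLStgFlag_Default)
Set oPackageInfos = oPackageSQLServer.EnumPackageInfos("", True, "")
Set oPackageInfo = oPackageInfos.Next
'Note: It is IMPORTANT that oPackage be instantiated and destroyed within the loop.
'Otherwise, previous package info will be carried over and snowballed into a bigger
'package every time this loop is run. That is NOT what you want.
Do Until oPackageInfos.EOF
    Set oPackage = CreateObject("DTS.Package2")
    oPackage.LoadFromSQLServer DTSGlobalVariables("ServerName").Value, _
        DTSGlobalVariables("SQLLoginID").Value, _
        DTSGlobalVariables("SQLLoginPassword").Value, _
        DTSSQLStgFlag_Default, , , , oPackageInfo.Name
    oPackage.LogToSQLServer = True
    oPackage.LogServerName = DTSGlobalVariables("ServerName").Value
    oPackage.LogServerUserName = DTSGlobalVariables("SQLLoginID").Value
    oPackage.LogServerPassword = DTSGlobalVariables("SQLLoginPassword").Value
    oPackage.LogServerFlags = 0
    oPackage.SaveToSQLServer DTSGlobalVariables("ServerName").Value, _
        DTSGlobalVariables("SQLLoginID").Value, _
        DTSGlobalVariables("SQLLoginPassword").Value, _
        DTSSQLStgFlag_Default
    Set oPackage = Nothing
    Set oPackageInfo = oPackageInfos.Next
Loop
Main = DTSTaskExecResult_Success
End Function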
Conclusion
In this article, I showed you how to use SQL-DMO to turn on DTS package logging. This is especially useful when
there are many (hundreds of) packages to handle. For DTS package ownership and scheduling issues, please see
a different article I wrote a while ago.
Introduction
Business intelligence can improve corporate performance in any information-intensive industry. With applications
like target marketing, customer profiling, and product or service usage analysis, businesses can finally use their
customer information as a competitive asset. They can enhance their customer and supplier relationships,
improve the profitability of products and services, create worthwhile new offerings, better manage risk, and reduce
expenses dramatically. In order to capture the power of business intelligence, a proper data warehouse needs to
be built. A data warehouse project needs to avoid common project pitfalls, be business driven, and deliver proper
functionality.
support systems.” (Mimno pg 4) Further, a data warehouse should not be considered the quick fix. A data
warehouse should be viewed as part of an overall solution of data storage and reporting needs. A common
mistake is the “failure to anticipate scalability and performance issues.” (Mimno pg 4) A data warehouse needs
a proper architecture and application configuration, including RAID configuration and data normalization. RAID
configuration determines how the data is placed across the disks of the server, and with it the
performance and redundancy of the data stored on the server. Data normalization is a
data analysis technique that organizes data attributes so that they are grouped to form non-redundant, stable,
flexible, and adaptive entities.
Data warehouses exist to facilitate complex, data-intensive, and frequent ad hoc queries. Accordingly, data
warehouses must provide far greater and more efficient query support than is demanded of transactional
databases. The data warehouse access component supports enhanced spreadsheet functionality, efficient query
processing, structured queries, ad hoc queries, data mining, and materialized views. In particular, enhanced
spreadsheet functionality includes support for state-of-the-art spreadsheet applications (e.g., MS Excel) as well as
for OLAP applications programs. These offer preprogrammed functionalities such as the following:
1. Roll-up: Data is summarized with increasing generalization (e.g., weekly to quarterly to annually).
2. Drill-down: Increasing levels of detail are revealed (the complement of roll-up).
3. Pivot: Cross tabulation (also referred to as rotation) is performed.
4. Slice and dice: Projection operations are performed on the dimensions.
5. Sorting: Data is sorted by ordinal value.
6. Selection: Data is available by value or range.
7. Derived (computed) attributes: Attributes are computed by operations on stored and derived values.
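Several of these operations have rough relational equivalents. The sketch below uses a hypothetical temporary table, #Sales, to show a roll-up with WITH ROLLUP and a simple slice; it illustrates the ideas in plain T-SQL and is not a description of how an OLAP engine implements them.

```sql
-- Hypothetical sales data, used only to illustrate the operations
CREATE TABLE #Sales (SaleYear int, SaleQuarter int, Amount money)
INSERT #Sales VALUES (2002, 1, 100)
INSERT #Sales VALUES (2002, 2, 150)
INSERT #Sales VALUES (2003, 1, 120)
INSERT #Sales VALUES (2003, 2, 180)

-- Roll-up: quarterly totals, yearly subtotals, and a grand total.
-- Subtotal rows show NULL in the column that has been summarized away.
SELECT SaleYear, SaleQuarter, SUM(Amount) AS Total
FROM #Sales
GROUP BY SaleYear, SaleQuarter WITH ROLLUP

-- Slice: fix one dimension (the year) and look at the rest
SELECT SaleQuarter, SUM(Amount) AS Total
FROM #Sales
WHERE SaleYear = 2003
GROUP BY SaleQuarter

DROP TABLE #Sales
```

Reading the first result set from the subtotal rows down to the quarterly rows is the drill-down direction; reading it the other way is the roll-up.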
“The classic approach to BI system implementation, users and technicians construct a data warehouse (DW) that
feeds data into functional data marts and/or "cubes" of data for query and analysis by various BI tools. The
functional data marts represent business domains such as marketing, finance, production, planning, etc. At a
conceptual level, the logical architecture of the DW attempts to model the structure of the external business
environment that it represents.” (Kurtyka)
The business intelligence data warehouse allows users to access information with a reporting tool to execute
business planning and analysis. The proper BI data warehouse will deliver a cyclical “process of data acquisition
(capture), analysis, decision making and action.” (Kurtyka).
Conclusion
A data warehouse can deliver a consolidated reporting source with increased flexibility and performance.
However, in order to deliver a robust BI data warehouse solution some guiding principles need to be followed to
deliver a successful product. First, the data warehouse project should encompass company directives in order to
avoid individual projects that lead to stovepipe data marts. In addition, the ETL program that loads the data
warehouse should not be used as a tool to fix source data, and the data warehouse solution should be
architected to provide superior performance and data normalization. Second, the warehouse solution should be
business driven. The project should provide a solution for a specific problem. A proper business case should be
approved and the project controlled by a steering committee. Third, the BI data warehouse should be built to
increase productivity, increase data flexibility, and reduce company cost by integrating multiple sources of
information.
References
• Kurtyka, Jerry. 2003. “The Limits of Business Intelligence: An Organizational Learning Approach.”
http://www.dmreview.com/master.cfm?NavID=193&EdID=6800
• Mimno, Pieter. 2001. “Building a Successful Data Warehouse” Boot Camp Presentation. http://www.mimno.com
Recently one of our business clients wanted a Business Intelligence system for his company. Because
many BI platforms are available, he needed a comparison between the MS SQL Server 2000 and Oracle 9i BI
platforms. So, I surfed various web sites, read many white papers, and ended up with this document. I hope it
will be useful for you too.
The Leadership
Microsoft: Microsoft is quantitatively the OLAP leader and its BI platform is the equal of those of other leaders such as
Hyperion, IBM, and Oracle. The pricing and packaging advantages demonstrated with OLAP in SQL 2000 are
significant. As a result, Microsoft's BI platform delivers value that is not approached by the platforms of other
leaders.
Oracle: Oracle offers a more technologically consistent BI platform by delivering both OLAP and relational
capabilities in its database. But its OLAP implementation has not been widely adopted by tools and application
suppliers, and therefore has not yet achieved significant market share.
Oracle:
Toolsets: Oracle 9i Warehouse Builder provides relational build and manage capabilities. Oracle Enterprise
Manager provides OLAP build and manage capabilities.
Extraction data sources: IBM DB2, Informix, MS SQL Server, Sybase, Oracle, ODBC, Flat Files.
OLAP Interfaces
Microsoft:
MDX (Multidimensional Expressions): This is Microsoft's native OLAP interface. In many ways it is very similar to
Structured Query Language (SQL), but it is not an extension of the SQL language. MDX also provides Data
Definition Language (DDL) syntax for managing data structures.
DSO (Decision Support Objects): This library supplies a hierarchical object model for use with any development
environment that can support Component Object Model (COM) objects and interfaces, such as MS Visual C++ and MS
Visual Basic. Its objects encapsulate the server platform, SQL Server databases, MDX functions, OLAP data
structures, Data Mining models and user roles.
PivotTable Service: This is a client-based OLE DB provider for Analysis Services OLAP and Data Mining
functionality. It is a powerful but heavy client interface.
XML for Analysis: This is a Simple Object Access Protocol (SOAP)-based XML API that has been designed by
Microsoft for accessing SQL Server Analysis Services data and functionality from web client applications. This
makes the SQL Server 2000 BI platform the first database to offer powerful data analysis over the web. It also
allows application developers to provide analytic capabilities to any client on any device or platform, using any
programming language.
Oracle:
OLAP DML: This is the native interface to Oracle 9i data and analytic functions. Through OLAP DML, applications
can access, query, navigate, and manipulate multidimensional data as well as perform analytic functions.
Java OLAP API: Applications can connect to multidimensional data and perform navigation, selection and
analysis functions, but not all functions. For example, a Java application must execute an OLAP DML command when
the functionality it needs is not available through the API.
SQL and PL/SQL: OLAP data and functionality can be accessed by using predefined PL/SQL packages that access
OLAP commands directly, through OLAP multidimensional views, or by accessing table functions directly.
Oracle:
Oracle 9i Data Mining API (Java): This is an open API, and Oracle makes its published specification easily available.
Conclusion
Microsoft and Oracle address all of our business intelligence platform requirements. They provide relational data
warehousing, build and manage facilities, OLAP, data mining, and application interfaces to relational data
warehouses, to OLAP data and analytic functionality, and to data mining.
Microsoft provides a comprehensive business intelligence platform. Build and manage capabilities, OLAP
capabilities, and application interfaces are its key strengths. Data mining is very new, although data mining
integration and data mining tools are quite good.
Oracle provides a comprehensive business intelligence platform. While this platform has a complete set of
components, OLAP and data mining capabilities are unproven, data mining tools are low level, and build and
manage capabilities are not consistently implemented for relational and OLAP data.
When considering the price, Microsoft leaves Oracle behind. Microsoft's entire BI platform can be bought for
$19,999, while Oracle's costs about $80,000 before adding $5,000 per user fees for build and manage capabilities.
I highly appreciate all your comments about this article. You can reach me through
dinesh@dineshpriyankara.com
dtsrun /Fmydtspkg /Nmydtspkg
to run a package named mydtspkg.dts. Of course, we still have one major problem to overcome: the package is
still executing against the database we created it on.
Making the package portable
So, how do we deal with the fact that the server name and database name are in effect hard coded in the
package? The DTS editor provides the “Dynamic Properties Task”. Add a Dynamic Properties Task to the
package. The properties window for it will appear. Type in a description, such as “Set Data Source”, and then
click the “Add…” button. Open the tree to Connections-Connection 1-Data Source. Click the checkbox “Leave
this dialog box open after adding a setting”, then click the Set… button.
In the next dialog box, set source to Global Variable and then click the Create Global Variables… button. Enter a
Name, leave the type as String, and enter a default value. Now choose the variable that you just created.
Repeat the process described for any other properties that you want to change, such as Initial Catalog (the
database name) and User ID and Password if you are not using integrated security. If you are extracting to a text
file, the Data Source for that connection will be the filename.
Important: Now that you have added the Dynamic Properties task, make sure it is the first task to execute by
adding an “On Success” workflow between it and Connection 1. If you don’t do this, the job will fail because the
values are not yet set when it starts to execute the extraction step. Your DTS package should now look
something like:
Figure 3. A DTS package with the Set Data Source task
At this point, save the package and execute a test run of the package from Enterprise Manager to confirm that the
changes made have been successful.
Setting variables from the command line
As you recall from the first section, we can run a DTS package from the command line using the dtsrun utility. But
how do we set the global variables? To do this use the /A switch. For example,
dtsrun /Fmydtspkg /Nmydtspkg /A"Server:8=devserver"
will set the global variable Server to devserver. The :8 is required to indicate that the data type is string.
Tip: The global variable names are case-sensitive. Make sure you exactly match the name in your command line
with the name used in the package. If they don’t match, no error is reported, but the command line setting is
ignored and the default value set in the package is used instead.
rundts.bat
Edit the values in the four set statements to reflect your server name, database name, directory for the
extracted data, and directory for the extract log. The extract will use the same filename as the DTS package, but
with a .txt extension. Setting the /W flag to TRUE in the CALL dtsrun line writes the completion status to the Windows event
log. There are also two flags that rundts.bat accepts. The first indicates whether to start Notepad and open the
output file after each step. The second flag determines whether to pause between each step. This allows the
execution to be monitored or to run unattended. So if you need to run three DTS packages, you can create
another batch file as:
CALL RUNDTS extract1 Y Y
CALL RUNDTS extract2 Y Y
CALL RUNDTS extract3 Y N
This will pause processing between each extract and open the output file for review.
Conclusion
This article provided a straightforward approach to make DTS packages portable between servers or databases.
By leveraging the SQL Server 2000 Dynamic Properties Task and the ability to run packages from the command
line, the package can be migrated with almost no effort. Of course, what is presented is just a starting point, but
the general technique can be modified to meet many needs.
The good old BCP easily moves to the side as a historical tool as XML becomes the worldwide adopted standard
for data interchange. So if you are using BCP to import text data files, and you want to use XML import files
instead of text files, now is a good time to change.
First, here is a partial example of the BCP utility being called from the xp_cmdshell procedure:
DECLARE @cmd VARCHAR(256)
SET @cmd = 'bcp ' + @DbName + '..' + @TableName + ' in ' + @FullImportFileName
+ ' /S' + @ServerName + ' /f' + @FullFmtFileName + ' -T'
EXEC @Result_cmd = master..xp_cmdshell @cmd
BCP Advantages:
• BCP enables fast import for big data files
• Using BCP, the database can import data without another application having to be developed.
Although this method for importing data is very fast, it has several limitations when we want to use it in
complex data processing systems integrated with other users' applications:
• BCP is not appropriate for importing XML data
• Careless use of master..xp_cmdshell can seriously endanger SQL Server security
BCP enables full control over transactions from the application to the final insert in the database table. BCP is a very
good utility that can help build an application-independent database. What does that mean? I want to see a clear
border between application and database. Databases have to be independent in the sense that every action from the
application against the database can be an “Execute Procedure”. I don’t want any SQL statement from application code
acting directly in the database and risking damage to the database logic. That way, application bugs will have less
damaging effects in the database.
This is especially important if you want clear lines of responsibility among the developers in your team. In the case of
an unexpected application crash, some procedures can be executed by hand through SQL Query Analyzer.
This is why I want to give the task of importing data into the database to the database itself, instead of to some
application or user interface.
So we need an appropriate tool for importing XML files into a SQL Server database, called from a stored
procedure. Although the OPENXML statement can be used for direct import into the database, I prefer this option:
Using SQLXMLBulkLoad.SQLXMLBulkLoad.3.0
On your SQL Server you have to install SQLXML 3.0 SP1 (http://msdn.microsoft.com), then create the file
'C:\Folder\ImportData.xml' using the following data:
<ImportData>
<Row>
<Field1>Row1_ Filed1_Data</Field1>
<Field2>Row1_ Filed2_Data</Field2>
<Field3>Row1_ Filed3_Data</Field3>
</Row>
<Row>
<Field1>Row2_ Filed1_Data</Field1>
<Field2>Row2_ Filed2_Data</Field2>
<Field3>Row2_ Filed3_Data</Field3>
</Row>
</ImportData>
You also need to create a file called schema.xml in the same folder, as follows:
<?xml version="1.0" ?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:xml:datatypes"
xmlns:sql="urn:schemas-microsoft-com:xml-sql">
<ElementType name="Field1" dt:type="string"/>
<ElementType name="Field2" dt:type="string"/>
<ElementType name="Field3" dt:type="string"/>
<ElementType name="ElementRow" sql:is-constant="1">
<element type="Row"/>
</ElementType>
<ElementType name="Row" sql:relation="TableImport1">
<element type="Field1" sql:field="TabField1"/>
<element type="Field2" sql:field="TabField2"/>
<element type="Field3" sql:field="TabField3"/>
</ElementType>
</Schema>
Create TableImport1 in “YourDatabase” by executing the following script in SQL Query Analyzer:
CREATE TABLE [dbo].[TableImport1] (
[TabField1] [varchar] (40) NULL,
[TabField2] [varchar] (40) NULL ,
[TabField3] [varchar] (40) NULL) ON [PRIMARY]
GO
Then you can create a new procedure in your database:
CREATE PROCEDURE BulkLoad AS
DECLARE @object INT
DECLARE @hr INT
DECLARE @src VARCHAR(255)
DECLARE @desc VARCHAR(255)
DECLARE @Schema VARCHAR(128)
DECLARE @ImportData VARCHAR(128)
DECLARE @ErrorFile VARCHAR(128)
END
ELSE
EXEC @hr = sp_OASetProperty @object, 'ConnectionString',
    'provider=SQLOLEDB.1;data source=SERVERNAME;database=YourDatabase;Trusted_Connection=Yes'
IF @hr <> 0
BEGIN
PRINT 'ERROR sp_OAMethod - ConnectionString'
EXEC sp_OAGetErrorInfo @object
RETURN
END
Security
Second only to the Information Security group, DBAs are usually very diligent in controlling
security in the database. However, this is a complicated and difficult subject to deal with. 2002 saw a huge
security push from Microsoft, which resulted in quite a few SQL Server patches. 2003 brought us SQL
Slammer and ensured that security would remain at the forefront of the DBA's concerns.
Block the DBA?
Robert Marda
1/28/2003
Introduction
What?! That can’t be done, can it? The final answer is no. However, you can certainly block an
unknowledgeable DBA with the techniques I will describe in this article. The same techniques will block you and
other users from forgetting business rules and doing tasks you shouldn’t, or simply block you from accidentally
dropping the wrong table. I’m quite sure there are other ways to do the same things, ways that are considered
better. My goal in this article is to use some extreme methods which could take a sequence of steps to undo and
certainly will require some knowledge about system tables.
The ways I plan to do this mean modifying system tables (also known as system catalogs) which many DBAs
frown upon. Microsoft recommends you don’t modify system tables. Also, modifying system tables can mean
Microsoft won’t help you if a problem is related to said modifications. Having said this, I likely brand myself as a
rogue developer. I share them here now to show you what can be done should you feel the need to use these
methods, say in your development environment as a joke. You can automatically include all the techniques I
describe in this article in your list of worst practices and as such should not seriously consider any of them as
viable solutions.
Sample Code
The following sample code will be used for the examples in this article. I recommend that you create a new
database and then execute the below code using the new database:
CREATE TABLE [dbo].[TestTable] (
[col1] [char] (10) NULL
) ON [PRIMARY]
GO
Example 1: Block DROP TABLE Command
Here is a way to completely block the DROP TABLE command. First you must execute the following commands
on your test server to allow you to make changes to the system tables. Later in this section I will give you the
code to disallow changes to the system tables.
EXEC sp_configure 'allow updates', '1'
GO
RECONFIGURE WITH OVERRIDE
GO
Now execute the sample code given in the previous section. Execute the below code to mark the table you
created as a table that is replicated:
UPDATE o SET replinfo = 1
FROM sysobjects o
WHERE name = 'TestTable'
SQL Server will block you from dropping TestTable since it is now considered a replicated table. Upon
execution of the command “DROP TABLE TestTable” you will receive the following error:
Server: Msg 3724, Level 16, State 2, Line 1
Cannot drop the table 'TestTable' because it is being used for replication.
I don’t think I would ever mark all my user tables as replicated tables, however I have often considered changing
some of them. This would definitely avoid the mistake of issuing a DROP TABLE command on a table in
production when you thought you were connected to your development SQL Server.
Example 2: Block New Database User
For this example we’ll do what you’ve either read or heard is not possible. We’re going to place a trigger on a
system table. What?! Don’t blink or you might miss something. Execute the code from Example 1 to enable
changes to system tables. Now execute the following code to let you place a trigger on the sysusers table:
UPDATE o SET xtype = 'U'
FROM sysobjects o
WHERE name = 'sysusers'
Please note that this change does not completely cause SQL Server to count the table as a user table. It will still
show up as a system table in Enterprise Manager. However, it will allow you to place a trigger on the table by
using the following code:
CREATE TRIGGER BlockNewUsers
ON sysusers
AFTER INSERT AS
DELETE sysusers FROM sysusers s INNER JOIN inserted i ON s.uid = i.uid
PRINT 'New users not allowed. Please ignore words on next line.'
Execute the below update to return the sysobjects table back to normal:
UPDATE o SET xtype = 'S'
FROM sysobjects o
WHERE name = 'sysusers'
Now execute the code that will disallow changes to system tables. Now create a new login for your SQL Server or
use an existing login. Copy the below code and replace ‘u1’ with the login you are going to use and execute the
code:
EXEC sp_adduser 'u1'
You should see the following in the Query Analyzer message window:
New users not allowed. Please ignore words on next line.
Granted database access to 'u1'.
You can view the users via Enterprise Manager or view the table sysusers and you will not find the new user since
the trigger did fire and deleted the user after it was added. You can drop the trigger at any time without modifying
the xtype of the system table.
Conclusions
In this article I have shown you how to modify system tables (also called system catalogs). I have shown you a
few ways you can modify the system table called sysobjects to block certain activities that a user with SA
privileges could normally do. I have also indicated that these techniques are best used as jokes on a
development SQL Server even though there is a slim chance they could be useful for other uses. Once more let
me stress that everything described in this article can be considered as worst practices and are shared to give you
a brief look into a system table and how SQL Server blocks certain activities via replication and the system tables.
I look forward to your comments even if you do choose to blast me verbally for daring to share such information in
the way I did in this article. Feel free to let me know what you think.
If you've read any Microsoft literature, you know the party line is to use Windows authentication whenever
possible. This is sensible from a security perspective because it follows the concept of single sign-on. If you're not
familiar with single sign-on, it's pretty easy to understand: a user should only have to sign on one time to gain
access to any resources that user might need. By using SQL Server authentication, most users are going to have
to go through two sign-ons. The first sign-on comes when they log on to their computer systems. The second
comes when they log on to SQL Server. Admittedly, the user may not manually enter the username and password
but there are still two logins. Each additional login is another one for the user to keep track of. Too many logins
and eventually users will start writing down username and password combinations or storing them in unsecured
Excel worksheets. However, if we can tie a user's rights within SQL Server (and the ability to access SQL Server)
to the Windows login, we avoid the user having to keep track of his or her SQL Server logon. We can rely on the
operating system and SQL Server to perform the tasks for authentication behind the scenes. It's all transparent to
the user. Just what most people want! And there you have one reason why Windows authentication is preferred.
Another one is administration. If I grant access for a Windows group, any members of that group have access. If a
system administrator adds a person to the group, the person has access and I as a lazy DBA have not had to lift a
finger. If a system administrator terminates a user account in the domain, that drops the user from the group. Said
user no longer has access to my system. Again, I've not had to lift a finger. This is the life!
Well, there's more to it than appeasing my laziness. Consider the scenario where you have a rogue user. The
system administrator disables the account immediately and punts the user off any logged on sessions the user
might have (there are ways to do this but I won't go into them). But said user has a laptop and has just enough
time to plug up, connect to the SQL Server and grab some data. If the user has to rely on a Windows login the
user is now out of luck. When the system administrator terminated said account, the user's access to SQL Server
was subsequently terminated. However, if the user had access through a SQL Server login and the DBAs haven't
been made aware of the situation, the user is in.
Of course, this type of scenario is rare. More often is the case where a user leaves a company and the Windows
account is disabled, but the SQL Server logins aren't cleaned up on all systems. Slips in communication do
happen. Even if you have nifty scripts to go and scrub all your database servers they are no good if you don't know
to remove the user. So in the end, Windows authentication comes to the forefront for ease of use and ease of
administration. But that's not what this article is about.
This article is about two well-known weaknesses in the SQL Server login passwords. One is a weakness as the
password is transmitted across the wire. The other is a weakness in how the password is stored. None of this is
new material. What does that mean? It means those who care about breaking into SQL Server systems have
known about them for a while. I do need to point out there haven't been any major compromises in the news
because of these weaknesses. Keep that in mind as you read through the article (it puts things in perspective).
However, a good DBA is a well-informed one. So if you're not familiar with these two vulnerabilities, read on.
the attacker has to be in a position to capture the network traffic. In today’s switched environment, this isn’t as easy
as it once was. Even in small office installations, most servers and workstations are connected via switches and
not hubs. The one issue with hubs is that everyone sees everyone else’s traffic. Enter in the switch. The switch
isolates my traffic. The only traffic I should see is traffic intended for just my system as well as any traffic intended
for all systems (broadcast traffic). If I look at things from a performance perspective, this is what I want. I do not
want to contend for bandwidth with the other 15 people I'm plugged in with on a 16-port network hub. Give me a
16-port switch any day so I get my bandwidth! Because of the advantages of a switch strictly from a performance
viewpoint, hubs are quickly being phased out. With the reduced prices on switches, even the most miserly
individual has a hard time cost-justifying a hub rather than a switch for a particular environment.
This is great from a security perspective because it isolates data streams so that I, by sniffing on my network port,
can’t see what John is doing on his. This isolation mitigates the SQL Server password vulnerability with respect to
most users. But it doesn’t eliminate everyone. The malicious attacker who also happens to be employed as a
network engineer can gain the setup I’ve described above. While I don’t mean to suggest all network engineers
are necessarily looking to do such things, the truth of the matter is there are some individuals who would. I don't
have any hard numbers, but I wouldn't even put it at 1% (1 in 100). The problem is we have to deal with that one
out of very many who would. Money is a big motivator and if such an individual can secure access to a critical
SQL Server and walk away with valuable information, expect the individual to do just that. As a result, if the SQL
Server passwords are easily decrypted (and they are), using SQL Server logins should be an option of last resort,
one used when Windows authentication just doesn’t work. How weak is the encryption? Let’s find out!
Understanding XOR
There are a lot of methods for encrypting data and quite a few of them are very secure. Some of these methods
produce output that, even if you know the encryption algorithm, you still need the key used to encrypt the data in
order to decrypt the data within your lifetime. There are also methods that can be pulled apart by ten-year-olds.
Encryption by XOR fits the latter category. And that’s how a SQL Server password is encrypted when it goes
across the network. Bruce Schneier, one of the best-known people in the field of cryptography, makes the
comment, “An XOR might keep your kid sister from reading your files, but it won’t stop a cryptanalyst for more
than a few minutes.” Ouch! But he’s absolutely right. XOR is a logic algorithm, not an encryption one. However,
some applications tout having a fast encryption algorithm and it’s nothing more than XOR. So what’s wrong with
XOR? To understand that, let’s look at the XOR operation in more detail.
For those that may not be familiar with XOR, I’ll start with the OR operation. If I’m comparing two items, if either
one is true, my end result is true. In other words:
Item1 OR Item2 = ???
For instance, I may want to compare the following 2 statements:
Item1: The United States flag has the colors red, white, and blue.
Item2: The Canadian flag has the colors red, white, and blue.
If I look at these two items, I know Item1 is true. The US flag does have these three colors. Item2, however, is
false because the Canadian flag only has red and white. The OR operation only cares if at least one of the two
statements is true. Since Item1 is true, I can evaluate my expression:
Item1 OR Item2 = TRUE
The only way an OR operation evaluates to false is if both items are false. For instance:
Item1: Tokyo is the capital of the United States.
Item2: Richmond is the capital of Canada.
I have a scenario where both statements are false. In this case my OR operation will evaluate to false. So:
Item1 OR Item2 = FALSE
If both statements are true, I’ll get a true result as well. For instance:
Item1: The United States flag has the colors red, white, and blue.
Item2: The Canadian flag has the colors red and white.
Item1 and Item2 are both true. The OR operation only cares if at least one of the two statements is true. If both
statements happen to be true, the OR operation will still evaluate to true. In fact, if the first statement happens to
be true, there’s usually no reason to evaluate the second. Because of this, programming languages like C# and
Java will short-circuit the evaluation if Item1 is true. There are mechanisms to force the evaluation of the second item,
because sometimes it’s necessary to carry out some programming code that’s part of the evaluation process. But
typically, if the first statement is found to be true, there’s no need to look at the second. And here is where we see
a big difference between OR and XOR.
The XOR operation is also known as “exclusive or.” Unlike the OR operation, XOR will evaluate true only if one
and only one of the two items is true. If both items are true, the XOR operation will return false. The exclusive part
of the name means one and only one side can be true for the XOR operation to evaluate to true. So in the case of
my previous example where both Item1 and Item2 were true, XOR will evaluate to false.
Item1 XOR Item2 = FALSE
XOR at the Bit Level
I’ve intentionally used the word “item” because I wanted to keep what was being XORed abstract. In logic,
statements are typically compared. However, when dealing with computers we will sometimes compare
statements but other times compare bits. That’s why you’ll sometimes see XOR referred to as a “bit-wise”
operation. It is an operation that is often applied at the bit level because there are some definite uses for it. If
you’ve not done much work with logic in school, this all may seem a bit confusing (pun intended). One of the
helpful things I was shown in logic was the truth table. A truth table is simply a matrix of all the statements and
what they evaluate to for all cases of true and false. Table 1 is a classic truth table for XOR from a logic class.
Notice I’ve used p and q instead of item1 and item2. The letters p and q are often substituted for statements as a
shortcut measure.
Table 1. Truth table for XOR
p      q      p XOR q
True   True   False
True   False  True
False  True   True
False  False  False
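As a quick check of the truth table, here is a minimal Python sketch (Python is my choice for illustration, not something the article uses). The helper name xor is an assumption; it treats XOR on two statements as "exactly one of them is true", which is the same as saying the two values differ.

```python
from itertools import product


def xor(p: bool, q: bool) -> bool:
    """Exclusive or: true when exactly one operand is true."""
    return p != q


# Build the rows in the same order as the truth table: (p, q, p XOR q).
truth_table = [(p, q, xor(p, q)) for p, q in product([True, False], repeat=2)]

for p, q, result in truth_table:
    print(f"{p!s:<7}{q!s:<7}{result}")
```

Unlike OR, the first row (both statements true) comes out false.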
By looking at the truth table given in Table 1, I can quickly see what happens when I XOR the two statements. I
can do the same for bits. In Table 2 I show the values, except I’ll be using bit1 and bit2 instead of p and q. I’ll also
use 1 and 0 instead of “True” and “False”.
Table 2. XOR at the bit level
bit1   bit2   bit1 XOR bit2
1      1      0
1      0      1
0      1      1
0      0      0
When we compare two sets of bits, we line them up and check each pair of bits
individually. Table 3 shows this process:
Table 3. XOR of two bit streams
Stream   8  7  6  5  4  3  2  1
Stream1  1  0  1  0  1  1  0  0
Stream2  0  1  1  0  0  1  1  0
XOR      1  1  0  0  1  0  1  0
XOR is a simple operation to carry out. As a result, companies looking for “encryption” may decide to use it
because simple operations tend to be very fast and XOR fits the bill perfectly. The common user won’t have any
idea of how to decipher the data, so it appears secure. The operation is very quick, so the end user also doesn’t
see a huge performance hit. But the problem is it’s too simple. The average end user may not realize how to
decrypt the data, but any attacker worth his or her salt will. To make things worse, we reverse the XOR operation
by using XOR yet again. Observe in Table 4 I’ve taken the result of Table 3. I then XOR it with Stream2 from
Table 3 and I have Stream1 again.
Table 4. Reversing the XOR
Stream   8  7  6  5  4  3  2  1
XOR      1  1  0  0  1  0  1  0
Stream2  0  1  1  0  0  1  1  0
Stream1  1  0  1  0  1  1  0  0
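The two bit-stream tables can be reproduced with Python's bitwise ^ operator. This is only an illustrative sketch; the variable names are mine, and the bit patterns are the ones from Table 3.

```python
stream1 = 0b10101100  # Stream1 from Table 3
stream2 = 0b01100110  # Stream2 from Table 3, acting as the "key"

encrypted = stream1 ^ stream2    # the XOR row of Table 3
recovered = encrypted ^ stream2  # XOR with the same key again, as in Table 4

print(f"{encrypted:08b}")
print(f"{recovered:08b}")
```

Applying the same key a second time restores the original stream, which is why XOR with a known key offers no real protection.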
If I know the key that was used, reversing XOR is trivial. I simply XOR the “encrypted” data by the key and I get my
original data back. If I don’t know the key, I do have methods available to determine the length of the key and
figure out what the key is. Those methods are beyond the scope of this article and aren’t necessary in the case of
SQL Server passwords. The reason they aren’t necessary is because when a SQL Server password is transmitted
across the network, each byte is XORed with the character 0xA5 (in hexadecimal representation). So my key is a
stream of 0xA5 characters, one for each character in the original password. Since I know ahead of time the
password stream has been XORed with 0xA5, I simply perform an XOR using 0xA5 and I get the stream as it
existed before the XOR.
Flipping Bits
Microsoft does throw in a step prior to XOR when encrypting the password. That step involves flipping sets of bits.
A byte is made up of eight bits. Half a byte, or four bits, is sometimes referred to as a nibble. If I look at a byte, I
can split it down the middle and get two nibbles. For instance, a byte of 10101100 has a nibble 1010 and another
nibble 1100. What Microsoft does is flip the nibbles. So my byte 10101100 becomes 11001010 and it is this
second byte that gets XORed.
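The nibble flip can be sketched in Python with two shifts and a mask. The function name swap_nibbles is hypothetical and used only for illustration.

```python
def swap_nibbles(byte: int) -> int:
    """Swap the high and low four bits of a single byte (0-255)."""
    return ((byte << 4) | (byte >> 4)) & 0xFF


print(f"{swap_nibbles(0b10101100):08b}")
```

Swapping twice gives back the original byte, so this step is as easy to undo as the XOR that follows it.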
Keep in mind that Unicode characters are represented by two bytes. Each byte is treated separately with regards
to flipping the nibbles. But in the case where a byte is 00000000, the flipped byte would look the same. The
reason I bring this up is while the password is passed as a Unicode string, the second byte for most Latin
characters (A, B, c, d) is 00000000 or 0x00. This little bit of information can often help us find the right packet of a
network trace, even if we know nothing of how to tell which frames are for logins and which are for data.
If you know how to read Tabular Data Stream (TDS) frames, you know what to look for to identify which ones
correspond to logging in to SQL Server. Remember I said Latin characters would have the second byte as 0x00?
Well, 0x00 XORed by 0xA5 is 0xA5. So even if you don't know the frame codes, you can look for a stream of
hexadecimal codes that have A5 as every other code (if you're dealing with passwords requiring Unicode
characters, you'll have to look for a certain code to identify what type of TDS packet - I'll cover this in a later
article). An example would be this:
A2 A5 B3 A5 92 A5 92 A5 D2 A5 53 A5 82 A5 E3 A5
If I'm dealing with Latin characters, I can drop the A5's and I get:
A2 B3 92 92 D2 53 82 E3
Once I find the stream, I can decipher the password. I would start by XORing 0xA5 against each character. Then I
flip the nibbles and I'm left with the ASCII value for the particular letter. Once I look up the ASCII value I have my
letter. If I do this for all the letters, I have my password. Table 5 demonstrates the deciphering process. The
first three streams are in hexadecimal.
Table 5. Deciphering the Password Stream
Stream     1    2    3    4    5    6    7    8
Trace      A2   B3   92   92   D2   53   82   E3
XOR        07   16   37   37   77   F6   27   46
Flipped    70   61   73   73   77   6F   72   64
Decimal    112  97   115  115  119  111  114  100
Character  p    a    s    s    w    o    r    d
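The steps in Table 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a network tool: it starts from the example byte stream quoted earlier, assumes each character arrives as two bytes with the low byte first (UTF-16LE), and the function names are hypothetical.

```python
KEY = 0xA5  # the XOR key described in the text


def swap_nibbles(byte: int) -> int:
    """Swap the high and low four bits of a byte."""
    return ((byte << 4) | (byte >> 4)) & 0xFF


def decode_password(trace: bytes) -> str:
    """Undo the XOR, flip the nibbles back, then read the bytes as UTF-16LE."""
    plain = bytes(swap_nibbles(b ^ KEY) for b in trace)
    return plain.decode("utf-16-le")


def encode_password(password: str) -> bytes:
    """The forward direction: flip the nibbles, then XOR with the key."""
    return bytes(swap_nibbles(b) ^ KEY for b in password.encode("utf-16-le"))


trace = bytes.fromhex("A2 A5 B3 A5 92 A5 92 A5 D2 A5 53 A5 82 A5 E3 A5")
print(decode_password(trace))
```

Because the zero high bytes of Latin characters decode back to 0x00, there is no need to strip the A5 bytes by hand; they simply become the second byte of each character.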
The original password was simply "password" and I have a match. All SQL Server passwords go across the wire
with this weak encryption algorithm. Once you have the right bit of info (the character stream), you can crack this
thing with a scientific calculator and an ASCII table. Since the SQL Server password is so easily deciphered,
encrypting the connection between client and server becomes a necessary evil. Even given the use of Windows
authentication, I would suggest a secure connection using SSL or IPSec because even if the login information isn’t
being passed in plaintext (unencrypted) or something nearly as weak, the data will be.
David Litchfield’s paper “Threat Profiling Microsoft SQL Server” describes the XOR against 0xA5 but it doesn’t
discuss the flipping of the bits, which is part of the password “encryption.” A company calling themselves Network
Intelligence India Pvt. Ltd. posted a correction to Mr. Litchfield’s paper. You can find a link to both in the Additional
Resources section at the end. Before I get ahead of myself, let me say that a secure channel is an important part
of our overall security for SQL Server, but it, in and of itself, isn’t a cure-all for SQL Server logins. David Litchfield
and crew of NGSSoftware also found a weakness in the hash SQL Server generates to secure SQL Server
passwords. This hash means anyone who manages to get sysadmin rights to our SQL Servers can potentially
crack the passwords and use them against us.
SQL Server doesn’t store user passwords in plaintext (unencrypted), but instead encrypts them. When SQL
Server encrypts a password, it uses an undocumented function called pwdencrypt(). This function produces a
hash. Since hash can mean different things based on context, let me define what I mean by a hash (also called a
hash value) and a hash function. A hash function is some function that takes a stream of bits or a string of
characters and transforms them into another stream of bits or string of characters, usually smaller and of a fixed-
length. A good hash function will return very few duplicate hashes, the fewer the better. The reason a good hash
function should tend to return unique hashes is because these hashes are often used for comparison, such as
with a password check. Hash functions are usually one-way functions, meaning I can’t reverse engineer the
original bit stream or string of characters from the hash (as opposed to XOR which is a reverse function of itself).
As a result, if I am doing a password check I’ll get the password from the user and I’ll then throw it through the
hash function and generate a hash value. I’ll do a comparison of the hash value I’ve just generated against what I
have previously stored for the user. If I have a match, I’ll let the user in. As a result, the less chance of a duplicate
hash being generated, the better.
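The password check described above can be sketched generically in Python. This is not how SQL Server does it; it only illustrates "hash the attempt and compare with the stored hash". SHA-256 from the standard hashlib module, the sample password and the function names are all my assumptions, and no salt is used yet.

```python
import hashlib
import hmac


def hash_password(password: str) -> bytes:
    """One-way transformation of the password into a fixed-length value."""
    return hashlib.sha256(password.encode("utf-8")).digest()


# Saved when the password was set; the plaintext itself is never stored.
stored = hash_password("s3cret!")


def check_password(attempt: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(attempt), stored)


print(check_password("s3cret!"), check_password("S3cret!"))
```

The stored value has the same length whatever the password, and a single changed character produces a completely different hash.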
The pwdencrypt() function is a hash function. It takes a plaintext password and converts it into a hash. Actually, it’s
more correct to say two hashes. The pwdencrypt() function first generates what is called a “salt.” In cryptography
what we mean by salt is a random string of data that is added to the plaintext before being sent through the hash
function. In the case of our pwdencrypt() function, the salt is basically a random integer. It’s a bit more
complicated than that, but not by a whole lot. The salt is time-dependent, however. How can we tell? If we execute
SELECT pwdencrypt('PASSWORD')
we’ll get different results even if we’re only a second apart. For instance, the first time I ran this query, I received
the following hash:
0x0100DE1E92554314EE57B322B8A89BF76E61A846A801D145FCAF4314EE57B322B8A89BF76E61A846A801D145FCAF
The second time I ran the query, I received this hash (which is clearly different):
0x01000F1F5C4BFFEE2BEFFA7D8B8AF3B519F2D7D89F2D4DAEDF49FFEE2BEFFA7D8B8AF3B519F2D7D89F2D4DAEDF49
First, the pwdencrypt() function takes the password and converts it to Unicode if it isn’t already. It then adds the
salt to the end of the password. This is the plaintext it sends through an algorithm known as the Secure Hashing
Algorithm (SHA). SHA will generate a ciphertext (the encrypted characters) that pwdencrypt() will temporarily put
to the side. Then pwdencrypt() takes the password and makes it all uppercase. Once again, it’ll append the salt to
the end and send the resulting combination through SHA. Finally, pwdencrypt() will combine a standard static
code (0x0100 in hexadecimal), the salt, the first ciphertext (password in the original case), and the second
ciphertext (password in all uppercase) to create the password hash stored in sysxlogins.
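The layout just described can be sketched in Python by slicing one of the hashes shown earlier. The field widths are my reading of the 46-byte value (2-byte code, 4-byte salt, two 20-byte digests), and make_hash assumes SHA-1 over the UTF-16LE password with the raw salt bytes appended; both are assumptions for illustration and I have not verified make_hash against a real server.

```python
import hashlib


def parse_hash(hex_hash: str) -> dict:
    """Split a pwdencrypt()-style value into its four parts."""
    digits = hex_hash[2:] if hex_hash[:2].lower() == "0x" else hex_hash
    raw = bytes.fromhex(digits)
    return {
        "header": raw[0:2],        # static code 0x0100
        "salt": raw[2:6],          # assumed 4-byte salt
        "mixed_case": raw[6:26],   # digest of the password as typed
        "upper_case": raw[26:46],  # digest of the uppercased password
    }


def make_hash(password: str, salt: bytes) -> str:
    """Assumed construction: SHA-1(UTF-16LE password + salt), done twice."""
    mixed = hashlib.sha1(password.encode("utf-16-le") + salt).digest()
    upper = hashlib.sha1(password.upper().encode("utf-16-le") + salt).digest()
    return "0x" + (b"\x01\x00" + salt + mixed + upper).hex().upper()


first_hash = (
    "0x0100" "DE1E9255"
    "4314EE57B322B8A89BF76E61A846A801D145FCAF"
    "4314EE57B322B8A89BF76E61A846A801D145FCAF"
)
parts = parse_hash(first_hash)
print(parts["salt"].hex().upper())
print(parts["mixed_case"] == parts["upper_case"])
```

Because the sample password 'PASSWORD' is already all uppercase, the two digests in the example hash are identical, which is visible in the repeated second half of the value.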
I’m not sure why the all-uppercase version of the password is included, but needless to say, it weakens the SQL
Server password "hash." Since I only have to match against uppercase letters, I’ve eliminated 26 possible
characters (the lowercase ones) to figure out what the password is. Granted, once I discover the password I won’t
know the case of the individual characters, but to figure out the case is trivial. If I can find out that a user has a
password of say “TRICERATOPS,” I can then build a quick little program to try every possibility of case for the
word triceratops. Triceratops has 11 letters, so there are 2^11 possible combinations. That’s only 2048 different
possibilities. A script or program can test each possibility until it gets a match. Remember SQL Server 7 and 2000
do not have account lockout policies for too many login failures.
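The "quick little program" for trying every case possibility could look like the following Python sketch. The function name case_variants is hypothetical; it relies on itertools.product from the standard library to pick the lower or upper form of each letter.

```python
from itertools import product


def case_variants(word: str):
    """Yield every upper/lower-case spelling of a word made of letters."""
    pairs = [(ch.lower(), ch.upper()) for ch in word]
    for combo in product(*pairs):
        yield "".join(combo)


variants = list(case_variants("triceratops"))
print(len(variants))
```

Each of the 11 letters contributes a factor of two, so the list holds 2^11 = 2048 candidate passwords.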
Consider that if I didn't have the all-uppercase version of the password I’d have to brute force every single
dictionary word and every single possible case. That means just to test triceratops to see if it were the password
(regardless of case), I’d have to run up to 2048 attempts instead of one. I would have to test every possible case
combination for every single word. I couldn’t just test the word. But since the all-uppercase version is part of what
is stored in sysxlogins, the number of attempts I may have to make to crack the password decreases drastically.
Let's look at an example. An 8-character dictionary word has 256 (2^8) possible case combinations. I’ve been told
the SQL Server account uses 1 of 8 words, all of them 8 characters in length (a controlled test). If I have to run
through these 8 words and I have to potentially try every single case combination, I may have to try up to 256 x 8 =
2048 combinations.
If I can test just all-uppercase words to find a match, I would have to test just 8 times to get the word. Then I’d
have to run up to 256 combinations to find the exact password. Instead of 256 x 8, I’m looking at a maximum of
256 + 8 = 264 combinations. Now extrapolate this out to the entire Webster’s dictionary.
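The controlled test above can be simulated with a short Python sketch. Everything here is made up for illustration: the eight words, the secret password and the salt are hypothetical, and the digest function assumes SHA-1 over the UTF-16LE password plus salt rather than reproducing SQL Server exactly. The point is only the attempt count.

```python
import hashlib
from itertools import product

WORDS = ["elephant", "mountain", "notebook", "password",
         "sandwich", "triangle", "umbrella", "computer"]
SALT = bytes.fromhex("DE1E9255")  # arbitrary 4-byte salt for the simulation


def digest(password: str) -> bytes:
    return hashlib.sha1(password.encode("utf-16-le") + SALT).digest()


# What the attacker is assumed to have read from the stored hash.
secret = "SandWich"
stored_mixed = digest(secret)
stored_upper = digest(secret.upper())


def case_variants(word: str):
    pairs = [(ch.lower(), ch.upper()) for ch in word]
    for combo in product(*pairs):
        yield "".join(combo)


def crack_two_stage():
    """Find the word via the uppercase digest, then settle the case."""
    attempts = 0
    for word in WORDS:
        attempts += 1
        if digest(word.upper()) == stored_upper:
            for candidate in case_variants(word):
                attempts += 1
                if digest(candidate) == stored_mixed:
                    return candidate, attempts
    return None, attempts


worst_without_upper = len(WORDS) * 2 ** 8  # every case of every word
worst_with_upper = len(WORDS) + 2 ** 8     # find the word, then its case
password, attempts = crack_two_stage()
print(password, attempts, worst_with_upper, worst_without_upper)
```

With a real dictionary the gap widens further, since the first stage grows with the number of words while the second stage stays at a single word's case combinations.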
The algorithm to attempt a dictionary attack against a SQL Server password hash isn’t very long or difficult. I’ve
pretty much explained it in this section. And when NGSSoftware put out the paper revealing the weakness, they
also included source code in VC++ to attempt such a crack. The program isn’t hard and it isn’t very complex, but it
does require the Windows 2000 Software Development Kit (SDK) because it needs the CryptoAPI that’s part of
the SDK. Figure 1 shows an example of the compiled source code in action against one of the password hashes
from earlier.
Figure 1. Cracking the password hash.
NGSSoftware has additional tools that are GUI-based to perform similar tests but with a much nicer interface and
a few more features. These two tools are called NGSSQLCrack and NGSSQuirrel. NGSSQLCrack does have the
ability to perform a brute force attack should the dictionary attack fail. I've included a link to Steve Jones' reviews
of both products in the Additional Resources section.
Most password hacking programs will attempt a dictionary attack first. Since dictionaries are easy to find in
electronic form, people who use a password found in a dictionary are opening themselves up to having their
passwords hacked. Too many programs can run through an entire dictionary listing in a very short time. SQL
Server passwords are no different. In reality, if a user chooses a strong password, one with alphabetic and
numeric characters as well as a special character that’s at least six characters long, the password is reasonably
secure. I say reasonably, because someone who can bring the proper computer resources to bear will eventually
be able to crack the password. The mechanism encrypting SQL Server passwords isn’t such that it is
unreasonable for an attacker to be able to crack them, should the hacker get a hold of the hash.
Tip: When I attempted to compile the VC++ code presented in NGSSoftware’s article on cracking SQL Server
passwords, VC++ did return 1 compile error with regards to the following line of code: wp = &uwttf; The error
VC++ returned indicated that it wouldn’t carry out the implicit conversion. I had to modify the line to read: wp =
(char *) &uwttf; in order to generate a successful compile. As they say on the newsgroups, “Your mileage may
vary.”
Concluding Thoughts
Microsoft recommends Windows authentication because of single sign-on and also to reduce administrative
overhead. These two reasons are good enough to use Windows authentication whenever possible. However,
there are times when DBAs are forced to use SQL Server logins because that's all a program will support. There's
not a whole lot we can do about the authentication method in those cases. But in cases where we do have a
choice, such as a home grown application, the choice should usually point in the direction of Windows
authentication. In addition to Microsoft's two main reasons, another reason is the weaknesses in how the
passwords are transmitted and how they are stored.
I did say weaknesses but keep in mind to consider the mitigating circumstances. To dispel the FUD (Fear,
Uncertainty, and Doubt), let's consider a couple of things. In the first case, you typically have to have a rogue
network engineer. If that's the case, SQL Server access isn't the only, nor necessarily the most critical issue facing
an organization. Anyone with half a lick of creativity can imagine what such an empowered individual could do.
This doesn't mean we shouldn't take steps to reduce our vulnerability, but it also doesn't mean we should go
around with our hands in the air screaming, "The sky is falling!" In the second case, you need sysadmin privileges
to access the sysxlogins table. Yes, even without a rogue DBA, there is always the possibility of a privilege
escalation where a process somehow gets itself to sysadmin rights. NGSSoftware has a paper on that very
possibility. But keep in mind that passwords aren't the only things that will be vulnerable. The data is there, too.
Also, the more complex the password, the harder it is to crack, even if you do have an advantage of only having to
get the upper-case letters. The fact is, if you don't mix in numbers and symbols, the passwords become relatively
easy to crack. It's all about password complexity.
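To put rough numbers on that, the query below compares how many eight-character passwords are possible for three character sets. It is only a back-of-the-envelope sketch: the length of eight and the count of 32 symbols (roughly the punctuation on a standard US keyboard) are assumptions chosen for illustration, and upper-case letters are used because, as noted above, the attacker only has to get the upper-case letters.

```sql
-- Rough keyspace comparison for an 8-character password (illustrative assumptions only).
-- 26 = upper-case letters, 36 = letters plus digits, 68 = letters, digits and 32 symbols.
SELECT POWER(CAST(26 AS float), 8) AS letters_only,
       POWER(CAST(36 AS float), 8) AS letters_and_digits,
       POWER(CAST(68 AS float), 8) AS letters_digits_symbols
```

Under these assumptions the third figure is roughly two thousand times the first, which is the extra work a brute-force attack has to do when numbers and symbols are mixed in.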
Additional Resources
• Threat Profiling Microsoft SQL Server - NGSSoftware
• Weak Password Obfuscation Scheme (Modified) in MS SQL Server - Network Intelligence India Pvt. Ltd.
• Microsoft SQL Server Passwords (Cracking the password hashes) - NGSSoftware
• Review: MSSQLCrack - Steve Jones
• Review: NGSSquirrel - Steve Jones
Typically I write about technical solutions to SQL Server problems. This is true whether I'm writing about
performance, security, or disaster recovery. This article will be atypical in that respect because it'll consist of case
studies that point out why security is critical. All of the case studies will be on security incidents that have made the
news. These incidents also involve compromised databases. Not all of them involve SQL Server, but they all point
out a fundamental axiom: databases are primary targets because they are information stores.
As SQL Server DBAs, we have to realize that SQL Server is growing in market share. Microsoft’s SQL Server is
an enterprise-class database platform more and more companies are using to store critical and sensitive data.
SQL Server is an important cog in Microsoft’s .NET enterprise server architecture. It’s easy to use, takes fewer
resources to maintain and performance tune than other comparable database platforms, and it’s reasonably
priced. All these facts mean your SQL Server has a big target painted on its side that says, “I’m
important. Come and get me.” If your SQL Servers aren’t secured, someone will do just that. Others have found out the hard
way. Here are some examples.
RealNames
In February 2000, C|Net, one of the more prominent tech news organizations, reported the company RealNames
informed customers that its customer information database had been breached and the attackers had walked off
with valuable information, including credit card numbers.
RealNames was in the business of making complex web addresses accessible by the use of keywords. Anyone
was capable of going to the RealNames’ site, registering and paying via credit card, and thus getting keywords
associated with their website. It was this customer database the attackers broke into.
RealNames is no longer in business, and though the reason for RealNames closing its doors has nothing to do
with this security breach, obviously the breach caused numerous issues for RealNames’ customers. Credit card
numbers were valuable then and they are now. RealNames’ customers most certainly had to go through the
process of canceling any and all credit cards they might have used on the RealNames site and acquiring new
ones. At least RealNames acted in a respectable manner, sending an email to its customer base within 24 hours
of discovering the breach.
RealNames then hired security firm Internet Security Systems (ISS) to conduct an audit and guard
against future security breaches. But the fact remains that up to 50,000 customers might have had their credit
card information readily accessible by an attacker, one who had been on the system undetected for at least a few
days.
World Economic Forum
About a year later (Feb 2001), crackers from the group Virtual Monkeywrench announced they had hacked into
the registration database of the World Economic Forum (WEF). What did they get? The group captured credit
card numbers, personal addresses and emails, home and cell phone numbers, and passport information for all
who had attended the WEF in the previous three years.
Virtual Monkeywrench then passed this information on to a Zurich, Switzerland, newspaper that published some of
it on the newspaper’s website. Among the information published: Bill Gates’ email address, Amazon.com head
Jeff Bezos’ home phone number, and a credit card number for Peter Thompson, CEO of PepsiCo Beverages. But
the fun doesn’t stop there. The group was also able to grab participant passwords into the database for former US
President Bill Clinton, Russian President Vladimir Putin, and Palestinian Leader Yasser Arafat. The newspaper
that had received all the information, SonntagsZeitung, reported the crackers had turned over a CD-ROM with
800,000 pages of data!
Note: In the hacker community, hacker isn’t a negative word. Hacker is used to describe one who investigates out
of curiosity. A hacker isn’t necessarily one who breaks into systems for personal gain, though that’s how the
media uses the term. The community prefers the term crackers for those people who maliciously seek to break
into or compromise systems.
Midwest Express, an airline, was hacked in April 2002 and its flight schedule and passenger manifest were
stolen. The people who carried out the attack, calling themselves the Deceptive Duo, then hit another site, the US
Space and Naval Warfare Systems Command, and posted Midwest Express’ passenger manifest, complete with
names and emails.
But the Deceptive Duo didn’t stop there. They also hacked into several government agencies and banks. One of
their methods of attack was to compromise SQL Servers with a “default” password. Since both SQL Server 7.0
and SQL Server 2000 allow for the sa account to have a blank password, this is probably what they meant since
SQL Server 7.0’s install doesn’t even prompt for one (though to get a blank password in SQL Server 2000 I have
to knowingly choose to leave the password blank during the install).
So far as the Deceptive Duo was concerned, targeting SQL Server was part of their plan. If we put ourselves in
the mind of the attacker, that plan makes perfect sense. If the default install of SQL Server is ripe for the plucking,
why bother with anything else? And that’s probably what the Deceptive Duo thought, too.
Note: SQL Server 7.0 isn’t alone in leaving the sa password blank, a frequent item of consternation for security
experts. The open source database, MySQL, is installed by default with no password for root, the super user
account for the database. I haven't kept up with the most recent versions, so if this behavior has changed, please
let me know in the comments section. The bottom line, regardless of database platform, is this: secure all
privileged accounts with strong passwords immediately.
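A quick way to act on that advice is to look for SQL Server logins that have no password and then set one. The sketch below assumes SQL Server 7.0 or 2000 and a connection with sysadmin rights; the new password shown is a placeholder, not a recommendation.

```sql
-- List standard (non-Windows) logins whose password is blank.
SELECT name
FROM master.dbo.syslogins
WHERE password IS NULL
  AND isntname = 0

-- Give sa a strong password. A sysadmin can pass NULL as the old password.
-- 'Pl@ceh0lder!Chang3' is a placeholder; choose your own.
EXEC sp_password @old = NULL, @new = 'Pl@ceh0lder!Chang3', @loginame = 'sa'
```

The isntname filter is there because Windows logins carry no SQL Server password and would otherwise show up in the list.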
In November 2001, security researchers reported a new worm that attempted to log on to servers using the
standard Microsoft SQL Server TCP port (1433), the sa account, and a blank password. Since the default port for
a SQL Server 7 installation is 1433 and since the setup program doesn’t prompt for a sa password during
installation, quite a few SQL Servers were vulnerable to this worm. The worm had successfully infected a small
number of systems before it was detected. Because security researchers discovered the worm very quickly, most
of the systems vulnerable to it were never attacked.
Note: Brian Knight alerted the SQLServerCentral.com community about what the worm did with his article
Security Alert: SQL Server Worm Virus Attacking Systems.
The security community reaction was swift and security experts quickly asked the owner of the FTP server to
remove the file that was downloaded by this worm. The owner did so. Then security experts sent out
announcements through sources such as CERT (http://www.cert.org), the SANS Institute (http://www.sans.org)
and NTBugTraq (http://www.ntbugtraq.com). Because security teams responded so quickly, W32.Cblade.Worm
was relatively minor in scope. It attacked in a predictable way and was easy to defend against. This worm should have
served as a wake-up call for the SQL Server community. Instead, it ended up being a warning shot before the
main firefight.
In early May 2002 security researchers started reporting increased port scanning for TCP port 1433. Chip
Andrews put a notice on his SQLSecurity.com site on May 4, 2002. Then on May 28th, security researchers
discovered another worm in the wild. The community gave the new worm several different names; among them
were SQLSnake and Digispid.B.Worm. I’ll stick with the latter.
The attack pattern for Digispid.B.Worm was the same as for W32.Cblade.Worm: make a connection to TCP port
1433 and attempt to log in as sa with a blank password. However, this worm was more aggressive and it was able
to infect more systems than W32.Cblade.Worm. Estimates of the number of systems infected range
from a hundred to thousands, depending on the source. Cblade was a warning but many DBAs, system administrators
and network engineers didn’t heed it. They weren’t prepared and hadn’t locked down their SQL Servers. As a
result, Digispid.B.Worm was able to get a foothold. Because it was far more aggressive, this new worm was also more
intrusive than W32.Cblade.Worm. It required some cleanup, but it pales in comparison to the next major worm
attack.
Note: Brian once again covered this worm for the SQLServerCentral.com community with his article: SQLsnake
Worm Hits SQL Servers and Networks Hard.
SQL Slammer
If you are a SQL Server DBA and you haven't heard about SQL Slammer, now's the time for a quick education. In
January 2003, the hammer came down. A SQL Server worm hit on Friday, January 24th, and moved so fast that it
effectively became a Denial of Service attack against any network where it got a foothold, including portions of the
Internet. SQL Slammer has been to date the most aggressive worm the world has seen, bar none. I've included a
link to a study detailing its propagation in the Additional Resources section.
SQL Slammer attacked UDP port 1434, the listener port for SQL Server 2000 (SQL Server 7.0 was not
vulnerable). Clients use this port when they are trying to discover what SQL Servers are on the network, such as
when you pull down the drop-down list in Query Analyzer. Clients also use this port to identify what TCP port to
connect to for a named instance. Keep in mind the default instance typically listens on TCP 1433. If you have a
named instance, it's not going to be able to use the same port. Since SQL Server 2000 will randomly assign a port
number when you install the named instance, it could be anything. The way to deal with this issue is to have that
listener service. The client contacts it and finds out what TCP port to use. It can then connect and all of this occurs
seamlessly for the user. The problem is that this port is set for every single SQL Server 2000 installation. A target
and it's not moving! If you are running a SQL Server that doesn't require network access (local use only), as of
SP3a you can disable this listener port, but this wasn't the case when SQL Slammer hit.
SQL Slammer took advantage of a buffer overflow vulnerability in this listener service. What really drove the security
community nuts was that a patch had been made available some six months before! NGSSoftware even had
demonstration code that showed what the exploit could do (and the worm "writer" used it heavily). A lot of systems
weren't patched. Some systems were patched but it was later found out that a certain Microsoft hotfix to repair an
unrelated issue replaced critical files. The files changed to prevent the buffer overflow were overwritten with older
versions that were vulnerable. In other words, the unrelated hotfix made systems vulnerable again.
All in all, it was a very big mess that required a lot of people working a lot of long hours. Even for companies that
weren't hit, every SQL Server in inventory had to be checked and verified. When you include MSDE builds, this
was a very sizeable effort.
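As a sketch of what that checking can look like, the query below reports the build and service pack level of the instance you are connected to. It assumes SQL Server 2000, where SERVERPROPERTY is available; you would still need to compare the result against the build numbers Microsoft publishes for the relevant patch or service pack.

```sql
-- Report the build, service pack level and edition of the current instance.
-- Run it against every instance in inventory, including MSDE installations.
SELECT SERVERPROPERTY('ServerName')     AS server_name,
       SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,
       SERVERPROPERTY('Edition')        AS edition
```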
Note: Brian's write-up on this site is: Another SQL Server Virus Hits the Internet. He followed it up with Who's
to Blame for the SQL Slammer Virus.
PetCo.Com
This one is my new favorite when I talk to developers within my organization about security. SQL Injection
has gotten a lot of press in recent days, and both Guess (the clothing manufacturer) and PetCo.Com fell victim to this now
classic vulnerability. In February 2002, Guess' website was compromised by a SQL Injection attack which netted
attackers an unknown number of customer credit card numbers. So you'd figure by June 2003 every major player
on the Internet would have learned their lesson, right? Not quite.
Not long after Guess settled with the FTC, Jeremiah Jacks discovered PetCo.com was vulnerable to the exact
same SQL Injection attack he discovered on the Guess site. What would the payoff have been if he were a
malicious cracker? The prize was a database with about 500,000 credit card entries complete with names,
addresses, and order information. How did he do it? He used Google to search for pages that were likely to be
vulnerable then tried to use an injection attack. He estimated it took less than a minute to be successful! Imagine
finding a major vulnerability for one retailer and then almost a year and a half later finding the same type of
vulnerability, one easily patched mind you, on another major retailer. It's enough to drive one insane. But that's
what this security researcher found. Hopefully we've all received our wake-up call, but don't be surprised if another
major target falls to SQL Injection in the near future.
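The articles cited here don't publish the exact flaw on either site, so the following is only a generic illustration of the pattern behind SQL Injection and one common way to avoid it. The Customers table, its columns and the input value are all hypothetical.

```sql
-- Hypothetical input taken straight from a web form.
DECLARE @input nvarchar(100)
SET @input = N'x'' OR 1=1 --'

-- Vulnerable pattern: the input becomes part of the statement text.
DECLARE @sql nvarchar(400)
SET @sql = N'SELECT CustomerID, CardNumber FROM Customers WHERE LastName = N''' + @input + N''''
PRINT @sql   -- the OR 1=1 is now part of the query; EXEC (@sql) would return every row

-- Safer pattern: the statement text is fixed and the input is passed as a parameter.
EXEC sp_executesql
     N'SELECT CustomerID, CardNumber FROM Customers WHERE LastName = @LastName',
     N'@LastName nvarchar(100)',
     @LastName = @input
```

With the parameterized form the input is only ever compared as a value, so the quote and the OR clause never reach the parser as SQL.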
Concluding Remarks
These six case studies represent a fraction of the literature available on cases where databases have been
breached. All involve configurations that weren't secure. In most cases, simple security procedures would have
stopped an attacker cold but for whatever reason these procedures weren't done. After several of these high
profile cases, it would seem logical that security would receive a heavy focus from companies and organizations
but the reality is things haven't changed a lot. A recent survey showed that even after the incidents on 9/11, when
companies were forced to take a hard look at not only their disaster recovery but also their security procedures,
very little change has occurred. This is somewhat disheartening, but not altogether surprising.
Proper security takes time and effort. Often it's an afterthought on projects and seen as an impediment for delivery
on-time and on-schedule. I've faced this issue myself when supporting recent projects. The difference between
the ones that take security into account from the beginning as opposed to waiting until the last minute is like night
and day. Somehow we have to effect a corporate culture change where security is of paramount concern.
Hopefully these case studies start the discussions in your own circles that may bring about that change where you
work.
Remember the mantra (double negative intended): Just because I'm paranoid doesn't mean someone is not out to
get me.
Additional Resources
• RealNames is Latest Hack Victim, InternetNews.com
• RealNames' Customer Database Hacked, C|Net News.com
• Davos Hack: 'Good' Sabotage, Wired News (World Economic Forum article)
• Hackers Say They Hack for Our Sake, PCWorld (Deceptive Duo article)
• Airline Database Posted on Defacement, InternetNews.com (Midwest Express article)
• Analysis of the Sapphire Worm, CAIDA (SQL Slammer)
• FTC settles with Guess on Web vulnerabilities, InfoWorld
• PetCo Plugs Credit Card Leak, SecurityFocus
• Nearly two years after 9/11, corporate security focus still lacking, ComputerWorld
© 2003 by K. Brian Kelley. http://www.truthsolutions.com/ Author of Start to Finish Guide to SQL Server
Performance Monitoring.
TSQL Virus or Bomb?
Joseph Gama
12/29/2003
Yes, the first virus made in TSQL has been created! But even more dangerous, worms can be made applying
similar but simpler techniques. What could be worse than that? Time bombs hidden somewhere in the code,
waiting…
[Figures: "Before" and "After" screenshots, not reproduced]
Before we get into the facts, some definitions from cybercrimes.net :
Definition of virus
"A computer virus is a self-replicating program that invades and attaches itself to computer programs. Virii can
interfere with the operations of their host program or alter operations of their host computer."
Definition of worm
"A worm is a program whose primary task is to move copies of itself between computers connected by network.
Though worms do not try to cause damage to a computer, by causing copies of itself to be made a worm can
disrupt the operation of computers and computer networks."
The most complex of those three entities is the virus, which requires intrusion, execution and replication of its
code. Intrusion is theoretically impossible in a properly secured SQL Server database. As TSQL has no "low
level" features, port scanning and intrusion are not possible.
But there's more: TSQL data types used in stored procedures can't go over 8 KB. This is a great obstacle because
the virus code takes up some room, so the virus can only replicate to "small" stored procedures, which makes it
more visible and easier to detect.
Conclusion: time bombs are the most real and dangerous threat
The three scenarios for delivering a virus are perfectly possible and quite easy and effective for a time bomb. Let's
rewrite them for this situation:
• An unhappy user deliberately hides the time bomb code in a section of a big stored procedure.
• A careless user copies code from an uncertain origin that has the time bomb hidden.
• An intruder was able to gain access to the database and, instead of causing an immediate destruction,
the intruder decided to place a time bomb that would slowly and randomly corrupt data so that even the
backups would be storing corrupted versions of the database.
This is the most dangerous and most realistic attack that I can think of; after all bad coding can have an impact on
the server as negative as a sneaky and pernicious worm.
Avoid mixed mode and Windows 9x/ME
Usually that is not the case with most real life database implementations, having a certain number of users,
databases and database objects related to each other in a way that requires careful management in order to allow
access without compromising security. Windows authentication is the recommended security mode, not only
because of Windows architecture but also because login names and passwords are not sent over the network. If
the OS is not NT/2000 then mixed mode has to be used, but Windows 9x/ME have so many security flaws that
they should be avoided at all cost!
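If you are not sure which mode a server is running in, a one-line check like the following can tell you. It is a sketch that assumes SQL Server 2000, where SERVERPROPERTY is available.

```sql
-- 1 = Windows authentication only, 0 = mixed mode (Windows and SQL Server logins).
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS windows_auth_only
```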
References
http://vyaskn.tripod.com/sql_server_security_best_practices.htm(2)
http://cybercrimes.net/98MSCCC/Article4/commentarysection403.html(3)
Performance
The hottest topic in most every company, achieving better performance is an ongoing challenge and an
essential part of any DBA’s job. While Moore’s Law helps with faster and faster hardware, code bloat,
larger data sets and other factors mean that the DBA must have a few tricks up their sleeve in order to
tune their databases to the optimum level.
Squeezing Wasted Full Scans out of SQL Server Agent Bob Musser 97
Cluster That Index! – Part 1
Christoffer Hedgate
3/30/2003 4:19:47 PM
One topic that is sometimes discussed in SQL Server communities is whether or not you should always have
clustered indexes on your tables. Andy Warren discussed this briefly in one of his articles in the Worst Practices
series (Not Using Primary Keys and Clustered Indexes); here I will give my view on this matter. I will show you
why I think you should always have clustered indexes on your tables, and hopefully you might learn something
new about clustered indexes as well.
What is a clustered index
First off, we'll go through what a clustered index is. SQL Server has two types of indexes, clustered indexes and
non-clustered indexes. Both types are organized in the same way with a b-tree structure. The difference between
them lies in what the leaf-level nodes – the lowest level of the tree – contain. In a clustered index the leaf level is
the data, while the leaves of a non-clustered index contain bookmarks to the actual data. This means that for a
table that has a clustered index, the data is actually stored in the order of the index.
What the bookmarks of the non-clustered index point to depends on if the table also has a clustered index or not.
If it does have a clustered index then the leaves of non-clustered indexes will contain the clustering key – the
specific value(s) of the column(s) that make up the clustered index – for each row. If the table does not have a
clustered index it is known as a heap table and the bookmarks in non-clustered indexes are in RID format
(File#:Page#:Slot#), i.e. direct pointers to the physical location the row is stored in. Later in this article we will see
why this difference is important. To make sure that everyone understands the difference between a clustered
index and a non-clustered index I have visualized them in these two images (clustered | non-clustered). The
indexes correspond to those of this table:
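-- Table definition reconstructed from the discussion below; column types and index names are assumed.
CREATE TABLE EMPLOYEES (empid int NOT NULL, name varchar(50) NOT NULL, age int NOT NULL)
CREATE CLUSTERED INDEX ix_employees_name ON EMPLOYEES (name)
CREATE NONCLUSTERED INDEX ix_employees_empid ON EMPLOYEES (empid)
INSERT INTO EMPLOYEES (empid, name, age) VALUES (1, 'David', 42)
INSERT INTO EMPLOYEES (empid, name, age) VALUES (2, 'Tom', 31)
INSERT INTO EMPLOYEES (empid, name, age) VALUES (3, 'Adam', 27)
INSERT INTO EMPLOYEES (empid, name, age) VALUES (4, 'John', 22)
-- The two queries discussed next.
SELECT * FROM EMPLOYEES WHERE name = 'John'
SELECT * FROM EMPLOYEES WHERE empid = 1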
In the real indexes these four rows would fit on the same page, but for this discussion I've just put one row on
each page. So, to return results for the first query containing WHERE name = 'John' SQL Server will traverse the
clustered index from the root down through the intermediate node levels until it finds the leaf page containing
John, and it would have all the data available to return for the query. But to return results for the second query, it
will traverse the non-clustered index until it finds the leaf page containing empid 1, then use the clustering key
found there for empid 1 (David) for a lookup in the clustered index to find the remaining data (in this case just the
column age is missing). You can see this for yourself by viewing the execution plan for the queries in Query
Analyzer (press Ctrl-K to see the plan).
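If you prefer a text version of the plan, something like the following should also work from Query Analyzer. It is a sketch that assumes the EMPLOYEES table above exists with a clustered index on name and a non-clustered index on empid. With SHOWPLAN_TEXT on, the queries are not executed; only their plans are returned.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM EMPLOYEES WHERE name = 'John'   -- expect a seek on the clustered index
SELECT * FROM EMPLOYEES WHERE empid = 1       -- expect a non-clustered seek plus a bookmark lookup
GO
SET SHOWPLAN_TEXT OFF
GO
```

On a table this small the optimizer may decide a scan is cheaper, so don't be surprised if the plan differs from the comments.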
Disadvantages of having a clustered index
Although my general opinion is that you should always have a clustered index on a table, there are a few minor
disadvantages with them that in some special circumstances might justify not having one. First of all, the lookup
operation for bookmarks in non-clustered indexes is of course faster if the bookmark contains a direct pointer to the
data in RID format, since looking up the clustering key in a clustered index requires extra page reads. However,
since this operation is very quick it will only matter in some very specific cases.
The other possible disadvantage of clustered indexes is that inserts might suffer a little from the page splits that
can be necessary to add a row to the table. Because the data is stored in the order of the index, to insert a new
row SQL Server must find the page with the two rows between which the new row shall be placed. Then, if there is
not room to fit the row on that page, a split occurs and some of the rows get moved from this page to a newly
created one. If the table would have been a heap – a table without a clustered index – the row would just have
been placed on any page with enough space, or a new page if none exists. Some people see this as a big
problem with clustered indexes, but many of them actually misunderstand how they work. When we say that the
data in a clustered index is stored in order of the index, this doesn't mean that all the data pages are physically
stored in order on disk. If it actually was this way, it would mean that in order to do a page split to fit a new row, all
following pages would have to be physically moved one 'step'. As I said, this is of course not how it works. By
saying that data is stored in order of the index we only mean that the data on each page is stored in order. The
pages themselves are stored in a doubly linked list, with the pointers for the list (i.e. the page chain) in order. This
means that if a page split does occur, the new page can still be physically placed anywhere on the disk, it's just
the pointers of the pages prior and next to it that need to be adjusted. So once again, this is actually a pretty small
issue, and as you will see later in the article there are possible problems of not having a clustered index that can
have much more significance than these minor disadvantages.
Advantages of having a clustered index
Apart from avoiding the problems of not having a clustered index described later in this article, the real advantage
you can get from a clustered index lies in the fact that they sort the data. While this will not have any noticeable
effect on some queries, i.e. queries that return a single row, it could have a big effect on other queries. You can
normally expect that apart from the disadvantages shown above, a clustered index will not perform worse than
non-clustered indexes. And as I said, in some cases it will perform much better. Let's see why.
Generally, the biggest performance bottleneck of a database is I/O. Reading data pages from disk is an expensive
operation, and even if the pages are already cached in memory you always want to read as few pages as
possible. Since the data in a clustered index is stored in order, this means that the rows returned by range searches
on the column(s) that are part of the clustered index will be fetched from the same page, or at least from adjacent
pages. In contrast, although a non-clustered index could help SQL Server find the rows that satisfy the search
condition for the range search, since the rows might be placed on different pages many more data pages must be
fetched from disk in order to return the rows for the result set. Even if the pages are cached in memory each page
needs to be read once for every bookmark lookup (one for each hit in the non-clustered index), probably with each
page read several times. You can see this for yourself in Script 1 on the web.
As you can see, a carefully placed clustered index can speed up specific queries, but since you can only have one
clustered index per table (since it actually sorts the data) you need to think about which column(s) to use it for.
Unfortunately the default index type when creating a primary key in SQL Server is a clustered index, so if you're
using surrogate keys with an auto-incrementing counter, make sure you specify non-clustered index for those
primary keys as you will probably not do range searches on them. Also please note that ordering a result set is a
great example of where a clustered index can be great, because if the data is already physically stored in the
same order as you are sorting the result set, the sort operation is (generally) free! However, make sure you don't
fall into the trap of depending on the physical ordering of the data. Even though the data is physically stored in one
order this does not mean that the result set will be returned in the same order.
If you want an ordered result set, you must always explicitly state the order in which you want it sorted.
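Here is a minimal sketch of both points, using a hypothetical Orders table (the table, column and index names are mine, not from any real schema). The surrogate primary key is declared non-clustered so that the clustered index can go on the column used for range searches, and the query still states its ordering explicitly.

```sql
-- Surrogate key as a NONCLUSTERED primary key; the default would have been clustered.
CREATE TABLE Orders (
    OrderID    int IDENTITY(1,1) NOT NULL
               CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED,
    CustomerID int NOT NULL,
    OrderDate  datetime NOT NULL
)

-- Spend the single clustered index on the column used for range searches.
CREATE CLUSTERED INDEX IX_Orders_OrderDate ON Orders (OrderDate)

-- A range search on the clustering key. The ORDER BY is still required
-- if the caller depends on the order of the rows.
SELECT OrderID, CustomerID, OrderDate
FROM Orders
WHERE OrderDate >= '20030101' AND OrderDate < '20030201'
ORDER BY OrderDate
```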
Problems with not having a clustered index
I have now shown the minor disadvantages that might occur from having a clustered index, plus shown how they
can speed up some queries very much. However, neither of these facts is what really makes me recommend
that you always have a clustered index on your tables. Instead it is the problems that you can run into when not
having a clustered index that can really make a difference. There are two major problems with heap tables,
fragmentation and forward-pointers. If the data in a heap table becomes fragmented there is no way to
defragment it other than to copy all the data into a new table (or other data source), truncate the original table and
then copy all data back into it. With a clustered index on the table you would simply either rebuild the index or,
better yet, run DBCC INDEXDEFRAG, which is normally preferable since it is an online operation that doesn't
block queries in the same way as a rebuild does. Of course in some cases rebuilding the index completely might
actually suit your needs better.
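The commands involved look roughly like this on SQL Server 2000. This is a sketch: EMPLOYEES stands in for your own table, and ix_employees_name is a hypothetical name for its clustered index.

```sql
-- Check fragmentation of the clustered index first.
DBCC SHOWCONTIG (EMPLOYEES, ix_employees_name)

-- Online defragmentation; 0 means the current database.
DBCC INDEXDEFRAG (0, EMPLOYEES, ix_employees_name)

-- Full rebuild, for the cases where that suits your needs better.
DBCC DBREINDEX ('EMPLOYEES', ix_employees_name)
```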
The next problem, forward-pointers, is a bit more complicated. As I mentioned earlier, in a non-clustered index on
a heap table, the leaf nodes contain bookmarks to the physical location where the rows are stored on disk. This
means that if a row in a heap table must be moved to a different location (i.e. another data page), perhaps
because the value of a column of variable length was updated to a larger value and no longer fits on the original
page, SQL Server now has a problem. All non-clustered indexes on this table now have incorrect bookmarks. One
solution would be to update all bookmarks for the affected row(s) to point at the new physical location(s), but this
could take some time and would make the transaction unnecessarily long and would therefore hurt concurrency.
Therefore SQL Server uses forward-pointers to solve this problem.
What forward-pointers mean is that, instead of updating the bookmarks of non-clustered indexes to point to the
new physical location, SQL Server places a reference message at the old location saying that the row has been
moved and including a pointer to the new location. In this way the bookmarks of non-clustered indexes can still be
used even though they point to the old location of the row. But, it also means that when doing a bookmark lookup
from a non-clustered index for a row that has been moved, an extra page read is necessary to follow the forward-
pointer to the new page. When retrieving a single row this probably won't even be noticed, but if you're retrieving
multiple rows that have been moved from their original location it can have a significant impact. Note that even
though the problem stems from the fact that SQL Server can't update the non-clustered index bookmarks, it is not
limited to queries using the indexes. The worst case scenario is a query where SQL Server needs to do a table
scan of a heap table containing lots of forward-pointers. For each row that has been forwarded SQL Server needs
to follow the pointer to the new page to fetch the row, then go back to the page where the forward-pointer was (i.e.
the page where the row was originally located). So, for every forwarded row, SQL Server needs two extra page reads
to complete the scan. If the table would have had a clustered index, the bookmarks of all non-clustered indexes
would have been clustering keys for each row, and physically moving a row on disk would of course not have any
effect on these. An extreme example of this is shown in Script 2. Even though this example may be a bit
extreme, forward-pointers are likely to become a problem in tables where rows are sometimes moved, because
there is no way in SQL Server to remove forward-pointers from a table.
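The following is not the Script 2 referred to above, just a small sketch of how forward-pointers can come about on a heap. The table and index names are hypothetical, and it assumes SQL Server 2000, where DBCC SHOWCONTIG WITH TABLERESULTS reports a ForwardedRecords column.

```sql
-- A heap (no clustered index) with a variable-length column and one non-clustered index.
CREATE TABLE HeapDemo (id int NOT NULL, filler varchar(4000) NOT NULL)
CREATE NONCLUSTERED INDEX IX_HeapDemo_id ON HeapDemo (id)

-- Insert many short rows so that each page holds a large number of them.
DECLARE @i int
SET @i = 1
WHILE @i <= 1000
BEGIN
    INSERT INTO HeapDemo (id, filler) VALUES (@i, 'x')
    SET @i = @i + 1
END

-- Widen every row. Most rows no longer fit on their original page and have to move.
UPDATE HeapDemo SET filler = REPLICATE('x', 2000)

-- Look at the ForwardedRecords column in the output.
DBCC SHOWCONTIG (HeapDemo) WITH TABLERESULTS
```

Running the same steps against a table with a clustered index should show no forwarded records, since the non-clustered index then holds clustering keys rather than physical locations.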
Summary
In this article I have described what a clustered index is and how they differ from non-clustered indexes, and I
have also tried to show you why I think that you should always have a clustered index on every table. As I said
there are, of course, exceptions, but these are so uncommon that I always check that all tables have clustered
indexes as one of the first things I do when performing a database review. Please post your thoughts on this
matter in the feedback section.
I have previously discussed the issue of forward-pointers in the article Cluster that index! – Part 1. I described
what forward-pointers are and how they are created by SQL Server. I also supplied a script that showed the effect
of forward-pointers, but I did not discuss how to check for the existence of forward-pointers and how to remove
them. This article will discuss this.
Recap of problem
Forward-pointers are created by SQL Server to avoid making transactions longer than necessary. As described in
the article mentioned above, the leaf level pages of non-clustered indexes contain pointers to the data that is
indexed by them. If the table that the index is created on has a clustered index created for it, these 'pointers' are
bookmark lookup values, each one containing a key value to look up in the clustered index. If the table does not
have a clustered index, i.e. a heap table, these pointers point to the actual physical location of the rows in the data
files. The problem is that data rows sometimes need to be moved to another data page. One reason is when the
value of a variable length column is changed and the row no longer fits into the page where it is located. Now SQL
Server must either change all of the pointers for this row (in all non-clustered indexes for the table) to its new
location, or it can use forward-pointers. A forward-pointer is simply a pointer left in the original location of the row,
pointing to the new location. This way no indexes need to be updated, SQL Server just follows the forward-pointer
to the new location of the row when it needs to fetch it. As I said, instead of updating the pointers in all non-
clustered indexes each time a row is moved, SQL Server uses forward-pointers to avoid making the transactions
longer than necessary.
The problem with forward-pointers is that they can create a lot of extra I/O. When scanning a heap table
containing forward-pointers, SQL Server needs two extra page reads for every forward-pointer, which in extreme
situations might be very cumbersome. A script that showed this was supplied in the other article.
SELECT *
INTO Orders2
FROM Orders
GO
If you don't want to keep the clustered index, just drop it and the leaf levels of the non-clustered indexes will be changed
back into pointers to the physical location of the data rows; this time, however, they will point to the actual location
of the rows.
As a final note, when a database is shrunk, the bookmarks of non-clustered indexes are reassigned and therefore
any forward-pointers located on pages that are removed by the shrinking process are removed.
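A minimal sketch of that create-then-drop approach is shown below. It assumes Orders2 is the heap copy of the Northwind Orders table made with SELECT INTO above and that it has an OrderID column; the index name is hypothetical. Running DBCC SHOWCONTIG before and after lets you compare the ForwardedRecords column.

```sql
-- Forwarded rows before the rebuild (see the ForwardedRecords column).
DBCC SHOWCONTIG ('Orders2') WITH TABLERESULTS
GO

-- Building a clustered index rewrites every row into the index leaf level,
-- so no forward-pointers survive. The index name is an assumption.
CREATE CLUSTERED INDEX ix_Orders2_rebuild ON dbo.Orders2 (OrderID)
GO

-- Optional: drop the index again if the table really must stay a heap.
-- Non-clustered indexes are rebuilt to point at the rows' actual locations.
DROP INDEX Orders2.ix_Orders2_rebuild
GO

-- Forwarded rows after the rebuild; the count should be back to zero.
DBCC SHOWCONTIG ('Orders2') WITH TABLERESULTS
GO
```

Both index operations rewrite the table, so on a large table this is something to schedule for a quiet period.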
Managing Max Degree of Parallelism
Herve Roggero
6/23/2003
Introduction
In situations where your tuned T-SQL statements are pushing the limits of your CPUs, more processing power
may be needed. Deploying database servers on two-, four- or even eight-way SMP systems is rather straightforward.
SQL Server usually scales almost linearly on up to eight processors.
However, some SQL Server installations may require up to 32 processors. In this kind of environment,
configuration parameters that are usually ignored in smaller configurations come into play and can offer significant
performance improvements. We will take a look at the Maximum Degree of Parallelism (DOP) and see how and
why it may make sense to change its default setting.
The same holds true in larger environments. For instance on 16 processors, SQL Server will frequently use 12 or
more processors to execute complex SELECT statements. This may turn out to be an issue for a couple of
reasons. First, using more processors means managing more threads and requires more cache synchronization.
System -> Context Switches/Sec is a measure of this effort. The more processors are used for a process, the
higher this counter will be. In addition, SQL Server has more coordination to perform since it needs to slice and
regroup the work spread over the processors. Since by default SQL Server will use as many processors as it can,
upgrading your SQL Server from 8 to 12 processors may actually degrade the overall performance of your
database. Although there are no golden rules, it appears that in most cases using more than 8 processors for a
SELECT statement can degrade performance (although this may vary greatly by system).
On large SMP systems, setting the maximum DOP to 4 or 8 is not unusual. The default value for this parameter is
0, which allows SQL Server to use all allocated processors. The following test shows the Context Switches/Sec
and average response time of a T-SQL statement running off a few million records. The server utilized for this test
was loaded with the /PAE boot.ini option, 16 processors and 8GB of RAM. The statement is as follows (the
statement itself is of little importance, but notice the OPTION keyword)
SELECT SUM((UnitPrice - UnitCost) * TotalUnitsSold)
FROM Salesdb..salesdata (NOLOCK)
WHERE SalesYear = 2000
GROUP BY UPC
ORDER BY 1
OPTION (MAXDOP 2)
This statement was loaded 500 times in a table in a format that Profiler could understand. Then four Profilers were
loaded on that same server, each running the content of the same table. So SQL Server was receiving four select
statements at once. Note the (NOLOCK) hint, which lets SQL Server read the data without taking shared
locks. The results are as follows:
As more processors are added to the query (by using the MAXDOP option), the Context Switches/Sec increases
up to 13,000, which is expected behavior. This is really a low number, considering that we are only executing 4
statements at any single point in time. This graph shows that starting at 12 processors, the execution time
degrades. Although it takes 12 seconds to execute this statement on 2 processors, it takes about 6 seconds on
eight CPUs. However, we see that setting the DOP to 12 or 16 degrades the overall performance of our query
when compared to a DOP of 8.
Leaving the default Maximum Degree of Parallelism value of 0 would yield the same result as the DOP of 16 in
our test. Hence, changing the DOP to 8 in our scenario would provide a 30% performance improvement over a
DOP of 0 (or 16).
Enforcing a system-wide Maximum DOP is a good practice since this allows you to control the maximum number
of processors SQL Server will use at any given time, regardless of the statement, as long as the MAXDOP is not
used in the query (which would override the global Maximum DOP setting).
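For reference, the server-wide setting can be changed with sp_configure, as in the sketch below. The value 8 is simply the figure that performed best in the test above; treat it as an assumption to be validated on your own hardware, not a recommendation.

```sql
-- 'max degree of parallelism' is an advanced option, so expose those first.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO

-- Cap parallel plans at 8 processors (0, the default, means "use all available").
EXEC sp_configure 'max degree of parallelism', 8
RECONFIGURE
GO

-- Show the configured and running values to confirm the change.
EXEC sp_configure 'max degree of parallelism'
GO
```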
Conclusion
SQL Server has many parameters that give you more control over the performance of your databases.
Understanding how SQL Server behaves on servers with 8 processors or fewer gives a strong understanding of the
capabilities of SQL Server. However, SQL Server offers specific configuration parameters that may give you extra
performance on larger systems.
The Maximum Degree of Parallelism is a key parameter for environments with 8 or more processors, and allows
you to gain control over the maximum number of processors used for a query. When deciding which DOP you
should use, careful evaluation of your environment is needed. Certain queries may perform better with a DOP of
4, or even 1. Testing your environment with multiple DOPs should give you the answer. In cases where your
database environment functions in OLTP and OLAP mode (for live reporting), you may consider setting a default
DOP for SQL Server that works best for your OLTP system and use the OPTION keyword for your OLAP T-SQL
to use the DOP that works best for these queries.
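The mixed OLTP/OLAP arrangement described above might look like the following sketch, in which the server default stays at whatever suits the OLTP workload and individual statements override it with OPTION (MAXDOP n). The table is the one from the test; the DOP values, the Margin alias, the UPC literal and the assumption that UPC is a character column are illustrative only.

```sql
-- Reporting (OLAP-style) query: allow up to 8 processors for this statement only,
-- overriding the server-wide 'max degree of parallelism' setting.
SELECT UPC, SUM((UnitPrice - UnitCost) * TotalUnitsSold) AS Margin
FROM Salesdb..salesdata
WHERE SalesYear = 2000
GROUP BY UPC
OPTION (MAXDOP 8)

-- OLTP-style lookup: force a serial plan regardless of the server-wide setting.
-- Assumes UPC is a character column; the literal is made up.
SELECT UnitPrice, UnitCost
FROM Salesdb..salesdata
WHERE UPC = '012345678905'
OPTION (MAXDOP 1)
```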
Finally, SELECT statements are not the only types of statements that can take advantage of the DOP, especially if
your action queries use correlated subqueries (in which a SELECT statement is found inside an UPDATE statement
for example). The Maximum DOP is an advanced setting, and as such it is wise to test it thoroughly before
making a decision in your production environment.
Monitoring Performance
Viktor Gorodnichenko
6/9/2003
DBAs are in charge of performance on production SQL Servers and, naturally, they hate to hear complaints from
end-users that the system is slow. But the truth is that the complaints are often not groundless. As long as
developers treat performance as the last thing to care about, after delivering the required functionality, providing a
wonderful interface, debugging, and loudly celebrating a new release shipped to production, we have what we have. As a
result, managers often ask “What’s going on on the server?” and DBAs really need to have a clear and accurate
answer.
I run the sp so often that I had to create a shortcut in my Query Analyzer (Tools/Customize/Custom ...). Install the
sp, set the shortcut assigning the sp, say, to Ctrl-4, press these two buttons and you've got the picture (the list of
processes wrapped to fit on the page):
ProcessId  TotalCPU  CPU_ConsumedInTheTimeFragment  TotalPhysical_IO  IO_InTheTimeFragment  ...
---------  --------  -----------------------------  ----------------  --------------------
55         239       109                            21                10                    ...
85         31328     31                             7521              536                   ...
88         5678      1001                           795               164                   ...

...  Hostname  ApplicationName  NT_LoginName  DatabaseName  SPIDBuffer
     --------  ---------------  ------------  ------------  --------------------
     BillS     MSP              Company\Bill  MSP           GetContacts
     KirkA     MSP              Company\Kirk  MSP           ReassignTimeApprover
     KimN      MSP              Company\Kim   MSP           InsertExpense

TheFragmentDuration  NumberOfCPUs  SUM_CPU_Consumed  SUM_Physical_IO_Committed
-------------------  ------------  ----------------  -------------------------
5123                 2             1141              710
Just one note about the accuracy of the numbers shown by the sp. Microsoft Knowledge Base Article
309377 says: In Microsoft SQL Server 2000 (all editions) CPU time for a particular server process ID (SPID) may
not accumulate correctly.
It is explained as: SQL Server maintains a pool of worker threads that execute the tasks given to it. SQL Server
may assign a different thread from this pool to a given SPID to execute a new query batch. When SQL Server
assigns a different thread to a SPID, SQL Server does not properly calculate the CPU time to indicate the
accumulated CPU time up to that point for the SPID. Microsoft has confirmed this to be a problem in SQL Server
2000. This problem was first corrected in Microsoft SQL Server 2000 Service Pack 2.
The wrong calculation almost never happens to user processes, which are the most important for us, even if
SP2 for SQL Server 2000 is not installed. By contrast, I have seen lots of such cases with system processes, for
instance, replication agents.
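The listing for the stored procedure is not reproduced here, but the idea behind the "time fragment" columns can be sketched as two snapshots of master.dbo.sysprocesses taken a few seconds apart and then subtracted. This is only an approximation of the report above, not the author's procedure; the 5-second delay and the temp table name are assumptions, and the SPIDBuffer column is left out.

```sql
-- First snapshot of the cumulative counters for every connection.
SELECT spid, login_time, cpu, physical_io
INTO #before
FROM master.dbo.sysprocesses
WHERE ecid = 0                       -- main thread of each SPID only

-- Length of the time fragment (5 seconds is an arbitrary choice).
WAITFOR DELAY '00:00:05'

-- Second snapshot, diffed against the first.
SELECT  p.spid                        AS ProcessId,
        p.cpu                         AS TotalCPU,
        p.cpu - b.cpu                 AS CPU_ConsumedInTheTimeFragment,
        p.physical_io                 AS TotalPhysical_IO,
        p.physical_io - b.physical_io AS IO_InTheTimeFragment,
        RTRIM(p.hostname)             AS Hostname,
        RTRIM(p.program_name)         AS ApplicationName,
        RTRIM(p.nt_username)          AS NT_LoginName,
        DB_NAME(p.dbid)               AS DatabaseName
FROM master.dbo.sysprocesses p
JOIN #before b
  ON  b.spid = p.spid
  AND b.login_time = p.login_time    -- guards against a SPID reused by a new connection
WHERE p.ecid = 0
  AND (p.cpu > b.cpu OR p.physical_io > b.physical_io)
ORDER BY CPU_ConsumedInTheTimeFragment DESC

DROP TABLE #before
```

Only connections that consumed CPU or I/O during the fragment are listed, which keeps the output short on a busy server.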
To identify processes blocked by others I created sp_BlockedProcesses (Listing 2). Assign it to the shortcut, say,
Ctrl-5, press the buttons and here we go:
BlockedSPID  BlockedBuffer  BlockingSPID  BlockingBuffer  waitresource  BlockedHostname  ...
-----------  -------------  ------------  --------------  ------------  ---------------
21           GetLateTasks   65            GetImage        21            KimN             ...
5            SetStatus      65            GetImage        21            JasonC           ...
I bet you can recall cases when some simple code was started and a quick reply was expected, but it seemed to
hang. It is quite probable that by pressing Ctrl-5 you would see what the matter is.
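A simplified version of such a blocking report, again a sketch and not the author's Listing 2, can be built from the blocked column of master.dbo.sysprocesses. The buffer columns shown in the output above are not reproduced here; DBCC INPUTBUFFER (spid) can be run separately to see the last statement sent by a given SPID.

```sql
-- One row per blocked connection, joined to the connection that blocks it.
SELECT  blocked.spid              AS BlockedSPID,
        blocking.spid             AS BlockingSPID,
        RTRIM(blocked.waitresource) AS waitresource,
        blocked.waittime          AS WaitTimeMs,
        RTRIM(blocked.hostname)   AS BlockedHostname,
        RTRIM(blocking.hostname)  AS BlockingHostname
FROM master.dbo.sysprocesses blocked
JOIN master.dbo.sysprocesses blocking
  ON blocking.spid = blocked.blocked
WHERE blocked.blocked <> 0
  AND blocked.blocked <> blocked.spid   -- ignore a SPID reported as waiting on itself
  AND blocking.ecid = 0                 -- main thread of the blocker only
ORDER BY blocked.waittime DESC
```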
1. A job was scheduled to run every morning to start a trace on production servers. One of the trace parameters
specifies time to stop the trace. The job runs at 8:30 AM and the trace stops itself at 5:30 PM, when the peak of
user activity on production servers is over. The name of the executed stored procedure: spTraceBuild (Listing 3).
spTraceBuild is based on a very useful stored procedure, build_trace, from the Microsoft Knowledge Base Article
Q283790. I made the following minor modifications:
a. Added server name and current date to the trace file name
b. Added error handling. If a mistake was made (for example, an option value is incorrect or the date to stop the
trace has already passed) and the trace wasn't created, it's nice to get a message.
c. Changed the code to expect only the time to stop the trace (instead of date/time) – 5:30 PM in my case. That is, you
never need to modify ActivityTrace.ini. The trace will always be stopped at the specified time on the same day
it was started.
spTraceBuild gets configuration data from a text file named ActivityTrace.ini. Its contents could be like:
@tracefile = \\Process02\D$\Program Files\Microsoft SQL Server\MSSQL\LOG\Trace
@maxfilesize = 15000
@stoptime = 5:30
@options = 2
@events = 10,12
@columns = 1,3,12,13,14,16,17,18
@filter1 = 10, 0, 7, N'SQL Profiler'
Obviously you need to modify at least the first parameter, @tracefile, to make it appropriate. Two types of events
give us consumption numbers:
10 - RPC:Completed
12 - SQL:BatchCompleted
2. Another job was scheduled to run every night to absorb the trace file and process it, i.e. to insert the trace data
into a SQL Server table and aggregate the information. Why collect the data into a file to bulkcopy them afterward
into a table instead of collecting them directly into the table? Firstly, collecting trace data into a file works faster,
secondly, you cannot run a trace programmatically into a table as you do when starting a trace from Profiler. I
created the following aggregations:
- top CPU consumers
- top long-runners
3. The processing job sends an email containing the reports to managers of development. Every morning
development managers can find the "Top consumers" report in their Inbox. That is important because
performance is a serious issue in my company. You can schedule the trace and processing/reporting jobs to run
once a week, for example, on Tuesday, if user activity and workload do not differ from day to day.
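The scheduled trace from step 1 can be sketched with the server-side trace procedures directly. The code below is not spTraceBuild or build_trace; it only hard-codes the same events, columns and filter that appear in the sample ActivityTrace.ini. The file path, the 100 MB file size and the 17:30 stop time are assumptions, and the trace file must not already exist.

```sql
DECLARE @traceid int, @rc int, @on bit, @maxfilesize bigint, @stoptime datetime
DECLARE @eventid int, @columnid int

SET @on = 1
SET @maxfilesize = 100   -- MB per rollover file (assumed value)
-- Stop at 17:30 today; sp_trace_create fails if that time has already passed.
SET @stoptime = CONVERT(datetime, CONVERT(char(8), GETDATE(), 112) + ' 17:30')

-- Option 2 = file rollover. SQL Server appends .trc to the (hypothetical) path.
EXEC @rc = sp_trace_create @traceid OUTPUT, 2, N'C:\Temp\ActivityTrace',
                           @maxfilesize, @stoptime
IF @rc <> 0
BEGIN
    RAISERROR('sp_trace_create failed with return code %d', 16, 1, @rc)
    RETURN
END

-- Register every (event, column) pair: events 10 and 12,
-- columns 1,3,12,13,14,16,17,18 as in the sample ini file.
DECLARE pairs CURSOR LOCAL FAST_FORWARD FOR
    SELECT e.id, c.id
    FROM (SELECT 10 AS id UNION ALL SELECT 12) e
    CROSS JOIN (SELECT 1 AS id UNION ALL SELECT 3  UNION ALL SELECT 12
                UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 16
                UNION ALL SELECT 17 UNION ALL SELECT 18) c
OPEN pairs
FETCH NEXT FROM pairs INTO @eventid, @columnid
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_trace_setevent @traceid, @eventid, @columnid, @on
    FETCH NEXT FROM pairs INTO @eventid, @columnid
END
CLOSE pairs
DEALLOCATE pairs

-- Column 10 (ApplicationName), AND (0), NOT LIKE (7): skip Profiler's own activity.
EXEC sp_trace_setfilter @traceid, 10, 0, 7, N'SQL Profiler'

-- Status 1 starts the trace; it stops by itself at @stoptime.
EXEC sp_trace_setstatus @traceid, 1
```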
The name of the processing/reporting stored procedure is spProcessTrace (Listing 4). An important part of the
code is a UDF fnExtractSPNameFromTextData (Listing 5). Of course, you can aggregate data from a trace only if
the code running against the server is named code, as stored procedures are. Ad hoc queries are out of
scope. However, I do not think any interactive, frequently executed code should run as ad hoc queries, which
would need compilation on the fly. Therefore, all real top consumers should be represented in the report.
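The load-and-aggregate step can be approximated as follows. This is a simplified stand-in for spProcessTrace: instead of extracting the procedure name with fnExtractSPNameFromTextData, it groups on the first 60 characters of TextData, which is cruder but needs no helper function. The file path and the table name dbo.ActivityTrace are assumptions.

```sql
-- Load the trace file (and any rollover files) into a table.
SELECT *
INTO dbo.ActivityTrace
FROM ::fn_trace_gettable(N'C:\Temp\ActivityTrace.trc', default)
GO

-- Top CPU consumers: executions, total, average, minimum and maximum CPU.
SELECT TOP 20
       CONVERT(nvarchar(60), TextData) AS TextPrefix,
       COUNT(*)                        AS Executions,
       SUM(CPU)                        AS TotalCPU,
       AVG(CPU)                        AS AvgCPU,
       MIN(CPU)                        AS MinCPU,
       MAX(CPU)                        AS MaxCPU
FROM dbo.ActivityTrace
WHERE EventClass IN (10, 12)           -- RPC:Completed and SQL:BatchCompleted
GROUP BY CONVERT(nvarchar(60), TextData)
ORDER BY SUM(CPU) DESC
```

Swapping CPU for Duration in the aggregates gives the corresponding "top long-runners" list.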
AddStandardTaskTime 673 457651 680 594 829
TaskbyResource 2480 130684 52 0 6656
GetAssetTypes 5318 88720 16 0 78
SubmitExpenseById 1583 63696 40 0 719
BillingRatesByBillingOffices 110 63164 574 32 1312
SessionCleanUp 1231 56099 45 0 19406
CheckSummaryTask 230 16443 71 46 110
RollupSummaryTask 207 15844 76 0 281
CreateBatchNumber 2720 14146 5 0 32
RejectTime 1345 13396 9 0 79
DeleteBillingRole 12 12108 1009 578 1390
ProjectSummary 143 10003 69 15 172
GetApprovedInvoices 12 9767 813 718 1032
ProgressProject 228 8322 36 0 94
AddSubProject 280 7875 28 0 265
InsertExpense 7 7422 1060 0 5906
LoadCustomer 16 6953 434 312 688
PercentOfContractIncurred 164 5790 35 15 47
GetTaxes 8 5469 683 640 828
RolesFeatures 6 5330 888 750 1016
GetWorkflowTypes 246 4519 18 0 78
GetDraftInvoices 250 4439 17 0 63
Activity Graph
The Results Pane of Query Analyzer does not allow us to represent results graphically, but spActivityGraph
(Listing 6) challenges this limitation. The stored procedure uses the trace table created by spProcessTrace.
spActivityGraph shows how processes running on the SQL Server interfere with each other. Looking at the graph
you can see peak times, concurrency, and how a given process takes longer than usual when it is surrounded
by a crowd of other processes:
StartTime  Duration  Text                      11:30 11:39 11:49 11:59
11:40:25   30530     PopulatePlanFact          ----
11:40:28   30543     ProjectSummary            ----
11:40:28   30516     LoadLeadByResource        ----
11:40:30   30513     ProjectSummary            ----
11:40:36   11736     SetLockAllTask            --
11:40:38   21623     InvoiceByClient           --
11:40:42   103116    PopulatePlanFact          ------------
11:40:44   15780     GetDraftInvoices          --
11:40:49   10310     InsertAd                  --
11:40:50   9513      ModifyCodeUpdatedExpense  --
11:40:51   8280      DeleteBillingRole         --
11:40:59   60966     ProjectSummary            --------
11:41:04   30516     AutoEscalate              ----
11:41:07   30446     GetLicenceUpdate          ----
11:41:21   5046      GetImageBatch             -
spActivityGraph has 6 input parameters:
I would suggest building graphs for 30-minute intervals. If you increase the interval to cover the entire day, or
at least half of the day, in one graph, the duration of one unit (shown by a dash '-') also increases and even
processes with a duration of more than 10 sec will be missing from the graph. What if you would nonetheless like to see
the activity for the entire day? No problem: the stored procedure spShowActivityGraphByChunks (Listing 7) will
give you the full-day picture divided into half-hour pieces. The only 2 mandatory input parameters for the stored
procedure (@ServerName, @ReportDate) serve to identify a trace table to work with.
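To give a feel for how a text graph of this kind can be produced in Query Analyzer, here is a minimal sketch. It is not spActivityGraph: it assumes a trace table named dbo.ActivityTrace with StartTime, Duration (in milliseconds), TextData and EventClass columns, a hypothetical 30-minute window, and a unit of 15 seconds per character. Each row is indented by its offset from the start of the window and drawn with one dash per unit of duration.

```sql
DECLARE @from datetime, @to datetime, @unit_ms int
SET @from    = '20030609 11:30'        -- start of the window (hypothetical)
SET @to      = DATEADD(mi, 30, @from)  -- 30-minute interval
SET @unit_ms = 15000                   -- one character = 15 seconds

SELECT  CONVERT(char(8), StartTime, 108)   AS Started,
        Duration,
        CONVERT(nvarchar(30), TextData)    AS Text,
        -- Offset from the window start, then one dash per unit (at least one).
        SPACE(DATEDIFF(ss, @from, StartTime) * 1000 / @unit_ms)
        + REPLICATE('-', CASE WHEN Duration < @unit_ms THEN 1
                              ELSE CONVERT(int, Duration / @unit_ms) END) AS Graph
FROM dbo.ActivityTrace
WHERE StartTime >= @from
  AND StartTime <  @to
  AND EventClass IN (10, 12)
ORDER BY StartTime
```

Run it with results in text mode and a fixed-width font so that the bars line up. A larger @unit_ms shortens the bars, which is the loss of detail described above for long intervals.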
Conclusion
Stored procedures showing instantaneous and overall performance reports give us a clear picture of user activity
on production SQL Servers and help us to find ways to make the performance better.
Squeezing Wasted Full Scans out of SQL Server Agent
Bob Musser
2/13/2003
Introduction:
The tweak in this article was done on a server running NT 4 SP6 with SQL Server 7 SP4 installed. The machine
is a dual-processor 1 GHz Pentium III with 2 GB of RAM. As always, make a backup first – your mileage may vary.
This isn't for the faint of heart: it involves editing an MS-supplied system stored procedure. Additionally, if you're
using SQL Server Agent Alerts you won't see any performance benefit.
The Problem:
While using the NT Performance Monitor to check out our server this weekend, I decided to add Full
Scans/second and Index Searches/second to the graph to give me some general feedback on data access and
how well our indexes and queries were designed. (You'll find Full Scans/second and Index Searches/second
under SQL Server/Access Methods in Performance Monitor.) I was disappointed to find an average of 16 Full
Scans/second even with a light load. It peaks every 20 seconds at 161 Full Scans/second, the rest of the time it's
pretty much at zero. All in all, a very regular "heartbeat" looking graph, although a very slow one. I was quite
unhappy to think that our software was doing a full table or index scan on such a regular basis and decided to dive
into QA and the SQL Profiler to find out who was at fault and get it fixed. I'll spare you the details of the time I
spent with profiler and reviewing our code (way too long to admit) to find the culprit, here's the summary:
It was SQL Server Agent doing the full scans. Now, like many of you, we use SQL Server Agent to manage
backups and optimizations. We have nightly full backups, weekly optimizations and transaction log backups
several times an hour. But not anything every 20 seconds. Regardless, it seems that Agent checks out something
every 20 seconds.
A couple of questions immediately came to mind. One: Full scans are supposed to be a bad thing. Is this amount
worth chasing? Probably not, but I wanted the Full Scans/second to be a red flag for me. Besides, 161 Full Scans
3 times a minute adds up eventually and we have a relatively busy server. The second question: How do I fix it?
Can I add an index to whatever table Agent is scanning? Can I turn off a piece of Agent that I don't use like Alerts?
Using Profiler, I found that the scans occur when Agent runs msdb.dbo.sp_sqlagent_get_perf_counters to see
what's going on with your server so it can decide if you need any alerts generated. This scan takes place whether
you have any defined, active alerts or not. I decided to "improve" on MS's efforts just a bit.
The SP does two things. It first makes a temp table of all your enabled, defined alerts. The check for
(performance_condition IS NOT NULL) is most likely done because the sample Alerts that come installed are
enabled, but don't have a performance condition. Secondly, the SP does a pretty involved Select statement
against the Master DB to find the alerts in your temp table that have out of band numbers in the Master DB. This
second section of the SP is complex enough that I didn't want to rewrite it and I immediately ruled out adding any
indexes to the tables it was looking at because they are tables in the Master DB.
CHARINDEX('|', performance_condition, PATINDEX('%[_|_]%',
performance_condition) + 1) - 1)
FROM msdb.dbo.sysalerts
WHERE (performance_condition IS NOT NULL)
AND (enabled = 1)
END
If (@@RowCount > 0) or (@all_counters = 1)
Begin
--Long Select Statement against master.dbo.sysperfinfo
--that checks every performance counter SQL has
--and has a "not equals" in the Where clause
End
Conclusion:
It's working just fine. My every 20 second Full Scans are gone from the NT Performance monitor. Presumably if I
add alerts in the future, I haven't broken the process. And my original goal, of treating any Full Scans as bad
things that need to be investigated, is easier to monitor. Besides, over 695,000 full scans aren't taking place on my
server every day now.
MS probably wrote a good SP here in the first place. I think that they added the portion of the Where clause with
the "not equals" later to avoid some problem. With the (spi1.cntr_type <> 1073939459) present in the second
section, any index on the master.dbo.sysperfinfo table won't be used efficiently, resulting in the full scan.
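Before editing the procedure, it may be worth confirming that the server really has no enabled performance-condition alerts, since the tweak only helps in that case. The check below reuses the same filter on msdb.dbo.sysalerts that the procedure itself applies; it is a convenience query, not part of the original fix.

```sql
-- Enabled alerts that are based on a performance condition.
-- If this returns no rows, Agent's periodic sysperfinfo query does no useful work.
SELECT name, performance_condition
FROM msdb.dbo.sysalerts
WHERE performance_condition IS NOT NULL
  AND enabled = 1
```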
When given a choice between using GUI tools and using Transact-SQL, I choose the latter whenever possible or
practical. This isn’t from a sense of technical superiority, but rather a need to counteract my lazy nature. This
article will briefly describe a few queries that I use to troubleshoot memory bottleneck issues that are normally
identified using System Monitor (Performance Monitor). System Monitor is useful for tracking trends over time
(using counter logs), however sometimes I like to see snapshots of the current state of a SQL Server Instance.
Using Query Analyzer, you can add or integrate these queries I detail into your own Transact-SQL script library or
procedures as you see fit.
SQL Server 2000 memory address space is made up of the memory pool and the executable code pool. The
executable code pool contains memory objects such as loaded OLE DB Provider DLLs for distributed queries,
extended stored procedure DLLs, and executable files for the SQL Server engine and net-libraries. The memory
pool contains the various system table data structures; buffer cache (where data pages are read), procedure
cache (containing execution plans for Transact-SQL statements), log cache (each transaction log for each
database has its own cache of buffer pages), and connection context information. The memory pool is often the
highest consumer of memory for busy SQL Server instances.
Generally speaking, I've identified most "true" memory bottleneck issues via errors that manifest in the SQL Log.
For example, a user may submit a prepared statement with an enormous IN clause. In such a scenario, we may
see an error such as "Failed to reserve contiguous memory of Size=XXXX". When I see this error, I like to run a
few different queries in Query Analyzer to pinpoint any abnormally high or low numbers. In all of these queries, I
use the sysperfinfo system table. This table is used to store internal SQL Server performance counters – the very
same counters that are retrieved by using System Monitor.
When investigating a potential memory bottleneck scenario, I begin by checking the total memory used by the
SQL Server executable. For a default instance of SQL Server I execute:
SELECT cntr_value/1024 as 'MBs used'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Memory Manager' and
counter_name = 'Total Server Memory (KB)'
For a Named instance, I use the following code instead, where InstanceName
is the second part of your Named Instance name, for example SERVERNAME\INSTANCENAME:
SELECT cntr_value/1024 as 'MBs used'
from master.dbo.sysperfinfo
where object_name = 'MSSQL$InstanceName:Memory Manager' and
counter_name = 'Total Server Memory (KB)'
This query returns the total MBs used by SQL Server. Of course, this number can fluctuate from second to
second. Using the System Monitor may become necessary in order to track trends in memory utilization, in which
case you could create a counter log (not covered in this article).
When viewing the total server memory, let's start with the obvious questions… Is the total MB used by SQL Server
less than the maximum available? Maximum memory usage should cause you to dig further. Less than
maximum should also cause concern if your SQL Server instance is on a machine with other applications (not
recommended). SQL Server may not be reaching its potential if it has to compete for resources.
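One way to put the current figure in context is to read it next to the Target Server Memory counter, which sits in the same Memory Manager object. The sketch below is an addition to the queries in this article; it uses LIKE patterns so that the same statement works for a default and a named instance, and so that small differences in how the counter name is spaced do not matter.

```sql
-- Total (currently used) versus Target (what SQL Server is aiming for), in MB.
SELECT RTRIM(counter_name) AS counter_name,
       cntr_value / 1024   AS MBs
FROM master.dbo.sysperfinfo
WHERE object_name LIKE '%:Memory Manager%'
  AND (counter_name LIKE 'Total Server Memory%'
       OR counter_name LIKE 'Target Server Memory%')
```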
This next query is used for returning the size of the buffer cache, procedure cache, and free pages in MBs for a
Default instance. For querying Named Instances, remember to replace 'SQLServer:Buffer Manager' with
'MSSQL$InstanceName:Buffer Manager'.
SELECT 'Procedure Cache Allocated',
CONVERT(int,((CONVERT(numeric(10,2),cntr_value) * 8192)/1024)/1024) as 'MBs'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Buffer Manager' and
counter_name = 'Procedure cache pages'
UNION
SELECT 'Buffer Cache database pages',
CONVERT(int,((CONVERT(numeric(10,2),cntr_value) * 8192)/1024)/1024) as 'MBs'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Buffer Manager' and
counter_name = 'Database pages'
UNION
SELECT 'Free pages',
CONVERT(int,((CONVERT(numeric(10,2),cntr_value) * 8192)/1024)/1024) as 'MBs'
from master.dbo.sysperfinfo
where object_name = 'SQLServer:Buffer Manager' and
counter_name = 'Free pages'
Regarding these results returned from this query, keep watch for very high or low numbers. For example, with
“contiguous memory” errors look out for a large buffer cache coupled with a small procedure cache (small being
relative to your query activity, of course). Sometimes prepared statements or other user queries may suffer when
the procedure cache is unable to expand due to fully utilized buffer caches.
This is by no means a full account of SQL Server memory bottleneck investigation methodology, but rather a
helpful technique that you can use in your troubleshooting toolkit.
T-SQL
Each major database platform has its own slightly different version of SQL. Once you get past a basic
select, insert, update, or delete, the vendors have added some unique features to their products to allow
you to work with the data differently.
This section looks at a number of interesting ways to use Transact-SQL, or T-SQL, SQL Server’s version of
the Structured Query Language.
Understanding the Difference Between IS NULL and =NULL James Travis 144
A Lookup Strategy Defined
David Sumlin
2/20/2003
Most database designs nowadays seem to have at least a few, if not many, lookup or reference tables. (I’ll use
these two terms interchangeably.) These are the small tables in which you maintain your list of States, or
CustomerTypes or JobStatus or any number of valid domain values used to maintain data integrity within your
application. These reference tables usually have a simple structure of 2–4 columns, with the naming convention usually
following along the lines of ID, Value, and Description, and maybe Active (e.g. CustomerTypeID,
CustomerTypeValue, CustomerTypeDesc, CustomerTypeActive). I have seen database designs that have
hundreds of these reference tables.
There is nothing wrong with the mere existence of these tables, but they do bring some baggage along with them.
One consideration is that someone has to design and approve
them. Someone then has to design, code, and approve any necessary views and stored procedures around
them. And most of these tables, views, and stored procedures are fairly simple. There’s usually very little insert,
update, or delete (IUD) activity happening. They’re mostly used for lookups and for joins in views to represent the
entire picture of a record. In a large application you can also clutter up your list of tables with so many of these
that you begin to think that you need to have a special naming convention for objects that are lookup related (e.g.
lkpCustomerType, lkpStates, lkp_GetCustomerType, etc.).
All of the previous issues that I presented were from the DBA or database developer’s perspective, but there’s
another perspective to take into consideration. The application developer’s perspective. Whether it’s a traditional
client server or internet application, the application developer usually has to create separate functions to access &
modify the data within each table, often creating separate classes to represent each table. Then the developer
needs to create an interface for the user to maintain the values within these tables. This naturally makes more
work for the developer.
I’ve created a lookup architecture that simplifies things a bit. This is not necessarily a brand new concept, but it is
something I’ve rarely seen. What I’ve done is to create two tables. I call them Look and LookType. The structure
is shown in Figure 1.
Figure 1.
Before I go any further, let me first explain another design and naming convention that I have. All tables that I
design have a field that is unique and is named after the table with the suffix of GID (Generated ID). This value is
usually an IDENTITY integer although can sometimes be a uniqueidentifier. This field is not necessarily the
Primary Key, although in this instance it is. My other convention is that all Foreign Key fields have the suffix FID
(Foreign ID). This field doesn’t necessarily have to have the same name as the Primary Key it references, but
usually ends up that way. So that explains the LookTypeGID, LookGID, and LookTypeFID fields. Each of the GID
fields is an IDENTITY integer field and is also the Primary Key of its table. The LookTypeFID field is the foreign key to the
LookTypeGID field. The other convention that I have is that all foreign key fields in the tables that point to the
Look table have the LID (Lookup ID) suffix. This makes it easier for me to see at a glance what things are
related to. The main fields are the Value fields. These are normally where the reference value is stored. There is
also a description field which can be used for a longer, more detailed description of the value. On both
tables there is also an Active field which can be used to either inactivate a single value in the list or an entire list.
The LookOrder field is used solely for display or sorting purposes. In lookup lists, there isn’t a standard way of
sorting things. Usually somebody wants things sorted a particular way besides alphabetical, numerical, etc. This
field is for that and is of integer data type. The other important field is the Constant field. This is a place where
you can put an easier to remember value to reference that row from your application. You don’t want to go hard
coding distinct values into your application code such as SELECT * FROM Customers WHERE CustomerTypeLID
= 2. The reason that this is bad is
Now you’re either a bit confused or you’re possibly saying “so what”? Well, first let’s put a couple of values in the
tables so that you can see an example of what I’m talking about.
Now, from a lookup perspective, all of the values will come from the Look table. The only table within the
database that would reference the LookType table would be the Look table. Its sole purpose is to create a
grouping identifier for the Look table values. So we can see that our List of Shippers has a LookTypeGID of 37
and has 3 shippers in it. We use the constant value of SHIPPER_UPS etc. to identify within the application which
row we’re referencing. A sample Order table would then have an integer field called ShipperLID with a possible
value of 112 for UPS. If I wanted to get the list of shippers I'd call one of my stored procedures, for example
EXEC s_LookListByTypeConst 'SHIPPERS', NULL. The NULL is the argument for my Active parameter: I can get either
only the active records or all the records regardless of status. The parameter defaults to NULL, which here means
only active records.
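To make the layout concrete, here is a minimal sketch of the two tables and the shipper list described above. Only LookTypeGID, LookTypeFID and LookOrder are named in the text; every other column name, the data types and sizes, and the two shippers besides UPS are assumptions made for illustration. The IDENTITY values generated here will not match the 37 and 112 quoted above.

```sql
-- Hypothetical schema: column names other than LookTypeGID, LookTypeFID
-- and LookOrder are guesses, as are the data types and lengths.
CREATE TABLE LookType (
    LookTypeGID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LookTypeValue  varchar(100)  NOT NULL,
    LookTypeDesc   varchar(1000) NULL,
    LookTypeConst  varchar(100)  NOT NULL UNIQUE,
    LookTypeActive int           NOT NULL DEFAULT 1
)
GO
CREATE TABLE Look (
    LookGID     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LookTypeFID int           NOT NULL REFERENCES LookType (LookTypeGID),
    LookValue   varchar(1000) NOT NULL,
    LookDesc    varchar(1000) NULL,
    LookConst   varchar(100)  NOT NULL UNIQUE,
    LookOrder   int           NOT NULL DEFAULT 0,
    LookActive  int           NOT NULL DEFAULT 1
)
GO
-- One list (the shippers) and its values
DECLARE @type int
INSERT INTO LookType (LookTypeValue, LookTypeDesc, LookTypeConst)
VALUES ('Shippers', 'List of Shippers', 'SHIPPERS')
SET @type = SCOPE_IDENTITY()   -- SQL Server 2000; use @@IDENTITY on SQL 7

INSERT INTO Look (LookTypeFID, LookValue, LookConst, LookOrder)
VALUES (@type, 'UPS', 'SHIPPER_UPS', 1)
INSERT INTO Look (LookTypeFID, LookValue, LookConst, LookOrder)
VALUES (@type, 'FedEx', 'SHIPPER_FEDEX', 2)   -- hypothetical row
INSERT INTO Look (LookTypeFID, LookValue, LookConst, LookOrder)
VALUES (@type, 'USPS', 'SHIPPER_USPS', 3)     -- hypothetical row

-- List the active shippers in display order
SELECT l.LookGID, l.LookValue, l.LookConst
FROM Look l
INNER JOIN LookType t ON t.LookTypeGID = l.LookTypeFID
WHERE t.LookTypeConst = 'SHIPPERS'
  AND l.LookActive = 1
ORDER BY l.LookOrder
```

The final SELECT is roughly what a list procedure such as the one called above would wrap; the application only ever needs to know the constant 'SHIPPERS', never the generated ID.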
Now, I know that a number of you will immediately see that this design breaks first normal form.
I contend that there are always exceptions to the rule, depending on the situation.
With this design, you never need to create new lookup or reference tables. You just need to add data to
preexisting tables. That then leads us to the next and most valuable aspect of this design. We can make generic
procedures that allow us to do anything and everything with these tables with a small list of stored procedures or
functions. These stored procedures or functions can then be used for all application development. This is where
it now gets interesting. I do mostly web front end applications for my data marts. I've created a single ASP page
that uses a VBScript class, which in turn calls the stored procedures. This page then allows the application
users or managers to manage their own lookup lists. Gone are the days when the application manager asked me
to add the new CustomerType of Wholesale or add an entire lookup table and the corresponding stored
procedures, or to sort the ProjectStatus of Open to the top of the list and Closed at the bottom.
Here’s a list of the stored procedures that I use. You’ll get the gist of their meanings since I’m fairly verbose with
my object names. I’ve also included a couple of samples.
s_LookAddEdit
s_LookTypeAddEdit
s_LookDelete
s_LookTypeDelete
s_LookListByGID
s_LookListByConst
CREATE PROCEDURE s_LookValueByConst
(
@const varchar(100),
@active int = NULL,
@value varchar(1000) OUT
)
AS
SET NOCOUNT ON
(
@const varchar(100),
@active int = NULL
)
AS
SET NOCOUNT ON
COMMIT TRAN
GO
I also have one view called v_Look which combines the two tables and all of the values.
You’ll notice that I set the transaction isolation levels to read uncommitted. I do this since these are fairly popular
tables used in many different views and sometimes used in the same view more than once. These tables are also
fairly static and so speed is my main concern here.
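As an illustration of the pattern, a value lookup with a read uncommitted isolation level might look like the sketch below. It is not the author's procedure: the name carries a _Sketch suffix to make that clear, the Look table columns LookValue, LookConst and LookActive are assumed names, and the handling of @active (NULL or 1 returns only active rows, 0 returns any row) is one reading of the description above.

```sql
-- Sketch only: table and column names are assumptions.
CREATE PROCEDURE s_LookValueByConst_Sketch
(
    @const  varchar(100),
    @active int = NULL,
    @value  varchar(1000) OUT
)
AS
SET NOCOUNT ON
-- Dirty reads are acceptable here because the lookup data is nearly static
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT @value = LookValue
FROM Look
WHERE LookConst = @const
  -- @active NULL or 1: active rows only; @active = 0: any row
  AND (LookActive = 1 OR @active = 0)
GO

-- Usage
DECLARE @v varchar(1000)
EXEC s_LookValueByConst_Sketch 'SHIPPER_UPS', NULL, @v OUT
SELECT @v AS ShipperName
```

An isolation level set inside a stored procedure reverts to the caller's level when the procedure returns, so the setting does not leak into the calling session.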
Now realize that there are considerations and some requirements in order to implement this design.
1) All of your value & description fields need to be the same data type, most normally varchar. You can
obviously store numbers or dates in the value fields, but they’ll need to be cast correctly on the application for
input and output.
2) You need to have unique LookConst values.
3) You should have check constraints on tables that reference the Look table so that only allowable values can be
put into that field. (For example, without such a constraint nothing would prevent the value 30 from being put into the
ShipperLID field in the Order table. Unfortunately, that would then mean that the Shipper was “December”.)
4) All data access code should come through the stored procedures.
5) This design currently does not take into consideration different security rights for different reference domains.
(e.g. If Mary can change CustomerType values, but not Shipper values.) This has to currently be done at the
application level.
6) This design is limited in that if you have a domain of values that require more than the designed fields, it
doesn’t work very well and you’ll be better off making an individual table for it.
7) In some of my applications I have added CreatedDate & CreatedBy & ModifiedDate & ModifiedBy fields for
auditing purposes.
8) I found this design to work very well for those application level setting tables or application / DTS global
variable tables.
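Requirement 3 can be sketched as follows on SQL Server 2000, where a CHECK constraint may call a scalar user-defined function. The function name, the constraint name and the Look/LookType column names (LookGID, LookTypeFID, LookTypeGID, LookTypeConst) are all assumptions for illustration; only the Order table and its ShipperLID column come from the text.

```sql
-- Returns 1 when the given Look row belongs to the given list, else 0.
-- Hypothetical helper; all object and column names are assumptions.
CREATE FUNCTION dbo.fn_LookIsOfType (@lookID int, @typeConst varchar(100))
RETURNS bit
AS
BEGIN
    DECLARE @ok bit
    SET @ok = 0
    IF EXISTS (SELECT *
               FROM dbo.Look l
               INNER JOIN dbo.LookType t ON t.LookTypeGID = l.LookTypeFID
               WHERE l.LookGID = @lookID
                 AND t.LookTypeConst = @typeConst)
        SET @ok = 1
    RETURN @ok
END
GO

-- Only values from the SHIPPERS list may be stored in ShipperLID
ALTER TABLE dbo.[Order]
ADD CONSTRAINT CK_Order_ShipperLID
CHECK (dbo.fn_LookIsOfType(ShipperLID, 'SHIPPERS') = 1)
```

Note that this version also rejects NULL in ShipperLID, because the function returns 0 for it. The constraint is evaluated only when the Order row is inserted or updated, so it does not stop someone from later moving a Look row to a different list.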
I have slowly over the years refined this module of functionality to include such things as reordering the display
order if a user changes, inserts, or deletes a Look record, creating stored procedures to return not only the value
but also the description (sometimes a user may want to see a drop down list of State abbreviations, other times
the entire State name), and I am now working on the addition of functions to replicate a lot of the stored
procedures so that a developer could use the function in a SELECT query simply.
I hope this gets you thinking about how to reduce the number of reference tables in your databases. If you have
any feedback, please let me know. I’m always interested in hearing other developers’ thoughts.
Working at a company that runs a few hundred SQL Servers, I found that I was spending a great deal of time checking
whether each database had the required database and transaction log backups. We also had several different ways of
getting reports on the jobs to see whether they had run successfully. We all know the importance of standards, don't we?
At our company we use a monitoring product called Tivoli (www.tivoli.com) to check, for example, that SQL server
is accessible and that disks are not filled up and so on. I came to the conclusion that our dba group should use
this tool for monitoring our standard jobs for database maintenance, and also to get our Control Center to call our
emergency service whenever anything goes wrong.
We have a policy to use SQL Server Agent for backing up both databases and transaction logs (when needed);
we also provide a complete rebuild of all indexes once a week (or more often if any customer would like to), and
an update of the statistics if for any reason we can't let SQL server handle this feature itself. Tivoli can also report
to us whenever a job has not been run as scheduled for any reason. I have stripped the code from our special
things that Tivoli needed so that you can use it in your environment.
Now we take one server at a time and replace any existing maintenance jobs with the new ones. Doing this, we
can be sure that we have control of the maintenance of all the SQL Servers that we are contracted to take care of.
It's an enormous task to schedule all of the jobs at different times so that transaction log backups do not overlap any
database backup, and to have index rebuilds run at different times (or even on different days, for performance
reasons) so that the server does not have to rebuild indexes for more than one database at a time. That is
why I have developed this script: to save time and to avoid typos and other human mistakes. No one makes
those, right?
Before running the script
The script has been tested on SQL 7 and SQL 2000: create_maintenance_jobs.sql (Script available at
www.sqlservercentral.com)
Change parameters
The first time you run the script, I suggest that you take a close look at all the parameters described and make any
necessary changes before running it. The section for changing the parameters is found in the script under
"Variables that are OK to change".
@description
    Description for all the scheduled jobs created.

@owner_login_name
    Name of the user that will own and execute the scheduled job.
    If the user is a member of the sysadmin role the user rights of the user that run SQL Server Agent will be used.
    If not a member of sysadmin role the proxy user (if present) will be used. The user that executes the scheduled
    job has to have write access to both the backup folder as well as to the folder @workdir\LOG.

@notify_level_eventlog
    Should the jobs write any record to the NT eventlog:
    0=never
    1=on success
    2=on failure
    3=always

@workdir
    The script will check the registry for the SQL Installation path, normally:
    "C:\Program Files\Microsoft SQL Server\MSSQL".
    The account that execute the script has to have read permissions in the registry.
    If changing this variable, make sure you un-comment that row in the script.
Permissions
Make sure that all the permissions mentioned in the parameter descriptions are in place. The user that executes the
script might also need permission to create the @workdir directory and its subfolders "JOBS" and
"LOG" if they are not present on disk.
The first thing the script does is check that nothing it is about to create already exists on the server. If,
for example, there is a scheduled job with the same name as one of those that will be created, you will be
prompted to delete the scheduled job (or simply rename it). The same goes for the stored procedures that will be
created in all the user databases. Make sure to read each line carefully before running the output in another
window or, as I would suggest, delete (or rename) everything by hand. The output might look something like:
-- 1. Delete stored procedure 'REBUILD_INDEX' in database 'LSIPT100'
use LSIPT100
drop proc REBUILD_INDEX
go
-- 2. Delete stored procedure 'REBUILD_INDEX' in database 'DOCUMENTS'
use DOCUMENTS
drop proc REBUILD_INDEX
go
Scheduled Jobs
All scheduled jobs stop if any of their steps fails, and report the error as set in the parameter
@notify_level_eventlog. No scheduled jobs will be created for the example databases pubs and Northwind. Note
that "database" stands for the name of the database the scheduled job affects.
Job name: BACKUP - database - (DBCC)
Database type: System, User
    Step 1. DBCC CHECKCATALOG - database
        Run DBCC CHECKCATALOG.
    Step 2. DBCC CHECKDB - database
        Run DBCC CHECKDB.
    Step 3. BACKUP - database - DATABASE
        Will perform a full backup of the database to disk.
        The filename will describe what database it's used for and what day the backup started:
        database_BACKUP_20030718.BAK
        Note that no backup will be performed if any error is found in one of the two DBCC checks.
    Step 4. DELETE OLD DATABASE BACKUPS - database
        All database backups for this database older than @keep_databasebackup_days will be deleted.

Job name: BACKUP - database - TRANSACTION
Database type: User
    Step 1. DELETE OLD TRANSACTION LOGS - database
        All transaction log backups for this database older than @keep_transactionlog_days will be deleted.
    Step 2. BACKUP - database - TRANSACTION
        Will perform a transaction log backup of the database to disk. The scheduled job will be created but
        disabled if the database option "Truncate Log On Checkpoint" is enabled.
        The filename will describe what database it's used for and what day the backup started:
        database_BACKUP_TRANSACTION_20030718.TRN
        Note that the script will append all transaction log backups for each day in one file per day.

Job name: REBUILD INDEX - database
Database type: User
    Step 1. REBUILD INDEX - database
        Will run the stored procedure REBUILD_INDEX in database and rebuild all indexes using the default
        fillfactor used to create the index.

Job name: UPDATE STATISTICS - database
Database type: User
    Step 1. UPDATE STATISTICS - database
        Will run the stored procedure UPDATE_STATISTICS in database and update all the statistics for the
        database.
        Note that this job will only be created if for any reason the option that lets SQL Server perform this
        by itself has been disabled.
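The file naming convention used by the backup steps can be sketched in a few lines of T-SQL. This is an illustration rather than an excerpt from create_maintenance_jobs.sql: the database name, the backup folder and the choice of WITH INIT for the full backup are assumptions, while appending the log backups with NOINIT mirrors the "one file per day" note above.

```sql
-- Sketch only: @db and @dir are hypothetical values.
DECLARE @db sysname, @dir varchar(255), @stamp char(8), @file varchar(400)
SET @db    = 'MyDatabase'
SET @dir   = 'C:\Backup\'
SET @stamp = CONVERT(char(8), GETDATE(), 112)   -- style 112 = yyyymmdd

-- Full backup, e.g. MyDatabase_BACKUP_20030718.BAK
SET @file = @dir + @db + '_BACKUP_' + @stamp + '.BAK'
BACKUP DATABASE @db TO DISK = @file WITH INIT

-- Log backup, e.g. MyDatabase_BACKUP_TRANSACTION_20030718.TRN
-- NOINIT appends, so all log backups taken on one day share a file
SET @file = @dir + @db + '_BACKUP_TRANSACTION_' + @stamp + '.TRN'
BACKUP LOG @db TO DISK = @file WITH NOINIT
```

The log backup only succeeds when the database is not truncating its log on checkpoint, which is why the job is created disabled in that case.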
open tnames_cursor
fetch next from tnames_cursor into @tablename, @tableowner
while (@@fetch_status <> -1)
begin
    if (@@fetch_status <> -2)
    begin
        select @tablename_header = '***** Updating ' + rtrim(upper(@tablename)) + ' ('
            + convert(varchar, getdate(), 20) + ') *****'
        print @tablename_header
        select @sql = 'dbcc dbreindex ( ''' + @tableowner + '.' + @tablename + ''','''',0 )'
        exec ( @sql )
    end
    fetch next from tnames_cursor into @tablename, @tableowner
end
print ''
print ''
print '***** DBReindex have been updated for all tables (' + convert(varchar,getdate(),20) + ') *****'
close tnames_cursor
deallocate tnames_cursor
UPDATE_STATISTICS
The stored procedure updates all the statistics in the database. Note that this stored procedure will only be created
if, for any reason, the option that lets SQL Server do this by itself has been disabled.
create procedure UPDATE_STATISTICS
as
declare @tablename varchar(255)
declare @tableowner varchar(255)
declare @tablename_header varchar(600)
declare @sql varchar(600)

-- one row per user table, together with its owner
declare tnames_cursor CURSOR FOR
    select 'tablename'=so.name,
           'tableowner'=su.name
    from dbo.sysobjects so
    inner join dbo.sysusers su on so.uid = su.uid
    where so.type = 'U'

open tnames_cursor
fetch next from tnames_cursor into @tablename, @tableowner
while (@@fetch_status <> -1)
begin
    -- -2 means the row disappeared since the cursor was opened
    if (@@fetch_status <> -2)
    begin
        select @tablename_header = '***** Updating ' + rtrim(upper(@tablename)) + ' (' +
            convert(varchar, getdate(), 20) + ') *****'
        print @tablename_header
        -- quotename() keeps names containing spaces or reserved words valid
        select @sql = 'update statistics ' + quotename(@tableowner) + '.' + quotename(@tablename)
        exec ( @sql )
    end
    fetch next from tnames_cursor into @tablename, @tableowner
end
print ''
print ''
print '***** Statistics has been updated for all tables (' + convert(varchar,getdate(),20) + ') *****'
close tnames_cursor
deallocate tnames_cursor
Tables
The following tables will be created in tempdb and dropped when the script finishes. Note that if one of the tables
already exists in tempdb, it will be dropped without any notification:
temporary_table_directory
temporary_table_db
temporary_table_dbsize
temporary_table_sproc
Logs
Each step in every scheduled job will generate a log-file in the default SQL-server installation folder "LOG". The
name convention is the name of the step followed by the file extension ".LOG". All white spaces in the name are
replaced with an underscore "_". All the steps are set to overwrite any existing logfile with the same name. The
easiest way to access the correct log is to right-click the job, select the Steps tab. Double-click the desired step
and select the Advanced tab and click the button View.
Do only those backups exist on disk that should exist according to the values that @keep_databasebackup_days
and @keep_transactionlog_days were set to?
Re-schedule jobs
Since the variable @backup_mb_per_sek was an estimation, you might have to re-schedule some of the jobs, if
you feel that it's not OK for some of the jobs to conflict with each other.
SUMMARY
You should of course customize the jobs created, or the script, to meet your company's needs. I have, as
mentioned before, stripped the script of everything that I do not find useful for everyone. You could, for example,
set the jobs to notify you by email or net send (Notification tab in scheduled job properties).
Note that the script uses the undocumented extended stored procedures "xp_regread" and "xp_fileexist". The
article by Alexander Chigrik explains these two procedures along with some other
undocumented extended stored procedures.
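For orientation, the two procedures are commonly called along the following lines. Because they are undocumented, treat the parameter lists as an assumption based on common usage rather than a guaranteed interface; the registry key shown is the one a default SQL Server 7/2000 instance normally uses, and the backup file path is hypothetical.

```sql
-- Sketch only: undocumented procedures, behaviour may vary between versions.

-- Read the SQL Server installation path from the registry
DECLARE @sqlpath varchar(255)
EXEC master.dbo.xp_regread
     'HKEY_LOCAL_MACHINE',
     'SOFTWARE\Microsoft\MSSQLServer\Setup',
     'SQLPath',
     @sqlpath OUTPUT
SELECT @sqlpath AS SqlInstallPath

-- Check whether a file exists (1 = yes, 0 = no)
DECLARE @exists int
EXEC master.dbo.xp_fileexist 'C:\Backup\MyDatabase_BACKUP_20030718.BAK', @exists OUTPUT
SELECT @exists AS FileExists
```

The account running these calls needs read permission on the registry key, which matches the requirement stated for @workdir earlier.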
Creating a PDF from a Stored Procedure
M Ivica
8/26/2003
This article explains how to create a stored procedure that will in turn create a simple column-based report in PDF
without using any external tools or libraries (and their associated licensing costs!).
SQL2PDF makes a PDF report from text inserted in the table psopdf ( nvarchar(80) ). First, a table named psopdf
should be created.
After that, create the stored procedure SQL2PDF. The table psopdf then has to be filled with your data, as shown in
the examples below. Finally, the stored procedure is called with the file name only (no extension).
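The setup steps can be sketched as follows. The table name psopdf, its single nvarchar(80) column and the column name code (used in the second example) come from the article; the inserted text lines and the file name demo_sketch are placeholders, and the final call assumes the SQL2PDF procedure has already been created.

```sql
-- Table that holds the report, one row per printed line
CREATE TABLE psopdf (code nvarchar(80))
GO

-- Placeholder report lines (hypothetical content)
INSERT psopdf (code) VALUES (N'Monthly Report')
INSERT psopdf (code) VALUES (N'Line 1 of the report body')
INSERT psopdf (code) VALUES (N'Line 2 of the report body')

-- File name without extension; requires the sql2pdf procedure to exist
EXEC sql2pdf 'demo_sketch'
```

Since each row is limited to 80 characters, longer values have to be trimmed or split before they are inserted.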
EXAMPLE 1:
After INSERT call the stored procedure with file name demo2.
EXEC sql2pdf 'demo2'
The result is in your C:\ directory.
EXAMPLE 2:
The second example uses the pubs database.
USE pubs
INSERT psopdf(code)
SELECT t1.au_lname + ' ' + t1.au_fname + ' ' + t1.phone + ' ' + t1.address + ' ' +
       t1.city + ' ' + t1.state + ' ' + t1.zip
FROM authors t1, authors t2
After INSERT call the stored procedure with file name demo1.
EXEC sql2pdf 'demo1'
The result is in your C:\ directory.
A simple task, I thought, but it took me to some interesting places. The method is broadly this:
1) Create an instance of SQL-DMO SQL Server, and use the script method to save the create table text in a file.
2) Get the text from the file into a sp variable.
3) Delete the text file.
Here are the details of the method, and a summary which puts it all together:
1) Create an instance of SQL-DMO SQL Server, and use the script method to save the create table text in a file.
Here's the usage:
exec run_script 'my_server', 'my_database', 'my_table', 74077, 'my_path_name'
set @is_error = 0
--connect to sql server using windows nt and verify the connection
EXEC @i = sp_OASetProperty @object, 'LoginSecure', 1
IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object
EXEC @i = sp_OAMethod @object, 'Connect', NULL, @server
IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object
EXEC @i = sp_OAMethod @object, 'VerifyConnection', @return OUT
IF NOT @i = 0 EXEC sp_OAGetErrorInfo @object
return @is_error
GO
2) Get the text from the file into a sp variable. My first try was to use the
FileSystemObject...
CREATE proc get_from_file @file_output varchar(8000) output,
@path_name varchar(200) as
--outputs all the text of a file concatenated into a single string.
--Note - 255 character limitation.
DECLARE @fso int
DECLARE @ts int
DECLARE @i int
set nocount on
--get_unique_name for temporary table
declare @unique_table_name varchar(100)
exec get_unique_name @unique_table_name output
set @unique_table_name = '##' + @unique_table_name
--create concatenated string and puts it into the table
exec('create table #t1 (c1 varchar(8000))
output
--drop our temporary table
exec ('drop table ' + @unique_table_name)
select @output =
replace(system_user, '\', '_') + '_' +
cast(datepart(yyyy, getdate()) as varchar(4)) + '_' +
cast(datepart(mm, getdate()) as varchar(2)) + '_' +
cast(datepart(dd, getdate()) as varchar(2)) + '_' +
cast(datepart(hh, getdate()) as varchar(2)) + '_' +
cast(datepart(mi, getdate()) as varchar(2)) + '_' +
cast(datepart(ss, getdate()) as varchar(2)) + '_' +
cast(datepart(ms, getdate()) as varchar(3))
GO
3) Delete the text file. This uses a familiar method. This time there are no limitations. Here's the usage...
select @object_text
--outputs a create table script for a sql table. To do this, it runs a script to
--put it into a file, then gets it from the file and deletes the file
--create the 'create table' script and put it into sql file
exec @return = run_script @server, @database_name, @table_name, 74077, @path_name
Introduction
Datetime values always give database developers headaches because of the many forms they can take. Here is
another Datetime problem. I have given a solution, and it is open to discussion. I would highly appreciate it if
you shared your ideas on it and your own solutions.
So we need a solution to identify the actual time. One way of doing this is by keeping the location with each
record; then we can always work out the actual time. But, as you can imagine, that would be a tedious task. What if
we keep a common time for all of them? We can store every Datetime in Coordinated Universal Time (UTC), better
known as Greenwich Mean Time (GMT).
As far as end users are concerned, it will be difficult for them to deal with GMT, as they are already used to their own
system time. So the option is to display the system time, converting it to GMT when storing to the
database and converting it back when reading from the database.
If you are saving the current time (GetDate()), you now have to save the current GMT time (GetUTCDate()) instead.
If you are saving a user-defined time, such as the reservation time of a guest, then you must convert this time to GMT.
The following stored procedure will convert the current time to GMT.
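A minimal sketch of such a conversion is shown below. It is written as a plain batch rather than as the author's stored procedure, and the variable names and sample reservation time are hypothetical. It derives the offset from the server's own clock, so it assumes that the value being converted is in the server's time zone and that the offset in force right now also applied at that time (which is not true across a daylight saving change).

```sql
-- Sketch only: uses the server's current offset from UTC.
DECLARE @offsetMinutes int, @localTime datetime, @utcTime datetime

-- Minutes to add to a local time to get UTC (negative east of Greenwich)
SET @offsetMinutes = DATEDIFF(mi, GETDATE(), GETUTCDATE())

-- A user-defined local time, e.g. a guest's reservation time
SET @localTime = '2003-11-21 14:30:00'
SET @utcTime   = DATEADD(mi, @offsetMinutes, @localTime)

SELECT @localTime AS LocalTime,
       @utcTime   AS UtcTimeToStore,
       -- converting back for display when reading from the database
       DATEADD(mi, -@offsetMinutes, @utcTime) AS LocalTimeAgain
```

GETUTCDATE() is available from SQL Server 2000 onward. Times entered by users in other time zones would need their own offset passed in from the client.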
Conclusion
The next version of SQL Server needs to offer more functions for handling time zones. That would make SQL Server
more user friendly for end users and DBAs alike, and would make the DBA's job much easier.
Find Min/Max Values in a Set
Dinesh Asanka
11/21/2003
In Oracle Magazine there was a discussion about finding the Nth Max or Min value from a value set. After three
issues of the magazine, it arrived at the following query as the solution to the problem.
Select Min(Col1) From
(Select Col1 From (Select Distinct Col1 From Tab1 Order By Col1 Desc)
Where RowNum <=&N)
I was trying to do the same with SQL Server, but I found that there is no column called ROWNUM in
SQL Server. I then posted the question on the discussion board of SQLServerCentral.com; you can find it in the
thread named RowNum Function in SQLServer. After studying this discussion I felt that there is no direct method of
doing it like in Oracle. This might be included in the next version of SQL Server!
I wrote the query below to get the result set ordered from minimum to maximum, along with a sequence number.
select rank = count(*), s1.number
from (select distinct(number) from NMaxMin) s1,
     (select distinct(number) from NMaxMin) s2
where s1.number >= s2.number
group by s1.number
order by 1
After running the query the output will be like in Figure 2.
Now you can see there are only 11 records. (Previously there were 14 records; the difference arises because there
are 2 records of 1’s and 3 records of 45’s.) From the above table it is now easy to find the Nth maximum. If
you want the 5th maximum value, the query will be:
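One way to write such a query, reusing the self-join idea from the ranking query, is sketched here. It is not necessarily the author's original; it assumes the same NMaxMin table with a number column. Reversing the comparison ranks the distinct values from the top down, and HAVING keeps the row whose rank is 5.

```sql
-- 5th largest distinct value: count how many distinct values are
-- greater than or equal to each value and keep the one with a count of 5.
SELECT s1.number
FROM (SELECT DISTINCT number FROM NMaxMin) s1,
     (SELECT DISTINCT number FROM NMaxMin) s2
WHERE s1.number <= s2.number
GROUP BY s1.number
HAVING COUNT(*) = 5
```

Changing the comparison back to >= gives the 5th minimum instead. If the table holds fewer than five distinct values, the query returns no rows.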
I recently had the basic need to retrieve a record from the database at random. What seemed to be an easy task
quickly became a complex one. This case showed an interesting quirk with T-SQL that was resolved in an equally
quirky way. This quick article shows you a method to retrieve random data or randomize the display of data.
Why would you ever want to retrieve random data?
• In my case, I wanted to pull a random article to display on this site’s homepage
• Choose a random user to receive a prize
• Choose a random employee for a drug test
The problem with retrieving random data using the RAND() function is how it’s actually evaluated in the query. For
example, if you run the query below against the Northwind database, you will see the same
random value and date for each row in the results.
SELECT TOP 3 RAND(), GETDATE(), ProductID, ProductName
FROM Products
Results:
0.54429273766415864 2003-03-19 15:06:27.327 17 Alice Mutton
0.54429273766415864 2003-03-19 15:06:27.327 3 Aniseed Syrup
0.54429273766415864 2003-03-19 15:06:27.327 40 Boston Crab Meat
This behavior prohibits the obvious way to retrieve random data by using a query like this:
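A query of the following shape shows the problem. It is a sketch whose column list is assumed from the results that follow; because RAND() is evaluated once for the whole statement, the ORDER BY sorts on a constant and does not shuffle anything.

```sql
-- RAND() yields one value for the entire query, so this does not randomize
SELECT TOP 3 ProductID, ProductName
FROM Products
ORDER BY RAND()
```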
Results in:
ProductID ProductName
----------- ----------------------------------------
17 Alice Mutton
3 Aniseed Syrup
40 Boston Crab Meat
If you execute this query over and over again, you should see the same results each time. The trick then is to use
a system function that doesn’t behave this way. The newid() function is a system function used in
replication that produces a Globally Unique Identifier (GUID). You can see in the following query that it produces
a unique value for each row.
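A query along these lines demonstrates it. This is a sketch with the column list assumed from the results that follow; the GUID values differ on every execution.

```sql
-- NEWID() is evaluated once per row, unlike RAND()
SELECT TOP 3 NEWID(), ProductID, ProductName
FROM Products
```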
Results in:
                                     ProductID   ProductName
------------------------------------ ----------- ----------------------------------------
8D0A4758-0C90-49DC-AF3A-3FC949540B45 17          Alice Mutton
E6460D00-A5D1-4ADC-86D5-DE8A08C2DCF0 3           Aniseed Syrup
FC0D00BF-F3A2-4341-A584-728DC8DDA513 40          Boston Crab Meat
You can also execute the following query to randomize your data (TOP clause optional):
SELECT TOP 1 ProductID, ProductName
FROM products
ORDER BY NEWID()
Results in:
ProductID ProductName
----------- ----------------------------------------
7 Uncle Bob's Organic Dried Pears
Each time you fire it off, you should retrieve a different result. There’s also an additional way to actually use the
rand() function that Itzik Ben-Gan has discovered using user defined functions and views as a workaround. The
secret there is to produce a view that uses the rand() function as shown below:
CREATE VIEW VRand
AS
SELECT RAND() AS rnd
GO
Then create a user defined function (only works in SQL Server 2000) that selects from the view and returns the
random value.
CREATE FUNCTION dbo.fn_row_rand() RETURNS FLOAT
AS
BEGIN
RETURN (SELECT rnd FROM VRand)
END
To use the function, you can use syntax as shown below to retrieve random records.
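For instance, a query of this shape uses the function to pick one random product. It is a sketch that assumes the VRand view and the dbo.fn_row_rand() function defined above have been created in the Northwind database.

```sql
-- The UDF is invoked for each row, so every row gets its own random sort key
SELECT TOP 1 ProductID, ProductName
FROM Products
ORDER BY dbo.fn_row_rand()
```

The function can also be placed in the select list to return a per-row random number alongside the data.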
How many times have you said this to yourself (or to your boss/colleagues)? How often have others said this to
you? The fact is that, while it’s true that T-SQL has its limitations when compared to a “real” programming
language like C++, you might be amazed at the tricks you can pull off with T-SQL if you’re willing to do some
digging and trial-and-error experimentation.
Such an opportunity to push the limits of T-SQL came to me just yesterday at my job. I work at a talent agency,
and one of the databases I manage stores information relating to performing artists’ and bands’ live performances
(or “personal appearances” as it’s called in the industry). A personal appearance consists of such attributes as
venue locations, venue capacities, show dates, and count of tickets sold (on a per-day as well as on an aggregate
basis).
The users wanted a Crystal Report that would display some basic header information about an artist’s or band’s
upcoming shows. They also wanted one of the columns to display a carriage-return-delimited list (one item per
line) of ticket count dates, along with (for each date shown) the total number of tickets sold for that date and the
number sold on that particular day. The original spec called for a list of the last five ticket-count dates, with the
intent that it would look something like this:
10/16/03 - 1181 (19)
10/15/03 - 1162 (14)
10/14/03 - 1148 (28)
10/13/03 - 1120 (9)
10/10/03 - 1111 (10)
The number to the immediate right of each date represents the total number sold to that date, and the number in
parentheses represents the number sold on that day. The dates are in descending order, counting downward from
the most recent ticket-sale-count date. You can see, for example, that on 10/16/2003, 19 tickets were sold,
bringing the total up from the previous day's 1162 to 1181.
I assumed that this would be fairly simple with SQL Server 2000, but it actually proved to be more complex than
I thought. I wanted my solution to be set-oriented, if at all possible, avoiding cursors and loops. Luckily I was able
to create a UDF to perform this task, as the following code shows:
CREATE FUNCTION GetLast5TicketCounts
(@ShowID INT, @ClientName VARCHAR(200))
RETURNS VARCHAR(8000)
AS
BEGIN
    -- accumulates the delimited list, one item per ticket-count date
    DECLARE @Counts VARCHAR(8000)

    SELECT TOP 5 @Counts = ISNULL(@Counts + '<BR><BR><BR>', '') +
        CONVERT(VARCHAR(100), CountDate, 101) + ' - ' + CAST(TicketCount AS VARCHAR(10)) +
        ISNULL(' (' +
        CAST(TicketCount - ISNULL( (SELECT MAX(tc2.TicketCount)
                                    FROM vTicketCount tc2
                                    WHERE tc2.ShowID = @ShowID
                                    AND tc2.ClientName = @ClientName
                                    AND tc2.CountDate < tc1.CountDate), 0) AS VARCHAR(100)) + ')', '')
    FROM vTicketCount tc1
    WHERE ShowID = @ShowID
    AND ClientName = @ClientName
    ORDER BY CountDate DESC

    RETURN @Counts
END
As you can see, this took quite a bit of coding! Note the use of the inner subquery, whose result (the previous
day’s total to-date ticket count) is subtracted from the current day’s total to-date count in order to obtain the ticket
count for the current day only. If the date in question happens to be the first date of ticket sales (meaning that
there is no previous ticket-count date), then ISNULL forces it to return 0, and nothing is subtracted.
Also note the use of the HTML <BR> tags in the code in order to force a carriage-return/line-break after each item.
The reason for this was that the T-SQL function CHAR(13) doesn’t seem to work with fields in Crystal Reports –
but, Crystal fields can be set to be HTML-aware. Thus I make liberal use of HTML tags when I’m coding queries
that are to be used in reports. (For some reason, I find that I need to use three <BR> tags in order to effect a line
break).
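The subtraction performed by the inner subquery can be seen in isolation with the sample figures quoted earlier. The sketch below uses a table variable with hypothetical names in place of the vTicketCount view and returns one row per date instead of building a delimited string.

```sql
-- Sample data taken from the list shown earlier (running totals per date)
DECLARE @t TABLE (CountDate datetime, TicketCount int)
INSERT INTO @t VALUES ('20031010', 1111)
INSERT INTO @t VALUES ('20031013', 1120)
INSERT INTO @t VALUES ('20031014', 1148)
INSERT INTO @t VALUES ('20031015', 1162)
INSERT INTO @t VALUES ('20031016', 1181)

-- Tickets sold on each day = running total minus the previous running total
SELECT CONVERT(varchar(10), tc1.CountDate, 101) AS CountDate,
       tc1.TicketCount,
       tc1.TicketCount - ISNULL((SELECT MAX(tc2.TicketCount)
                                 FROM @t tc2
                                 WHERE tc2.CountDate < tc1.CountDate), 0) AS SoldThatDay
FROM @t tc1
ORDER BY tc1.CountDate DESC
```

For 10/16/2003 this gives 1181 - 1162 = 19, matching the list. For 10/10/2003 it gives 1111, because this sample has no earlier row and ISNULL substitutes 0; in the real data an earlier count exists, which is why the report shows 10 there.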
I incorporated this UDF into the stored procedure that drives the report and assumed that my work was done.
Then my boss informed me that the users are probably not going to want to be limited to seeing just the last five
ticket-count dates; they will probably want to specify how many “last ticket-count” dates to show! Uh oh. I knew
that this would require dynamic SQL if I were to use the same basic code layout (in a TOP n clause, you can’t use
a variable as n; you have to write something like EXEC('SELECT TOP ' + @variable + ' ...')). I also knew that you
cannot use dynamic SQL in a UDF, nor can you use the SET ROWCOUNT n statement there. So I explained to my boss
that the users’ request was probably not possible with SQL Server, but I would do my best to find a way.
With a little experimentation, I discovered that this operation (allowing the user to specify the number of records to
return) could indeed be performed in a UDF – but it required coding a WHILE loop, something I was trying to avoid
(I try to stick with pure set-oriented operations as much as possible – WHILE loops and cursors are terribly
inefficient in T-SQL as compared with set-oriented solutions). Here is the code I came up with:
CREATE FUNCTION GetLastNTicketCounts (@ShowID INT, @ClientName VARCHAR(200),
@NumCounts INT)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Counts VARCHAR(8000)
    DECLARE @Counter INT
    -- the IDENTITY column numbers the rows in the order they are inserted
    DECLARE @t TABLE
    (ID INT IDENTITY, TicketCount INT, CountDate DATETIME)

    INSERT INTO @t
    SELECT *
    FROM ( SELECT TOP 100 PERCENT TicketCount, CountDate
           FROM vTicketCount
           WHERE ShowID = @ShowID
           AND ClientName = @ClientName
           ORDER BY CountDate DESC
         ) t

    -- step through the first @NumCounts rows, appending one item per pass
    SET @Counter = 1
    WHILE @Counter <= @NumCounts
    BEGIN
        SELECT @Counts = ISNULL(@Counts + '<BR><BR><BR>', '') +
            CONVERT(VARCHAR(100), CountDate, 101) + ' - ' +
            CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' +
            CAST(TicketCount - ISNULL( (SELECT MAX(tc2.TicketCount) FROM @t tc2
            WHERE tc2.CountDate < tc1.CountDate), 0) AS VARCHAR(100)) + ')', '')
        FROM @t tc1
        WHERE ID = @Counter
        SET @Counter = @Counter + 1
    END

    RETURN @Counts
END
Note the use of a table variable to store the results (ordered in descending order of the count date and including
an IDENTITY column for convenience in incrementally stepping through the data in the loop). Since we are only
dealing with a small amount of data (it’s unlikely that the report will contain more than 50 records or that the user
will opt to see more than 10 count dates per record), the addition of the loop did not cause any real performance
hit.
Feeling like a hero as I presented this to my boss, I then got the bomb dropped on me when I was told that the
user would not only want to see n number of last ticket-sale dates – they would also want to see (in the same field)
n number of first ticket-sale dates, too (that is, dates counting up from the first date of ticket sales)! Oh, and could
I also make sure a neat little line appears separating the set of “last” dates from the set of “first” dates?
I knew I was in trouble on this one, because, in order for this to work, the first part of the query (which would return
the “first” set of dates) would need to have the data sorted in ascending order by date in order to work properly –
just like the second part of the query (which returns the “last” dates) would need the data sorted in descending
order by date. After much experimentation, attempting to coalesce the two resultsets together into the @Counts
variable via adding a second WHILE loop pass (and ending up each time with the dates in the wrong order and
hence inaccurate count figures – even though I was using ORDER BY), I discovered that I could get around this
by declaring two separate table variables – each sorted ascendingly or descendingly as the case required.
Since I already had one half of the equation figured out (how to display the last n ticket count dates/figures), I only
needed to “reverse” my logic in order to display the first n dates/figures. Rather than combining both operations
into one monster UDF, I decided to create a second UDF to handle returning the “first” dates, and then
concatenate the results of each together in the main stored procedure. Here is the code, which as you can see is
nearly identical to that of the previous UDF; the key difference is that the rows are sorted in ascending order of count date:
CREATE FUNCTION GetFirstNTicketCounts
(@ShowID INT, @ClientName VARCHAR(200), @NumCounts INT = NULL)
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Counts VARCHAR(8000)
    DECLARE @Counter INT
    DECLARE @t TABLE
    (ID INT IDENTITY, TicketCount INT, CountDate DATETIME)

    INSERT INTO @t
    SELECT *
    FROM (SELECT TOP 100 PERCENT TicketCount, CountDate
          FROM vTicketCount
          WHERE ShowID = @ShowID
          AND ClientName = @ClientName
          ORDER BY CountDate ASC
         ) t

    -- step through the first @NumCounts rows, appending one item per pass
    SET @Counter = 1
    WHILE @Counter <= @NumCounts
    BEGIN
        SELECT @Counts = ISNULL(@Counts + '<BR><BR><BR>', '') +
            CONVERT(VARCHAR(100), CountDate, 101) + ' - ' +
            CAST(TicketCount AS VARCHAR(10)) + ISNULL(' (' +
            CAST(TicketCount - ISNULL( (
                SELECT MAX(tc2.TicketCount)
                FROM @t tc2
                WHERE tc2.CountDate < tc1.CountDate
            ), 0) AS VARCHAR(100)) + ')', '')
        FROM @t tc1
        WHERE ID = @Counter
        SET @Counter = @Counter + 1
    END

    RETURN @Counts
END
One other minor issue remained: how to display the separator line between the “first” dates and the “last” dates.
This line should only be displayed if the user has opted to display both the “first” count dates and the “last” count
dates (there would be no need for a separator line if only one set of count dates were being displayed, or if no
dates at all were being displayed). I added the following code to the stored procedure:
DECLARE @LineBreak VARCHAR(100)
SET @LineBreak =
CASE
WHEN ISNULL(@FCNT, 0) = 0 OR ISNULL(@LCNT, 0) = 0
THEN ''
ELSE '<BR><BR><BR>----------------------------<BR><BR><BR>'
END
Note that the @FCNT and @LCNT variables represent the number of “first” and “last” ticket count dates to
display, respectively. I then added this line of code to the SELECT portion of the procedure to concatenate it all
together:
NULLIF(
MAX(dbo.GetFirstNTicketCounts(tc.ShowID, @CN, @FCNT)) + @LineBreak +
MAX(dbo.GetLastNTicketCounts(tc.ShowID, @CN, @LCNT)), @LineBreak
) AS TicketCount
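The interplay between @LineBreak and NULLIF is easier to see with literal values. The following is only a sketch: @First and @Last are hypothetical stand-ins for the values returned by the two UDFs, and it assumes each UDF returns an empty string (rather than NULL) when no dates were requested.

```sql
-- Sketch only: @First and @Last stand in for the results of
-- dbo.GetFirstNTicketCounts and dbo.GetLastNTicketCounts, which are
-- assumed here to return '' (not NULL) when no dates are requested.
DECLARE @FCNT INT, @LCNT INT
DECLARE @First VARCHAR(8000), @Last VARCHAR(8000), @LineBreak VARCHAR(100)

SET @FCNT = 2
SET @LCNT = 0
SET @First = '01/05/2003 - 120 (120)<BR><BR><BR>01/06/2003 - 150 (30)'
SET @Last = ''

SET @LineBreak =
    CASE
        WHEN ISNULL(@FCNT, 0) = 0 OR ISNULL(@LCNT, 0) = 0 THEN ''
        ELSE '<BR><BR><BR>----------------------------<BR><BR><BR>'
    END

-- Only one set was requested, so @LineBreak is '' and no separator is shown.
-- Had both sets been empty, the concatenation would equal @LineBreak ('')
-- and NULLIF would turn the whole column into NULL.
SELECT NULLIF(@First + @LineBreak + @Last, @LineBreak) AS TicketCount
```

Changing @LCNT to a non-zero value and @Last to a non-empty string shows the separator appearing between the two sets.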
Here is the entire code of the resulting stored procedure:
AS
SET @LineBreak =
CASE
WHEN ISNULL(@FCNT, 0) = 0 OR ISNULL(@LCNT, 0) = 0
THEN ''
ELSE '<BR><BR><BR>----------------------------<BR><BR><BR>'
END
SELECT ClientName,
CONVERT(VARCHAR, ShowDate, 101) AS ShowDate,
VenueName,
Contact,
Phone,
VenueCityState,
Capacity,
CAST((MAX(TicketCount) * 100) / Capacity AS VARCHAR(10)) + '%' AS PctSold,
NULLIF(
MAX(dbo.GetFirstNTicketCounts(tc.ShowID, @CN, @FCNT)) + @LineBreak +
MAX(dbo.GetLastNTicketCounts(tc.ShowID, @CN, @LCNT)), @LineBreak
) AS TicketCount
FROM vTicketCount tc -- a view joining all the relevant tables together
LEFT JOIN PATC_Contacts c ON tc.ShowID = c.ShowID
WHERE ClientName = @CN
AND (ShowDate >= GETDATE() OR ISNULL(@FDO, 0) = 0)
GROUP BY ClientName,
tc.ShowID,
ShowDate,
VenueName,
Contact,
Phone,
VenueCityState,
Capacity
Result?
The report now returns exactly the data that the users wanted (including the “neat little line break”), while still
performing efficiently! Here is a partial screenshot showing a few columns of the Crystal Report (run with the user
opting to see the first 5 and last 5 count dates). Notice the far left-hand column:
The moral of this story is: I’ve learned to not be so quick to “write off” a programming challenge as being beyond
the scope of T-SQL. I’ve learned not to palm coding tasks off onto the front-end developers without thoroughly
experimenting to see if, by any possible way, the task can be performed efficiently on the database server. And in
the process I’ve significantly minimized (if not eliminated altogether) those instances in which I’m tempted to raise
my hands in frustration and declare those dreaded six words:
“That can’t be done in SQL.”
Having had the honor of working for quite a few companies that did not have the resources to buy any of the nice SQL
Server toys that exist out there, or were unwilling to put an email client on the servers, I have found myself spending
a good deal of time each morning checking the status of the numerous jobs running on my servers. Not a hard
thing to accomplish, but very time consuming when you are talking about dozens of servers with hundreds of jobs.
Maybe it was just me, but no matter how much I pleaded at some of these companies, they would not go through the
red tape to get an email client put on the SQL Servers so I could use the job notification ability to send me a nice
email each morning if a particular job failed. Being the poor company's DBA, I had to come up with something
else.
The one computer that usually had email abilities was my local desktop; funny how they always made sure I could
get the hundreds of emails telling me what to do each day. To solve my problem, I made use of my desktop and
created a system that checked the outcome of all the jobs across all my servers and sent me a nice little report
each morning.
The first thing I did was to connect to my local msdb database and create a table to hold the report information.
You can adjust the table how you want to since I just included the basic information.
IF OBJECT_ID('tJobReport') IS NOT NULL
DROP TABLE tJobReport
GO
CREATE TABLE tJobReport
(
lngID INTEGER IDENTITY(1,1)
,server VARCHAR(20)
,jobname VARCHAR(50)
,status VARCHAR(10)
,rundate VARCHAR(10)
,runtime CHAR(8)
,runduration CHAR(8)
)
GO
Given the nature of some of the schedules for the jobs, I felt that this would grow into a sizable table in a very
short time, so I created a clustered index to speed up data retrieval.
CREATE CLUSTERED INDEX tJobReport_clustered
ON tJobReport(server,jobname,rundate,runtime)
GO
Next, create a stored procedure that will populate your new table. This example makes use of linked servers to pull
job information and job history from each of my servers; you could change the linked server format over to
OPENDATASOURCE if you like.
--Server 1
INSERT INTO tJobReport (server, jobname, status, rundate, runtime, runduration)
SELECT sj.originating_server, sj.name,
--What is it in English
CASE sjh.run_status
WHEN 0 THEN 'Failed'
WHEN 1 THEN 'Succeeded'
WHEN 2 THEN 'Retry'
WHEN 3 THEN 'Canceled'
ELSE 'Unknown'
END,
SUBSTRING(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),3,2) + ':' +
RIGHT(RIGHT('000000' + CAST(run_time AS VARCHAR(10)),6),2)
--Job history
INNER JOIN msdb.dbo.sysjobhistory sjh
ON sj.job_id = sjh.job_id
--Server 2
INSERT INTO tJobReport (server, jobname, status, rundate, runtime, runduration)
SELECT sj.originating_server, sj.name,
--What is it in English
CASE sjh.run_status
WHEN 0 THEN 'Failed'
WHEN 1 THEN 'Succeeded'
WHEN 2 THEN 'Retry'
WHEN 3 THEN 'Canceled'
ELSE 'Unknown'
END,
RIGHT(RIGHT('000000' + CAST(run_duration AS VARCHAR(10)),6),2)
--Job history
INNER JOIN dev2.msdb.dbo.sysjobhistory sjh
ON sj.job_id = sjh.job_id
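For reference, a complete statement for a single server might look like the sketch below. It is not the author's exact code: the date and time formatting, the LEFT() truncation to fit the tJobReport column sizes, and the step_id = 0 filter (to keep only the overall job outcome row) are assumptions made for illustration.

```sql
-- Sketch for the local server; for a linked server, prefix the msdb tables
-- with the linked server name (for example dev2.msdb.dbo.sysjobs).
INSERT INTO tJobReport (server, jobname, status, rundate, runtime, runduration)
SELECT LEFT(sj.originating_server, 20),
       LEFT(sj.name, 50),
       --What is it in English
       CASE sjh.run_status
           WHEN 0 THEN 'Failed'
           WHEN 1 THEN 'Succeeded'
           WHEN 2 THEN 'Retry'
           WHEN 3 THEN 'Canceled'
           ELSE 'Unknown'
       END,
       --run_date is an integer in YYYYMMDD form; rewrite it as MM/DD/YYYY
       SUBSTRING(CAST(sjh.run_date AS CHAR(8)), 5, 2) + '/' +
       RIGHT(CAST(sjh.run_date AS CHAR(8)), 2) + '/' +
       LEFT(CAST(sjh.run_date AS CHAR(8)), 4),
       --run_time is an integer in HHMMSS form; pad it and add the colons
       STUFF(STUFF(RIGHT('000000' + CAST(sjh.run_time AS VARCHAR(10)), 6), 5, 0, ':'), 3, 0, ':'),
       --run_duration uses the same HHMMSS layout (durations over 99 hours are cut off here)
       STUFF(STUFF(RIGHT('000000' + CAST(sjh.run_duration AS VARCHAR(10)), 6), 5, 0, ':'), 3, 0, ':')
FROM msdb.dbo.sysjobs sj
--Job history
INNER JOIN msdb.dbo.sysjobhistory sjh
    ON sj.job_id = sjh.job_id
WHERE sjh.step_id = 0 --assumed filter: job outcome rows only
```

STUFF with a length of 0 inserts a character without removing any, which keeps the time formatting shorter than chained SUBSTRING calls.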
If you want an automatic email sent to you, just configure SQL Mail on your desktop and create a new job or new
job step that uses the xp_sendmail system stored procedure to run a basic query.
EXEC master.dbo.xp_sendmail @recipients = 'randydyess@transactsql.com',
@message = 'Daily Job Report',
@query = '
SELECT status,server, jobname
FROM msdb.dbo.tJobReport
WHERE status = ''Failed''
AND rundate > DATEADD(hh,-25,GETDATE())',
@subject = 'Job Report',
@attach_results = 'TRUE'
So, if you have the same bad luck in getting those great tools out there, or want a centralized way to keep in
control of your job outcomes and history, this simple technique can go a long way in helping you quickly manage
those hundreds of jobs we all seem to accumulate over time.
You can find out more about sysjobs, sysjobhistory and xp_sendmail in my last book, Transact-SQL Language
Reference Guide.
Copyright 2003 by Randy Dyess, All rights Reserved
www.TransactSQL.Com
Multiple Table Insert
Narayana Raghavendra
11/18/2003
Suppose you want to insert data into more than one table, and you want to attach conditions to any or all of the
tables that participate as destinations in the multi-table insert. This stored procedure can insert rows into any
number of tables based on the source table, with or without conditions.
SP Script
CREATE PROCEDURE SP_MULTI_INSERTS
(@SUB_QUERY AS VARCHAR(2000),
@INSERT_PART AS VARCHAR(2000),
@DELIMITER AS VARCHAR(100),
@ERRORMESSAGE AS VARCHAR(2000) OUTPUT
)
AS
--VARIABLES DECLARATION
DECLARE @SAND AS VARCHAR(10)
DECLARE @SSTR AS VARCHAR(2000)
DECLARE @SSTR2 AS VARCHAR(2000)
DECLARE @SSTR3 AS VARCHAR(2000)
DECLARE @SSQL AS VARCHAR(2000)
DECLARE @SUB_QUERY2 AS VARCHAR(2000)
IF LEN(@INSERT_PART) = 0 OR LEN(@SUB_QUERY) = 0
BEGIN
SET @ERRORMESSAGE = 'INCOMPLETE INFORMATION'
RETURN -1
END
SET @LASTPOS = 0
SET @SAND = ' '
--CHECK WHETHER SUBQUERY I.E. SOURCE DATA QUERY HAS WHERE CONDITION
IF CHARINDEX(' WHERE ', @SUB_QUERY) > 0
BEGIN
IF CHARINDEX(' WHERE ', @SUB_QUERY) > CHARINDEX(' FROM ', @SUB_QUERY)
SET @SAND = ' AND '
END
ELSE
SET @SAND = ' WHERE '
SET @SSTR = SUBSTRING(@INSERT_PART, @LASTPOS2, @LASTPOS-@LASTPOS2)
--CHECK WHETHER 'WHERE' CONDITION REQUIRED FOR INSERT SQL
IF LEFT(@SSTR, 5) = 'WHEN '
BEGIN
SET @SUB_QUERY2 = @SUB_QUERY + @SAND + SUBSTRING(@SSTR, 5, 2001)
SET @LASTPOS2 = @LASTPOS
SET @LASTPOS3 = CHARINDEX(@DELIMITER, @INSERT_PART, @LASTPOS+LEN
(@DELIMITER))
IF @LASTPOS3 = 0
SET @SSTR = SUBSTRING(@INSERT_PART, @LASTPOS2+LEN(@DELIMITER), 2001)
ELSE
SET @SSTR = SUBSTRING(@INSERT_PART, @LASTPOS2+LEN(@DELIMITER),
@LASTPOS3 - (@LASTPOS2+LEN(@DELIMITER)))
SET @LASTPOS = @LASTPOS3
END
ELSE
BEGIN
SET @SUB_QUERY2 = @SUB_QUERY
END
END
--LOOP ENDS
Parameters
@DELIMITER Delimiter value that delimits multiple inserts and where conditions
@ErrorMessage [INPUT/OUTPUT Parameter] Any error during the SP execution.
Returns
Algorithm
a) Accepts parameters for Source dataset, destination table with/without conditions, and the delimiter string that
delimits the table, column names and where conditions.
b) Check the parameters passed, if the information is improper or incomplete, return error.
c) Check whether the subquery i.e. source data set has the where condition in the Query, this is to identify whether
to add "And" or "Where" as condition if the user has given any conditions in Source sub query itself.
d) Loop till the insertion of Rows into destination tables is completed.
• Get the sub string of the multiple table insertion string by using the delimiter. The character position of the
delimiter is recorded in a variable; later it is used to find the next delimiter to extract either the "When" or
the "Into" sub string.
• If the extracted sub string starts with 'When ' that means user is giving a filter condition while inserting
rows into that particular table. Include that filter condition to the source dataset query.
• The next delimited part contains the column name and value list that needs to be inserted into a table.
Manipulate the Destination table parameter to construct an "Insert" SQL statement.
• Execute the constructed Insert statement, and check for errors.
• Exit the loop if the insertion to multiple tables finished the last insertion.
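The delimiter handling in step d) can be sketched on its own, without executing any inserts. This is only an illustration: the '||' delimiter, the sample @INSERT_PART value, and the variable names @POS, @NEXT and @PART are assumptions, not taken from the procedure.

```sql
-- Sketch: walk a delimited string and classify each part as a
-- "When" filter or an "Into" target. Values below are hypothetical.
DECLARE @INSERT_PART VARCHAR(2000), @DELIMITER VARCHAR(100)
DECLARE @POS INT, @NEXT INT, @PART VARCHAR(2000)

SET @DELIMITER = '||'
SET @INSERT_PART = 'WHEN EmployeeID < 5||INTO Employees1 (LastName, FirstName)'
                 + '||WHEN EmployeeID > 4||INTO Employees2 (LastName, FirstName)'
SET @POS = 1

WHILE @POS <= LEN(@INSERT_PART)
BEGIN
    -- Find the next delimiter; if there is none, take the rest of the string
    SET @NEXT = CHARINDEX(@DELIMITER, @INSERT_PART, @POS)
    IF @NEXT = 0
        SET @NEXT = LEN(@INSERT_PART) + 1

    SET @PART = SUBSTRING(@INSERT_PART, @POS, @NEXT - @POS)

    IF LEFT(@PART, 5) = 'WHEN '
        PRINT 'filter: ' + SUBSTRING(@PART, 6, 2000) -- condition for the next insert
    ELSE
        PRINT 'target: ' + @PART                      -- destination table and columns

    -- Continue after the delimiter just consumed
    SET @POS = @NEXT + LEN(@DELIMITER)
END
```

Note that LEN ignores trailing spaces, so a delimiter that ends in a space would need DATALENGTH instead.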
Base logic in SP
Inserting Rows Using INSERT...SELECT. The “Insert..Select” SQL statement is constructed from the @Insert_part
parameter with a little manipulation.
Example
This example uses the Employees table in the Northwind database. The structure (without constraints) of the
Employees table is copied to Employees1 and Employees2 to try out the procedure. The example copies the
LastName and FirstName data from the Employees table to Employees1 if the EmployeeID is less than 5, and to
Employees2 if the EmployeeID is greater than 4.
Result
(EmployeeID in the Employees1 and Employees2 tables is generated because it is an Identity column that increments by 1.)
In this example, rows will be inserted into the SalaryHistory table only when the value of the
Salary is greater than 30000 (the annual salary of the employee is more than 30,000). Rows
will not be inserted into the ManagerHistory table unless the manager ID is 200.
Usage
• To insert into multiple tables in a single shot. As the functionality is written in a Stored
Procedure, the task is performed a little faster.
• It has similar functionality to the Oracle 9i “Multi Table Insert” feature, so you can use it as an alternative if
you are migrating from Oracle 9i to MS SQL Server 2000. This SP is tuned to accept the column names of the
tables in the insert parameter, and you can give conditions to specific tables, or to all tables, that participate
in the destination part of the multi-table insert.
Note
Maintain the sequence of “When”(Optional) and “Into” part in @Insert_Part parameter with proper delimiter after
every “When” and “Into” key words.
Reusing Identities
Dinesh Priyankara
2/18/2003
In most table designs, Identity columns are used to maintain the uniqueness of records. There is no problem with
insertion and modification of data using an identity column. With deletions though, gaps can occur between
identity values. There are several ways to reuse these deleted (removed) identity values. You can find a good
solution in Books Online but I wanted to find a new way and my research ended up with a good solution. After
several comparisons, I decided to continue with my solution. So, I'd like to share my method with you all and let
you decide what solution to use.
First of all, let’s create a table called 'OrderHeader' that has three columns. Note that the first column, intID, is
an identity column.
IF OBJECT_ID('OrderHeader') IS NOT NULL
DROP TABLE OrderHeader
GO
CREATE TABLE OrderHeader
(intID int IDENTITY(1,1) PRIMARY KEY,
strOrderNumber varchar(10) NOT NULL,
strDescription varchar(100))
Now let’s add some records to the table. If you want, you can add a smaller number of records, but I added 10000
because most tables have more than 10000 records and we must always try to make our testing
environment realistic.
DECLARE @A smallint
SET @A = 1
WHILE (@A <> 10001)
BEGIN
INSERT INTO OrderHeader
(strOrderNumber,
strDescription)
VALUES
('OD-' + CONVERT(varchar(5), @A), -- Adding something for Order Number
'Description' + CONVERT(varchar(5), @A)) -- Adding something for Description
SET @A = @A + 1
END
OK. Let’s delete some randomly selected records from the table.
DELETE OrderHeader WHERE intID = 9212
DELETE OrderHeader WHERE intID = 2210
DELETE OrderHeader WHERE intID = 3200
If you now run a simple select query against the table, you will see some gaps between the intID column values.
Now it is time to find these gaps and reuse them. As I mentioned above, there are two methods (or more, if you
have already done this some other way). First let’s see the BOL example.
Method 1
DECLARE @NextIdentityValue int
Output:
NextIdentityValue
--------------------
2210
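A gap-finding query in the spirit of the Books Online example can be sketched as follows. This is a simplified illustration rather than the BOL code itself, and it assumes the lowest identity value (1) is still present, since a gap at the very start of the range is not detected.

```sql
DECLARE @NextIdentityValue int

-- Take the smallest intID whose successor is missing and add one.
-- If the table has no gaps, this yields MAX(intID) + 1.
SELECT @NextIdentityValue = MIN(t1.intID) + 1
FROM OrderHeader t1
WHERE NOT EXISTS (SELECT *
                  FROM OrderHeader t2
                  WHERE t2.intID = t1.intID + 1)

SELECT @NextIdentityValue AS NextIdentityValue
```

With the three rows deleted above, the smallest intID without a successor is 2209, so the query returns 2210.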
This is a very simple query. You can find the first deleted identity value and reuse it. But remember that you have
to set IDENTITY_INSERT ON, which allows explicit values to be inserted into the identity column.
SET IDENTITY_INSERT OrderHeader ON
Output:
NextIdentityValue
--------------------
2210
This is a very simple query too. I have used a RIGHT OUTER JOIN to join the OrderHeader table with tb_Numbers.
This join returns all rows (numbers) from the tb_Numbers table. Then I have used some search conditions
(WHERE clauses) to get the correct result set. This result set contains all missing values in the intID column. By
using TOP 1, we can get the desired result.
You can do the insertion the same way as I have done in Method 1.
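A minimal sketch of the numbers-table approach, followed by the insertion, is shown below. The layout of tb_Numbers (a single column named intNumber holding 1 to 10000) and the values inserted into the reused row are assumptions made for illustration.

```sql
-- Hypothetical helper table: one row per integer from 1 to 10000
CREATE TABLE tb_Numbers (intNumber int NOT NULL PRIMARY KEY)
GO
DECLARE @N int
SET @N = 1
WHILE @N <= 10000
BEGIN
    INSERT INTO tb_Numbers (intNumber) VALUES (@N)
    SET @N = @N + 1
END
GO

DECLARE @NextIdentityValue int

-- Every number with no matching OrderHeader row is a missing identity value;
-- TOP 1 with ORDER BY picks the smallest one.
SELECT TOP 1 @NextIdentityValue = n.intNumber
FROM OrderHeader oh
RIGHT OUTER JOIN tb_Numbers n
    ON oh.intID = n.intNumber
WHERE oh.intID IS NULL
ORDER BY n.intNumber

-- Reuse the value; explicit identity values require IDENTITY_INSERT ON
IF @NextIdentityValue IS NOT NULL
BEGIN
    SET IDENTITY_INSERT OrderHeader ON
    INSERT INTO OrderHeader (intID, strOrderNumber, strDescription)
    VALUES (@NextIdentityValue, 'OD-REUSED', 'Row reusing a freed identity value')
    SET IDENTITY_INSERT OrderHeader OFF
END
```

The column list is mandatory in the INSERT while IDENTITY_INSERT is ON.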
Now it is time to compare these two methods. I simply used STATISTICS IO and the EXECUTION TIME to get
the evaluation.
Comparison
DECLARE @StartingTime datetime, @EndingTime datetime
Print 'method1:'
SET STATISTICS IO ON
SET @StartingTime = getdate()
Print 'method2:'
SET STATISTICS IO ON
SET @StartingTime = getdate()
Output:
Method1:
2210
Table 'OrderHeader'. Scan count 9998, logical reads 20086, physical reads 0, read-
ahead reads 0.
ExecTimeInMS
------------
200
Method2:
2210
ExecTimeInMS
------------
0
As per the output, the first method needed 20086 logical reads and took 200 ms. The second method needed only
19 logical reads, and its execution time is considerably less. That’s why I chose to
continue in my way. But there may be a side that I have not seen but you can see. So, try this, and see
whether and how this T-SQL solution will suit you.
I highly appreciate your comments and suggestions. You can reach me through dinesh@dineshpriyankara.com.
Sequential Numbering
Gregory Larsen
12/5/2003
Microsoft SQL Server does not support a method of identifying the row numbers for records stored on disk,
although there are a number of different techniques to associate a sequential number with a row. You might want
to display a set of records where each record is listed with a generated number that identifies the record's
position relative to the rest of the records in the set.
The numbers might be sequential, starting at 1 and incremented by 1 for each following record, like 1, 2, 3, 4,
etc. Or in another case you may want to sequentially number groupings of records, where each specific set of
records is numbered starting at 1 and incremented by 1 until the next set is reached, where the sequence starts
over. This article will show a number of different methods of assigning a record sequence number to records
returned from a query.
create table #HireDate (rank int identity,
HireDate datetime,
LastName nvarchar(20),
FirstName nvarchar(20)
)
set nocount on
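A complete run of the temporary-table technique might look like the sketch below. The choice of Northwind.dbo.Employees as the source and HireDate as the sort column is an assumption based on the column names of #HireDate.

```sql
-- Sketch: number employees by hire date using an identity column
if object_id('tempdb..#HireDate') is not null
    drop table #HireDate

create table #HireDate ([rank] int identity,
                        HireDate datetime,
                        LastName nvarchar(20),
                        FirstName nvarchar(20)
                       )
set nocount on

-- The ORDER BY controls the order in which identity values are assigned
insert into #HireDate (HireDate, LastName, FirstName)
select HireDate, LastName, FirstName
from Northwind.dbo.Employees
order by HireDate

select [rank], HireDate, LastName, FirstName
from #HireDate
order by [rank]

drop table #HireDate
```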
alter table pubs.dbo.titles
add rownum int identity(1,1)
go
select rownum, title from pubs.dbo.titles
where rownum < 6
order by rownum
go
alter table pubs.dbo.titles
drop column rownum
Note this example first alters the table, then displays the first 5 rows, and lastly drops the identity column. This way
the row numbers are produced, displayed and finally removed, so in effect the table is left as it was prior to
running the script. The output from the above script would look like this.
rownum title
----------- ----------------------------------------------------------------
1 But Is It User Friendly?
2 Computer Phobic AND Non-Phobic Individuals: Behavior Variations
3 Cooking with Computers: Surreptitious Balance Sheets
4 Emotional Security: A New Algorithm
5 Fifty Years in Buckingham Palace Kitchens
This method works well for a small number of records, a few hundred or less. The number of rows produced by a
self join grows quickly when large sets are involved, so this technique responds slowly for large sets. This method
also does not work if there are duplicate values in the columns used in the self join. If there are duplicates,
the RecNum column will have gaps in its numbering.
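The self-join technique described above can be sketched as follows. The choice of pubs.dbo.authors and of the name columns as the ordering key is an assumption for illustration; the idea is that each row's number is the count of rows that sort at or before it.

```sql
-- Sketch: number authors by last name, then first name, using a self join
select count(*) as RecNum,
       a.au_lname,
       a.au_fname
from pubs.dbo.authors a
join pubs.dbo.authors b
    on b.au_lname < a.au_lname
    or (b.au_lname = a.au_lname and b.au_fname <= a.au_fname)
where a.au_lname < 'G'
group by a.au_lname, a.au_fname
order by RecNum
```

Because every row is compared with every row that sorts before it, the work grows roughly with the square of the row count, which is why the technique slows down on large sets.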
declare @i int
declare @name varchar(200)
declare authors_cursor cursor
for select rtrim(au_lname) + ', ' + rtrim(au_fname) from pubs.dbo.authors
where au_lname < 'G'
order by au_lname, au_fname
open authors_cursor
fetch next from authors_cursor into @name
set @i = 0
print 'recnum name'
print '------ -------------------------------'
while @@fetch_status = 0
begin
set @i = @i + 1
print cast(@i as char(7)) + rtrim(@name)
fetch next from authors_cursor into @name
end
close authors_cursor
deallocate authors_cursor
Output from the cursor query looks like this
RecNum Name
1 Bennet, Abraham
2 Blotchet-Halls, Reginald
3 Carson, Cheryl
4 DeFrance, Michel
5 del Castillo, Innes
6 Dull, Ann
Sequentially Numbering Groups of Records
Another case I have run across for sequentially numbering records is where you want to number groups of
records. Each group is numbered from 1 to N, where N is the number of records in the group, and the numbering
starts over again from 1 when the next group is encountered.
For an example of what I am talking about, let's say you have a set of order detail records for different orders,
where you want to associate a line number with each order detail record. The line number will range from 1 to N,
where N is the number of order detail records per order. The following code produces line numbers for orders in
the Northwind Order Details table.
select OD.OrderID, LineNumber, OD.ProductID, UnitPrice, Quantity, Discount
from Northwind.dbo.[Order Details] OD
join
(select count(*) LineNumber, a.OrderID, a.ProductID
from Northwind.dbo.[Order Details] A
Join Northwind.dbo.[Order Details] B
on A.ProductID >= B.ProductID
and A.OrderID = B.OrderID
group by A.OrderID, A.ProductID
) N
on OD.OrderID= N.OrderID and OD.ProductID = N.ProductID
where OD.OrderID < 10251
order by OD.OrderID, OD.ProductID
This code is similar to the prior self join example, except this code calculates the LineNumber as part of a
subquery. This way the LineNumber calculated in the subquery can be joined with the complete Order Detail
record.
Conclusion
These examples represent a number of different approaches to sequentially numbering sets of records. None of
these methods is perfect. But hopefully these methods will give you some ideas on how you might be able to
tackle your sequential record numbering issues.
If you follow the various newsgroups on Microsoft SQL Server and other user groups, you often see people
asking, ‘Is there any way to use GETDATE() inside a user defined function?’. The answer to this simple question
is NO. But there is a way to do this. In this article I will explain how you can use built-in functions inside a UDF.
As we know, SQL Server does not allow you to use a Built-in function that can return different data on each call
inside user-defined functions. The built-in functions that are not allowed in user-defined functions are:
GETDATE GETUTCDATE
NEWID RAND
TEXTPTR @@CONNECTIONS
@@CPU_BUSY @@IDLE
@@IO_BUSY @@MAX_CONNECTIONS
@@PACK_RECEIVED @@PACK_SENT
@@PACKET_ERRORS @@TIMETICKS
@@TOTAL_ERRORS @@TOTAL_READ
@@TOTAL_WRITE
If you really want to use them inside a UDF, here is the way - create a view called v_Built_in_funs and call the
view inside your UDF. Here is the example:
CREATE VIEW v_Built_in_funs AS select getdate() systemdate, @@spid spid
Call the UDF to get new objects created for the day:
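A UDF along those lines might look like the sketch below. The function name fn_NewObjectsToday and the use of sysobjects.crdate are assumptions for illustration; the point is that the function reads the current date from v_Built_in_funs instead of calling GETDATE() directly.

```sql
-- Hypothetical UDF: objects created since midnight today
CREATE FUNCTION dbo.fn_NewObjectsToday()
RETURNS TABLE
AS
RETURN (
    SELECT o.name, o.crdate
    FROM sysobjects o
    CROSS JOIN v_Built_in_funs v
    -- style 112 gives YYYYMMDD, which converts back to midnight of that day
    WHERE o.crdate >= CONVERT(CHAR(8), v.systemdate, 112)
)
GO

SELECT name, crdate
FROM dbo.fn_NewObjectsToday()
```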
When a variable is created in SQL with the DECLARE statement, it is created with no data and stored in the variable
table (vtable) inside SQL Server's memory space. The vtable contains the name and memory address of the variable.
However, when the variable is created, no memory address is allocated to it, and thus the variable is not
defined in terms of memory.
When you SET the variable it is allotted a memory address and the initial data is stored in that address. When you
SET the value again the data in the memory address pointed to by the variable is then changed to the new value.
Now for the difference and why each behaves the way it does.
“= NULL”
“= NULL” is an expression of value. Meaning, if the variable has been set and memory created for the storage of
data it has a value. A variable can in fact be set to NULL which means the data value of the objects is unknown. If
the value has been set like so:
DECLARE @val CHAR(4)
SET @val = NULL
You have explicitly set the value of the data to unknown and so when you do:
If @val = NULL
It will evaluate as a true expression. But if I do:
DECLARE @val CHAR(4)
If @val = NULL
It will evaluate to false. The reason for this is the fact that I am checking for NULL as the value of @val. Since I
have not SET the value of @val no memory address has been assigned and therefore no value exists for @val.
Note: See section on SET ANSI_NULLS (ON|OFF) due to differences in SQL 7 and 2000 defaults that cause
examples to not work. This is based on SQL 7.
“IS NULL”
Now “IS NULL” is a little trickier and is the preferred method for evaluating the condition of a variable being NULL.
When you use the “IS NULL” clause, it checks both the address of the variable and the data within the variable as
being unknown. So if I for example do:
DECLARE @val CHAR(4)
If @val IS NULL
PRINT 'TRUE'
ELSE
PRINT 'FALSE'
SET @val = NULL
If @val IS NULL
PRINT 'TRUE'
ELSE
PRINT 'FALSE'
Both outputs will be TRUE. The reason is in the first @val IS NULL I have only declared the variable and no
address space for data has been set which “IS NULL” check for. And in the second the value has been explicitly
set to NULL which “IS NULL” checks also.
SET ANSI_NULLS ON
If @val =NULL
PRINT 'TRUE'
ELSE
PRINT 'FALSE'
SET ANSI_NULLS OFF
If @val =NULL
PRINT 'TRUE'
ELSE
PRINT 'FALSE'
You will note the first time you run the = NULL statement after doing SET ANSI_NULLS ON you get a FALSE and
after setting OFF you get a TRUE. The reason is as follows.
So, as defined by SQL-92, “= NULL” should always evaluate to false. Even setting the value explicitly means you
will never satisfy the = NULL condition, and your code may not work as intended. The biggest way = NULL will
shoot you in the foot is this: SQL 7, as shipped and installed, defaults to ANSI_NULLS OFF, but SQL 2000 defaults
to ANSI_NULLS ON. Of course you can alter this several ways, but if you upgraded a database from 7 to 2000 and
found that = NULL worked only when you set it explicitly, then when you roll out a default 2000 server
your code breaks and can cause data issues.
This is yet another reason to use IS NULL instead: under SQL-92 guidelines it still evaluates to TRUE, and
thus your code is safer when upgrading the server.
Summary
In summary, unless you need to check that the value of a variable was explicitly set to NULL and you have set
ANSI_NULLS OFF, always use the “IS NULL” clause to check whether a variable is NULL. By using = NULL
instead you can cause yourself a lot of headaches in trying to troubleshoot issues that may arise from it, now or
unexpectedly in the future.
Basis
Some of the information provided comes from how C++ works and how SQL behaves under each circumstance.
Unfortunately, SQL as far as I know does not have an addressof function to allow me to output the actual memory
address to show what occurs under the hood. In C++ when a variable is created the variable has an address of
0xddddddd (in debug but it can be different non-real addresses as well). When you set the variable the first time
checking the address will give you a valid memory address where the data is being stored. Also, more information
can be obtained from SQL Books Online in the sections on IS NULL and SET ANSI_NULLS….
When most developers think of joins, they think of “a.SomethingID = b.SomethingID”. This type of join, the
equijoin, is vitally important to SQL programming; however, it only scratches the surface of the power of the SQL
join. This is the first in a series of articles that will look at several different types of “exotic” joins in SQL. This
article will focus on using the BETWEEN operator in joins when dealing with range-based data.
Introducing the BETWEEN Join
When dealing with things like calendars, grading scales, and other range-based data, the BETWEEN operator
comes in very handy in the WHERE clause. It is often forgotten that the BETWEEN operator can also be used in
join criteria.
In the WHERE clause, the BETWEEN operator is usually used to test whether some field is between two
constants. However, the BETWEEN operator can take any valid SQL expression for any or all of its three
arguments. This includes columns of tables.
One use of a BETWEEN join is to determine in which range a particular value falls. Joins of this nature tend to
have the following pattern:
<fact data> BETWEEN <range minimum> AND <range maximum>
In this pattern, the “fact data” is contained in a table with instances of data such as payments, test scores, login
attempts, or clock in/out events. The other table, the “range lookup table”, is usually a smaller table which
provides a range minimum and maximum and other data for the various ranges.
For example, consider a scenario in which a student is enrolled in a class. A student receives a numeric grade for
a class on a scale of 0 to 100. This numeric grade corresponds to a letter grade of A, B, C, D, or E. However, the
school does not use the traditional grading scale in which 90 to 100 corresponds to an A, 80-89 corresponds to a
B, and so forth. Instead, the school uses the following grading scale:
To accommodate the school’s custom grading scale, their records database has the following table defined:
CREATE TABLE tb_GradeScale(
LetterGrade char(1) NOT NULL,
MinNumeric int NOT NULL,
MaxNumeric int NOT NULL,
IsFailing smallint NOT NULL,
CONSTRAINT PK_GradeScale PRIMARY KEY(LetterGrade),
CONSTRAINT CK_MinMax CHECK(MinNumeric <= MaxNumeric)
)
The students’ numeric scores are stored in the following table:
CREATE TABLE tb_StudentGrade(
StudentID int NOT NULL,
ClassID varchar(5) NOT NULL,
NumericGrade int NOT NULL,
CONSTRAINT PK_StudentGrade PRIMARY KEY(StudentID, ClassID),
CONSTRAINT CK_StudentGrade_NumericGrade
CHECK(NumericGrade BETWEEN 0 AND 100)
)
In this scenario, the tb_StudentGrade table is the “fact table” and the tb_GradeScale table is the “range lookup
table”. The NumericGrade field serves as “fact data” while the MinNumeric and MaxNumeric fields serve as the
“range minimum” and “range maximum”. Thus, following the fact-min-max pattern, we can construct the following
join criteria:
NumericGrade BETWEEN MinNumeric AND MaxNumeric
If we put these join criteria into the context of a query which generates a report containing all the students’ letter
grades for English 101, we end up with the following:
SELECT
s.StudentID,
g.LetterGrade
FROM
tb_StudentGrade s
INNER JOIN
tb_GradeScale g
ON(
s.NumericGrade BETWEEN g.MinNumeric AND g.MaxNumeric
)
WHERE
ClassID = 'EH101'
In this query, we join the student grade table with the grading scale table in order to translate a numeric grade to a
letter grade. In order to accomplish this, we use the BETWEEN operator to specify the relationship between the
two tables being joined.
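To try the join out, the two tables can be loaded with a few rows. The grading scale and scores below are invented purely for illustration and are not the school's actual scale.

```sql
-- Hypothetical grading scale (non-overlapping ranges covering 0 to 100)
INSERT INTO tb_GradeScale (LetterGrade, MinNumeric, MaxNumeric, IsFailing) VALUES ('A', 92, 100, 0)
INSERT INTO tb_GradeScale (LetterGrade, MinNumeric, MaxNumeric, IsFailing) VALUES ('B', 84, 91, 0)
INSERT INTO tb_GradeScale (LetterGrade, MinNumeric, MaxNumeric, IsFailing) VALUES ('C', 76, 83, 0)
INSERT INTO tb_GradeScale (LetterGrade, MinNumeric, MaxNumeric, IsFailing) VALUES ('D', 68, 75, 0)
INSERT INTO tb_GradeScale (LetterGrade, MinNumeric, MaxNumeric, IsFailing) VALUES ('E', 0, 67, 1)

-- Hypothetical scores for English 101
INSERT INTO tb_StudentGrade (StudentID, ClassID, NumericGrade) VALUES (1, 'EH101', 95)
INSERT INTO tb_StudentGrade (StudentID, ClassID, NumericGrade) VALUES (2, 'EH101', 84)
INSERT INTO tb_StudentGrade (StudentID, ClassID, NumericGrade) VALUES (3, 'EH101', 67)

-- Student 1 maps to A, student 2 to B, student 3 to E
SELECT s.StudentID, s.NumericGrade, g.LetterGrade
FROM tb_StudentGrade s
INNER JOIN tb_GradeScale g
    ON s.NumericGrade BETWEEN g.MinNumeric AND g.MaxNumeric
WHERE s.ClassID = 'EH101'
ORDER BY s.StudentID

-- Sanity check: scores that fall into a gap between ranges would show up here
SELECT s.StudentID, s.NumericGrade
FROM tb_StudentGrade s
LEFT JOIN tb_GradeScale g
    ON s.NumericGrade BETWEEN g.MinNumeric AND g.MaxNumeric
WHERE g.LetterGrade IS NULL
```

Because BETWEEN is inclusive at both ends, the ranges in the lookup table must not overlap; otherwise a score on a shared boundary would be returned twice.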
Suppose you are trying to write a report as part of a customer payment processing system. This report
summarizes the total number and amount of payments by accounting period. The records of the customer
payments are stored in the following table:
CREATE TABLE tb_Payment(
PaymentID int NOT NULL IDENTITY(1, 1),
AccountID int NOT NULL,
PostedDatetime datetime NOT NULL DEFAULT(GETDATE()),
PaymentAmt money NOT NULL,
CONSTRAINT PK_Payment PRIMARY KEY(PaymentID)
)
In order to construct the query needed for the report, you must first determine the fiscal year and accounting
period in which each payment occurred. You must then group by the fiscal year and accounting period, summing
the PaymentAmt field and counting the number of records in each group.
To determine each payment’s accounting period, you can use a BETWEEN join to the tb_FiscalCalendar table:
FROM
tb_Payment p
INNER JOIN
tb_FiscalCalendar c
ON(
p.PostedDatetime BETWEEN c.StartDatetime AND c.EndDatetime
)
As do many other joins using the BETWEEN operator, this join follows the fact-min-max pattern seen in the
grading scale example. Each payment record (of which there are many) provides a “fact” stating that a certain
payment occurred at a particular date and time. The fiscal calendar table acts more as a configuration table that
specifies a range of datetime values and provides configuration data about this range.
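One possible shape for the fiscal calendar table is sketched below. The column names come from the query; the data types, constraint names and sample periods are assumptions for illustration.

```sql
-- Hypothetical definition of the range lookup table
CREATE TABLE tb_FiscalCalendar(
    FiscalYear int NOT NULL,
    AcctPeriod int NOT NULL,
    StartDatetime datetime NOT NULL,
    EndDatetime datetime NOT NULL,
    CONSTRAINT PK_FiscalCalendar PRIMARY KEY(FiscalYear, AcctPeriod),
    CONSTRAINT CK_FiscalCalendar_Range CHECK(StartDatetime <= EndDatetime)
)

-- BETWEEN includes both end points, so each period ends at the last
-- datetime value of its final day rather than at the next midnight
INSERT INTO tb_FiscalCalendar (FiscalYear, AcctPeriod, StartDatetime, EndDatetime)
VALUES (2004, 1, '20040101', '20040131 23:59:59.997')
INSERT INTO tb_FiscalCalendar (FiscalYear, AcctPeriod, StartDatetime, EndDatetime)
VALUES (2004, 2, '20040201', '20040229 23:59:59.997')
```

Ending each period at 23:59:59.997 keeps adjacent periods from sharing a boundary value, so no payment can be counted in two periods.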
To finish off the payment reporting query, we add the grouping, aggregate functions, and an ORDER BY clause to
make the output more readable:
SELECT
c.FiscalYear,
c.AcctPeriod,
COUNT(*) AS PaymentCount,
SUM(PaymentAmt) AS TotalPaymentAmt
FROM
tb_Payment p
INNER JOIN
tb_FiscalCalendar c
ON(
p.PostedDatetime BETWEEN c.StartDatetime AND c.EndDatetime
)
GROUP BY
c.FiscalYear,
c.AcctPeriod
ORDER BY
c.FiscalYear,
c.AcctPeriod
The output yields the needed report easily and efficiently. With proper indexing, this query should run quite well
even against large sets of data.
In the previous article, you saw how the BETWEEN operator could be used in joins to solve problems dealing with
range-based data. In this article, I will show you how to take joins even further by using multiple criteria in joins as
well as using the greater than, less than, and not equals operators in joins.
Compound Joins
Compound joins are joins which use multiple criteria combined with a logical operator such as AND. This is a
relatively simple concept and is commonly used in database systems that employ compound primary keys.
For a simple example of a database schema in which compound joins are necessary, consider a school
management system where one of the features is tracking which classes are taught in which classrooms. The
system must match up the features of the classrooms to the needs of the classes. In order to perform these
functions, the following two tables are defined:
CREATE TABLE tb_Classroom(
BuildingName char(10) NOT NULL,
RoomNumber int NOT NULL,
RoomCapacity int NOT NULL,
HasLabEquip smallint NOT NULL,
CONSTRAINT PK_Classroom PRIMARY KEY(BuildingName, RoomNumber)
)
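CREATE TABLE tb_ClassSection(
CourseID varchar(10) NOT NULL, -- column type assumed
SectionNumber int NOT NULL, -- column type assumed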
BuildingName char(10) NOT NULL,
RoomNumber int NOT NULL,
InstructorID int NOT NULL,
ScheduleID int NOT NULL,
SectionCapacity int NOT NULL,
RequiresLabEquip smallint NOT NULL,
CONSTRAINT PK_ClassSection PRIMARY KEY(CourseID, SectionNumber),
CONSTRAINT FK_ClassSection_Classroom
FOREIGN KEY(BuildingName, RoomNumber)
REFERENCES tb_Classroom(BuildingName, RoomNumber)
)
In this example, the tb_Classroom table defines a list of classrooms in which classes are taught. The
tb_ClassSection table contains instances of various courses taught at the school. A class section is taught in a
particular classroom by an instructor according to a standard class schedule. Both the tb_Classroom and
tb_ClassSection tables use natural compound primary keys.
One of the reports in the school management system lists the class sections being taught along with the capacity
of their respective classrooms. In order to construct this report, the tb_ClassSection table must be joined to the
tb_Classroom table based upon the compound primary key of the tb_Classroom table. This can be accomplished
by using a compound join to return rows where both the BuildingName AND RoomNumber columns match.
SELECT
s.CourseID,
s.SectionNumber,
c.RoomCapacity
FROM
tb_ClassSection s
INNER JOIN
tb_Classroom c
ON(
s.BuildingName = c.BuildingName
AND
s.RoomNumber = c.RoomNumber
)
This query is relatively straightforward. If you’ve been using SQL for a while, chances are you’ve seen queries like
it. The query is a simple equijoin that uses the AND logical operator to include multiple criteria. Despite its
simplicity, this example provides the basis for a much more powerful query construction tool.
With this query, the trick is to first join each class section to the classroom in which it is being taught and then add
the additional criterion that the classroom’s capacity is less than that of the class section. To do this, simply take
the query from the last example and add the additional criterion.
SELECT
s.CourseID,
s.SectionNumber,
c.RoomCapacity,
s.SectionCapacity
FROM
tb_ClassSection s
INNER JOIN
tb_Classroom c
ON(
s.BuildingName = c.BuildingName
AND
s.RoomNumber = c.RoomNumber
AND
c.RoomCapacity < s.SectionCapacity
)
A common mistake when constructing queries such as this is not including the equijoin criteria necessary to
match up the rows to be compared by the inequality operator. If only the inequality comparison is included in the
criteria, the query returns all the rows where a classroom’s capacity is less than that of any class section,
regardless of whether or not the class section was taught in that classroom.
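To make the mistake concrete, here is a sketch of the same report with the equijoin criteria left out. It uses only the tables and columns defined above.

```sql
-- INCORRECT: only the inequality is specified, so every class section is
-- compared with every classroom, not just the one it is taught in.
SELECT
    s.CourseID,
    s.SectionNumber,
    c.BuildingName,
    c.RoomNumber,
    c.RoomCapacity,
    s.SectionCapacity
FROM
    tb_ClassSection s
    INNER JOIN tb_Classroom c
        ON c.RoomCapacity < s.SectionCapacity
```

A section with a capacity of 30 would appear once for every classroom in the school that holds fewer than 30 people.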
This problem follows a similar pattern to that of the capacity problem. The only difference is the use of the Not
Equals operator in place of the Less Than operator. After matching the class section with the classroom in which
it is being taught, the value of the RequiresLabEquip column must be compared with the HasLabEquip column. If
these values are not equal, there is a laboratory equipment allocation problem and the class section should be
included on the report. Applying these criteria results in the following query:
SELECT
s.CourseID,
s.SectionNumber,
c.HasLabEquip,
s.RequiresLabEquip
FROM
tb_ClassSection s
INNER JOIN
tb_Classroom c
ON(
s.BuildingName = c.BuildingName
AND
s.RoomNumber = c.RoomNumber
AND
c.HasLabEquip <> s.RequiresLabEquip
)
When using the Not Equals operator in joins, it is even more vital to remember to use additional join criteria than it
is when using the Greater Than and Less Than operators. In this case, if only the Not Equals criterion were
specified, the query would perform a cross join and then exclude only the class section-classroom pairs where the
laboratory indicators were equal. If there were 100 classrooms and 500 class sections, the cross join would produce
50,000 pairs, so the result set could easily reach 25,000 - 50,000 rows, which is definitely not what was intended.
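The size of that result can be checked directly. The sketch below is logically equivalent to a join that specifies only the Not Equals criterion: with 100 classrooms and 500 class sections the cross join produces 100 × 500 = 50,000 pairs, and the WHERE clause removes only those pairs whose indicators happen to match.

```sql
-- Counts the rows a "Not Equals only" join would return.
SELECT
    COUNT(*) AS MismatchedPairCount
FROM
    tb_ClassSection s
    CROSS JOIN tb_Classroom c
WHERE
    c.HasLabEquip <> s.RequiresLabEquip
```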
Replication
Replication is one of those topics that isn’t as widely used, but can be mission critical to the success of
an application or enterprise. Moving data seamlessly between systems, distributing it widely, making it
available without the hassles of custom programming can be critical for a DBA.
Unfortunately point and click does not always work and a deeper understanding is needed. Here are a
few articles from 2003 from those dealing with real world replication.
Altering Replicated Tables (SQL 2000)
Andy Warren
8/8/2003
A few weeks ago I published an article about modifying replicated tables with SQL 7. If you haven't read that
article, I encourage you to do so before continuing.
With SQL 2000 you can now add a column to a table (and a publication) with very little effort. The only thing to
remember is that if you want the new column to be added to the subscribers, you MUST make the change via the
'Filter Columns' tab of the publication properties. SQL still provides no help if you want to modify an existing
column. You can drop a column as long as it is not part of the primary key or part of a filter (thanks to Jeff Cook
for pointing this out to me). If you don't want the new column to be part of any existing publication you can add the
column via Enterprise Manager or Query Analyzer.
For the following demo, I created two databases, ReplSource and ReplDestination, both on the same machine
running an instance of SQL2K Developer Edition. I then imported the Authors table from Pubs into ReplSource
and created a standard transactional publication, using the default options. Here is the original schema:
To use the Filter Columns tab you can either use 'Create & Manage Publications' found on the Tools|Replication
menu, or you can right click the publication itself either under Databases or under Replication Monitor.
Click on Filter Columns. You'll see the Add Column to Table button. Clicking that brings up the following dialog.
My one complaint here is that instead of the nice editing tools you normally get when making changes through
Enterprise Manager, you have to type everything in. If you're not sure of the syntax, make a quick copy of the table
schema and use Enterprise Manager to make the change, then script the changes out so you can copy the DDL
for the column you're adding. If you make a mistake here, you'll have to apply the same process you would with
SQL 7!
Once you add a column, it's automatically selected as part of the article. When you close the publication
properties the change will be sent to each subscriber the next time the log reader & distribution agent run.
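For those who prefer scripts to dialogs, SQL Server 2000 also provides the system stored procedure sp_repladdcolumn, which adds a column to a published table and to its publications. The sketch below is an illustration under stated assumptions rather than a record of what Enterprise Manager runs: the column name and its definition are invented, and it assumes the Authors table from this demo. Check Books Online for the full parameter list before relying on it.

```sql
-- Run at the publisher, in the publication database (ReplSource in this demo).
-- 'middle_initial' and its definition are hypothetical.
EXEC sp_repladdcolumn
    @source_object      = N'authors',
    @column             = N'middle_initial',
    @typetext           = N'char(1) NULL',
    @publication_to_add = N'all'   -- add the column to every publication of this table
```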
That's all there is to it. A big step up from SQL 7 and if you make these changes often, probably worth the
upgrade right there! You've probably noticed that there is also a 'Drop Selected Column' button. Let's look at what
happens when you click it:
That's right, even though you're working on a publication, if you use this button it will actually drop the column from
both the publisher and all the subscribers. Useful, but use with care!
Another thing you can do from Filter Columns is to remove a column from the article. You just can't do this easily
in SQL 7, but with SQL 2000 you just clear the checkbox – well, almost. It does most of the work for you, but
unfortunately requires you to force a snapshot to occur. Until the snapshot is done, no transactions will be
distributed to subscribers of that publication.
That's all there is to it. SQL 2000 greatly reduces the time needed to perform one of the more common tasks of
adding a column to a published article. Maybe in a future release we'll see enhancements that will support
modifying existing columns without having to do a snapshot.
XML
It’s been a few years now that XML has been a hot buzzword in the computer industry. SQL Server
2000 added XML capabilities to SQL Server, but few of us ever use them judging by the number of
articles and questions on the subject.
We had relatively few articles in 2003 that dealt with XML, and here are a couple that we picked for
publication.
Is XML the Answer?
Don Peterson
10/7/2003
Despite the breathless marketing claims being made by all the major vendors and the natural desire to keep skills
up-to-date, it would be prudent to examine exactly what advantages are offered by XML and compare them to the
costs before jumping headlong into the XML pool. The idea of XML is simple enough; basically just add tags to a
data file. These tags are sometimes referred to as metadata. XML is inherently and strongly hierarchical. The
main benefits are touted as being:
• Self describing data
• Facilitation of cross-platform data sharing or “loose coupling” of applications
• Ease of modeling “unstructured” data
Self-describing
At first the idea of self-describing data sounds great, but let’s look at it in detail. A classic example of the self-
describing nature of XML is given as follows:
<Product>Shirt
<Color>Red</Color>
<Size>L</Size>
<Style>Hawaiian</Style>
<InStock>Y</InStock>
</Product>
One possible equivalent text document could be as follows:
Red,L,Hawaiian,Y
Anyone can look at the XML document and infer the meaning of each item; not so for the equivalent simple text
document. But is this truly an advantage? After all, it’s not people we want to read the data files, it’s a machine.
Which is more efficient for a machine to read or generate? Which makes better use of limited network
bandwidth? The XML file is more than six times the size of the plain text file. In my experience XML files will tend
to be around 3–4 times the size of an equivalent delimited file. Due to the bloated nature of XML, hardware
vendors are actually offering accelerators to compensate. Worse yet, there are more and more non-standard
XML parsers being written to “optimize” XML, thus completely destroying any illusion of “compatibility.” (See
http://techupdate.zdnet.com/techupdate/stories/main/0,14179,2896005,00.html)
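The size comparison for this particular example is easy to verify. The sketch below applies T-SQL's LEN function to the two documents with the line breaks removed; the variable names are only for illustration.

```sql
DECLARE @XmlDoc varchar(200), @TextDoc varchar(200)

SET @XmlDoc = '<Product>Shirt<Color>Red</Color><Size>L</Size>'
            + '<Style>Hawaiian</Style><InStock>Y</InStock></Product>'
SET @TextDoc = 'Red,L,Hawaiian,Y'

SELECT
    LEN(@XmlDoc)  AS XmlLength,    -- 99 characters
    LEN(@TextDoc) AS TextLength,   -- 16 characters
    1.0 * LEN(@XmlDoc) / LEN(@TextDoc) AS SizeRatio   -- a little over 6
```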
Communication facilitation
The self-documenting nature of XML is often cited as facilitating cross application communication because as
humans we can look at an XML file and make reasonable guesses as to the data’s meaning based on hints
provided by the tags. Also, the format of the file can change without affecting that communication because it is all
based on tags rather than position. However, if the tags change, or don’t match exactly in the first place, the
communication will be broken. Remember that, at least for now, computers are very bad at guessing.
In order to effect communication between systems with a text file, both the sender and receiver must agree in
advance on what data elements will be sent (by extension, this mandates that the meaning of each attribute is
defined), and the position of each attribute in the file.
When using XML each element must be defined and the corresponding tags must be agreed upon. Note that
tags in and of themselves are NOT sufficient to truly describe the data and its meaning which, of necessity,
includes the business rules that govern the data’s use unless a universal standard is created to define the
appropriate tag for every possible thing that might be described in an XML document and that standard is
rigorously adhered to. (See http://www.well.com/~doctorow/metacrap.htm)
That XML is self-describing has led many to wrongly assume that their particular tags would correctly convey the
exact meaning of the data. At best, tags alone convey an approximate meaning, and approximate is not good
enough. In fact, it has been noted that XML tags are metadata only if you don’t understand what metadata really
is. (http://www.tdan.com/i024hy01.htm).
No matter the method of data transmission, the work of correctly identifying data and its meaning is the same.
The only thing XML “brings to the table” in that regard is a large amount of overhead on your systems.
Unstructured data
The very idea of unstructured or semi-structured data is an oxymoron. Without a framework in which the data is
created, modified and used, data is just so much gibberish. At the risk of being redundant, data is only meaningful
within the context of the business rules in which it is created and modified. This point cannot possibly be
overemphasized. A very simple example to illustrate the point follows: the data ‘983779009-9937’ is
undecipherable without a rule that tells me that it is actually a valid part number. Another example often thrown
about by XML proponents is that of a book. A book consists of sections, chapters, paragraphs, words, and letters
all placed in a particular order, so don’t tell me that a book is unstructured.
Again, what benefit does XML confer? None. The data still must be modeled if the meaning is to be preserved,
but XML is inherently hierarchical and imposes that nature on the data. In fact it has been noted that XML is
merely a return to the hierarchical databases of the past, or worse yet, a return to application managed
hierarchical data files. The problem is that not all data is actually hierarchical in nature. The relational model of
data is not inherently hierarchical but it is certainly capable of preserving hierarchies that actually do exist.
Hierarchies are not neutral so a hierarchy that works well for one application, or one way of viewing the data, could
be totally wrong for another, thus further eroding data independence.
(http://www.geocities.com/tablizer/sets1.htm).
Herein lies the real problem. No matter how bloated and inefficient XML may be for data transport, it is downright
scary when it is used for data management. Hierarchical databases went the way of the dinosaur decades ago,
and for good reason; they are inflexible and notoriously difficult to manage.
I can understand why many object-oriented programmers tend to like XML. Both OO and XML are hierarchical
and if you are used to thinking in terms of trees and inheritance, sets can seem downright alien. This is one of the
fundamental problems with the OO paradigm and it’s about time that data management professionals educate
themselves about the fundamentals. Set theory and predicate logic (the foundations of the relational model of
data) have been proven superior to hierarchical DBMSs, which are based on graph theory. Why is it that data
integrity always seems to take a back seat whenever some programmer cries about the perceived “impedance
mismatch” between OO and relational data? Why is it that the “fault” is automatically assumed to lie with the
database rather than a flawed programming paradigm?
What I am seeing is a push from many development teams to store raw XML in the database as a large varchar,
or text column. This turns the database into nothing more than a simple staging ground for XML. This, of course
violates one of the first principles of database design: atomicity, or one column, one value. How can a DBMS
enforce any kind of integrity on a single column containing raw XML? How do I know that the various XML strings
stored in a given table are even related? Indexing and optimization using such a scheme is impossible.
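The contrast can be sketched with the shirt example from earlier. Both table definitions below are hypothetical, including every table, column and constraint name; the point is only that the second gives the DBMS something to enforce and index.

```sql
-- Raw XML in a single column: the DBMS sees one opaque string per row and
-- cannot check that Size or InStock hold sensible values.
CREATE TABLE tb_ProductXml(
    ProductID   int  NOT NULL PRIMARY KEY,
    ProductXml  text NOT NULL
)

-- One column, one value: each attribute can be typed, constrained and indexed.
CREATE TABLE tb_Product(
    ProductID    int         NOT NULL PRIMARY KEY,
    ProductName  varchar(50) NOT NULL,
    Color        varchar(20) NOT NULL,
    Size         varchar(5)  NOT NULL
        CONSTRAINT CK_Product_Size CHECK (Size IN ('S', 'M', 'L', 'XL')),
    Style        varchar(30) NOT NULL,
    InStock      char(1)     NOT NULL
        CONSTRAINT CK_Product_InStock CHECK (InStock IN ('Y', 'N'))
)

CREATE INDEX IX_Product_Color ON tb_Product (Color)
```

An attempt to insert a row with InStock = 'Maybe' is rejected by the second table, while the first would accept any string at all.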
Vendors
Why are the major hardware and software vendors so excited about XML if it is so bad? There are several
possibilities:
• Ignorance. Oftentimes marketing departments drive the products, and marketing departments like
nothing more than for their products to be fully buzzword compliant.
• Stupidity. The technical “experts” are often ignorant as well, only they have no excuse, so I call it stupidity.
I spent several hours at the last SQL PASS Summit trying to find someone on the SQL Server product
team who could provide a single good reason to use XML. By the end of the conversation there were at
least five “experts” around the table, all unable to make their arguments hold up to the scrutiny of reason.
Some of the answers they gave were shockingly stupid. One of these “experts” stated that the biggest
benefit of XML is to allow programmers to easily adapt a database to changing needs by “loading”
columns with multiple attributes of which the database is unaware! I’m sure they were glad to see me go
so they could get back to their fantasy world of XML nirvana. I left that conversation with a growing sense
of disquiet about the future direction of SQL Server. Instead of taking steps to more fully implement the
relational model, they and other vendors are chasing their tails trying to implement a failed idea from
decades past.
• Greed. I recently read an article extolling the virtues of XML. In it the author claimed that companies are
finding “XML enriches their information capabilities, it also results in the need for major systems
upgrades.” Interestingly, the author does not define or quantify just how XML “enriches” anyone but the
software and hardware vendors.
However you choose to look at it, the major vendors do not have your best interests at heart, and when XML is
finally recognized for the bad idea that it is, they will gladly help you clean up the mess…for a price.
Conclusion
Do not be fooled by the fuzzy language and glitzy marketing-speak. As data management professionals you have
a responsibility to safeguard your company’s data and you can’t possibly do that effectively if you don’t know, or
ignore, the fundamentals. Pick up An Introduction to Database Systems by Chris Date and
Practical Issues in Database Management by Fabian Pascal and get yourself solidly grounded in sound data
management principles. The alternative? Spend your time riding the merry-go-round chasing after the latest
industry fad, which happens to be last year’s fad and so on…throwing money at vendors and consultants with
each cycle.
Design and Strategies
Every DBA needs to have guiding principles and rules. These may differ among individuals and
organizations, but they will all be grounded in basic database principles.
Codd's Rules
Frank Kalis
12/10/2003
These rules were formulated by E.F.Codd and published in 1985 1). They describe what a relational database
system must support in order to call itself relational.
So, without further introduction, let's dive into the gospel of relational databases!
1. Information Rule
Data is presented in only one way: as values in columns in rows. Simple, consistent and versatile. A table (aka an
entity or relation) is a logical grouping of related data in columns and rows. Each row (aka record or tuple)
represents a single fact and each row contains information about just one fact. Each column (aka field or attribute)
describes a single property of an object. Each value (datum) is defined by the intersection of column and row.
Metadata is data which describes the structure of the database, its objects and how they are related. This
catalogue is an integral part of the database and can be queried by authorized users just like any other table.
Another name for this online catalogue is system catalogue or data dictionary.
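In SQL Server, one way to see this in practice is through the INFORMATION_SCHEMA views, which expose the catalogue as ordinary result sets. The sketch below assumes the authors table from the pubs sample database; any table name will do.

```sql
-- The catalogue is queried with the same SELECT syntax as user data.
SELECT
    COLUMN_NAME,
    DATA_TYPE,
    IS_NULLABLE
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    TABLE_NAME = 'authors'
ORDER BY
    ORDINAL_POSITION
```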
Although SQL is not the only data query language, it is by far the most common one. SQL is a linear, non-
procedural or declarative language. It allows the user to state what he wants from the database, without explicitly
stating where to find the data or how to retrieve the data.
6. View Updating Rule
When presenting data to the user, a relational database should not be limited to tables. Views are 'virtual tables' or
abstractions of the source tables. They react like tables with the one exception that they are dynamically created
when the query is executed. Defining a view does not duplicate data. They are current at runtime.
All theoretically updateable views should be updateable by the system. If data is changed in a view, it should also
be changed in the underlying table. Updateable views are not always possible. For example there is a problem
when a view addresses only that part of a table that includes no candidate key. This could mean that updates
could cause entity integrity violations. Some sources on the internet state that 'Codd himself did not fully
understand this'. I haven't found any rationale for this.
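A minimal sketch of an updateable view follows, using hypothetical table and view names. Because the view draws on a single table and includes its key, an update through the view maps unambiguously to one row in the underlying table. GO is the batch separator used by Query Analyzer; CREATE VIEW must be the first statement in its batch.

```sql
CREATE TABLE tb_Employee(
    EmployeeID  int         NOT NULL PRIMARY KEY,
    LastName    varchar(30) NOT NULL,
    Department  varchar(30) NOT NULL,
    Salary      money       NOT NULL
)
GO

-- The view hides Salary but keeps the candidate key.
CREATE VIEW vw_EmployeeDirectory
AS
SELECT EmployeeID, LastName, Department
FROM tb_Employee
GO

-- Changing data through the view changes the underlying table.
UPDATE vw_EmployeeDirectory
SET Department = 'Finance'
WHERE EmployeeID = 1
```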
0. Foundation Rule
Interestingly Codd defined a Rule 0 for relational database systems.
"For any system that is advertised as, or claimed to be, a relational database management system, that system
must be able to manage databases entirely through its relational capabilities, no matter what additional capabilities
the system may support." (Codd, 1990)
That means, no matter what additional features a relational database might support, in order to be truly called
relational it must comply with the 12 rules. Codd added this rule in 1990. Also he expanded these 12 rules to 18 to
include rules on catalogs, data types (domains), authorization and others. 2)
Codd himself had to admit the fact that, based on the above rules, there is no fully relational database system
available. This has not changed since 1990. To be more specific, rules 6, 9, 10, 11 and 12 seem to be difficult to
satisfy.
REFERENCES:
1) Codd, E.F. "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?"
ComputerWorld, October 14 1985 and October 21 1985.
2) Codd, E.F. The Relational Model for Database Management, Version 2; Addison-Wesley; 1990.
Database developers involved in the task of designing a database have to translate real world data into relational
data, i.e. data organized in the form of tables. First they have to understand the data, then represent it in a design
view and then translate it into an RDBMS. One of the techniques that is great to use is the E-R diagram. Most of the
developers who are involved in database systems might already be familiar with it, or at least have heard about
it. I am going to try to briefly explain the concept and give an example to understand it.
In the E-R model all the above listed terms are represented in a diagrammatic technique known as the E-R
diagram.
Each entity is shown as a rectangle. For weak entities the rectangle has a double border. In the above diagram,
regular entities are University, College, Dean, Professor, Department, Student and Course. Section is a weak
entity.
Properties or attributes of an entity are shown in ellipses and are attached to their respective entity by a single
solid line. In this diagram I am showing properties for only the student entity, for the sake of the clarity of the
diagram. The relationships between entities are shown as diamonds and the entities which are a part of the
relationship are connected to the diamond by a solid line labeled either '1' or 'M' indicating whether the relationship
is one-to-many, one-to-one or many-to-many.
One-to-One or Zero relationship – Usually primary key on one side is a foreign key on the other side.
Let me now derive tables from the above diagram using this set of rules.
All the regular entities represented by a rectangle can be translated into base tables.
Table – University
UID (primary key) int
Name varchar (20)
Chancellor varchar (20)
There is a 1–M relationship between University and College and a 1–1 relationship between Dean and College. So
the primary key in the table University will be a foreign key in the table College and the primary key in the table Dean
will be a foreign key in the table College. The rest of the tables also follow the same pattern.
Table – College
CID (primary key) int
University (foreign key references UID in University table) int
Dean (foreign key references DeanID from Dean table) int
Name varchar (20)
Table – Dean
DeanID (primary key) int
Name varchar (20)
Age int
Table – Department
DID (primary key) int
College ( foreign key references CID in College table) int
Chair (foreign key references PID in professor table) int
Name varchar (20)
Table – Professor
PID (primary key) int
Department (foreign key references DID in Department table) int
Name varchar (20)
Table – Course
CourseID (primary key) Int
Department (foreign key references DID in Department table) Int
Name varchar (20)
Table – Section
SectionID (primary key) Int
Course ( foreign key references CourseID in Course table) Int
Professor (foreign key references PID in professor table) Int
Name varchar (20)
Table – Student
StudentID (primary key) int
Department (foreign key references DID in Department table) int
Name varchar (20)
DateofEnrollment smalldatetime
TelephoneNumber varchar (20)
There is only one many-to-many relationship in the above diagram and that is between Section and Student. That
means a student can register for many sections and a section has many students. To establish this relationship
we will create a new table called Student_Registration.
Table – Student_Registration
Student (foreign key references StudentID in Student table) int
Section (foreign key references SectionID in Section table) int
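Expressed as DDL, the junction table might look like the sketch below. The constraint names and the composite primary key are assumptions made for illustration, and the Student and Section tables are assumed to exist with the keys listed above.

```sql
-- Resolves the many-to-many relationship between Student and Section.
CREATE TABLE Student_Registration(
    Student  int NOT NULL,
    Section  int NOT NULL,
    -- Assumed: a student registers for a given section at most once.
    CONSTRAINT PK_Student_Registration PRIMARY KEY (Student, Section),
    CONSTRAINT FK_Registration_Student
        FOREIGN KEY (Student) REFERENCES Student (StudentID),
    CONSTRAINT FK_Registration_Section
        FOREIGN KEY (Section) REFERENCES [Section] (SectionID)
)
```

The composite key also prevents duplicate registrations without the need for a separate surrogate key.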
Cool! Now we have finished designing a database with the help of an E-R diagram. So, folks, tell me now if this
technique is useful and simple to use and start using it for your projects.
Conclusion
This example is simple and you can design this database from common sense, without actually using the E-R
diagram. However when you are given a task of designing a database, first putting it in the form of a diagram
makes your job easy. When the task is of designing a big data mart or data warehouse this technique is
indispensable. I welcome any comments or suggestions.
Miscellaneous
Everything that’s left. Not the best or worst, not really in a theme, just everything that didn’t seem to fit
in any of the other categories.
A Brief History of SQL
Frank Kalis
9/10/2003
The original concept behind relational databases was first published by Edgar Frank Codd (an IBM researcher,
commonly referred to as E.F. Codd) in a paper, “Derivability, Redundancy, and Consistency of Relations Stored
in Large Data Banks” (RJ599), dated 08/19/1969. However, what is commonly viewed as the first milestone in the
development of relational databases is a publication by Codd entitled “A Relational Model of Data for Large
Shared Data Banks” in Communications of the ACM (Vol. 13, No. 6, June 1970, pp. 377–87). This was only a
revised version of the 1969 paper. This article aroused massive interest in both the academic community and
industry in the feasibility and usability of relational databases for commercial products.
Several other articles by Codd throughout the seventies and eighties are still viewed almost as gospel for
relational database implementation. One of these articles is the famous so-called 12 rules for relational
databases, which was published in two parts in Computerworld. Part 1 was named “Is Your DBMS Really
Relational?” (published 10/14/1985); Part 2 was called “Does Your DBMS Run By the Rules?” (10/21/1985). Codd
continuously added new rules to these 12 originals and published them in his book “The Relational Model for
Database Management, Version 2” (Addison-Wesley, 1990).
But to continue with the evolution of SQL and relational databases we must take a step back in time to the year
1974. In 1974 Donald Chamberlin and others developed, for IBM, System R as a first prototype of a relational
database. The Query Language was named SEQUEL (Structured English Query Language). System R also
became part of IBM’s prototype SEQUEL-XRM during 1974 and 1975. It was completely rewritten in 1976–1977.
In addition there were new features like multi-table and multi-user capabilities implemented. The result of this
revision was quickly named SEQUEL/2, but had to be renamed due to legal reasons to SQL because Hawker
Siddeley Aircraft Company claimed the trademark SEQUEL for themselves.
In 1978 systematic tests were performed to prove real world usability on customers’ systems. It became a big
success for IBM, because this new system proved both useful and practical. With this result, IBM began to
develop commercial products based on System R. SQL/DS came out in 1981; DB2 hit the streets in 1983. But
although IBM had done most of the research work, in fact, it was a small, unknown software company named
Relational Software that first released an RDBMS in 1979, two years before IBM. This unknown software company
was later renamed Oracle. As an interesting sidenote, Relational Software released its product as Version 2.
Obviously a brilliant marketing move: no one had to worry about a buggy and/or unstable Version 1 of this new
kind of software product.
Being the de facto standard, SQL also became an official standard through the American National Standards
Institute (ANSI) certification in 1986 (X3.135-1986). But this standard could only be viewed as a cleaned-up
version of DB2’s SQL dialect. Just one year later, in 1987, standardization by the International Organization
for Standardization followed (ISO 9075-1987). Only two years later, ANSI released a revised standard (X3.135-1989).
So did ISO with ISO/IEC 9075:1989.
Partially due to the commercial interests of the software firms, many parts of the standard were left vague and
unclear. This standard was viewed as the least common denominator and missed its intention. It was some 150
pages long. To strengthen and establish the standard, ANSI revised SQL89 thoroughly, and released in 1992 the
SQL2 standard (X3.135-1992). This time they did it right!
Several weaknesses of SQL89 were eliminated. Further conceptual features were standardized although at that
time they were far beyond the possibilities of all relational databases. The new standard was some 500 pages
long. But even today there is no single product available that fully complies with SQL92. Due to this disparity there
were three levels of conformity introduced:
• Entry level conformance.
• Intermediate level conformance.
• Full conformance.
Recently SQL99, also known as SQL3, was published. This standard addresses some of the previously
ignored features of certain modern SQL systems, such as object-relational database models, call-level
interfaces and integrity management. SQL99 replaces the SQL92 levels of conformance with its own: Core SQL99 and
Enhanced SQL99. SQL99 is split into 5 parts:
So what’s next?
Change Management
Chris Kempster
3/18/2003
One of the many core tasks of the DBA is that of change control management. This article discusses the
processes I use from day to day and follows the cycle of change from development, through test, and then into
production. The core topics will include:
a) formalising the process
b) script management
c) developer security privileges in extreme programming environments
d) going live with change
e) managing ad-hoc (hot fix) changes
Environment Overview
With any serious, mission critical applications development, we should always have three to five core
environments in which the team is operating. They include:
a) development
a. rarely rebuilt, busy server in which the database reflects any number of change controls, some of which
never get to test and others go all the way through.
b) test
a. refreshed from production on a regular basis and in sync with a "batch" of change controls that are going to
production within a defined change control window.
b. ongoing user acceptance testing
c. database security privileges reflect what will be (or is) in production
c) production support
a. mirror of production at a point in time for user testing and the testing of fixes or debugging of critical
problems rather than working in production.
d) pre-production
a. mirror of production
b. used when "compiling code" into production and the final pre-testing of production changes
e) production
The cycle of change is shown in the diagram below through some of these servers:
We will discuss each element of the change window cycle throughout this article. The whole change management
system, be it built in-house or a third-party product, has seen a distinct shift to the whole CRM (customer relationship
management) experience, tying in a variety of processes to form (where possible) this:
This ties in a variety of policy and procedures to provide end-to-end service delivery for the customer. The "IR
database" shown in the previous diagram doesn't quite meet all requirements, but covers resource planning, IR
and task management, and subsequent change window planning. With good policy and practice, paper based
processes to document server configuration and application components assist in other areas of the services
delivery and maintenance framework.
The system tracks all new developments (3 month max cycle), mini projects (5–10 days), long term projects
(measured and managed in 3 month blocks) and other enhancements and system bugs. This forms the heart
and soul of the team in terms of task management and task tracking. As such, it also drives the change control
windows and which of the tasks will be rolled into production each week (we have a scheduled downtime of 2 hours
each Wednesday for change controls).
The resource meeting identifies and deals with issues within the environments, tasks to be completed or nearing
completion and the work schedule over the next two weeks. The Manager will not dictate the content of the
change window but guide resourcing and task allocation issues. The team leaders and the development staff will
allocate their tasks to a change control window with a simple incrementing number representing the next change
window. This number and associated change information in the IR database is linked to a single report that the
DBA will use on Tuesday afternoon to "lock" the change control away and use it to prepare for a production rollout.
Visual Source Safe (VSS)
The key item underpinning any development project is source control software. There is a variety on the market
but every client site I have visited to date uses Microsoft VSS. Personally, I can't stand the product; with its
outdated interface, lacking functionality and unintuitive design, it's something most tend to put up with. Even so, a
well managed and secured VSS database is critical to ongoing source management.
Managing Servers
There are not a lot of development teams that I have come across that have their own server administrators. It is
also rare that the servers fall under any SOE or contractual agreement in terms of their ongoing administration on
the LAN and responsibility of the IT department. As such, the DBA should take the lead and be responsible for all
server activities where possible, covering:
a) server backups – including a basic 20 tape cycle (daily full backups) and associated audit log, try and get the
tapes off site where possible and keep security in mind.
b) software installed – the DBA should log all installations and de-installations of software on the server. The
process should be documented and proactively tracked. This is essential for the future rollout of application
components in production and for server rebuilds.
c) licensing and terminal server administration
d) any changes to Active Directory (where applicable)
e) user management and password expiration
f) administrator account access
On the Development and Test servers I allow Administrator access to simplify the whole process. Before going
live, security is locked down on the application and its OS access to mimic production as best we can. If need be,
we will contact the company's systems administrators to review work done and recommend changes.
Allowing administrative access to any server usually raises the hairs on the back of people's necks, but in a managed
environment with strict adherence to responsibilities and procedure, this sort of flexibility with staff is appreciated
and works well with the team.
Development Server
The DBA maintains a "database change control form", separate from the IR management system and any other
change management documentation. The form includes the three core server environments (dev, test and prod)
and associated areas for developers to sign in order for generated scripts from dev to make their way between
server environments. This form is shown below:
In terms of security and database source management, the developers are fully aware of:
a) naming conventions for all stored procedures and views
b) the DBA is the only person to make any database change
c) database roles to be used by the application database components
d) DBO owns all objects, and role security will be verified and re-checked before code is promoted to test
e) developers are responsible for utilising Visual Source Safe for stored procedure and view management
f) the DBA manages and is responsible for all aspects of database auditing via triggers and their associated
audit tables
g) production server administrators must be contacted when concerned with file security and associated proxy
user accounts set up to run COM+ components, ftp access, and security shares and remote virtual directory
connections via IIS used by the application.
h) strict NTFS security privileges
With this in mind, I am quite lenient with the server and database environment, giving the following privileges. Be
aware that I am a change control nut and refuse to move any code into production unless the above is adhered to
and standard practices are met throughout the server change cycle. There are no exceptions.
a) Server
a. Administrator access is given via terminal services to manage any portion of the application
b. DBA is responsible for server backups to tape (including OS, file system objects applicable to the
application and the databases)
b) Database
a. db_ddladmin access – to add, delete or alter stored procedures, views, user defined functions.
b. db_securityadmin access – to deny/revoke security as need be to their stored procedures and views.
Database changes are scripted and the scripts stored in Visual Source Safe. The form is updated with the script
and its run order or associated pre-post manual tasks to be performed. To generate the scripts, I am relatively
lazy. I alter all structures via the diagrammer, generate the script, and alter accordingly to cover off issues with
triggers or very large tables that can be better scripted. This method (with good naming conventions) is simple
and relatively fail-safe, and, may I say, very quick. All scripts are stored in VSS.
The database is refreshed at "quiet" times from production. This may only be a data refresh, but when possible
(based on the status of changes between servers), a full database replacement from a production database
backup is done. The timeline varies, but on average a data refresh occurs every 3–5 months and a complete
replacement every 8–12 months.
Test Server
The test server database configuration in relation to security, user accounts, OS privileges and database settings is
as close to production as we can get it. Even so, it's difficult to mimic the environment in its entirety as many
production systems include web farms, clusters, disk arrays etc. that are too expensive to replicate in test.
Here the DBA will apply scripts generated from completed change control forms that alter database structure,
namely tables, triggers, schema bound views, full-text indexing, user defined data types and changes in security.
The developers will ask the DBA to move up stored procedures and views from development into test as need be
to complete UAT (user acceptance testing).
The DBA will "refresh" the test server database on a regular basis from production. This tends to coincide with a
production change control window rollout. On completion of the refresh, the DBA might need to re-apply database
change control forms still "in test". All scripts are sourced from VSS.
Production Support
The production support server box is similar to that of test, but is controlled by the person who is packaging up the next
production release of scripts and other source code ready for production. This server is used for:
a) production support – restoring the production database to it at a point in time and debugging critical application
errors, or pre-running end of month/quarter jobs.
b) pre-production testing – final test before going live with code, especially handy when we have many DLLs with
interdependencies and binary compatibility issues.
All database privileges are locked down along with the server itself.
Production
The big question here is, "who has access to the production servers and databases?". Depending on your SLAs,
this can be wide and varied, from all access to the development team via internally managed processes all the
way to having no idea where the servers are, let alone getting access to them. I will take the latter approach with
some mention of stricter access management.
If the development team has access, it's typically under the guise of a network/server administration team that
oversee all servers, their SOE configuration and network connectivity, OS/server security and more importantly,
OS backups and virus scanning. From here, the environment is "handed over" to the apps team for application
configuration, set-up, final testing and "go live".
In this scenario, a single person within the development team should manage change control in this environment.
This tends to be the application architect or the DBA.
b) MSDTC is stopped
c) Crystal reports and other batch routines scheduled to run are closed and/or disabled during the upgrade
d) prepare staging area "c:\appbuild" to store incoming CC window files
e) backup all components being replaced, "c:\appatches\<system>\YYYYMMDD"
a. I tend to include entire virtual directories (even if only 2 files are being altered)
b. COM+ DLLs are exported and the DLL itself is also copied just in case the export is corrupt
f) full backup of the database is done if any scripts are being run
g) consider a system state backup and registry backup, emergency disks are a must and should always be kept
up to date.
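As a rough illustration of the dated backup folder in item (e), the sketch below builds the folder path for one system and one change window. It is a minimal Python sketch rather than the tooling actually used: the root folder is taken from the example path above, while the function name backup_folder and the system name "payroll" are hypothetical.

```python
from datetime import date
from pathlib import PureWindowsPath

# Root folder taken from the article's example path; adjust to suit your server.
BACKUP_ROOT = PureWindowsPath(r"c:\appatches")


def backup_folder(system: str, when: date) -> PureWindowsPath:
    """Return <root>\\<system>\\YYYYMMDD for one system and one change window date."""
    return BACKUP_ROOT / system / when.strftime("%Y%m%d")


# "payroll" is a made-up system name used only for illustration.
print(backup_folder("payroll", date(2003, 6, 18)))
```

Keeping one folder per system and per date means that a rollback only needs the folder for the change window being reversed.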
Take care with service packs of any software. The change (upgrade or downgrade) of MDAC, and the slight
changes in system stored procedures and system catalogs with each SQL Server update can grind parts (or all) of
your application to a halt.
Hot Fixes
Unless you are running a mission critical system, there will always be minor system bugs that result in hot fixes in
production. The procedure is relatively simple but far from ideal in critical systems.
a) Warn all core users of the downtime, pre-empt with a summary of the errors being caused and how to
differentiate the error from other system messages.
b) If possible, re-test the hot fix on the support server
c) Bring down the application in an orderly fashion (e.g. web-server, component services, sql-agent, database
etc).
d) Backup all core components being replaced/altered
Database hot fixes, namely statements rolling back the last change window's work, are tricky. Do not plan to kick
users off if possible, but at the same time, careful testing is critical to prevent having to do point in time recovery if
things go from bad to worse.
Finally, any hotfix should end with a 1/2 page summary of the reasons why the change was made; this is
documented in the monthly production system report. Accountability is of key importance in any environment.
The whole change management process is about customers and the service we provide them as IT professionals.
To assist in problem detection and, ideally, resolution, system architects of any application should consider:
a) API for monitoring software to plug in error trapping/correction capability
b) Application consists of single entry point for all system messages (errors, warning, information) related to
daily activity
c) The logging system is relatively fault tolerant itself, i.e. if it can't write messages to a database it will try a file
system or event log.
d) Where possible, pre-allocate a range of codes with a knowledge base description, resolution and rollback
scenario if appropriate. Take care that number allocations don't impose on sysmessages (and its ADO errors) and
other OS related error codes as you don't want to skew the actual errors being returned.
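Points (b) to (d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the approach the team implemented: the class name FallbackLogger, the two sink functions and the code range 900000 to 900999 are all hypothetical, and the "database" sink simply simulates an outage so that the fallback path is exercised.

```python
from typing import Callable, List

# A sink receives (severity, code, text) and raises an exception if it cannot write.
Sink = Callable[[str, int, str], None]

# Hypothetical pre-allocated range for application message codes (an assumption).
APP_CODE_RANGE = range(900000, 901000)


class FallbackLogger:
    """Single entry point for system messages; tries each sink in order."""

    def __init__(self, sinks: List[Sink]) -> None:
        self.sinks = sinks

    def log(self, severity: str, code: int, text: str) -> bool:
        if code not in APP_CODE_RANGE:
            raise ValueError(f"code {code} is outside the pre-allocated range")
        for sink in self.sinks:
            try:
                sink(severity, code, text)
                return True  # first sink that succeeds wins
            except Exception:
                continue  # this sink failed, fall through to the next one
        return False  # every sink failed


def database_sink(severity: str, code: int, text: str) -> None:
    # Stand-in for a database write; here it always fails to simulate an outage.
    raise ConnectionError("database unavailable")


file_lines: List[str] = []


def file_sink(severity: str, code: int, text: str) -> None:
    # Stand-in for a file or event log write; keeps the lines in memory.
    file_lines.append(f"{severity}\t{code}\t{text}")


logger = FallbackLogger([database_sink, file_sink])
ok = logger.log("ERROR", 900001, "Nightly batch failed")
```

Because the sinks are tried in order, adding an event log sink as a last resort is just one more entry in the list.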
A simplistic approach we have taken is shown below; it's far from self healing but meets some of the basic criteria
so we can expand in the future:
MRAC Principle of IR/Task Completion
This is going off track a little in terms of change control but I felt it's worth sharing with you. The MRAC (Manage,
Resource, Approve, Complete) principle is a micro guide to task management for mini projects and incident
requests spanning other teams/people over a short period of time. The idea here is to get the developers who
own the task to engage in basic project management procedures. This not only assists in documenting their
desired outcome, but communicating this to others involved and engaging the resources required to see the entire
task through to its completion.
The process is simple enough, as shown in the table below. The development manager may request this at any
time based on the IR's complexity. The developer is expected to meet with the appropriate resources and drive
the task and its processes accordingly. This is not used in larger projects in which a skilled project manager will
take control and responsibility of the process.
Task or deliverable | Planned Completion Date | Managed by | Resourced to | Approved by | Completed by
Requirements        |                         |            |              |             |
Design              |                         |            |              |             |
Build               |                         |            |              |             |
Test                |                         |            |              |             |
Implement           |                         |            |              |             |
The tasks of course will vary, but rarely stray from the standard requirements, design, build, test, implement
lifecycle. Some of the key definitions related to the process are as follows:
Accepted: The recorded decision that a product or part of a product has satisfied the requirements and may be
delivered to the Client or used in the next part of the process.
Approved: The recorded decision that the product or part of the product has satisfied the quality standards.
Authorised: The recorded decision that the record or product has been cleared for use or action.
Variation: A formal process for identifying changes to the Support Release or its deliverables and ensuring
appropriate control over variations to the Support Release scope, budget and schedule. It may be associated with
one or more Service Requests.
This simple but effective process allows developers and associated management to better track change and its
interdependencies throughout its lifecycle.
Summary
No matter the procedures and policies in place, you still need commitment from development managers, project
leaders/managers and the senior developers to drive the change management process. Accountability and strict
adherence to the defined processes are critical to avoid the nightmare of any project, that being a source code
version that we can never re-create, or a production environment which we don't have the source for.
Laying down the law with development staff (including the DBA) is a task easily put in the 'too hard' basket.
It is not easy, but you need to start somewhere. This article has presented a variety of ideas on the topic that may
prompt you to take further action in this realm. The 21st century DBA, aka Technical Consultant, needs to focus
on a variety of skills, not only database change but change management processes as a complete picture.
A Database Management System (DMS) is a combination of computer software, hardware, and information
designed to electronically manipulate data via computer processing. Two types of database management
systems are DBMSs and FMSs. In simple terms, a File Management System (FMS) is a Database Management
System that allows access to single files or tables at a time. FMSs accommodate flat files that have no relation to
other files. The FMS was the predecessor of the Database Management System (DBMS), which allows access
to multiple files or tables at a time (see Figure 1 below).
File Management Systems
Typically, File Management Systems provide the following advantages and disadvantages.
Advantages | Disadvantages
Simpler to use | Typically does not support multi-user access
Less expensive | Limited to smaller databases
Fits the needs of many small businesses and home users | Limited functionality (i.e. no support for complicated transactions, recovery, etc.)
Popular FMSs are packaged along with the operating systems of personal computers (i.e. Microsoft Cardfile and Microsoft Works) | Decentralization of data
Good for database solutions for hand held devices such as Palm Pilot | Redundancy and Integrity issues
The goals of a File Management System can be summarized as follows (Calleri, 2001):
• Data Management. An FMS should provide data management services to the application.
• Generality with respect to storage devices. The FMS data abstractions and access methods should
remain unchanged irrespective of the devices involved in data storage.
• Validity. An FMS should guarantee that at any given moment the stored data reflect the operations
performed on them.
• Protection. Illegal or potentially dangerous operations on the data should be controlled by the FMS.
• Concurrency. In multiprogramming systems, concurrent access to the data should be allowed with
minimal differences.
• Performance. Compromise data access speed and data transfer rate with functionality.
From the point of view of an end user (or application) an FMS typically provides the following functionalities
(Calleri, 2001):
• File creation, modification and deletion.
• Ownership of files and access control on the basis of ownership permissions.
• Facilities to structure data within files (predefined record formats, etc).
• Facilities for maintaining data redundancies against technical failure (back-ups, disk mirroring, etc.).
• Logical identification and structuring of the data, via file names and hierarchical directory structures.
Database Management Systems
Advantages | Disadvantages
Greater flexibility | Difficult to learn
Good for larger databases | Packaged separately from the operating system (i.e. Oracle, Microsoft Access, Lotus/IBM Approach, Borland Paradox, Claris FileMaker Pro)
Greater processing power | Slower processing speeds
Fits the needs of many medium to large-sized organizations | Requires skilled administrators
Storage for all relevant data | Expensive
Provides user views relevant to tasks performed |
Ensures data integrity by managing transactions (ACID test = atomicity, consistency, isolation, durability) |
Supports simultaneous access |
Enforces design criteria in relation to data format and structure |
Provides backup and recovery controls |
Advanced security |
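The "atomicity" part of the ACID test listed above can be illustrated with a short sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for a DBMS; the account table, the names and the transfer function are hypothetical. A transfer that would overdraw an account fails on its second statement, and the DBMS rolls back the first statement as well, so no partial update survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move amount from src to dst as one all-or-nothing unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            # This debit violates the CHECK constraint when funds are insufficient.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)  # alice only has 100
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer both balances are unchanged, even though the credit to the destination account had already been applied inside the transaction.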
The goals of a Database Management System can be summarized as follows (Connolly, Begg, and Strachan,
1999, pp. 54–60):
• Data storage, retrieval, and update (while hiding the internal physical implementation details)
• A user-accessible catalog
• Transaction support
• Concurrency control services (multi-user update functionality)
• Recovery services (damaged database must be returned to a consistent state)
• Authorization services (security)
• Support for data communication
• Integrity services (i.e. constraints)
• Services to promote data independence
• Utility services (i.e. importing, monitoring, performance, record deletion, etc.)
The components to facilitate the goals of a DBMS may include the following:
• Query processor
• Data Manipulation Language preprocessor
• Database manager (software components to include authorization control, command processor, integrity
checker, query optimizer, transaction manager, scheduler, recovery manager, and buffer manager)
• Data Definition Language compiler
• File manager
• Catalog manager
Conclusion
From the File Management System, the Database Management System evolved. Part of the DBMS evolution was
the need for a more complex database that the FMS could not support (i.e. interrelationships). Even so, there will
always be a need for the File Management System as a practical tool and in support of small, flat file databases.
Choosing a DBMS in support of developing databases for interrelations can be a complicated and costly task.
DBMSs are themselves evolving into another generation of object-oriented systems. The Object-Oriented
Database Management System is expected to grow at a rate of 50% per year (Connolly, Begg, and Strachan,
1999, pg. 755). Object-Relational Database Management System vendors such as Oracle, Informix, and IBM
have been predicted to gain a 50% larger share of the market than the RDBMS vendors. Whatever the direction,
the Database Management System has gained its permanence as a fundamental root source of the information
system.
References
• Connolly, Thomas, Begg, Carolyn, and Ann Strachan. (1999). Database Systems: A Practical
Approach to Design, Implementation, and Management. Essex, UK: Addison Wesley Longman.
• Database Management. [Online]. Edith Cowan University.
http://www-business.ecu.edu.au/users/girijak/MIS4100/Lecture7/index.htm. [2001, August 20].
• Database Management Systems. [Online]. Philip Greenspun.
http://www.arsdigita.com/books/panda/databases-choosing. [2001, August 20].
• File Management Systems. [Online]. Franco Calleri.
http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node50.html. [2001, August 21].
• Introductory Data Management Principles. [Online]. Laurence J. Kreig. Washtenaw Community College.
http://www.wccnet.org/dept/cis/mod/f01c.htm. [2001, August 14].
A couple of weeks ago, I had just come away from yet another Microsoft XP marketing pitch about what a
wonderfully robust operating system it is going to be, and how its cool new features were going to truly enrich the
end user experience. I've used XP a bit and I like it, so don't get me wrong here. I'm not bashing Microsoft, but I
couldn't help but be a bit cynical when I heard this. In particular, the specific hype session was centered on some
very non-specific speech recognition capabilities that were destined to "render the keyboard obsolete". I don't
remember the exact wording, but I can't be too far off in my paraphrasing.
I would be very impressed if I could get speech recognition and activation on my machine, and make it so that it
was truly a productivity booster. Supposedly Office XP is fully loaded with voice recognition capabilities, including
menu commands and dictation. But I can't help but think back to the early and mid-nineties when a number of the
speech-recognition software products came out with what I recall were the "keyboard killers" of their time. I don't
remember if their features were really new, or just new to me, but I do remember how the scene played out. It
went something like this (after about 2 hours of trying to set up the microphone, and too many mundane hours of
training the software):
"Ahem… File, new. File New. FILE NEW. No. F I LE N E W." (ok)
[Some nonsensical dictation, just a few lines to test it out. Works pretty well, kind of clunky, still a bit irritating]
"File Save. SAVE. S A V E! (no, not exit!) Save? YES! Not Exit! Cancel! CANCEL! @#$%&!
I was promptly thrown out of the word processing application with nothing to show. Nothing, that is, unless you
count the resulting distaste for speech-to-text. I know the technology was still in its infancy, and may still be for all
intents and purposes, but for me, it turned out to be nothing more than an interesting distraction. I would be willing
to bet that most of you haven’t found it too compelling either. In fact, I have yet to meet anyone who speaks to his
or her machine for purposes other than the opportunity to tell it what it can go do with itself.
Not too long after I saw the latest XP marketing pitch, I was on my way in to work, thinking about a database
project that we had been working on for a while. Part of it has to do with essentially recreating the Query builder
functionality that can be found in SQL Server or Access. We have a web-based mechanism that is used for other
applications and mimics that functionality fairly well, but it is not quite sufficient for the needs of this particular
application. I’ve played around with a number of the third-party tools and controls that are currently on the market,
and they too have a fairly robust feature set. What was lacking in all of these tools was EXTREME simplicity for
the end-user. Dreaming the impossible dream, I recalled the speech recognition capabilities of XP, and thought
about how cool it would be if I could just TELL the application what data I needed to pull, and it actually went out
and did it.
A quick reality check reminded me that I knew nothing about speech to text technology, much less how to couple
that with a database application. But I do know a little something about a particular database platform that
supports a cool bit of technology. You guessed it – English Query. I’ve never actually used it before, and don’t
even know for sure of anyone that has. However, one thing that I do know is that I live to learn about new
technology, and this seemed to be the perfect opportunity to broaden my horizons.
So what about the speech recognition part of this? After a little research, I found out that Microsoft has released
their latest version of the Speech SDK (5.1). This version is essentially a wrapper around their previous version
(5.0) that allows for any programming environment that supports automation to access its interface. In my case,
this would be Visual Basic 6.0 as I'm still getting acclimated to .NET. However, I spent some time on the .NET
Speech site, and it looks very promising. As I progress through the English Query project, I may end up focusing
more on the .NET Speech tools, rather than the current tool set. This will inevitably lengthen the learning curve,
but it's something I want to do eventually anyway.
This diagram represents the components of an English Query application deployed on the Web
What does it take to get an English Query application up and running?
My development environment is a Win2K Pro box running SQL Server 2K. I will go over the steps you must take
to build and deploy an EQ application on the desktop and on the Web in more detail in subsequent articles; for
now, here is a general overview of what needs to be done:
1. The first thing you have to do is install English Query on your machine. This can be found in the setup
option for SQL Server on the installation CD.
2. To create an EQ application, you must create a new project in Visual Studio from scratch, or using one of
the wizards that is provided with the EQ install. Save your sanity and use the wizards.
3. Once the wizard completes its operation, refine the EQ model that was created by the wizard.
a. Enhance the entities in your model by providing "synonyms" for certain entities (ex: phone =
phone number)
b. Define additional relationships between and within your entities.
4. For any modifications you make, and for good programming practices in general, test your EQ model.
5. After testing, you can refine how data is displayed. This will inevitably be an iterative process.
6. Build (compile) the EQ application.
7. Use the EQ application in your VB project, or deploy it to the Web.
This is a representation of the parts of an English Query Project created in Visual Studio
Introduction
Around this time last year I mentioned to my boss that I was interested in Project Management. I had worked for
the company for two years as the principal DBA and felt that project management was the next career step for
me.
I thought that I had become suitably world weary and cynical. Not quite up to Michael Moore standards, but
getting there. I felt ready for the task ahead but my first project in the role of project manager was an eye opener. I
thought I would share with you the main lessons I learnt on my first project.
Lesson One
A customer has certain expectations of their project. If the project is worth $50,000 then the customer is likely to
have $60,000 worth of expectations. If, through budgeting, that $50,000 project gets pruned to, say, $20,000 then
you will find that the customer still has $60,000 worth of expectations.
A project that has been gutted in this way at the start is called a Death March. Read "Death March" by Edward
Yourdon for further details.
Your first job will be to enter a bartering process with the customer to set priorities for the tasks within the project
and to work out what you can deliver for the $20,000. This leads to Lesson Two.
Make it clear that actions are only carried out against written and agreed tasks. The temptation is to slip things into
the project to act as a sweetener, particularly if you know that you are going to have to give the customer some
bad news. However, if these sweeteners are not written down then
• You have no written proof demonstrating your flexibility.
• It raises false expectations in the customer and things will get ugly later on when you have to say "no"
further down the line.
• You will be penalized for project creep when time taken implementing the sweeteners has detracted from
the time spent on the meat of the project.
If you have a concern that needs addressing (i.e. the spec of the server is too low for the task it is expected to do)
then you need to put this in writing. This leads to Lesson Three.
My boss told me that he always volunteers to take the minutes of any meeting because he always knows that
the points that he makes will be recorded. No-one can overlook the points that he raised because he always
records those items. Of course someone could suggest that something be struck from the minutes after the first
draft is issued, but it is unlikely to happen because:
This will tell you not only when things happened but give you the chance to keep a narrative of why they
happened. If a customer continually puts off a major decision then it helps if you document the date/times on
which you chased them up for that decision. If you raised a concern with an aspect of the project, i.e. you
expressed concern that your data warehousing project is going to be run on a low spec server that is doubling as
a web server, then not only the concern, but the response to this concern needs to be recorded.
This is to help you keep track of your project. It is serendipity that this also acts as protection in the event of
project failure. The journal will also be vital in preparing for a project post mortem.
Lesson Four – Keep an issues log
We keep a simple Word document with a table that lists:
• The issue.
• The date it was raised.
• Who raised it.
• Who is responsible for dealing with it
• The resolution.
• The date it was closed.
This document is a global document that is circulated to all members of the project team and to the customer. It
acts as a forum for all and sundry to raise their concerns.
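For teams that would rather keep the log in code than in a Word table, the same six columns map onto a small record type. This is a hypothetical sketch only: the class name Issue, its field names and the sample entries are invented for illustration and are not taken from the project described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Issue:
    description: str                    # the issue
    raised_on: date                     # the date it was raised
    raised_by: str                      # who raised it
    owner: str                          # who is responsible for dealing with it
    resolution: Optional[str] = None    # the resolution, once known
    closed_on: Optional[date] = None    # the date it was closed

    @property
    def is_open(self) -> bool:
        return self.closed_on is None


def open_issues(log: List[Issue]) -> List[Issue]:
    """Return the issues that still need chasing."""
    return [issue for issue in log if issue.is_open]


# Sample entries are made up for illustration.
log = [
    Issue("Server spec too low for warehouse load", date(2003, 3, 3), "PM", "Customer IT"),
    Issue("Sign-off on report layout overdue", date(2003, 3, 10), "Developer", "Customer",
          resolution="Layout approved", closed_on=date(2003, 3, 17)),
]
print([issue.description for issue in open_issues(log)])
```

Filtering on the closing date gives a quick list of open items to circulate before each customer meeting.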
Lesson Five
Face to face meetings are relationship builders. The rapport that you build with your customer will help in
weathering the ups and downs of the project. There are things that you can say in a meeting that you would never
put in writing and you would be very wary of saying on the phone. This doesn’t contradict Lesson Two. You still
write everything down, but you sanitize it for general consumption.
Within the constraints of time and budget you need to meet with the customer often enough to keep abreast of
how the customer perceives the progress of their project.
You should also aim to have a project post mortem on completion of a project. This is usually the time when you
ask the customer to sign off the project as being complete and to accept your final invoice.
Lesson Six
A project post mortem is supposed to be a constructive affair in which both the positive and negative aspects of
the project are examined from both your and the customer’s perspectives. In many ways it is like an annual
employee appraisal. It is not an excuse for the employer/customer to give the employee/project manager what we
British call "a right bollocking". If it is seen in this light then really the issues at stake should have been raised and
dealt with earlier in the project.
There is the danger that this final stage will degenerate but, frankly, there is little to be gained from such an
experience.
Don’t improvise unless you absolutely have to; you are asking for egg on your face. This is the 21st century. You
should be able to phone someone on your team during a meeting recess. This is a variation on "be nice to the
people on your way up, you are sure to meet them again on your way down".
Summary
They say that good judgement is the product of experience and that experience is the product of bad
judgement. Well, shall we say that I gained a lot of experience on my first project. I was fortunate that a couple of
my bosses are the sort of project managers that developers volunteer to work with and they really helped me
through it. I’ve learnt that there is nothing soft about "soft" skills. Sometimes you have to smile and shake the
customer’s hand when you would sooner break his/her fingers.
Would I do it again? I would have to say 'yes'. With a good team behind you and a fair-minded customer it is
challenging but fun.
Much as I enjoy the problem solving aspect of DBA'ing, my experience is that techy jobs tend not to earn much
respect in the boardroom. My observation would be that technicians tend to have tasks thrust upon them –
whereas people managers have at least some flexibility to control their own destiny.
Pro Developer: This is Business
Christopher Duncan
2/25/2003
I've been paying the rent as a professional software developer since the 80s. I've also worked both full time and
part time as a musician for longer than that. In my travels, I've come to recognize a great many similarities
between programmers and musicians. Both have the fire, passion and soul of the artist. And all too often, both are
incredibly naïve when it comes to the business end of things. Business – you know, that aspect of your work
where they actually pay you at the end of the day?
Whether you're up all night banging away at the next Killer App or you're cranking up the guitar in a smoky bar full
of black leather jackets, chances are good that money isn't really what you're concentrating on. However, contrary
to popular belief, that doesn't make you noble. At the end of the month, no matter how compelling your art may be,
your landlord is only interested in cold, hard currency. It's just the way the world works. If you don't take the
business aspect of your career every bit as seriously as you take your art, you're going to end up hungry. And just
for the record, I've also done the starving artist routine. Trust me, it's not nearly as romantic as it looks in the
movies. Give me a fat bank account and a two inch steak any day of the week. My art's much better when I'm not
distracted by the constant rumblings of an empty stomach.
Programmers by and large fare much better than their guitar playing brethren when payday rolls around. Even in
the midst of the occasional economic slumps that the tech industry has weathered over the past few decades, a
low paying coding job beats the heck out of a high paying bar gig. Nonetheless, all things are relative. If you make
a living as a programmer, then you need computers, software, development tools, research books, and probably
an extremely robust espresso machine. Spare change to tip your local pizza delivery person is also a good idea if
you want to ensure that your pepperoni delight arrives while the cheese is still melted. All of this requires money.
The difference between a hobbyist and a professional is that the professional lives off of that money. My best
friend taught me that when I was but a fledgling, wannabe garage band musician, working for free. Believe me,
getting paid is better.
Among musicians, referring to a song or style of music as "commercial" is intended as an insult, one that implies
that the songwriter sold their artistic soul for a few bucks and is therefore beneath creative contempt. You'll find a
similar attitude among programmers. Those who have financial and career goals as a priority are often held in
disdain by the true software artists.
In both cases, there is nothing wrong with being zealous about your craft. Indeed, show me someone who has no
passion when it comes to their vocation, and I'll show you a very mediocre craftsman. However, if you're going to
be a professional in an artistic field, you have to master the business aspects just as completely as you've
mastered the creative ones. Failure to do so will bring dire consequences, not all of them immediately obvious.
And when you get right down to it, this really speaks to the heart of the matter. You get up each day, you shower
(or so your co-workers hope, anyway), you jump into the transit vehicle of your choice, and you fight the masses to
get to the office so that you can pursue your day as a professional software developer. Of course, once you get
there, instead of coding, you spend a large portion of each day dealing with the fallout from unrealistic marketing
schemes and ill-informed decisions from clueless managers who think that semicolons are merely punctuation
marks for sentences. You cope with an endless stream of pointless meetings, interminable bureaucracy, insipid
mission statements, unrealistic deadline pressures and a general environment that seems to care about almost
everything except the cool software you're trying, against all odds, to deliver. You don't have to cope with any of
this nonsense when you're sitting at home on the weekend, coding away on your favorite pet project in your robe
and bunny slippers. So, tell me again why you spend a significant portion of your waking hours fighting traffic and
wearing uncomfortable clothes to spend time in an office environment that seems dead set on working against the
very things in life that you hold dear?
Oh, yeah, that's right. They pay you money to do so. Sorry. I forgot. Really I did.
Rubbish!
Every single hour of every single day that you spend in the corporate world as a professional software developer is
driven by one, and only one thing. Money. Get warm and fuzzy with that, or find another career. Regardless of how
passionate you may be about the art and science of software development, at the end of the day, it's highly
unlikely that you'd spend five seconds of your time at the office if they weren't paying you to do so. You're there for
the money. I don't make the rules. It's just the way it is.
So, no matter how passionate you may be about your craft, at the end of the day, you're a hired gun. Maybe you're
a full time employee. Or maybe, like me, you're a professional mercenary. It doesn't matter. Either way, it all boils
down to the same thing. You show up to code only when people offer to pay you money to do so. Personally, I
find no dishonor in this lifestyle. I deliver the very best I have to offer to my clients. They offer the very greenest
American dollars they possess in return. From my point of view, everybody wins in this scenario. And so, I'm
constantly baffled by programmers I encounter in everyday life who speak from the perspective that only the
software is important, and nothing else.
Really? Is that true? Then can I have your paycheck? I mean, only if you don't care about it, that is. Personally, I
could find a lot of uses for it. But if the software is all that's important to you then shucks, let me give you my bank
account number. I'd be happy to assist you in dealing with those pesky details that arise from the business end of
the programming vocation. It's no trouble. Really. I'm happy to help.
Perspective is everything
Of course, anyone who has by now labeled me an insufferable wise guy is completely unfamiliar with my work, be
it coding, writing, speaking or training. Yes, this is an intentionally confrontational posture towards all who bury
their heads in the sand and think of software and nothing but software. In fact, you happen to be my primary target
for this particular conversation. But that doesn't mean that I don't like you. In fact, it's your very posterior that I'm
trying to protect.
Week after week, I either personally encounter or hear tales of you, or someone like you, being trashed in the
workplace because you have no grip on the realities of the business world. You're taken advantage of and work
ridiculous hours to no good end. Your software requirements change more often than your manager changes his
socks. You suffer the consequences of releases that are absolute disasters because your company refuses to
give you the time you need in order to do things the right way.
You are completely unarmed in this mêlée if your only response speaks to the needs of the software. To your
complete surprise and dismay, you'll find that nobody cares. Consequently, you're ignored, your project suffers an
ill fate, and the skies just aren't as blue as they could be for one simple reason. You're trying to solve the right
problems, but you're speaking the wrong language. And so, you lose. Over and over again.
A simple strategy for winning
So do I have all the answers? Yeah, probably, but that's another conversation entirely (and should you doubt it,
you can always take the matter up with our local attack Chihuahua – he has very strong feelings about such
things). However, in this particular case, what you should really be questioning is whether or not I have a
perspective on the software business that will help improve the things that you truly care about in our industry. And
by the strangest of coincidences, I just happen to have some of those as well. But then, I guess you saw that
coming, didn't you?
I've been known to talk for hours on end about the specific tactics that we, as professional software developers,
can employ to ensure the delivery of Really Cool Software. In fact, you could say that it's my stock in trade.
Today, however, my message is much, much simpler. I'm not talking about bits and bytes here. Okay, in fairness,
I never spend much time at all talking about bits and bytes. You guys already know about that stuff, and you don't
need me to teach you how to code. What I am talking about, in particular, is perspective, and I deem it a critical
issue. In fact, I'd go so far as to say that if you don't have the proper perspective, you're screwed, and so is your
project.
So what's the perspective that I'm promoting here, and how will it help you? Just like the title says. This is
business! Forget your technical religions. No one cares! Never mind how cool the app you just coded is. Nobody
wants to know! Really! The people who are in a position of power and have the authority to influence the quality of
software you deliver live in a completely different world than you do. Until you come to terms with this one simple
fact of life, you're going to bang your head against the Corporate Wall for the rest of your career. And worst of all,
the software you deliver will suck! Okay, maybe not suck in the eyes of Mere Mortals, but you and I both know that
it could be way cooler than your management will let you make it.
I never use one word where thirty will do. It's a personal shortcoming. Particularly because in this case, what I've
taken many words to relate can be summarized quite succinctly. Your job is not about software. It's about
business. Grasp this one simple concept, and apply it in all of your interactions. Every time you attempt to promote
your agenda to those who have the power to do something about it, stop and ask yourself these questions. Does
what you're proposing make sense from a monetary and business perspective? Will the person you're speaking
with see value in it from their point of view? Or are you speaking only in terms of software?
I realize that it seems a bit strange to de-emphasize technical issues when what you're trying to do is improve a
technical product, but at the end of the day, everyone else shows up at the office for the same reason that you do.
They're in it for the money, and business is the path to obtaining it. Speak from this perspective, and you'll be
amazed at how much it improves your ability to deliver the next Killer App. Compared to dealing with people,
debugging is the easy stuff.
As a DBA, one of the things that happens to me several times a day is finding a chunk of SQL in my inbox or,
worse still, on a piece of paper dropped on my desk. Yes, it's SQL that performs poorly or doesn't do what the
programmer expected and now I'm asked to look at it. And, it's often the case that this chunk of SQL is just plain
ugly; hard to read and understand. There are two Best Practices that frequently get applied to such messes before
I really start analyzing the problem…
For me, the worst part of this query is the table aliases. A, B, C, D, E. I find that I must continually refer back to
the "from" clause to try and remember what the heck A or E or whatever represents. Figuring out whether or not
the programmer has gotten the relationships right is a real pain in the neck with this query. He's saved typing,
sure, but at a tremendous cost in clarity. And I've had much worse end up on my desk: tables from A to P on at
least one occasion and about three pages long, with some columns in the SELECT list that weren't qualified by
table aliases at all.
Let's rewrite this guy's query for him using this first Best Practice (I'm not going to do anything about his spacing):
select distinct
clo.clone_id,clc.collection_name,clo.source_clone_id,clo.image_clone_id,lib.library_name,lib.vector_name,
lib.host_name,loc.plate,loc.plate_row,loc.plate_column,clo.catalog_number,clo.acclist,clo.vendor_id,clc.value,lib.species,seq.cluster
from clone clo,collection clc,library lib,location loc, sequence seq
where clo.collection_id = clc.collection_id
and clo.library_id = lib.source_lib_id
and clo.clone_id = loc.clone_id
and clo.clone_id = seq.clone_id
and clc.short_collection_type='cDNA'
and clc.is_public = 1
and clo.active = 1
and clo.no_sale = 0
and seq.cluster in (select cluster from master_xref_new where
type='CLONE' and id='LD10094')
Without bothering to fix the spacing, isn't this already easier to understand? Which query lends itself to easier
maintenance? Trust me, it's the latter, every time.
In some situations, being able to easily identify the source table for a column in the select list can be a big help,
too. You may have two different tables which have fields with identical names but which mean different things.
Catching those will be easier with mnemonics.
We can make another big improvement in this query with another best practice...
BEST PRACTICE 2 – Use ANSI JOIN Syntax
Do this to clearly demonstrate the separation between "How do we relate these tables to each other?" and "What
rows do we care about in this particular query?"
In this case, I can only guess what the programmer is up to but, if I were a DBA at his site and knew the
relationships between the tables, I could use this "relating" vs. "qualifying" dichotomy to help troubleshoot his
queries. Let's rewrite this query again (but I'm still not going to do much about his spacing):
select distinct
clo.clone_id,clc.collection_name,clo.source_clone_id,clo.image_clone_id,lib.library_name,lib.vector_name,
lib.host_name,loc.plate,loc.plate_row,loc.plate_column,clo.catalog_number,clo.acclist,clo.vendor_id,clc.value,lib.species,seq.cluster
from clone clo
inner join collection clc
on clo.collection_id = clc.collection_id
inner join library lib
on clo.library_id = lib.source_lib_id
inner join location loc
on clo.clone_id = loc.clone_id
inner join sequence seq
on clo.clone_id = seq.clone_id
where clc.short_collection_type='cDNA'
and clc.is_public = 1
and clo.active = 1
and clo.no_sale = 0
and seq.cluster in (select cluster from master_xref_new where
type='CLONE' and id='LD10094')
I still can't say for sure that this query is right. However, the DBA that does know this database is going to find it
much easier to spot a missing element of the relationship between, say, collection and clone. It's certainly much
easier to spot a situation where the programmer failed to include any relationship to one of the tables (it would be
obvious to us at this point), so you get fewer accidental Cartesian Products.
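To see how quietly an old-style comma join can turn into a Cartesian product, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for SQL Server. The three tiny tables and their rows are made up for illustration; only the table and column names are borrowed from the query above. The comma-join version forgets the predicate relating clone to library and still runs without complaint.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clone (clone_id INTEGER, collection_id INTEGER, library_id INTEGER);
    CREATE TABLE collection (collection_id INTEGER, collection_name TEXT);
    CREATE TABLE library (source_lib_id INTEGER, library_name TEXT);
    -- Made-up sample rows: three clones, two collections, two libraries.
    INSERT INTO clone VALUES (1, 10, 100), (2, 10, 101), (3, 11, 100);
    INSERT INTO collection VALUES (10, 'cDNA set A'), (11, 'cDNA set B');
    INSERT INTO library VALUES (100, 'lib X'), (101, 'lib Y');
""")

# Comma join: the predicate relating clone to library was forgotten,
# so every matched clone row is paired with every library row.
comma_join_missing_predicate = """
    SELECT clo.clone_id, clc.collection_name, lib.library_name
    FROM clone clo, collection clc, library lib
    WHERE clo.collection_id = clc.collection_id
"""

# ANSI join: each table arrives with its own ON clause, so a missing
# relationship is visible at a glance.
ansi_join = """
    SELECT clo.clone_id, clc.collection_name, lib.library_name
    FROM clone clo
    INNER JOIN collection clc ON clo.collection_id = clc.collection_id
    INNER JOIN library lib ON clo.library_id = lib.source_lib_id
"""

rows_comma = conn.execute(comma_join_missing_predicate).fetchall()
rows_ansi = conn.execute(ansi_join).fetchall()
print("comma join rows:", len(rows_comma))
print("ANSI join rows:", len(rows_ansi))
```

With ANSI syntax the omission is harder to make in T-SQL, because an INNER JOIN without an ON clause is rejected as a syntax error. SQLite is more permissive on that point, so the sketch only demonstrates the comma-join half of the problem.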
In my experience, simply rewriting ugly queries according to these best practices has often pointed up the nature
of the problem and made the solution a snap. This certainly happens often enough that taking the time to do the
rewrite is worth the trouble.
Another advantage of following this rule is that it allows you to readily steal an important chunk of your SQL
statements from any nearby statement that already relates these tables. Just grab the FROM clause out of
another statement, put in the WHERE that's customized for this situation and you're ready, with some confidence,
to run the query. Being a lazy sort, this feature is a real plus for me.
So, encourage mnemonic table aliases and use of ANSI JOIN syntax. As Red Green says: "I'm pullin' for ya.
We're all in this together." He's right; your programmers might end up at my site or vice-versa someday.
Introduction
In the first part of this series a script was used to query a SQL server for databases that were being backed up as
part of the maintenance plans. This allows one to determine if a database is part of a maintenance plan. It would,
in most cases, be nice to have the pertinent backup information on hand. The following class will return the
relevant backup information from the maintenance plan so it can be viewed in a more user friendly manner.
If the script presented in the first part of the series is combined with this script, one would be able to loop through all the
databases in the maintenance plans and return their individual backup information to a user interface. By taking
these classes and converting them to ASP scripts, a web page can be created to display the current backup
situation on a given SQL server.
Some of these techniques will be presented in upcoming articles. This article, however, concentrates on a script that
returns the backup information.
An Example
The code for this article can be found at SQLServerCentral.com. The following is an example of the code needed
to return the backup information for a given database. By entering the server and database name one can query
to find the last backup for a given database.
There are two message boxes here that return the backup information. The message boxes demonstrate two
ways information can be returned from the class. The first method is to use GetBackupHist. This method of the
class returns a text string with all the backup information put together. The second method takes each individual
element and builds the text string. This is useful to add formatting or to write information to a file if this class
were used as part of an inventory type script.
set objDBInfo = new clsDBBackupInfo
objDBInfo.SQLServer = "MYSERVER"
objDBInfo.UserID = "MYUSERID"
objDBInfo.Password = "MYPASSWORD"
objDBInfo.Database = "MYDATABASE"
msgbox objDBInfo.GetBackupHist
strDBMsg = ""
strDBMsg = strDBMsg & "Database " & objdbinfo.Database & vbCRLF
strDBMsg = strDBMsg & "Start Time " & objdbinfo.StartTime & vbCRLF
strDBMsg = strDBMsg & "EndTime " & objdbinfo.EndTime & vbCRLF
strDBMsg = strDBMsg & "Duration " & objdbinfo.Duration & vbCRLF
strDBMsg = strDBMsg & "Plan " & objdbinfo.Plan & vbCRLF
strDBMsg = strDBMsg & "Success " & objdbinfo.Success & vbCRLF
strDBMsg = strDBMsg & "Message " & objdbinfo.Message & vbCRLF
msgbox strDBMsg
set objDBInfo = nothing
The UserID and Password properties are optional. If the SQL server is running with integrated security and the
logged in user is an administrator on the SQL server the information will be returned without the UserID and
Password properties.
The Class
The beginning of the class has an explanation for the properties and methods of the class. This section is not
enumerated. The enumerated section of the code starts by initializing the needed variables (lines 1-18). The only
code needed in the initialize routine sets the security variable to integrated security by default. The terminate
routine closes the connection to the server.
Lines 28-116 are where the let properties are defined. These are the five settings the user has the ability to
control. In this case the user can set the SQLServer, the Database, the UserID, the Password, and the Security.
When either the SQLServer property or the Database property is set, a check is made to see if both properties have
been set (lines 30 and 68). If both have been set, the two let property routines behave the same way from that point on:
a SQL statement is constructed, a connection is opened and a recordset is returned. The
recordset is checked to make sure it is not empty and the values are read into the private variables. Once the recordset
values are read into the private variables they are available as properties to the users via the get statements,
which will be discussed below.
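The query that both let properties build can be tried out without a SQL Server at hand. The sketch below uses Python's sqlite3 module with a simplified stand-in for the sysdbmaintplan_history table; the sample rows, the reduced column list and the last_backup helper are all assumptions made for illustration. SQLite has no TOP, so LIMIT 1 plays that role here, and the database name is passed as a parameter rather than concatenated into the string, which sidesteps quoting problems if a name contains an apostrophe.

```python
import sqlite3

# Simplified stand-in for msdb's sysdbmaintplan_history (assumption:
# only the columns this sketch needs, with made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sysdbmaintplan_history (
        plan_name TEXT, database_name TEXT, activity TEXT,
        succeeded INTEGER, end_time TEXT, message TEXT
    );
    INSERT INTO sysdbmaintplan_history VALUES
        ('Nightly', 'Sales', 'Backup database', 1, '2003-05-01 01:10:00', 'sales_0501.bak'),
        ('Nightly', 'Sales', 'Backup database', 1, '2003-05-02 01:12:00', 'sales_0502.bak'),
        ('Nightly', 'Sales', 'Some other activity', 1, '2003-05-02 02:00:00', ''),
        ('Nightly', 'HR', 'Backup database', 0, '2003-05-02 01:30:00', 'hr_0502.bak');
""")


def last_backup(connection, database_name):
    """Return the most recent backup row for one database, or None."""
    sql = """
        SELECT plan_name, succeeded, end_time, message
        FROM sysdbmaintplan_history
        WHERE activity LIKE 'backup database' AND database_name = ?
        ORDER BY end_time DESC
        LIMIT 1
    """
    # The "?" placeholder lets the driver handle quoting of the name.
    return connection.execute(sql, (database_name,)).fetchone()


print(last_backup(conn, "Sales"))
print(last_backup(conn, "NoSuchDatabase"))
```

An empty result (None here) corresponds to the else branch of the class, where the private variables are reset to empty strings.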
The UserID and Password properties need to be set, as mentioned above, if the server will not be accessible via
integrated security. The security setting does not need to be set as it is set to integrated by default. This setting
might be used if one wanted to change servers and databases. One server may be able to use integrated security
while another needs an SQL login.
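The choice between integrated security and a SQL login comes down to which key/value pairs end up in the connection string. The following Python sketch mirrors the logic of the class's openConnection routine; the function name and its parameters are hypothetical and are not part of the article's VBScript.

```python
def build_connection_string(server, user_id=None, password=None):
    """Assemble an OLE DB connection string the way openConnection does.

    Supplying either user_id or password switches off integrated security,
    just as setting the UserID or Password property does in the class.
    """
    parts = [
        "Provider=sqloledb",
        "Data Source=" + server,
        "Initial Catalog=MSDB",
    ]
    use_sql_login = user_id is not None or password is not None
    if use_sql_login:
        parts.append("User Id=" + (user_id or ""))
        parts.append("Password=" + (password or ""))
    else:
        parts.append("Integrated Security=SSPI")
    # Every pair, including the last, is terminated with a semicolon.
    return ";".join(parts) + ";"


print(build_connection_string("MYSERVER"))
print(build_connection_string("MYSERVER", "MYUSERID", "MYPASSWORD"))
```

Keeping the decision in one place makes it easy to see that credentials and integrated security are mutually exclusive in the string that is finally opened.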
The class has eight get properties which are the properties the user can get once the object has been instantiated.
The SQLServer and Database properties should be known so they may not need to be returned. The other six
properties (lines 118 - 148) can be used by the user to format the database backup information. StartTime,
EndTime and Duration give the user an idea of how long a backup takes. The success property lets the user know
if the backup was successful. The plan property lets the user know which database maintenance plan the backup
is a member of and the message property lists where physically the backup was written.
Lines 151 - 168 are a private routine to open a connection to the database. Lines 170 - 172 are a private routine to
close the connection to the database. The close routine is called by the terminate routine. The final method is
GetBackupHist. This method returns a string with the same information returned by the individual properties. This
method is used mostly for troubleshooting or in a case where a script needs to return information without regard
to format.
'****************************************************
'*
'* CLASS clsDBBackupInfo
'*
'****************************************************
'* The purpose of this class is to list the backups for a given database.
'* The information can be retrieved via a text message using the GetBackupHist()
'* method or using the individual elements using the gets.
'*
'* LETS
'* SQLServer - Server whose maintenance plans you want to query
'* Database - Database whose last backup we want to look up
'* UserID - SQL login name (optional; setting it turns off integrated security)
'* Password - SQL login password (optional; setting it turns off integrated security)
'* Security - TRUE for integrated security (the default), FALSE for a SQL login
'*
'* GETS
'* SQLServer - Server Name
'* Database - Database Name
'* Plan - Plan name containing the backup
'* Success - was the last backup a success
'* EndTime - when the last backup ended
'* StartTime - when the last backup started
'* Duration - the length of time the last backup took
'* Message - message for the last backup usually the location of the backup file
'*
'* Public Functions
'* GetBackupHist() - Returns a string containing the backup information
'* and populates the GETS.
1 class clsDBBackupInfo
2 private strSQLServer
3 private strDataBase
4 private objCon
5 private SQL2
6 private RS1
7 private str
8 private fd
9 private ConnectionString
10 private strPlan
11 private boolSuccess
12 private dtEndTime
13 private dtStartTime
14 private dtDuration
15 private strMessage
16 private boolSecurity
17 private strUserID
18 private strPassword
19
20 Private Sub Class_Initialize()
21 boolSecurity = TRUE
22 End Sub
23
24 Private Sub Class_Terminate()
25 Call closeConnection
26 End Sub
27
28 Public Property Let SQLServer ( byVal tmpSQLServer )
29 strSQLServer = tmpSQLServer
30 if len(strSQLServer) > 0 and len(strDatabase) > 0 then
31 Dim SQL2
32 Dim RS1
33 Dim str
34 Dim fd
35
36 SQL2 = SQL2 & "SELECT TOP 1 * FROM sysdbmaintplan_history "
37 SQL2 = SQL2 & "WHERE (activity LIKE " & "'" & "backup database" & "'" & ") AND (database_name = " & "'" & strDatabase & "') "
38 SQL2 = SQL2 & "ORDER BY end_time Desc"
39
40 Call openConnection()
41
42 Set RS1 = objCon.Execute(SQL2)
43
44 if not RS1.eof then
45 for each fd in RS1.Fields
46 str = str & fd.name & " " & fd.value & vbCRLF
47 next
48 strPlan = RS1("Plan_name")
49 boolSuccess = RS1("Succeeded")
50 dtStartTime = RS1("Start_Time")
51 dtEndTime = RS1("End_time")
52 dtDuration = RS1("Duration")
53 strMessage = RS1("Message")
54 else
55 strPlan = ""
56 boolSuccess = ""
57 dtStartTime = ""
58 dtEndTime = ""
59 dtDuration = ""
60 strMessage = ""
61 end if
62 Set RS1 = Nothing
63 end if
64 End Property
65
66 Public Property Let Database ( byVal tmpDatabase )
67 strDatabase = tmpDatabase
68 if len(strSQLServer) > 0 and len(strDatabase) > 0 then
69 Dim SQL2
70 Dim RS1
71 Dim str
72 Dim fd
73
74 SQL2 = SQL2 & "SELECT TOP 1 * FROM sysdbmaintplan_history "
75 SQL2 = SQL2 & "WHERE (activity LIKE " & "'" & "backup database" & "'" & ") AND (database_name = " & "'" & strDatabase & "') "
76 SQL2 = SQL2 & "ORDER BY end_time Desc"
77
78 Call openConnection()
79
80 Set RS1 = objCon.Execute(SQL2)
81
82 if not RS1.eof then
83 for each fd in RS1.Fields
84 str = str & fd.name & " " & fd.value & vbCRLF
85 next
86 strPlan = RS1("Plan_name")
87 boolSuccess = RS1("Succeeded")
88 dtStartTime = RS1("Start_Time")
89 dtEndTime = RS1("End_time")
90 dtDuration = RS1("Duration")
91 strMessage = RS1("Message")
92 else
93 strPlan = ""
94 boolSuccess = ""
95 dtStartTime = ""
96 dtEndTime = ""
97 dtDuration = ""
98 strMessage = ""
99 end if
100 Set RS1 = Nothing
101 end if
102 End Property
103
104 Public Property Let Security ( byVal tmpSecurity )
105 boolSecurity = tmpSecurity
106 End Property
107
108 Public Property Let UserID ( byVal tmpUserID )
109 strUserID = tmpUserID
110 boolSecurity = FALSE
111 End Property
112
113 Public Property Let Password ( byVal tmpPassword )
114 strPassword = tmpPassword
115 boolSecurity = FALSE
116 End Property
117
118 Public Property Get SQLServer
119 SQLServer = strSQLServer
120 End Property
121
122 Public Property Get Database
123 Database = strDatabase
124 End Property
125
126 Public Property Get Plan
127 Plan = strPlan
128 End Property
129
130 Public Property Get Success
131 Success = boolSuccess
132 End Property
133
134 Public Property Get EndTime
135 EndTime = dtEndTime
136 End Property
137
138 Public Property Get StartTime
139 StartTime = dtStartTime
140 End Property
141
142 Public Property Get Duration
143 Duration = dtDuration
144 End Property
145
146 Public Property Get Message
147 Message = strMessage
148 End Property
149
150
151 Private Sub openConnection()
152
153 Set objCon = WScript.CreateObject("ADODB.Connection")
154
155 ConnectionString = "Provider=sqloledb;"
156 ConnectionString = ConnectionString & "Data Source=" & strSQLServer & ";"
157 ConnectionString = ConnectionString & "Initial Catalog=MSDB;"
158 if boolSecurity = TRUE then
159 ConnectionString = ConnectionString & "Integrated Security=SSPI;"
160 else
161 ConnectionString = ConnectionString & "User Id=" & strUserID & ";"
162 ConnectionString = ConnectionString & "Password=" & strPassword & ";"
163 end if
164
165
166 objCon.Open ConnectionString
167
168 End Sub
169
170 Private Sub closeConnection()
171 If IsObject(objCon) Then objCon.Close ' nothing to close if no connection was ever opened
172 End Sub
173
174 Public Function GetBackupHist()
175 Dim SQL2
176 Dim RS1
177 Dim str
178 Dim fd
179
180 SQL2 = SQL2 & "SELECT TOP 1 * FROM sysdbmaintplan_history "
181 SQL2 = SQL2 & "WHERE (activity LIKE " & "'" & "backup database" & "'" & ") AND (database_name = " & "'" & strDatabase & "') "
182 SQL2 = SQL2 & "ORDER BY end_time Desc"
183 Call openConnection()
184
185 Set RS1 = objCon.Execute(SQL2)
186
187 if not RS1.eof then
188 for each fd in RS1.Fields
189 str = str & fd.name & " " & fd.value & vbCRLF
190 next
191 strPlan = RS1("Plan_name")
192 boolSuccess = RS1("Succeeded")
193 dtStartTime = RS1("Start_Time")
194 dtEndTime = RS1("End_time")
195 dtDuration = RS1("Duration")
196 strMessage = RS1("Message")
197 else
198 str = "No Backups for " & strDatabase & " on " & strSQLServer
199 strPlan = ""
200 boolSuccess = ""
201 dtStartTime = ""
202 dtEndTime = ""
203 dtDuration = ""
204 strMessage = ""
205 end if
206
207 GetBackupHist = str
208 Set RS1 = Nothing
209
210 End Function
211
212 End Class
'****************************************************
'*
'* END CLASS clsDBBackupInfo
'*
'****************************************************
Conclusions
This routine is used to query maintenance plans for information regarding backups. The routine allows one to draft
formatted messages using the properties of the class. The class can be used in conjunction with other routines to
create a reporting mechanism for SQL backup procedures. In the next article, both this script and the previous
script will be used in conjunction with SQL-DMO to find servers and query the maintenance plans on those
servers.
Occasionally we stray just a little bit from pure SQL articles and delve into related areas. If you haven't guessed
yet, this is one of those occasions.
On a daily basis I meet with members of our development team to discuss problems they are working on,
problems I need them to work on, or sometimes problems that I'm working on. It's an informal environment where
we go to whichever office is convenient and talk things through until we get to where we need to be. Out of each of
these discussions we often wind up with a list of todo items and/or a diagram showing some proposed changes,
or maybe the flow of how a process will work. Not exactly a new process, I'm sure most of you do something
similar.
Where it gets interesting (to me anyway) is how to have that conversation effectively. It seems that we almost
always look around for something to draw on so that we can present ideas visually – and then modify those ideas
visually as well. When it is time to draw, we typically have four choices:
• Dry erase board/white board/chalk board
• Flip chart/easel pad
• 8-1/2x11 pad
• PC
Leon has a dry erase board in his office. Not that he consciously decided that it was better than the other
options, that's just how it wound up. Dry erase is nice because you can change your drawing quickly and still keep it
legible, but the downside is that once you have something complete there is no easy way to move that to a
transportable medium. (Note: I've heard of people taking digital photos of the board and printing them, not a bad
idea I guess, or maybe you're one of the lucky few who have a board with the built in printing functionality). A lot of
the time the problem is bigger than we can describe on a single board, so we have to start writing on something
else, or start writing a lot smaller in the left over space. Nine times out of ten when I'm in Leon's office I end up
using an 8-1/2x11 pad because I can't erase what's on the board.
Legal or letter notepads are about as low tech as you can get I guess, but they do work. If you have just two
people working the problem it works pretty well, but with three or more the size limits its usability/viewability. Not
as easy to change as dry erase of course, but paper is cheap so you can always redo it, plus the completed notes
are easy to photocopy. Maybe it's just me but I don't think it works as effectively as either dry erase or a flip chart –
I think because it is helpful to literally "take a step back" and have something you can look at from a few feet
away.
We almost never use a PC to convey ideas. At most we'll grab a chunk of source code, or look at a table design.
Maybe we just haven't found the right tool?
That leaves the flip chart. It overcomes most of what I consider negatives on the notepad. Pretty hard to copy of
course, and the paper is a lot more expensive. Not as easy to modify as the dry erase board. For discussions with
developers, at the end of the session they tear the sheets off and tape them to the walls in their office while they
work. Over the past year it has become my tool of choice for outlining problems/solutions, even for things I'm
working on solo. I'll get up, add some stuff, sit back down, look at the chart and think on it.
The interesting part about both dry erase and flip chart is they encourage discussion. When someone walks by or
comes in about something else and sees a new drawing, they often ask questions or have comments that are
useful. No one is going to walk in and see what I have written on my notepad, without being asked.
These sessions are really meetings, and well-run meetings always have minutes. For us, what winds up on the
board/paper is what we consider the minutes; there is no point in investing more time in it. This is a lot more effective than
everyone taking notes while they try to think through the problem at the same time.
A common scenario is for us to revisit the drawing to rethink a problem or reconsider why we went in a specific
direction (a month or more later). Getting everyone looking at the original drawings seems to get us back into that
mental position quickly – or at least more quickly than just talking about it with no visual reference.
I'm not here to say that one method is better than the other, just that one works better for me. What I'm hoping
you'll think about is how you convey ideas and information during these types of brainstorming/problem-solving
sessions. A lot of what we (developers and DBAs) do is complex stuff. Getting everyone "on the same page"
isn't easy, but it is a useful metaphor.