Pro SQL Server 2012 Integration Services
For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.
Contents at a Glance
iv
AbouttheAuthors..................................................................................
AbouttheTechnicalReviewer ........................................................................... Chapter2:BIDSandSSMS.................................................................................... Chapter4:ConnectionManagers.......................................................................... Chapter6:AdvancedControlFlowTasks ........................................................... Chapter8:DataFlowTransformations ............................................................... Chapter10:Scripting.......................................................................................... Chapter11:EventsandErrorHandling...............................................................
Chapter12:DataProfilingandScrubbing .......................................................... Chapter14:HeterogeneousSourcesandDestinations....................................... Chapter16:Parent-ChildDesignPattern............................................................ Chapter18:BuildingRobustSolutions ............................................................... Index ...................................................................................................................
C H A P T E R 1
Introducing Integration Services
I’mthegluethatholdseverythingtogether. —SingerOtisWilliams
Yourbusinessanalystshavefinishedgatheringbusinessrequirements.Thedatabasearchitecthas designedandbuiltadatabasethatcanbedescribedonlyasaworkofart.TheBIarchitectsaredesigning theirOLAPcubesandthedimensionaldatamartsthatfeedthem.Ontheotherhand,maybeyou’rea one-manshowandhavedesignedandbuilteverythingyourself.Eitherway,theonlypiecethat’smissing isatooltobringitalltogether.EnterSQLServerIntegrationServices(SSIS).
LikeOtisWilliams,cofounderoftheMotowngrouptheTemptations,SSISisthegluethatholdsitall together.Morethanthat,SSISisthecirculatorysystemofyourdatawarehousingandBIsolutions.SSIS breatheslifeintoyourtechnicalsolutionsbymovingdata—thelifebloodofyourorganization—from disparatesources,alongwell-knownpaths,andinjectingitdirectlyintotheheartofyoursystem.Along theway,SSIScanvalidate,cleanse,manipulate,transform,andenrichyourdataformaximum effectiveness.
Inthisbook,we’lltakeyouonatourofSSIS,frombuildingyourveryfirstSSISpackageto implementingcomplexmultipackagedesignpatternsseamlessly.ThischapterintroducesyoutoSSIS andtheconceptsbehindextract,transform,andload(ETL)processesingeneral.Webeginatthe beginning,withabriefhistoryofETL,Microsoft-style.
ABriefHistoryofMicrosoftETL
BeforewediveheadfirstintothedetailsofMicrosoft’scurrent-generationETLprocessingsolution,it’s importanttounderstandjustwhatETLis.Aswehavestated,ETLisanacronymforextract,transform,
andload,whichisaveryliteraldescriptionofmoderndatamanipulationandmovementprocesses.
WhenwetalkaboutETL,wearespecificallytalkingabout(1)extractingdatafromasource,suchasa databaseorflatfiles;(2)transformingdata,ormanipulatingandenrichingitenroutetoitsdestination; and(3)loadingdataintoitsdestination,oftenadatabase.
Overtheyears,businessrequirementsfordataprocessinginnearlyanyindustryyoucanpointto havegrownmorecomplex,evenastheamountofdatathatneedstobeprocessedhasincreased exponentially.Unfortunately,thenumberofhoursinadayhasremainedfairlyconstantoverthesame timeperiod,meaningyou’restuckwiththesamelimitedprocessingwindoweachdaytotransportand manipulateanever-growingmagnitudeofdata.ETLsolutionshavebecomeincreasinglysophisticated androbustinresponsetotheseincreaseddataprocessingdemandsofperformance,flexibility,and quality.
Sowe’llbeginourjourneyintoSSISbylookingathowETLhasevolvedintheSQLServerworld.Up toSQLServer6.5,thebulkcopyprogram(bcp)wastheprimarytoolforloadingdataintoSQLServer databases.Acommand-lineutility,bcpmadeloadingbasictextfilesintodatabasetablesfairlysimple. Unfortunately,theflipsideofthatsimplicitywasthatyoucouldusebcponlytoloaddatafromflatfiles, andyoucouldn’tperformadditionalvalidationsortransformationsonthedataduringtheload.A commondatabase-to-databaseETLscenariowithbcpmightincludeextractingdatafromadatabase servertoadelimitedtextfile,importingthefileintoaSQLServerdatabase,andfinallyusingT-SQLto performtransformationsonthedatainthedatabase.Thebcputilityisstillprovidedwithallversionsof SQLServer,andisstillusedforsimpleone-offdataloadsfromflatfilesonoccasion.
InresponsetotheincreasingdemandsofETLprocessing,DataTransformationServices(DTS) madeitsfirstappearanceinSQLServer7.WithDTS,youcouldgrabdatafromavarietyofsources, transformitonthefly,andloaditintothedatabase.AlthoughDTSwasamuchmoresophisticatedtool thanbcp,itstilllackedmuchofthefunctionalityrequiredtodevelopenterprise-classETLsolutions.
WiththereleaseofSQLServer2005,MicrosoftreplacedDTSwithSQLServerIntegrationServices (SSIS).SSISisatrueenterpriseETLsolutionwithseveraladvancementsoveritspredecessors,including built-inlogging;supportforawidevarietyofcomplextransformation,datavalidation,anddata cleansingcomponents;separationofprocesscontrolfromdataflow;supportforseveraltypesofdata sourcesanddestinations;andtheabilitytocreatecustomcomponents,tonameafew.
SSISinSQLServer12representsthefirstmajorenhancementtoSSISsinceitsintroductionway backin2005.Inthisnewestrelease,Microsofthasimplementedmajorimprovementsinfunctionality andusability.SomeofthenewgoodnessincludestheabilitytomoveETLpackagesseamlesslybetween environments,centralizedstorageandadministrationofSSISpackages,andahostofusability enhancements.Inthisbook,you’llexplorethecorefunctionalityyouneedtogetupandrunningwith SSISandtheadvancedfunctionalityyouneedtoimplementthemostcomplexETLprocessing.
Althoughbcpisefficient,manydevelopersandDBAsovertheyearsfoundtheneedforsolutionsthatcan
performmore-complexsolutions.Duringthe“lostyears”ofSQLServerETL,alargenumberofhome-
grownETLapplicationsbegantosproutupinshopsallovertheworld.Manyofthesesolutionswerevery inefficient,featuringhard-codedsourcesanddestinationsandinflexibletransformations.Eveninthe21stcentury,therearequiteafewoftheselegacyhome-brewedETLapplicationsrunningatsomeofthe
world’slargestcorporations.Buildingandmaintainingin-houseETLapplicationsfromscratchcanbean
interestingacademicexercise,butit’sterriblyinefficient.Theextratimeandmoneyspenttryingto
maintainandadministerthecodebasefortheseapplicationscantakeasignificantchunkoftheresources youcouldotherwisedevotetodesigning,developing,andbuildingoutactualETLsolutionswithan enterpriseETLplatform.WhatCanSSISDoforYou?
SSISprovidesawidearrayofout-of-the-boxfunctionalitytoaccomplishcommonETL-relatedtasks.The majortasksyou’llencounterduringmostETLprocessingincludethefollowing:
Extractingdatafromawidevarietyofsourcesincludingflatfiles,XML,the Internet,MicrosoftExcelspreadsheets,andrelationalandnonrelational databases.Ifthestocksourceadaptersdon’tcoveryourneeds,SSIS’ssupportfor .NETgivesyoutheabilitytoextractdatafromliterallyanydatasourcethatyou haveaccessto.
Validatingdataaccordingtopredefinedrulesyouspecifyasitmovesthrough yourETLprocess.Youcanvalidatedatabyusingavarietyofmethodssuchas ensuringthatstringsmatchpatternsandthatnumericvaluesarewithinagiven range. PerformingDatacleansing,ortheprocessofidentifyinginvaliddatavaluesand removingthemormodifyingthemtoconformtoyourpredefinedconstraints. Examplesincludechangingnegativenumberstozeroorremovingextra whitespacecharactersfromstrings. Deduplicatingdata,whichistheeliminationofdatarecordsthatyouconsiderto beduplicates.Foragivenprocess,youmayconsiderentirerecordsthatarevalue- for-valuematchestobeduplicates;forotherprocesses,youmaydeterminethata valuematchonasinglefield(orsetoffields),suchasTelephoneNumber, identifiesaduplicaterecord. Loadingdataintofiles,databasessuchasSQLServer,orotherdestinations.SSIS providesawiderangeofstockdestinationadaptersthatallowyoutooutputdata toseveralwell-defineddestinations.Aswithdataextraction,ifyouhaveaspecial destinationinmindthat’snotsupportedbytheSSISstockadapters,thebuilt-in .NETsupportletsyououtputtonearlyanydestinationyoucanaccess.
NearlyanyprocessthatyoucandefineintermsofETLstepscanbeperformedwithSSIS.Andit’s notjustlimitedtodatabases(thoughthatisourprimaryfocusinthisbook).Asanexample,youcanuse
WindowsManagementInstrumentation(WMI)toretrievedataaboutacomputersystem,formatitto yourliking,andstoreitinanExcelspreadsheet;oryoucangrabdatafromacomma-delimitedfile, transformitabit,andwriteitbackouttoanewcomma-delimitedfile.Nottoputtoofineapointonit, butyoucanperformjustaboutanytaskthatrequiresdatamovementandmanipulationwithSSIS.
WhatIsEnterpriseETL?
You’veseenusrefertoSSISasanenterpriseETLsolutioninthischapter,andyoumayhaveasked yourself,“WhatisthedifferencebetweenanenterpriseETLsolutionandanyotherETLsolution?”Don’t worry,it’sacommonquestionthatweaskedonceandthathassincebeenaskedofusseveraltimes.It hasaverysimpleanswer:enterpriseETLsolutionshavetheabilitytohelpyoumeetyournonfunctional requirementsinadditiontothestandardfunctionalrequirementsofextract,transform,andload.
SowhatisanonfunctionalrequirementandwhatdoesithavetodowithETL?Ifyou’veeverbeenon adevelopmentprojectforanapplicationorbusinesssystem,you’reprobablyfamiliarwiththeterm.In theprevioussection,wediscussedhowSSIShelpsyoumeetyourETLfunctionalrequirements—those requirementsofasystemthatdescribewhatitdoes.InthecaseofETL,thefunctionalrequirementsare generallyprettysimple:(1)getdatafromoneormoresources,(2)manipulatethedataaccordingto somepredefinedbusinesslogic,and(3)storethedatasomewhere.
Nonfunctionalrequirements,ontheotherhand,dealmorewiththequalitiesofthesystem.These typesofrequirementsdealinaspectssuchasrobustness,performanceandefficiency,maintainability, security,scalability,reliability,andoverallqualityofanETLsolution.Weliketothinkofnonfunctional requirementsastheaspectsofthesystemthatdonotnecessarilyhaveadirecteffectontheendresultor outputofthesystem;insteadtheyworkbehindthescenesinsupportoftheresultgeneration.
HerearesomeofthewaysSSIScanhelpyoumeetyournonfunctionalrequirements: RobustnessisprovidedinSSISprimarilythroughbuilt-inerror-handlingto captureanddealwithbaddataandexecutionexceptionsastheyoccur, transactionsthatensureconsistencyofyourdatashouldaprocessenteran unrecoverableprocessingexception,andcheckpointsthatallowsomeabilityto restartpackages. Performanceandefficiencyarecloselyrelated,butnotentirelysynonymous, concepts.YoucanthinkofperformanceastherawspeedwithwhichyourETL processesaccomplishtheirtasks.Efficiencydigsabitdeeperandincludes minimizingresource(memory,CPU,andpersistentstorage)contentionand usage.SSIShasmanyoptimizationsbakeddirectlyintoitsdataflowcomponents anddataflowengine—forinstance,totweaktherawperformanceandresource efficiencyofthedataflow.Chapter14coversthethingsyoucandotogetthemost outofthebuilt-inoptimizations. Maintainabilitycanbeboileddowntotheongoingcostofmanagingand administeringyourETLsolutionafterit’sinproduction.Maintainabilityisalso oneoftheeasiestitemstomeasure,becauseyoucanaskquestionssuchas,“How manyhourseachmonthdoIhavetospendfixingissuesinETLprocesses?”or “Howmanyhoursofmanualinterventionarerequiredeachweektodealwith,or toavoid,errorsinmyETLprocess?”SSISprovidesanewprojectdeployment modeltomakeiteasiertomoveETLprojectsfromoneenvironmenttothenext; andBIDSprovidesbuilt-insupportforsourcecontrolsystemssuchasTeam FoundationServer(TFS)tohelpminimizethemaintenancecostsofyour solutions. SecurityisprovidedinSSISthroughavarietyofmethodsandinteractionswith othersystems,includingWindowsNTFileSystem(NTFS)andSQLServer12. PackageandprojectdeploymenttoSQLServerisapowerfulmethodofsecuring yourpackages.Inthiscase,SQLServerusesitsrobustsecuritymodeltocontrol accessto,andencryptionof,SSISpackagecontents. ScalabilitycanbedefinedashowwellyourETLsolutioncanhandleincreasing quantitiesofdata.SSISprovidestheabilitytoscalepredictablywithyour increaseddemands,providingofcoursethatyoucreateyourpackagesto maximizeSSIS’sthroughput.WediscussscalableETLdesignpatternsinChapter 15.
TIP:Forin-depthcoverageofSSISdesignpatterns,wehighlyrecommendpickingupacopyofSSISDesign
PatternsbyAndyLeonard,MattMasson,TimMitchell,JessicaMoss,andMichelleUfford(Apress,2012).
Reliability,putsimply,canbedefinedashowresistantyourETLsolutionisto failure—andiffailuredoesoccur,howwellyoursolutionhandlesthesituation. SSISprovidesextensiveloggingcapabilitiesthat,whencombinedwithBIDS’s built-indebuggingcapabilities,canhelpyouquicklytrackdownandfixtheroot causeoffailure.SSIScanalsonotifyyouintheeventofafailuresituation. Alltheseindividualnonfunctionalrequirements,whentakentogether,helpdefinetheoverall qualityofyourETLsolution.Althoughit’srelativelysimpletoputtogetherapackageorprogramthat shuttlesdatafrompointAtopointB,thenonfunctionalrequirementsprovidealayerontopofthisbasic functionalitythatallowsyoutomeetyourservice-levelagreements(SLAs)andotherprocessing requirements.
SSISArchitecture
OneofthemajorimprovementsthatSSISintroducedoverDTSwastheseparationoftheconceptsof
controlflowanddataflow.Thecontrolflowmanagestheorderofexecutionforpackagesandmanages
communicationwithsupportelementssuchaseventhandlers.Thedataflowengineisexposedasa componentwithinthecontrolflowanditprovideshigh-performancedataextraction,transformation, andloadingservices.
AsyoucanseeinFigure1-1,therelationshipbetweencontrolflows,dataflows,andtheirrespective componentsisstraightforward.
Figure1-1.Relationshipbetweencontrolflowanddataflow
Simplyspeaking,apackagecontainsthecontrolflow.Thecontrolflowcontainscontrolflowtasks andcontainers,bothofwhicharediscussedindetailinChapter4.TheDataFlowtaskisaspecialtypeof taskthatcontainsthedataflow.Thedataflowcontainsdataflowcomponents,whichmoveand manipulateyourdata.Therearethreetypesofdataflowcomponents:
Sourcescanpulldatafromanyofavarietyofdatastores,andfeedthatdatainto thedataflow. Transformationsallowyoutomanipulateandmodifydatawithinthedataflow onerowatatime. Destinationsprovideameansforthedataflowtooutputandpersistdataafterit movesthroughthefinalstageofthedataflow.
AlthoughthesimplifieddiagraminFigure1-1showsonlyasingledataflowinthecontrolflow,any givencontrolflowcancontainmanydataflows.Asthediagramalsoillustrates,bothcontrolflowsand dataflowsarefoundwithintheconfinesofSSISpackagesthatyoucandesign,build,andtestwith MicrosoftBusinessIntelligenceDevelopmentStudio(BIDS).BIDSisashelloftheVisualStudio integrateddevelopmentenvironment(IDE)that.NETprogrammersarefamiliarwith.Figure1-2shows thedataflowforaverysimpleSSISpackageintheBIDSdesigner.
Figure1-2.SimpleSSISpackagedataflowinBIDS
SincetheintroductionofSSIS,Microsofthasmadesignificantinvestmentsintheinfrastructure requiredtosupportpackageexecutionandenterpriseETLmanagement.Inadditiontodatamovement andmanipulation,theSSISinfrastructuresupportslogging,eventhandling,connectionmanagement, andenumerationactivities.Figure1-3isasimplifiedpyramidshowingthemajorcomponentsofthe SSISinfrastructure.
NOTE:WeintroduceBIDSanddiscussthenewdesignerfeaturesinChapter2.
Figure1-3.SSISarchitecturalcomponents(simplified)
Atthebaseofthepyramidliethecommand-lineutilities,customapplications,theSSISdesigner, andwizardssuchastheimport/exportwizardthatprovideinteractionwithSSIS.Theseapplicationsand utilitiesaredevelopedineithermanagedorunmanagedcode.Theobjectmodellayerexposesinterfaces thatallowtheseutilitiesandapplicationstointeractwiththeIntegrationServicesruntime.The IntegrationServicesruntime,inturn,executespackagesandprovidessupportforlogging,breakpoints anddebugging,connectionmanagementandconfiguration,andtransactioncontrol.Attheverytopof thepyramidistheSSISpackageitself,whichyoudesignandbuildintheBIDSenvironment,tocontain thecontrolflowanddataflowsdiscussedearlierinthischapter.
BackinSQLServer2005,MicrosoftannouncedthedeprecationofDataTransformationServices(DTS). DTSwassupportedasalegacyapplicationinSQLServer2005,2008,and2008R2.ButbecauseSSISisa full-fledgedenterprise-classreplacementforDTS,itshouldcomeasnosurprisethatDTSisnolonger supportedwiththisnewestreleaseofSQLServer.ThismeansthattheExecuteDTS2000Packagetask, theDTSruntimeandAPI,andPackageMigrationWizardareallgoingaway.Fortunately,thelearningcurve toconvertDTSpackagestoSSISisnottoosteep,andtheprocessisrelativelysimpleinmostcases.Ifyou havelegacyDTSpackageslyingaround,now’sthetimetoplantomigratethemtoSSIS. Additionally,theActiveXScripttask,whichwasprovidedstrictlyforDTSsupport,willberemoved.Manyof thetasksthatActiveXScripttaskswereusedforcanbehandledwithprecedenceconstraints,whilemore- complextaskscanberewrittenasScripttasks.WeexploreprecedenceconstraintsandScripttasksin detailinChapters4and5. NewSSISFeatures
ThereareanumberofimprovementsintheSQLServer12releaseofSSIS.Wediscussthesenewfeatures andenhancementsthroughoutthebook.Beforewedigintothedetailsinlaterchapters,weare summarizingthenewfeatureshereforyourconvenience:
Projectdeploymentmodel:ThenewprojectdeploymentmodelisnewtoSQL Server12SSIS.ThekeyfeaturesofthisnewdeploymentmodelaretheIntegration Servicescatalog,environments,andparameters.Thenewdeploymentmodelis designedtomakedeploymentandadministrationofSSISpackagesandETL systemseasieracrossmultipleenvironments.Wediscusstheprojectdeployment modelinChapter17.
T-SQLviewsandstoredprocedures:Thenewprojectdeploymentmodel includesseveralnewSSIS-specificviewsandstoredproceduresforSQLServer.We presenttheseviewsandstoredproceduresinChapter17,duringthediscussionof theprojectdeploymentmodel. BIDSusabilityenhancements:BIDShasbeenimprovedtomakepackage developmentandeditingsimpler.TheBIDSdesignersarenowmoreflexible,and UIenhancementssuchastheIntegrationServicestoolboxmakeiteasiertouse. WepresentthenewfeaturesinBIDSandSQLServerManagementStudio(SSMS) inChapter3. Objectimpactanddatalineageanalysis:Theobjectimpactanddatalineage analysisfeatureprovidesmetadataforlocatingobjectdependencies—for instance,whichtablesapackagedependson.Thisnewtoolprovidesuseful informationfortroubleshootingperformanceproblemsordependencyissues,or whentakingaproactiveapproachtolocatingdependencies.Wedemonstrate impactanddatalineageanalysisinChapter12. ImprovedMergeandMergeJointransformations:TheMergeandMergeJoin transformationshavebeenimprovedinSQLServer12SSISbyprovidingbetter internalcontrolsonmemoryusage.WecoverthenewfeaturesoftheMergeand MergeJointransformationsinChapter6. Datacorrectioncomponent:TheDataCorrectiontransformationprovidesatool tohelpimprovedataquality.WediscussthisnewtransformationinChapter7. Customdataflowcomponentimprovements:SQLServer12SSISincludes improvementsthatallowdeveloperstomoreeasilycreatecustomdataflow componentsthatsupportmultipleinputs.Weexplorethesenewfeaturesinthe discussionofcustomdataflowcomponentsinChapter21. SourceandDestinationAssistants:ThenewSourceandDestinationAssistants aredesignedtoguideyouthroughthestepstocreateasourceordestination.We talkaboutthenewassistantsinChapter6. Simplifieddataviewer:ThedataviewerhasbeensimplifiedinSQLServer12SSIS tomakeiteasiertouse.WedemonstratethedataviewerinChapter8.
OurFavoritePeopleandPlaces
ThereareanumberofSSISexpertsandresourcesthathavebeenouttheresincetheintroductionof SSIS.Herearesomeofourrecommendations,a“bestof”listforSSISontheWeb:
AndyLeonardisanSSISguruandSQLServerMostValuableProfessional (MVP)whohasbeenahead-down,hands-onSSISdeveloperfromday1.Infact, AndywasacontributingauthorontheoriginalProfessionalSQLServer2005
IntegrationServices(Wrox,2006)book,thegoldstandardforSQLServer2005
SSIS.Andy’sblogislocatedat
http://sqlblog.com/blogs/andy_leonard/default.aspx.
JamieThomsonmaywellbeoneofthemostprolificSSISexperts.Having bloggedonhundredsofSSIStopics,JamieisaSQLServerMVPandtheoriginal
SSISJunkie.Youcanreadhisnewestmaterialat http://sqlblog.com/blogs/jamie_thomson/orcatchuponyourSSISJunkie
classicreadingathttp://consultingblogs.emc.com/jamiethomson. BrianKnight,founderofPragmaticWorksandaSQLServerMVP,isawell- knownwriterandtraineronallthingsBIandallthingsSSIS.Catchupwith Brianatwww.bidn.com/blogs/BrianKnight. BooksOnline(BOL)istheholybookforallthingsSQLServer,andthatincludes SSIS.WhenyouneedananswertoaspecificSQLServerorSSISquestion, there’saveryhighprobabilityyoursearchwillendatBOL.Withthatinmind, weliketocutoutthemiddlemanandstartoursearchesat
http://msdn.microsoft.com/en-us/library/ms130214(v=SQL.110).aspx.
SQLServerCentral.com(SSC)wasfoundedbyarovinggangofhard-coreDBAs, includingtheinfamousHawaiian-shirt-andcowboy-hat-wearingSteveJones,a SQLServerMVP.StevekeepsSSCupdatedwithlotsofcommunity-based contentcoveringawiderangeoftopicsincludingSSIS.Whenyouwanttocheck outthebestincommunity-generatedcontent,gotowww.sqlservercentral.com. SSISTeamBlog,maintainedbyMicrosoft’sownMattMasson,islocatedat
http://blogs.msdn.com/b/mattm.Checkoutthisblogforthelatestandgreatest
inSSISupdates,patches,andinsidertipsandtricks. AllanMitchellandDarrenGreen,SSISexpertsandSQLMVPsfromacrossthe pond,sharetheirexpertiseatwww.sqlis.com. CodePlexisaMicrosoftsitehostingopensourceprojects.From
www.codeplex.com,youcandownloadavarietyofopensourceprojectsthat
includetheAdventureWorksfamilyofsampledatabases,opensourceSSIS customcomponents,completeSSIS-basedETLframeworks,ssisUnit(theSSIS unittestingframework),andBIDSHelper.Thisisonesiteyouwanttocheck out. ProfessionalAssociationforSQLServer(PASS)isaprofessionalorganization forallSQLServerdevelopers,DBAs,andBIpros.Membershipisfree,andthe benefitsincludeaccesstotheSQLServerStandardmagazine.VisitPASSat
www.sqlpass.orgformoreinformation.
WehighlyrecommendvisitingthesesitesasyoulearnSSISorencounterquestionsaboutbest practicesorneedguidanceonhowtoaccomplishveryspecifictasksinyourpackages.
Summary
SSISisapowerfulandflexibleenterpriseETLsolution,designedtomeetyourmostdemandingdata movementandmanipulationneeds.ThischapterintroducedsomeofthebasicconceptsofETLand howSSISfitsintotheSQLServerETLworld.WepresentedsomeofthecoreconceptsoftheSSIS architectureandtalkedaboutthenewestfeaturesinSQLServer12SSIS.Wewrappedupwithalistingof afewofourfavoritepeopleandresourcesites.InChapter2weintroduceBIDSandSSMS,withan emphasisonthenewestfeaturesdesignedtomakeyourETLpackagedesign,build,andtestingeasier thanever.
C H A P T E R 2
BIDS and SSMS
Ateachincreaseofknowledge,aswellasonthecontrivanceofeverynewtool,human labourbecomesabridged. —InventorCharlesBabbage
Aftergatheringtherequirementsandspecificationsforthenewprocesses,itistimetodesigntheETL flow.Thisincludesdecidingthetoolstouse,thetimeframesfordataprocessing,thecriteriaforsuccess andfailure,andsoforth.Oneoftheintegralpartsofdecidingwhichtoolstouseisdeterminingthe sourcesforthedataandthecapabilityofthetoolstogainaccesstoitwitheaseandtoextractit efficiently.AswediscussedinChapter1,efficientlydescribesthetaxingofthenetwork,hardware,and overallresourcesoptimally.OneofthebestcharacteristicsofSSISisitseaseofdevelopmentand maintenance(whenbestpracticesandstandardsarefollowed),thewiderangeofsourcesitcanaccess, thetransformationsthatitcanhandleinflight,andmostimportant,cost,asitcomesoutoftheboxwith SQLServer12.
WheninstallingSQLServer12,youhaveoptionstoinstallthreetoolsetsessentialfordeveloping ETLprocesses:BusinessIntelligenceDevelopmentStudio(BIDS),SQLServerManagementStudio (SSMS)—Basic,andManagementStudio—Complete.BIDSusestheVisualStudio2010platformforthe developmentofSSISpackagesaswellasthecreationofprojectsthatcatertothecomponentsoftheSQL Server12suite.ManagementtoolsincludeSSMS,theSQLServercommand-lineutility(SQLCMD),anda fewotherfeatures.Withthesecorecomponents,developerscanissueDataDefinitionLanguage(DDL) statementstocreateandalterdatabaseobjects.TheycanalsoissueDataManipulationLanguage(DML) statementstoquerythedatabaseobjectsusingMicrosoft’sflavoroftheStandardQueryLanguage(SQL), Transact-SQL(T-SQL).DataControlLanguage(DCL)allowsuserstoconfigureaccesstothedatabase objects.
ThischaptercoverstheprojecttemplatesthatBIDSsupports.Italsoprovidesabriefoverviewofthe elementsoftheSQLServer12suite.
SQLServerBusinessIntelligenceDevelopmentStudio
BIDS,whichutilizesVisualStudio2008asthedevelopmentplatform,supportsafewprojecttemplates whosesolepurposeistoprovideinsightintodata.Thisinsightcancomefrommovingpertinentdata fromasourcetoadatabase(ETLprocessesusingSSISprojects),fromdevelopingcubesthatcanprovide optimalhigh-levelsummariesofdata(cubesdevelopedusingSQLServerAnalysisServices,SSAS projects),andfromcreatingreportsthatcanberundirectlyagainstdatabasesorcubes(reportsdesigned usingSQLServerReportingServices,SSRSprojects).Figure2-1showsthebusinessintelligenceprojects thatareavailablewithinVisualStudio.
Figure2-1.ProjectsavailablewithBIDS
ForSQLServer12,theprojectsrequiretheinstallationof.NETFramework3.5.VisualStudiosolutions canmaintainmultipleprojectsthataddressthedifferentdisciplineswithintheSQLServersuite.Afew elementscarryacrossmultipleprojects.Theseelementsarelistednext,andlogicallytietogetherata solutionlevel:
Datasourcesarethedisparatesourcesthathavethedatathatwillbeimportedby themembers.Thesesourcescanbecreatedbyusingawizardandeditedthrough thedesigner.Variouscomponentswithintheprojectcanaccessthese connections. Datasourceviews(DSVs)areessentiallyadhocqueriesthatcanbeusedto extractdatafromthesource.Theirmainpurposeistostorethemetadataofthe source.Aspartofthemetadata,keyinformationisalsostoredtohelpcreatethe appropriaterelationshipswithintheAnalysisServicesdatabase. Miscellaneousisacategoryincludingallfilesthatserveasupportfunctionbutare notintegral.ThisincludesconfigurationfilesforIntegrationServices.
AnalysisServicesProject
TheAnalysisServicesprojectisusedtodesign,develop,anddeployaSQLServer12AnalysisServices database.Moreoftenthannot,thepurposeofETLprojectsistoconsolidateseveralsystemsintoone report-friendlyforminordertosummarizeactivityatahighlevel.AnalysisServicescubesperformroll- upcalculationsagainstdimensionsforquickandefficientreporting.Thisprojecttemplateprovidesa folderstructuredepictedinFigure2-2thatisneededtodevelopanAnalysisServicesdatabase.After development,thecubecanbedeployeddirectlytoAnalysisServices.
Figure2-2.FolderstructureforanAnalysisServicesproject
Thesefoldersorganizethefilesinadeveloper-friendlyformat.Thisformatalsohelpswithbuilding anddeployingprojects.Apartiallistofthefoldersislistedbelow: CubescontainsallthecubesthatarepartoftheAnalysisServicesdatabase.The dimensionscanbecreatedusingawizard,utilizingthemetadatastoredwithinthe DSVs. MiningStructuresapplydata-miningalgorithmsonthedatafromtheDSVs.They canhelpcreatealevelofpredictiveanalysisdependingonthequalityofthedata andtheusageoftheappropriatealgorithm.YoucanusetheMiningModelWizard tohelpcreatetheseaswellastheMiningModelDesignertoeditthem. RolescontainsallthedatabaserolesfortheAnalysisServicesdatabase.Theseroles canvaryfromadministrativepermissionstorestrictionsbasedonthedimensional data. AssembliesholdsallthereferencestoComponentObjectModel,COM,libraries andMicrosoft.NETassemblies.
TIP:SSISpackagescanbeusedtoprocesscubes.Thesepackagescanbeexecutedattheendofasuccessful
ETLprocess.
TheAnalysisServicesdatabase,likeIntegrationServices,cansourcedatafromavarietyoflocations andphysicalstorageformats.TheDSVsusethesamedriversandarenotnecessarilylimitedtotheSQL Serverdatabaseengine.PriorversionsofSQLServeraswellasotherrelationaldatabasemanagement systems(RDBMSs)areavailableassources.ThelanguagesforqueryingSQLServercubesarecalled MultidimensionalExpressions(MDX)andDataMiningExtensions(DMX).Someoftheobjectsthatcan bedefinedinacubeforanalyticpurposesaremeasures,measuregroups,attributes,dimensions,and hierarchies.Theseobjectsarecriticalinorganizinganddefiningthemetricsandthedescriptionsof thosemetricsthattheenduserismostinterestedin.AnotherimportantfeaturethatAnalysisServices providesistheconceptofkeyperformanceindicators(KPIs).KPIscontaincalculationsrelatedtoa measuregroup.Thesecalculationsarevitalinevaluatingtheperformanceofabusiness. IntegrationServicesProject
TheIntegrationServicesprojecttemplateenablesthedevelopertocreateSSISpackages.Thepackageis theexecutableworkunitwithinSSIS.Itconsistsofsmallercomponentsthatcanbeexecuted individuallyduringdevelopment,butIntegrationServicesexecutesatthepackagelevel.Thedebugging optioninVisualStudiowillexecutethepackageasawhole,butthecontrolflowexecutablescanalsobe individuallytested.Figure2-3showsasampleIntegrationServicesproject.Thisprojectwill automaticallybeaddedtoyourcurrentsolutionifyouhaveoneopen.Otherwise,atemporarysolution willbecreated.
NOTE:EventhoughVisualStudiohastheabilitytoexecutepackages,werecommendusingthecommandline
toexecutethemwhentesting.VisualStudiodebuggingmodeshouldbeusedduringdevelopment.Wediscuss
moreoptionsonrunningSSISpackagesinChapter20. Figure2-3.FolderstructureforanIntegrationServicesprojectThefollowinglistdescribestheobjectsandfoldersthatwillappearinyourproject: SSISPackagescontainsallthepackagesassociatedwiththeproject.Thesework unitsaretheactualcomponentsthatwillexecutetheETL.Allthepackagesare addedtothe.dtprojfile.ThisXML-basedfilelistsallthepackagesand configurationsthatarepartoftheproject. Miscellaneouscontainsallthefiletypesotherthanthe.dtsxfiles.Thisfolderis essentialforstoringconfigurationfilesandwillbeusefulforconsolidating connections,especiallywhenutilizingasourcecontrolapplication.Team FoundationServerisintroducedlaterinthischapter.
NOTE:WithSQLServer12,datasourcesanddatasourceviewscannotbeaddedtoaproject.Instead,these connectionscanbeaddedtotheindividualpackages.Inpriorversions,datasourcescouldbeaddedasconnection managersandwereallowedtorefertotheincludedDSVsasthetablesinsourcecomponents.Usingthis methodologytoaccessdataonSQLServerwasnottheoptimalwaytoextractdata. ThedebuggingoptioninVisualStudioexecutesthecurrentpackage.Thisfeatureisusefulforwatchingthe
behaviorofthepackage(howquicklyrowspassthroughthecomponentsintheDataFlowtasks,how quicklyexecutablescomplete,thesuccessfulmatchesofLookupsandMergeJoins,andsoforth).During theexecution,threecolorsindicatethestatusoftheexecutableanddataflowcomponents:yellow—in progress,red—failure,andgreen—success.Thisuseofdifferentcolorsishelpfulduringdevelopment becauseitshowshowtheprecedenceconstraintsandtheDataFlowtasksmovethedatathrough.
NOTE:Unlessapackagecallsanotherpackageasanexecutable,thecurrentpackageistheonlyonethatwill runinDebugmode,notallthepackagesintheproject.VisualStudiowillopeneverypackagethatiscalledina parent-childconfigurationandexecutethem.Iftherearetoomanypackages,certainonesmayopenwitha MemoryOverflowerrordisplayed,butthepackagemayexecuteinthebackgroundandcontinueontothe subsequentpackages.