Thursday, 19 October 2017

Moving average SQL 2008


Introduction
With the release of SQL Server 2016 Service Pack 1, the In-Memory ColumnStore technology is now also available in the Standard, Web and even the Express and LocalDB Editions. Besides the benefit of maintaining only one codebase, this policy change will also be a clear disk storage space saver thanks to the high data de-duplication and compression ratios. Last but not least, it is also a serious ad hoc query performance booster.

The main difference between the SQL Server editions is how much CPU power and memory is allocated to tasks such as (re)building a Clustered ColumnStore Index. For example, the Standard Edition uses a single core (at most 100% processor time for the sqlservr process) to build a CCI, and queries against a CCI run with at most 2 CPUs (MAXDOP 2), whereas the Enterprise Edition leverages all available processors.

Building a Clustered ColumnStore Index (CCI) with SQL Server 2016 Standard Edition: a single core.
Building a CCI with SQL Server 2016 Enterprise Edition: all 4 available cores.

The base timings for loading 7.2 GB (60 million rows) from a single TPC-H lineitem file show little difference between the editions when bulk inserting the data directly into either a heap table or a table with a CCI. The difference becomes clear when we compare the time needed to build a CCI on a heap table or to rebuild a CCI.

To summarize, the absolute fastest way to have data available in a table with a Clustered ColumnStore Index is to load into a heap and build the CCI afterwards with SQL Server 2016 Enterprise Edition.

Direct load into a CCI
For tables that already have a Clustered ColumnStore Index, make sure that you stream directly into compressed rowgroups to maximize throughput. To do so, the insert batch size should be equal to or larger than 100K rows (102,400, to be exact).
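The 102,400-row rule can be modelled in a few lines. The Python sketch below is a simplified model (the function name plan_batch is hypothetical) of how a single bulk-load batch is split, assuming the documented columnstore limits: a compressed rowgroup holds at most 1,048,576 rows, and a batch or leftover tail smaller than 102,400 rows goes to the delta store. It ignores factors such as memory pressure that can trim rowgroups in practice.

```python
from typing import List, Tuple

MIN_COMPRESSED_ROWS = 102_400    # smallest batch that is compressed directly
MAX_ROWGROUP_ROWS = 1_048_576    # largest possible compressed rowgroup

def plan_batch(batch_rows: int) -> Tuple[List[int], int]:
    """Return (compressed rowgroup sizes, rows sent to the delta store) for one batch."""
    if batch_rows < MIN_COMPRESSED_ROWS:
        # Small batches are never compressed directly: they go to a delta rowgroup
        return [], batch_rows
    rowgroups = []
    remaining = batch_rows
    while remaining >= MAX_ROWGROUP_ROWS:
        rowgroups.append(MAX_ROWGROUP_ROWS)
        remaining -= MAX_ROWGROUP_ROWS
    if remaining >= MIN_COMPRESSED_ROWS:
        # A large enough tail becomes its own, smaller compressed rowgroup
        rowgroups.append(remaining)
        remaining = 0
    # Anything left over (fewer than 102,400 rows) lands in the delta store
    return rowgroups, remaining

print(plan_batch(1_000))     # ([], 1000): BCP's default batch size, all delta store
print(plan_batch(102_400))   # ([102400], 0): compressed directly
```

Under this model, a batch of 1,048,577 rows produces one full compressed rowgroup plus a single row in the delta store, which is why the batch size, not just the total row count, decides whether SQL Server touches the data twice.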
Smaller batches are first written into uncompressed delta store rowgroups before the tuple mover moves them into their final compressed rowgroup segments, which means SQL Server has to touch the data twice.

There are various options for loading data, and we will go over the most commonly used ones: the BULK INSERT command, BCP and SSIS. Let's see what is needed to get the best performance and how to monitor the load.

1) T-SQL BULK INSERT
Let's start with a BULK INSERT command.

Checking data load progress
To check the number of rows already loaded into the CCI, even when the TABLOCK option is used, query the new DMV sys.dm_db_column_store_row_group_physical_stats. This DMV also reveals the possible rowgroup states in more detail during the load. When you see the state INVISIBLE, it means that data is being compressed into a rowgroup. The possible rowgroup states are:
0: INVISIBLE (the rowgroup is being built from data in the delta store)
1: OPEN (the rowgroup is accepting new records)
2: CLOSED (the rowgroup is full but not yet compressed by the tuple mover process)
3: COMPRESSED (the rowgroup is full and compressed)
4: TOMBSTONE (the rowgroup is ready to be garbage collected and removed)

By specifying a batch size of 102400 or higher, you achieve maximum performance: the data is streamed and compressed directly into its final rowgroup, which shows up as COMPRESSED. You can also check the rowgroup state with a DMV introduced in SQL Server 2014, sys.column_store_row_groups.

Test results
Bulk inserting data into a table with a CCI via the BULK INSERT command can be improved slightly by adding the BATCHSIZE = 102400 and TABLOCK options. This gives an 8% improvement in throughput.

2) BCP.exe
The BCP utility is still used quite heavily in many production environments, so it is worth checking quickly. By default, BCP sends 1000 rows at a time to SQL Server. The time it takes to load the 7.2 GB of data via BCP: 530 seconds, or 113K rows/sec. The rowgroup state shows INVISIBLE, which means that with the default settings the delta store is used.

To make sure the BCP command streams data directly into compressed rowgroups, you have to add the batch size option -b with a value of at least 102400. I ran various tests with larger batch sizes, up to 1048576, but 102400 gave me the best results:

bcp DB.dbo.LINEITEMCCI in F:\TPCH\lineitem.tbl -S . -c -T -t"|" -b 102400 -h tablock

The rowgroup state now shows COMPRESSED, which means we bypass the delta store and the data streams into compressed rowgroups.

Result: BCP completed in 457 seconds, or 133K rows per second.

3) SSIS
During the tests I noticed that the default SSIS 2016 settings use memory buffer sizes that can also limit the batch size to fewer than 100K rows. In the example below you can see data landing in delta stores: the rowgroup states are CLOSED and the delta_store_hobt_id fields are populated, which means the delta stores are being used. This was the moment to reach out to my colleagues, who luckily had already noticed this, and a solution already exists (see: "Data Flow Buffer Auto Sizing capability benefits data loading into CCI"). To fully leverage the CCI streaming capabilities, you have to increase the default memory buffer size and maximum rows settings:
– DefaultBufferMaxRows: from 10000 to 1024000
– and the most important one, DefaultBufferSize: from 10485760 (10 MB) to 104857600 (100 MB), a 10x increase.
Note: the new AutoAdjustBufferSize setting should be set to True when you load very wide rows of data.
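Why is DefaultBufferSize "the most important one"? A simplified model helps: a data flow buffer holds at most DefaultBufferMaxRows rows and at most DefaultBufferSize bytes, so the smaller of the two limits wins. The Python sketch below assumes about 120 bytes per row (7.2 GB divided by 60 million rows) and assumes that each buffer maps to one insert batch; real SSIS sizing uses estimated in-memory row widths, so treat the numbers as illustrative only. The function names are hypothetical.

```python
ROW_BYTES = 120                # assumption: 7.2 GB / 60 million rows
DELTA_STORE_THRESHOLD = 102_400

def rows_per_buffer(default_buffer_max_rows: int, default_buffer_size: int,
                    row_bytes: int = ROW_BYTES) -> int:
    """Simplified model: a buffer is capped both by row count and by byte size."""
    return min(default_buffer_max_rows, default_buffer_size // row_bytes)

def lands_in_delta_store(rows: int) -> bool:
    """Batches below 102,400 rows go to the delta store instead of compressed rowgroups."""
    return rows < DELTA_STORE_THRESHOLD

settings = {
    "defaults": (10_000, 10_485_760),
    "only MaxRows raised": (1_024_000, 10_485_760),
    "both raised": (1_024_000, 104_857_600),
}
for name, (max_rows, size) in settings.items():
    rows = rows_per_buffer(max_rows, size)
    print(f"{name}: {rows} rows per buffer, delta store: {lands_in_delta_store(rows)}")
```

In this model, raising only DefaultBufferMaxRows still caps a buffer at 87,381 rows (10 MB / 120 bytes), below the 102,400 threshold; only after DefaultBufferSize is raised as well does a buffer reach 873,813 rows.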
Also change the values for the destination adapter:
– Rows per batch: from none to 102400
– Maximum insert commit size: from 2147483647 to 102400

The feature parity introduced with SQL Server 2016 SP1 opens up a whole new range of possibilities. Hopefully the walkthroughs above help you maximize BULK INSERT, BCP and SSIS performance when loading data into a Clustered ColumnStore Index.

What is the absolute fastest way to load data from a flat file into a table within SQL Server 2016?
A lot has changed since my first post on this topic many years ago, such as the introduction of In-Memory optimized tables and updatable columnstore indexes. The list of data transport vehicles to choose from is also growing: besides BCP, the T-SQL BULK INSERT command, SSIS as an ETL tool and PowerShell, some new ones have been added, such as PolyBase, External R Script and ADF. In this post I will start by checking how much faster the new durable and non-durable In-Memory tables are.

Setting the baseline
For these tests I'm using an Azure DS4_V2 Standard VM with 8 cores, 28 GB RAM and 2 HDD volumes with host caching R/W enabled. (Both LUNs deliver 275 MB/sec R/W throughput, although the GUI states a limit of 60 MB.) I generated a single 60-million-row, 7.2 GB TPC-H lineitem flat file as the data to load. As the baseline for comparison, we use the time it takes to load the file into a heap table. This regular BULK INSERT command completes within 7 minutes, with an average of 143K rows/sec.

Enabling the test database for memory-optimized tables
The in-memory tables introduced in SQL Server 2014/2016 Enterprise and Developer Edition are designed for very fast OLTP with many small transactions and high concurrency, which is a completely different type of workload than bulk inserts. But, just out of curiosity, let's give it a try. There are 2 types of in-memory tables: durable and non-durable tables.
The durable ones persist data on disk; the non-durable ones don't. To enable this option we have to do some housekeeping and assign a fast disk volume to host these files. First, alter the database to enable the CONTAINS MEMORY_OPTIMIZED_DATA option, followed by adding a file location and a filegroup that will contain the memory-optimized tables. The third step is to add a separate memory pool to the SQL Server instance, so that all the data we load into in-memory tables is kept separate from the default memory pool.

Binding a database to a memory pool
Extra memory pools are managed via SQL Server Resource Governor. The fourth and final step is to bind the test database to the new memory pool with the sys.sp_xtp_bind_db_resource_pool procedure. For the binding to take effect, we have to take the database offline and bring it back online. Once bound, we can dynamically change the amount of memory assigned to the pool via the ALTER RESOURCE POOL PoolHk WITH (MAX_MEMORY_PERCENT = 80) command.

Bulk insert into a durable In-Memory table
Now that the In-Memory option is enabled, we can create an in-memory table. Every memory-optimized table must have at least one index (either a range or a hash index); these indexes are completely (re)built in memory and are never stored on disk. A durable table must have a declared primary key, which can then be supported by the required index. To support a primary key, I added an extra row number column, ROWID1, to the table. Specifying a batch size of 1 (up to 5) million rows for the BULK INSERT command helps persist data to disk while the bulk insert is running (instead of saving it all at the end); doing so minimizes memory pressure on the PoolHk memory pool we created.
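To see what a 1 to 5 million row batch size means for this 60-million-row file, the Python sketch below divides the load into committed batches. It only models the arithmetic, not how SQL Server persists each batch; the helper name batch_ranges and the 120-bytes-per-row figure (7.2 GB / 60 million rows, measured on the flat file, not in memory) are assumptions.

```python
from typing import Iterator, Tuple

TOTAL_ROWS = 60_000_000
ROW_BYTES = 120  # assumption: 7.2 GB / 60 million rows in the flat file

def batch_ranges(total_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based (first_row, last_row) ranges, one per committed batch."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = 1
    while start <= total_rows:
        end = min(start + batch_size - 1, total_rows)
        yield start, end
        start = end + 1

for size in (1_000_000, 5_000_000):
    batches = list(batch_ranges(TOTAL_ROWS, size))
    approx_mb = size * ROW_BYTES / 1_000_000
    print(f"BATCHSIZE = {size}: {len(batches)} commits, "
          f"last batch {batches[-1]}, ~{approx_mb:.0f} MB of file data per batch")
```

Smaller batches mean more commits but less uncommitted data waiting in the pool at any one time, which is the trade-off behind the 1 to 5 million row recommendation.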
The data load into the durable In-Memory table completes in 5 minutes 28 seconds, or 183K rows/sec. That's okay, but not much faster than our baseline. sys.dm_os_wait_stats shows that the top wait type is IMPPROV_IOWAIT, which occurs when SQL Server waits for a bulk load I/O to finish. The performance counters Bulk Copy Rows/sec and Disk Write Bytes/sec show the flushes to disk spiking at 275 MB/sec once a batch got in (the green spikes). That is the maximum the disk can deliver, but it doesn't explain everything. Given the minor gain, we will park this one for future investigation.

Monitoring the memory pool
Via the sys.dm_resource_governor_resource_pools DMV we can check whether our in-memory table uses the newly created PoolHk memory pool. The output shows that this is the case: the 7.2 GB (plus a little extra for the ROWID) got loaded uncompressed into the PoolHk memory pool. If you try to load more data than the pool has memory available, you get a proper message like this one:

The statement has been terminated.
Msg 701, Level 17, State 103, Line 5
There is insufficient system memory in resource pool 'PoolHk' to run this query.

To look one level deeper at memory allocation on a per-table basis, you can run the query from the "SQL Server In-Memory OLTP Internals for SQL Server 2016" white paper. The data we just loaded is stored as a varheap structure with a hash index. So far so good! Now let's check out how staging into a non-durable table performs.

Bulk insert into a non-durable In-Memory table
For IMND tables we do not need a primary key, so we just add a non-clustered hash index and set DURABILITY = SCHEMA_ONLY. The bulk insert into the non-durable table completes within 3 minutes, with a throughput of 335K rows/sec (vs 7 minutes for the heap). This is 2.3x faster than inserting into a heap table.
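The rows/sec figures above follow directly from the 60 million rows and the elapsed times. The Python check below recomputes them; it uses 7 minutes for the heap baseline, 5:28 for the durable table and 2:59 for the non-durable table (the exact time reported later for the identical 335K rows/sec), and the helper name is hypothetical.

```python
TOTAL_ROWS = 60_000_000

def rows_per_sec(minutes: int, seconds: int = 0) -> float:
    """Average load throughput for the whole file."""
    return TOTAL_ROWS / (minutes * 60 + seconds)

baseline = rows_per_sec(7)          # heap table, about 7 minutes
durable = rows_per_sec(5, 28)       # durable in-memory table
non_durable = rows_per_sec(2, 59)   # non-durable (SCHEMA_ONLY) table

for name, rate in [("heap", baseline), ("durable", durable), ("non-durable", non_durable)]:
    print(f"{name}: {rate / 1000:.0f}K rows/sec, {rate / baseline:.1f}x baseline")
```

This reproduces 143K, 183K and 335K rows/sec, and confirms that the non-durable load is about 2.3 times the baseline while the durable one gains only about 1.3 times.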
For staging data, this is definitely a quick win!

SSIS single bulk insert into a non-durable table
Traditionally, SSIS is the fastest way to load a file into SQL Server, because SSIS handles all the data pre-processing, so the SQL Server engine can spend its CPU ticks on persisting the data to disk. Will this still be the case when inserting the data into a non-durable table? Below is a summary of the tests I ran with SSIS for this post: the SSIS FastParse option and the DefaultBufferMaxRows and DefaultBufferSize settings are the main performance boosters. The native OLE DB provider (SQLOLEDB.1) also performs slightly better than the SQL Native Client (SQLNCLI11.1). When you run SSIS and SQL Server side by side, there is no need to increase the network packet size.

Net result: a basic SSIS package that reads a flat file source and writes the data directly to the non-durable table via an OLE DB destination performs similarly to the BULK INSERT command into an IMND table: the 60 million rows are loaded in 2 minutes 59 seconds, or 335K rows/sec, identical to the BULK INSERT command.

SSIS with the Balanced Data Distributor
But wait... the in-memory tables are designed to work lock and latch free, which means we can also load data via multiple streams. That is easy to achieve with SSIS.
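Because the in-memory tables are lock and latch free, the load can be split over several parallel streams. One simple way to divide rows between N streams is routing each row by its row number modulo N, the same idea as the modulo-based data flows tried further on. The Python sketch below shows only that routing (split_into_streams is a hypothetical name); it does not model how the Balanced Data Distributor itself hands out buffers.

```python
def split_into_streams(row_ids, n_streams):
    """Route each row id to stream (row_id % n_streams), like a modulo-based conditional split."""
    if n_streams <= 0:
        raise ValueError("n_streams must be positive")
    streams = [[] for _ in range(n_streams)]
    for row_id in row_ids:
        streams[row_id % n_streams].append(row_id)
    return streams

streams = split_into_streams(range(1, 11), 3)
for i, stream in enumerate(streams):
    print(f"stream {i}: {stream}")
# stream 0: [3, 6, 9]
# stream 1: [1, 4, 7, 10]
# stream 2: [2, 5, 8]
```

With a contiguous row number such as ROWID1, each of the 3 streams receives roughly one third of the rows, so no single data flow becomes the bottleneck purely because of data volume.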
The Balanced Data Distributor (BDD) provides exactly this kind of multi-stream loading; it is listed in the Common section of the SSIS Toolbox. Adding the BDD component and inserting the data into the same non-durable table with 3 streams gives the best throughput: we are now up to 526,000 rows/sec! Looking at this very flat line, with only a small share of CPU time used by SQL Server, it seems we are hitting some bottleneck. I quickly tried to be creative by leveraging the modulo function and added 2 more data flows to the package (each processing 1/3 of the data), but that didn't improve much (1 min 52 sec), so it is a good topic to investigate in a future post.

The In-Memory non-durable table option brings some serious performance improvements for staging data: loading data about 2.3 times faster with a regular BULK INSERT and up to 3.6 times faster with SSIS. This option, primarily designed to speed up OLTP, can also make a huge difference in quickly shrinking your batch window. (To be continued.)

Most people know the phrase "this will kill two birds with one stone." If you don't, the phrase refers to an approach that addresses two objectives in one action. (Unfortunately, the expression itself is rather unpleasant, as most of us don't want to throw stones at innocent animals!) Today I'm going to cover some basics of two great features in SQL Server: the columnstore index (available only in SQL Server Enterprise before SQL Server 2016 SP1) and the SQL Query Store. Microsoft actually implemented the columnstore index in SQL Server 2012 Enterprise, although they have enhanced it in the last two releases of SQL Server. Microsoft introduced the Query Store in SQL Server 2016. So, what are these features and why are they important? Well, I have a demo that introduces both features and shows how they can help us. Before I go any further, I also cover this (and other SQL 2016 features) in my CODE Magazine article on new features in SQL 2016.
As a basic introduction, the columnstore index can help speed up queries that scan over large amounts of data, and the Query Store tracks query executions, execution plans and runtime statistics that you'd normally need to collect manually. Trust me when I say these are great features.

For this demo, I'll be using the Microsoft Contoso Retail Data Warehouse demo database. Loosely speaking, Contoso DW is like "a really big AdventureWorks," with tables containing millions of rows. (The largest AdventureWorks table contains roughly 100,000 rows at most.) You can download the Contoso DW database here: www.microsoft.com/en-us/download/details.aspx?id=18279. Contoso DW works very well when you want to test performance on queries against larger tables. It contains a standard data warehouse fact table called FactOnlineSales, with 12.6 million rows. That's certainly not the largest data warehouse table in the world, but it's not child's play either.

Suppose I want to summarize product sales amounts for 2009 and rank the products. I might query the fact table, join to the Product dimension table, and use a RANK function. Here's a partial result set of the top 10 rows, by total sales. On my laptop (i7, 16 GB RAM), the query takes anywhere from 3 to 4 seconds to run. That might not seem like the end of the world, but some users might expect near-instant results (the way you get near-instant results when using Excel against an OLAP cube).

The only index I currently have on this table is a clustered index on a sales key. If I look at the execution plan, SQL Server suggests adding a covering index to the table. Now, just because SQL Server suggests an index doesn't mean you should blindly create an index for every "missing index" message. In this instance, SQL Server detects that we are filtering based on year and using the product key and sales amount.
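The logic of the ranking query (filter on one year, sum the sales amount per product, then RANK by that total) can be sketched in Python to make the RANK semantics explicit: tied totals share a rank and the next rank skips ahead. The function name and the sample rows are hypothetical stand-ins for FactOnlineSales joined to the product dimension.

```python
from collections import defaultdict

def rank_product_sales(sales, year):
    """sales: iterable of (date 'YYYY-MM-DD', product, amount).
    Returns [(rank, product, total)] ordered by total descending, with RANK() tie semantics."""
    totals = defaultdict(float)
    for date, product, amount in sales:
        if date.startswith(str(year)):     # WHERE year = @year
            totals[product] += amount      # SUM(SalesAmount) ... GROUP BY product
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    ranked = []
    for position, (product, total) in enumerate(ordered, start=1):
        # RANK(): equal totals share a rank; the following rank skips (1, 1, 3, ...)
        rank = ranked[-1][0] if ranked and total == ranked[-1][2] else position
        ranked.append((rank, product, total))
    return ranked

sample = [
    ("2009-01-05", "Camera", 300.0),
    ("2009-02-10", "Phone", 500.0),
    ("2009-03-01", "Laptop", 500.0),
    ("2008-12-31", "Camera", 999.0),   # outside 2009, filtered out
    ("2009-07-04", "Camera", 100.0),
    ("2009-08-01", "TV", 50.0),
]
for row in rank_product_sales(sample, 2009):
    print(row)
```

Swapping the tie rule for "rank = position" would give ROW_NUMBER() behaviour instead, which is the main difference to keep in mind when choosing the ranking function.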
So SQL Server suggests a covering index, with DateKey as the index key field. We call this a "covering" index because SQL Server will "bring along for the ride" the non-key fields we used in the query. That way, SQL Server doesn't need to touch the table or the clustered index at all; the database engine can simply use the covering index for the query. Covering indexes are popular in certain data warehousing and reporting database scenarios, although they come at the cost of the database engine maintaining them.

Note: covering indexes have been around for a long time, so nothing so far involves the columnstore index or the Query Store.

So I'll add the covering index. If I re-execute the same query I ran a moment ago (the one that aggregated the sales amount for each product), the query sometimes seems to run about a second faster, and I get a different execution plan, one that uses an Index Seek instead of an Index Scan (using the date key on the covering index to retrieve sales for 2009). So, before the columnstore index, this could have been one way to optimize this query in much older versions of SQL Server.

However, there are a few issues. The two execution operators, "Index Seek" and "Hash Match (Aggregate)," both essentially operate "row by row." Imagine this in a table with hundreds of millions of rows. Relatedly, think about the contents of a fact table: in this case, a single date key value and/or a single product key value might be repeated across hundreds of thousands of rows (remember, the fact table also has keys for geography, promotion, salesman, etc.). So when "Index Seek" and "Hash Match" work row by row, they are doing so over values that might be repeated across many other rows.
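A covering index can be pictured as a list kept sorted on the key column, with the non-key columns stored in each entry. The Python sketch below (hypothetical data and names, ISO date strings standing in for DateKey) performs an "index seek" with binary search on the key range and then aggregates the covered columns without any lookup into a base table, but still one row at a time, which is the limitation described above.

```python
import bisect

# Hypothetical covering index: entries sorted on the key (DateKey), with the
# non-key columns (ProductKey, SalesAmount) carried along in every entry.
index = sorted([
    ("2008-12-30", 1, 10.0),
    ("2009-01-02", 1, 20.0),
    ("2009-06-15", 2, 5.0),
    ("2009-12-31", 1, 7.5),
    ("2010-01-01", 2, 99.0),
])
keys = [entry[0] for entry in index]

def seek_range(lo_key, hi_key):
    """Index seek: binary-search the first key >= lo_key and stop after the last key <= hi_key."""
    start = bisect.bisect_left(keys, lo_key)
    end = bisect.bisect_right(keys, hi_key)
    return index[start:end]            # no lookup into the base table is needed

rows_2009 = seek_range("2009-01-01", "2009-12-31")

totals = {}
for _date, product_key, amount in rows_2009:   # still aggregated row by row
    totals[product_key] = totals.get(product_key, 0.0) + amount
print(totals)   # {1: 27.5, 2: 5.0}
```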
This is normally where I'd segue to the SQL Server columnstore index, which offers a way to improve the performance of this query dramatically. But before I do that, let's go back in time, to the year 2010, when Microsoft introduced an add-in for Excel known as PowerPivot. Many people probably remember seeing demos of PowerPivot for Excel, where a user could read millions of rows from an external data source into Excel. PowerPivot would compress the data and provide an engine to create pivot tables and pivot charts that performed at amazing speeds against the compressed data. PowerPivot used an in-memory technology that Microsoft called "VertiPaq." This in-memory technology would basically take duplicate business key/foreign key values and compress them down to a single vector. It would also scan these values in parallel, in blocks of several hundred at a time. The bottom line is that Microsoft baked a large number of performance enhancements into the VertiPaq in-memory feature for us to use, right out of the proverbial box.

Why am I taking this little stroll down memory lane? Because in SQL Server 2012, Microsoft implemented one of the most important features in the history of its database engine: the columnstore index. The index is really an index in name only: it is a way to take a SQL Server table and create a compressed, in-memory columnstore that compresses duplicate foreign key values down to single vector values. Microsoft also created a new buffer pool to read these compressed vector values in parallel, creating the potential for huge performance gains. So I'm going to create a columnstore index on the table and see how much better (and more efficiently) the query runs, compared with the query that runs against the covering index.
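The idea of collapsing duplicated key values can be illustrated with two classic column-compression techniques, dictionary encoding followed by run-length encoding. The Python sketch below is a toy illustration under that assumption, not the actual VertiPaq or columnstore storage format (which adds further encodings and segment metadata); all names and data are hypothetical. It also shows that an aggregate such as a row count can be computed directly on the compressed runs.

```python
from itertools import groupby

def dictionary_encode(values):
    """Replace each distinct value with a small integer id; return (dictionary, ids)."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)   # next unused id
        ids.append(dictionary[value])
    return dictionary, ids

def run_length_encode(ids):
    """Collapse consecutive repeats into (id, run_length) pairs."""
    return [(value_id, len(list(group))) for value_id, group in groupby(ids)]

def count_per_id(runs):
    """Aggregate directly on the compressed runs, without expanding them back to rows."""
    counts = {}
    for value_id, length in runs:
        counts[value_id] = counts.get(value_id, 0) + length
    return counts

product_keys = [101, 101, 101, 101, 205, 205, 205, 101, 101, 300]
dictionary, ids = dictionary_encode(product_keys)
runs = run_length_encode(ids)
print(dictionary)          # {101: 0, 205: 1, 300: 2}
print(runs)                # [(0, 4), (1, 3), (0, 2), (2, 1)]
print(count_per_id(runs))  # {0: 6, 1: 3, 2: 1}
```

Ten stored key values shrink to four runs here; in a fact table where a product key repeats across hundreds of thousands of rows, the savings (and the reduced work per aggregate) are far larger.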
So I'll create a copy of FactOnlineSales (I'll call it FactOnlineSalesDetailNCCS) and create the columnstore index on the duplicate table, so that I don't interfere with the original table and its covering index in any way. Next, I create a columnstore index on the new table. Notice several things: I've specified several foreign key columns, as well as the sales amount. Remember that a columnstore index is not like a traditional row-store index: there is no "key." We are simply indicating which columns SQL Server should compress and place in an in-memory columnstore. To use the PowerPivot for Excel analogy: when we create a columnstore index, we're telling SQL Server to do essentially the same thing PowerPivot did when we imported 20 million rows into Excel.

So I'll re-run the query, this time using the duplicated FactOnlineSalesDetailNCCS table that contains the columnstore index. This query runs instantly, in less than a second. And I can also say that even if the table had hundreds of millions of rows, it would still run in the proverbial "blink of an eye." We could look at the execution plan (and in a few moments we will), but now it's time to cover the Query Store feature.

Imagine for a moment that we ran both queries overnight: the query that used the regular FactOnlineSales table (with the covering index), and then the query that used the duplicated table with the columnstore index. When we log in the following morning, we'd like to see the execution plan for both queries as they ran, as well as the execution statistics. In other words, we'd like to see the same statistics we would see if we ran both queries interactively in SQL Server Management Studio, turned on TIME and IO statistics, and viewed the execution plan right after executing the query.
Well, that's what the Query Store allows us to do: we can turn on (enable) the Query Store for a database, which triggers SQL Server to store query execution and plan statistics so that we can view them later. So I'm going to enable the Query Store on the Contoso database (and I'll also clear out any caching). Then I'll run the two queries and pretend that I ran them hours ago. As I mentioned, the Query Store will capture the execution statistics.

So how do I view them? Fortunately, that's quite easy. If I expand the Contoso DW database, I'll see a Query Store folder. The Query Store has tremendous functionality and I'll try to cover much of it in subsequent blog posts. But for right now, I want to view execution statistics for the two queries and specifically examine the execution operators for the columnstore index. So I'll right-click on Top Resource Consuming Queries and run that option. That gives me a chart like the one below, where I can see the execution duration (in milliseconds) for all queries that have been executed. In this instance, Query 1 was the query against the original table with the covering index, and Query 2 was against the table with the columnstore index. The numbers don't lie: the columnstore index outperformed the original table/covering index by a factor of almost 7 to 1.

I can change the metric to look at memory consumption instead. In this case, note that Query 2 (the columnstore index query) used far more memory. This demonstrates clearly why the columnstore index represents "in-memory" technology. SQL Server loads the entire columnstore index in memory and uses a completely different buffer pool with enhanced execution operators to process the index. OK, so we have some graphs to view execution statistics. Can we see the execution plan (and execution operators) associated with each execution? Yes, we can.
If we click on the vertical bar for the query that used the columnstore index, we'll see the execution plan below. The first thing we see is that SQL Server performed a columnstore index scan, and that represented nearly 100% of the cost of the query. You might be saying, "Wait a minute, the first query used a covering index and performed an index seek, so how can a columnstore index scan be faster?" That's a legitimate question, and fortunately there's an answer. Even when the first query performed an index seek, it still executed "row by row." If I put the mouse over the columnstore index scan operator, I see a tooltip with one important setting: the Execution Mode is BATCH (as opposed to ROW, which is what we had with the first query using the covering index). BATCH mode tells us that SQL Server is processing the compressed vectors (for any foreign key values that are duplicated, such as the product key and date key) in batches of almost 1,000, in parallel. So SQL Server is still able to process the columnstore index much more efficiently. Additionally, if I place the mouse over the Hash Match (Aggregate) task, I also see that SQL Server is aggregating the columnstore index using batch mode (although the operator itself represents a tiny percentage of the cost of the query).

Finally, you might be asking, "OK, so SQL Server compresses the values in the data, treats the values as vectors, and reads them in blocks of almost a thousand values in parallel, but my query only wanted data for 2009. So is SQL Server scanning over the entire set of data?" Again, a good question. The answer is, not really. Fortunately for us, the new columnstore index buffer pool performs another function called "segment elimination." Basically, SQL Server will examine the vector values for the date key column in the columnstore index and eliminate segments that fall outside the year 2009.

I'll stop here for now.
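Segment elimination relies on per-segment metadata: each columnstore segment records the minimum and maximum value it contains, so whole segments can be skipped without reading their data. The Python sketch below models only that metadata check, with hypothetical segments and yyyymmdd integers standing in for the date key. Note that a segment that merely overlaps 2009 must still be read, and its rows filtered individually.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    segment_id: int
    min_date_key: int   # smallest DateKey stored in the segment (yyyymmdd)
    max_date_key: int   # largest DateKey stored in the segment (yyyymmdd)

def segments_to_scan(segments: List[Segment], lo: int, hi: int) -> List[int]:
    """Keep only segments whose [min, max] range can overlap [lo, hi]; the rest are eliminated."""
    return [s.segment_id for s in segments
            if not (s.max_date_key < lo or s.min_date_key > hi)]

segments = [
    Segment(0, 20080101, 20081231),   # entirely 2008: eliminated
    Segment(1, 20081201, 20090131),   # overlaps January 2009: must be read
    Segment(2, 20090201, 20091231),   # entirely 2009: read
    Segment(3, 20100101, 20101231),   # entirely 2010: eliminated
]
print(segments_to_scan(segments, 20090101, 20091231))   # [1, 2]
```

How much this helps depends on how the data was loaded: if rows arrive roughly in date order, segments cover narrow date ranges and most of them can be eliminated.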
In subsequent blog posts, I'll cover both the columnstore index and the Query Store in more detail. What we've seen here today is that the columnstore index can speed up queries that scan over large amounts of data, and the Query Store captures query executions and lets us examine execution and performance statistics later.

In the end, we'd like to produce a result set that shows the following. Notice three things:
– The columns essentially pivot all of the possible return reasons, after showing the sales amount.
– The result set contains subtotals by the week ending (Sunday) date across all clients (where the client is NULL).
– The result set contains a grand total row (where the client and date are both NULL).

First, before I get into the SQL end, we could use the dynamic pivot/matrix capability in SSRS. We would simply need to combine the two result sets into one column, and then we could feed the results to the SSRS matrix control, which will spread the return reasons across the columns axis of the report. However, not everyone uses SSRS (though most people should!). And even then, developers sometimes need to consume result sets in something other than a reporting tool. So for this example, let's assume we want to generate the result set for a web grid page, and the developer possibly wants to "strip out" the subtotal rows (where I have a ResultSetNum value of 2 and 3) and place them in a summary grid. So, bottom line, we need to generate the output above directly from a stored procedure. And as an added twist, next week there could be Return Reasons X, Y and Z, so we don't know how many return reasons there could be. We simply want the query to pivot on the possible distinct values for Return Reason. This is where T-SQL PIVOT has a limitation: we need to provide it with the possible values. Since we won't know those until runtime, we need to generate the query string dynamically using the dynamic SQL pattern.
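The core of the dynamic approach is turning a runtime list of distinct values into the bracketed column list that PIVOT needs, plus a matching list of SUM expressions. The Python sketch below mirrors that idea (a join standing in for the T-SQL STUFF/FOR XML PATH('') trick); the function names are hypothetical, and the "SUM([x]) AS [x]" form is an assumption about how the summed columns are written. quote_name imitates T-SQL QUOTENAME by doubling any closing bracket.

```python
def quote_name(name: str) -> str:
    """Like T-SQL QUOTENAME: wrap in brackets and double any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"

def pivot_columns(reasons):
    """SalesAmount first, then the distinct return reasons in sorted order."""
    return ["SalesAmount"] + sorted(set(reasons))

def action_string(columns):
    """Comma-separated, bracketed column list for the PIVOT ... IN (...) clause."""
    return ", ".join(quote_name(c) for c in columns)

def sum_string(columns):
    """One SUM expression per pivoted column, aliased back to the column name."""
    return ", ".join(f"SUM({quote_name(c)}) AS {quote_name(c)}" for c in columns)

cols = pivot_columns(["Reason B", "Reason A", "Reason D", "Reason C", "Reason A"])
print(action_string(cols))
# [SalesAmount], [Reason A], [Reason B], [Reason C], [Reason D]
print(sum_string(cols))
```

If next week's data contains Reason X, it simply appears in cols and both strings grow accordingly, with no change to the code.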
The dynamic SQL pattern involves building the syntax piece by piece, storing it in a string, and then executing the string at the end. Dynamic SQL can be tricky, as we have to embed syntax inside a string. But in this case, it's our only true option if we want to handle a variable number of return reasons. I've always found that the best way to create a dynamic SQL solution is to figure out what the "ideal" generated query would be at the end (in this case, given the return reasons we know about), and then reverse-engineer it by piecing it together one part at a time.

And so, here is the SQL we'd need if we knew the return reasons (A through D) were static and would never change. The query does the following:
– It combines the data from SalesData with the data from ReturnData, where we "hard-wire" the word Sales as an action type for the sales table, and then use the return reason from ReturnData for the same ActionType column. That gives us a clean ActionType column on which to pivot. We combine the two SELECT statements into a common table expression (CTE), which is basically a derived table subquery that we then use in the next statement (to PIVOT).
– A PIVOT statement against the CTE sums the dollars for each action type, where the action type is one of the possible ActionType values. Note that this isn't the final result set. We place this into a second CTE that reads from the first CTE, because we want to do multiple groupings at the end.
– The final SELECT statement reads from PIVOTCTE and combines it with a subsequent query against the same PIVOTCTE, in which we also implement two groupings with the GROUPING SETS feature of SQL Server 2008: grouping by the week ending date (dbo.WeekEndingDate) and grouping for all rows ().

So if we knew with certainty that we'd never have more return reason codes, that would be the solution. However, we need to account for other reason codes.
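To make the three parts of the static query concrete (UNION into one ActionType column, PIVOT per week and client, then GROUPING SETS for weekly subtotals and a grand total), here is the same logic sketched in Python on hypothetical sample data. The function name and input shapes are assumptions; one difference from T-SQL is noted in a comment, namely that PIVOT returns NULL where this sketch uses 0.0.

```python
from collections import defaultdict

def pivot_with_subtotals(sales, returns, action_types):
    """sales: (week_ending, client, amount); returns: (week_ending, client, reason, amount).
    Returns rows of (result_set_num, week_ending, client, {action_type: total})."""
    # 1) UNION: hard-wire 'SalesAmount' as the action type for every sales row
    combined = [(w, c, "SalesAmount", amt) for w, c, amt in sales]
    combined += [(w, c, reason, amt) for w, c, reason, amt in returns]

    # 2) PIVOT: one total per action type for each (week, client);
    #    missing combinations are 0.0 here, where T-SQL PIVOT would return NULL
    detail = defaultdict(lambda: {a: 0.0 for a in action_types})
    for week, client, action, amount in combined:
        if action in action_types:          # like PIVOT ... FOR ActionType IN (...)
            detail[(week, client)][action] += amount

    # 3) GROUPING SETS ((week), ()): a subtotal per week, then a grand total
    weekly = defaultdict(lambda: {a: 0.0 for a in action_types})
    grand = {a: 0.0 for a in action_types}
    for (week, _client), totals in detail.items():
        for action, value in totals.items():
            weekly[week][action] += value
            grand[action] += value

    rows = [(1, w, c, t) for (w, c), t in sorted(detail.items(), key=lambda kv: kv[0])]
    rows += [(2, w, None, t) for w, t in sorted(weekly.items(), key=lambda kv: kv[0])]
    rows.append((3, None, None, grand))
    return rows

actions = ["SalesAmount", "Reason A", "Reason B"]
sales = [("2024-01-07", "Client1", 100.0), ("2024-01-07", "Client2", 50.0),
         ("2024-01-14", "Client1", 80.0)]
returns = [("2024-01-07", "Client1", "Reason A", 10.0),
           ("2024-01-14", "Client1", "Reason B", 5.0)]
for row in pivot_with_subtotals(sales, returns, actions):
    print(row)
```

The result_set_num value plays the role of ResultSetNum: 1 for detail rows, 2 for weekly subtotals (client is None) and 3 for the grand total (week and client are None), so a consumer can split them apart easily.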
So we need to generate the entire query above as one big string, in which we build the possible return reasons as a comma-separated list. I'm going to show the entire T-SQL code to generate (and execute) the desired query, and then I'll break it into parts and explain each step. So first, here's the entire code to dynamically generate what I have above. There are basically five steps we need to cover.

Step 1. We know that somewhere in the mix, we need to generate a string like this for the query: [SalesAmount], [Reason A], [Reason B], [Reason C], [Reason D]. What we can do is build a temporary common table expression that combines the hard-wired "SalesAmount" column with the unique list of possible reason codes. Once we have that in a CTE, we can use the nice little trick of FOR XML PATH('') to collapse those rows into a single string, putting a comma in front of each row that the query reads, and then use STUFF to remove the first comma. This is a trick you can find in hundreds of SQL blogs. So this first part builds a string called ActionString that we can use further down.

Step 2. We also know that we'll want to SUM the generated/pivoted reason columns, along with the standard sales column. So we'll need a separate string for that, which I'll call SUMSTRING. I'll simply use the original ActionString, and then REPLACE the outer brackets with SUM syntax, plus the original brackets.

Step 3. Now the real work begins. Using the original query as a model, we want to generate the original query (starting with the UNION of the two tables), but replacing any references to pivoted columns with the strings we dynamically generated above. Also, while not absolutely required, I've created a variable to simplify any carriage return/line feed combinations that we want to embed into the generated query (for readability). So we'll construct the entire query in a variable called SQLPivotQuery.

Step 4.
We continue constructing the query again, concatenating the syntax we can quothard-wirequot with the ActionSelectString (that we generated dynamically to hold all the possible return reason values) Step 5 . Finally, we39ll generate the final part of the Pivot Query, that reads from the 2 nd common table expression (PIVOTCTE, from the model above) and generates the final SELECT to read from the PIVOTCTE and combine it with a 2 nd read against PIVOTCTE to implement the grouping sets. Finally, we can quotexecutequot the string using the SQL system stored proc spexecuteSQL So hopefully you can see that the process to following for this type of effort is Determine what the final query would be, based on your current set of data and values (i. e. built a query model) Write the necessary T-SQL code to generate that query model as a string. Arguably the most important part is determining the unique set of values on which you39ll PIVOT, and then collapsing them into one string using the STUFF function and the FOR XML PATH(3939) trick So whats on my mind today Well, at least 13 items Two summers ago, I wrote a draft BDR that focused (in part) on the role of education and the value of a good liberal arts background not just for the software industry but even for other industries as well. One of the themes of this particular BDR emphasized a pivotal and enlightened viewpoint from renowned software architect Allen Holub regarding liberal arts. Ill (faithfully) paraphrase his message: he highlighted the parallels between programming and studying history, by reminding everyone that history is reading and writing (and Ill add, identifying patterns), and software development is also reading and writing (and again, identifying patterns). And so I wrote an opinion piece that focused on this and other related topics. But until today, I never got around to either publishingposting it. 
Every so often I'd think of revising it, and I'd even sit down for a few minutes and make some adjustments to it. But then life in general would get in the way and I'd never finish it. So what changed? A few weeks ago, fellow CoDe Magazine columnist and industry leader Ted Neward wrote a piece in his regular column, Managed Coder, that caught my attention. The title of the article is "On Liberal Arts," and I highly recommend that everyone read it. Ted discusses the value of a liberal arts background, the false dichotomy between a liberal arts background and success in software development, and the need to write and communicate well. He talks about some of his own past encounters with HR/personnel management regarding his educational background. He also emphasizes the need to accept and adapt to changes in our industry, as well as the hallmarks of a successful software professional (being reliable, planning ahead, and learning to get past initial conflict with other team members). So it's a great read, as are Ted's other CoDe articles and blog entries. It also got me thinking again about my views on this (and other topics), and finally motivated me to finish my own editorial. So, better late than never, here are my current Baker's Dozen of Reflections.

I have a saying: Water freezes at 32 degrees. If you're in a training/mentoring role, you might think you're doing everything in the world to help someone when in fact they're only feeling a temperature of 34 degrees, and therefore things aren't solidifying for them. Sometimes it takes just a little more effort, or another idea (a chemical catalyst), or a new perspective, which means those with prior education can draw on different sources. Water freezes at 32 degrees.

Some people can maintain high levels of concentration even in a room full of noisy people. I'm not one of them; occasionally I need some privacy to think through a critical issue. Some people describe this as "you gotta learn to walk away from it."
Stated another way, it's a search for the rarefied air. This past week I spent hours in a half-lit, quiet room with a whiteboard, until I fully understood a problem. It was only then that I could go talk with other developers about a solution. The message here isn't to preach about how you should go about solving problems, but rather that everyone should know their strengths and what works for them, and use them to their advantage as much as possible.

Some phrases are like fingernails on a chalkboard for me. "Use it as a teaching moment" is one. (Why is it like fingernails on a chalkboard? Because if you're in a mentoring role, you should usually be in teaching-moment mode anyway, however subtly.) Here's another: "I can't really explain it in words, but I understand it." This might sound a bit cold, but if a person truly can't explain something in words, maybe they don't understand it. Sure, a person can have a fuzzy sense of how something works. I can bluff my way through describing how a digital camera works, but the truth is that I don't really understand it all that well. There is a field of study known as epistemology (the study of knowledge). One of the fundamental bases of understanding, whether it's a camera or a design pattern, is the ability to establish context, to identify the chain of related events, the attributes of any components along the way, etc. Yes, understanding is sometimes very hard work, but diving into a topic and breaking it apart is worth the effort. Even those who eschew certification will acknowledge that studying for certification tests helps to fill gaps in knowledge. A database manager is more likely to hire a database developer who can speak extemporaneously (and effortlessly) about transaction isolation levels and triggers than someone who sort of knows about them but struggles to describe their usage.

There's another corollary here. Ted Neward recommends that developers take up public speaking, blogging, etc. I agree 100%.
The process of public speaking and blogging will practically force you to start thinking about topics and breaking down definitions that you might otherwise have taken for granted. A few years ago I thought I understood the T-SQL MERGE statement pretty well. But it was only after writing about it, speaking about it, and fielding questions from others with perspectives that had never occurred to me that my level of understanding increased exponentially.

I know a story of a hiring manager who once interviewed an author/developer for a contract position. The hiring manager was contemptuous of publications in general, and barked at the applicant, "So, if you're going to work here, would you rather be writing books or writing code?" Yes, I'll grant that in any industry there will be a few pure academics. But what the hiring manager missed was the opportunity that writing provides for strengthening and sharpening skill sets.

While cleaning out an old box of books, I came across a treasure from the 1980s: Programmers at Work, which contains interviews with a very young Bill Gates, Ray Ozzie, and other well-known names. Every interview and every insight is worth the price of the book. In my view, the most interesting interview was with Butler Lampson, who gave some powerful advice: "To hell with computer literacy. It's absolutely ridiculous. Study mathematics. Learn to think. Read. Write. These things are of more enduring value. Learn how to prove theorems: A lot of evidence has accumulated over the centuries that suggests this skill is transferable to many other things." Butler speaks the truth. I'll add to that point: learn how to play devil's advocate against yourself. The more you can reality-check your own processes and work, the better off you'll be.

The great computer scientist/author Allen Holub made the connection between software development and the liberal arts, specifically the subject of history. Here was his point: what is history? Reading and writing.
What is software development? Among other things, reading and writing. I used to give my students T-SQL essay questions as practice tests. One student joked that I acted more like a law professor. Well, just as Coach Don Haskins said in the movie Glory Road, "my way is hard." I firmly believe in a strong intellectual foundation for any profession. Just as applications can benefit from frameworks, individuals and their thought processes can benefit from human frameworks as well. That's the fundamental basis of scholarship. There is a story that back in the 1970s, IBM expanded its recruiting efforts at the major universities by focusing on the best and brightest liberal arts graduates. Even then they recognized that the best readers and writers might someday become strong programmers/systems analysts. (Feel free to use that story with any HR type who insists that a candidate must have a computer science degree.)

And speaking of history: if for no other reason, it's important to remember the history of product releases. If I'm doing work at a client site that's still using SQL Server 2008 or even (gasp) SQL Server 2005, I have to remember which features were implemented in which versions over time.

Ever have a favorite doctor whom you liked because he or she explained things in plain English, gave you the straight truth, and earned your trust to operate on you? Those are mad skills, and they are the result of experience and HARD WORK that take years, even decades, to cultivate. There are no guarantees of job success. Focus on the facts, take a few calculated risks when you're sure you can see your way to the finish line, let the chips fall where they may, and never lose sight of being just like that doctor who earned your trust. Even though some days I fall short, I try to treat my clients and their data as a doctor would treat patients. Even though a doctor makes more money!

There are many clichés I detest, but here's one I don't hate: "There is no such thing as a bad question."
As a former instructor, one thing that drew my ire was hearing someone criticize another person for asking a supposedly "stupid" question. A question indicates that a person acknowledges some gap in knowledge they're looking to fill. Yes, some questions are better worded than others, and some questions require additional framing before they can be answered. But the journey from forming a question to reaching an answer is likely to generate an active mental process in others. These are all GOOD things. Many good and fruitful discussions originate with a "stupid" question.

I work across the board in SSIS, SSAS, SSRS, MDX, PPS, SharePoint, Power BI and DAX: all the tools in the Microsoft BI stack. I still write some code from time to time. But guess what I still spend so much time doing? Writing T-SQL code to profile data as part of the discovery process. All application developers should have good T-SQL chops.

Ted Neward writes (correctly) about the need to adapt to technology changes. I'll add to that the need to adapt to client/employer changes. Companies change business rules. Companies acquire other companies (or become the target of an acquisition). Companies make mistakes in communicating business requirements and specifications. Yes, we can sometimes play a role in helping to manage those changes, and sometimes we're the fly, not the windshield. These changes sometimes cause great pain for everyone, especially the I.T. people. This is why the term "fact of life" exists: we have to deal with it. Just as no developer writes bug-free code every time, no I.T. person deals well with change every single time. One of the biggest struggles I've had in my 28 years in this industry is showing patience and restraint when changes are flying in from many different directions. Here is where my earlier suggestion about searching for the rarefied air can help. If you can manage to assimilate changes into your thought process without feeling overwhelmed, odds are you'll be a significant asset.
In the last 15 months I've had to deal with a huge amount of professional change. It's been very difficult at times, but I've resolved that change will be the norm, and I've tried to tweak my own habits as best I can to cope with frequent (and uncertain) change. It's hard, very hard. But as coach Jimmy Dugan said in the movie A League of Their Own: "Of course it's hard. If it wasn't hard, everyone would do it. The hard is what makes it great." A powerful message.

There's been talk in the industry over the last few years about conduct at professional conferences (and conduct in the industry as a whole). Many respected writers have written very good editorials on the topic. Here's my input, for what it's worth. It's a message to those individuals who have chosen to behave badly: Dude, it shouldn't be that hard to behave like an adult. A few years ago, CoDe Magazine Chief Editor Rod Paddock made some great points in an editorial about Codes of Conduct at conferences. It's definitely unfortunate to have to remind people of what they should expect of themselves. But the problems go deeper. A few years ago I sat on a five-person panel (3 women, 2 men) at a community event on Women in Technology. The other male stated that men succeed in this industry because the Y chromosome gives men an advantage in areas of performance. The individual who made these remarks is a highly respected technology expert, not some bozo making dongle remarks at a conference or sponsoring a programming contest where first prize is a date with a bikini model.

Our world is becoming increasingly polarized (just watch the news for five minutes), sadly with emotion often winning over reason. Even in our industry, I recently heard someone in a position of responsibility bash software tool XYZ based on a ridiculous premise and then give false praise to a competing tool. So many opinions, so many arguments, but here's the key: before taking a stand, do your homework and get the facts.
Sometimes both sides are partly right, or partly wrong. There's only one way to determine which: get the facts. As Robert Heinlein wrote, "Facts are your single clue. Get the facts!" Of course, once you get the facts, the next step is to express them in a meaningful and even compelling way. There's nothing wrong with using some emotion in an intellectual debate, but it IS wrong to replace an intellectual debate with emotion and a false agenda. A while back I faced resistance to SQL Server Analysis Services from someone who claimed the tool couldn't do feature XYZ. The specifics of XYZ don't matter here. I spent about two hours that evening working up a demo that cogently demonstrated the original claim was false. In that example, it worked. I can't swear it will always work, but to me that's the only way.

I'm old enough to remember life as a teen in the 1970s. Back then, when a person lost his or her job, it was often because the person just wasn't cutting the mustard. Fast-forward to today: a sad fact of life is that even talented people are now losing their jobs because of changing economic conditions. There's never a foolproof method for immunity, but now more than ever it's critical to provide a high level of what I call the "Three Vs" (value, versatility, and velocity) for your employer/clients. I might not always like working weekends or very late at night to do the proverbial work of two people, but then I remember there are folks out there who would give anything to be working at 1 AM to feed their families and pay their bills.

Always be yourself: your BEST self. Some people need inspiration from time to time. Here's mine: the great sports movie Glory Road. If you've never watched it, even if you're not a sports fan, I can almost guarantee you'll be moved like never before. And I'll close with this. If you need some major motivation, I'll refer you to a story from 2006.
Jason McElwain, a high school student with autism, came off the bench to score twenty points in a high school basketball game in Rochester, New York. Here's a great YouTube video. His mother said it all: "This is the first moment Jason has ever succeeded and is proud of himself. I look at autism as the Berlin Wall. He cracked it."

To anyone who wanted to attend my session at today's SQL Saturday event in DC: I apologize that the session had to be cancelled. I hate to make excuses, but a combination of getting back late from Detroit (client trip), a car that's dead (blown head gasket), and some sudden health issues with my wife have made it impossible for me to attend. Back in August, I did the same session (ColumnStore Index) for PASS as a webinar. You can go to this link to access the video (it'll be streamed, as all PASS videos are). The link does require that you fill out your name and email address, but that's it. Then you can watch the video. Feel free to contact me with questions at kgoffkevinsgoff

November 15, 2013

Getting started with Windows Azure and creating SQL Databases in the cloud can be a bit daunting, especially if you've never tried any of Microsoft's cloud offerings. Fortunately, I've created a webcast to help people get started. This is an absolute beginner's guide to creating SQL Databases under Windows Azure. It assumes zero prior knowledge of Azure. You can go to the BDBI Webcasts section of this website and check out my webcast (dated 11/10/2013). Or you can just download the webcast videos right here: here is part 1 and here is part 2. You can also download the slide deck here.

November 03, 2013

Topic this week: SQL Server Snapshot Isolation Levels, added in SQL Server 2005. To this day, there are still many SQL developers, many good SQL developers, who either aren't aware of this feature or haven't had time to look at it. Hopefully this information will help.
The companion webcast will be uploaded in the next day; look for it in the BDBI Webcasts section of this blog.

October 26, 2013

I'm going to start a weekly post of T-SQL tips, covering many different versions of SQL Server over the years.

Here's a challenge many developers face. I'll whittle it down to a very simple example, but one where the pattern applies to many situations. Suppose you have a stored procedure that receives a single vendor ID and updates the freight for all orders with that vendor ID:

    create procedure dbo.UpdateVendorOrders
       @VendorID int
    as
       update Purchasing.PurchaseOrderHeader
          set Freight = Freight + 1
          where VendorID = @VendorID

Now, suppose we need to run this for a set of vendor IDs. Today we might run it for three vendors, tomorrow for five vendors, the next day for 100 vendors. We want to pass in the vendor IDs. If you've worked with SQL Server, you can probably guess where I'm going with this. The big question is: how do we pass a variable number of vendor IDs? Or, stated more generally, how do we pass an array, or a table of keys, to a procedure? Something along the lines of:

    exec dbo.UpdateVendorOrders @SomeListOfVendors

Over the years, developers have come up with different methods. Going all the way back to SQL Server 2000, developers might create a comma-separated list of vendor keys and pass the CSV list as a varchar to the procedure. The procedure would shred the CSV varchar variable into a table variable and then join the PurchaseOrderHeader table to that table variable (to update the Freight for just those vendors in the table). I wrote about this in CoDe Magazine back in early 2005 (code-magazinearticleprint.aspx?quickid=0503071&printmode=true, Tip 3). In SQL Server 2005, you could create an XML string of the vendor IDs, pass the XML string to the procedure, and then use XQUERY to shred the XML into a table variable. I also wrote about this in CoDe Magazine back in 2007 (code-magazinearticleprint.aspx?quickid=0703041&printmode=true,
Tip 12). Also, some developers will populate a temp table ahead of time and then reference the temp table inside the procedure. All of these certainly work, and developers have had to use these techniques because for years there was NO WAY to pass a table directly to a SQL Server stored procedure. That changed in SQL Server 2008, when Microsoft implemented the table type. This FINALLY allowed developers to pass an actual table of rows to a stored procedure.

Now, it does require a few steps. We can't just pass any old table to a procedure. It has to be a pre-defined type (a template). So let's suppose we always want to pass a set of integer keys to different procedures. One day it might be a list of vendor keys. The next day it might be a list of customer keys. So we can create a generic table type of keys, one that can be instantiated for customer keys, vendor keys, etc.

    CREATE TYPE IntKeysTT AS TABLE
       ( IntKey int NOT NULL )

So I've created a table type called IntKeysTT. It's defined to have one column, IntKey. Now suppose I want to load it with vendors who have a Credit Rating of 1, and then take that list of vendor keys and pass it to a procedure:

    DECLARE @VendorList IntKeysTT
    INSERT INTO @VendorList
       SELECT BusinessEntityID FROM Purchasing.Vendor
       WHERE CreditRating = 1

So I now have a table type variable: not just any table variable, but a table type variable (which I populated the same way I would populate a normal table variable). Like any table variable it is backed by tempdb (small ones typically stay in memory), and it is private to the connection/process. OK, can I pass it to the stored procedure now? Well, not yet; we need to modify the procedure to receive a table type. Here's the code:

    create procedure dbo.UpdateVendorOrdersFromTT
       @IntKeysTT IntKeysTT READONLY
    as
       update Purchasing.PurchaseOrderHeader
          set Freight = Freight + 1
          FROM Purchasing.PurchaseOrderHeader
             JOIN @IntKeysTT TempVendorList
                ON PurchaseOrderHeader.VendorID = TempVendorList.IntKey

Notice how the procedure receives the IntKeysTT parameter as a table type (again, not just a regular table, but a table type). It also receives it as a READONLY parameter: you CANNOT modify the contents of this table type inside the procedure. Usually you won't want to; you simply want to read from it. Now you can reference the table type parameter in the JOIN, as you would any other table variable. So there you have it. It takes a bit of work to set up the table type, but in my view it's definitely worth it.

Additionally, if you pass values from .NET, you're in luck. You can pass an ADO.NET DataTable whose structure matches the table type to the procedure as a structured parameter (SqlDbType.Structured). For developers who have had to pass CSV lists, XML strings, etc. to a procedure in the past, this is a huge benefit.

Finally, I want to talk about another approach people have used over the years: SQL Server cursors. At the risk of sounding dogmatic, I strongly advise against cursors unless there is just no other way. Cursors are expensive operations in the server. For instance, someone might use a cursor and implement the solution this way:

    DECLARE @VendorID int
    DECLARE dbcursor CURSOR FAST_FORWARD FOR
       SELECT BusinessEntityID FROM Purchasing.Vendor WHERE CreditRating = 1
    OPEN dbcursor
    FETCH NEXT FROM dbcursor INTO @VendorID
    WHILE @@FETCH_STATUS = 0
    BEGIN
       EXEC dbo.UpdateVendorOrders @VendorID
       FETCH NEXT FROM dbcursor INTO @VendorID
    END
    CLOSE dbcursor
    DEALLOCATE dbcursor

The best thing I'll say about this is that it works. And yes, getting something to work is a milestone. But getting something to work and getting something to work acceptably are two different things. Even if this process only takes 5-10 seconds to run, for those 5-10 seconds the cursor uses SQL Server resources quite heavily. That's not a good idea in a large production environment. Additionally, the more rows the cursor has to fetch and the more times the procedure executes, the slower it will be.
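To make the set-based versus row-by-row contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an analogy, not the T-SQL solution: SQLite has no table types, READONLY parameters or stored procedures, so a temporary IntKeys table stands in for the IntKeysTT parameter and a Python loop stands in for the cursor. The table layout and sample data are hypothetical. Both paths leave the same Freight values; the difference is one UPDATE statement versus one statement per vendor key.

```python
import sqlite3


def create_orders(conn):
    """Hypothetical stand-in for Purchasing.PurchaseOrderHeader: 100 orders over vendors 1-10."""
    conn.execute(
        "CREATE TABLE PurchaseOrderHeader ("
        "PurchaseOrderID INTEGER PRIMARY KEY, VendorID INTEGER NOT NULL, Freight REAL NOT NULL)"
    )
    rows = [(order_id, 1 + order_id % 10, 10.0) for order_id in range(1, 101)]
    conn.executemany("INSERT INTO PurchaseOrderHeader VALUES (?, ?, ?)", rows)


def update_row_by_row(conn, vendor_ids):
    """Cursor style: one UPDATE per vendor key, like EXEC inside a FETCH loop."""
    for vendor_id in vendor_ids:
        conn.execute(
            "UPDATE PurchaseOrderHeader SET Freight = Freight + 1 WHERE VendorID = ?",
            (vendor_id,),
        )


def update_set_based(conn, vendor_ids):
    """Table-of-keys style: load the keys once, then run a single UPDATE against them.
    Assumes distinct keys; a duplicate would be applied twice by the loop but once here."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS IntKeys (IntKey INTEGER NOT NULL)")
    conn.execute("DELETE FROM IntKeys")
    conn.executemany("INSERT INTO IntKeys (IntKey) VALUES (?)", [(v,) for v in vendor_ids])
    conn.execute(
        "UPDATE PurchaseOrderHeader SET Freight = Freight + 1 "
        "WHERE VendorID IN (SELECT IntKey FROM IntKeys)"
    )


def freight_by_order(conn):
    """Return (PurchaseOrderID, Freight) pairs in a stable order for comparison."""
    return conn.execute(
        "SELECT PurchaseOrderID, Freight FROM PurchaseOrderHeader ORDER BY PurchaseOrderID"
    ).fetchall()


if __name__ == "__main__":
    vendors = [2, 5, 7]  # distinct keys, as a SELECT of vendor IDs would return
    loop_db, set_db = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for db in (loop_db, set_db):
        create_orders(db)
    update_row_by_row(loop_db, vendors)
    update_set_based(set_db, vendors)
    print(freight_by_order(loop_db) == freight_by_order(set_db))  # True
```

The IN subquery is used instead of a joined UPDATE only to keep the SQLite side simple; the design point carried over from the T-SQL version is that the keys travel as a set and the update runs once.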
When I ran both processes (the cursor approach and then the table type approach) against a small sampling of vendors (5 vendors), the processing times were 260 ms and 60 ms, respectively. So the table type approach was roughly 4 times faster. But when I ran the two scenarios against a much larger set of vendors (84 vendors), the difference was staggering: 6701 ms versus 207 ms, respectively. So the table type approach was roughly 32 times faster. Again, the CURSOR approach is definitely the least attractive one. Even in SQL Server 2005, it would have been better to create a CSV list or an XML string (provided the keys fit in a scalar variable). But now that SQL Server 2008 has the table type feature, you can achieve the objective with a feature that is more closely modeled on the way developers think: specifically, how do we pass a table to a procedure? Now we have an answer! Hope you find this feature helpful. Feel free to post a comment.

SQL Server IO Performance: Everything You Need To Consider

SQL Server IO performance is crucial to overall performance. Access to data on disk is much slower than in memory, so getting the most out of local disk and SAN is essential. There is a lot of advice on the web and in books about SQL Server IO performance, but I haven't found a single source listing everything to consider. This is my attempt to bring all the information together in one place. So here is a list of everything I can think of that can impact IO performance. I have ordered it starting at the physical disks and moving up the wire to the server, and finally to the code and database schema.

Failed Disk

When a drive fails in a disk array it will need to be replaced. The impact on performance before replacement depends on the storage array and RAID configuration used. RAID 5 and RAID 6 use distributed parity, and when a disk fails, that parity is used to reconstruct the failed disk's data every time it is read.
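As an illustration of why a degraded RAID 5 array reads more slowly: with single parity, the parity block of a stripe is conventionally the byte-wise XOR of its data blocks, so a block that lived on the failed disk can only be served by reading every surviving block in the stripe and XOR-ing them together. The Python sketch below models one stripe; the block contents are made up, and it covers RAID 5-style single parity only (RAID 6 adds a second, differently computed parity block that is not modeled here).

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same non-empty set of sizes")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def parity_block(data_blocks):
    """RAID 5-style single parity for one stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks):
    """Recompute the lost block of a stripe by XOR-ing every surviving block
    (the remaining data blocks plus parity). Every survivor has to be read."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    stripe = [b"SQLS", b"erve", b"r IO"]  # hypothetical data blocks on three disks
    parity = parity_block(stripe)          # stored on a fourth disk
    # The second disk fails: one logical read of its block now costs three physical reads.
    rebuilt = rebuild_missing([stripe[0], stripe[2], parity])
    print(rebuilt)  # b'erve'
```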
Read performance loses the advantage of reading from multiple disks. This is also true, although to a lesser degree, on RAID 1 (mirrored) arrays: reads lose the advantage of reading from multiple stripes for data on the failed disk, and writes may be slightly slower due to the increase in average seek time.

Write Cache

When a transaction is committed, the write to the transaction log has to complete before the transaction is marked as committed. This is essential to ensure transactional integrity. It used to be that write cache was not recommended, but a lot of the latest storage arrays have battery-backed caches that are fully certified for use with SQL Server. If you have the option to vary the distribution of memory between read and write cache, try to allocate as much as possible to the write cache. This is because SQL Server performs its own read caching via the buffer pool, so any additional read cache on the disk controller has no benefit.

Thin Provisioning

Thin provisioning is a technology provided by some SANs whereby the actual disk storage used is just enough for the data, while appearing to the server to be full sized, with loads of free space. Where the total disk allocated to all servers exceeds the amount of physical storage, this is known as over-provisioning. Some SAN vendors claim that performance is not affected, but that's not always true. I saw this issue recently on a 3PAR array. Sequential reads were significantly slower on thin-provisioned LUNs. Switching to thick-provisioned LUNs more than doubled the sequential read throughput.

Where Are The Disks?

Are they where you think they are? It is perfectly possible to be connected to a storage array but for the IO requests to pass through that array to another. This is sometimes done as a cheap way to increase disk space: using existing hardware that is being underutilized is less costly than purchasing more disks.
The trouble is that this introduces yet another component into the path, which is detrimental to performance, and the DBA may not even be aware of it. Make sure you know how the SAN is configured.

Smart Tiering

This is called different things by different vendors. The storage array will consist of two or more types of disk, of varying performance and cost. There are the slower 10K RPM disks; these are the cheapest. Then you have the 15K RPM disks, which are faster but more expensive. And then there may be some super-fast SSDs. These are even more expensive, although the price is coming down. Smart tiering migrates data between tiers so that more commonly accessed data is on the faster storage while less commonly used data drops down to the slower storage. This is OK in principle, but you are the DBA. You should already know which data needs to be accessed quickly and which can be slower. Do you really want an algorithm making this decision for you? And regular maintenance tasks can confuse the whole thing anyway. Consider a load of index rebuilds running overnight. Let's suppose the last database to be processed is an archive database: do you want it hogging the SSD when the users log in first thing in the morning, while the mission-critical database is languishing in the bottom tier? This is an oversimplification, of course. The tiering algorithms are more sophisticated than that, but my point stands. You should decide the priorities for your SQL Server data. Don't let the SAN vendors (or storage admins) persuade you otherwise.

Storage Level Replication

Storage level replication is a disaster recovery feature that copies block-level data from the primary SAN to another, often located in a separate data center. The SAN vendors claim no impact on performance, and this is true if it is correctly configured. But I have seen poorly configured replication have a serious impact on performance. One client suffered a couple of years of poor IO performance.
When I joined them I questioned whether the storage replication was responsible. I was told not to be so silly: the vendor had checked and it was not the problem, so it must be SQL Server itself! A few months later I was contacted again. They had turned off the replication while moving to a new data center, and guess what? Write latency improved by an order of magnitude. Let me repeat that this was caused by poor configuration, and most storage replication does not noticeably affect performance. But it's another thing to consider if you're struggling with SQL Server IO performance.

Host Bus Adapters

Check that the SAN and HBA firmware are compatible. Sometimes when a SAN is upgraded, the HBAs on the servers are overlooked. This can result in irregular errors, or even make the storage inaccessible. Have a look at the HBA queue depth. A common default is 32, which may not be optimal. Some studies have shown that increasing this to 64 or higher can improve performance. It could also make things worse, depending on workload, SAN make and model, disk layout, etc. So test thoroughly if you can. Some storage admins discourage modifying HBA queue depth because they think everyone will want the same on their servers and the storage array will be swamped. And they're right, too! Persuade them that it is just for you. Promise not to tell anyone else. Whatever. Just get your extra queue depth if you think it will benefit performance.

Too Many Servers

When a company forks out a small fortune on a storage area network, it wants value for money. So naturally, every new server that comes along gets hooked up so it can make use of all that lovely disk space. This is fine until a couple of servers start issuing a lot of IO requests and other users complain of a performance slowdown. This is something I see repeatedly at so many clients, and there is no easy solution. The company doesn't want to, or can't afford to, purchase another SAN.
If you think this is a problem for you, put together a schedule of all jobs, across all servers, and try to reschedule some so that the workload is distributed more evenly.

Partition Alignment and Formatting

I will briefly mention partition alignment, although Windows 2008 uses a default offset of 1MB, so this is less of an issue than it used to be. I am also not convinced that a lot of modern SANs benefit much from the practice. I performed a test on an EVA a few years ago and found just a 2% improvement. Nevertheless, a few percent is still worth striving for. Unfortunately, you will have to tear down your volumes and recreate your partitions if this is to be fixed on an existing system. This is probably not worth the hassle unless you are striving for every last inch of performance.

Formatting is something else that should be done correctly. SQL Server stores data in 8KB pages, but these are retrieved in blocks of 8, called extents, so one extent is 8 x 8KB = 64KB. If the disks are formatted with 64KB allocation units to match, this can have a significant performance benefit.

Multipathing

If you are not using local disk, then you should have some redundancy built into your storage subsystem. If you have a SAN, you have a complicated network of HBAs, fabric, switches and controllers between SQL Server and the disks. There should be at least two HBAs, switches, etc., and these should all be connected together in such a way that there are multiple paths to the disks. This redundancy is primarily for high availability, but if the multipathing has been configured as active/active you may see performance benefits as well.

Network Attached Storage

Since SQL Server 2008 R2 it has been possible to create, restore or attach a database on a file share. This has a number of possible uses; particularly for dev/test environments, it can make capacity management easier and make moving databases between servers much quicker.
The question to be asked, though, is "Do you really want this in production?" Performance will not be as good as local or SAN drives. There are additional components in the chain, so reliability may not be as good. And by using the network, your data uses the same infrastructure as all the other TCP/IP traffic, which again could impact performance.

But there's good news! While availability is still a worry, improvements in SMB on Windows Server 2012 (and via an update to Windows Server 2008 R2) have made it significantly faster. I saw a quote from a Microsoft employee somewhere that claimed 97% of the performance of local storage. I can't find the quote now, and I don't remember if he was measuring latency or throughput.

Disk Fragmentation
How often do you use the Disk Defragmenter tool on your PC to analyze and defragment your C: drive? How often do you check fragmentation on the disks on your SQL Servers? For most people it is nowhere near as often, I'll bet. Yet volume fragmentation is just as detrimental to SQL Server performance as it is to your PC. You can reduce the likelihood of disk fragmentation in a number of ways:
Pre-size data and log files, rather than relying on auto-growth
Set auto-growth increments to sensible values instead of the default 10%
Avoid shrinking data and log files
Never, ever use the autoshrink database option
Ensure disks are dedicated to SQL Server and not shared with other applications

You can check fragmentation using the same tool as on your PC, since Disk Defragmenter is available on all server versions of Windows. Another way to check is via the Win32_Volume class in WMI: a short PowerShell script against this class can report the file percent fragmentation for all volumes on a given server.

If you have significant fragmentation there are a couple of ways to fix it. My preferred option is as follows, but it requires some downtime:
Stop the SQL services
Back up the files on the disk (especially mdf, ndf and ldf files - better safe than sorry)
Run the Windows Disk Defragmenter tool
Start the SQL services
Check the error log to ensure there were no errors during startup
Run CHECKDB against all databases (except tempdb). I've never seen the defrag tool cause corruption, but you can't be too careful!

Another option that doesn't require downtime is to use a third party tool such as Diskeeper. This can be very effective at fixing and preventing disk fragmentation, but it costs money and uses a filter driver - see my comments below.

Filter Drivers
A filter driver is a piece of software that sits between an IO request and the write to disk. It allows the write to be examined and rejected, modified or audited. The most common type of filter driver is installed by anti-virus software. You do not want anti-virus software checking every single write to your database files. You also don't want it checking your backups, writes to the error log, or the default trace. If you have AV software installed, you can specify exclusions. Exclude all folders used by SQL Server, plus the drives used by data and log files, plus the folders used for backups. Even better is to turn off online AV checking and schedule a scan at a quiet time.

OLTP and BI on the Same Server
It is rare to find a system that is purely OLTP. Most will have some sort of reporting element as well. Unfortunately, the two types of workload do not always coexist happily. I've been reading a lot of articles by Joe Chang, and in one article he explains why this is the case. Essentially, OLTP query plans retrieve rows in small batches (less than a threshold of 25 rows). These IO requests are handled synchronously by the database engine, meaning that they wait for the data to be retrieved before continuing.
Large BI workloads and reporting queries, often with parallel plans, issue asynchronous IO requests and take full advantage of the HBA's ability to queue requests. As a result, the OLTP requests have to queue up behind the BI requests, causing OLTP performance to degrade significantly.

Auto-grow and Instant File Initialization
It is good to have auto-grow enabled, just as a precaution, although you should also pre-size data and log files so that it is rarely needed. However, what happens if a data file grows and you don't have instant file initialization enabled, especially if the auto-grow increment is set too big? All IO against the file has to wait for the file growth to complete, and this may be reported in the infamous "I/O requests taking longer than 15 seconds to complete" message in the error log. Instant initialization won't help with log growth, so make sure log auto-growth increments are not too high. For more information about instant file initialization and how to enable it, see this link: Database File Initialization. And while on the subject of auto-grow, see the section on proportional fill, below.

Transaction Log Performance
How long do your transaction log writes take? Less than 1ms? More than 5ms? Look at virtual file stats, performance counters, or the WRITELOG wait time to see if log write latency is an issue for you. Writes to the transaction log are sequential, so the write head on the disk should ideally be where it was after the last log write. This means no seek time, and blazingly fast write times. And since a transaction cannot commit until the log has hardened to disk, you rely on these fast writes for a performant system.

Advice for years has been for the transaction log for each database to be on its own disk. This advice is still good for local disk, and for some storage arrays. But now that a lot of SANs have their own battery-backed write cache, this advice is not as critical as it used to be.
Provided the cache is big enough to cope with peak bursts of write activity (and see my earlier comments about allocating more cache to writes than to reads), you will get very low latency. So what if you don't have the luxury of a mega-bucks SAN and loads of write cache? Then the advice that's been around since the 1990s is still valid:
One transaction log file per database, on its own drive
RAID 1, RAID 10 or RAID 01

So, assuming you are happy with your log file layout, what else could be slowing down your log writes?

Virtual Log Files
Although a transaction log is written to sequentially, the file itself can become fragmented internally. When it is first created it consists of several chunks called virtual log files (VLFs). Every time it is grown, whether manually or automatically, several more virtual log files are added. A transaction log that grows multiple times can end up with thousands of virtual log files. Having too many VLFs can slow down logging and may also slow down log backups.

You also need to be careful to avoid VLFs that are too big. An inactive virtual log file is not cleared until the end is reached and the next one starts to be used. For the full recovery model, this doesn't happen until the next log backup. So a log backup will suddenly have a lot more work to do, and may cause performance problems while it takes place.

The answer for a big transaction log is to set an initial size of at most 8000MB, and then manually grow it in chunks of 8000MB up to the target size. With the VLF algorithm used before SQL Server 2014, each 8000MB chunk is split into 16 VLFs, so this results in a maximum VLF size of about 500MB without creating an excessively large number of VLFs. Note: this advice is for manual growth only. Do not auto-grow by 8000MB! All transactions in the database will stop while the extra space is initialised. Auto-growth should be much smaller, but try to size the file manually so that auto-growth is unlikely to be needed.

Log Manager Limits
The database engine sets limits on the amount of log that can be in flight at any one time.
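As a rough illustration of why such a limit matters, here is a minimal Python sketch of the arithmetic. It makes the simplifying assumption that every commit needs its own log write. Under that assumption, the number of log writes allowed in flight divided by the write latency caps the commit rate. The function name is hypothetical, and the actual log manager limits vary by version.

```python
def max_commits_per_second(ios_in_flight: int, log_write_latency_ms: float) -> float:
    """Ceiling on commits/sec if each commit needs one log write.

    Assumptions (illustrative only): one log write per transaction, and no
    more than `ios_in_flight` log writes outstanding at any moment.
    """
    if log_write_latency_ms <= 0:
        raise ValueError("latency must be positive")
    # Each outstanding write "slot" completes 1000 / latency writes per second.
    return ios_in_flight * 1000 / log_write_latency_ms


if __name__ == "__main__":
    for latency_ms in (1, 5, 20):
        rate = max_commits_per_second(32, latency_ms)
        print(f"{latency_ms:>2} ms -> {rate:,.0f} commits/sec")
```

With 32 writes in flight, this gives 32,000, 6,400 and 1,600 commits per second at 1ms, 5ms and 20ms respectively. In practice a single log write can carry the log records of several transactions, so treat the figure as a ceiling under the stated assumption rather than a prediction.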
These limits are per database, and depend on the version of SQL Server being used. SQL Server limits the number of outstanding IOs and the MB per second, and the limits vary with version and with whether it is 32-bit or 64-bit. See Diagnosing Transaction Log Performance Issues and Limits of the Log Manager for more details.

This is why the write latency should be as low as possible. Suppose it takes 20ms to write to the transaction log, you are limited to 32 IOs in flight at a time, and each transaction needs one log write. Then 32 IOs every 20ms means a maximum of 32 x 50 = 1600 transactions per second, well below what a lot of high volume OLTP databases require. This also emphasises the importance of keeping transactions small, as one very large transaction could conceivably hold up other transactions while it commits.

If you think these limits are affecting log write performance in your databases, there are several ways to tackle the problem:
Work on increasing log write performance
If you have minimally logged operations, you can switch the database to the BULK_LOGGED recovery model. Careful though: a log backup containing a minimally logged operation has to be restored in full, so point-in-time restore within it is not possible.
Split a high volume database into 2 or more databases, as the log limits apply per database

Non-Sequential Log Activity
Some actions performed by the database engine move the write head away from the end of the log file. If transactions are still being committed while this happens, you have a seek overhead and log performance gets worse. Operations that read from the log files include rollback of large transactions, log backups and replication (the log reader agent). There is little you can do about most of these, but avoiding large rollbacks is something that should be tackled at the design and development stage of an application.

Proportional Fill
Very active tables can be placed in a file group that has multiple data files.
Spreading a table over several files can improve read performance if the files are on different physical disks. It can also improve write performance by limiting contention in the allocation pages (especially true for tempdb). You lose some of the benefit, though, if you don't take advantage of the proportional fill algorithm. Proportional fill is the process by which the database tries to allocate new pages in proportion to the amount of free space in each data file in the file group. To get the maximum benefit, make sure that each file is the same size and is always grown by the same increment. This applies to both manual and auto growth.

One thing to be aware of is how auto growth works. SQL Server does its best to fill the files at the same rate, but one will always fill up just before the others, and this file will then auto grow on its own. It then gets more new page allocations than the others and becomes a temporary hotspot until the others also auto grow and catch up. This is unlikely to cause problems for most databases, although for tempdb it may be more noticeable. Trace flag 1117 causes all data files in a file group to grow together, so it is worth considering if this is an issue for you. Personally, I would rather size the files manually so that auto growth isn't necessary.

tempdb Configuration
Let's start with a few things that everybody agrees on:
tempdb files should be placed on the fastest storage available. Local SSD is ideal, and from SQL Server 2012 this is even possible on a cluster
Pre-size the data and log files, as auto growth may cause performance issues while it occurs
New temporary objects are created all the time, so contention in the GAM, SGAM and PFS pages may be an issue in some environments

And now some differences of opinion:
There is loads of advice all over the web to create one tempdb data file per core to reduce allocation contention. Paul Randal disagrees (A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor core).
He says that too many files can actually make things worse. His solution is to create fewer files and to increase the number only if necessary.
There is more advice, often repeated, to separate tempdb files from other databases and put them on their own physical spindles. Joe Chang disagrees and has a very good argument for using the common pool of disks (Data, Log and Temp file placement). I'll leave you to decide what to do!

AutoShrink
The AutoShrink database option has been around ever since I started using SQL Server, causing lots of performance problems for people who have enabled it without fully realising what it does. Often a third party application will install a database with this option enabled, and the DBA may not notice it until later. So why is it bad? Two reasons:
It is always used in conjunction with auto grow, and the continuous cycle of grow-shrink-grow causes a huge amount of physical disk fragmentation. I've already covered that topic earlier in this article.
While it performs the shrink there is a lot of additional IO, which slows down the system for everything else.

Disable it. Allocate enough space for the data and log files, and size them accordingly. And don't forget to fix all that fragmentation while you're at it.

Insufficient Memory
This is an article about SQL Server IO performance, not memory, so I don't want to cover it in any detail here - that is a subject for a different article. I just want to remind you that SQL Server loves memory - the more the better. If your entire database(s) fits into memory you'll have a much faster system, bypassing all that slow IO. Lack of memory can lead to dirty pages being flushed to disk more often to make space for more pages being read. Lack of memory can also lead to increased tempdb IO, as more worktables for sort and hash operations have to spool to disk.
Anyway, the point of this section is really to make one statement: fill your servers with as much memory as you can afford, and as much as the edition of SQL Server and Windows can address.

SQL Server 2014 has a new feature allowing some tables to be retained in memory and accessed via natively compiled stored procedures. Some of your existing code may need to be redesigned to take advantage of this, but it looks like a great performance boost for those OLTP systems that start to use it.

High Use of tempdb
tempdb can be a major consumer of IO and may affect overall performance if used excessively. It is worth looking at the various reasons for its use, and examining your system to ensure you have minimized these as far as possible.

User-created temporary objects
The most common of these are temporary tables, table variables and cursors. A high rate of creation can lead to allocation page contention, although increasing the number of tempdb data files may partially alleviate this. Processes creating very large temporary tables or table variables are a big no-no, as these can cause a lot of IO.

Internal Objects
The database engine creates work tables in tempdb for handling hash joins, sorting and spooling of intermediate result sets. When sort operations or hash joins need more memory than has been granted, they spill to disk (using tempdb) and you will see Hash warnings and Sort warnings in the default trace. I originally wrote a couple of paragraphs about how and why this happens and what you can do to prevent it, but then I found this post that explains it much better: Understanding Hash, Sort and Exchange Spill Events.

Version Store
The third use of tempdb is for the version store, which is used for row versioning. Row versions are created when snapshot isolation or the read committed snapshot option is used.
They are also created during online index rebuilds, for updates and deletes made during the rebuild, and for handling data modifications to multiple active result sets (MARS). A row versioning based isolation level may be in use while a poorly written application (or rogue user) performs a large update that affects many thousands of rows. This may cause rapid growth in tempdb and adversely impact IO performance for other users.

Table and Index Scans
A table scan is a scan of a heap. An index scan is a scan of a clustered or non-clustered index. Both may be the best option if a covering index does not exist and a lot of rows are likely to be retrieved. A clustered index scan performs better than a table scan - yet another reason for avoiding heaps! But what causes a scan to be used in the first place, and how can you make a seek more likely?

Out of date statistics
Before checking indexes and code, make sure that statistics are up to date. Enable "auto create statistics". If "auto update statistics" is not enabled, make sure you run a manual statistics update regularly. This is a good idea even if "auto update statistics" is enabled. The threshold of approximately 20% of changed rows before the auto update kicks in is often not enough, especially where new rows are added with an ascending key.

Index Choice
Sometimes an existing index is not used. Have a look at improving its selectivity, possibly by adding additional columns or modifying the column order. Consider whether a covering index could be created. A seek is more likely to be performed if no bookmark lookups will be needed. See these posts on the "tipping point" by Kimberly Tripp: The Tipping Point.

Inefficient TSQL
The way a query is written can also result in a scan, even if a useful index exists. Some of the reasons for this are:
Non-sargable expressions in the WHERE clause. "SARG" stands for Search ARGument; a sargable predicate is one the optimizer can use to seek on an index. So move calculations away from the columns and onto the constants instead.
For example, this will not use the index on OrderDate:

WHERE DATEADD(DAY, 1, OrderDate) > GETDATE()

Whereas this will use an index if it exists (and it is selective enough):

WHERE OrderDate > DATEADD(DAY, -1, GETDATE())

Implicit conversions in a query may also result in a scan. See this post by Jonathan Kehayias: Implicit Conversions that cause Index Scans.

Bad Parameter Sniffing
Parameter sniffing is a good thing. It allows plan re-use and improves performance. But sometimes it results in a less efficient execution plan for some parameters.

Index Maintenance
Every index has to be maintained. I'm not talking about maintenance plans, but about the fact that when rows are inserted, deleted and updated, the non-clustered indexes also have to be changed. This means additional IO for each index on a table, so it is a mistake to have more indexes than you need. Check that all indexes are being used. Check for duplicate and redundant indexes (where the columns in one are a subset of the columns in another). Check for indexes where the first column is identical but the rest are not - sometimes these can be merged. And of course, test, test, test.

Index Fragmentation
Index fragmentation affects IO performance in several ways:
Range scans are less efficient, and less able to make use of read-ahead reads
Empty space created in the pages reduces the density of the data, meaning more read IO is necessary
The fragmentation itself is caused by page splits, which means more write IO

There are a number of things that can be done to reduce the impact of fragmentation, or to reduce the amount of fragmentation:
Rebuild or reorganize indexes regularly
Specify a lower fill factor so that page splits occur less often (though not too low, see below)
Change the clustered index to use an ascending key so that new rows are appended to the end, rather than inserted in a random place in the middle

Forwarded Records
When a row in a heap is updated and requires more space, it is moved to a new page. But non-clustered indexes are not updated to point to the new page. Instead, a forwarding pointer is left at the original location to show where the row has moved to, so every lookup of that row has to read an extra page. Naturally, this means more IO.

A heap cannot be defragmented by rebuilding an index (there isn't one). The traditional way to do this is to create a clustered index on the heap, and then drop it afterwards. Be aware that this will cause all non-clustered indexes to be rebuilt twice: once for the new clustered index, and again when it is dropped. If there are a lot of these it is a good idea to drop the non-clustered indexes first and recreate them afterwards. (Since SQL Server 2008, ALTER TABLE ... REBUILD can also rebuild a heap and remove its forwarding pointers, although it too rebuilds all the non-clustered indexes.) Better still is to avoid heaps where possible. I accept there may be cases where they are the more efficient choice (inserting into archive tables, for example), but always consider whether a clustered index would be a better option - it usually is.

Wasted Space
In an ideal world every data page on disk (and in memory) would be 100% full. This would mean the minimum of IO is needed to read and write the data. In practice, there is wasted space in nearly all pages, sometimes a very high percentage, and there are a lot of reasons why this occurs.

Low fill factor
I've mentioned fill factor already. If it is too high, and page splits are occurring when rows are inserted or updated, it is sensible to rebuild the index with a lower fill factor.
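To put rough numbers on the fill factor trade-off, here is a purely illustrative Python sketch. It estimates how many rows fit on a page and how many leaf pages an index needs. It assumes an 8KB page with a 96-byte header and a 2-byte slot-array entry per row. It treats `row_bytes` as the full on-page record size, with row overhead included. The function names, row sizes and row counts are hypothetical.

```python
import math

# Assumed page geometry: 8KB page, 96-byte header, 2-byte slot entry per row.
PAGE_BYTES = 8192
HEADER_BYTES = 96
SLOT_BYTES = 2
USABLE_BYTES = PAGE_BYTES - HEADER_BYTES  # 8096 bytes for rows + slot array
MAX_ROW_BYTES = 8060                      # largest in-row record


def rows_per_page(row_bytes: int, fill_factor: int = 100) -> int:
    """Rows that fit on a leaf page filled to `fill_factor` percent."""
    if not 0 < row_bytes <= MAX_ROW_BYTES:
        raise ValueError("row size must be between 1 and 8060 bytes")
    if not 1 <= fill_factor <= 100:
        raise ValueError("fill factor must be between 1 and 100")
    space = USABLE_BYTES * fill_factor // 100
    # A page always holds at least one row, whatever the fill factor.
    return max(1, space // (row_bytes + SLOT_BYTES))


def pages_needed(rows: int, row_bytes: int, fill_factor: int = 100) -> int:
    """Leaf pages needed for `rows` rows straight after a build or rebuild."""
    return math.ceil(rows / rows_per_page(row_bytes, fill_factor))


def wasted_pct(row_bytes: int) -> float:
    """Share of a full page left empty because the next row does not fit."""
    used = rows_per_page(row_bytes) * (row_bytes + SLOT_BYTES)
    return 100 * (USABLE_BYTES - used) / USABLE_BYTES


if __name__ == "__main__":
    for ff in (100, 90, 50):
        print(ff, rows_per_page(100, ff), pages_needed(1_000_000, 100, ff))
    print(round(wasted_pct(4100), 1))
```

Take 1,000,000 hypothetical 100-byte rows. Dropping the fill factor from 100 to 50 roughly doubles the page count (12,659 to 25,642 pages), and with it the IO for a full scan. A 4,100-byte row, just over half a page, leaves about 49% of each page empty. Remember that fill factor is only applied when an index is created or rebuilt.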
If the fill factor is too low, on the other hand, you may have a lot of wasted space in the database pages, resulting in more IO and memory use. This is one of those "suck it and see" scenarios. Sometimes a compromise is needed.

Page splits
This is also discussed above. As well as causing fragmentation, page splits can also result in wasted space if the empty space is not reused. The solution is to defragment by rebuilding or reorganizing indexes regularly.

Wasteful Choice of Data Types
Use the smallest data types you can. And try to avoid fixed length data types, like CHAR(255), unless you regularly update to the longest length and want to avoid page splits. The reasoning is simple. If you only use 20 characters out of 200, that is 90% wasted space, and more IO as a result. The higher the density of data per page, the better.

Lazy thinking might make developers create AddressLine1, AddressLine2, etc. as CHAR(255), because they don't actually know what the longest should be. In this case, either do some research, find out that the longest is 50 characters (for example) and reduce them to CHAR(50), or use a variable length data type.

Schema Design
I've already mentioned choice of data types above, but there are other schema design decisions that can affect the amount of IO generated by an application database. The most common one is designing tables that are too wide. I sometimes see a table with 20, 30, 50, even 100 columns. This means fewer rows fit on a page. In some extreme cases there is room for just one row per page, often with a lot of wasted space as well (if the row is just slightly wider than half a page, that's 50% wasted). If you really do need 50 columns for your Customer table, ask yourself how many of these are regularly accessed. An alternative is to split it into 2 tables: Customer, with just a few of the commonly used columns, and CustomerDetail with the rest. Of course, the choice of which columns to move is important.
You don't want to start joining the tables for every query, as that defeats the object of the exercise.

Page or Row Compression
Compression is another way of compacting the data onto a page to reduce disk space and IO. Use of row or page compression can dramatically improve IO performance, but CPU usage does increase. As long as you are not already seeing CPU bottlenecks, compression may be an option to consider. Be aware that compression was an Enterprise edition feature only until SQL Server 2016 SP1, which made it available in all editions.

Backup Compression
Since SQL Server 2008 R2, backup compression has been available in Standard edition as well as Enterprise. This is a major benefit and I recommend that it be enabled on all instances. As well as creating smaller backups, it is also quicker and means less write IO. The small increase in CPU usage is well worth it. Enable it by default, so that if someone sets off an ad hoc backup it will have minimal IO impact.

Synchronous Mirroring/AlwaysOn
High safety mode in database mirroring, or synchronous commit mode in AlwaysOn, both emphasise availability over performance. A transaction on the mirroring principal server or primary replica does not commit until it receives a message back from the mirror or secondary replica that the transaction has been hardened to the transaction log. This increases transactional latency, particularly when the servers are in different physical locations.

Resource Governor in 2014
Up to and including SQL Server 2012, Resource Governor has only been able to throttle CPU and memory usage. Finally, the ability to include IO in a resource pool has been added in SQL Server 2014. This has obvious use as a way of limiting the impact of reports on the system from a particular user, department or application.

Gathering The Evidence
There are a lot of ways you can measure SQL Server IO performance and identify which areas need looking at.
Most of what follows is available in SQL CoPilot in graphical and tabular form, both as averages since the last service start and as snapshots of current activity.

Wait Types
Use sys.dm_os_wait_stats to check the number of waits and wait times for IO_COMPLETION, LOGBUFFER, WRITELOG and the PAGEIOLATCH_* waits. Use this script to focus on the IO wait types:

SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms - signal_wait_time_ms AS total_wait_time_ms,
       1. * (wait_time_ms - signal_wait_time_ms)
           / CASE WHEN waiting_tasks_count = 0 THEN 1 ELSE waiting_tasks_count END AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('IO_COMPLETION', 'LOGBUFFER', 'WRITELOG',
                    'PAGEIOLATCH_SH', 'PAGEIOLATCH_UP', 'PAGEIOLATCH_EX',
                    'PAGEIOLATCH_DT', 'PAGEIOLATCH_KP');

This shows averages since the last service restart, or since the wait stats were last cleared. To clear the wait stats, use:

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);

You can also check sys.dm_os_waiting_tasks to see what is currently being waited for.

Virtual File Stats
Query sys.dm_io_virtual_file_stats to find out which data and log files get the most read and write IO, and the latency for each file, calculated using the stall in ms. (Adding 1 to the divisor avoids a divide-by-zero error for files with no reads or writes.)

SELECT d.name AS database_name,
       mf.name AS logical_file_name,
       vfs.num_of_bytes_read,
       vfs.num_of_bytes_written,
       vfs.num_of_reads,
       vfs.num_of_writes,
       1. * vfs.io_stall_read_ms / (vfs.num_of_reads + 1) AS avg_read_stall_ms,
       1. * vfs.io_stall_write_ms / (vfs.num_of_writes + 1) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id
 AND vfs.file_id = mf.file_id
JOIN sys.databases AS d
  ON mf.database_id = d.database_id;

Performance Counters
There are two ways of looking at performance counters. You can select from sys.dm_os_performance_counters, which shows all the SQL Server counters, or use Windows Performance Monitor (perfmon) to see the other OS counters as well. Some counters to look at are:

SQL Server:Buffer Manager
Lazy writes/sec: The number of times per second that dirty pages are flushed to disk by the Lazy Writer process.
An indication of low memory, but listed here as it causes more IO.
Checkpoint pages/sec: The number of dirty pages flushed to disk per second by the checkpoint process.
Page reads/sec: Number of physical pages read from disk per second.
Page writes/sec: Number of physical pages written to disk per second.
Readahead pages/sec: Pages read from disk in advance of them being needed. Expect to see high values in BI workloads, but not for OLTP.

SQL Server:Access Methods
Forwarded records/sec: Should be as low as possible. See above for an explanation of forwarded records.
Full scans/sec: The number of unrestricted full scans. Use of UDFs and table variables can contribute to this, but concentrating on seeks will help to keep the value down.
Page splits/sec: The number of page splits per second. This combines splits due to pages being added to the end of a clustered index with "genuine" splits when a row is moved to a new page. Use the technique from the link in the section on index fragmentation, above, to get a more accurate breakdown.
Skipped ghosted records/sec: For information about ghosted records see An In-depth Look at Ghost Records in SQL Server.
Workfiles created/sec: A measure of tempdb activity.
Worktables created/sec: A measure of tempdb activity.

SQL Server:Databases
Log bytes flushed/sec: The rate at which log records are written to disk.
Log flush wait time: The duration of the last log flush for each database.
Log flush waits/sec: The number of commits per second waiting for a log flush.

Logical Disk
Avg. Disk sec/Read
Avg. Disk sec/Write
Disk Read Bytes/sec
Disk Write Bytes/sec

In the sys.dm_os_performance_counters DMV, a lot of counters display a raw cumulative value, which has to be monitored over time to see values per second. Others have to be divided by a base value to get a percentage. This makes the DMV less useful unless you perform these calculations and either monitor over time or take an average since the last server restart.
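The same calculations can be done outside SQL Server on sampled values. The Python sketch below uses hypothetical function names and sample values. It shows the arithmetic for the counter types the T-SQL script handles, plus per-interval read latency from two snapshots of sys.dm_io_virtual_file_stats. The cntr_type constants are the values used in that script. The names attached to them are the Windows performance counter type names as they are commonly documented, so treat them as labels rather than anything the DMV returns. Sampling twice and taking the difference shows current activity rather than an average since the last restart.

```python
from dataclasses import dataclass

# cntr_type values from sys.dm_os_performance_counters, labelled with the
# Windows performance counter types they are commonly documented as.
PERF_COUNTER_LARGE_RAWCOUNT = 65792     # point-in-time value: use as-is
PERF_COUNTER_BULK_COUNT = 272696576     # cumulative count behind a ".../sec" counter
PERF_LARGE_RAW_FRACTION = 537003264     # numerator of a ratio (e.g. a hit ratio)
PERF_AVERAGE_BULK = 1073874176          # cumulative total behind an average
PERF_LARGE_RAW_BASE = 1073939712        # denominator ("base") for the two above


def rate_per_sec(first: int, second: int, interval_secs: float) -> float:
    """Per-second rate from two samples of a cumulative (bulk count) counter."""
    return (second - first) / interval_secs


def ratio_pct(value: int, base: int) -> float:
    """Raw fraction counter as a percentage; a zero base is treated as 1."""
    return 100 * value / (base or 1)


def interval_average(v1: int, v2: int, b1: int, b2: int) -> float:
    """Average over an interval for an average-bulk counter and its base."""
    delta_base = b2 - b1
    return (v2 - v1) / delta_base if delta_base else 0.0


@dataclass
class FileStats:
    """Two cumulative columns from sys.dm_io_virtual_file_stats."""
    num_of_reads: int
    io_stall_read_ms: int


def interval_read_latency_ms(before: FileStats, after: FileStats) -> float:
    """Average read latency between two snapshots (0.0 if no reads occurred)."""
    reads = after.num_of_reads - before.num_of_reads
    stall = after.io_stall_read_ms - before.io_stall_read_ms
    return stall / reads if reads else 0.0


if __name__ == "__main__":
    # Hypothetical samples taken 60 seconds apart
    print(rate_per_sec(120_000, 126_000, 60))           # e.g. Page reads/sec
    print(ratio_pct(990, 1000))                         # a hit ratio and its base
    print(interval_average(2_000, 5_000, 100, 150))     # an "(ms)" average
    before, after = FileStats(1_000, 5_000), FileStats(1_600, 11_000)
    print(interval_read_latency_ms(before, after))      # latency in this interval
    print(after.io_stall_read_ms / after.num_of_reads)  # average since startup
```

With these made-up snapshots, the read latency during the interval is 10ms, noticeably worse than the since-startup average of about 6.9ms. That difference is exactly what a cumulative average hides. Write latency can be derived the same way from num_of_writes and io_stall_write_ms.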
The following T-SQL script uses the tempdb creation date to get the number of seconds since the service started, and calculates the averages for these counters. It also retrieves all other counters and calculates those that are derived from a base value.

USE master;
SET NOCOUNT ON;

DECLARE @upsecs bigint;

-- tempdb is recreated at every service start, so its create_date gives the uptime
SELECT @upsecs = DATEDIFF(second, create_date, GETDATE())
FROM sys.databases
WHERE name = 'tempdb';

-- Point-in-time values: report as-is
SELECT RTRIM(object_name) AS object_name,
       RTRIM(instance_name) AS instance_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE cntr_type = 65792
UNION ALL
-- Cumulative ".../sec" counters: average per second since the service started
SELECT RTRIM(object_name), RTRIM(instance_name), RTRIM(counter_name),
       1. * CAST(cntr_value AS bigint) / @upsecs
FROM sys.dm_os_performance_counters
WHERE cntr_type = 272696576
UNION ALL
-- Ratio counters: value divided by the matching "base" counter, as a percentage
SELECT RTRIM(v.object_name), RTRIM(v.instance_name), RTRIM(v.counter_name),
       100. * v.cntr_value / CASE WHEN b.cntr_value = 0 THEN 1 ELSE b.cntr_value END
FROM (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 537003264) AS v
JOIN (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 1073939712) AS b
  ON v.object_name = b.object_name
 AND v.instance_name = b.instance_name
 AND RTRIM(v.counter_name) + ' base' = RTRIM(b.counter_name)
UNION ALL
-- Average counters: cumulative total divided by the matching base counter
SELECT RTRIM(v.object_name), RTRIM(v.instance_name), RTRIM(v.counter_name),
       1. * v.cntr_value / CASE WHEN b.cntr_value = 0 THEN 1 ELSE b.cntr_value END
FROM (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 1073874176) AS v
JOIN (SELECT object_name, instance_name, counter_name, cntr_value
      FROM sys.dm_os_performance_counters
      WHERE cntr_type = 1073939712) AS b
  ON v.object_name = b.object_name
 AND v.instance_name = b.instance_name
 AND REPLACE(RTRIM(v.counter_name), ' (ms)', '') + ' Base' = RTRIM(b.counter_name)
ORDER BY object_name, instance_name, counter_name;

Dynamic Management Views and Functions
As well as the DMVs in the above scripts, there are a number of others that are useful for diagnosing SQL Server IO performance problems. Here are all the ones I use. I'll add some sample scripts when I get the time:
sys.dm_os_wait_stats
sys.dm_io_virtual_file_stats
sys.dm_os_performance_counters
sys.dm_io_pending_io_requests
sys.dm_db_index_operational_stats
sys.dm_db_index_usage_stats
sys.dm_db_index_physical_stats
sys.dm_os_buffer_descriptors

It can also be useful to see what activity there is on the instance. Here are your options.

The Profiler tool is quick and easy to use - you can start tracing in a matter of seconds. However, there is some overhead and it may impact performance itself, especially when a lot of columns are selected. A server-side trace is a better option.

A server-side trace has less of an impact than running Profiler. It has to be scripted using system stored procedures, but Profiler has the ability to generate the script for you.

Extended Event Sessions
Extended events were first introduced in SQL Server 2008, and have been considerably enhanced in SQL Server 2012. They are very lightweight, and the use of server-side traces and Profiler is now deprecated. Nevertheless, use of extended events may impact the performance of high transaction systems if you are not careful. Use an asynchronous target and avoid complicated predicates to limit the overhead.

There are a number of tools for gathering performance data from your servers. SQLIO is a simple tool that creates a file on disk and tests latency and throughput for random/sequential IO, at various block sizes and with a variable number of threads. These are all fully configurable. SQLIO is a great way of getting a baseline on a new server or storage, for future comparison. Third party tools are another option for viewing performance metrics. Some show you what is happening on the server right now. Others are built into more complex (and expensive) monitoring solutions.
Performance metrics obtained on virtual servers can be unreliable. Performance counters and wait stats may give the impression that everything is OK, when it is not. I recommend the use of the performance monitoring tools provided by the VM vendor. In the case of VMware, this is very easy to use and is built into Virtual Center.

This turned into a much bigger article than I expected - SQL Server IO performance is a big subject! I started with everything I knew, and double-checked my facts by searching the web and checking books. In the process I learnt a whole lot of new stuff and found a lot of useful links. It has been a useful exercise. Hopefully this has been useful for you too.
