In this post we are going to find the total number of spaces in a given string. I have a string which contains spaces at multiple places and i just want to count them all. We can achieve this in various ways. Here, i am sharing some of these methods. Why i am doing this in various ways and sharing with you is just to show you the performance of different approaches of problem solving and also to share some interesting tricks to solve this problem in different manners. I am not going to cover all the possible ways to solve this problem and in fact you can achieve this in some other ways too. In case you want to share your own method, I welcome it you all to share this in comment section of this post.
Create a demo table as below and then put some demo data in it. We will use this table for demo in this post. Remember, in this table i am going to put 100,000 records to compare the performance of various methods too.
Create Demo table with dummy data with 100,000 rows
SELECT TOP 100000 IDENTITY(INT) AS RowID, CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) + CASE WHEN ABS(CHECKSUM(NEWID())) % 2 = 0 THEN CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) ELSE ' ' END + CASE WHEN ABS(CHECKSUM(NEWID())) % 3 = 0 THEN CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) ELSE ' ' END + CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) + CASE WHEN ABS(CHECKSUM(NEWID())) % 2 = 0 THEN CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) ELSE ' ' END + CASE WHEN ABS(CHECKSUM(NEWID())) % 3 = 0 THEN CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) ELSE ' ' END + CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) + CASE WHEN ABS(CHECKSUM(NEWID())) % 2 = 0 THEN CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) ELSE ' ' END + CASE WHEN ABS(CHECKSUM(NEWID())) % 3 = 0 THEN CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) ELSE ' ' END + CHAR((ABS(CHECKSUM(NEWID())) % 26) + 65) AS Comment INTO #DemoTable FROM sys.columns a CROSS JOIN sys.columns b
You can use any logic to create the dummy data and so you can avoid the above code which i have used to create a dummy data. In my logic i have tried to generate random number of spaces in each row but you can choose your own way to do so.
Now, we have a table with 100,000 rows (depends on total rows available in sys.columns) and going to count the number of spaces in “Comment” column in various ways.
SELECT * FROM #DemoTable
Using REPLACE function : Fastest and easiest one
As we know that the REPLACE function can replace each occurrence of the given string with another given string in a string. In this approach, i am going to use REPLACE function with some tricks to find the total count of spaces in given string. To do this task, i will find the length of total string first and then will Replace all the spaces with blank (without space) string. Finally i will subtract the length of the replaced string from original string as i did in below query.
SELECT RowID, Comment, LEN(Comment + '@') - LEN(REPLACE(Comment, ' ', '') + '@') As TotalSpaces FROM #DemoTable
In above image, In first row, After each character i have 2 spaces and thus output is 6 and also for other rows there are multiple spaces in between two characters.
One more thing you can see in above code is concatenation of “@” symbol in original string at end of the string. As LEN function excludes the trailing spaces at end. So i have used this trick to count all the spaces even at the end. I will use this trick at each place where a LEN function will be used in this post. Now have a look on the performance stats with 100,000 rows of VARCHAR(10) column. Also remember that below performance stat may vary on your machine.
Performance stats (Avg):
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 420 ms.
Using a SerialNumber table – Without help of REPLACE function
In this method, we are going to achieve the above output without using this REPLACE function. Why i am doing so because in one of my session, i was asked to do this without use of REPLACE function and so i want to share this method with you too. In this method i will create a function which applies a join with a SerialNumber table. Now what is a SerialNumber table here? Yup!, you guessed right, it’s just a table with a RowID column and nothing. For your ready reference i am sharing the structure and dummy data too.
SerialNumber table creation with dummy data
CREATE TABLE dbo.SerialNumber ( SNo INT PRIMARY KEY --For clustered index and non duplicate values ) GO --Insert data now INSERT INTO SerialNumber(SNo) SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY (SELECT 0)) AS SNo FROM SYS.COLUMNS A CROSS JOIN SYS.COLUMNS B GO
You can insert the same data using a WHILE loop too. It depends on you what you like and how much rows you want to insert. Hence this is a one time activity, for me it does not matter either i am using a WHILE loop or a Join. But remember that a WHILE loop will take much more time than a Join.
Now create a function as below;
CREATE FUNCTION dbo.fn_GetCountOfChar ( @String NVARCHAR(MAX), @StrToCount NVARCHAR(10) ) RETURNS INT AS BEGIN DECLARE @Len TINYINT = LEN(@StrToCount + '@') - 1 RETURN(SELECT COUNT(SUBSTRING(@String, SNo, @Len)) FROM dbo.SerialNumber WHERE SNo <= LEN(@String + '@') - 1 AND SUBSTRING(@String, SNo, @Len) + '@' = @StrToCount + '@') END
In this function, i have applied a join with SerialNumber table with the passed string up to the length of the given string and extracted the sub-string values from each character with the same length as of the given string which count we have to found. Now if this extracted sub-string is same as the given string we have counted all these rows and returned this count to the caller of this function.
SELECT RowID, Comment, dbo.fn_GetCountOfChar(Comment, ' ') AS TotalSpaces FROM #DemoTable
From above query, you will get the same output as i have shared for first method but with different performance stats.
Performance stats (Avg):
SQL Server Execution Times:
CPU time = 5304 ms, elapsed time = 5402 ms.
Using a Recursive CTE – RBAR Approach
As I have mentioned earlier, we can achieve this in many ways and this is again an another way to achieve the same task using a recursive CTE. As we know that the recursive CTE executes as like a loop. I am sharing this method, because many of us resolves such type of problems with a recursive CTE approach. At the end of this post, i will share the performance stats of all these methods which will help you out to find a better way to do a task in SQL Server.
To achieve this task again i am going to create a function which will use a CTE in recursive mode as below;
CREATE FUNCTION dbo.fn_GetCountOfCharCTE ( @String NVARCHAR(MAX), @StrToCount NVARCHAR(10) ) RETURNS INT AS BEGIN DECLARE @Count INT = 0 SET @String = '@' + @String + ' ' ;WITH CTE AS ( SELECT RemaingStr = STUFF(@String, PATINDEX('%[^' + @StrToCount + ']%', @String), PATINDEX('%[' + @StrToCount + ']%', LTRIM(@String)) - 1, '') UNION ALL SELECT RemaingStr = STUFF(RemaingStr, PATINDEX('%[^' + @StrToCount + ']%', RemaingStr), PATINDEX('%[' + @StrToCount + ']%', LTRIM(RemaingStr)) - 1, '') FROM CTE WHERE RemaingStr > '' ) SELECT @Count = (LEN(RemaingStr + '@') - 1) - 1 FROM CTE WHERE RemaingStr = '' OPTION(MAXRECURSION 32767) RETURN @Count END
In above function, inside CTE I have just removed all the characters from the given string except only the string which I have to count. And after that I have counted the length of the string and returned it to the caller.
SELECT RowID, Comment, dbo.fn_GetCountOfCharCTE(Comment, ' ') AS TotalSpaces FROM #DemoTable
Again from above query, you will get the same output but with below performance stats;
Performance stats (Avg):
SQL Server Execution Times:
CPU time = 8284 ms, elapsed time = 8702 ms.
Conclusion
Below is the Performance comparison chart for all the above methods with 100,000 rows of VARCHAR(10) column.
From above figure, we can ensure that the SET based approach has a significant performance improvement over recursive approach. Also the first one is the fastest one method to adopt. What i recommend here is to avoid the use of recursive CTE in all cases, if you can solve the same problem with SET based approach.
You have many more ways to achieve this task and so I will request you to put your approach in comment section of this post. Just for a hint, we can also use a CLR approach to achieve this task. You can try CLR or use your own method and share on this post.
Thanks for your patience and i highly welcome your comment and suggestion on this post. Please rate this post and also do share it on your social wall too.
Very rapidly this website will be famous amid all blogging and site-building
people, due to it’s good articles
Hey Gopal,
Okies so the culprit here is the Data type.. We can do something like
DECLARE @Comment NVARCHAR(100) = ‘Pawan ‘
DECLARE @Comment1 NVARCHAR(100) = ‘Pawan ‘
DECLARE @Comment2 NVARCHAR(100) = ‘Pawan ‘
SELECT @Comment, LEN(@Comment + ‘@’) – LEN(REPLACE(@Comment, ‘ ‘, ”) + ‘@’) As TotalSpaces
SELECT @Comment , DATALENGTH(CAST(@Comment AS VARCHAR(MAX))) – LEN(REPLACE(CAST(@Comment AS VARCHAR(MAX)),’ ‘, ”)) As TotalSpaces
SELECT @Comment1, LEN(@Comment1 + ‘@’) – LEN(REPLACE(@Comment1, ‘ ‘, ”) + ‘@’) As TotalSpaces
SELECT @Comment1 , DATALENGTH(CAST(@Comment1 AS VARCHAR(MAX))) – LEN(REPLACE(CAST(@Comment1 AS VARCHAR(MAX)),’ ‘, ”)) As TotalSpaces
SELECT @Comment2, LEN(@Comment2 + ‘@’) – LEN(REPLACE(@Comment2, ‘ ‘, ”) + ‘@’) As TotalSpaces
SELECT @Comment2 , DATALENGTH(CAST(@Comment2 AS VARCHAR(MAX))) – LEN(REPLACE(CAST(@Comment2 AS VARCHAR(MAX)),’ ‘, ”)) As TotalSpaces
Cool, Keep up the good work !
DATALENGTH() – Returns the number of bytes used to represent any expression.
DATALENGTH is especially useful with varchar, varbinary, text, image, nvarchar, and ntext data types because these data types can store variable-length data.
Len() – Returns the number of characters of the specified string expression, excluding trailing blanks and that is why you are addding ‘@’ so that you can handle the trailing blanks.
Also you don’t have to explicitly divide it by 2.
The answer I gave is pertaining to the datatype mentioned by you. Is that uncertain or did i missed something ?
Just notice my last line I have mentioned it as a final note.
Just check this script,
DECLARE @Comment NVARCHAR(100) = ‘Pawan ‘
SELECT @Comment, LEN(@Comment + ‘@’) – LEN(REPLACE(@Comment, ‘ ‘, ”) + ‘@’) As TotalSpaces
SELECT @Comment , DATALENGTH(@Comment) – LEN(REPLACE(@Comment,’ ‘, ”)) As TotalSpaces
In my case, it will output 3, whereas in case of your solution it will return 11 and it is not desirable.
Even you will put it as;
SELECT @Comment , DATALENGTH(@Comment) – DATALENGTH(REPLACE(@Comment,’ ‘, ”)) As TotalSpaces
It will return 6 and you have to divide it by 2.
You don’t need to append ‘@’ in LEN(Comment + ‘@’) – LEN(REPLACE(Comment, ‘ ‘, ”) + ‘@’) . This will decrease performance as you to append @ for all the rows.
Use below-
SELECT Comment , DATALENGTH(Comment) – LEN(REPLACE(Comment,’ ‘, ”)) SpacesPresent FROM #DemoTable
Hi Pawan,
Thanks for your reading and your interest.
Regarding your point, LEN is used to count the number of characters in a string whereas DATALENGTH is used to extract the number of bytes. It also true that these functions can be used interchangeably in some situations but not always.
What you have suggested is not a substitute of what I have used in my script. I have used NVARCHAR variables and DATALENGTH will return the UNICODE count of bytes (Count * 2) and again have to divide it by 2 to use.
Also dividing by 2 is not a generic solution and it depends on the used data type. You have to switch in case of Unicode as DATALENGTH (string)/2 and Non-Unicode as DATALENGTH .
In case of uncertain data types, LEN works well whereas DATALENGTH is not.
Also note that DATALENGTH works on any datatype but LEN can not.