This content originally appeared on DEV Community and was authored by Vanderlei Morais
Something as trivial as transforming a string can have a considerable impact on performance, depending on how it is implemented. Let's look at a simple requirement I had to work on some time ago.
The Requirement
Given a "CPF" (Brazilian national ID) stored as a string containing only numeric digits, add the standard separators ("." and "-") to display it in a more user-friendly format. Pretty straightforward:
> Input: "12345678909"
> Output: "123.456.789-09"
Initial Solution: Stack Overflow
Like a good developer, I googled for a quick solution and, of course, ended up on Stack Overflow. In the top-voted answer, I found this implementation:
public string Format(string cpf)
{
    // Parse the digits as a number, then apply a custom numeric format mask.
    return Convert.ToUInt64(cpf).ToString(@"000\.000\.000\-00");
}
Looks like a good solution, right? The string is converted to UInt64 so that a custom numeric format string, which lets you define a mask for a number in the ToString method, can be applied.
Pretty clever, huh? Not quite...
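To make the mechanism concrete, the one-liner can be split into its two steps. This is only an illustrative sketch: the class `CpfMaskDemo` and the method `FormatWithMask` are names made up for this example, the invariant culture is passed explicitly as an extra precaution, and the input is assumed to contain only digits.

```csharp
using System;
using System.Globalization;

public static class CpfMaskDemo
{
    // Hypothetical helper: same technique as the Stack Overflow answer, split into steps.
    public static string FormatWithMask(string cpf)
    {
        // Step 1: parse the digits into a number. Leading zeros are lost here,
        // e.g. "01234567890" becomes 1234567890.
        ulong number = Convert.ToUInt64(cpf, CultureInfo.InvariantCulture);

        // Step 2: apply the custom numeric format. Each '0' is a digit placeholder
        // that pads with zeros, and "\." and "\-" are escaped literal characters.
        return number.ToString(@"000\.000\.000\-00", CultureInfo.InvariantCulture);
    }
}
```

For instance, `CpfMaskDemo.FormatWithMask("01234567890")` yields "012.345.678-90": the zero placeholders restore the leading zero that the numeric conversion dropped. The cost is that every call parses the text into a number and then formats it back into text.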
My Solution
I had doubts about the performance of this solution, especially because of the conversion step, so I implemented my own in a very simple way:
public string Format(string cpf)
{
    // Cut the CPF into four parts and join them with the separators.
    return $"{cpf.Substring(0, 3)}.{cpf.Substring(3, 3)}.{cpf.Substring(6, 3)}-{cpf.Substring(9, 2)}";
}
Basically, I just used Substring to split the CPF into four parts and inserted the corresponding separators between them.
Then, to compare the approaches, I used BenchmarkDotNet. Here are the results:
Method | Mean | Ratio |
---|---|---|
FormatCpfConvert | 301.97 ns | baseline |
FormatCpfSubstring | 127.54 ns | -56% |
My solution ran in less than half the time of the Stack Overflow one!
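The benchmark code itself is not shown in this post, so the following is only a sketch of how such a comparison could be set up with BenchmarkDotNet. The method names mirror the table above; the class name `CpfFormatBenchmarks`, the sample input, and the use of `[MemoryDiagnoser]` are assumptions made for this example. It requires the BenchmarkDotNet NuGet package.

```csharp
using System;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // assumption: also report allocated bytes per call
public class CpfFormatBenchmarks
{
    // Sample input taken from the requirement; kept in a field rather than a constant.
    private string _cpf = "12345678909";

    // Baseline = true makes the Ratio column relative to this method.
    [Benchmark(Baseline = true)]
    public string FormatCpfConvert() =>
        Convert.ToUInt64(_cpf).ToString(@"000\.000\.000\-00");

    [Benchmark]
    public string FormatCpfSubstring() =>
        $"{_cpf.Substring(0, 3)}.{_cpf.Substring(3, 3)}.{_cpf.Substring(6, 3)}-{_cpf.Substring(9, 2)}";
}
```

A console project would typically start the run with `BenchmarkDotNet.Running.BenchmarkRunner.Run<CpfFormatBenchmarks>()` from a Release build (`dotnet run -c Release`). Absolute timings depend on the machine and runtime, so the numbers will differ from the table.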
Final Solution
Even though my solution was acceptable, I felt it could still be improved. The problem is that each Substring call allocates a new string in memory for that part of the CPF.
To do this efficiently, instead of creating new substrings we would ideally just pick slices of the original input and add the required separators. This is where Span and Slice come in.
The Span<T> type (and its read-only counterpart, ReadOnlySpan<T>) provides a way to point to a specific region of memory, and its Slice method narrows that view to a sub-range without copying anything. For string manipulation, this is very helpful when parts of an existing string can be reused to produce the desired result.
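As a small illustration of slicing without copying, the sketch below reads the last two digits of a CPF through a span. The class `SpanSliceDemo` and both helper methods are hypothetical names introduced for this example, and the input is assumed to be exactly 11 digits long.

```csharp
using System;
using System.Globalization;

public static class SpanSliceDemo
{
    // Hypothetical helper: returns the last two digits of a CPF as a string.
    public static string LastTwoDigits(string cpf)
    {
        // AsSpan creates a read-only view over the string's characters; nothing is copied.
        ReadOnlySpan<char> digits = cpf.AsSpan();

        // Slice narrows the view to 2 characters starting at index 9; still no new string.
        ReadOnlySpan<char> lastTwo = digits.Slice(9, 2);

        // A new string is allocated only here, when the result is materialized.
        return lastTwo.ToString();
    }

    // Hypothetical helper: parses the same slice as a number without creating a substring.
    public static int LastTwoDigitsAsNumber(string cpf)
    {
        ReadOnlySpan<char> lastTwo = cpf.AsSpan().Slice(9, 2);
        return int.Parse(lastTwo, NumberStyles.None, CultureInfo.InvariantCulture);
    }
}
```

With the sample input "12345678909", `LastTwoDigits` returns "09" and `LastTwoDigitsAsNumber` returns 9. If the input is shorter than 11 characters, `Slice` throws an ArgumentOutOfRangeException, so real code would validate the length first.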
So, by just calling the AsSpan extension method on the string, which returns a ReadOnlySpan<char>, and using Slice, I implemented this new solution:
public string Format(string cpf)
{
    // View the input as a span so each part is a slice, not a new string.
    var cpfAsSpan = cpf.AsSpan();
    return $"{cpfAsSpan.Slice(0, 3)}.{cpfAsSpan.Slice(3, 3)}.{cpfAsSpan.Slice(6, 3)}-{cpfAsSpan.Slice(9, 2)}";
}
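One caveat: placing a ReadOnlySpan<char> inside an interpolated string only compiles with C# 10 / .NET 6 or later. On older targets, a related option is `string.Create`, which lets you write the slices straight into the buffer of the resulting string. The sketch below is an alternative technique, not the solution measured in this post, and it has not been benchmarked here; the class name `CpfSpanFormatter` and the length check are assumptions added for the example.

```csharp
using System;

public static class CpfSpanFormatter
{
    // Hypothetical alternative: fills the result string's buffer directly.
    public static string Format(string cpf)
    {
        if (cpf is null || cpf.Length != 11)
            throw new ArgumentException("A CPF must have exactly 11 characters.", nameof(cpf));

        // 11 digits + 3 separators = 14 characters in the result.
        return string.Create(14, cpf, (destination, source) =>
        {
            ReadOnlySpan<char> digits = source.AsSpan();
            digits.Slice(0, 3).CopyTo(destination);            // positions 0-2
            destination[3] = '.';
            digits.Slice(3, 3).CopyTo(destination.Slice(4));   // positions 4-6
            destination[7] = '.';
            digits.Slice(6, 3).CopyTo(destination.Slice(8));   // positions 8-10
            destination[11] = '-';
            digits.Slice(9, 2).CopyTo(destination.Slice(12));  // positions 12-13
        });
    }
}
```

Calling `CpfSpanFormatter.Format("12345678909")` returns "123.456.789-09". Whether this is faster than the interpolated version would need to be measured with the same benchmark setup.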
After running the benchmark again, we have:
Method | Mean | Ratio |
---|---|---|
FormatCpfConvert | 301.97 ns | baseline |
FormatCpfSubstring | 127.54 ns | -56% |
FormatCpfAsSpan | 75.94 ns | -74% |
The final solution takes about a quarter of the time of the initial one, almost 75% less!
Conclusion
I hope this gives you an idea of the Span type and its performance benefits. Although it was introduced a few years ago, it is still not common to see it used or explained.
Also, this experience reinforced practices that I recommend and try to follow every day:
- Stay up to date with the features of the programming language/framework you work with; you never know when you will find something useful to apply in your day-to-day activities
- Make it work, then make it better/faster
- Don't blindly rely on solutions from the internet
Vanderlei Morais | Sciencx (2024-09-25T03:08:20+00:00) Improving String Manipulation in .NET. Retrieved from https://www.scien.cx/2024/09/25/improving-string-manipulation-in-net/