This content originally appeared on DEV Community and was authored by DEV Community
Hello
In this lab we are supposed to run different versions of the same program and analyze the performance difference.
This program basically will scale a number of samples that will be decided later but that must be enough for the program to run for 20 seconds or more.
Each version of the program has some variance in in the algorithm.
Here are they:
From our lab page
1. vol0.c is the basic or naive algorithm. This approach multiplies each sound sample by the volume scaling factor, casting from signed 16-bit integer to floating point and back again. Casting between integer and floating point can be expensive operations.
2. vol1.c does the math using fixed-point calculations. This avoids the overhead of casting between integer and floating point and back again.
3. vol2.c pre-calculates all 65536 different results, and then looks up the answer for each input value.
4. vol3.c is a dummy program - it doesn't scale the volume at all. It can be used to determine some of the overhead of the rest of the processing (besides scaling the volume) done by the other programs.
5. vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembley (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build for x86_64.
6. vol5.c uses SIMD instructions accessed through Complier Intrinsics. This program is also specific to AArch64.
In this lab we will use the AArch64 and x86_64 machines provided by the professor.
AArch64
First, as said by the professor, we need to calculate the overhead of a dummy program.
This way we will know how much overhead we have and this way we can decrease it from our other algorithms runs.
So lets set the number of samples so our program can run for over 20 seconds:
Then we compile our programs using the make
command:
And now lets run vol3 (Dummy program) for 5 times and see how long it takes in average.
Lets discard the highest and the lowest values and get the average: 26.946 seconds
This is the value of our overhead that we will subtract from the other programs.
Now lets do the same with others:
vol0:
Again, lets discard the highest and lowest ones and get the average, then we subtract from the dummy overhead.
Our Average is 27.684 seconds
.
27.684
- 26.946 (overhead)
= 0.738s
.
We had 0.738s
scaling time.
vol1:
Repeating the process, we have 27.501s
average
Subtracting the overhead we have: 0.555s
scaling time
vol2:
Average: 33.748s
Subtracting the overhead we have: 6.802s
scaling time.
vol4:
Average: 26.763s
.
Subtracting the overhead we have: -0.183s
scaling time.
vol5:
Average: 26.729s
.
Subtracting the overhead we have: -0.216s
scaling time.
I tried to change the Makefile
to try to make it run faster by using the -Ofast
and enable the vectorization
And had a 3 second average difference from the -O2
.
Now lets go to x86_64 machine
x86_65
The dummy (vol3) ovearhead value is 23.650s
.
Vol0 is the naïve implementation:
The average was 24.158 s
.
Minus the overhead, the time scaling is: 0.508s
Vol1 is the fixed point implementation:
The average was 23.736s
.
Minus the overhead, the time scaling is: 0.086s
Vol2 is the precalculated implementation:
The average was 24.676sz
.
Minus the overhead, the time scaling is: 1.026s
Finally
Lets answer the questions on the c code:
The dummy program will help us measure the overhead, this is the process that every other vol program will go through so we need to know how much it is in order to know how much is the scaling process time.
Those parts are needed because if you don’t use the result of your calculations, the compiler will see it and avoid it as you are not using it anyway, so In the end your program will end up doing nothing at all.
That’s a good question. I would say to make sure it’s a positive number, but to be honest Im not sure. ☹
I could notice that different algorithms produce different outputs. But running the same program multiple times will produce the same result even with different performance times. I would say that the difference results between algorithms, in terms of audio is not big enough for us to perceive.
Thank you for reading!
This content originally appeared on DEV Community and was authored by DEV Community
DEV Community | Sciencx (2022-03-09T02:50:49+00:00) SPO600 Lab 05 – Algorithm Selection. Retrieved from https://www.scien.cx/2022/03/09/spo600-lab-05-algorithm-selection/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.