Curse of dimensionality…

This content originally appeared on DEV Community and was authored by Chakraborty Arya

Prerequisites: Familiarity with KNN

Most of us know that adding more features (data) usually leads to higher accuracy for a Machine Learning model. But for the K Nearest Neighbours classifier, things are a bit different. If the data contains an irrelevant feature, KNN will still treat it as useful information, and that extra dimension ends up adding noise to the distance calculation and leading it astray.
There is another case. Suppose the data has two or more features with essentially the same meaning/values. Our classifier will still treat them as separate features, so that information is effectively counted twice and gets more weightage/importance than it deserves. This can also lead to bad results.
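To make the effect concrete, here is a minimal sketch that appends purely random, irrelevant columns to a dataset and compares cross-validated KNN accuracy with and without them; the noisy version typically scores lower. The Iris dataset, scikit-learn, k = 5 and the 20 noise columns are illustrative assumptions, not part of the original post.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Append 20 random columns that carry no information about the labels.
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 20))])

knn = KNeighborsClassifier(n_neighbors=5)
print("original features  :", cross_val_score(knn, X, y, cv=5).mean())
print("with noise features :", cross_val_score(knn, X_noisy, y, cv=5).mean())
```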

Thus, one way to handle this is to assign weights to our features.
Instead of using the plain Euclidean/Manhattan distance, we use a weighted distance: the summation of weight[i] * distance along feature i between the two data points. Calculating the weights is not really difficult. We can start with random weights for the respective features, and then find the best-suited weight[i] with the help of a **cost (error) function and Gradient Descent** to minimise the cost.
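As a rough illustration, here is a minimal sketch of such a weighted distance and a brute-force weighted KNN vote in Python. The function names and the use of NumPy are my own assumptions, not code from the original post, and the weight-tuning step itself (gradient descent on a cost function as suggested above, or a simple search over validation accuracy) is left out for brevity.

```python
import numpy as np

def weighted_distance(a, b, w):
    """Weighted Euclidean distance: sqrt(sum_i w[i] * (a[i] - b[i])**2)."""
    return np.sqrt(np.sum(w * (a - b) ** 2))

def weighted_knn_predict(X_train, y_train, x, w, k=5):
    """Majority vote among the k training points closest to x under the weighted distance.

    Labels in y_train are assumed to be small non-negative integers.
    """
    dists = np.array([weighted_distance(xi, x, w) for xi in X_train])
    nearest_labels = y_train[np.argsort(dists)[:k]]
    return np.bincount(nearest_labels).argmax()
```

With this setup, candidate weight vectors w can be scored on a validation set, and the vector that minimises the error is kept.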

Another way is to use feature selection.
Generally in KNN, we use backward elimination. We compute the accuracy of the model with a feature kept and with that feature removed, and keep or eliminate the feature(s) based on whichever gives the higher accuracy, as sketched below. This is easier to implement for a dataset than the previous approach.
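Below is a minimal sketch of backward elimination around a KNN classifier (the dataset, n_neighbors=5 and the 5-fold cross-validation are illustrative assumptions): it repeatedly tries dropping each remaining feature and keeps the drop whenever accuracy does not get worse.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
features = list(range(X.shape[1]))
knn = KNeighborsClassifier(n_neighbors=5)

def accuracy(cols):
    """Mean cross-validated accuracy using only the given feature columns."""
    return cross_val_score(knn, X[:, cols], y, cv=5).mean()

improved = True
while improved and len(features) > 1:
    improved = False
    base = accuracy(features)
    for f in list(features):
        trial = [c for c in features if c != f]
        # Drop the feature if the model is at least as accurate without it.
        if accuracy(trial) >= base:
            features = trial
            improved = True
            break

print("selected feature indices:", features)
```

For real projects, scikit-learn's SequentialFeatureSelector with direction='backward' implements essentially the same idea.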

Question:
Consider the dataset given below:

(Image: a sample dataset in which feature1 and feature2 have nearly identical values)

If we decide to assign weights to manage our features in KNN and we have the data shown above, what might the weights assigned to feature1 and feature2 be? Other features are also present in the dataset but are not shown, for clarity. Assume that weights vary between 0 and 100: the maximum weight is 100 and the minimum weight is 0.

Answer: Since both features have very similar values, we give importance to only one of them and assign a much lower weight to the other. Either of the two features can be the one given the higher weight.
