# Information

WEEK 4 QUIZ

==Identifying NBA Player Positions Using Clustering==

*Standard basketball player positions are outdated and often no longer describe what a player actually does on the court. To discover the characteristics that define current players, you decided to collect and cluster basketball players. To this end, you have collected a list of players and their statistics for the 2018/2019 season. Furthermore, you have arranged the collected data in an analytic base table that consists of 403 instances and thirteen features. The features include points (PTS), field goals attempted (FGA), field goals percentage (FG.), three-point field goals attempted (X3PA), three-point field goals percentage (X3P.), free throws attempted (FTA), free throws percentage (FT.), offensive rebounds (OREB), defensive rebounds (DREB), assists (AST), turnovers (TOV), steals (STL), and blocks (BLK).*

# Question 1

The graph provided below displays the distribution of each of the thirteen features using box plots. Assuming that we want each feature to hold the same level of importance, should we modify the ranges of some of the features? Select one:

- a) No, changing the range of each feature is not required to achieve equal importance.
- b) Yes, adjusting the range of each feature is necessary to achieve equal importance.

- Memo
The correct answer is: b) Yes

Since the features are measured on different scales, features with a wide range of values, such as FT., will dominate the distance calculation. To prevent this from occurring, we need to scale the feature values to a similar range.
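
The effect can be illustrated with a minimal sketch. The two players and their values below are made up for illustration and are not taken from the quiz data; the helper `squared_share` is likewise a hypothetical name. The sketch measures how much of the squared Euclidean distance comes from a single unscaled feature.

```python
import math

# Hypothetical values for two players: (FT. in percent, BLK per game).
# These numbers are illustrative only, not taken from the quiz data.
player_a = (85.0, 0.2)
player_b = (60.0, 2.5)

def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def squared_share(p, q, index):
    """Fraction of the squared distance contributed by one feature."""
    total = sum((x - y) ** 2 for x, y in zip(p, q))
    return (p[index] - q[index]) ** 2 / total

distance = euclidean(player_a, player_b)
ft_share = squared_share(player_a, player_b, 0)

print(f"distance = {distance:.2f}")
print(f"share of squared distance from FT. = {ft_share:.3f}")
```

With these assumed values, almost all of the distance comes from FT., even though the two players differ substantially in BLK relative to that feature's typical range.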

# Question 2

The graph provided below displays the distribution of each of the 13 features after a specific numerical transformation has been applied. Based on the graph provided, was normalisation or standardisation performed? Select one:

- a) Each feature was normalised to the same range
- b) Standardisation was applied

- Memo
The correct answer is: b) Standardisation was applied

If each of the features were normalised to the same range (e.g. -1 to 1), each feature would have had at least one instance located at the maximum value and one instance located at the minimum value. Since this is not the case in the graph, the features were standardised.
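
The difference between the two transformations can be checked with a short sketch. The feature values are hypothetical, and the function names `min_max` and `standardise` are chosen here for illustration. Range normalisation always places the smallest and largest instance exactly on the bounds, whereas standardisation only fixes the mean at 0 and the standard deviation at 1.

```python
import statistics

# Hypothetical values of a single feature (illustrative only).
values = [2.0, 4.0, 4.0, 6.0, 14.0]

def min_max(xs, low=-1.0, high=1.0):
    """Range normalisation: rescale xs linearly onto [low, high]."""
    lo, hi = min(xs), max(xs)
    return [low + (x - lo) * (high - low) / (hi - lo) for x in xs]

def standardise(xs):
    """Standardisation: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]

normalised = min_max(values)
standardised = standardise(values)

print("normalised   min/max:", min(normalised), max(normalised))
print("standardised min/max:", min(standardised), max(standardised))
```

In a box plot of normalised features, every feature therefore spans the full axis range; in a box plot of standardised features, the extremes differ from feature to feature.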

# Question 3

Before we cluster the data we first need to evaluate the cluster tendency of the data. To evaluate the cluster tendency of the data we plan to use visual inspection of clusters. How many distance calculations do we need to perform to compute the dissimilarity matrix optimally?

- Memo
The correct answer is: 81003

The data set contains 403 instances, and we need the pairwise distance between every pair of instances. However, since the dissimilarity matrix is symmetric (the distance from A to B equals the distance from B to A) and the distance between an instance and itself is zero, we only need to calculate the entries above the diagonal: (403 x 403 - 403) / 2 = 81003 distances.
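
As a quick check of the arithmetic, the sketch below computes the count in three ways: with the formula from the memo, with the binomial coefficient "403 choose 2", and by enumerating every unordered pair of instance indices. The variable names are chosen here for illustration.

```python
from itertools import combinations
from math import comb

n = 403  # number of instances in the analytic base table

# Formula used in the memo: drop the diagonal, then halve by symmetry.
pairs_formula = (n * n - n) // 2

# Equivalent binomial coefficient: the number of unordered pairs.
pairs_comb = comb(n, 2)

# Brute-force enumeration of every unordered pair (i, j) with i < j.
pairs_enumerated = sum(1 for _ in combinations(range(n), 2))

print(pairs_formula, pairs_comb, pairs_enumerated)  # 81003 81003 81003
```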

# Question 4

Inspect the ordered dissimilarity image provided in the figure below for potential clusters. Do you observe any clusters? Select one:

- a) No, clusters cannot be observed.
- b) Yes, clusters can be observed.

- Memo
The correct answer is: b) Yes, clusters can be observed

The emergence of clear structures suggests that the data is not completely random.

# Question 5

Assume that we want to cluster the instances using k-means clustering. Since we do not know the number of clusters, i.e. k, in advance, we need to try different values of k. The graph below displays the total within-cluster sum of squares (w) obtained for different values of k. Interestingly, w increased when k increased from eight to nine. Can w increase when k increases, or did we make a mistake in our code? Select one:

- a) Yes, an increase in the sum of square errors is possible and can occur in certain scenarios
- b) No, an increase in the sum of square errors should not typically happen and may indicate a mistake in the code

- Memo
The correct answer is: a) Yes

The k-means clustering algorithm is not guaranteed to find a global minimum. Depending on the initial centroids, it may converge to a local minimum instead. In such a case, increasing k may lead to an increase in the within-cluster sum of squares.
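
This behaviour can be reproduced on a toy example. The sketch below is a minimal one-dimensional version of Lloyd's algorithm written for illustration, not the code used to produce the graph. The data points and the starting centroids are hypothetical and were picked by hand to mimic an unlucky random initialisation.

```python
def kmeans_1d(points, centres, max_iter=100):
    """Minimal 1-D Lloyd's algorithm with explicit starting centres."""
    centres = list(centres)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: (p - centres[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # converged
        centres = new_centres
    # Total within-cluster sum of squares for the final centres.
    w = sum(min((p - c) ** 2 for c in centres) for p in points)
    return centres, w

# Hypothetical data: three small groups on a line.
points = [0.0, 1.0, 8.0, 9.0, 18.0, 19.0]

_, w_k2 = kmeans_1d(points, [0.0, 19.0])             # k = 2
_, w_k3_unlucky = kmeans_1d(points, [0.0, 1.0, 9.0])  # k = 3, poor start
_, w_k3_lucky = kmeans_1d(points, [0.0, 8.0, 19.0])   # k = 3, good start

print(w_k2, w_k3_unlucky, w_k3_lucky)
```

With the poor start, two centroids are wasted on the first group and the third centroid is stuck between the other two groups, so w for k = 3 ends up larger than w for k = 2. Running k-means from several starting points and keeping the best result is the usual way to reduce this effect.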

# Question 6

The parallel coordinate plot below visualises the clusters assigned to each instance when k was set to three. Based on the parallel coordinate plot provided select the most correct option: Players assigned to cluster one are (select one):

- a) Dominant defenders and rebounders
- b) All round clutch shooters
- c) Offensive playmakers

- Memo
The correct answer is: a) Dominant defenders and rebounders

When analysing the parallel coordinate plot, examine the features that help us to differentiate an instance from other instances. For example, players assigned to cluster 1 have high values for the features BLK, DREB, FG. and OREB. These features are associated with defensive players.

# Question 7

To verify that the clusters were not found by chance, random sampling was performed and k-means clustering was reapplied with k set to three. The results of the clustering are illustrated in the parallel coordinate plot provided below. Are the clusters stable? Select one:

- a) Yes, the clusters are stable
- b) No, the clusters are not stable

- Memo
The correct answer is: a) Yes

The clustering results are stable since most of the instances are assigned to the same clusters previously found (although the labels differ).
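
Because k-means assigns cluster labels arbitrarily, two runs have to be compared up to a relabelling. A minimal sketch of such a comparison follows. The label lists are hypothetical, the function name `best_agreement` is chosen here for illustration, and the sketch assumes both lists refer to the same instances in the same order. It tries every permutation of the labels, which is only practical for a small k such as three.

```python
from itertools import permutations

def best_agreement(labels_a, labels_b):
    """Highest fraction of matching labels over all relabellings of labels_b."""
    ids = sorted(set(labels_a) | set(labels_b))
    best = 0.0
    for perm in permutations(ids):
        mapping = dict(zip(ids, perm))
        matches = sum(a == mapping[b] for a, b in zip(labels_a, labels_b))
        best = max(best, matches / len(labels_a))
    return best

# Hypothetical cluster labels for the same ten players in two runs.
first_run = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
second_run = [3, 3, 3, 1, 1, 2, 2, 2, 2, 2]

agreement = best_agreement(first_run, second_run)
print(f"agreement after relabelling = {agreement:.2f}")
```

An agreement close to 1 indicates that the same groups were recovered even though the label numbers differ.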

# Question 8

The dendrogram below shows the results obtained when clustering the players using hierarchical clustering. If the dendrogram is cut at y = 40, how many clusters would remain?

- Memo
The correct answer is: 2

A horizontal line drawn at y = 40 will “cut” two vertical lines. These lines each represent a cluster.
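
Counting the vertical lines that a cut crosses is equivalent to counting the merges that happen above the cut height and adding one. The sketch below illustrates this with hypothetical merge heights for a ten-leaf dendrogram; the heights are not read from the quiz figure, and the function name `clusters_at` is chosen for illustration. It assumes a linkage method whose merge heights never decrease, such as complete, average or Ward linkage.

```python
# Hypothetical merge heights of a dendrogram with 10 leaves (9 merges).
# These values are illustrative, not read from the quiz figure.
merge_heights = [2.1, 3.4, 5.0, 7.8, 12.5, 18.0, 27.0, 33.0, 55.0]

def clusters_at(heights, cut):
    """Number of clusters left when the dendrogram is cut at height `cut`.

    Every merge above the cut is undone, and each undone merge
    splits one cluster into two.
    """
    return 1 + sum(h > cut for h in heights)

clusters_at_40 = clusters_at(merge_heights, 40)
clusters_at_30 = clusters_at(merge_heights, 30)

print(clusters_at_40, clusters_at_30)
```

With these assumed heights, only one merge lies above y = 40, so the cut leaves two clusters; lowering the cut to y = 30 undoes one more merge and leaves three. In SciPy, the corresponding cluster labels can be obtained from a linkage matrix with `scipy.cluster.hierarchy.fcluster` using `criterion="distance"`.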

# Question 9

The dendrogram below shows the results obtained when clustering the players using hierarchical clustering. The name of each player is displayed on the x-axis, and the colour assigned to the player name is based on the clustering results previously obtained with k-means clustering. If we had selected three clusters using the hierarchical clustering approach, would we have obtained results similar to those of k-means clustering? Select one:

- a) No
- b) Yes

- Memo
The correct answer is: b) Yes

Yes. When we draw a horizontal line at y = 30 we end up with three clusters. These clusters are similar to the clusters obtained with k-means clustering.