- Matplotlib 3.0 Cookbook
- Srinivasa Rao Poladi
- 2021-08-13 15:16:10
There's more...
In the preceding plot, we displayed the complete data range available in the input data. However, we can limit the data range that we want to color code by using the plt.clim() method. This is useful when there are outliers (a few entries of the variable that are much larger or smaller than the rest of the data), because by default the colormap spreads its colors over the complete range of the data. With outliers present, that range becomes so wide that the majority of the values are compressed into a narrow band of similar colors, while the outliers stand out with extreme colors. If we want to suppress the impact of outliers, we can limit the data range so that it excludes them and spread the full set of colors in the colorbar over the desired range only. This gives a truer picture of how the bulk of the data is distributed. Here is how we can accomplish this.
First, we will add noise to 10% of the pixels of the image that we created earlier. This requires a bit of data pre-processing, and the noise extends the data range beyond -1 to 1. We will then plot the same image twice, once with the full data range and once with the data range limited to -1 to 1, and compare the two. In the first image, a few cells (the imputed noise, or outliers) stand out in dark blue or white, whereas all the others are hard to tell apart; in the second image, the cell colors vary uniformly over the range of -1 to 1:
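To see numerically why an outlier washes out the rest of the data, here is a minimal sketch that is not part of the recipe. It assumes a small made-up array of values and uses matplotlib.colors.Normalize, the class that maps data values to positions between 0 and 1 on the colormap. Setting clip=True pins out-of-range values to the ends of the scale, which approximates how an image with limited color range draws such values with the end colors.

```python
import numpy as np
from matplotlib.colors import Normalize

# Hypothetical data: nine correlation-like values in [-1, 1] plus one outlier.
values = np.array([-0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 12.0])

# Default behaviour: the color scale spans the full data range (min to max).
full_range = Normalize(vmin=values.min(), vmax=values.max())

# Limited range, comparable to calling plt.clim(-1, 1) on an image.
limited = Normalize(vmin=-1, vmax=1, clip=True)

full_positions = full_range(values)
limited_positions = limited(values)

# How much of the 0-to-1 colormap axis the nine ordinary values occupy.
spread_full = full_positions[:-1].max() - full_positions[:-1].min()
spread_limited = limited_positions[:-1].max() - limited_positions[:-1].min()

print(f"share of colormap used, full range:    {spread_full:.2f}")
print(f"share of colormap used, limited range: {spread_limited:.2f}")
```

With the full range, the nine ordinary values share only about 13% of the colormap, so they look nearly identical. With the range limited to -1 to 1, they spread over 85% of it.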
- Add noise to 10% of the image pixels:
np.random.seed(0)
mask = (np.random.random(corr.shape) < 0.1)
columns = corr.columns
corr1 = np.array(corr)
corr1[mask] = np.random.normal(0, 5, np.count_nonzero(mask))
corr = pd.DataFrame(corr1, columns=columns)
- Define the figure and its size:
plt.figure(figsize=(12, 5))
- Define the first axes and plot the correlation map on it:
plt.subplot(121)
plt.imshow(corr, cmap='Blues')
plt.colorbar()
plt.xticks(range(len(corr)),corr.columns, rotation=75)
plt.yticks(range(len(corr)),corr.columns)
- Define the second axes and plot the correlation map, with limits on data:
plt.subplot(122)
plt.imshow(corr, cmap='Blues')
plt.colorbar(extend='both')
plt.clim(-1, 1)
plt.xticks(range(len(corr)),corr.columns, rotation=75)
plt.yticks(range(len(corr)),corr.columns)
- Adjust the space in between the plots, and display the figure on the screen:
plt.tight_layout()
plt.show()
Here is the explanation of the code and how it works:
- np.random.seed(0) sets the seed so that the random number generator produces the same data every time we run it. This makes the results repeatable when we run the same code multiple times.
- mask = (np.random.random(corr.shape) < 0.1) creates a Boolean matrix of the same shape as corr, with True wherever the uniform random number drawn for that position is less than 0.1 (roughly 10% of the entries) and False everywhere else.
- columns = corr.columns extracts the column names from the corr pandas DataFrame for later use.
- corr1 = np.array(corr) creates a NumPy array from our corr data frame, as the next statement works better with a NumPy array than with a data frame.
- corr1[mask] = np.random.normal(0, 5, np.count_nonzero(mask)) replaces the entries in corr1 that correspond to the True entries in the mask with random values drawn from a normal distribution with a mean of zero and a standard deviation of five. The idea is to replace about 10% of the entries with larger values representing noise.
- corr = pd.DataFrame(corr1, columns=columns) creates the data frame for the noise-imputed correlation matrix.
- plt.subplot(121), which is shorthand for plt.subplot(1, 2, 1), creates the axes for the first plot, where we display the noise-imputed image.
- plt.imshow(corr, cmap='Blues') plots the image on the first axes; the statements that follow add the colorbar, ticks, and tick labels.
- plt.subplot(122), which is shorthand for plt.subplot(1, 2, 2), creates the axes for the second plot, in which we limit the data range to between -1 and 1.
- plt.colorbar(extend='both') plots the colorbar with arrows on both ends, indicating that the data extends beyond the range that is displayed.
- plt.clim(-1, 1) limits the color scale (that is, the data range mapped onto the colormap) to between -1 and 1; values outside this range are drawn with the colors at the ends of the colormap.
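For comparison, the same steps can be sketched with Matplotlib's object-oriented interface, passing vmin and vmax to imshow() instead of calling plt.clim() afterwards. This is an illustrative variant under stated assumptions, not the recipe's own code: the helper names add_noise and plot_full_and_limited are made up for this sketch, and the synthetic data frame stands in for the corr matrix built earlier in the recipe.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def add_noise(corr, fraction=0.1, scale=5, seed=0):
    """Return a noisy copy of corr and the Boolean mask of replaced entries."""
    rng = np.random.default_rng(seed)
    mask = rng.random(corr.shape) < fraction       # True for ~fraction of cells
    noisy = corr.to_numpy(copy=True)               # leave the input untouched
    noisy[mask] = rng.normal(0, scale, np.count_nonzero(mask))
    return pd.DataFrame(noisy, index=corr.index, columns=corr.columns), mask


def plot_full_and_limited(corr, vmin=-1, vmax=1):
    """Plot corr twice: full data range (left) and limited range (right)."""
    fig, (ax_full, ax_lim) = plt.subplots(1, 2, figsize=(12, 5))

    # Left: color scale taken from the data, so outliers set the limits.
    im_full = ax_full.imshow(corr.to_numpy(), cmap='Blues')
    fig.colorbar(im_full, ax=ax_full)

    # Right: color scale fixed to [vmin, vmax], same effect as plt.clim().
    im_lim = ax_lim.imshow(corr.to_numpy(), cmap='Blues', vmin=vmin, vmax=vmax)
    fig.colorbar(im_lim, ax=ax_lim, extend='both')

    for ax in (ax_full, ax_lim):
        ax.set_xticks(range(len(corr.columns)))
        ax.set_xticklabels(corr.columns, rotation=75)
        ax.set_yticks(range(len(corr.index)))
        ax.set_yticklabels(corr.index)

    fig.tight_layout()
    return fig, im_full, im_lim


# Stand-in data (an assumption): correlations of eight random variables.
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(200, 8)),
                    columns=[f"var{i}" for i in range(8)])
corr = data.corr()

noisy_corr, mask = add_noise(corr)
fig, im_full, im_lim = plot_full_and_limited(noisy_corr)
# Call plt.show() to display the figure in an interactive session.
```

Passing vmin and vmax at creation time keeps the limits next to the image they apply to, which can be easier to follow than relying on plt.clim() acting on the current image. Either approach should give the same picture.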
You should see the following plots: