Generate Word Clouds Of Any Shape In Python

A word cloud is an effective way to visualize text. From a large pool of text, you can see at a glance which words dominate. Word clouds are fun and engaging visuals, and just by looking at one, you get a sense of the mood of the text. In this article, I am going to explain how to generate a word cloud using a Python module called WordCloud. It is simple and easy. I will start with a simple word cloud and then show some custom and cool shapes.

Setup

For this tutorial, I will use the Wine Reviews dataset from Kaggle (the file winemag-data-130k-v2.csv). Please feel free to download the dataset and follow along.

To use the WordCloud module, you need to install it. You can do that with pip:

pip install wordcloud

For Anaconda users, the command is:

conda install -c conda-forge wordcloud

The tools used in this tutorial:

  1. NumPy library
  2. pandas library
  3. Matplotlib library
  4. Pillow imaging library (imported as PIL)
  5. Jupyter Notebook environment

Please make sure that you have them installed.
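If you want to check from Python that everything is available, here is a small sketch using only the standard library. The list of import names is my own, based on the tools above; note that Pillow is imported under the name `PIL`.

```python
import importlib.util

# Import names for the tools listed above (Pillow is imported as "PIL").
REQUIRED_MODULES = ["numpy", "pandas", "matplotlib", "PIL", "wordcloud"]

def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required packages are installed.")
```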

Simple Word Cloud

The simplest version is very easy to build. First, import the necessary packages and load the dataset.

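import numpy as np
import pandas as pd
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt
# Jupyter magic: show plots inside the notebook (omit in a plain .py script)
%matplotlib inline

# Load the wine reviews; the first column of the CSV is the row index
df = pd.read_csv("winemag-data-130k-v2.csv", index_col=0)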

This dataset contains descriptions of wines from different countries, along with some other information. For this tutorial, I will focus only on the description column, because it contains a good amount of text. I will join all the descriptions into one large text.

text = " ".join(review for review in df.description)
print("There are {} characters in the combination of all reviews.".format(len(text)))
#output:
There are 31661073 characters in the combination of all reviews.
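Note that `len(text)` counts characters, not words. The sketch below uses a couple of made-up review strings in place of `df.description` to show the difference. It also skips non-string entries, because `" ".join` raises a `TypeError` if a missing value (`NaN`, which is a float) sneaks into the column; with pandas, `df.description.dropna()` has the same effect.

```python
def combine_reviews(reviews):
    """Join review strings with spaces, skipping missing (non-string) entries."""
    return " ".join(r for r in reviews if isinstance(r, str))

# Made-up stand-ins for df.description; float("nan") mimics a missing value.
reviews = [
    "Aromas include tropical fruit, broom and brimstone.",
    float("nan"),
    "This is ripe and fruity, a wine that is smooth.",
]

text = combine_reviews(reviews)
n_chars = len(text)          # every character, including spaces and punctuation
n_words = len(text.split())  # whitespace-separated tokens
print(f"{n_chars} characters, {n_words} words")
```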

The full review text is large. Here is the code snippet to generate the simplest word cloud:

wordcl = WordCloud().generate(text)
plt.imshow(wordcl, interpolation='bilinear')
plt.axis('off')
plt.show()

The most basic word cloud is done! The bigger and bolder a word is, the more frequently it appears in the text.
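Conceptually, a word cloud starts from a table of word counts. The following is a simplified sketch of that first step, not WordCloud's actual implementation: it counts lowercase words with `collections.Counter` and scales the counts so the most frequent word gets 1.0. A word-to-weight dictionary like this is also the kind of input `WordCloud.generate_from_frequencies` accepts if you prefer to compute frequencies yourself.

```python
import re
from collections import Counter

def relative_frequencies(text, top_n=10):
    """Return the top_n words, scaled so the most frequent word has weight 1.0."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    top_count = counts[0][1]
    return {word: count / top_count for word, count in counts}

sample = "Fruit and acidity. Fruit, tannins, fruit and acidity."
print(relative_frequencies(sample, top_n=3))
```

Notice that a filler word like "and" ranks near the top, which is exactly the problem stopwords address.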

Let’s see how to improve this word cloud.

Improve The Word Cloud

One obvious improvement is to get rid of less important words such as is, are, too, some, and so on. That is very easy, because WordCloud already comes with a set of these words, called stopwords, that we can use:

stopwords = set(STOPWORDS)
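`STOPWORDS` is a plain Python set of strings, and `set(STOPWORDS)` makes a copy, so adding words later does not change the module-level set. Removing stopwords then amounts to a membership test. Here is a stdlib-only sketch; the tiny `sample_stopwords` set is a hypothetical stand-in for `STOPWORDS` so the snippet runs without wordcloud installed.

```python
import re

# Hypothetical, tiny stopword set standing in for wordcloud's STOPWORDS.
sample_stopwords = {"is", "are", "too", "some", "and", "the", "a", "this"}

def content_words(text, stop):
    """Lowercase the text, split it into words, and drop any word in `stop`."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stop]

print(content_words("This is ripe and fruity, some tannins are too firm.", sample_stopwords))
# ['ripe', 'fruity', 'tannins', 'firm']
```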

If you print `stopwords`, you can see the words it contains. There are a few more words I want to add, such as 'wine'. That is probably the most frequently used word in the text, since it looks very big in the word cloud above. We already know the text is about wine, so we do not need to show that word so prominently. I will add it, along with a few other words, to the stopwords:

# Pass a list: a bare string would be added one character at a time
stopwords.update(["drink", "now", "wine", "made", "the"])
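Be careful to pass the extra words to `update` as a list. `set.update` treats each argument as an iterable, and a string is an iterable of characters, so `update("wine")` adds the letters w, i, n and e rather than the word. A quick demonstration with plain sets:

```python
# set.update iterates over each argument; a bare string is iterated character by character.
letters = {"the"}
letters.update("wine")           # adds 'w', 'i', 'n', 'e'

words = {"the"}
words.update(["wine", "drink"])  # adds the two words

print(sorted(letters))  # ['e', 'i', 'n', 'the', 'w']
print(sorted(words))    # ['drink', 'the', 'wine']
```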

I will use a few more styling parameters as well.

background_color: changes the default black background.

max_font_size: caps the font size of the largest words. In the example above, some words are really large; I think the cloud looks better when that is controlled.

max_words: the maximum number of words to show; the most frequent words are kept.
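To build intuition for `max_words` and `max_font_size`, here is a deliberately simplified model: keep only the `max_words` most frequent words and map each word's relative frequency linearly to a font size, with the top word getting `max_font_size`. WordCloud's real sizing and layout algorithm is more involved (for example, it shrinks words that no longer fit on the canvas), so treat the hypothetical `font_sizes` function below as an illustration, not the library's behavior.

```python
def font_sizes(counts, max_words=200, max_font_size=50, min_font_size=4):
    """Hypothetical linear sizing: top word gets max_font_size, others scale by relative count."""
    top = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:max_words]
    if not top:
        return {}
    top_count = top[0][1]
    return {
        word: max(min_font_size, round(max_font_size * count / top_count))
        for word, count in top
    }

counts = {"fruit": 120, "acidity": 60, "tannins": 36, "finish": 3}
print(font_sizes(counts, max_words=3, max_font_size=50))
# {'fruit': 50, 'acidity': 25, 'tannins': 15}
```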

Let’s use all the parameters explained above and create the word cloud again:

wordcl = WordCloud(stopwords=stopwords, background_color="white",
                   max_font_size=50, max_words=2000).generate(text)
plt.figure(figsize=(10, 8))
plt.imshow(wordcl, interpolation='bilinear')
plt.axis('off')
plt.show()

Feel free to play with these parameters.

Use A Custom Shape

Instead of a rectangle, we can give the word cloud a shape of our choice by using a mask image. To demonstrate that, I will use this picture:

You can also take a screenshot of the picture to use it. We will make our word cloud in this shape. I saved the picture in a folder called ‘img’.

Open the picture and convert it to a NumPy array so it can serve as the mask:
mask = np.array(Image.open("img/w_wine.png"))
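WordCloud treats pure-white mask pixels (value 255) as masked out and draws words only on the other pixels. If your picture uses 0 for the background and nonzero values for the shape, the whole canvas stays drawable, and a common fix is to turn those background pixels white first. The sketch below shows the rule on a tiny hand-made grayscale grid of nested lists; the helper names are my own, and with a grayscale NumPy array you could do the same replacement as `mask[mask == 0] = 255`.

```python
def whiten_background(gray, background=0):
    """Return a copy of a grayscale grid with `background` pixels set to 255 (white)."""
    return [[255 if px == background else px for px in row] for row in gray]

def drawable(gray):
    """True where words may be placed: pure-white (255) pixels are masked out."""
    return [[px != 255 for px in row] for row in gray]

# Tiny hand-made 3x4 "image": 0 = background, 120 = the shape.
image = [
    [0, 120, 120, 0],
    [0, 120, 120, 0],
    [0,   0,   0, 0],
]

tiny_mask = whiten_background(image)
for row in drawable(tiny_mask):
    print("".join("#" if cell else "." for cell in row))
# .##.
# .##.
# ....
```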

Now pass this mask to WordCloud.

# With a mask, the mask's shape sets the canvas size, so width and height are ignored
wc = WordCloud(background_color='black', mask=mask, mode='RGB',
               width=1000, max_words=200, height=1000,
               random_state=1)
wc.generate(text)
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.tight_layout(pad=0)
plt.axis('off')
plt.show()

In my opinion, instead of all these colors, white words would look clearer and better on the black background.

# color_func is called for every word; this one ignores its arguments and always returns white
wc = WordCloud(background_color='black', mask=mask, mode='RGB',
               color_func=lambda *args, **kwargs: "white",
               width=1000, max_words=200, height=1000,
               random_state=1)
wc.generate(text)
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.tight_layout(pad=0)
plt.axis('off')
plt.show()
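The lambda trick works because WordCloud calls `color_func` for every word, passing keyword arguments such as the word, its font size, position, orientation and a random state, and uses whatever color string comes back. The lambda simply ignores all of that. As a sketch of a slightly richer option, the hypothetical function below makes bigger words brighter by returning an `hsl(...)` string, a format Pillow's color parser understands.

```python
def brightness_by_size(word=None, font_size=10, position=None, orientation=None,
                       random_state=None, **kwargs):
    """Grey shade for a word: larger font sizes give a lighter shade (capped at 100%)."""
    lightness = min(100, 50 + font_size)
    return f"hsl(0, 0%, {lightness}%)"

# WordCloud passes these values as keyword arguments; here we call it by hand.
print(brightness_by_size(word="fruit", font_size=10))  # hsl(0, 0%, 60%)
print(brightness_by_size(word="wine", font_size=80))   # hsl(0, 0%, 100%)

# Usage (requires wordcloud): WordCloud(..., color_func=brightness_by_size)
```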

To get a more clearly defined shape, draw a contour around the mask:

wc = WordCloud(background_color='white', mask=mask, mode='RGB',
               width=1000, max_words=1000, height=1000,
               random_state=1, contour_width=1, contour_color='steelblue')
wc.generate(text)
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.tight_layout(pad=0)
plt.axis('off')
plt.show()
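In the code above, `contour_width` and `contour_color` draw an outline along the edge of the mask, which is why the shape reads more clearly. To show what "edge" means here, the sketch below marks drawable cells that touch a masked-out cell or the border of the grid. This is only an illustration of the idea with a hypothetical `outline` helper; the library draws its contour from the mask image itself, not with this loop.

```python
def outline(drawable):
    """Mark drawable cells that touch a non-drawable cell or the grid border (4-neighbours)."""
    if not drawable:
        return []
    rows, cols = len(drawable), len(drawable[0])
    edge = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not drawable[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not drawable[nr][nc]:
                    edge[r][c] = True
                    break
    return edge

# 5x5 grid with a filled 3x3 square in the middle (True = drawable).
square = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]
for row in outline(square):
    print("".join("#" if cell else "." for cell in row))
# .....
# .###.
# .#.#.
# .###.
# .....
```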

Isn’t that cool? Using the same code, I developed another word cloud from a different picture.

Please feel free to try it. Do we always need black-and-white pictures for this type of visualization? Not really. We can also use a colored picture and color the words to match it. I will use the following colored picture next.

Here we will use the ImageColorGenerator class to generate the word colors from the picture. Here is the complete code snippet:

bottles = np.array(Image.open("img/bottle size.png"))
wc = WordCloud(background_color='white', mask=bottles)
wc.generate(text)
image_colors = ImageColorGenerator(bottles)
wc.recolor(color_func=image_colors)
plt.figure(figsize=[10, 10])
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Look: the colors of the words are close to the colors of the bottles, but not exact, because the bottles are not colored in a uniform, simple way.

One disadvantage of this approach is that if the shape or the colors are too complicated, it may not give you the result you want.

The shape of the bottles is simple, so the shape of the word cloud came out reasonably clear. The colors of the bottles are more complicated, though, and in my opinion the word cloud could not capture them exactly.
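The reason lies in how ImageColorGenerator works: it colors each word with the mean color of the rectangle of the picture that encloses the word. A box that spans, say, part of a dark bottle and part of the white background gets an in-between color. Here is a stdlib sketch of that averaging on a tiny hand-made RGB grid; the `mean_color` helper and the pixel values are hypothetical.

```python
def mean_color(image, top, left, height, width):
    """Average RGB of the rectangle covering rows top..top+height and columns left..left+width."""
    pixels = [px for row in image[top:top + height] for px in row[left:left + width]]
    if not pixels:
        raise ValueError("empty rectangle")
    n = len(pixels)
    return tuple(round(sum(px[i] for px in pixels) / n) for i in range(3))

dark_red = (120, 20, 30)
white = (255, 255, 255)
# 2x4 picture: left half dark red (a bottle), right half white (background).
image = [
    [dark_red, dark_red, white, white],
    [dark_red, dark_red, white, white],
]

print(mean_color(image, 0, 0, 2, 2))  # (120, 20, 30): box entirely on the bottle
print(mean_color(image, 0, 1, 2, 2))  # box straddling bottle and background: a washed-out red
```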

As shown before, you can also add a contour around the shape:

wc = WordCloud(background_color="white", mask=bottles,
               contour_width=1, contour_color='firebrick')
wc.generate(text)
image_colors = ImageColorGenerator(bottles)
wc.recolor(color_func=image_colors)
plt.figure(figsize=(10,10))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Finally, you may want to save the image to a file. I have a folder called ‘img’ in the same folder as this notebook, and I will save the file there:

wc.to_file("img/wine.png")
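`to_file` writes into the `img` folder, so that folder has to exist already; otherwise saving fails. Here is a small sketch with `pathlib` that creates the parent folder if needed. The helper name `prepare_output_path` is my own, and the demo writes into a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

def prepare_output_path(filename):
    """Create the parent folder of `filename` if it is missing and return it as a Path."""
    path = Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

# Demo in a throwaway directory; in the notebook you would call
# wc.to_file(str(prepare_output_path("img/wine.png")))
with tempfile.TemporaryDirectory() as tmp:
    out = prepare_output_path(Path(tmp) / "img" / "wine.png")
    folder_created = out.parent.is_dir()
    print(out.name, folder_created)  # wine.png True
```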

Conclusion

In this article, I explained the process of making a word cloud and provided code snippets that should work well. The WordCloud module has other parameters as well; I did not cover them because I wanted to keep this as simple as possible.

 
