
custom colormap in wordcloud

So the documentation for the Python WordCloud library is a little, ah...minimalistic, let's say. Which means that when I was helping my mom out with some code for a project she's presenting in a couple of weeks, I had a hell of a time trying to set a custom color scheme for the word cloud. Not the colors of an image; not any of the preset colormaps; a simple, custom color scheme. To be honest, I wasn't even sure it could be done.

(It would have been less of a big deal if the existing Matplotlib colormaps sucked less for word cloud purposes,[1] but that's a different conversation.)

Anyway, I figured out how to do it. I never actually found an answer anywhere; I pieced some stuff together from the Matplotlib documentation and gave it a whirl. So I thought I'd put up what I found, in case someone else needs it someday.

The essential elements

You are going to need Matplotlib for this, but wordcloud already depends on it, so it shouldn't be a problem.

First, import matplotlib.colors as mcolors. Then define your palette as a list of strings called colors: you can use named colors or RGB(A) values, but I used hex codes. You also need the number of colors, which is just the length of the list; passing that as N gives the colormap exactly one bin per color, so you get your palette as written instead of blends between neighboring colors. You then use both to define a custom colormap, cmap. The whole thing looks like this:

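import matplotlib.colors as mcolors

# Palette as hex codes; named colors or RGB(A) values work too
colors = ['#3F7CAC', '#65816D', '#8B862D', '#E4A21D', '#DD7D2C', '#D5573B', '#AF5447', '#765087']
bins = len(colors)  # one bin per color, so nothing gets blended
cmap = mcolors.LinearSegmentedColormap.from_list('custom_cmap', colors, N=bins)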
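
If you want to convince yourself that N really does what it's supposed to, here's a quick check. It's a minimal sketch rather than anything from the wordcloud docs: exact, blended, exact_hex and blended_hex are names I made up for the example, and it relies on the fact that calling a colormap with an integer hands back that entry of its lookup table.

```python
import matplotlib.colors as mcolors

colors = ['#3F7CAC', '#65816D', '#8B862D', '#E4A21D',
          '#DD7D2C', '#D5573B', '#AF5447', '#765087']

# N equal to the number of colors: one lookup-table entry per palette color
exact = mcolors.LinearSegmentedColormap.from_list('custom_cmap', colors, N=len(colors))
# Default N (256 entries): everything between the palette colors is interpolated
blended = mcolors.LinearSegmentedColormap.from_list('blended_cmap', colors)

# Read every lookup-table entry back out as a hex code
exact_hex = [mcolors.to_hex(exact(i)) for i in range(exact.N)]
blended_hex = {mcolors.to_hex(blended(i)) for i in range(blended.N)}

print(exact_hex)         # the original palette, in lowercase hex
print(len(blended_hex))  # far more distinct colors than the palette has
```

With N left at its default you'd get in-between shades in your word cloud, which is exactly what I was trying to avoid.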

Then you pass your new custom colormap, cmap, to WordCloud's colormap argument, just like you would the name of any of the pre-defined ones, and you're done!
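
As far as I can tell from the wordcloud source, each word's color comes from calling the colormap with a random number between 0 and 1. Here's a rough sketch of that idea using only Matplotlib, assuming that's really how it works; the fixed seed and the 8000 made-up "words" are just for illustration.

```python
import random
from collections import Counter

import matplotlib.colors as mcolors

colors = ['#3F7CAC', '#65816D', '#8B862D', '#E4A21D',
          '#DD7D2C', '#D5573B', '#AF5447', '#765087']
cmap = mcolors.LinearSegmentedColormap.from_list('custom_cmap', colors, N=len(colors))

rng = random.Random(42)  # fixed seed so the run is repeatable

# One random float per hypothetical word; the colormap turns it into a color
picks = Counter(mcolors.to_hex(cmap(rng.uniform(0, 1))) for _ in range(8000))

# Show how often each palette color was picked
for hex_code, count in sorted(picks.items()):
    print(hex_code, count)
```

Every pick should land on one of the eight palette colors, and each color should come up about equally often.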

The complete code

Here's the complete code for my project, just for reference.

(A note: this is a visualization for an NLP project in Portuguese, so the stop words are Portuguese, not English.)

import pandas as pd
from spacy.lang.pt.stop_words import STOP_WORDS
from wordcloud import WordCloud
import matplotlib.colors as mcolors

df = pd.read_csv("tables/master-fados.csv")

# Copy spaCy's Portuguese stop words so the additions stay local to this script
stops = set(STOP_WORDS)
stops.update({"ai", "oh"})

# Drop the stop words from each text
df['clean'] = df['text'].apply(lambda text: " ".join(word for word in text.split() if word not in stops))

# Join all the cleaned texts into one big string for the word cloud
texts = " ".join(df["clean"].to_list())

# Custom palette: one colormap bin per color
colors = ['#3F7CAC', '#65816D', '#8B862D', '#E4A21D', '#DD7D2C', '#D5573B', '#AF5447', '#765087']
bins = len(colors)
cmap = mcolors.LinearSegmentedColormap.from_list('custom_cmap', colors, N=bins)

wordcloud = WordCloud(width=1600, height=900, background_color='white', colormap=cmap).generate(texts)

wordcloud.to_file("images/whole.png")
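
The stop-word step is the only fiddly bit of the cleanup, so here's a small sketch of it in plain Python that you can try without the CSV or spaCy. The remove_stops helper and the tiny stop list are hypothetical stand-ins I made up for the example, not part of the project.

```python
# Hypothetical mini stop list; the real code uses spaCy's Portuguese STOP_WORDS
stops = {"de", "a", "o", "que", "e", "ai", "oh"}


def remove_stops(text, stops):
    """Drop every whitespace-separated word that appears in stops."""
    return " ".join(word for word in text.split() if word not in stops)


cleaned = remove_stops("ai que saudade de Lisboa", stops)
print(cleaned)  # saudade Lisboa

# The match is case-sensitive, so a capitalized "Ai" is kept
kept = remove_stops("Ai Mouraria", stops)
print(kept)  # Ai Mouraria
```

If capitalized stop words sneaking through matters for your data, lowercasing the text before filtering is one way to deal with it.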

  1. Why does just about every colormap include one color that's unreadable against a white background?