Removing unused CSS with Purgecss/UnCSS


#1

This seems to be a common request here so I thought I would demonstrate two very simple methods which require very little modification to Sage.

Recently, I’ve been looking for a quicker alternative to UnCSS, since it is slow and often flaky. I thought PurifyCSS would be the answer since it has a nice webpack plugin; however, it doesn’t remove similarly named selectors, which, if you use something like Tachyons, can be a lot of them. As of writing this, the main repo seems to be unmaintained as well.

Purgecss

Purgecss was originally thought of as the v2 of PurifyCSS.

Purgecss is a tool that works well and is actively maintained. It also has a handy webpack plugin. And though it’s empty at the moment, this repo looks promising as well.

Using Purgecss with Sage

Add the webpack plugin (and glob-all) to your Sage project:

yarn add --dev purgecss-webpack-plugin glob-all

Then drop the plugin in the webpack optimize config:

// resources/assets/build/webpack.config.optimize.js

// ...
const glob = require('glob-all');
const PurgecssPlugin = require('purgecss-webpack-plugin');

module.exports = {
  plugins: [
    // ... 
    new PurgecssPlugin({
      paths: glob.sync([
        'app/**/*.php',
        'resources/views/**/*.php',
        'resources/assets/scripts/**/*.js',
      ]),
      whitelist: [ // Only if you need it!
        'pr3',
        'pv2',
        'ph3',
        'mb1',
        'input',
        'tracked-mega',
      ],
    }),
  ],
};

Keep in mind that this will only run when you use yarn build:production, since it lives in the optimize config file. You could load it in the main config; however, removing unused CSS should be treated like minifying your CSS, which is reserved for the production build script.

As you may have noticed above, a small drawback is the need to whitelist any selectors that don’t appear in the specified paths, which makes using this on a site with a plugin like HTML Forms a bit painful.
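If the whitelist starts to grow, Purgecss also accepts regular expressions via a whitelistPatterns option (check the docs for your version), which can take some of the sting out of plugin selectors. A rough sketch; the hf- prefix is hypothetical, so substitute whatever your plugin actually outputs:

// Same plugins array as in the config above — a sketch, not a drop-in:
new PurgecssPlugin({
  paths: glob.sync([
    'app/**/*.php',
    'resources/views/**/*.php',
    'resources/assets/scripts/**/*.js',
  ]),
  whitelistPatterns: [/^hf-/], // keep any selector starting with `hf-`
}),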

Using UnCSS

UnCSS is more accurate, since it loads the actual pages to figure out which classes are being used, but this means every possible view must be rendered and you’ll need to add the sitemap manually.

yarn add --dev uncss postcss-uncss

Then in the PostCSS config:

// resources/assets/build/postcss.config.js

// ...
const uncssConfig = {
  html: [
    'http://example.test',
    // Your entire sitemap added manually
    // or some other way if you’re clever (wget is handy for this).
  ]
};

// ...

module.exports = ({ file, options }) => {
  return {
    parser: options.enabled.optimize ? 'postcss-safe-parser' : undefined,
    plugins: {
      'postcss-uncss': options.enabled.optimize ? uncssConfig : false, // ← Add the plugin
      cssnano: options.enabled.optimize ? cssnanoConfig : false,
      autoprefixer: true,
    },
  };
};

#2

What about using a WordPress plugin like The SEO Framework to autogenerate a sitemap.xml? Of course, non-public and hidden areas would have to be collected in a separate sitemap.


#3

In writing this, I tried to use The SEO Framework’s sitemap.xml; however, since the goal for this post was to do it in a way that was quick and easy to set up, I didn’t want to include a ton of extra dependencies (e.g. XML-to-JSON converters, sitemap generators, etc.).
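That said, pulling the URLs out of a sitemap.xml without any extra dependencies is doable with Node’s built-in http module and a regex (fragile, but probably fine for a build step). A rough sketch; the sitemap URL is an assumption:

// Sketch: fetch sitemap.xml and pull out the <loc> URLs with a regex
// instead of a real XML parser. Swap in require('https') for HTTPS sites.
const http = require('http');

function sitemapUrls(sitemapUrl) {
  return new Promise((resolve, reject) => {
    http.get(sitemapUrl, (res) => {
      let xml = '';
      res.on('data', (chunk) => (xml += chunk));
      res.on('end', () => {
        resolve((xml.match(/<loc>(.*?)<\/loc>/g) || [])
          .map((loc) => loc.replace(/<\/?loc>/g, '')));
      });
    }).on('error', reject);
  });
}

// sitemapUrls('http://example.test/sitemap.xml').then(console.log);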

An idea I had was to add a pre-build script using wget to spider the entire site, ignoring robots.txt (i.e. -e robots=off), however, it was getting too convoluted for this post.

I like the idea of having a tool check your actual codebase (Purgecss) vs. the generated version of your website (UnCSS), as I want all of my template code in my theme directory (not in the database). Of course, there are pitfalls, but if you don’t need to theme any plugins, it’s gold.


#4

@knowler thanks for taking the time to put this together!

I’m intrigued by the idea of automating the process of creating the sitemap array for UnCSS using wget. I’ve done some research and would love some feedback to see if I’m on the right track!

To get a list of all the URLs on a website, I was able to use this command:

wget --spider --recursive --level=inf --no-verbose --accept html --output-file=./test.txt http://example.test/

I then extracted, sorted, and deduplicated the URLs from that file with this pipeline:

grep -i URL ./test.txt | awk -F 'URL:' '{print $2}' | awk '{$1=$1};1' | awk '{print $1}' | sort -u | sed '/^$/d' > ./sortedurls.txt

Which gave a list like this:

http://example.test/
http://example.test/page1/
http://example.test/page2/
...

The next step would be to create a JavaScript array from this file. I figure this can be done using Node’s fs module. Something like this:

const fs = require('fs');
const text = fs.readFileSync('./sortedurls.txt', 'utf-8');
const arr = text.split('\n').filter(Boolean); // drop the empty string left by the trailing newline

Which would create an array like this:

[ 'http://example.test/',
  'http://example.test/page1/',
  'http://example.test/page2/',
  ... ]

However, this does rely on being able to use Node’s File System module out of the box with webpack, which I haven’t tried (and I can’t say I’m an expert on Node or webpack).
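That said, since postcss.config.js is evaluated by plain Node before webpack is even involved, something like this might just work (a sketch, untested, reusing the sortedurls.txt from above; the path is relative to wherever the build runs):

// Sketch: read the wget/grep output at config time; fs is available
// in postcss.config.js without any webpack loaders.
const fs = require('fs');

const uncssConfig = {
  html: fs
    .readFileSync('./sortedurls.txt', 'utf-8')
    .split('\n')
    .filter(Boolean), // drop empty lines (e.g. from the trailing newline)
};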

You would also need to be able to execute a bash script to use wget, grep, and awk, but it looks like that isn’t too hard to do with this plugin. Depending on the size of the site, though, this whole thing could take a long time!
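Alternatively, Node’s built-in child_process module could shell out without a plugin. A sketch using the commands from above (the URL and file paths are the same assumptions as before):

// Sketch: run the wget spider and the grep/awk pipeline from a Node
// script, e.g. wired up as a pre-build step. wget can exit non-zero if
// it hits broken links, so the first call is wrapped in a try/catch.
const { execSync } = require('child_process');

try {
  execSync(
    'wget --spider --recursive --level=inf --no-verbose --accept html ' +
    '--output-file=./test.txt http://example.test/',
    { stdio: 'inherit' }
  );
} catch (e) {
  // non-zero exit from wget; the log file is usually still written
}

execSync(
  "grep -i URL ./test.txt | awk -F 'URL:' '{print $2}' | awk '{$1=$1};1' " +
  "| awk '{print $1}' | sort -u | sed '/^$/d' > ./sortedurls.txt",
  { stdio: 'inherit' }
);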

It’s very possible I’ve completely overcomplicated this though. Would love feedback, especially if there is an easier way to go about it!


#5

This looks awesome. I’ll have to take a look at it later, but first, one word of warning regarding crawling with wget: you’ll want to whitelist your IP so you don’t get blocked (edit: for remote servers, of course).


#6

That’s a good point!

I also saw someone suggest adding --wait=1 to the wget command so that generating the sitemap doesn’t degrade your site’s performance.