Removing unused CSS with Purgecss/UnCSS

#1

This seems to be a common request here, so I thought I would demonstrate two very simple methods that require very little modification to Sage.

Recently, I’ve been looking for a quicker alternative to UnCSS, since it is slow and often flaky. I thought PurifyCSS would be the answer since it has a nice webpack plugin; however, it doesn’t remove similarly named selectors, which can leave a lot behind if you use something like Tachyons. As of this writing, the main repo seems to be unmaintained as well.

Purgecss

Purgecss was originally thought of as the v2 of purifycss.

Purgecss is a tool that works well and is actively maintained. It also has a handy webpack plugin. And though it’s empty at the time of writing, this repo looks promising as well.

Using Purgecss with Sage

Add the webpack plugin (and glob-all) to your Sage project:

yarn add --dev purgecss-webpack-plugin glob-all

Then drop the plugin in the webpack optimize config:

// resources/assets/build/webpack.config.optimize.js

// ...
const glob = require('glob-all');
const PurgecssPlugin = require('purgecss-webpack-plugin');

module.exports = {
  plugins: [
    // ... 
    new PurgecssPlugin({
      // Scan these files for the selectors that are actually in use
      paths: glob.sync([
        'app/**/*.php',
        'resources/views/**/*.php',
        'resources/assets/scripts/**/*.js',
      ]),
      // Selectors to always keep (only if you need it!)
      whitelist: [
        'pr3', 'pv2', 'ph3',
        'mb1',
        'input',
        'tracked-mega',
      ],
    }),
  ],
};

Keep in mind that, since this lives in the optimize config file, it will only run with yarn build:production. You could load it in the main config instead; however, removing unused CSS is best treated like minifying your CSS, which is reserved for the production build script.

As you may have noticed above, a small drawback is the need to whitelist any selectors that aren’t found in the specified paths, which makes using this on a site with a plugin like HTML Forms a bit painful.
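
One way to soften that: the plugin also accepts whitelistPatterns and whitelistPatternsChildren, so if a plugin’s classes share a common prefix you can keep them by regex instead of listing each one. A minimal sketch (the hf- prefix is a hypothetical example; check the class names your plugin actually outputs):

// resources/assets/build/webpack.config.optimize.js (sketch)
const glob = require('glob-all');
const PurgecssPlugin = require('purgecss-webpack-plugin');

module.exports = {
  plugins: [
    new PurgecssPlugin({
      paths: glob.sync([
        'app/**/*.php',
        'resources/views/**/*.php',
        'resources/assets/scripts/**/*.js',
      ]),
      // Keep any selector matching these patterns even though it never
      // appears in the scanned paths (the hf- prefix is hypothetical)
      whitelistPatterns: [/^hf-/],
      // Also keep children of matching selectors, e.g. `.hf-form input`
      whitelistPatternsChildren: [/^hf-/],
    }),
  ],
};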

Using UnCSS

Though UnCSS is more accurate since it loads the actual pages to figure out which selectors are being used, this means every possible view must be rendered and you’ll need to manually provide a sitemap.

yarn add --dev uncss postcss-uncss

Then in the PostCSS config:

// resources/assets/build/postcss.config.js

// ...
const uncssConfig = {
  html: [
    'http://example.test',
    // Your entire sitemap added manually
    // or some other way if you’re clever (wget is handy for this).
  ]
};

// ...

module.exports = ({ file, options }) => {
  return {
    parser: options.enabled.optimize ? 'postcss-safe-parser' : undefined,
    plugins: {
      'postcss-uncss': options.enabled.optimize ? uncssConfig : false, // ← Add the plugin
      cssnano: options.enabled.optimize ? cssnanoConfig : false,
      autoprefixer: true,
    },
  };
};

#2

What about using a WordPress plugin like The SEO Framework to autogenerate a sitemap.xml? Of course, non-public and hidden areas would have to be collected in a separate sitemap.
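
For what it’s worth, pulling the URLs out of a standard sitemap.xml wouldn’t strictly require an XML-to-JSON dependency. A minimal sketch, assuming curl is available and a flat, single-file sitemap (a sitemap index would need another pass):

// Hypothetical sketch, not part of the Sage build:
const { execSync } = require('child_process');

// Fetch the sitemap and grab everything between <loc> tags
const xml = execSync('curl -s http://example.test/sitemap.xml').toString();
const urls = (xml.match(/<loc>[^<]+<\/loc>/g) || [])
  .map((loc) => loc.replace(/<\/?loc>/g, ''));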


#3

In writing this, I tried to use The SEO Framework’s sitemap.xml; however, since the goal for this post was to do it in a way that was quick and easy to set up, I didn’t want to include a ton of extra dependencies (e.g. XML-to-JSON converters, sitemap generators, etc.).

An idea I had was to add a pre-build script that uses wget to spider the entire site, ignoring robots.txt (i.e. -e robots=off), but it was getting too convoluted for this post.

I like the idea of having a tool check your actual codebase (Purgecss) vs. the generated version of your website (UnCSS), as I do want all of my template code in my theme directory (not in the database). Of course, there are pitfalls, but if you don’t need to theme any plugins, it’s gold.


#4

@knowler thanks for taking the time to put this together!

I’m intrigued by the idea of automating the process of creating the sitemap array for UnCSS using wget. I’ve done some research and would love some feedback to see if I’m on the right track!

To get a list of all the URLs on a website, I was able to use this command:

wget --spider --recursive --level=inf --no-verbose --accept html --output-file=./test.txt http://example.test/

I then filtered the output with this pipeline:

grep -i URL ./test.txt |           # keep only the log lines that mention a URL
  awk -F 'URL:' '{print $2}' |     # take everything after "URL:"
  awk '{$1=$1};1' |                # trim surrounding whitespace
  awk '{print $1}' |               # keep just the URL itself
  sort -u |                        # sort and de-duplicate
  sed '/^$/d' > ./sortedurls.txt   # drop empty lines and write the list

Which gave a list like this:

http://example.test/
http://example.test/page1/
http://example.test/page2/
...

The next step would be to create a JavaScript array from this file. I figure this can be done using Node’s built-in fs module. Something like this:

const fs = require('fs');
const text = fs.readFileSync('./sortedurls.txt', 'utf-8');
// Split on newlines, dropping the empty entry left by the trailing newline
const arr = text.split('\n').filter(Boolean);

Which would create an array like this:

[ 'http://example.test/',
  'http://example.test/page1/',
  'http://example.test/page2/',
  ... ]

However, this does rely on being able to use the Node File System (fs) module out of the box with webpack, which I haven’t tried (and I can’t say I’m an expert on Node or webpack).
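
A minimal sketch of how that could look in the PostCSS config, assuming sortedurls.txt is generated into the project root (which is where yarn runs the build from):

// resources/assets/build/postcss.config.js (sketch)
const fs = require('fs');

// Read the wget-generated URL list at build time; the relative path
// resolves against the project root, where the build is run from
const sitemap = fs
  .readFileSync('./sortedurls.txt', 'utf-8')
  .split('\n')
  .filter(Boolean); // drop empty lines

const uncssConfig = { html: sitemap };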

You would also need to be able to execute a bash script to run wget, grep, and awk, but it looks like that isn’t too hard to do with this plugin. Depending on the size of the site, though, this whole thing could take a long time!
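
If that plugin is something like webpack-shell-plugin (a hypothetical choice; the linked plugin may differ), the wiring could look roughly like this:

// resources/assets/build/webpack.config.optimize.js (sketch)
const WebpackShellPlugin = require('webpack-shell-plugin');

module.exports = {
  plugins: [
    // Run the wget spider + grep/awk pipeline before the build starts
    new WebpackShellPlugin({
      onBuildStart: ['bash ./generate-sitemap.sh'], // hypothetical script name
    }),
  ],
};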

It’s very possible I’ve completely overcomplicated this though. Would love feedback, especially if there is an easier way to go about it!


#5

This looks awesome. I’ll have to take a look at it later, but first, one word of warning regarding crawling with wget: you need to whitelist your IP so you don’t get blocked (edit: on remote servers, of course).


#6

That’s a good point!

I also saw someone suggest adding --wait=1 to the wget command so the crawl doesn’t degrade your site’s performance while generating the sitemap.


#7

@knowler thanks for sharing these methods!

I use your “UnCSS as a PostCSS plugin” method. For the sitemap I use the WordPress JSON Sitemap Generator plugin I made. It generates a JSON sitemap on non-production environments and is available via http://yourdomain.com?json_sitemap. (See the sketch after the lists below for wiring it into the PostCSS config.)

The JSON sitemap includes:

  • Single posts
  • Pages
  • Single CPTs (custom post types)
  • Author archives
  • Term archives
  • Custom taxonomy term archives
  • Monthly archives

Plus, these special pages:

  • Empty search results page
  • Search results page with no results
  • Search results page with pagination
  • 404 page
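
Here’s a minimal sketch of consuming the endpoint in the PostCSS config, assuming curl is available and the endpoint returns a plain JSON array of URLs (check the plugin’s actual response shape):

// resources/assets/build/postcss.config.js (sketch)
const { execSync } = require('child_process');

// Fetch the JSON sitemap at build time (URL and response shape assumed)
const sitemap = JSON.parse(
  execSync('curl -s "http://example.test/?json_sitemap"').toString()
);

const uncssConfig = { html: sitemap };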

I hope my plugin comes in handy for others as well. 😀


#8

Wow, this is awesome!


#9

@knowler @Henk thank you for this, works beautifully!


#10

Hey all - would this be a viable option if you have 3,000+ articles and growing?


#11

Using PurgeCSS is always a safe option if you have the paths and whitelist set correctly since it just scans for classes that are being used in your source PHP and JS. There is no need for a sitemap.

Using UnCSS is the less safe option (but the most accurate when it works). UnCSS uses a headless browser (it used to be PhantomJS, but is jsdom as of uncss 0.15.0) to process the actual generated HTML (that’s why we need a sitemap) and work out which styles are being used.

The reason I call this option “less safe” is that it will only process what you feed it, which means you have to give it an accurate sitemap, and you need parity between your development and production databases for that. Also, the larger your sitemap, the longer UnCSS will take to process it, which is part of why I called it “slow and flaky” in the OP. In the past I have found UnCSS to break for no apparent reason during a build. If you choose UnCSS, you will need to be more strategic in your approach, especially if the site in question has “3,000+ articles and growing.”

Personally, I have completely switched from UnCSS to PurgeCSS for removing unused CSS. I find it a lot more trustworthy and I think it’s very important to be able to run yarn build:production twice in a row and expect the same results (if my code hasn’t changed).


#12

If you choose UnCSS, you will need to be more strategic in your approach, especially if the site in question has “3000+ articles and growing.”

For instance, you could limit the sitemap to a number of representative posts. In my experience this is doable and can be accurate.
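
For example, one representative URL per template type is often enough (all URLs below are hypothetical placeholders):

// resources/assets/build/postcss.config.js (sketch)
const uncssConfig = {
  html: [
    'http://example.test/',                       // front page
    'http://example.test/sample-page/',           // one representative page
    'http://example.test/2018/05/sample-post/',   // one representative post
    'http://example.test/category/news/',         // one archive
    'http://example.test/definitely-not-a-page/', // the 404 template
  ],
};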