Migrating our Monorepo to Yarn 2

DoltHub is a web-based UI built in React to share, discover, and collaborate on Dolt databases. We recently migrated our monorepo to Yarn 2 (or Yarn Modern). It took us some extra steps to make Yarn 2 work with our monorepo and other infrastructure. We thought sharing them could be useful for others looking to adopt Yarn 2.

Why migrate to Yarn 2

Yarn summarizes the reasons to upgrade to Yarn Modern on their website. In short, they include:

  • New features
  • Efficiency
  • Extensibility
  • Stability
  • Future proof

I initially decided to upgrade our repository when I came across one of Yarn 2's new commands while researching a solution to a dependency issue we were having. I started looking into it and got excited about some of the new commands, workspace tools, and potential performance gains, so I decided to try it out.

Our architecture

Our monorepo (named ld, short for Liquidata, our former company name) houses all of our back-end, front-end, deployment configuration, and other related code. Our front-end code lives in a directory within ld called web, which is split up into packages managed with Yarn workspaces. This means that each package is just a regular NPM package with its own package.json. We can add our packages as dependencies in one another's package.json files exactly as with any other package, but they are resolved locally and share a single node_modules and yarn.lock.

This is an abbreviated version of what our web directory looks like:

- web
  - node_modules
  - packages
    - blog
      - package.json
    - dolthub
      - package.json
    - graphql-server
      - package.json
    - shared-components
      - package.json
    - tailwind-config
      - package.json
  - package.json
  - yarn.lock

You can read more about our front-end architecture here.
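To make "resolved locally" a little more concrete, here is a minimal sketch of how a workspace's dependencies split into sibling workspaces and registry packages. It is a simplification that matches by name only (Yarn also takes the requested version range into account), and the `PackageManifest` type, the `splitDependencies` helper, and the version numbers are hypothetical.

```typescript
// Minimal package.json shape for this sketch; real manifests have many more fields.
interface PackageManifest {
  name: string;
  dependencies?: Record<string, string>;
}

// Split one workspace's dependencies into sibling workspaces (linked locally)
// and everything else (fetched from the registry). Matching is by name only.
function splitDependencies(
  pkg: PackageManifest,
  workspaces: PackageManifest[]
): { local: string[]; external: string[] } {
  const workspaceNames = workspaces.map((w) => w.name);
  const deps: Record<string, string> = pkg.dependencies || {};
  const local: string[] = [];
  const external: string[] = [];
  for (const dep of Object.keys(deps)) {
    (workspaceNames.indexOf(dep) !== -1 ? local : external).push(dep);
  }
  return { local, external };
}

// Hypothetical manifests: the names match our packages, the versions are made up.
const sharedComponentsPkg: PackageManifest = {
  name: "@dolthub/shared-components",
  dependencies: { react: "^18.0.0" },
};
const dolthubPkg: PackageManifest = {
  name: "@dolthub/dolthub",
  dependencies: { "@dolthub/shared-components": "0.1.0", react: "^18.0.0" },
};

const dolthubDeps = splitDependencies(dolthubPkg, [sharedComponentsPkg, dolthubPkg]);
// dolthubDeps.local    -> ["@dolthub/shared-components"]
// dolthubDeps.external -> ["react"]
```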

How we migrated our monorepo

In the end, the actual changes required to migrate to Yarn 2 were not that many. However, I did try and fail a few times before landing the change. At first, it was a dependency that was breaking our website build. To resolve it we either needed to downgrade to webpack 4 or upgrade to React 18 (which was in its beta phase at the time). We wanted React 18 to be more stable before upgrading and webpack 4 was not working with Yarn 2.

Once we upgraded React to its release candidate I tried Yarn 2 again. I gave up on Zero-Installs pretty early because not enough of the tools we were using seemed compatible (specifically ESLint and Dependabot), and there still seemed to be enough benefits without it to continue with the migration. There were a few extra changes I needed to make beyond the four main documented ones, which I discuss below, and I was eventually able to land the migration.

Yarn has a step-by-step guide to migrate your repository. Here are all the steps we needed to migrate our monorepo.

1. Install yarn

web % npm install -g yarn
web % yarn set version berry # I forgot this step initially and it was a pain to switch between branches with different versions

2. Add .yarnrc.yml to web

nodeLinker: node-modules

yarnPath: .yarn/releases/yarn-3.1.1.cjs

3. Commit changes and run yarn install

web % yarn install

4. Add to web/.gitignore

.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/sdks
!.yarn/versions

5. Add TypeScript plugin

We also added Yarn's TypeScript plugin by running yarn plugin import typescript. This automatically adds @types/ packages into your dependencies when you add a package that doesn't include its own types.
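For reference, the @types/ naming follows the DefinitelyTyped convention rather than anything specific to Yarn. The sketch below shows that convention only; `typesPackageName` is a hypothetical helper and is not the plugin's code.

```typescript
// Map a package name to the DefinitelyTyped package that would hold its types.
function typesPackageName(pkg: string): string {
  const scoped = /^@([^/]+)\/(.+)$/.exec(pkg);
  // Scoped names drop the "@" and join scope and name with a double underscore.
  return scoped ? `@types/${scoped[1]}__${scoped[2]}` : `@types/${pkg}`;
}

const lodashTypes = typesPackageName("lodash");
const babelCoreTypes = typesPackageName("@babel/core");
```

Here `lodashTypes` is `@types/lodash` and `babelCoreTypes` is `@types/babel__core`.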

6. package.json updates

We were already using workspaces with Classic Yarn, but there were a few changes we needed to make to our package.json scripts to get them to work with Yarn 2.

In our root web/package.json, we have scripts that run commands for individual packages, as well as multiple packages at once. We were originally using --cwd (which specifies the working directory) to run a script for a specific package. Yarn 2 no longer supported this argument, so we used yarn workspace instead:

{
  "scripts": {
-   "test:blog": "yarn --cwd 'packages/blog' test",
+   "test:blog": "yarn workspace @dolthub/blog test"
  }
}
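If you have many of these scripts, the rewrite is mechanical enough to script. This is a rough sketch, not something we ran as part of the migration: it assumes every workspace lives in packages/<dir> and is named <scope>/<dir>, which holds for the packages shown here, and `cwdToWorkspace` is a hypothetical helper.

```typescript
// Rewrite `yarn --cwd 'packages/<dir>' <rest>` to `yarn workspace <scope>/<dir> <rest>`.
// Assumes the workspace name is always <scope>/<dir>; check your package.json names first.
function cwdToWorkspace(script: string, scope: string): string {
  return script.replace(
    /yarn --cwd ['"]?packages\/([^'"\s]+)['"]? /g,
    (_match: string, dir: string) => `yarn workspace ${scope}/${dir} `
  );
}

const migratedTestScript = cwdToWorkspace("yarn --cwd 'packages/blog' test", "@dolthub");
// migratedTestScript -> "yarn workspace @dolthub/blog test"
```

Scripts that do not use --cwd pass through unchanged.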

We use npm-run-all to run a command in more than one package. We can still use npm-run-all with some small changes, like so:

{
  "scripts": {
-   "compile": "npm-run-all compile:*",
-   "compile:fakers": "yarn --cwd 'packages/fakers' compile",
-   "compile:utils": "yarn --cwd 'packages/utils' compile",
-   "compile:resource-utils": "yarn --cwd 'packages/resource-utils' compile",
-   "compile:shared-components": "yarn --cwd 'packages/shared-components' compile",
-   "compile:tailwind-config": "yarn --cwd 'packages/tailwind-config' compile",
-   "compile:blog": "yarn --cwd 'packages/blog' compile",
-   "compile:graphql-server": "yarn --cwd 'packages/graphql-server' compile",
-   "compile:dolthub": "yarn --cwd 'packages/dolthub' compile",
-   "check:graphql-server": "yarn --cwd 'packages/graphql-server' run check-server",
+   "compile": "npm-run-all 'compile:*'",
+   "compile:fakers": "yarn workspace @dolthub/fakers compile",
+   "compile:utils": "yarn workspace @dolthub/utils compile",
+   "compile:resource-utils": "yarn workspace @dolthub/resource-utils compile",
+   "compile:shared-components": "yarn workspace @dolthub/shared-components compile",
+   "compile:tailwind-config": "yarn workspace @dolthub/tailwind-config compile",
+   "compile:blog": "yarn workspace @dolthub/blog compile",
+   "compile:graphql-server": "yarn workspace @dolthub/graphql-server compile",
+   "compile:dolthub": "yarn workspace @dolthub/dolthub compile",
+   "check:graphql-server": "yarn workspace @dolthub/graphql-server run check-server",
  }
}

Yarn 2 has a new workspaces foreach command that accomplishes this same thing (don't forget to install the workspace-tools plugin first).

web % yarn workspaces foreach run compile
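The foreach command also has a --topological flag, which waits for a workspace's dependencies to finish before running the command in it. The sketch below illustrates that ordering idea only; it is not Yarn's implementation, and the dependency graph in the example is hypothetical rather than our real one.

```typescript
// Workspace name -> names of the workspaces it depends on.
type DependencyGraph = Record<string, string[]>;

// Return the workspaces ordered so that each one appears after its dependencies.
function topologicalOrder(graph: DependencyGraph): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  function visit(workspace: string): void {
    if (state[workspace] === "done") return;
    if (state[workspace] === "visiting") {
      throw new Error(`Dependency cycle involving ${workspace}`);
    }
    state[workspace] = "visiting";
    for (const dep of graph[workspace] || []) {
      visit(dep);
    }
    state[workspace] = "done";
    order.push(workspace);
  }

  // Sort the keys so the result does not depend on object key order.
  Object.keys(graph).sort().forEach((workspace) => visit(workspace));
  return order;
}

// Hypothetical graph for illustration.
const compileOrder = topologicalOrder({
  dolthub: ["shared-components", "tailwind-config"],
  "shared-components": ["tailwind-config"],
  "tailwind-config": [],
});
// compileOrder -> ["tailwind-config", "shared-components", "dolthub"]
```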

Notice that we also now need single quotes around compile:* in the line "compile": "npm-run-all 'compile:*'". We needed to add single quotes to a few other scripts too, including our clean scripts, which use rimraf. Without the single quotes around file paths, rimraf would error whenever a path was not found instead of skipping it and moving on.

{
  "scripts": {
-   "clean": "npm-run-all clean:*",
-   "clean:blog": "yarn --cwd 'packages/blog' clean",
-   "clean:misc": "rimraf node_modules packages/*/node_modules packages/*/.eslintcache packages/*/*.tsbuildinfo packages/*/dist packages/*/.rts2_cache* packages/dolthub/.next",
+   "clean": "npm-run-all 'clean:*'",
+   "clean:blog": "yarn workspace @dolthub/blog clean",
+   "clean:misc": "rimraf node_modules 'packages/*/node_modules' 'packages/*/.eslintcache' 'packages/*/*.tsbuildinfo' 'packages/*/dist' packages/dolthub/.next",
  }
}
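Quoting a pattern hands it to the tool literally, so npm-run-all and rimraf do their own matching. A naive sketch of adding the quotes automatically might look like this; `quoteGlobArgs` is a hypothetical helper that splits on single spaces, so it would not handle arguments that themselves contain spaces.

```typescript
// Wrap every unquoted argument containing a glob character in single quotes.
function quoteGlobArgs(script: string): string {
  return script
    .split(" ")
    .map((arg) => (/[*?]/.test(arg) && !/^['"]/.test(arg) ? `'${arg}'` : arg))
    .join(" ");
}

const quotedCleanScript = quoteGlobArgs(
  "rimraf node_modules packages/*/dist packages/dolthub/.next"
);
// quotedCleanScript -> "rimraf node_modules 'packages/*/dist' packages/dolthub/.next"
```

Arguments that are already quoted are left alone, so running it twice gives the same result.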

7. Upgrade docker-node

We use Docker to deploy our DoltHub services. After all of the above, everything was building and running smoothly. I then tried to deploy and could not do so successfully. After an embarrassing amount of time trying to figure out why, I finally realized the issue was the docker-node version we were using in our Dockerfiles. Upgrading from 14.17.4 to 16.14.0 solved the issue.

At this point everything was working and I was able to land the Yarn migration. But I had forgotten to check one thing.

8. Handling Dependabot incompatibilities

Every month Dependabot bumps our web dependencies. This keeps our packages up-to-date and helps prevent dependency hell if we do need to upgrade something.

When that time arrived, I realized that not only does Dependabot not support Plug'n'Play, it doesn't work with Yarn 2 at all! When it upgrades a dependency, the change either comes without yarn.lock changes or converts the yarn.lock file to the Yarn Classic format, which breaks everything. This Dependabot issue with a request to support Yarn 2 has been open for over two years and has hundreds of supporters, and still nothing.

In this issue I found a comment with a workaround GitHub Actions workflow. When I tried it out it worked for the case when there was no yarn.lock file (upgrading dependencies in an individual workspace package.json), but not when the yarn.lock file was wrong (updating a dependency in the root package.json).

I changed the workflow a bit so that instead of just running yarn install and committing the yarn.lock file, it soft resets the last commit, undoes any yarn.lock changes, and then runs yarn install and commits. You can view the workflow we use here.
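One way to tell whether a Dependabot commit has converted the lockfile is to look at the header Yarn writes. The sketch below is not part of the workflow described above; `detectLockfileFormat` is a hypothetical helper that assumes the usual markers, a "# yarn lockfile v1" comment for Yarn Classic and a top-level __metadata: key for Yarn Modern.

```typescript
type LockfileFormat = "classic" | "berry" | "unknown";

// Guess which Yarn generation wrote a yarn.lock from its contents.
function detectLockfileFormat(content: string): LockfileFormat {
  // Yarn Classic lockfiles carry this comment near the top.
  if (/^# yarn lockfile v1\s*$/m.test(content)) return "classic";
  // Yarn Modern lockfiles are YAML with a top-level __metadata block.
  if (/^__metadata:\s*$/m.test(content)) return "berry";
  return "unknown";
}

const classicHeader =
  "# THIS IS AN AUTOGENERATED FILE. DO NOT EDIT THIS FILE DIRECTLY.\n# yarn lockfile v1\n\n";
const berryHeader =
  '# This file is generated by running "yarn install" inside your project.\n' +
  "# Manual changes might be lost - proceed with caution!\n\n__metadata:\n  version: 5\n";

const classicFormat = detectLockfileFormat(classicHeader);
const berryFormat = detectLockfileFormat(berryHeader);
```

Here `classicFormat` is "classic" and `berryFormat` is "berry"; anything else is reported as "unknown".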

All in all not too much harm done, but it would be great if Dependabot could support Yarn 2 as it continues to be adopted by more and more people.

Is it worth it?

There are some benefits to migrating to Yarn 2 without Zero-Installs, but after some comparison we didn't really see any performance gains, and it was even worse than Yarn Classic at times.

Here's a little comparison for our repository*:

| | Yarn Classic | Yarn Modern |
| --- | --- | --- |
| node_modules size** | 1.3G | 1.4G |
| web directory size | 2.0G | 2.2G |
| yarn.lock lines | 19517 | 27960 |
| time yarn install, no cache | 79.44s user 146.53s system 205% cpu 1:49.86 total | 133.45s user 121.33s system 151% cpu 2:47.98 total |
| time yarn install, with Y2 cache | 79.44s user 146.53s system 205% cpu 1:49.86 total | 99.87s user 93.42s system 172% cpu 1:52.02 total |
| time yarn add [dep] | 17.67s user 22.31s system 183% cpu 21.779 total | 10.81s user 1.57s system 121% cpu 10.213 total |
| time yarn remove [dep] | 4.09s user 1.45s system 127% cpu 4.353 total | 8.76s user 1.38s system 121% cpu 8.344 total |

* This comparison would be more accurate if we averaged the results from many runs because there's a lot of variation between runs.

** There was a typo in an earlier version of this blog that mistakenly listed the node_modules size as the web directory size. It has now been fixed.

Yarn also maintains performance benchmarks that compare different versions of Yarn, as well as other package managers like NPM, for Next.js and Gatsby apps (both of which we use!). You can check them out here.

It would be interesting to see how performance for our repository would compare to Yarn 2 with Zero-Installs. In theory it would be possible to reach zero second installs, even for large repositories like ours. I'm looking forward to the day more tools are compatible.

In the meantime, there are some useful Yarn 2 features that could make migrating worth it, like types support and improved readability and usability of logs and commands.

One underrated feature, especially for those using workspaces, is automatic resolution of different versions of the same dependency in different packages. With Yarn Classic we needed to add a resolutions field to our web/package.json like this:

"resolutions": {
  "**/react": "^18.0.0-rc.0",
  "**/react-dom": "^18.0.0-rc.0"
}

It looks for different versions of react and react-dom and resolves them in our yarn.lock after running yarn install. This prevents our React build from breaking with the dreaded "Invalid hook call. Hooks can only be called inside of the body of a function component" error.
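To spot this kind of drift before it breaks a build, a small check over the workspace manifests can help. This is only a sketch: it compares the declared range strings and does not resolve them the way Yarn does, and `WorkspaceManifest`, `declaredRanges`, `hasMismatch`, and the versions in the example are all hypothetical.

```typescript
// Minimal package.json shape for this sketch.
interface WorkspaceManifest {
  name: string;
  dependencies?: Record<string, string>;
}

// Group workspace names by the version range they declare for one dependency.
function declaredRanges(
  dep: string,
  workspaces: WorkspaceManifest[]
): Record<string, string[]> {
  const byRange: Record<string, string[]> = {};
  for (const ws of workspaces) {
    const deps: Record<string, string> = ws.dependencies || {};
    const range = deps[dep];
    if (range === undefined) continue; // this workspace does not use the dependency
    if (!byRange[range]) byRange[range] = [];
    byRange[range].push(ws.name);
  }
  return byRange;
}

// True when workspaces declare more than one distinct range for the dependency.
function hasMismatch(dep: string, workspaces: WorkspaceManifest[]): boolean {
  return Object.keys(declaredRanges(dep, workspaces)).length > 1;
}

// Hypothetical manifests with made-up versions.
const exampleWorkspaces: WorkspaceManifest[] = [
  { name: "@dolthub/blog", dependencies: { react: "^17.0.2" } },
  { name: "@dolthub/dolthub", dependencies: { react: "^18.0.0" } },
  { name: "@dolthub/tailwind-config" },
];

const reactMismatch = hasMismatch("react", exampleWorkspaces);
```

In this made-up example `reactMismatch` is true because blog and dolthub declare different ranges for react.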

With Yarn Modern we no longer need the resolutions field. If I upgrade React in just one of our workspaces (like shared-components), it actually updates React in all workspaces that have React as a dependency.

shared-components % yarn up react

web % git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   packages/blog/package.json
        modified:   packages/dolthub/package.json
        modified:   packages/shared-components/package.json
        modified:   yarn.lock

This saves a lot of headache, especially as we continue to develop and share code between different packages.

We waited over two years after its first stable release to adopt Yarn 2. It seems like many of the kinks have been worked out since then, and it will continue to improve with time.

If you're interested in learning more about Dolt or DoltHub or have found better solutions to any of the above, feel free to reach out to me on Discord in the #dolthub channel.

A performance improvement for Yarn 2 without Zero-Installs

Updated 03/21/2022

After posting this blog in a few places on the Internet, people had some feedback and insights on Zero-Installs performance. It seems like people who have tried PnP with Zero-Installs did have improved performance as expected and recommended it despite some frustration with the migration process.

It also turns out that there is additional .yarnrc.yml configuration that improves Yarn 2 performance without Zero-Installs. Victor Vlasenko (larixer), one of the Yarn 2+ maintainers who has written most of the code specific to node_modules support for Yarn 2 and also works at SysGears, joined our Discord to offer advice on how to improve our performance for Yarn 2 without Zero-Installs.

yarnrc.yml updates

As recommended, we added these three lines to our .yarnrc.yml:

compressionLevel: 0
nmMode: hardlinks-local
enableGlobalCache: true

And we did see some improvements. Here's how it compares to the performance metrics we used above:

| | Yarn Modern with updated .yarnrc.yml |
| --- | --- |
| node_modules size | 1.2G |
| web directory size | 1.7G |
| yarn.lock lines | 27960 |
| time yarn install, no cache | 116.74s user 106.38s system 159% cpu 2:20.11 total |
| time yarn install, with Y2 cache | 52.59s user 77.35s system 188% cpu 1:09.04 total |
| time yarn add [dep] | 10.81s user 1.57s system 121% cpu 10.213 total |
| time yarn remove [dep] | 8.36s user 1.52s system 119% cpu 8.242 total |

While some performance gains are less significant than others, most items now show some kind of improvement compared to Yarn Classic. Most significantly, the time to install with a cache dropped from about 1:50 to 1:09, more than a third faster!
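To compare the wall-clock numbers directly, the "total" values from the figures above can be converted to seconds. A small sketch of that arithmetic follows; `totalSeconds` and `relativeChange` are hypothetical helpers, and the inputs are copied from the measurements in this post.

```typescript
// Convert a "total" value such as "1:49.86" or "10.213" to seconds.
function totalSeconds(total: string): number {
  const parts = total.split(":").map((part) => Number(part));
  // Fold from the left so "1:49.86" becomes 1 * 60 + 49.86.
  return parts.reduce((acc, part) => acc * 60 + part, 0);
}

// Fractional change from `before` to `after`; negative means faster.
function relativeChange(before: string, after: string): number {
  const beforeSeconds = totalSeconds(before);
  return (totalSeconds(after) - beforeSeconds) / beforeSeconds;
}

// yarn install with cache: Yarn Classic vs Yarn Modern with the updated config.
const cachedInstallChange = relativeChange("1:49.86", "1:09.04");
// yarn install without cache: same comparison.
const coldInstallChange = relativeChange("1:49.86", "2:20.11");
```

With these inputs the cached install comes out roughly 37% faster than Yarn Classic, while the no-cache install is still roughly 28% slower. As noted earlier, these are single runs, so treat the percentages as rough.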
