Working with file system paths and file URLs on Node.js

[2022-07-15] dev, javascript, nodejs
Warning: This blog post is outdated. Instead, read chapter “Working with file system paths and file URLs on Node.js” in “Shell scripting with Node.js”.

In this blog post, we learn how to work with file system paths and file URLs on Node.js. We explore the following path-related functionality:

  • Most path-related functionality is in module 'node:path'.
  • The global variable process has methods for changing the current working directory (which is explained later).
  • Module 'node:os' has functions that return the paths of important directories.

The three ways of accessing the 'node:path' API  

Module 'node:path' is often imported as follows:

import * as path from 'node:path';

In this blog post, this import statement is occasionally omitted. We also omit the following import:

import * as assert from 'node:assert/strict';

We can access Node’s path API in three ways:

  • We can access platform-specific versions of the API:
    • path.posix supports Unixes including macOS.
    • path.win32 supports Windows.
  • path itself always supports the current platform. For example, this is a REPL interaction on macOS:
    > path.parse === path.posix.parse
    true
    

Let’s see how function path.parse(), which parses file system paths, differs for the two platforms:

> path.win32.parse(String.raw`C:\Users\jane\file.txt`)
{
  dir: 'C:\\Users\\jane',
  root: 'C:\\',
  base: 'file.txt',
  name: 'file',
  ext: '.txt',
}
> path.posix.parse(String.raw`C:\Users\jane\file.txt`)
{
  dir: '',
  root: '',
  base: 'C:\\Users\\jane\\file.txt',
  name: 'C:\\Users\\jane\\file',
  ext: '.txt',
}

We parse a Windows path – first correctly via the path.win32 API, then via the path.posix API. We can see that in the latter case, the path isn’t correctly split into its parts – for example, the basename of the file should be file.txt (more on what the other properties mean later).

Foundational path concepts and their API support  

Path segments, path separators, path delimiters  

Terminology:

  • A non-empty path consists of one or more path segments – most often names of directories or files.
  • A path separator is used to separate two adjacent path segments in a path:
    > path.posix.sep
    '/'
    > path.win32.sep
    '\\'
    
  • A path delimiter separates elements in a list of paths:
    > path.posix.delimiter
    ':'
    > path.win32.delimiter
    ';'
    

We can see path separators and path delimiters if we examine the PATH shell variable, which contains the paths where the operating system looks for executables when a command is entered in a shell.

This is an example of a macOS PATH (shell variable $PATH):

> process.env.PATH.split(/(?<=:)/)
[
  '/opt/homebrew/bin:',
  '/opt/homebrew/sbin:',
  '/usr/local/bin:',
  '/usr/bin:',
  '/bin:',
  '/usr/sbin:',
  '/sbin',
]

The split separator has a length of zero because the lookbehind assertion (?<=:) matches any location that is preceded by a colon without consuming any characters. Therefore, each path delimiter ':' stays attached to the preceding path.

This is an example of a Windows PATH (shell variable %Path%):

> process.env.Path.split(/(?<=;)/)
[
  'C:\\Windows\\system32;',
  'C:\\Windows;',
  'C:\\Windows\\System32\\Wbem;',
  'C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;',
  'C:\\Windows\\System32\\OpenSSH\\;',
  'C:\\ProgramData\\chocolatey\\bin;',
  'C:\\Program Files\\nodejs\\',
]

The current working directory  

Many shells have the concept of the current working directory (CWD) – “the directory I’m currently in”:

  • If we use a command with a partially qualified path, that path is resolved against the CWD.
  • If we omit a path when a command expects a path, the CWD is used.
  • On both Unixes and Windows, the command to change the CWD is cd.

process is a global Node.js variable. It provides us with methods for getting and setting the CWD:

  • process.cwd() returns the CWD.
  • process.chdir(dirPath) changes the CWD to dirPath.
    • There must be a directory at dirPath.
    • That change does not affect the shell, only the currently running Node.js process.

Node.js uses the CWD to fill in missing pieces whenever a path isn’t fully qualified (complete). That enables us to use partially qualified paths with various functions – e.g. fs.readFileSync().

The current working directory on Unix  

The following code demonstrates process.chdir() and process.cwd() on Unix:

process.chdir('/home/jane');
assert.equal(
  process.cwd(), '/home/jane'
);

The current working directory on Windows  

So far, we have used the current working directory on Unix. Windows works differently:

  • Each drive has a current directory.
  • There is a current drive.

We can use process.chdir() to set both at the same time:

process.chdir('C:\\Windows');
process.chdir('Z:\\tmp');

When we revisit a drive, Node.js remembers the previous current directory of that drive:

assert.equal(
  process.cwd(), 'Z:\\tmp'
);
process.chdir('C:');
assert.equal(
  process.cwd(), 'C:\\Windows'
);

Fully vs. partially qualified paths, resolving paths  

  • A fully qualified path does not rely on any other information and can be used as is.
  • A partially qualified path is missing information: We need to turn it into a fully qualified path before we can use it. That is done by resolving it against a fully qualified path.

Fully and partially qualified paths on Unix  

Unix only knows two kinds of paths:

  • Absolute paths are fully qualified and start with a slash:

    /home/john/proj
    
  • Relative paths are partially qualified and start with a filename or a dot:

    .   (current directory)
    ..  (parent directory)
    dir
    ./dir
    ../dir
    ../../dir/subdir
    

Let’s use path.resolve() (which is explained in more detail later) to resolve relative paths against absolute paths. The results are absolute paths:

> const abs = '/home/john/proj';

> path.resolve(abs, '.')
'/home/john/proj'
> path.resolve(abs, '..')
'/home/john'
> path.resolve(abs, 'dir')
'/home/john/proj/dir'
> path.resolve(abs, './dir')
'/home/john/proj/dir'
> path.resolve(abs, '../dir')
'/home/john/dir'
> path.resolve(abs, '../../dir/subdir')
'/home/dir/subdir'

Fully and partially qualified paths on Windows  

Windows distinguishes four kinds of paths (for more information, see Microsoft’s documentation):

  • There are absolute paths and relative paths.
  • Each of those two kinds of paths can have a drive letter (“volume designator”) or not.

Absolute paths with drive letters are fully qualified. All other paths are partially qualified.

Resolving an absolute path without a drive letter against a fully qualified path full picks up the drive letter of full:

> const full = 'C:\\Users\\jane\\proj';

> path.resolve(full, '\\Windows')
'C:\\Windows'

Resolving a relative path without a drive letter against a fully qualified path can be viewed as updating the latter:

> const full = 'C:\\Users\\jane\\proj';

> path.resolve(full, '.')
'C:\\Users\\jane\\proj'
> path.resolve(full, '..')
'C:\\Users\\jane'
> path.resolve(full, 'dir')
'C:\\Users\\jane\\proj\\dir'
> path.resolve(full, '.\\dir')
'C:\\Users\\jane\\proj\\dir'
> path.resolve(full, '..\\dir')
'C:\\Users\\jane\\dir'
> path.resolve(full, '..\\..\\dir')
'C:\\Users\\dir'

Resolving a relative path rel with a drive letter against a fully qualified path full depends on the drive letter of rel:

  • Same drive letter as full? Resolve rel against full.
  • Different drive letter than full? Resolve rel against the current directory of rel’s drive.

That looks as follows:

// Configure current directories for C: and Z:
process.chdir('C:\\Windows\\System');
process.chdir('Z:\\tmp');

const full = 'C:\\Users\\jane\\proj';

// Same drive letter
assert.equal(
  path.resolve(full, 'C:dir'),
  'C:\\Users\\jane\\proj\\dir'
);
assert.equal(
  path.resolve(full, 'C:'),
  'C:\\Users\\jane\\proj'
);

// Different drive letter
assert.equal(
  path.resolve(full, 'Z:dir'),
  'Z:\\tmp\\dir'
);
assert.equal(
  path.resolve(full, 'Z:'),
  'Z:\\tmp'
);

Getting the paths of important directories via module 'node:os'  

The module 'node:os' provides us with the paths of two important directories:

  • os.homedir() returns the path to the home directory of the current user – for example:

    > os.homedir() // macOS
    '/Users/rauschma'
    > os.homedir() // Windows
    'C:\\Users\\axel'
    
  • os.tmpdir() returns the path of the operating system’s directory for temporary files – for example:

    > os.tmpdir() // macOS
    '/var/folders/ph/sz0384m11vxf5byk12fzjms40000gn/T'
    > os.tmpdir() // Windows
    'C:\\Users\\axel\\AppData\\Local\\Temp'
    

Concatenating paths  

There are two functions for concatenating paths:

  • path.resolve() always returns fully qualified paths
  • path.join() preserves relative paths

path.resolve(): concatenating paths to create fully qualified paths  

path.resolve(...paths: Array<string>): string

Concatenates the paths and returns a fully qualified path. It uses the following algorithm:

  • Start with the current working directory.
  • Resolve paths[0] against the previous result.
  • Resolve paths[1] against the previous result.
  • Do the same for all remaining paths.
  • Return the final result.

Without arguments, path.resolve() returns the path of the current working directory:

> process.cwd()
'/usr/local'
> path.resolve()
'/usr/local'

One or more relative paths are used for resolution, starting with the current working directory:

> path.resolve('.')
'/usr/local'
> path.resolve('..')
'/usr'
> path.resolve('bin')
'/usr/local/bin'
> path.resolve('./bin', 'sub')
'/usr/local/bin/sub'
> path.resolve('../lib', 'log')
'/usr/lib/log'

Any fully qualified path replaces the previous result:

> path.resolve('bin', '/home')
'/home'

That enables us to resolve partially qualified paths against fully qualified paths:

> path.resolve('/home/john', 'proj', 'src')
'/home/john/proj/src'

path.join(): concatenating paths while preserving relative paths  

path.join(...paths: Array<string>): string

Starts with paths[0] and interprets the remaining paths as instructions for ascending or descending. In contrast to path.resolve(), this function preserves partially qualified paths: If paths[0] is partially qualified, the result is partially qualified. If it is fully qualified, the result is fully qualified.

Examples of descending:

> path.posix.join('/usr/local', 'sub', 'subsub')
'/usr/local/sub/subsub'
> path.posix.join('relative/dir', 'sub', 'subsub')
'relative/dir/sub/subsub'

Double dots ascend:

> path.posix.join('/usr/local', '..')
'/usr'
> path.posix.join('relative/dir', '..')
'relative'

Single dots do nothing:

> path.posix.join('/usr/local', '.')
'/usr/local'
> path.posix.join('relative/dir', '.')
'relative/dir'

If arguments after the first one are fully qualified paths, they are interpreted as relative paths:

> path.posix.join('dir', '/tmp')
'dir/tmp'
> path.win32.join('dir', 'C:\\Users')
'dir\\C:\\Users'

Using more than two arguments:

> path.posix.join('/usr/local', '../lib', '.', 'log')
'/usr/lib/log'

Ensuring paths are normalized, fully qualified, or relative  

path.normalize(): ensuring paths are normalized  

path.normalize(path: string): string

On Unix, path.normalize():

  • Removes path segments that are single dots (.).
  • Resolves path segments that are double dots (..).
  • Turns multiple path separators into a single path separator.

For example:

// Fully qualified path
assert.equal(
  path.posix.normalize('/home/./john/lib/../photos///pet'),
  '/home/john/photos/pet'
);

// Partially qualified path
assert.equal(
  path.posix.normalize('./john/lib/../photos///pet'),
  'john/photos/pet'
);

On Windows, path.normalize():

  • Removes path segments that are single dots (.).
  • Resolves path segments that are double dots (..).
  • Converts each path separator slash (/), which is legal, into the preferred path separator (\).
  • Converts sequences of more than one path separator to single backslashes.

For example:

// Fully qualified path
assert.equal(
  path.win32.normalize('C:\\Users/jane\\doc\\..\\proj\\\\src'),
  'C:\\Users\\jane\\proj\\src'
);

// Partially qualified path
assert.equal(
  path.win32.normalize('.\\jane\\doc\\..\\proj\\\\src'),
  'jane\\proj\\src'
);

Note that path.join() with a single argument also normalizes and works the same as path.normalize():

> path.posix.normalize('/home/./john/lib/../photos///pet')
'/home/john/photos/pet'
> path.posix.join('/home/./john/lib/../photos///pet')
'/home/john/photos/pet'

> path.posix.normalize('./john/lib/../photos///pet')
'john/photos/pet'
> path.posix.join('./john/lib/../photos///pet')
'john/photos/pet'

path.resolve() (one argument): ensuring paths are normalized and fully qualified  

We have already encountered path.resolve(). Called with a single argument, it both normalizes paths and ensures that they are fully qualified.

Using path.resolve() on Unix:

> process.cwd()
'/usr/local'

> path.resolve('/home/./john/lib/../photos///pet')
'/home/john/photos/pet'
> path.resolve('./john/lib/../photos///pet')
'/usr/local/john/photos/pet'

Using path.resolve() on Windows:

> process.cwd()
'C:\\Windows\\System'

> path.resolve('C:\\Users/jane\\doc\\..\\proj\\\\src')
'C:\\Users\\jane\\proj\\src'
> path.resolve('.\\jane\\doc\\..\\proj\\\\src')
'C:\\Windows\\System\\jane\\proj\\src'

path.relative(): creating relative paths  

path.relative(sourcePath: string, destinationPath: string): string

Returns a relative path that gets us from sourcePath to destinationPath:

> path.posix.relative('/home/john/', '/home/john/proj/my-lib/README.md')
'proj/my-lib/README.md'
> path.posix.relative('/tmp/proj/my-lib/', '/tmp/doc/zsh.txt')
'../../doc/zsh.txt'

On Windows, we get a fully qualified path if sourcePath and destinationPath are on different drives:

> path.win32.relative('Z:\\tmp\\', 'C:\\Users\\Jane\\')
'C:\\Users\\Jane'

This function also works with relative paths:

> path.posix.relative('proj/my-lib/', 'doc/zsh.txt')
'../../doc/zsh.txt'

Parsing paths: extracting various parts of a path (filename extension etc.)  

path.parse(): creating an object with path parts  

type PathObject = {
  dir: string,
    root: string,
  base: string,
    name: string,
    ext: string,
};
path.parse(path: string): PathObject

Extracts various parts of path and returns them in an object with the following properties:

  • .base: last segment of a path
    • .ext: the filename extension of the base
    • .name: the base without the extension. This part is also called the stem of a path.
  • .root: the beginning of a path (before the first segment)
  • .dir: the directory in which the base is located – the path without the base

Later, we’ll see function path.format() which is the inverse of path.parse(): It converts an object with path parts into a path.

Example: path.parse() on Unix  

This is what using path.parse() on Unix looks like:

> path.posix.parse('/home/jane/file.txt')
{
  dir: '/home/jane',
  root: '/',
  base: 'file.txt',
  name: 'file',
  ext: '.txt',
}

The following diagram visualizes the extent of the parts:

  /      home/jane / file   .txt
| root |           | name | ext  |
| dir              | base        |

For example, we can see that .dir is the path without the base. And that .base is .name plus .ext.

Example: path.parse() on Windows  

This is how path.parse() works on Windows:

> path.win32.parse(String.raw`C:\Users\john\file.txt`)
{
  dir: 'C:\\Users\\john',
  root: 'C:\\',
  base: 'file.txt',
  name: 'file',
  ext: '.txt',
}

This is a diagram for the result:

  C:\    Users\john \ file   .txt
| root |            | name | ext  |
| dir               | base        |

path.basename(): extracting the base of a path  

path.basename(path, ext?)

Returns the base of path:

> path.basename('/home/jane/file.txt')
'file.txt'

Optionally, this function can also remove a suffix:

> path.basename('/home/jane/file.txt', '.txt')
'file'
> path.basename('/home/jane/file.txt', 'txt')
'file.'
> path.basename('/home/jane/file.txt', 'xt')
'file.t'

Removing the extension is case sensitive – even on Windows!

> path.win32.basename(String.raw`C:\Users\john\file.txt`, '.txt')
'file'
> path.win32.basename(String.raw`C:\Users\john\file.txt`, '.TXT')
'file.txt'

path.dirname(): extracting the parent directory of a path  

path.dirname(path)

Returns the parent directory of the file or directory at path:

> path.win32.dirname(String.raw`C:\Users\john\file.txt`)
'C:\\Users\\john'
> path.win32.dirname('C:\\Users\\john\\dir\\')
'C:\\Users\\john'

> path.posix.dirname('/home/jane/file.txt')
'/home/jane'
> path.posix.dirname('/home/jane/dir/')
'/home/jane'

path.extname(): extracting the extension of a path  

path.extname(path)

Returns the extension of path:

> path.extname('/home/jane/file.txt')
'.txt'
> path.extname('/home/jane/file.')
'.'
> path.extname('/home/jane/file')
''
> path.extname('/home/jane/')
''
> path.extname('/home/jane')
''

Categorizing paths  

path.isAbsolute(): Is a given path absolute?  

path.isAbsolute(path: string): boolean

Returns true if path is absolute and false otherwise.

The results on Unix are straightforward:

> path.posix.isAbsolute('/home/john')
true
> path.posix.isAbsolute('john')
false

On Windows, “absolute” does not necessarily mean “fully qualified” (only the first path is fully qualified):

> path.win32.isAbsolute('C:\\Users\\jane')
true
> path.win32.isAbsolute('\\Users\\jane')
true
> path.win32.isAbsolute('C:jane')
false
> path.win32.isAbsolute('jane')
false

path.format(): creating paths out of parts  

type PathObject = {
  dir: string,
    root: string,
  base: string,
    name: string,
    ext: string,
};
path.format(pathObject: PathObject): string

Creates a path out of a path object:

> path.format({dir: '/home/jane', base: 'file.txt'})
'/home/jane/file.txt'

Example: changing the filename extension  

We can use path.format() to change the extension of a path:

function changeFilenameExtension(pathStr, newExtension) {
  if (!newExtension.startsWith('.')) {
    throw new Error(
      'Extension must start with a dot: '
      + JSON.stringify(newExtension)
    );
  }
  const parts = path.parse(pathStr);
  return path.format({
    ...parts,
    base: undefined, // prevent .base from overriding .name and .ext
    ext: newExtension,
  });
}

assert.equal(
  changeFilenameExtension('/tmp/file.md', '.html'),
  '/tmp/file.html'
);
assert.equal(
  changeFilenameExtension('/tmp/file', '.html'),
  '/tmp/file.html'
);
assert.equal(
  changeFilenameExtension('/tmp/file/', '.html'),
  '/tmp/file.html'
);

If we know the original filename extension, we can also use a regular expression to change the filename extension:

> '/tmp/file.md'.replace(/\.md$/i, '.html')
'/tmp/file.html'
> '/tmp/file.MD'.replace(/\.md$/i, '.html')
'/tmp/file.html'

Using the same paths on different platforms  

Sometimes we’d like to use the same paths on different platforms. Then we face two issues:

  • The path separator may be different.
  • The file structure may be different: home directories and directories for temporary files may be in different locations, etc.

As an example, consider a Node.js app that operates on a directory with data. Let’s assume that the app can be configured with two kinds of paths:

  • Fully qualified paths anywhere on the system
  • Paths inside the data directory

Due to the aforementioned issues:

  • We can’t reuse fully qualified paths between platforms.

    • Sometimes we need absolute paths. These have to be configured per “instance” of the data directory and stored externally (or inside it and ignored by version control). These paths stay put and are not moved with the data directory.
  • We can reuse paths that point into the data directory. Such paths may be stored in configuration files (inside the data directory or not) and in constants in the app’s code. To do that:

    • We have to store them as relative paths.
    • We have to ensure that the path separator is correct on each platform.

    The next subsection explains how both can be achieved.

Relative platform-independent paths  

Relative platform-independent paths can be stored as Arrays of path segments and turned into fully qualified platform-specific paths as follows:

const universalRelativePath = ['static', 'img', 'logo.jpg'];

const dataDirUnix = '/home/john/data-dir';
assert.equal(
  path.posix.resolve(dataDirUnix, ...universalRelativePath),
  '/home/john/data-dir/static/img/logo.jpg'
);

const dataDirWindows = 'C:\\Users\\jane\\data-dir';
assert.equal(
  path.win32.resolve(dataDirWindows, ...universalRelativePath),
  'C:\\Users\\jane\\data-dir\\static\\img\\logo.jpg'
);

To create relative platform-specific paths, we can use:

const dataDir = '/home/john/data-dir';
const pathInDataDir = '/home/john/data-dir/static/img/logo.jpg';
assert.equal(
  path.relative(dataDir, pathInDataDir),
  'static/img/logo.jpg'
);

The following function converts relative platform-specific paths into platform-independent paths:

import * as path from 'node:path';

function splitRelativePathIntoSegments(relPath) {
  if (path.isAbsolute(relPath)) {
    throw new Error('Path isn’t relative: ' + relPath);
  }
  relPath = path.normalize(relPath);
  const result = [];
  while (true) {
    const base = path.basename(relPath);
    if (base.length === 0) break;
    result.unshift(base);
    const dir = path.dirname(relPath);
    if (dir === '.') break;
    relPath = dir;
  }
  return result;
}

Using splitRelativePathIntoSegments() on Unix:

> splitRelativePathIntoSegments('static/img/logo.jpg')
[ 'static', 'img', 'logo.jpg' ]
> splitRelativePathIntoSegments('file.txt')
[ 'file.txt' ]

Using splitRelativePathIntoSegments() on Windows:

> splitRelativePathIntoSegments('static/img/logo.jpg')
[ 'static', 'img', 'logo.jpg' ]
> splitRelativePathIntoSegments('C:static/img/logo.jpg')
[ 'static', 'img', 'logo.jpg' ]

> splitRelativePathIntoSegments('file.txt')
[ 'file.txt' ]
> splitRelativePathIntoSegments('C:file.txt')
[ 'file.txt' ]

Using a library to match paths via globs  

The npm module 'minimatch' lets us match paths against patterns that are called glob expressions, glob patterns, or globs:

import minimatch from 'minimatch';
assert.equal(
  minimatch('/dir/sub/file.txt', '/dir/sub/*.txt'), true
);
assert.equal(
  minimatch('/dir/sub/file.txt', '/**/file.txt'), true
);

Use cases for globs:

  • Specifying which files in a directory should be processed by a script.
  • Specifying which files to ignore.

More glob libraries:

  • multimatch extends minimatch with support for multiple patterns.
  • micromatch is an alternative to minimatch and multimatch that has a similar API.
  • globby is a library based on fast-glob that adds convenience features.

The minimatch API  

The whole API of minimatch is documented in the project’s readme file. In this subsection, we look at the most important functionality.

Minimatch compiles globs to JavaScript RegExp objects and uses those to match.

minimatch(): compiling and matching once  

minimatch(path: string, glob: string, options?: MinimatchOptions): boolean

Returns true if glob matches path and false otherwise.

Two interesting options:

  • .dot: boolean (default: false)
    If true, wildcard symbols such as * and ** match “invisible” path segments (whose names begin with dots):

    > minimatch('/usr/local/.tmp/data.json', '/usr/**/data.json')
    false
    > minimatch('/usr/local/.tmp/data.json', '/usr/**/data.json', {dot: true})
    true
    
    > minimatch('/tmp/.log/events.txt', '/tmp/*/events.txt')
    false
    > minimatch('/tmp/.log/events.txt', '/tmp/*/events.txt', {dot: true})
    true
    
  • .matchBase: boolean (default: false)
    If true, a pattern without slashes is matched against the basename of a path:

    > minimatch('/dir/file.txt', 'file.txt')
    false
    > minimatch('/dir/file.txt', 'file.txt', {matchBase: true})
    true
    

new minimatch.Minimatch(): compiling once, matching multiple times  

Class minimatch.Minimatch enables us to only compile the glob to a regular expression once and match multiple times:

new Minimatch(pattern: string, options?: MinimatchOptions)

This is how this class is used:

import minimatch from 'minimatch';
const {Minimatch} = minimatch;
const glob = new Minimatch('/dir/sub/*.txt');
assert.equal(
  glob.match('/dir/sub/file.txt'), true
);
assert.equal(
  glob.match('/dir/sub/notes.txt'), true
);

Syntax of glob expressions  

This subsection covers the essentials of the syntax; minimatch supports additional features that are not described here.

Matching Windows paths  

Even on Windows, glob segments are separated by slashes – but they match both backslashes and slashes (which are legal path separators on Windows):

> minimatch('dir\\sub/file.txt', 'dir/sub/file.txt')
true

Minimatch does not normalize paths  

Minimatch does not normalize paths for us:

> minimatch('./file.txt', './file.txt')
true
> minimatch('./file.txt', 'file.txt')
false
> minimatch('file.txt', './file.txt')
false

Therefore, we have to normalize paths if we don’t create them ourselves:

> path.normalize('./file.txt')
'file.txt'

Patterns without wildcard symbols: path separators must line up  

Patterns without wildcard symbols (which would match more flexibly) must match exactly. In particular, the path separators must line up:

> minimatch('/dir/file.txt', '/dir/file.txt')
true
> minimatch('dir/file.txt', 'dir/file.txt')
true
> minimatch('/dir/file.txt', 'dir/file.txt')
false

> minimatch('/dir/file.txt', 'file.txt')
false

That is, we must decide on either absolute or relative paths.

With option .matchBase, we can match patterns without slashes against the basenames of paths:

> minimatch('/dir/file.txt', 'file.txt', {matchBase: true})
true

The asterisk (*) matches any (part of a) single segment  

The wildcard symbol asterisk (*) matches any path segment or any part of a segment:

> minimatch('/dir/file.txt', '/*/file.txt')
true
> minimatch('/tmp/file.txt', '/*/file.txt')
true

> minimatch('/dir/file.txt', '/dir/*.txt')
true
> minimatch('/dir/data.txt', '/dir/*.txt')
true

The asterisk does not match “invisible files” whose names start with dots. If we want to match those, we have to prefix the asterisk with a dot:

> minimatch('file.txt', '*')
true
> minimatch('.gitignore', '*')
false
> minimatch('.gitignore', '.*')
true
> minimatch('/tmp/.log/events.txt', '/tmp/*/events.txt')
false

Option .dot lets us switch off this behavior:

> minimatch('.gitignore', '*', {dot: true})
true
> minimatch('/tmp/.log/events.txt', '/tmp/*/events.txt', {dot: true})
true

The double asterisk (**) matches zero or more segments  

**/ matches zero or more segments:

> minimatch('/file.txt', '/**/file.txt')
true
> minimatch('/dir/file.txt', '/**/file.txt')
true
> minimatch('/dir/sub/file.txt', '/**/file.txt')
true

If we want to match relative paths, the pattern must not start with a path separator:

> minimatch('file.txt', '/**/file.txt')
false

The double asterisk does not match “invisible” path segments whose names start with dots:

> minimatch('/usr/local/.tmp/data.json', '/usr/**/data.json')
false

We can switch off that behavior via option .dot:

> minimatch('/usr/local/.tmp/data.json', '/usr/**/data.json', {dot: true})
true

Negating globs  

If we start a glob with an exclamation mark, it matches if the pattern after the exclamation mark does not match:

> minimatch('file.txt', '!**/*.txt')
false
> minimatch('file.js', '!**/*.txt')
true

Alternative patterns  

Comma-separated patterns inside braces match if one of the patterns matches:

> minimatch('file.txt', 'file.{txt,js}')
true
> minimatch('file.js', 'file.{txt,js}')
true

Ranges of integers  

A pair of integers separated by double dots defines a range of integers and matches if any of its elements matches:

> minimatch('file1.txt', 'file{1..3}.txt')
true
> minimatch('file2.txt', 'file{1..3}.txt')
true
> minimatch('file3.txt', 'file{1..3}.txt')
true
> minimatch('file4.txt', 'file{1..3}.txt')
false

Padding with zeros is supported, too:

> minimatch('file1.txt', 'file{01..12}.txt')
false
> minimatch('file01.txt', 'file{01..12}.txt')
true
> minimatch('file02.txt', 'file{01..12}.txt')
true
> minimatch('file12.txt', 'file{01..12}.txt')
true

Using file: URLs to refer to files  

There are two common ways to refer to files in Node.js:

  • Paths in strings
  • Instances of URL with the protocol file:

For example:

assert.equal(
  fs.readFileSync(
    '/tmp/data.txt', {encoding: 'utf-8'}),
  'Content'
);
assert.equal(
  fs.readFileSync(
    new URL('file:///tmp/data.txt'), {encoding: 'utf-8'}),
  'Content'
);

Class URL  

In this section, we take a closer look at class URL. More information on this class can be found in the Node.js documentation.

In this blog post, we access class URL via a global variable because that’s how it’s used on other web platforms. But it can also be imported:

import {URL} from 'node:url';

URIs vs. relative references  

URLs are a subset of URIs. RFC 3986, the standard for URIs, distinguishes two kinds of URI-references:

  • A URI starts with a scheme followed by a colon separator.
  • All other URI references are relative references.

Constructor of URL  

Class URL can be instantiated in two ways:

  • new URL(uri: string)

    uri must be a URI. It specifies the URI of the new instance.

  • new URL(uriRef: string, baseUri: string)

    baseUri must be a URI. If uriRef is a relative reference, it is resolved against baseUri and the result becomes the URI of the new instance.

    If uriRef is a URI, it completely replaces baseUri as the data on which the instance is based.

Here we can see the class in action:

// If there is only one argument, it must be a proper URI
assert.equal(
  new URL('https://example.com/public/page.html').toString(),
  'https://example.com/public/page.html'
);
assert.throws(
  () => new URL('../book/toc.html'),
  /^TypeError \[ERR_INVALID_URL\]: Invalid URL$/
);

// Resolve a relative reference against a base URI 
assert.equal(
  new URL(
    '../book/toc.html',
    'https://example.com/public/page.html'
  ).toString(),
  'https://example.com/book/toc.html'
);

Resolving relative references against instances of URL  

Let’s revisit this variant of the URL constructor:

new URL(uriRef: string, baseUri: string)

The argument baseUri is coerced to string. Therefore, any object can be used, as long as it becomes a valid URL when coerced to string:

const obj = { toString() {return 'https://example.com'} };
assert.equal(
  new URL('index.html', obj).href,
  'https://example.com/index.html'
);

That enables us to resolve relative references against URL instances:

const url = new URL('https://example.com/dir/file1.html');
assert.equal(
  new URL('../file2.html', url).href,
  'https://example.com/file2.html'
);

Used this way, the constructor is loosely similar to path.resolve().

Properties of URL instances  

Instances of URL have the following properties:

type URL = {
  protocol: string,
  username: string,
  password: string,
  hostname: string,
  port: string,
  host: string,
  readonly origin: string,
  
  pathname: string,
  
  search: string,
  readonly searchParams: URLSearchParams,
  hash: string,

  href: string,
  toString(): string,
  toJSON(): string,
}

Converting URLs to strings  

There are three common ways in which we can convert URLs to strings:

const url = new URL('https://example.com/about.html');

assert.equal(
  url.toString(),
  'https://example.com/about.html'
);
assert.equal(
  url.href,
  'https://example.com/about.html'
);
assert.equal(
  url.toJSON(),
  'https://example.com/about.html'
);

Method .toJSON() enables us to use URLs in JSON data:

const jsonStr = JSON.stringify({
  pageUrl: new URL('https://2ality.com/p/subscribe.html')
});
assert.equal(
  jsonStr, '{"pageUrl":"https://2ality.com/p/subscribe.html"}'
);

Getting URL properties  

The properties of URL instances are not own data properties; they are implemented via getters and setters. In the next example, we use the utility function pickProps() (whose code is shown at the end) to copy the values returned by those getters into a plain object:

const props = pickProps(
  new URL('https://jane:pw@example.com:80/news.html?date=today#misc'),
  'protocol', 'username', 'password', 'hostname', 'port', 'host',
  'origin', 'pathname', 'search', 'hash', 'href'
);
assert.deepEqual(
  props,
  {
    protocol: 'https:',
    username: 'jane',
    password: 'pw',
    hostname: 'example.com',
    port: '80',
    host: 'example.com:80',
    origin: 'https://example.com:80',
    pathname: '/news.html',
    search: '?date=today',
    hash: '#misc',
    href: 'https://jane:pw@example.com:80/news.html?date=today#misc'
  }
);
function pickProps(input, ...keys) {
  const output = {};
  for (const key of keys) {
    output[key] = input[key];
  }
  return output;
}

Alas, the pathname is a single atomic unit. That is, we can’t use class URL to access its parts (base, extension, etc.).

Setting parts of a URL  

We can also change parts of a URL by setting properties such as .hostname:

const url = new URL('https://example.com');
url.hostname = '2ality.com';
assert.equal(
  url.href, 'https://2ality.com/'
);

We can use the setters to create URLs from parts (idea by Haroen Viaene):

// Object.assign() invokes setters when transferring property values
const urlFromParts = (parts) => Object.assign(
  new URL('https://example.com'), // minimal dummy URL
  parts // assigned to the dummy
);

const url = urlFromParts({
  protocol: 'https:',
  hostname: '2ality.com',
  pathname: '/p/about.html',
});
assert.equal(
  url.href, 'https://2ality.com/p/about.html'
);

Managing search parameters via .searchParams  

We can use property .searchParams to manage the search parameters of URLs. Its value is an instance of URLSearchParams.

We can use it to read search parameters:

const url = new URL('https://example.com/?topic=js');
assert.equal(
  url.searchParams.get('topic'), 'js'
);
assert.equal(
  url.searchParams.has('topic'), true
);

We can also change search parameters via it:

url.searchParams.append('page', '5');
assert.equal(
  url.href, 'https://example.com/?topic=js&page=5'
);

url.searchParams.set('topic', 'css');
assert.equal(
  url.href, 'https://example.com/?topic=css&page=5'
);

Converting between URLs and file paths  

It’s tempting to convert between file paths and URLs manually. For example, we can try to convert a URL instance myUrl to a file path via myUrl.pathname. However, that doesn’t always work: .pathname keeps characters percent-encoded and, for Windows paths, has a slash before the drive letter. It’s better to use this function:

url.fileURLToPath(url: URL | string): string

The following code compares the results of that function with the values of .pathname:

import * as assert from 'node:assert/strict';
import * as url from 'node:url';

//::::: Unix :::::
// (fileURLToPath() results as returned on Unix)

const url1 = new URL('file:///tmp/with%20space.txt');
assert.equal(
  url1.pathname, '/tmp/with%20space.txt');
assert.equal(
  url.fileURLToPath(url1), '/tmp/with space.txt');

const url2 = new URL('file:///home/thor/Mj%C3%B6lnir.txt');
assert.equal(
  url2.pathname, '/home/thor/Mj%C3%B6lnir.txt');
assert.equal(
  url.fileURLToPath(url2), '/home/thor/Mjölnir.txt');

//::::: Windows :::::
// (fileURLToPath() results as returned on Windows)

const url3 = new URL('file:///C:/dir/');
assert.equal(
  url3.pathname, '/C:/dir/');
assert.equal(
  url.fileURLToPath(url3), 'C:\\dir\\');

This function is the inverse of url.fileURLToPath():

url.pathToFileURL(path: string): URL

It converts path to a file URL:

> url.pathToFileURL('/home/john/Work Files').href
'file:///home/john/Work%20Files'

Use case for URLs: accessing files relative to the current module  

One important use case for URLs is accessing a file that is a sibling of the current module:

import * as fs from 'node:fs';

function readData() {
  const url = new URL('data.txt', import.meta.url);
  return fs.readFileSync(url, {encoding: 'UTF-8'});
}

This function uses import.meta.url, which contains the URL of the current module (on Node.js, usually a file: URL).

Using fetch() would have made the previous code even more cross-platform (fetch() is also available in web browsers). However, as of Node.js 18.5, fetch() doesn’t work for file: URLs yet:

> await fetch('file:///tmp/file.txt')
TypeError: fetch failed
  cause: Error: not implemented... yet...

Use case for URLs: detecting if the current module is running as a script  

See the blog post “Node.js: checking if an ESM module is ‘main’”.

Paths vs. file: URLs  

When shell scripts receive references to files or export references to files (e.g. by logging them on screen), they are virtually always paths. However, there are two cases where we need URLs (as discussed in previous subsections):

  • To access files relative to the current module
  • To detect if the current module is running as a script