Crawl Weather Using Cypress

An edge case where I would use tests that depend on each other.

Imagine I ask you to do the following exercise using Cypress:

  1. Pick 10 random US cities
  2. For each city:
    1. Check the temperature forecast for the next day
    2. If the temperature is between 17C and 20C stop
    3. Otherwise go to the next city
  3. If there are no cities with the ideal temperature forecast, log a message
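
Stripped of Cypress, the exercise is a short search loop. The sketch below is plain JavaScript with a made-up forecast table (the `forecasts` object, its numbers, and the helper names are assumptions for illustration, not real data), just to pin down the stopping rule before the browser gets involved.

```javascript
// Hypothetical forecast table: the numbers are made up for illustration
const forecasts = { Boston: 3, Tempe: 24, Honolulu: 19, Detroit: -2 }

// the "ideal" range from the exercise: 17C to 20C, inclusive
const isComfortable = (t) => t >= 17 && t <= 20

// returns the first comfortable city, or null if none is found
function findComfortableCity(cities, lookup) {
  for (const city of cities) {
    if (isComfortable(lookup(city))) {
      return city
    }
  }
  console.log('No cities with the ideal temperature forecast')
  return null
}

const lookup = (city) => forecasts[city]
const found = findComfortableCity(['Boston', 'Tempe', 'Honolulu', 'Detroit'], lookup)
const none = findComfortableCity(['Boston', 'Detroit'], lookup)
```

Here `found` is the first city inside the range and `none` is `null`, which is the "log a message" branch of the exercise.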

So this is not a real end-to-end test, but a fun application of Cypress, just like crawling web pages, or playing Wordle using Cypress. Let's see how we can do this.

🎁 You can find the full source code for this blog post in the repo bahmutov/crawl-weather.

This blog post gives a good example of using a few of my favorite Cypress plugins, and I have an entire course about them.

Pick 10 random US cities

First, let's pick 10 cities randomly. We can use Wikipedia's list of US cities. There is a table with the city names in the second column.

US cities table

Let's do this. We can scaffold a new project and add Cypress. We can set the Wikipedia origin as our test base URL because that is the site we intend to visit.

cypress.config.js
const { defineConfig } = require('cypress')

module.exports = defineConfig({
  e2e: {
    baseUrl: 'https://en.wikipedia.org',
    supportFile: false,
    fixturesFolder: false
  }
})

At first, the test only visits the page so we can see the HTML markup for the table.

cypress/e2e/spec.cy.js
it('fetches 10 random US cities', () => {
  cy.visit('/wiki/List_of_United_States_cities_by_population')
})

The page loads, and we can see that the table with the cities is the second table on the page. Let's grab the cells in its second column; we will extract the inner text from each cell in a moment.

cypress/e2e/spec.cy.js
it('fetches 10 random US cities', () => {
  cy.visit('/wiki/List_of_United_States_cities_by_population')
  cy.get('table.wikitable.sortable')
    .should('have.length.gte', 1)
    .first()
    .find('tbody')
    .invoke('find', 'tr th+td')
    .should('have.length.greaterThan', 10)
})

The table cells with the city names are found.

The table cells with the city names

Let's pick 10 random cells from the 330 cells returned by the query.

cypress/e2e/spec.cy.js
it('fetches 10 random US cities', () => {
  cy.visit('/wiki/List_of_United_States_cities_by_population')
  cy.get('table.wikitable.sortable')
    .should('have.length.gte', 1)
    .first()
    .find('tbody')
    .invoke('find', 'tr th+td')
    .should('have.length.greaterThan', 10)
    .then(function ($cells) {
      return Cypress._.sampleSize($cells.toArray(), 10)
    })
    .should('have.length', 10)
})

Picked 10 random city cells

To extract the text from each found cell we can use plain Cypress or bring in cypress-map to help us write shorter queries.

cypress/e2e/spec.cy.js
// https://github.com/bahmutov/cypress-map
import 'cypress-map'

it('fetches 10 random US cities', () => {
  cy.visit('/wiki/List_of_United_States_cities_by_population')
  cy.get('table.wikitable.sortable')
    .should('have.length.gte', 1)
    .first()
    .find('tbody')
    .invoke('find', 'tr th+td')
    .should('have.length.greaterThan', 10)
    .then(function ($cells) {
      return Cypress._.sampleSize($cells.toArray(), 10)
    })
    .should('have.length', 10)
    // cy.map and cy.print are from cypress-map
    .map('innerText')
    .print('cities %o')
})

Printed names of the 10 random cities

Let's save the found cities into a JSON file

cypress/e2e/spec.cy.js
// https://github.com/bahmutov/cypress-map
import 'cypress-map'

it('fetches 10 random US cities', () => {
  cy.visit('/wiki/List_of_United_States_cities_by_population')
  cy.get('table.wikitable.sortable')
    .should('have.length.gte', 1)
    .first()
    .find('tbody')
    .invoke('find', 'tr th+td')
    .should('have.length.greaterThan', 10)
    .then(function ($cells) {
      return Cypress._.sampleSize($cells.toArray(), 10)
    })
    .should('have.length', 10)
    // cy.map and cy.print are from cypress-map
    .map('innerText')
    .print('cities %o')
    .then((cities) => {
      cy.writeFile('cities.json', cities)
    })
})

The JSON file cities.json shows the city names.

cities.json
[
  "Tempe",
  "Lynn",
  "Temecula",
  "Elgin",
  "Providence",
  "Des Moines",
  "Norfolk[m]",
  "Honolulu[o]",
  "Menifee",
  "New Orleans[n]"
]

This file is temporary, and it should not be included in the repo.

Note: the extra text in square brackets, like [m], [o], and [n], comes from footnote anchor links that show additional information about the city on mouse hover.

Information about New York City shows when I hover over the "[d]" anchor link

Let's clean the text up by removing those reference markers. We could look at the markup and remove them all.

The HTML markup creating the "[d]" anchor link

We can remove them by finding all sup elements inside the city cells

.then(function ($cells) {
  return Cypress._.sampleSize($cells.toArray(), 10)
})
.should('have.length', 10)
.then(($cities) => {
  $cities.find('sup').remove()
})
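
An alternative, if you would rather not touch the DOM: clean the extracted strings instead. This is only a sketch in plain JavaScript, and `cleanCityName` is a hypothetical helper, not something the test in this post uses. It assumes the markers always look like short bracketed labels such as `[m]`.

```javascript
// remove bracketed footnote markers such as "[m]" or "[12]" from a name
function cleanCityName(name) {
  return name.replace(/\s*\[[^\]]*\]/g, '').trim()
}

const cleaned = ['Norfolk[m]', 'Honolulu[o]', 'Des Moines'].map(cleanCityName)
```

Removing the sup elements stays closer to the page structure, while the string version could run after `.map('innerText')` as a plain function.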

Tip: want to make the test deterministic while working on it? Use Array.prototype.slice instead of Cypress._.sampleSize

.then(function ($cells) {
  return $cells.toArray().slice(0, 10)
})
.should('have.length', 10)
.then(($cities) => {
  $cities.find('sup').remove()
})
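
Another way to get repeatable picks, sketched in plain JavaScript: make the random source a parameter. The `sample` helper below is hypothetical (it is not part of Cypress or Lodash) and does a partial Fisher-Yates shuffle. Passing a function that always returns 0 makes it behave like `slice(0, n)`.

```javascript
// pick n items without repeats; "random" must return numbers in [0, 1)
function sample(items, n, random = Math.random) {
  const copy = items.slice()
  const count = Math.min(n, copy.length)
  for (let i = 0; i < count; i += 1) {
    // choose an index from the part of the array not picked yet
    const j = i + Math.floor(random() * (copy.length - i))
    const tmp = copy[i]
    copy[i] = copy[j]
    copy[j] = tmp
  }
  return copy.slice(0, count)
}

// deterministic while debugging: always the first two items
const firstTwo = sample(['Tempe', 'Lynn', 'Elgin'], 2, () => 0)
// random again once the test works
const anyTwo = sample(['Tempe', 'Lynn', 'Elgin'], 2)
```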

So my final list of cities with their true names is

cities.json
[
  "Dayton",
  "Rialto",
  "Lewisville",
  "Jersey City",
  "Mesquite",
  "Menifee",
  "Berkeley",
  "Providence",
  "Macon",
  "Anchorage"
]

Fetch the weather

Let's start the second test. It will fetch the weather for one city for now using wttr.in. We want the output in JSON format.

cypress/e2e/spec.cy.js
it('fetches weather', () => {
  cy.readFile('cities.json').then((cities) => {
    const cityName = cities[0]
    cy.request(`https://wttr.in/${cityName}?format=j1`)
      .its('body')
      // cy.tap() comes from cypress-map
      // and by default prints the current subject using console.log
      .tap()
  })
})

The weather information returned by wttr.in service
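
Some names in the list contain spaces ("Des Moines", "Jersey City"). The tests in this post pass the name into the URL as is; if you prefer to encode it explicitly, a small helper could look like the sketch below. `weatherUrl` is a hypothetical name, and whether wttr.in needs the encoding is an assumption I have not verified.

```javascript
// build the wttr.in JSON endpoint URL for a city name
function weatherUrl(cityName) {
  return `https://wttr.in/${encodeURIComponent(cityName)}?format=j1`
}

const url = weatherUrl('Des Moines')
```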

Tip: we can "see" the weather by requesting the default HTML page that draws the weather in ASCII art.

cypress/e2e/spec.cy.js
it('fetches weather', () => {
  cy.readFile('cities.json')
    .its(0)
    .then((cityName) => {
      cy.request(`https://wttr.in/${cityName}`)
        .its('body')
        .then((html) => {
          cy.document().invoke({ log: false }, 'write', html)
        })
    })
})

The weather in Dayton rendered as ASCII HTML art

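To get the forecast, we need to take the nested property weather.0.avgtempC. (The cy-spok validation near the end of this post labels weather.0 as today and weather.1 as tomorrow, so use weather.1.avgtempC if you strictly want the next day.)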

cypress/e2e/spec.cy.js
it('fetches weather', () => {
  cy.readFile('cities.json').then((cities) => {
    const cityName = cities[0]
    cy.request(`https://wttr.in/${cityName}?format=j1`)
      .its('body')
      // cy.tap() comes from cypress-map
      // and by default prints the current subject using console.log
      .tap()
      .its('weather.0.avgtempC')
      // cy.print in cypress-map
      .print(`${cityName} average tomorrow is %dC`)
  })
})

The weather forecast for Dayton
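
To make the path lookup concrete, here is a plain JavaScript sketch of walking a dotted path the way `.its('weather.0.avgtempC')` does. The `sample` object is a trimmed-down guess at the response shape based only on the fields used in this post, its values are invented, and `getPath` is a hypothetical helper.

```javascript
// Assumed, trimmed-down response: only the fields this post touches
const sample = {
  current_condition: [{ temp_C: '5', temp_F: '41' }],
  weather: [{ avgtempC: '4' }, { avgtempC: '18' }],
}

// follow a dot-separated path like "weather.0.avgtempC"
function getPath(obj, path) {
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), obj)
}

const raw = getPath(sample, 'weather.0.avgtempC')
// wttr.in returns numbers as strings, so convert before comparing
const temperature = Number(raw)
```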

Recursion

We want to find the magical city with a comfortable weather forecast in the range 17C-20C. We have the code to check one city. Now let's use recursion to decide whether to check the next city or stop.

cypress/e2e/spec.cy.js
it('fetches weather until we find a comfortable city', () => {
  const getForecast = (cities) => {
    if (cities.length < 1) {
      cy.log('No more cities to check')
      return
    }
    cy.print(`${cities.length} cities remaining`)
    // always check the last city
    // and remove it from the remaining list
    const cityName = cities.pop()
    cy.request(`https://wttr.in/${cityName}?format=j1`)
      .its('body')
      // cy.tap() comes from cypress-map
      // and by default prints the current subject using console.log
      .tap()
      .its('weather.0.avgtempC')
      .print(`${cityName} average tomorrow is %dC`)
      .then((temperature) => {
        if (temperature >= 17 && temperature <= 20) {
          cy.log(`People in ${cityName} are lucky`)
        } else {
          // call the weather check again
          // with the shorter list of cities to check
          getForecast(cities)
        }
      })
  }

  cy.readFile('cities.json')
    // kick off the search
    .then(getForecast)
})

Every time you use recursion you need 3 things in your code:

  1. The function R that checks if it needs to stop
const getForecast = (cities) => {
  if (cities.length < 1) {
    cy.log('No more cities to check')
    return
  }
  ...
}
  2. The recursive call with a smaller data set
// always check the last city
// and remove it from the remaining list
const cityName = cities.pop()
...
  .then((temperature) => {
    if (temperature >= 17 && temperature <= 20) {
      cy.log(`People in ${cityName} are lucky`)
    } else {
      // call the weather check again
      // with the shorter list of cities to check
      getForecast(cities)
    }
  })
  3. The initial call to the function R to start it
cy.readFile('cities.json')
  // kick off the search
  .then(getForecast)
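
The same three parts, shown outside Cypress as a plain synchronous sketch. Two things differ from getForecast and are my own choices for illustration: the `fakeForecast` table with made-up numbers stands in for the wttr.in request, and the function walks the list from the front without modifying it instead of calling `pop()`.

```javascript
// Hypothetical lookup table standing in for the network request
const fakeForecast = { Dayton: 2, Rialto: 18, Macon: 25 }

function search(cities, lookup) {
  // 1. the stop check
  if (cities.length < 1) {
    return null
  }
  const [cityName, ...rest] = cities
  const temperature = lookup(cityName)
  if (temperature >= 17 && temperature <= 20) {
    return cityName
  }
  // 2. the recursive call with a smaller list
  return search(rest, lookup)
}

// 3. the initial call
const original = ['Dayton', 'Rialto', 'Macon']
const lucky = search(original, (name) => fakeForecast[name])
```

Because each call receives a new, shorter array, `original` still holds all three cities afterwards; the `pop()` version in the test empties the array it was given.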

Ok, in our case we checked 10 cities and could not find any with a comfortable temperature forecast.

There were no cities in the random list with comfortable temperature forecast

Let's rerun both tests together, refetching the new list, saving it, then searching through it.

cypress/e2e/spec.cy.js
// https://github.com/bahmutov/cypress-map
import 'cypress-map'

it.only('fetches 10 random US cities', () => {
  ...
    .then((cities) => {
      cy.writeFile('cities.json', cities)
    })
})

it.only('fetches weather until we find a comfortable city', () => {
  cy.readFile('cities.json')
    // kick off the search
    .then(getForecast)
})

Look at that. One of the picked cities was Jackson (the Wikipedia list has Jackson, Mississippi, although wttr.in resolves the bare name on its own) with an average forecast temperature of 17C for tomorrow.

Jackson has comfortable temperature forecast for tomorrow

Use cypress-recurse

We can implement recursive algorithms in Cypress using the cypress-recurse plugin.

// https://github.com/bahmutov/cypress-recurse
import { recurse } from 'cypress-recurse'

it('finds the city with comfortable weather using cypress-recurse', () => {
  cy.readFile('cities.json').then((cities) => {
    // always check the last city
    // and remove it from the remaining list
    let cityName = cities.pop()
    recurse(
      // get the temperature for the current city
      // yields the temperature number
      () =>
        cy
          .request(`https://wttr.in/${cityName}?format=j1`)
          .its('body.weather.0.avgtempC')
          .then(Number)
          .should('be.within', -30, 50)
          .print(`${cityName} average tomorrow is %dC`),
      // predicate to check if we should stop
      (temperature) => temperature >= 17 && temperature <= 20,
      // recursion options
      {
        log(temperature, { successful }) {
          if (successful) {
            cy.log(`People in ${cityName} are lucky`)
          }
        },
        limit: cities.length,
        timeout: 10_000,
        post() {
          // go to the next city
          cityName = cities.pop()
        },
      },
    )
  })
})

Using cypress-recurse to implement the algorithm

Cities fixture file

We could merge the two tests into one. But I do like the cy.writeFile / cy.readFile mechanism, since it lets me quickly iterate on each test without waiting for the cities to be re-fetched from Wikipedia while I am tweaking the recursive code in getForecast. Normally, having a dependency between tests is an anti-pattern. But in our case, there is a real dependency: if we cannot fetch the cities from Wikipedia, our recursive search would not work at all. Passing the list through the cities.json file is a middle ground between two fully independent tests and a single merged test, and it seems ok to me.

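Just to make sure we can always verify the wttr.in service, we can save a short list of cities, in the same format as cities.json, as a fixture file in our repo and have a separate test that goes through those cities. Note that the cypress.config.js shown earlier sets fixturesFolder: false, so that setting most likely has to be removed for cy.fixture to find the file.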

cypress/fixtures/two.json
["Boston", "Detroit"]

Let's write a test that confirms we can fetch the temperatures. We load the fixture JSON file and call getForecast. We can even add a little bit of validation to make sure we extract reasonable temperature numbers.

cypress/e2e/spec.cy.js
// in getForecast
const cityName = cities.pop()
cy.request(`https://wttr.in/${cityName}?format=j1`)
  .its('body')
  // cy.tap() comes from cypress-map
  // and by default prints the current subject using console.log
  .tap()
  .its('weather.0.avgtempC')
  .then(Number)
  .should('be.within', -30, 50)
  .print(`${cityName} average tomorrow is %dC`)

it.only('fetches forecast', () => {
  cy.fixture('two.json') // kick off the search
    .then(getForecast)
})

Using cities fixture to request temperatures from wttr.in

Finally, let's check the response from the wttr.in service using cy-spok.

cypress/e2e/spec.cy.js
// https://github.com/bahmutov/cy-spok
import spok from 'cy-spok'

it('validates wttr.in response', () => {
  // numbers returned by wttr.in are strings
  const temperature = /^\-?\d+$/
  cy.request('https://wttr.in/Boston?format=j1')
    .its('body')
    .should(
      spok({
        current_condition: [
          {
            temp_C: spok.test(temperature),
            temp_F: spok.test(temperature),
          },
        ],
        weather: [
          {
            $topic: 'today',
            avgtempC: spok.test(temperature),
          },
          {
            $topic: 'tomorrow',
            avgtempC: spok.test(temperature),
          },
        ],
      }),
    )
})

Validate the weather response object using cy-spok
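
The regular expression does the real work in that check, so it is worth seeing what it accepts on its own. A plain JavaScript sketch follows; `parseTemperature` is a hypothetical helper, not part of cy-spok, and it combines the pattern test with the string-to-number conversion the earlier tests perform with `.then(Number)`.

```javascript
// optional minus sign followed by digits only, e.g. "17" or "-22"
const temperaturePattern = /^-?\d+$/

// validate the string form, then convert it to a number
function parseTemperature(text) {
  if (!temperaturePattern.test(text)) {
    throw new Error(`Unexpected temperature value: ${text}`)
  }
  return Number(text)
}

const cold = parseTemperature('-22')
const mild = parseTemperature('17')
```

Values such as "17.5" or an empty string do not match the pattern, so the helper throws for them instead of silently producing a fraction or zero.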

Cache the cities using cypress-data-session

Let's imagine you do want to make the tests independent of each other, yet preserve the speed. Saving cities.json in one test and reading it from another test might not be ideal. Let's get the list of cities in the test itself, yet cache it using the cypress-data-session plugin.

cypress/e2e/cache.cy.js
// https://github.com/bahmutov/cypress-map
import 'cypress-map'
// https://github.com/bahmutov/cypress-data-session
import 'cypress-data-session'
// https://github.com/bahmutov/cypress-recurse
import { recurse } from 'cypress-recurse'

const fetchCities = () => {
  cy.visit('/wiki/List_of_United_States_cities_by_population')
  cy.get('table.wikitable.sortable')
    .should('have.length.gte', 1)
    .first()
    .find('tbody')
    .invoke('find', 'tr th+td')
    .should('have.length.greaterThan', 10)
    .then(function ($cells) {
      return Cypress._.sampleSize($cells.toArray(), 10)
      // return $cells.toArray().slice(0, 10)
    })
    .should('have.length', 10)
    .then(($cities) => {
      $cities.find('sup').remove()
    })
    // cy.map and cy.print are from cypress-map
    .map('innerText')
    .print('cities %o')
}

const checkForecast = (cities) => {
  // always check the last city
  // and remove it from the remaining list
  let cityName = cities.pop()
  recurse(
    () =>
      cy
        .request(`https://wttr.in/${cityName}?format=j1`)
        .its('body.weather.0.avgtempC')
        .then(Number)
        .should('be.within', -30, 50)
        .print(`${cityName} average tomorrow is %dC`),
    (temperature) => temperature >= 17 && temperature <= 20,
    {
      log(temperature, { successful }) {
        if (successful) {
          cy.log(`People in ${cityName} are lucky`)
        }
      },
      limit: cities.length,
      timeout: 30_000,
      post() {
        cityName = cities.pop()
      },
    },
  )
}

it('fetches the cities and checks the forecast', () => {
  // if there are no cities yet (cached in memory)
  // fetches the list and stores it in memory
  // else returns the same list
  cy.dataSession({
    name: 'cities',
    setup: fetchCities,
  })
    // because our recursive function modifies the list
    // let's make sure it is a copy of the cached list of cities
    .apply(structuredClone)
    .print()
    .then(checkForecast)
})

The first time we run this test, the list is fetched from Wikipedia.

The first time the test runs the "setup" function gets the list of cities

Then the list is run through the checkForecast recursive function. Let's say we want to tweak the test. If we simply edit the checkForecast steps or press "R" to re-run the test, the cached copy of the list is reused from memory, skipping the fetchCities function completely.

The test uses the cached in memory list of cities
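
Why the `structuredClone` step matters is easier to see without Cypress. The sketch below is a deliberately simplified in-memory cache in plain JavaScript; `cachedValue` and `fakeFetchCities` are hypothetical names, and this is not how cypress-data-session is implemented, only an illustration of the "set up once, copy on every use" idea.

```javascript
const cache = new Map()
let setupCalls = 0

// run setup only when nothing is cached under this name yet
function cachedValue(name, setup) {
  if (!cache.has(name)) {
    cache.set(name, setup())
  }
  return cache.get(name)
}

// stand-in for the slow Wikipedia fetch
const fakeFetchCities = () => {
  setupCalls += 1
  return ['Boston', 'Detroit']
}

// first run: setup executes, and we work on a copy
const firstRun = structuredClone(cachedValue('cities', fakeFetchCities))
firstRun.pop() // the search consumes its own copy

// second run: setup is skipped, and the cached list is still complete
const secondRun = structuredClone(cachedValue('cities', fakeFetchCities))
```

Without the copy, the `pop()` calls from the first run would shrink the cached array, and every later run would start from a shorter list.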

Stay warm.

PS: This blog post was written in Montreal during a -22C day.