beanz Magazine

Regex Game

Bureau of Land Management Alaska on Flickr

Learn how to search through blobs of text with speed, accuracy, and elegance… like a ninja!

From 1896 to 1899, an estimated 100,000 prospectors set out for the Klondike during the gold rush. They filled large pans with gravel and dipped them in subarctic streams. By carefully shaking and stirring, they might, if they were lucky, find a few nuggets of gold. It was a long, frustrating, and often exhausting process.

Nowadays, many of us have a similar task to perform, except that we sift through spreadsheets and databases rather than clay and dirt, and we’re looking for information, not gold. Luckily, we have tools that those Yukon explorers never had.

A regular expression (regex) uses letters, numbers, and special symbols like ^, $, and + to describe a specific pattern of text. You could write a regex to highlight all the email addresses in a text document, or to pinpoint all the telephone numbers among thousands of other words.
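Most programming languages also come with a regex library. Here is a rough sketch of the email-and-phone idea using Python's built-in `re` module. The sample text is made up. Both patterns are deliberately simplified assumptions, because real email addresses and phone numbers come in many more formats.

```python
import re

# A made-up document to search.
text = """Contact Ada at ada@example.com or 555-0142.
Grace prefers grace.h@example.org; her desk is 555-0199."""

# Deliberately simplified patterns, for illustration only.
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"  # name, @, domain, a dot, an ending like "com"
PHONE = r"\d{3}-\d{4}"              # \d is any digit: three digits, a dash, four digits

emails = re.findall(EMAIL, text)
phones = re.findall(PHONE, text)
print(emails)  # ['ada@example.com', 'grace.h@example.org']
print(phones)  # ['555-0142', '555-0199']
```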

The main advantage of regexes is speed. A short pattern can do the work of many lines of hand-written search code, and the same pattern is often portable, with little or no change, from one programming language to the next.
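To see how much typing a regex saves, compare a hand-written search with a one-line pattern. This Python sketch finds standalone four-digit numbers, like years, in both ways. The function names are hypothetical. The pattern uses `\b` (a word boundary) and `\d` (any digit), which go a little beyond this article.

```python
import re

text = "The rush lasted from 1896 to 1899, and 100,000 people set out."

# Hypothetical helper: collect runs of exactly four digits, one character at a time.
def find_years_by_hand(s):
    years, current = [], ""
    for ch in s + " ":  # the extra space flushes a run that ends the string
        if ch.isdigit():
            current += ch
        else:
            if len(current) == 4:
                years.append(current)
            current = ""
    return years

# The same search as a single pattern: \b = word boundary, \d{4} = four digits.
def find_years_by_regex(s):
    return re.findall(r"\b\d{4}\b", s)

print(find_years_by_hand(text))   # ['1896', '1899']
print(find_years_by_regex(text))  # ['1896', '1899']
```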

The Basics

Start by navigating to https://regexr.com/, a website designed to test regexes.

Replace the sample text with this fun poem about sharks by John Ciardi:


The thing about a shark is — teeth
One row above, one row beneath.

Now take a close look. Do you find
It has another row behind?

Still closer — here, I’ll hold your hat:
Has it a third row behind that?

Now look in and… Look out! Oh my,
I’ll never know now! Well, goodbye.

If you’ve left the sample expression untouched, a few words in the poem will be highlighted. Can you guess what they all have in common?
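If you would rather experiment offline, a tiny helper can imitate the site's highlighting by wrapping each match in square brackets. This is a minimal sketch using Python's `re.sub`. `highlight` is a hypothetical name, not part of any library. It assumes Python's `re` module follows the same rules as the website for the basic features in this article.

```python
import re

POEM = """The thing about a shark is — teeth
One row above, one row beneath."""

def highlight(pattern, text):
    # Hypothetical helper: wrap every match in [brackets], like the site's highlighting.
    return re.sub(pattern, lambda m: "[" + m.group(0) + "]", text)

print(highlight(r"row", POEM))
# The thing about a shark is — teeth
# One [row] above, one [row] beneath.
```

Swap in any pattern from the sections below, such as `highlight(r"o.e", POEM)`, and compare the result with what the website shows.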

Regex Basics

1) Letters and numbers (e, ea)

A lowercase letter matches the same lowercase letter, an uppercase letter matches the same uppercase letter, and a digit matches the same digit. Try replacing the expression on the webpage with ‘e’ or capital ‘I’. What happens if you type two letters together, like ‘ea’?
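The same experiment in Python (a sketch assuming the built-in `re` module) runs on two lines of the poem. It shows that matching is case-sensitive and that ‘ea’ must appear as a pair.

```python
import re

TEXT = """One row above, one row beneath.

Now take a close look. Do you find
It has another row behind?"""

print(re.findall(r"I", TEXT))            # ['I']        the capital in "It"
print(re.findall(r"i", TEXT))            # ['i', 'i']   "find" and "behind"
print(re.findall(r"ea", TEXT))           # ['ea']       only "beneath"
print(re.findall(r"3", "Row 3 of 33"))   # ['3', '3', '3']
```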

2) One or the other ([ea])

If you put those same two characters inside square brackets, the pattern will match either ‘e’ or ‘a’. This is useful when you need to match similar words, like cat, hat, and sat. In this case, you could write the regex ‘[chs]at’.
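Here is a short Python sketch (assuming the built-in `re` module) of square brackets in action. Each bracket set matches exactly one character. A regex matches patterns rather than whole words, so the ‘hat’ hiding inside ‘chat’ counts too.

```python
import re

WORDS = "cat hat sat bat chat"

# [chs] stands for one character: 'c', 'h', or 's'. 'bat' is skipped.
print(re.findall(r"[chs]at", WORDS))   # ['cat', 'hat', 'sat', 'hat']

# [ea] matches a single 'e' or a single 'a'.
print(re.findall(r"[ea]", "beneath"))  # ['e', 'e', 'a']
```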

3) Wildcard (., o.e)

Type a ‘.’ and almost the entire text is highlighted: letters, punctuation, and spaces, but not line breaks. The dot is a wildcard that matches any single character. If you’re not sure which character you need to match, but you know something has to be there, wildcards come in handy. Try the sequence ‘o.e’.
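This Python sketch (assuming the built-in `re` module) tries ‘o.e’ on two lines of the poem. It also shows that, by default, the dot skips line breaks. The `re.DOTALL` flag changes that.

```python
import re

TEXT = "One row above, one row beneath.\nNow take a close look."

# '.' is any single character, so o.e finds "ove", "one", and "ose".
print(re.findall(r"o.e", TEXT))   # ['ove', 'one', 'ose']

# The dot skips the newline unless re.DOTALL is set.
print(len(re.findall(r".", "ab\ncd")))             # 4
print(len(re.findall(r".", "ab\ncd", re.DOTALL)))  # 5
```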

4) Escaped characters (\s, \w, \W)

Escaped characters combine a backslash and a letter. ‘\w’ matches any word character: a letter, a digit, or an underscore. ‘\W’ matches anything that isn’t a word character, such as spaces and punctuation. ‘\s’ matches whitespace, including spaces, tabs, and newlines. Escaped sequences are more specific than the basic wildcard, but they still offer plenty of flexibility.
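A quick Python sketch (assuming the built-in `re` module) shows what each escape sequence picks out of a short string.

```python
import re

# \w is a word character: a letter, a digit, or an underscore.
print(re.findall(r"\w", "a1_!"))                 # ['a', '1', '_']

# \W is the opposite: anything that is not a word character.
print(re.findall(r"\W", "Look out! Oh my,"))     # [' ', '!', ' ', ' ', ',']

# \s is whitespace: spaces, tabs, and newlines.
print(re.findall(r"\s", "Look out!\tOh my,\n"))  # [' ', '\t', ' ', '\n']
```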

5) Quantifiers (+, *, {2})

What if you want to match any four-letter word? ‘\w\w\w\w’ would work, but it’s a pain to write. Quantifiers are a convenient way to say how many times the previous character should repeat. ‘+’ means one or more, ‘*’ means zero or more, and curly braces give an exact count, such as ‘\w{4}’. Try comparing ‘te+’ with ‘te*’. Heads up: because ‘*’ also allows zero repeats, ‘te*’ matches a lone ‘t’ too, so the star can match more than you expect!
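This Python sketch (assuming the built-in `re` module) compares the quantifiers on the poem’s first line. Notice that ‘\w{4}’ also grabs the first four letters of longer words. It counts characters, not whole words.

```python
import re

LINE = "The thing about a shark is — teeth"

# '+' needs at least one 'e' after the 't'.
print(re.findall(r"te+", LINE))    # ['tee']

# '*' also accepts zero 'e's, so every lowercase 't' matches.
print(re.findall(r"te*", LINE))    # ['t', 't', 'tee', 't']

# {4} is shorthand for \w\w\w\w: exactly four word characters in a row.
print(re.findall(r"\w{4}", LINE))  # ['thin', 'abou', 'shar', 'teet']
```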

6) NOT (^, [^a])

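To match anything except a certain character, put ‘^’ right after the opening square bracket. For example, to match anything except the letter ‘a’, type ‘[^a]’. Careful, though! Outside the brackets, ‘^’ means something different: it matches the start of the text (or the start of each line, when the multiline flag is on).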
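This Python sketch (assuming the built-in `re` module, where `re.MULTILINE` is the multiline flag) shows both meanings of ‘^’.

```python
import re

# Inside brackets, ^ means "anything except".
print(re.findall(r"[^a]", "banana"))            # ['b', 'n', 'n']

TEXT = "One row above\nIt has another row\nNow look in"

# Outside brackets, ^ anchors the match to the start of the text...
print(re.findall(r"^\w+", TEXT))                # ['One']

# ...or to the start of every line when re.MULTILINE is set.
print(re.findall(r"^\w+", TEXT, re.MULTILINE))  # ['One', 'It', 'Now']
```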

Challenge

While there are other rules for regexes, this is a great first taste. See if you can apply what you’ve learned to create expressions that match the following patterns:

  1. Both ‘look’ & ‘Look’
  2. All 3-letter words that end with ‘w’
  3. All punctuation marks
  4. All words that are 4 letters or longer

Answers

With regexes, there’s always more than one correct answer. Your solution may be different, but if it matches what it’s supposed to match, you’re golden!

  1. [Ll]ook
  2. \b\w\ww\b (‘\b’ marks a word boundary; the simpler ‘..w’ also catches the ‘now’ inside ‘know’)
  3. [^\s\w]
  4. \w*\w{4}
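To check answers outside the browser, this Python sketch runs a set of patterns over the whole poem. It assumes the built-in `re` module. The `ANSWERS` dictionary is only a convenient, hypothetical way to group the patterns. For challenge 2 it uses ‘\b\w\ww\b’, because ‘..w’ also matches the ‘now’ inside ‘know’. For challenge 3 it uses ‘[^\s\w]’, because inside brackets only a leading ‘^’ means NOT.

```python
import re

POEM = """The thing about a shark is — teeth
One row above, one row beneath.

Now take a close look. Do you find
It has another row behind?

Still closer — here, I’ll hold your hat:
Has it a third row behind that?

Now look in and… Look out! Oh my,
I’ll never know now! Well, goodbye."""

# One possible answer per challenge (other correct answers exist).
ANSWERS = {
    "look and Look": r"[Ll]ook",
    "3-letter words ending in w": r"\b\w\ww\b",
    "punctuation marks": r"[^\s\w]",
    "words of 4+ letters": r"\w*\w{4}",
}

for label, pattern in ANSWERS.items():
    print(label, "->", re.findall(pattern, POEM))

# Compare: without word boundaries, ..w finds one extra "now" (inside "know").
print(len(re.findall(r"..w", POEM)), len(re.findall(r"\b\w\ww\b", POEM)))  # 8 7
```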

Learn More

Introduction to Regular Expressions

https://www.kidscodecs.com/regular-expressions/
https://scotch.io/tutorials/an-introduction-to-regex-in-python
http://codular.com/regex

Regex Crossword, a crossword puzzle with regular expressions

https://regexcrossword.com/

The regular expression game

http://play.inginf.units.it/#/

Article about the importance of Regular Expressions

https://www.theguardian.com/technology/2012/dec/04/ict-teach-kids-regular-expressions

Video tutorial about regular expressions

https://www.youtube.com/watch?v=7DG3kCDx53c

The poem, About the Teeth of Sharks

https://www.poetryfoundation.org/poems/49771/about-the-teeth-of-sharks

From the April 2018 issue of beanz Magazine.