beanz Magazine

Fisher-Yates Shuffle

Jeremy Thompson on Flickr

What sounds like a country western dance actually is an efficient way to sort large sets of data randomly.

If you need to sort a set of data, and you don’t happen to have the Hogwarts sorting hat, you’ll have to rely instead on algorithms like the Fisher-Yates Shuffle. While it sounds like a type of dance, in fact, it’s a way to sort data with as little bias as possible. Plus it can do much more than sort Hogwarts students into one of four houses.

The Fisher-Yates Shuffle is an algorithm — a set of rules — to shuffle data in an unbiased way. Each result is as likely as any other result.

In 1938, two English statisticians, Sir Ronald Fisher and Frank Yates, worked together to create a set of rules to be done with pen and paper:

Step 1 — Write down numbers from 1 through N. So N could be 100.
Step 2 — Pick a random number k between the numbers 1 and the number of numbers not crossed out in your list in step 1.
Step 3 — Counting from the low end of your list of numbers not crossed out, cross out the kth number not yet struck out then write it down at the end of a separate list of numbers.
Step 4 — Repeat from the second step until all numbers have been crossed out.

The sequence of numbers in the separate list of numbers, in step 3, is now a random permutation of the original list of numbers from step 1.

The modern version of this shuffling algorithm is called Algorith P, from the book, The Art of Computer Programming by Donald Knuth. It simplifies step 3 of the Fisher-Yates Shuffle to reduce time spent by the computer counting numbers left unstruck. The modern version also shuffles in place. Instead of creating a copy of the data set, shuffling is done within the original data.

Of course, the modern version used in programming has more subtleties to make programming easier. For example, what if you do not know the value of N, the highest number in your data set?

How Fisher-Yates Works with Pencil, Paper, and Dice

Here’s how the Fisher-Yates Shuffle works with pencil, paper, and one dice. We’ll start with an initial number data set between 1 and 4:

Step 1 — We roll dice to get our initial value for k, our random number. In this example, it’s the number 2 so we cross out the number in the second position in our initial number set, the number 2 in this case, and write in the number 2 for our sorted number set.
Step 2 — We roll dice and our k value is 3. We cross out the number in the third unstruck (not crossed out) position in our initial number set, the number 4 in this case, and write in the number 4 for our sorted number set, after the number 2.
Step 3 — We have two numbers left not crossed out, 1 and 3. We roll the dice and our k value is 1 so we pick the number in the first position in our initial number set, 1 in this case, and add it to our sorted number set.
Step 4 — We have only the number 3 left so we cross it out and add it to our sorted number set.

concepts-fisher-yates-table-pencil-paper
Original Fisher-Yates Shuffle Algorithm

How Fisher-Yates Shuffle Works with Computers

The modern method, used in computing, follows this pencil and paper process but with a few interesting changes. We can’t pick the last available number, for example, and the modern method substitutes the last number not chosen with the chosen number (instead of crossing out the chosen number). We also remove the chosen number, shortening our number set each time we pick a random number k.

Here’s how the Fisher-Yates Shuffle works with modern method used by computers. We’ll start with an initial number data set between 1 and 4:

Step 1 — We roll dice to get our initial value for k, our random number. In this example, it’s the number 2 so we replace the number in the second position in our initial number set with the last number in our initial number set, the number 4 in this case. We write in the number 2 for our sorted number set.
Step 2 — We roll dice and our k value is 2 (because the modern method does not select the last available unselected number). We replace the number in the second position in our initial number set, the number 4 in this case. We write in the number 4 for our sorted number set, before the number 2.
Step 3 — We have two numbers left not selected, 1 and 3. We roll the dice and our k value is 1 so we replace the number in the first position in our initial number set, 1 in this case, with the last available number, 3. We add the number 1 to the first position in our sorted number set.
Step 4 — We have only the number 3 left so we move it from our initial number set and add it to our sorted number set, in the first position. 3142 is our random number.

concepts-fisher-yates-table-computer
Modern Fisher-Yates Shuffle Algorithm

The Fisher-Yates Shuffle works best with large data sets, not limited sets like the numbers 1 through 4. But these examples give you an idea of the process Fisher and Yates came up with for their statistics, as well as the version Knuth and others come up with to use with computers. The end result is the same or similar: an unbiased sorting of large sets of data.

Indeed, the Fisher-Yates shuffle works so well, if it were put into a hat as wearable technology, the Hogwarts sorting hat might eat itself, if it really means what it says when it sings this song:

“Oh, you may not think I’m pretty,
But don’t judge on what you see,
I’ll eat myself if you can find
A smarter hat than me.”

Learn More

Harry Potter Sorting Hat

https://www.pottermore.com/explore-the-story/the-sorting-hat

The Fisher-Yates Shuffle

https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

Sir Ronald Fisher

https://en.wikipedia.org/wiki/Ronald_Fisher

Frank Yates

https://en.wikipedia.org/wiki/Frank_Yates

Sorting Algorithm

https://en.wikipedia.org/wiki/Sorting_algorithm

The Art of Computer Programming

http://www-cs-faculty.stanford.edu/~uno/taocp.html
https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming

Donald Knuth

https://en.wikipedia.org/wiki/Donald_Knuth
http://www-cs-faculty.stanford.edu/~uno/boss.html
https://en.wikipedia.org/wiki/Knuth_reward_check
https://en.wikipedia.org/wiki/San_Serriffe


Also In The December 2016 Issue

Hour of Code and EU Code Week are events designed to introduce kids, young adults, and others to programming and computer science.

Real life treasure hunts are a way to get outdoors, learn map skills, and have fun finding hidden caches near you.

A trainable puppy plus treats plus technology equals a dog that can send selfies. Here's how.

An app to help kids remember important stuff like feed your pets, brush your teeth, and smile.

These books include lots of great projects to work on by yourself or with others, from Scratch and Minecraft to fun maker space projects.

The mBot robotics kit is an excellent comparatively low-cost way to begin working with robots.

There are maybe a bazillion Raspberry Pi projects online. Here are really fun projects plus links to find more.

The Wayback Machine lets you travel back in time to see old websites. Plus the Internet Archive has thousands of vintage games, software, books, and more.

Eating dog food doesn't sound like much fun but it's an important part of creating software.

The ability to identify patterns, decompose large problems into small parts, develop algorithms to solve problems, and generalize to find solutions.

To celebrate this wonderful time of the year, let’s create some holiday music using Sonic Pi on our Raspberry Pi.

This project shows how to use the pygame code library to move simple animations with the Python programming language.

This project, shows you how to create your own random password generator in the C# programming language.

These projects mix science and technology in interesting ways. Sewing and electronics, for example, is a different way to learn about electronics.

Links from the bottom of all the December 2016 articles, collected in one place for you to print, share, or bookmark.

Interesting stories about computer science, software programming, and technology for December 2016.

What sounds like a country western dance actually is an efficient way to sort large sets of data randomly.