In this installment, learn about how programming languages are designed.
Welcome back to our series on creating your own programming language! In this article, we’ll be learning more about designing a language. I’ll be outlining a simple language I’ll call “Pangolin”, after the oddly adorable animal, and we’ll discuss the syntax and decisions that go into every step of the language. Separately, there will be interpreters for Pangolin available in the GitHub repository linked at the end of the article. These interpreters, written in several languages, will feature comments explaining how it works and how to adapt the techniques it describes for making your own language.
As a refresher, an interpreter is a program that runs your programming language. It has to:
- Read the code of the program it will run
- Turn that code, which is just a text file, into structured data called the abstract syntax tree
- Use the abstract syntax tree to run, or evaluate, the program
In other words, an interpreter converts the code into something the computer can understand and execute.
So let’s start by talking more about how to design a language. After all, there are a lot of choices to make!
Most programming languages you might’ve seen before have a lot of different features. Things like:
- While loops for repeating an action until something changes
- For loops for repeating an action a number of times
- Variables for storing data
- If statements for making choices
- Objects or arrays for storing complex data
- Functions for structuring code you’ll run again and again
- Modules or namespaces to make writing library code easier
- Types to ensure that data is used correctly
But do we need all of these things in a programming language? Why do most languages have them in some form?
So in general, programming languages are what we call “Turing complete”. That means that they are powerful enough to describe anything a computer could even theoretically do. In fact, you’ve probably seen so many Turing complete languages it might be hard to imagine what a language that isn’t Turing complete would even look like!
Imagine having a language where you don’t have functions, while loops, or for loops. Instead, you can write exactly one action on each line of code. The number of steps your program can run is exactly the lines of code you write. This means you couldn’t make things like operating systems that have to run indefinitely, or user interfaces in webpages or games.
The easiest way to make a language Turing complete is to include some kind of while loop or to have functions. You basically just need some way to do an action over and over and over again until some condition is met. You don’t even need both of these, just one or the other is good enough. The lambda calculus, arguably the oldest programming language even though it predated computers, had only functions. The original version of Fortran only had loops.
If everything beyond functions and variables doesn’t increase the power of the programming language, then why is the syntax of most programming languages so complicated? It’s a matter of usability.
Just because you can do everything with functions doesn’t mean you’d actually want to. It’s similar to why we don’t use assembly language, the most basic language that the CPU can understand, for all our programming. Most programming languages are equivalent in power but not equivalent in usability.
So our first goal here is to create a simple, yet usable, programming language.
Our first version of Pangolin, called PangolinV1, is going to have:
- While loops for iteration
- If expressions for branching
- A method for printing output
Now we’re going to outline the design choices of PangolinV1 and talk about what the other possibilities are!
General Design Choices
PangolinV1’s syntax doesn’t care about whitespace. (var x 10) and
both mean the same thing. Other languages, like Python and Haskell, require you to have a particular indentation, and whitespace is important.
Creating, assigning, and using variables
(var x 10)
(set x 3)
Here are the rules for variables in PangolinV1:
- you have to declare a variable before you use it
- variables have to be declared with an initial value; they can’t be undefined
- the value returned by declaring a variable is the value of the variable
- the value returned by assigning a value to a variable is the new value of the variable
- no parentheses are needed around a variable use
(if cond code1 code2)
For if expressions the condition is evaluated first: if it is a positive number, then execute the first option. If it’s 0, or a negative number, then execute the second option. If it evaluates to something not a number, like a function, then the program should crash immediately. The value returned by the if is the value returned by the code it ends up executing
Sequences of code
Sequences of expressions are started with block, which turns a group of expressions into a single one. Each expression is executed in turn and then the result returned by the “block” is the value of the last expression evaluated.
While loops first evaluate their condition and, like the if-expression above, if the condition is a number greater than 0 the body of the loop runs and the loop starts back over. If the value of the condition is a number equal to or less than 0, then the body is skipped. If the condition evaluates to a function, then the program should crash. The value returned by the while loop is 0.
The value returned at the end of the while loop is very arbitrary. It easily could have been anything, like the number 42 or a function that prints the first ten prime numbers.
It could even return something more useful, like the number of times the loop ran before it finished! Another choice we made is that you could have while loops where the body of the loop runs once no matter what. These were common in older languages and were generally called “do-while” loops.
Operations on numbers
Addition and subtraction do what you’d expect, but are written like
(+ 2 3)
(- 10 100)
The comparison operators are < and =. In
(< x y)
if x is less than y then (< x y) returns 1, otherwise it returns 0. What should happen if you pass in a function instead of a number to <? Should that just automatically return 0? Should it cause the program to crash? For Pangolin, I chose to have it crash because I personally think if you compare a function and a number together then some kind of mistake happened and the program should stop so you can debug it. There are a lot of people who would disagree with that, though. Maybe you’re one of them!
Similarly, = returns 1 if both numbers are the same and 0 if they’re not, and if something that isn’t a number is passed to = then the program crashes immediately.
Function declarations and calls
(function (arg1 arg2 ...) body)
(fn arg1 arg2 arg3 ...)
Functions in Pangolin are always unnamed. To create a named function, you need to assign a function value to a variable, like
(var id (function (x) x))
When functions are called all their arguments are evaluated first, then the values are handed to the function.
There’s a really interesting design decision here, though, that might not be obvious at first! If a variable is passed to a function, what does that mean? Since variables are like boxes with data in them, does giving a variable to a function mean that you’re giving the function the entire box to play around with, or are you handing the function what’s in the box? In other words, if you have the following program should it print out 1 or 2?
(var x 1)
(var fn (function (a) (set a (+ a 1)))
This is the difference between what computer scientists call “call-by-value” and “call-by-reference” in a programming language. Call-by-reference is when you’re giving the function access to “the container” and call-by-value is when you’re only giving the function access to “the value” of the variable. In a call-by-reference language, the program will print out 2 because the function fn can change the contents of the variable. In a call-by-value language, it will print out 1 because the function only gets the contents of the variable. In Pangolin, we’ve chosen our functions to operate as call-by-value.
In Pangolin, like most languages, when you create variables inside of a function those variables are only visible within the function.
Printing: You can print values with the built in print function.
Right now, print only prints the word function when you pass it a function, but often in the Lisp family of languages printing a function actually shows you the code of the function! That’s a really useful feature for debugging, and one you might want to try in your own languages.
So that’s our summary of Pangolin and some of the design choices that we had to make to design a language. There are a lot more small details to consider than you might think!
I hope you check out the PangolinV1 interpreters and follow the instructions there to try designing your own language.
Next time we’ll be talking about adding more features and kinds of data to Pangolin and how this opens up even more choices.
Pangolin Code Repository
A reference on Fortran 77, a version of Fortran that has call-by-reference functions and is a good glimpse into what older languages looked like
PangolinV1 is meant to look like a simple, modern, language. Forth is an example of just how different a programming language can look while still being Turing complete
Another very different kind of programming language is Pharo, which has a common ancestor with Scratch!
How To Create a Programming Language
Also In The August 2017 Issue
A substitution cipher is an easy way to begin learning about how to use and make secret codes.
What are the odds two people in your classroom share a birthday? Much higher than you think!
Scratch is a fun block-based programming language that's easy to learn once you understand the basics.
The micro:bit is a not too expensive board that lets you easily build projects to learn about computing.
Meet Thomas, a turtle who can help you draw stars with Python (not the snake!).
The humble sewing machine can be a great first step to fun maker projects. Here's how to get started!
There's lots you can do make your online experiences enjoyable AND safe.
Minecraft is a fun game to play and a way to learn about games and programming. But first you have to learn the basics.
Have you ever put books in alphabetical order? What do you think the best method of alphabetizing would be?
With an EV3 robotics set, you can build all kinds of robots!
Some ideas how to engage young women in computing and STEAM based on recent research.
These three dimensional objects are 3D printed and cast images when light shines through them.
How do computers predict what text you want to write next? Here's how to create predictive stories.
This tutorial shows how to create a chat bot that plays hangman.
In this installment, learn about how programming languages are designed.
Links from the bottom of all the August 2017 articles, collected in one place for you to print, share, or bookmark.
Interesting stories about computer science, software programming, and technology for February 2017.