Lessons and labs will be aggregated on this page.
Labs
- 1: 01. Installing a *nix operating system
- 2: 02. Intro to the CLI
- 2.1: Launching a Terminal
- 2.2: Navigating the file system
- 2.3: File and directory management
- 2.4: Text files
- 2.5: Process management
- 3: 03. Installing the VSCode IDE
- 3.1: for WSL
- 3.2: for VirtualBox
- 3.3: for Mac
- 4: 04. VSCode basics
- 4.1: Keyboard shortcuts
- 4.2: Managing files
- 4.3: Editing code
- 4.4: Running code and the integrated terminal
- 5: 05. Debugging
- 6: 06. Testing
- 6.1: Assertions
- 6.2: Unit testing
- 6.3: Structuring test code
- 6.4: pytest
- 6.5: Testing for exceptions
- 6.6: Test coverage
- 7: 07. Comprehensive example
- 8: 08. Control Flow Graphs
- 9: 09. Code Readability
- 9.1: Coding conventions
- 9.2: Documenting code
- 10: 10. Code-level Design
- 11: 11. Version Control
- 11.1: Git and GitHub setup
- 11.2: Git basics
- 11.3: Undoing mistakes with Git
- 11.4: Branching and Merging, Part 1
- 11.5: Branching and Merging, Part 2
- 11.6: GitHub CLI setup
- 11.7: Remote repos
- 11.7.1: Scenario 1 - Sharing a new project
- 11.7.2: git push
- 11.7.3: Scenario 2 - Clone an existing project
- 11.7.4: Scenario 3 - Retrieving changes
- 11.7.5:
- 12: 12. Remote Servers
- 12.1: Connecting to ada
- 12.2: Working on ada
- 13: 13. Server and Client App Samples
- 13.1: Flask server app
- 13.2: PyGame client app
1 - 01. Installing a *nix operating system
You are getting the first edition of all these pages. Please let me know if you find an error!
Windows is the world’s most popular OS for the home user, but most commercial server software runs on Linux. Linux is based on the design of an older OS called Unix. So are Android (which is built on Linux), macOS, and iOS.
Windows and Unix-derived operating systems do the same things, but the specific commands you type and the way you set up your user environment differ between the OS families.
Jump to the section for your computer’s OS to get started.
Starting from Windows
For your personal computer, choose whichever Option works best. For a lab computer, you must use Option 2.
The options below install Ubuntu Linux, which will require 18-30GB of disk space. You are free to use any other Linux distribution that you are comfortable with. Ubuntu will provide a familiar experience to Windows users.
You are going to keep Windows on your computer. We are going to use virtualization tools to make Linux think that it is running on actual hardware, when in reality it is running “inside” Windows. The virtualization software passes OS commands from Linux to Windows, and Windows ultimately controls the hardware. But Linux, in its virtual environment, will manage all the software running inside it.
Option 1 (preferred) - Windows Subsystem for Linux
You will enable a virtualization feature called the Windows Subsystem for Linux (WSL) and install a Linux distribution.
Follow the instructions here (use Method 1): https://canonical-ubuntu-wsl.readthedocs-hosted.com/en/latest/guides/install-ubuntu-wsl2/
Option 2 - VirtualBox
If you cannot perform Option 1, you will have to install a program called VirtualBox to perform the virtualization. You must use VirtualBox on a lab computer; it is already installed in CG 2055 and CG 2004.
Follow this tutorial https://ubuntu.com/tutorials/how-to-run-ubuntu-desktop-on-a-virtual-machine-using-virtualbox#1-overview. There are 5 pages in the tutorial.
- Important Note: In Step 2 of the tutorial, do not create the “machine folder” inside a folder that is backed up by OneDrive or Google Drive. Mine defaults to c:\Users\laymanl\VirtualBoxVMs, which is fine.
Starting from Mac
You don’t need to do anything. macOS is derived from Unix, so everything we do in Linux you should be able to do in macOS. There will be a few minor differences.
Just don’t tell anyone that you’re using Linux on a Mac, because you are not! macOS is not Linux, but they speak the same language.
2 - 02. Intro to the CLI
You are responsible for knowing all the CLI commands in this lab.
By the end of the lab, you should be able to navigate the Linux file system, manage files and directories, manipulate text files, and utilize process management commands.
Make sure that you have completed Installing a *nix operating system first!
Pro tips before you get started
- Mega important: There is no notion of “undo” in the CLI. You run a command, it’s done. So you have to be careful when you do things like delete or move files in the CLI. There is no Trash Can.
- Press the Tab key while you are typing. The terminal will attempt to autocomplete the command or filename you are typing. Big typing time saver.
- Use the up arrow on your keyboard to cycle through the most-recently used commands you typed in. Good for re-running things.
- Program going crazy and the CLI is not responding? Stuck typing and can’t get out? Press Control+C (on Mac too; Command+C is copy) to stop what is happening. This sends an interrupt signal to the running program, which normally makes it stop.
2.1 - Launching a Terminal
Launching a terminal on Mac
The terminal program on Macs is simply called “Terminal”. You can open it in two ways:
- Finder –> Applications –> Utilities –> Terminal
- Press Command+Spacebar. Type “terminal” in the Spotlight Search popup and you will see an option to open the Terminal.
- CMD+Spacebar is a great way to open apps quickly on Mac.
- You may wish to drag the Terminal application to your Dock at the bottom.
Ubuntu (on Windows)
Using the Windows Subsystem for Linux
Windows has several terminal programs. Windows PowerShell and Command Prompt are for interacting with Windows directly. We want to open an Ubuntu terminal for interacting with the Ubuntu OS you installed in the previous lab.
- Open the Windows menu and search for “Terminal”
- It will most likely open a PowerShell window. PowerShell is for talking to Windows and is not what we want.
- Click the dropdown arrow next to the PowerShell tab. You should see an option for Ubuntu. Select it. After a moment, you should see the Ubuntu Terminal.
Using VirtualBox
- Open VirtualBox and start your Ubuntu virtual machine.
- Once Ubuntu opens, click the “More Apps” icon and find the Terminal.
- Alternately, press the Windows key (called the super key in Ubuntu) and start typing “Terminal” and you will see it suggested.
- The Windows key and typing a search term is a great way to find things in Ubuntu and usually faster than clicking through a menu.
2.2 - Navigating the file system
Part 1: Navigating the File System
Understanding the File System Structure
Filesystems follow a “tree” structure on both Windows and Unix-based systems. Specifically, an upside-down or sideways tree.
Key terms and concepts
Course Note: You need to know terms and concepts that look like this.
Directories hold files and other directories. A subdirectory is a directory inside another directory; for example, the subdirectories of your current working directory are the directories listed inside it.
Files represent programs, pictures, audio, video, word processing docs, and the like. They can be run by the operating system (in the case of programs) or opened by another piece of software, like Photoshop, Microsoft Word, VSCode, etc.
The file system has a root directory. On Linux (and Mac), this directory is named /. On Windows, it is typically C:\.
- Linux uses forward slashes (/), whereas Windows uses backslashes (\). It matters, and is an endless source of annoyance for developers.
A user’s home directory is where their user-specific content lives, like documents and pictures that you save. On your personal computer, you probably only have one user. A lab machine will have many different users.
- On Linux, the home directory for the user named ‘alice’ is /home/alice/
- On Mac, it would be /Users/alice/
- On Windows, it would be c:\Users\alice\
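To see the separator difference concretely, here is a small sketch using the "pure" path classes from Python's standard pathlib module. They apply each OS family's path rules to plain names without touching the disk, so the sketch runs the same on any computer. The user name alice is just the example from the list above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# "Pure" paths only manipulate names; they never read the disk,
# so this runs the same on Linux, Mac, or Windows.
linux_home = PurePosixPath("/home") / "alice"
mac_home = PurePosixPath("/Users") / "alice"
windows_home = PureWindowsPath("C:/Users") / "alice"  # "/" is converted to "\"

print(linux_home)            # /home/alice
print(mac_home)              # /Users/alice
print(windows_home)          # C:\Users\alice
print(linux_home.anchor)     # /    (the Linux root directory)
print(windows_home.anchor)   # C:\  (the Windows drive root)
```

Notice that the same `/` join operator produces forward slashes for Linux/Mac names and backslashes for Windows names.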
You can use the Terminal/CLI to navigate the file system, like you would graphically using the Windows Explorer or Mac Finder. As you navigate with the CLI, you are “in” one directory at a time. The directory that you are currently “in” is called the working directory. Commands that you run are run in the context of the working directory. If you create a file using the CLI, for example with touch newfile.txt, the file is created in the working directory. Running programs is a little different: the CLI searches a set of other places for programs by default (we will discuss these later), so to run a program that sits in the working directory on Linux or Mac, you type ./ in front of its name.
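The idea that a bare file name lands in the working directory can be checked with a short Python sketch. It works inside a throwaway temporary directory (so nothing on your computer changes), uses os.chdir as the equivalent of cd and Path.touch as the equivalent of touch, and also confirms that "." refers to the working directory.

```python
import os
import tempfile
from pathlib import Path

# Use a throwaway directory so nothing on your computer changes.
with tempfile.TemporaryDirectory() as workdir:
    original = os.getcwd()
    os.chdir(workdir)                  # like: cd workdir
    try:
        Path("newfile.txt").touch()    # like: touch newfile.txt (no directory given)
        created_in = Path("newfile.txt").resolve().parent
        # "." always means the working directory.
        dot_is_cwd = Path(".").resolve() == Path.cwd().resolve()
        in_working_dir = created_in == Path(workdir).resolve()
    finally:
        os.chdir(original)             # leave before the directory is deleted

print(in_working_dir, dot_is_cwd)  # True True
```

The `.resolve()` calls turn every name into a full path before comparing, so the comparison is not fooled by shortcuts (symbolic links) in the temporary directory's location.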
Explore the root directory using the ls and cd commands.
Open a Terminal for Mac or Ubuntu. See the Launching a Terminal lab.
Type in the following CLI commands one at a time.
pwd
ls # In a brand-new home directory this may show little or nothing.
cd .. # Go "up" one level in the file tree.
pwd
ls # This should now list some things.
ls / # List the files in the root.
ls -l / # List the details of files in the root.
cd / # Change working directory to root.
ls # list files.
cd .. # Go up... But it won't go anywhere because you can't go higher than the root!
ls # You're still in the root. List root's files.
None of these commands change anything on your computer. They give you information and let you navigate between directories.
Mac users: If you encounter a Permission Denied error while running the ls / command, try running sudo ls /. It will prompt you to enter your password. The sudo command runs the command after it as an “administrator” in the eyes of the CLI. Mac is protecting the sensitive / directory, and wants to make sure you have permission to do what you’re trying to do. (sudo cd / does not work, because cd is built into the shell rather than being a separate program.)
Key Commands
pwd - Print Working Directory: shows the name of the directory you are currently “in”. Use this when you don’t know where you are.
ls - List contents. Will show both subdirectories and files in the working directory.
ls <target> - List the contents of the target directory, e.g., ls /usr/
ls -l - Lists contents and gives you additional information, like the file type. You can also do ls -l <target>
ll - Shorthand for a long listing like ls -l. Can do ll <target>. (On Ubuntu, ll is set up as ls -alF, which also shows hidden files; it may not exist on Mac.)
cd - Change Directory. This is how you navigate.
- cd / changes to the root directory.
- cd ~ or simply cd navigates to the user’s home directory.
- cd .. goes “up” one level to the parent of the current directory.
- cd <target> changes to the <target> directory.
The argument of the ls and cd commands is a directory name or the special .. symbol. You can “jump” directories by typing a directory’s full name, like ls /usr/bin/. A directory’s full name is called its path.
You can also specify relative paths, which we will discuss more later.
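To preview how cd combines the working directory with the name you type, here is a sketch using Python's posixpath module, which applies Linux/Mac path rules as plain string operations. The helper where_cd_goes and the working directory /home/alice are hypothetical examples. The sketch only computes names; it does not check that the directories exist, and it ignores shortcuts (symbolic links), so treat it as a model of the simple cases.

```python
import posixpath

def where_cd_goes(working_dir, target):
    """Hypothetical helper: the directory `cd target` would move to,
    computed purely from the names (no disk access)."""
    # join() keeps working_dir for relative targets but starts over
    # when target is an absolute path (one that begins with "/").
    # normpath() then collapses "..", like going up one level.
    return posixpath.normpath(posixpath.join(working_dir, target))

print(where_cd_goes("/home/alice", ".."))         # /home
print(where_cd_goes("/home/alice", "/usr/bin"))   # /usr/bin  (absolute path)
print(where_cd_goes("/home/alice", "Documents"))  # /home/alice/Documents  (relative path)
print(where_cd_goes("/", ".."))                   # /  (nothing above the root)
```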
The terminals are capable of autocompleting. Type cd to change to your home directory, then type cd D and hit the Tab key. What happens? The terminal will find all subdirectories (if any) of your working directory that start with capital D.
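The core idea behind that completion can be modeled in a few lines of Python. This is a simplified sketch, not how bash actually implements completion: the helper complete_dirs is hypothetical, and it just lists the subdirectories whose names start with what you have typed so far, case-sensitively. It runs on a pretend home directory built in a temporary folder.

```python
import os
import tempfile

def complete_dirs(prefix, directory="."):
    """Hypothetical helper: subdirectories of `directory` whose names
    start with `prefix`. Matching is case-sensitive, like Linux names."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.startswith(prefix) and os.path.isdir(os.path.join(directory, name))
    )

# Build a pretend home directory to try it on.
with tempfile.TemporaryDirectory() as home:
    for name in ["Desktop", "Documents", "Downloads", "data"]:
        os.mkdir(os.path.join(home, name))
    open(os.path.join(home, "Diary.txt"), "w").close()  # a file, not a directory
    matches = complete_dirs("D", home)

print(matches)  # ['Desktop', 'Documents', 'Downloads']
```

The lowercase data directory and the Diary.txt file are both left out, which mirrors what cd D plus Tab offers.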
Extremely important point: all file system names are case-sensitive in Linux. For example, you can have files named user.txt and User.txt, or directories /usr/ and /Usr/, side by side; they are different. Capitalization matters in software development.
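Python's pure path classes encode this rule, which makes for a quick sketch of the difference between the OS families. These classes compare names by each family's rules without touching the disk; on a real disk the outcome also depends on how the drive is formatted (for example, macOS formats drives case-insensitive by default).

```python
from pathlib import PurePosixPath, PureWindowsPath

# Linux/Mac naming rules: capitalization makes a different name.
posix_same_file = PurePosixPath("user.txt") == PurePosixPath("User.txt")
posix_same_dir = PurePosixPath("/usr") == PurePosixPath("/Usr")

# Windows naming rules ignore capitalization.
windows_same_file = PureWindowsPath("user.txt") == PureWindowsPath("User.txt")

print(posix_same_file, posix_same_dir)  # False False
print(windows_same_file)                # True
```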
Exercise:
- Navigate to the /usr/ directory. Use the pwd command to display your current directory. Type ls. What do you see?
- Now type ls -l or ll. What do you see?
- Use cd ~ or simply cd to navigate to the home directory. Use ls to display the files and folders. What do you see?
Knowledge Check:
- Question: What does the pwd command do?
- Question: How do you navigate to the root directory?
- Question: How do you navigate to your home directory?
2.3 - File and directory management
By the end of the lab, you should be able to navigate the Linux file system, manage files and directories, manipulate text files, understand basic file permissions, and utilize process management commands.
Part 2: File and Directory Management
Reminder: All file system names are case-sensitive.
Now, let’s practice adding and removing files and directories using the CLI.
Creating and Removing Directories
mkdir - Make Directory
rmdir - Remove Directory (only works on an empty directory)
rm -r - Remove Directory and its contents recursively. WARNING: This is going to delete the directory and everything below it recursively. Linux does not have ‘undelete’, so be very careful with this command!
The commands below have a # character, which indicates the beginning of a comment. # comments are there for clarification, and you do not need to type them.
Type ls after each command below to see the changes:
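cd # switch to your home directory
mkdir MyLab
cd MyLab
touch notes.txt # put a file inside MyLab so it is not empty
cd ..
rmdir MyLab # This should fail because the directory is not empty
rm -r MyLab # Deletes MyLab and everything inside it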
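The difference between rmdir and rm -r can also be sketched with Python's os and shutil modules, which you may later use in scripts: os.rmdir refuses to delete a directory that still has something in it, while shutil.rmtree deletes the directory and everything below it, just as permanently as rm -r. The sketch puts a file named notes.txt (a hypothetical name) inside MyLab so that rmdir has a reason to refuse, and it works in a temporary directory so none of your files are touched.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as home:
    lab = os.path.join(home, "MyLab")
    os.mkdir(lab)                                        # mkdir MyLab
    open(os.path.join(lab, "notes.txt"), "w").close()    # touch MyLab/notes.txt

    try:
        os.rmdir(lab)                                    # rmdir MyLab
        rmdir_refused = False
    except OSError:                                      # "Directory not empty"
        rmdir_refused = True

    shutil.rmtree(lab)                                   # rm -r MyLab
    lab_exists = os.path.exists(lab)

print(rmdir_refused, lab_exists)  # True False
```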
Creating, Copying, and Deleting Files
touch - Create an Empty File
cp - Copy Files and Directories
rm - Remove Files
mv - Move or Rename Files
Type ls after each command below to see the changes:
cd # go to your home directory
touch sample.txt
cp sample.txt sample_copy.txt
mv sample.txt renamed_sample.txt
rm sample_copy.txt
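For comparison, the same four steps look like this with Python's pathlib and shutil modules. This is a sketch run in a temporary directory rather than your real home directory; each line is commented with the CLI command it mirrors.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as d:
    home = Path(d)
    (home / "sample.txt").touch()                               # touch sample.txt
    shutil.copy(home / "sample.txt", home / "sample_copy.txt")  # cp sample.txt sample_copy.txt
    (home / "sample.txt").rename(home / "renamed_sample.txt")   # mv sample.txt renamed_sample.txt
    (home / "sample_copy.txt").unlink()                         # rm sample_copy.txt
    remaining = sorted(p.name for p in home.iterdir())          # ls

print(remaining)  # ['renamed_sample.txt']
```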
Exercise
- Create a new directory named LabDirectory
- Navigate into this directory using the cd command
- Create a new file named LabFile.txt inside this directory. Use touch
- Copy this file to a new file named LabFileCopy.txt. Use cp
- Delete LabFileCopy.txt. Use rm
2.4 - Text files
Part 3: Text File Manipulation
You can use the CLI to do simple or complex text manipulation. As developers, you will use a text editor or IDE like IDLE, PyCharm, or VSCode for such tasks most of the time. However, it can be handy to do them from the CLI sometimes, and many scripts used to compile and build software use these CLI text-manipulation techniques.
Key terms
Most CLI commands, including the ones you have already seen like ls and pwd, have an output that is printed to the terminal. Some commands, like cp, do NOT print any output to the screen.
Below you will see the special > and >> operators.
- > is the redirect operator. It takes the output from a command and writes it to a file you specify, e.g., echo "hello" > file.txt. It will create the file if it does not exist, and will overwrite the file if it does exist!
- >> is the append operator. It will create the file if it does not exist, and will append to the end of the file if it does exist!
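If you already know a little Python, the two operators behave much like opening a file in "w" (write) mode versus "a" (append) mode. The sketch below is that analogy, not what the shell runs internally: it writes "hello", overwrites it, then appends, in a file inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "file.txt")
    with open(path, "w") as f:   # like: echo "hello" > file.txt  (create/overwrite)
        f.write("hello\n")
    with open(path, "w") as f:   # > again: the old contents are gone
        f.write("again\n")
    with open(path, "a") as f:   # like: echo "more" >> file.txt  (append)
        f.write("more\n")
    with open(path) as f:
        contents = f.read()

print(repr(contents))  # 'again\nmore\n'
```

"hello" does not survive: the second "w" open wiped it, exactly as a second > would.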
Viewing and Editing Text Files
echo - Display a line of text
cat - Concatenate and display file contents
less - View file contents one screen at a time
grep - Search for patterns in files
echo "Hello, Linux CLI!" > hello.txt
cat hello.txt
echo "Another line" >> hello.txt
cat hello.txt
grep "Hello" hello.txt
grep "o" hello.txt
seq 1 1 10000 >> numbers.txt # making a big file - no need to learn.
cat numbers.txt
less numbers.txt # Spacebar goes forward, b goes back, q to quit.
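At its simplest, grep keeps only the lines that contain what you searched for. The hypothetical mini_grep function below is a sketch of that idea using a plain, case-sensitive substring match; real grep treats the pattern as a regular expression and has many options this sketch ignores. The input is the two lines hello.txt holds after the commands above.

```python
def mini_grep(pattern, lines):
    """Hypothetical mini-grep: keep the lines that contain `pattern`.
    Plain substring match, case-sensitive (real grep accepts regular expressions)."""
    return [line for line in lines if pattern in line]

# The two lines hello.txt holds after the commands above.
hello_txt = ["Hello, Linux CLI!", "Another line"]

print(mini_grep("Hello", hello_txt))  # ['Hello, Linux CLI!']
print(mini_grep("o", hello_txt))      # ['Hello, Linux CLI!', 'Another line']
print(mini_grep("hello", hello_txt))  # []  (capitalization matters)
```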
Exercise
- Use echo to create a text file with some content. Try echo "this is my first file" > myfile.txt
- Use cat to print all of the file’s contents to the screen at once.
- Use echo to append text to the file.
- Use grep to search for the word “first” in the file.
- Use less to view the file content one screen at a time. Hit q to exit.
Knowledge check
- Question: How can you append text to an existing file using echo?
- Question: What command would you use to search for a specific word in a file?
2.5 - Process management
Part 4: Process Management
Key terms
We discussed what a process is when we introduced Operating Systems concepts. Below you will see a reference to PID - Process ID. This is an integer that uniquely identifies the process to the OS. As a user, you use the PID to specify which process you are talking about.
Run the following:
ps
top # hit q or Control+C to quit the program.
Monitoring and Controlling Processes
ps - Report a snapshot of current processes
top - Display Linux processes and how much memory or CPU they are using. Similar to the Activity Monitor on Mac and the Task Manager on Windows. Hit q to exit.
- Use the keyboard combo Control+C to kill/quit the current process.
kill - Send a signal to a process. By default, kill <PID> sends a signal that asks the process to terminate.
Exercise
We are going to install Python and create a runaway process.
On Ubuntu only: Run the command sudo apt install python3
- You will be prompted to type your password. The terminal will not show any characters while you are typing.
- You will see some text as python3 installs.
Open a second Terminal:
- On Mac or in VirtualBox Ubuntu: You can click the + button on the tab in the current Terminal. You should see a second “fresh” terminal pane.
- Ubuntu on WSL: Click the drop down next to your Ubuntu tab and make sure to pick Ubuntu again. You should see a second “fresh” Linux pane. If you see PowerShell or Command Prompt, you’re in the wrong place.
Now run python3 and type in the following infinite loop. Press Enter twice after the last line so Python starts running it.
while True:
    print("hello there")
We should now have an out of control Python process gobbling up CPU cycles.
Switch back to the other Terminal tab and run the following commands.
ps
top # find the PID of the python process that is gobbling all the CPU
kill <PID> # Replace <PID> with the actual process ID
The terminal will not say anything, but run top again. The runaway Python process should be gone. Switch back to the Terminal tab where you had that Python process and it should say Terminated or something similar.
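If you are curious what kill does under the hood, here is a sketch for Linux and Mac (not Windows) that reproduces the exercise from a Python script. It starts the same loop as a child process with the standard subprocess module, reads its PID, and sends it SIGTERM, the signal kill sends when you don't name one. The child's output is thrown away so it doesn't flood your screen.

```python
import os
import signal
import subprocess
import sys

# Start a runaway child process running the same loop as the exercise.
# Its output is thrown away so it doesn't flood your screen.
proc = subprocess.Popen(
    [sys.executable, "-c", "while True: print('hello there')"],
    stdout=subprocess.DEVNULL,
)
print("PID:", proc.pid)            # the same number ps and top would show

os.kill(proc.pid, signal.SIGTERM)  # what `kill <PID>` sends by default
proc.wait(timeout=5)               # wait for the child to exit

# On Linux/Mac, a negative return code -N means "stopped by signal N".
print(proc.returncode == -signal.SIGTERM)  # True
```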
Knowledge Check
- Question: How can you view real-time process activity?
Conclusion
Anything you can do with your OS’s GUI, you can do on the command line. It just looks different. Become comfortable with the CLI – you will find that it can be MUCH faster for certain tasks, and will be indispensable to you as a software engineer.
Also, the commands above have equivalent commands on Windows machines (mostly). If you are a regular Windows user, you would do yourself a favor to learn the equivalents of commands like ls, rmdir, and cd in the Windows CLI (PowerShell).
Final Knowledge Check
- Question: Summarize the steps to create a new directory, navigate into it, create a text file, and view it using less.
- Question: From the CLI, how would you find the runaway process with a memory leak (probably using the most memory) and terminate it?
Further Reading
- Consider walking through this tutorial for even more explanation and extra commands and concepts: https://ubuntu.com/tutorials/command-line-for-beginners#3-opening-a-terminal
3 - 03. Installing the VSCode IDE
The most useful tool for a software developer, other than the brain, is an integrated development environment (IDE). You may have used IDEs in your classes, such as IDLE (which is bundled with Python), PyCharm, IntelliJ, Visual Studio, or Xcode. IDEs usually have the following capabilities at a minimum:
- Text editing for writing source code
- Running the code
- Debugging (more on this in the future)
- Browsing files
- Searching through files
- Navigating through code structures easily
Most IDEs have many more capabilities. Software developers develop a preference for an IDE based on its capabilities, its ease-of-use, and the programming languages it supports.
In this class, we will use Visual Studio Code or VSCode. VSCode is an open source IDE maintained by Microsoft. It has support for nearly all programming languages, is lightweight on system resources, and has many optional add-in “extensions” to provide even more useful capabilities.
Note: VSCode works in Windows, Mac, and graphical Linux-based operating systems. If you are using Windows, we want to run it from our Linux environment.
Choose the section corresponding to your Linux environment for instructions on installing VSCode. Skip to the 04. VSCode basics lab if you already have VSCode set up in macOS, Ubuntu, or with WSL.
3.1 - for WSL
This lab is for those who are on Windows and have set up the Windows Subsystem for Linux (WSL) using Option 1 for Windows from the 01. Installing a *nix operating system lab.
Installation
Install Visual Studio Code on the Windows side (not in WSL).
- When prompted to Select Additional Tasks during installation, be sure to check the Add to PATH option so you can easily open a folder in WSL using the code command. The code command launches VSCode from the CLI.
Run VSCode from the Windows side after installation is complete.
Click on the Extensions button in the far left sidebar or press Ctrl+Shift+X.
Type WSL in the search box under EXTENSIONS: MARKETPLACE. The top result should be WSL from Microsoft. Click the Install button.
Also search for Python in the Extensions marketplace and install it. The one you want is also from Microsoft.
With the WSL extension, if we open a directory in Ubuntu with Python files, VSCode will use the tools installed in Ubuntu when working with files inside Ubuntu. When working with files in Windows, VSCode will use programs installed on the Windows side.
You should now be good to go to develop Python code that lives in Linux from VSCode.
Test drive
We are going to create a sample project directory in Ubuntu on the WSL, then open VSCode and edit files in that Linux directory.
Launching VSCode from Ubuntu
Start Ubuntu from Windows by selecting Ubuntu from the Windows Start menu, or by opening an Ubuntu terminal.
Run the following in the Ubuntu terminal:
cd # make sure you are in your home directory
mkdir python-test # make a directory to play in
cd python-test # change to the new directory
code . # launch VSCode in the current directory
The code command launches the VSCode program. It was added to your PATH when you installed VSCode with the Add to PATH option. The command code . says launch code and have it open the current working directory. The symbol . always means the working directory. Sometimes it will be necessary to explicitly tell the CLI we are referring to the working directory; more on those situations as they arise.
You will see a download take place. A new VSCode window will open after a moment on the Windows side.
You may be asked if you “trust the authors of the files in this folder”. Click the checkbox and then pick “Yes, I trust the authors.”
When it is complete, the pane on the left is the Explorer pane. It is showing the directory python-test. There are not yet any files in the directory.
Creating a new file
Let’s create a file on the Ubuntu side in our project directory. We should see it immediately in VSCode.
- Go back to your Ubuntu terminal and make sure you are in the python-test directory.
- Type the command touch hello.py to create an empty Python file.
- Go back to VSCode. You should see the file hello.py in the directory here. Click on it and it will open an empty editor pane.
- In the code editor, type print("Hello World"). Hit CTRL+S to save the file. You must explicitly save your changes in VSCode.
- Go back to the Ubuntu Terminal and type cat hello.py. You should see the code.
So you now have VSCode on the Windows side successfully editing files and interacting with directories inside Ubuntu.
You are now ready to code! Move on to 04. VSCode basics lab.
3.2 - for VirtualBox
This lab is for those who are on Windows and are running Ubuntu inside VirtualBox using Option 2 for Windows from the 01. Installing a *nix operating system lab.
Installation
Open VirtualBox and start up your Ubuntu virtual machine (VM). Sign in to Ubuntu.
- You may want to go full screen. Do this by selecting View -> Full Screen.
- If the Full Screen is small, right-click on the Desktop -> Display Settings then change the Resolution to something larger, probably 1920x1080.
- You exit full screen by hitting Right CTRL+F.
In Ubuntu, click the “App Center” icon in the left app bar.
Type visual studio code in the search bar of the App Center. Select the “code” result with the blue icon.
Click “Install” on the next screen.
You will be prompted to enter your Ubuntu login password. VSCode will now take a few moments to install.
You should see a green “Open” button when the installation finishes. Click “Open”.
VSCode will now open.
On the far left side of Ubuntu in the app bar (called Dash), you will now see the VSCode icon. Right-click it and select “Pin to Dash” to make it easy to launch VSCode from the Ubuntu desktop.
Configuring VSCode for Python
Click on the Extensions button in the far left sidebar or press Ctrl+Shift+X.
Type python in the search box under EXTENSIONS: MARKETPLACE. The top result should be Python from Microsoft. Click the Install button.
You should now be good to go to develop Python code in Ubuntu.
Test drive
We are going to create a sample project directory in Ubuntu, then open VSCode and edit files in that Linux directory.
Launching VSCode from the Terminal
Start a Terminal in Ubuntu. Hit the Windows key and start typing terminal. Select the Terminal app.
- You may also want to right-click the Terminal and “Pin to Dash” for easy startup!
Run the following in the Ubuntu terminal:
cd # make sure you are in your home directory
mkdir python-test # make a directory to play in
cd python-test # change to the new directory
code . # launch VSCode in the current directory
The code command launches the VSCode program. It was installed along with VSCode from the App Center. The command code . says launch code and have it open the current working directory. The symbol . always means the working directory. Sometimes it will be necessary to explicitly tell the CLI we are referring to the working directory; more on those situations as they arise.
A VSCode window will open after a moment.
You may be asked if you “trust the authors of the files in this folder”. Click the checkbox and then pick “Yes, I trust the authors.”
When it is complete, the pane on the left is the Explorer pane. It is showing the directory python-test. There are not yet any files in the directory.
Creating a new file
Let’s create a file in the Terminal in our project directory. We should see it immediately in VSCode.
- Go back to your Ubuntu terminal and make sure you are in the python-test directory.
- Type the command touch hello.py to create an empty Python file.
- Go back to VSCode. You should see the file hello.py in the directory here. Click on it and it will open an empty editor pane.
- In the code editor, type print("Hello World"). Hit CTRL+S to save the file. You must explicitly save your changes in VSCode.
- Go back to the Ubuntu Terminal and type cat hello.py. You should see the code.
So you now have VSCode successfully editing files and interacting with directories inside Ubuntu.
You are now ready to code! Move on to 04. VSCode basics lab.
3.3 - for Mac
This lab is for those who are installing Visual Studio Code on Mac machines.
Installation
Instructions in this section are taken from https://code.visualstudio.com/docs/setup/mac.
- Download Visual Studio Code for macOS.
- Open the browser’s download list and locate the downloaded app or archive.
- If archive, extract the archive contents. Use double-click for some browsers or select the ‘magnifying glass’ icon with Safari.
- Drag Visual Studio Code.app to the Applications folder, making it available in the macOS Launchpad.
- Open VS Code from the Applications folder by double clicking the icon.
- Add VS Code to your Dock by right-clicking on the icon, located in the Dock, to bring up the context menu and choosing Options, Keep in Dock.
Enable launching VSCode from the CLI
You can also run VS Code from the terminal by typing code after adding it to the path:
- Launch VS Code.
- Open the Command Palette (Cmd+Shift+P) and type ‘shell command’ to find the Shell Command: Install ‘code’ command in PATH command. Select this.
- You will need to restart any open Terminal windows for the change to take effect. You’ll be able to type code . in any folder to start editing files in that folder.
Configuring VSCode for Python
Click on the Extensions button in the far left sidebar or press Cmd+Shift+X.
Type python in the search box under EXTENSIONS: MARKETPLACE. The top result should be Python from Microsoft. Click the Install button.
You should now be good to go to develop Python code with VSCode on Mac.
Test drive
We are going to create a sample project directory in the Terminal, then open VSCode and edit files in that directory.
Launching VSCode from the Terminal
Start a Terminal. Press Command+Spacebar and start typing terminal. Select the Terminal app.
- You may also want to keep the Terminal in your Dock for easy startup!
Run the following in the Terminal:
cd # make sure you are in your home directory
mkdir python-test # make a directory to play in
cd python-test # change to the new directory
code . # launch VSCode in the current directory
The code command launches the VSCode program. It was added when you ran Shell Command: Install ‘code’ command in PATH above. The command code . says launch code and have it open the current working directory. The symbol . always means the working directory. Sometimes it will be necessary to explicitly tell the CLI we are referring to the working directory; more on those situations as they arise.
A VSCode window will open after a moment.
You may be asked if you “trust the authors of the files in this folder”. Click the checkbox and then pick “Yes, I trust the authors.”
When it is complete, the pane on the left is the Explorer pane. It is showing the directory python-test. There are not yet any files in the directory.
Creating a new file
Let’s create a file in the Terminal in our project directory. We should see it immediately in VSCode.
- Go back to your Terminal and make sure you are in the python-test directory.
- Type the command touch hello.py to create an empty Python file.
- Go back to VSCode. You should see the file hello.py in the directory here. Click on it and it will open an empty editor pane.
- In the code editor, type print("Hello World"). Hit Cmd+S to save the file. You must explicitly save your changes in VSCode.
- Go back to the Terminal and type cat hello.py. You should see the code.
So you now have VSCode successfully editing files and interacting with directories on your Mac.
You are now ready to code! Move on to 04. VSCode basics lab.
4 - 04. VSCode basics
This lab provides the minimum introduction to VSCode needed to write programs. VSCode has similar functionality to other professional IDEs, such as PyCharm, IntelliJ, or Xcode.
4.1 - Keyboard shortcuts
Keyboard shortcuts
Everything you can do with a menu and a mouse has a keyboard shortcut. Menu+mouse is easier to learn, but keyboard shortcuts will make you about 30% more productive once you master them.
Rule of thumb: If you use the same mouse+menu commands over and over, learn the keyboard shortcut instead. Try to learn a shortcut or two each week.
I’ve highlighted my most-used keyboard shortcuts in the official cheatsheets from VSCode:
- Cheatsheet for Mac
- Cheatsheet for Windows if using WSL.
- Cheatsheet for Linux if using VirtualBox.
When you see, e.g., Ctrl+X, that means hold down the Control key and then press X.
4.2 - Managing files
Organizing and opening projects
The last thing we did in Lab 03. Installing the VSCode IDE was to open the python-test directory in VSCode.
Rule #1: Keep each project, assignment, and lab in its own directory. It is fine to group those directories under a single parent directory, like so:
~/seng-201
├── assignment1
├── assignment2
├── lab01
├── lab02
└── python-test
    ├── fib.py
    ├── hello.py
    └── hello2.py
Here I have a seng-201 subdirectory in my home directory, symbolized by the ~. Inside seng-201, I have created a subdirectory for each project.
Rule #2: Open the specific project directory in VSCode, not the parent directory. Suppose you want to work on assignment1; then you need to open the assignment1 folder. You can open a folder in VSCode in two ways:
- Use your Terminal/CLI to cd into the project folder, then type code .
- Open VSCode first, then do File > Open Folder, descend into the project folder, and click Open.
The folder you open serves as the working directory for VSCode. If you open the parent folder seng-201, you can still edit project files, but running the projects becomes more complicated. Don’t do it.
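One concrete way the working directory matters: when a Python program opens a file by a relative name, Python looks for that file in the current working directory, not next to the .py file. A minimal sketch (the file name data.txt is hypothetical, and resolve_like_open() is just an illustrative helper):

```python
import os
from pathlib import Path

def resolve_like_open(filename):
    """Return the absolute path that open(filename) would use for a relative name."""
    # Relative names are resolved against the current working directory,
    # not against the folder that contains this .py file.
    return Path(os.getcwd()) / filename

if __name__ == "__main__":
    print("Working directory:  ", os.getcwd())
    print("open('data.txt') ->", resolve_like_open("data.txt"))
```

Assuming VSCode runs your program from the folder you opened, a program in assignment1 that opens data.txt would look for seng-201/data.txt if you had opened seng-201 instead, and would most likely fail to find it.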
Explorer pane
The Explorer pane is where you browse and manage files. Open it by clicking on the Explorer icon on the main left sidebar.
Things you can do in the Explorer:
- Create new files and subdirectories.
- Double-click files to open them.
- Right-click files and directories for a variety of tools, like renaming and deleting.
Exercise
- Click on the python-test folder name. You created this folder at the end of the 03. Installing the VSCode IDE lab.
- Click the New File icon. Type hello.py in the box and hit Enter.
- You will see an editor tab pop open on the right with the name hello.py at the top.
Knowledge check:
- Question: (True/False) Each coding project should have its own directory on the filesystem.
- Question: (True/False) It’s okay to open the parent directory holding multiple projects in VSCode.
- Question: How do you open VSCode in the current directory from the CLI?
4.3 - Editing code
Editing
An Editor pane will automatically open every time you open a file. Things to know about the Editor windows:
- You must explicitly save files you have edited. Do this with Ctrl+S (Windows, Linux) or Cmd+S (Mac).
- The line numbers on the left side are used to identify individual lines of code in error messages and elsewhere.
- Familiar text editing features like Cut and Paste are available in the Edit menu at the top or by right-clicking in an editor window. Learn those keyboard shortcuts!
- Cmd+/ (Mac) and Ctrl+/ (Windows, Linux) toggle comments on the current line or selected lines. This is one of my favorite keyboard shortcuts!
- Suppose your code calls a function defined elsewhere. Hold down Cmd (Mac) or Ctrl (Windows, Linux) and hover over the function call. It will turn blue like a link. Left-click the link and the editor jumps to the function definition. Very handy! Look up the Go Back keyboard shortcut to return your cursor to where you were.
- Not happy with a variable or function name? Right-click it > Rename Symbol. It will be renamed everywhere in scope!
- Use the arrow keys to move the cursor one character at a time. Hold down Ctrl (Windows, Linux) or Option (Mac) while tapping the left or right arrow keys to skip entire “words”. Again, very handy. Hold down Shift as well to select those words!
Exercise
Create a new file called fib.py in your python-test folder and paste in the following code:
def fibonacci(n):
    """
    Computes and returns the Fibonacci sequence of length n.
    Assumes n >= 1
    """
    if n == 1:
        return [1]
    if n == 2:
        return [1, 1]
    result = [1, 1]
    for i in range(2, n):
        result.append(result[i-1] + result[i-2])
    return result

print(fibonacci(1))
print(fibonacci(2))
print(fibonacci(6))
print(fibonacci(10))
- Hold down Cmd (Mac) or Ctrl (Windows, Linux) and mouse over one of the fibonacci() calls at the bottom. Click the link and watch the cursor jump.
- Using the keyboard shortcut, comment out the first three print(...) calls at the bottom all at once.
- Hit Ctrl+S (Cmd+S on Mac) to save the file.
- Now uncomment them all at once.
- Right-click a fibonacci() call and rename the symbol. Where does it change in the code?
- Hit Ctrl+Z or Cmd+Z to undo the rename.
Knowledge check:
- Question: How do you comment/uncomment a block of code with your keyboard?
- Question: What is the keyboard shortcut for saving your edits to a file?
- Question: What does holding down Cmd or Ctrl and left-clicking on a name in the editor window do?
4.4 - Running code and the integrated terminal
VSCode itself does not know how to run Python code or any other language. VSCode instead uses tools installed on your computer to run programs, e.g., the Python tools you downloaded from https://python.org. So if you want to use VSCode to develop, e.g., Java or JavaScript programs, you need to have the necessary tools installed on your system.
VSCode will automatically find language tools on your file system if they are installed in a “standard” location.
Running code
There are three ways to run a program file:
- Select the Run menu at the top, then Start Debugging.
  - If necessary, select the Python Debugger popup, and select the default options of subsequent pop-ups until you see the program run in the integrated Terminal at the bottom.
  - We will discuss the difference between Start Debugging and Run Without Debugging in the future.
- In the editor window, right-click anywhere in the code to open the context menu, then select Run Python > Run Python File in Terminal.
- Press the F5 hotkey to start debugging.
By default, VSCode will run the file in the active editor. Alternatively, you can right-click a different file in the Explorer and run it.
Exercise
- Create hello.py in the python-test directory if needed and add print("Hello World").
- Run hello.py using the Run menu.
- Run it using the editor context menu.
- Use the F5 key. If your F5 key is missing or hard to work with, use Google to research “how to reassign keyboard shortcuts in VSCode”. Reassigning it to the shortcut Cmd+R or Ctrl+R is a solid option.
The Integrated Terminal
When you ran your hello.py program, you should have seen a flurry of output in the Integrated Terminal window at the bottom. What just happened?
- VSCode opened a Terminal CLI, like you did in the Launching a Terminal lab, except this one is embedded in VSCode.
- VSCode issued the CLI command python with your file as an argument. python runs in the Terminal and prints output.
Your Terminal contents will differ, but you should see Hello World in there.
Remember, VSCode doesn’t run Python code itself – it uses the tools installed on your computer to do it.
Important note: The Terminal in VSCode is an embedded version of the Terminal we used in Intro to the CLI. You can use the same CLI commands like cd, ls, mkdir, etc.
You may find it convenient to use this integrated Terminal rather than switching to a separate window. Or you may prefer to keep them separate. Do what works for you.
You can always open the Terminal in VSCode by clicking the Terminal pane (highlighted red in the figure above), or by selecting the Terminal menu at the top.
Exercise
- List directory contents in the integrated Terminal using the ls command.
- Type cd ~ in the integrated Terminal to switch to your home directory. Notice how the Explorer pane does not change. You are only changing the working directory in the Terminal.
- Run hello.py again using VSCode. What happens in the Terminal?
- Use the Terminal to navigate to your python-test directory using cd commands.
- Run the command touch hello2.py. Does it appear in the Explorer pane?
- Run the command rm hello2.py. What happened? What happened in the Explorer pane?
Knowledge check:
- Question: What is the keyboard shortcut for debugging/running your program?
- Question: How do you open an integrated Terminal without running a Python program?
- Question: How can you print the name of the current working directory in the integrated Terminal?
- Question: If you have a runaway process in the integrated Terminal, how do you cancel/kill it? (The answer is the same as for the regular Terminal.)
5 - 05. Debugging
5.1 - Terms and concepts
What is debugging?
Debugging is the process of comprehending how a program arrived at a particular state.
Errors are incorrect calculations or bad states of a program. An error occurs while the program is running. Errors show as bad output, crashes, and the like. Debugging is often about comprehending how you arrived at an error.
Defects are programming mistakes, logic flaws, or problems with design that could lead to errors. What did you do wrong? Defects are the problems or mistakes; errors are the tangible result of running a program with a defect.
Colloquially, we conflate these two terms into the concept of “bugs”, hence the term “debugging”. “Bug” is an old term pre-dating computers, but Admiral Grace Hopper, who is largely responsible for us no longer programming in Assembly Language, popularized the term “bug” in computing after her team found one (a moth) in the Harvard Mark II computer.
What is program state?
You have no doubt used print() statements to understand your program. Printing variables, or printing “here” to see if a line executes, is common. You are debugging using print statements.
Think about what these print statements tell you. They tell you:
- What are the variable values at a point in time?
- Which lines of code are getting executed when?
These two pieces of information are the essence of debugging. Let’s formalize them:
- step: the program statement (often a single line of code) that was just executed.
- state: the state of a program consists of:
  - the variable values at the step.
  - the call stack at the step. We will explain this in a moment.
Again, debugging is trying to understand how you arrived at a particular state (incorrect calculation, a crash).
Debugging from an exception
Let’s examine some debugging info. Do the following exercise to get set up.
Setup
- Use the Terminal to create a directory called debugging-lab/ in the same place you are gathering all your code for this class.
- Download bad_math.py and save it to the debugging-lab/ directory.
- cd into the debugging-lab/ directory and run code . to start Visual Studio Code in that directory.
- Select the bad_math.py file, then run it WITHOUT DEBUGGING, either:
  - Go to the menu at the top and do Run > Run Without Debugging; or
  - Right-click in the editor and select Run Python > Run Python File in Terminal.
- The program should crash with an error.
If the program crashes due to an exception, the stack trace will usually point you to the line of code that exploded.
There is a lot of useful information in this stack trace to start the debugging process.
It tells you that the error is in bad_math.py, line 4, and even shows you the offending line of code.
Don’t fix any bugs yet. We want them for the next lab.
The error is an IndexError: list index out of range. So the program tried to execute numbers[i], but i was likely too big.
The other lines show the call stack, or the chain of function calls that are active in memory. In Python, the top-most function was called first, and the bottom-most function was called last (it is the reverse in Java):
- Line 30 of <module> called the main() function. <module> represents the file bad_math.py itself and any code in the file that is not in a function or class.
- Inside main() on line 18, largest_number = find_largest(numbers) was called.
- Finally, inside find_largest(), the buggy line executed, raised the exception, and crashed the program.
So the call stack is the chain of active functions that are waiting for something to be computed and returned: <module> -> main() -> find_largest(), which errored out. Look at the code itself to confirm the chain of function calls.
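You can also see this ordering from code. The sketch below uses a hypothetical find_largest() with an off-by-one loop (a stand-in, not the actual contents of bad_math.py) and the standard traceback module to list the function names in the stack trace, oldest call first:

```python
import traceback

def find_largest(numbers):
    # Hypothetical defect: the loop runs one index past the end of the list
    largest = numbers[0]
    for i in range(len(numbers) + 1):
        if numbers[i] > largest:  # raises IndexError when i == len(numbers)
            largest = numbers[i]
    return largest

def main():
    return find_largest([3, 9, 4])

def crash_call_chain():
    """Run main() and return the function names in the resulting stack trace."""
    try:
        main()
    except IndexError as exc:
        # extract_tb lists frames from where the exception was caught down to
        # where it was raised: the same top-to-bottom order Python prints
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []

if __name__ == "__main__":
    print(" -> ".join(crash_call_chain()))
```

Running it prints crash_call_chain -> main -> find_largest. Here crash_call_chain() plays the role that <module> plays in bad_math.py, because that is where the exception was caught.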
Congratulations! You have found some essential debugging information: the step at which the error occurred and the call stack portion of the state. What key debugging information are you missing?
The variable values! Now go to line 4. Add print(i) and print(numbers) right before that line to see what values i and numbers have when the crash happens. That should give you a strong hint on what happened and how to fix it.
Don’t fix any bugs yet. We want them for the next lab.
Debugging is a process
A good software engineer follows a structured process. Use the exception message or your knowledge of the program to say, “Well, the problem could be this.” Form a hypothesis. Then add print statements to help determine state around the problematic step. Try different input values to confirm your hypothesis.
Maybe you will discover your hypothesis is incorrect. No problem! Maybe the error is actually due to something earlier in the call stack. Move your print statements up the stack and try again.
Whatever you do, build and refine your hypotheses. Do not just try something to see if it works. You may get lucky and fix the problem, but if you don’t understand the fix, how do you really know? You will also be doomed to make the same mistake again if you don’t understand what happened.
A better way?
You can debug just fine with print statements, but managing them is tedious. You will also have times where it would be useful to pause execution of the program at a certain point, say, on the first iteration of a loop.
You can get state with print and control steps with code, but modern debugging tools will simplify this process while keeping your code clean.
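For comparison, here is roughly what “controlling steps with code” looks like in Python. This sketch uses a hypothetical find_largest() and the built-in breakpoint() function (Python 3.7+), which pauses the program in the pdb command-line debugger. The pause parameter is an illustrative addition so the function can also run without stopping:

```python
def find_largest(numbers, pause=breakpoint):
    """Return the largest number, pausing on the first loop iteration."""
    largest = numbers[0]
    for i, value in enumerate(numbers):
        if i == 0:
            # Code-controlled pause. With the default, breakpoint() opens the
            # pdb debugger here: type "p largest" to print state, "c" to continue.
            pause()
        if value > largest:
            largest = value
    return largest
```

Every such line is something you must remember to delete later, which is exactly the clutter an IDE debugger avoids.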
We illustrate how to use Visual Studio Code’s debugger in the next lab.
Knowledge check
- Question: What two elements comprise the state of a program at a particular step?
- Question: Suppose you use a constant value that never changes in your program, like pi = 3.14159. Do you think the variable pi is part of the program state? Why or why not?
- Question: When do you see a stack trace? What information does it contain?
- Question: Explain the difference between an error and a defect. Give an example of a defect and its resulting error.
- Question: What information about the running program is contained in the call stack?
5.2 - The Visual Studio Code debugger
Debugging support tools have been around since the 70s. All modern IDEs, like Visual Studio Code, let you control the steps of program execution while showing the program state. Debugging tools, properly used, are much more efficient than print statements.
Running the debugger
If you didn’t do it in the Debugging Basics lab, create a debugging-lab/ directory and download bad_math.py to it.
- Open the debugging-lab/ directory and open bad_math.py in an editor.
- Run the program in debug mode by doing one of:
  - Hit your F5 key.
  - Select Run > Start Debugging from the top menu.
  - Click the Debug pane on the left sidebar, then the Run & Debug button.
- The first time you debug a file you will need to choose a debugger. Choose the Python Debugger suggested by Code.
- You will now be prompted to select a debugging configuration. Choose Python File: Debug the currently active Python file.
The Visual Studio Code debugger should now launch. Notice that you are now in the Debugging pane of Visual Studio Code, which is accessible anytime from the left sidebar. This pane will open any time you Run a program with debugging.
You should see something similar to the following:
The bad_math.py program should crash with an exception. Here are the essential elements you see:
- The editor shows the exception details in a red box. The yellow line and arrow mark the step the program was on when it crashed.
- These are the step controls. Visual Studio Code automatically paused on the step that caused the crash. More on the controls below.
- The variable pane shows the values of all variables in scope at the current step. Variable values are one part of the program state.
- The watch pane lets you isolate variables you want to monitor. Similar to the variable pane.
- The call stack is the other part of the program state. It shows the stack of function calls that arrived at the current step.
Using the step controls, hit either the blue “play” icon or the red “stop” icon. Stop cancels execution without running anything further. Play continues execution of the program, so the exception is printed in the Terminal (where the program is running) and the program crashes.
Breakpoints and stepping
The Visual Studio Code debugger will automatically break (pause) execution on steps that throw an exception. You can look at the variable pane and call stack to understand the state of the program and hopefully gain insight into what happened.
However, you will often want to break execution at a step of your choosing, not just when an exception happens. Maybe you want to see how a value was computed and what the variables were well before the crash happened. Or maybe your program doesn’t crash at all, but simply produces the wrong output.
You add breakpoints in the IDE to tell the debugger on which step(s) to pause execution. To set a breakpoint:
- Left-click in the blank space to the left of the line number in the code editor. A red dot will appear to indicate the breakpoint. Set a breakpoint on line 3.
  - Click the breakpoint again to remove it.
  - You can set multiple breakpoints.
  - You cannot set a breakpoint on a blank line of code.
- Run the program with debugging from the Run menu or hit F5.
- The debugger will break (pause execution) on line 3, or on whichever line you placed the breakpoint.
- Use the step controls to control the execution of the program. All of these controls have a keyboard shortcut as well.
  - Continue: continue execution until the next breakpoint or the program ends.
  - Step Over: evaluate the current line and go to the next one.
  - Step Into: step into the current line. Suppose the current line calls a function, like if my_fun(x) == True; the debugger will step into the my_fun() function and step through it. If you did Step Over instead, the debugger would evaluate the entire line, including the my_fun() call, without pausing.
  - Step Out: step out of the current function. This immediately completes all remaining lines of the current function and pauses at the line that called it, one level up the call stack.
  - Restart: restart debugging the program, just like re-running it. All your breakpoints will be retained.
  - Stop: stop the debugger without further execution of the code.
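If you want a small file for practicing Step Into versus Step Over, a hypothetical example built around the my_fun() call described above might look like this. Set a breakpoint on the if line in main(), then compare what each control does:

```python
def my_fun(x):
    # Step Into from main() lands here; Step Over runs this whole function without pausing
    doubled = x * 2
    return doubled > 10

def main():
    results = []
    for x in [3, 6]:
        if my_fun(x) == True:  # set a breakpoint here, then try Step Into vs. Step Over
            results.append(x)
    return results

if __name__ == "__main__":
    print(main())
```

Running it prints [6], because only 6 * 2 is greater than 10.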
Notice that the variable pane, watch pane, and call stack update with each step. So now, using breakpoints and the step controls, you can precisely control the execution of the program to more methodically track down what is going on.
Adding a watch variable
The variables pane shows all variables in scope at each step. This set of variables can be overwhelming, and you often won’t care about most of the variables.
The watch pane lets you specify variables you want to watch specifically. To set a watch variable:
- Set a breakpoint and start debugging the program.
- Either:
  - Select the variable in the editor, then Right-click > Add to Watch; or
  - In the watch pane, click the + to Add Expression and type in the name of the variable, e.g., type the name largest.
Now you will see your watched variables update as you step through the program. You can add as many watch variables as you like.
Conditional breakpoints
Using the watch pane helps you focus on what’s important as you refine your “what’s going on here?” hypothesis while debugging.
You will also find it useful to have a breakpoint trigger only under certain conditions. For example, you are reading a file full of 10,000 hospital patient records and you figure out that the program crashes when it gets to the record belonging to “Alice St. John”. Unfortunately, Alice is record 342. You don’t want to set a breakpoint on the offending line and have to hit the Continue control 341 times to figure out what’s going on with Alice’s data.
Enter the conditional breakpoint, which is a breakpoint that only pauses execution when an expression you specify evaluates to True. Try it with our bad_math.py sample:
- Right-click to the left of line 3 and select Add Conditional Breakpoint.
- A textbox will appear with Expression on the left. Type largest == 12 in the textbox.
- Now hit the Continue control. The conditional breakpoint will only pause when largest == 12.
Conditional breakpoints are extremely useful for refining your hypothesis as to what’s going on. Note you can enter any Python expression that evaluates to True or False, for example:
largest == 12 and i < 8
largest >= 5
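Conceptually, a conditional breakpoint works like this: each time execution reaches the line, the debugger evaluates your expression using the current variable values and pauses only if the result is true. The sketch below imitates that idea with eval() and some made-up variable snapshots; it is an illustration, not how the Visual Studio Code debugger is implemented:

```python
def condition_met(expression, variables):
    """Evaluate a breakpoint condition against a snapshot of variable values."""
    # eval() looks up names like `largest` and `i` in the `variables` dict
    return bool(eval(expression, {}, variables))

# Made-up snapshots of two variables at successive visits to the breakpoint line
snapshots = [
    {"largest": 3, "i": 1},
    {"largest": 12, "i": 5},
    {"largest": 12, "i": 9},
]

for step, state in enumerate(snapshots, start=1):
    if condition_met("largest == 12 and i < 8", state):
        print(f"Would pause at visit {step}: {state}")
```

With these snapshots, only visit 2 satisfies largest == 12 and i < 8, so that is the only place a conditional breakpoint with that expression would pause.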
Starting with vs. without debugging
When running your program, you have the option to Start Debugging or Run Without Debugging. What’s the practical difference?
Running without debugging will not pause on breakpoints or exceptions, nor will variable values be tracked. It will not affect any breakpoints or watch variables you have set; it just doesn’t update them.
Starting with debugging will do everything we showed, but significantly slows down the execution time of your program.
Exercise
There are 4 bugs present in the initial bad_math.py that can be triggered depending on the value of the numbers variable. The various calls to main() at the bottom of the file are sufficient to reveal all the bugs.
Find and remove them. There are multiple ways to squash the bugs. You may squash two bugs at once, depending on how you fix the first bug, the one that causes the exception we have seen in our examples.
Knowledge check
- Question: How do you start running a program in debug mode in Visual Studio Code?
- Question: How do you add a variable to the watch list from the editor view?
- Question: How do you set a conditional breakpoint that pauses when x evaluates to False?
- Question: What is the difference between Step Over and Step Into in terms of the next step of execution?
Additional resources
- The official Debugging in Visual Studio Code documentation.
- Some simple coding errors in Python you can practice with in the debugger.
5.3 - More practice
Use these files to practice your debugging skills with the Visual Studio Code debugger. Look for the keyword BUG in the files for hints on how to expose each error.
6 - 06. Testing
Testing is integral to all forms of engineering. Software developers often write as much test code as they do product code! This set of labs introduces testing concepts and automated testing.
6.1 - Assertions
Software testing is both a manual and an automated effort.
Manual testing is when a tester (or user) enters values into the user interface and checks the behavior of the system.
Automated testing is where test code is used to check the results of the main product code. Automated testing is an essential part of program verification, which is an evaluation that software is behaving as specified and is free from errors.
Automated testing is a necessity in real systems with thousands of lines of code and many complex features. Manual testing is simply infeasible to do thoroughly.
Code that verifies code?
Automated testing in this case means writing code. Developers and testers write code and scripts that execute and test some other code.
Exercise
- Create a directory named testing-lab in your seng-201/ directory.
- Download sample.py and put it in the testing-lab/ directory.
- Open it in Visual Studio Code, and run it.
The function calls in the __main__ section of sample.py are a semi-automated test. The calls are automated, but the verification is still manual: you, the developer, have to verify that the output is indeed correct.
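The exact contents of sample.py may differ, but a __main__ section used this way typically looks something like the sketch below (reverse_string() here is a stand-in copy). The if __name__ == "__main__": guard makes the calls run when you execute the file directly, but not when another file imports it:

```python
def reverse_string(s):
    # Stand-in for one of the functions in sample.py
    return s[::-1]

if __name__ == "__main__":
    # Semi-automated check: the calls run on their own, but you still have
    # to read the output and decide whether "sserp" and "ecila" are correct.
    print(reverse_string("press"))
    print(reverse_string("alice"))
```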
To have automated testing, we need a programmatic indicator of correctness. Enter the assert statement.
The assert statement
Nearly all programming languages have an assert keyword or an equivalent. An assertion checks whether a value is True or False. If True, it does nothing. If False, the assert raises a special type of exception (in Python, an AssertionError). Assertions are commonly used in languages like C and Ada to verify that something is True before continuing execution.
In most modern languages, including Python, the assert is the basis of automated testing.
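Python’s assert also accepts an optional message after a comma, which becomes the text of the AssertionError. A minimal sketch (check_positive() and assertion_message() are illustrative names, not part of the lab files):

```python
def check_positive(x):
    # assert <condition>, <message>: if the condition is False, Python raises
    # an AssertionError carrying the message
    assert x > 0, f"expected a positive number, got {x}"
    return x

def assertion_message(x):
    """Return the AssertionError message for x, or None if the assertion passes."""
    try:
        check_positive(x)
    except AssertionError as err:
        return str(err)
    return None

if __name__ == "__main__":
    print(assertion_message(5))   # None
    print(assertion_message(-2))  # expected a positive number, got -2
```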
Exercise
Let’s explore the assert in Python.
- Create a new file named test_sample.py in the testing-lab/ directory. Edit the file in Visual Studio Code.
- Add the following code:
test_sample.py
assert True
assert False
print("Made it to the bottom.")
- Run test_sample.py. Notice the following:
  - assert True does not produce any output. The program simply continues.
  - assert False generates an exception. This is expected.
  - The print(...) statement did not execute because the exception generated by assert False crashed the program.
- Comment out the assert False line and run it again. The print(...) statement will execute.
This demonstrates the behavior of assert. Let’s add some more interesting assertions. Add the following lines to the bottom of test_sample.py:
test_sample.py
x = 2**5
assert x == 32
assert type("Bob") == str
y = 16
assert x-y==16 and type("Bob") == str and int("25") == 25
print("Made it to the bottom.")
The assert statements now use comparisons and Boolean operators. This looks a bit more realistic. An assert can check any simple or complex Boolean expression, so long as it evaluates to True or False.
Quick Exercise: Change the operators or values in the expressions so they evaluate to False. Notice how the last assert fails if any one of its comparisons is false.
We’ll put our assertions to work testing program code in the next lab.
Knowledge check
- Question: What two things are you trying to verify with program verification?
- Question: Why do we need automated testing?
- Question: What happens next if a Python program encounters the statement assert True?
- Question: What happens next if a Python program encounters the statement assert False?
- Question: What happens when the following executes: assert 16 == 2**4?
- Question: What happens when the following executes? assert len('Bob') > 0 and 'Bob' == 'Alice'
6.2 - Unit testing
Assertions are the basis of modern automated testing. Developers write test code in source files that are separate from the main program code. We have our program code in sample.py and the test code will be in test_sample.py. This is a common naming convention.
In practice, the test code is usually kept in a separate directory from the program code.
Testing sample.py
Now, let’s use our assert to test the correctness of the functions in sample.py.
- Comment out all the code in test_sample.py.
- Add the line import sample. In Python, this makes the content of sample.py accessible to code in test_sample.py.[1]
- Now let’s convert those print(...) statements from sample.py into assert statements in test_sample.py. test_sample.py should now have the following:
test_sample.py
import sample  # We import the filename without the .py
assert sample.palindrome_check("kayak")  # the function should return True, giving "assert True"
assert sample.palindrome_check("Kayak")
assert sample.palindrome_check("moose") is False  # the function should return False, giving "assert False is False", which is True
assert sample.is_prime(1) is False
assert sample.is_prime(2)
assert sample.is_prime(8) is False
assert sample.reverse_string("press") == "sserp"  # checking result for equality with expected
assert sample.reverse_string("alice") == "ecila"
assert sample.reverse_string("") == ""
print("All assertions passed!")
Point 1: We access the functions in sample.py by calling, e.g., sample.palindrome_check(...). The prefix sample.X tells Python “go into the sample module and call the function named X.” We would get an error if we called only palindrome_check(...) because Python would be looking in the current running file, which has no such function defined in it.
Point 2: In Python, you should check if a value is True or False using is. The is operator returns a boolean. You could also type x == True or x == False. Either form will work, but is is preferred.[2]
Point 3: Remember that palindrome_check() and is_prime() return True/False themselves. We are simply verifying that they are returning the correct value. reverse_string() returns a string value, so we need to compare it to an expected value using ==.
Point 4: The program will crash with an AssertionError if any of the assert statements are False. Mess up one of the assertions to verify this.
Exercise
- Go to sample.py and define a function named power() that takes two parameters, x and y, and returns the computed result of xʸ.
- Add assert statements to test_sample.py to verify your function behaves correctly.
Unit tests
The file test_sample.py is what software engineers call an automated unit test. Unit tests test an individual class or source file.[3] Unit tests are usually written by the same developer who wrote the program code.
Our automated unit test now calls functions and uses assert statements to verify that they return the expected results. If an assertion fails, the test fails.
What does it mean if a test fails? One of two things:
- There is something wrong in the program code. Maybe there is a logic error.
- The test code itself has a mistake in its logic.
Regardless, if a test fails, you need to figure out why. A good unit test will systematically exercise all the logic of the function or module under test. This can help uncover flaws in the program code. We will discuss strategies to do this in subsequent lessons.
We also need a way to run the test code and accumulate the results in a useful way. We will do this in the next lab.
Knowledge check
- Question: Suppose you wanted to test a function named get_patient_priority(str) in hospital.py. What would you have to do to call the function from your test code?
- Question: The right-hand side of an assert statement can be any expression (simple or complex) as long as it evaluates to _____ or _____.
- Question: Who writes unit tests?
- Question: The name for a test that tests an individual module is a ______ test.
- Question: Why do you think we write separate assert statements for each function input, rather than one assert statement that calls the function multiple times with different inputs? That is, why not do assert sample.reverse_string("alice") == "ecila" and sample.reverse_string("") == ""?
[1] In Python parlance, a single file is called a module. You can create complicated modules that are collections of multiple source files. This is how many popular Python libraries like random work, as do third-party libraries like pytorch and keras used for machine learning. It is a way to bundle functions and classes for convenient use in source code.
[2] If you are dying to know the difference between x is False and x == False: some values that are not booleans compare equal to True or False when using ==. For example, 0 == False and 1 == True both evaluate to True. But only False is False, and only True is True. Empty containers such as [] are not == False, but they are “falsy”, so assert [] fails just as assert False does (try it).
[3] The unit is usually a single class. However, in our case, there is no class, but a collection of functions in a file. Some people treat a file as a unit. But a file can have multiple classes in it. The definition of a unit is a bit fuzzy, but usually refers to either a class or a single file.
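To try the == versus is distinction from the second footnote yourself, run a few comparisons. Variables are used instead of literals because recent Python versions warn about using is with a literal like 0:

```python
zero, one, empty = 0, 1, []

results = {
    "zero == False": zero == False,    # True: 0 compares equal to False
    "zero is False": zero is False,    # False: 0 is not the False object
    "one == True": one == True,        # True: 1 compares equal to True
    "empty == False": empty == False,  # False: [] is falsy, but not equal to False
    "bool(empty)": bool(empty),        # False: this is what makes [] "falsy"
}

for expression, value in results.items():
    print(f"{expression:15} -> {value}")
```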
6.3 - Structuring test code
Limitations to the current approach
In the previous lab, we gathered our assert statements into a test file that can be run. If the test file runs to completion, our tests have passed. If it fails with an AssertionError, we know that a test has failed and something is wrong (either with the program code or the test code itself). We have the beginnings of automated unit testing.
Our current goal
What we have so far is a good start, but we have two things to improve upon:
- Currently, the test file stops at the first failing assert, so we learn about only one failure per run. Ideally, we would like to know if multiple test cases are failing.
- We would like to collect our test results in a human-friendly format: I run the test, I get a summary of passes and fails.
We can accomplish both of these things. First, we need to organize the test cases in our test file. Second, we will need help from developer tools.
Current state
Here is our sample.py file:
sample.py
def palindrome_check(s):
    cleaned_str = ''.join(s.lower())
    return cleaned_str == cleaned_str[::-1]

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def reverse_string(s):
    return s[::-1]
And here is the test code:
test_sample.py
import sample # We import the filename without the .py
assert sample.palindrome_check("kayak") # the function should return True, giving "assert True"
assert sample.palindrome_check("Kayak")
assert sample.palindrome_check("moose") is False # the function should return False, giving "assert False is False", which is True
assert sample.is_prime(1) is False
assert sample.is_prime(2)
assert sample.is_prime(8) is False
assert sample.reverse_string("press") == "sserp" # checking result for equality with expected
assert sample.reverse_string("alice") == "ecila"
assert sample.reverse_string("") == ""
print("All assertions passed!")
Remember, we use the naming convention test_<file>.py to identify the unit test for <file>.py.
Organizing test code into test cases
To meet our goal, we will first organize our assert statements into test cases, which have both a conceptual and a literal definition:
- test case (concept): inputs and expected results developed for a particular objective, such as to exercise a particular program path or verify that a particular requirement is met. [Adapted from ISO/IEC/IEEE 24765].
- test case (literal): a test function within a test file.
Let’s start simple. Let’s move the assert statements that test each function into their own functions in the test file like so:
test_sample.py
import sample  # We import the filename without the .py

def test_palindrome():
    assert sample.palindrome_check("kayak")  # the function should return True, giving "assert True"
    assert sample.palindrome_check("Kayak")
    assert sample.palindrome_check("moose") is False  # the function should return False, giving "assert False is False", which is True

def test_is_prime():
    assert sample.is_prime(1) is False
    assert sample.is_prime(2)
    assert sample.is_prime(8) is False

def test_reverse():
    assert sample.reverse_string("press") == "sserp"  # checking result for equality with expected
    assert sample.reverse_string("alice") == "ecila"
    assert sample.reverse_string("") == ""

# run the test cases when executing the file
if __name__ == "__main__":
    test_palindrome()
    test_is_prime()
    test_reverse()
We now say that each of test_palindrome(), test_is_prime(), and test_reverse() is a test case. We have three (3) test cases in one (1) unit test file.
Note the naming convention: all the test case functions begin with the string test_. This is a requirement of the developer tool in the next lab that will help us run multiple test cases even if one of them fails.
The block beginning with if __name__ == "__main__": allows us to run the tests by running the file. You should not see any output when you run the unit test because all of these assert statements should evaluate to True.
Diversifying our test cases
One test case for each function in your program code is where you should start. However, we often want more than one test case per program code function. Why?
Consider why we have multiple simple assert statements. Suppose we have the following valid assertion: assert sample.is_prime(1) is False and sample.is_prime(2). Now, suppose this assertion failed due to a bug in our program code. The bug could be in the logic dealing with either the input 1 or the input 2. We put our checks in separate assert statements so we know precisely which input caused an error in the program code.
The same reasoning applies to test cases: splitting the tests for one function into several test cases tells us which part of the program code is failing.
Program paths
A program path is a sequence of instructions (lines of code) that may be performed in the execution of a computer program. [ISO/IEC/IEEE 24765] Take a look at is_prime() in sample.py:
5   def is_prime(n):
6       if n <= 1:
7           return False
8       for i in range(2, int(n**0.5) + 1):
9           if n % i == 0:
10              return False
11      return True
Program paths are formed by the unique sequences of instructions (lines of code) that may be executed. is_prime() has three unique program paths:
- Giving the input 1 executes lines 5, 6, and 7. This path (5,6,7) deals with special cases where our input is ≤ 1. One (1) itself is not prime, and neither are 0 or negative numbers by definition.
- Giving the input 4 executes lines 5, 6, 8, 9, and 10. This path (5,6,8,9,10) accounts for numbers > 1 that are not prime.
- Giving the input 5 executes lines 5, 6, 8, 9, and 11. This path (5,6,8,9,11) accounts for numbers > 1 that are prime. The input 3 is a special case of this path that does not include line 9, because the loop body never runs.
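To see these paths concretely, the sketch below records which lines of is_prime() run for a given input using Python’s sys.settrace hook. The helper lines_executed() is hypothetical (it is not part of the lab), and it renumbers lines so that the def line is line 5, matching the line numbers used in the text.

```python
import sys


def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True


def lines_executed(func, arg, start):
    """Call func(arg) and return the sorted, de-duplicated line numbers it ran.

    Lines are renumbered so that the def line is `start` (5 for is_prime in sample.py).
    """
    first = func.__code__.co_firstlineno
    executed = {start}  # count the def line itself as part of every path

    def tracer(frame, event, _arg):
        # Record "line" events that happen inside func only.
        if event == "line" and frame.f_code is func.__code__:
            executed.add(start + frame.f_lineno - first)
        return tracer  # keep tracing inside the new frame

    sys.settrace(tracer)
    try:
        func(arg)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return sorted(executed)


for n in (1, 4, 5, 3):
    print(n, lines_executed(is_prime, n, start=5))
# 1 [5, 6, 7]
# 4 [5, 6, 8, 9, 10]
# 5 [5, 6, 8, 9, 11]
# 3 [5, 6, 8, 11]
```

Because the result is a set of lines, a loop that runs several times still shows each line once, which matches how the paths are written above.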
Path testing
Let’s group assert statements that test “a particular program path” or “a particular requirement” (see the test case definition) into separate test cases. Change test_is_prime() to the following:
test_sample.py
def test_is_prime():
    assert sample.is_prime(2)
    assert sample.is_prime(8) is False
    assert sample.is_prime(2719)
    assert sample.is_prime(2720) is False

def test_is_prime_special_cases():
    assert sample.is_prime(1) is False
    assert sample.is_prime(0) is False
    assert sample.is_prime(-1) is False
These test cases both verify is_prime() but examine different program paths.
test_is_prime_special_cases() tests path #1 (previous subsection). If it fails, we know something is wrong with the part of our algorithm that handles the special case of integers ≤ 1.
test_is_prime() tests paths #2 and #3. If it fails, we know something is wrong with the part of the algorithm that checks whether the input is divisible by a potential factor.
The ability to pinpoint where the algorithm is failing is very useful to the developer when they go to debug, especially when you have many test cases and hundreds of lines of program code.
Some functions only have one program path, and so one test case may be sufficient.
Your testing strategy
Writing separate test cases for each program path or requirement is a testing strategy. But it can be hard to identify all the program paths or to know how many tests are “enough”.
For now, start with one test case per program function.
Then ask yourself, “are there sets of input where the program behaves differently than for other inputs?” If so, divide your test case to separate those input sets. In is_prime(), the program behaves differently for inputs ≤ 1 vs. inputs > 1 that are prime vs. inputs > 1 that are not prime.
We will discuss how to analyze a program to create a good test strategy in future lessons, as well as quantify how good our tests are.
Exercise
Our test_is_prime() has lumped together the program paths where the number is prime and where it is not. Reorganize this test into two test cases, one for each program path: one test case asserting only prime numbers > 1, and the other only non-prime numbers > 1.
Knowledge check
- Question: In test code, a single function is called what?
- Question: How many program paths will a function with a single if-else statement have?
- Question: What is a program path?
- Question: Conceptually, what is a test case?
- Question: Besides generally being more organized, why do software developers want to split up their tests into multiple test cases?
- Question: Suppose you have a program file that defines the functions foo() and bar(). How many test cases should you have at a minimum in your test code? What should they be named?
6.4 - pytest
Test frameworks
We have created a well-organized unit test in the previous lab. Our test code is looking good, but we still need to address two issues for it to be truly useful:
- We would like to know if multiple test cases are failing.
- We would like to collect our test results in a human-friendly format.
Automated test frameworks address these issues: they find and execute test code (often through naming conventions like test_*), capture assertion exceptions (test case failures), and generate summaries of which tests pass and fail.
Automated test frameworks are an integral part of modern software engineering.
Introducing pytest
We will use an automated test framework for Python called pytest. Test frameworks are language-specific. Java has JUnit, C++ has CppUnit, JavaScript has multiple options, etc. Automated test frameworks exist for nearly every programming language and do largely the same things.
pytest is a library. Libraries are source code or compiled binaries that provide useful functions. They are almost always written in the same programming language as the program code. Professional software engineers use third-party libraries, often open source, to provide functions that they would otherwise have to write themselves.
In our case, we could write some try-except blocks to catch our assertion exceptions, create counters to track the number of tests passed or failed, and then print out the results. But why do that when we can use a library? No sense in reinventing the wheel.
Installing pytest with pip
We install pytest, and another tool we will use later called pytest-cov, from the CLI. Choose your operating system below and follow the instructions:
On macOS:
pip3 install -U pytest pytest-cov
On Ubuntu (WSL or VirtualBox):
- Run these commands first from your Terminal:
sudo apt update -y && sudo apt upgrade -y
sudo apt install python3-pip python3-venv
# Make sure your working directory is the directory the test files are in.
python3 -m venv .venv  # This will create a subdirectory named .venv/
- Open Visual Studio Code in the working directory. It is essential that your testing-lab/ directory is the top level of Visual Studio Code.
- Press Ctrl+Shift+P or select View > Command Palette.
- Search for “environment” and select Python: Create Environment…
- Select Venv.
- Select Use Existing.
- The integrated Terminal in Visual Studio Code should restart, and you should see a little (.venv) at the beginning of the command line. Contact the instructor if you do not.
- You will run all subsequent Terminal commands from the integrated Terminal in Visual Studio Code.
- From the integrated terminal, run pip install pytest pytest-cov
What is pip? It is basically the App Store for Python packages: it downloads and installs packages, by default from the Python Package Index (PyPI). A package contains one or more libraries or executable tools. pip usually comes with Python (on Ubuntu, you installed it with the python3-pip package). We will use pip again to install useful packages in future labs.
Running test code with pytest
You should have a testing-lab/ directory containing sample.py and test_sample.py. If not, grab the files from the previous lab. Change into the testing-lab/ directory so that it is the working directory in the terminal.
Run pytest test_sample.py in the terminal. You should see console output similar to the following:
collected 3 items
test_sample.py ... [100%]
================ 3 passed in 0.01s =================
pytest scans your test file looking for functions that follow the naming convention test_<function_name> and “collects” them. I had three test case functions in my code, but you may have more or fewer, so your “collected” number may be different. Test case function names must start with test_ for pytest to run them.
pytest then calls each test case separately and checks whether the test case raises an AssertionError. If so, the test case fails. If not, the test case passes. (Any other exception that escapes a test case also makes it fail.)
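Let’s introduce errors in our program code sample.py to show pytest collecting multiple test case failures, which is one of the improvements needed for automated unit testing.
Open sample.py and introduce two bugs: one that makes palindrome_check() case-sensitive, so “Kayak” is no longer recognized as a palindrome, and one that makes is_prime(1) return True.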
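Any bugs with these effects will do. As one hypothetical example consistent with the failures shown below, you could remove the call to lower() in palindrome_check() and change n <= 1 to n < 1 in is_prime(). The comments marked BUG point out the deliberate mistakes.

```python
# Hypothetical buggy version of sample.py, used only to make tests fail on purpose.

def palindrome_check(s):
    cleaned_str = ''.join(s)  # BUG: no longer lowercases, so "Kayak" is not a palindrome
    return cleaned_str == cleaned_str[::-1]

def is_prime(n):
    if n < 1:  # BUG: should be n <= 1, so 1 now slips through and is reported as prime
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

def reverse_string(s):
    return s[::-1]  # unchanged, so test_reverse still passes
```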
Now run pytest test_sample.py again. Your output should now look something like this:
collected 3 items
test_sample.py FF. [100%]
======================================================================= FAILURES =======================================================================
___________________________________________________________________ test_palindrome ____________________________________________________________________
def test_palindrome():
assert sample.palindrome_check("kayak") # the function should return True, giving "assert True"
> assert sample.palindrome_check("Kayak")
E AssertionError: assert False
E + where False = <function palindrome_check at 0x1023494e0>('Kayak')
E + where <function palindrome_check at 0x1023494e0> = sample.palindrome_check
test_sample.py:5: AssertionError
____________________________________________________________________ test_is_prime _____________________________________________________________________
def test_is_prime():
> assert sample.is_prime(1) is False
E assert True is False
E + where True = <function is_prime at 0x1023493a0>(1)
E + where <function is_prime at 0x1023493a0> = sample.is_prime
test_sample.py:9: AssertionError
=============================================================== short test summary info ================================================================
FAILED test_sample.py::test_palindrome - AssertionError: assert False
FAILED test_sample.py::test_is_prime - assert True is False
============================================================= 2 failed, 1 passed in 0.03s ==============================================================
We can see in the human-friendly summary at the end that 2 failed and 1 passed. The names of the failing test cases are printed, as are the exact assert statements that failed.
Other ways of running pytest
- You can run pytest without giving it a target file. pytest will scan the working directory (and its subdirectories) looking for files with the naming convention test_<file>.py. It will collect and run test cases from all test_<file>.py files it finds.
- Try running pytest --tb=line to get a condensed version of the results if you find the output to be overwhelming.
Recap
We accomplished a couple of significant things in this lab:
- We installed the pytest package using pip. You only need to do this once.
- We ran pytest, which scans for files and functions named test_* and runs them. pytest collects test case successes and failures independently from one another, allowing us to get more information with each run of our test code. pytest displays a summary of the results in a human-friendly format.
- All popular programming languages have a test framework. You will need to seek out one for the language you are working in.
Knowledge check
- Question: The Python tool we run to install Python packages is called _______.
- Question: For pytest to find and execute tests automatically, the test files and test cases must begin with __________.
- Question: (True/False) You can have multiple assert statements in a single test case.
- Question: Create a file called mymath.py with the following function (avoid naming it math.py, which clashes with Python’s standard math module):
def compute_factorial(n):
    if n < 0:
        return "Factorial is not defined for negative numbers."
    elif n == 0 or n == 1:
        return 1
    else:
        factorial = 1
        for i in range(2, n + 1):
            factorial *= i
        return factorial
- Create a test file.
- Implement one or more test cases that cover all program paths in the function.
- Use pytest to execute your test code.
6.5 - Testing for exceptions
Before you start
If necessary, fix up your sample.py so that all your test cases pass.
Testing for exceptions
Sometimes, the expected behavior of a function is that it throws an exception. How do we test for expected exceptions given an input?
Suppose we want reverse_string() to work only for strings containing the letters [a–z] and to throw an exception if the string contains any other characters. Change reverse_string() in sample.py so that it raises ValueError('letters a-z only') whenever its input is not made up of the letters a–z.
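A minimal sketch of such a change, assuming a regular-expression check (re.fullmatch is one of several reasonable ways to test for “letters a–z only”):

```python
import re


def reverse_string(s):
    # Accept only one or more lowercase letters a-z; anything else is an error.
    if re.fullmatch(r"[a-z]+", s) is None:
        raise ValueError('letters a-z only')
    return s[::-1]
```

With this version, reverse_string("press") still returns "sserp", while reverse_string("Press") or reverse_string("abc1") raises ValueError. Because [a-z]+ requires at least one letter, the empty string is rejected too.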
This is appropriate given the requirements of reverse_string(). It returns the reversed input string under normal circumstances but raises an exception under abnormal circumstances, a.k.a. the exceptional conditions from our problem statement structure.
“Raising” and “throwing” an exception are the same thing. You will hear both terms in practice. The keyword in Python is raise, and the names of Python’s error exceptions conventionally end with Error, e.g., ValueError or IndexError.
Exercise
- Define a new test case in test_sample.py named test_reverse_exception and add a call to sample.reverse_string with an input that will trigger the exception.
- Run pytest. You should see a test summary similar to the following:
================================= short test summary info =================================
FAILED test_sample.py::test_reverse - ValueError: letters a-z only
FAILED test_sample.py::test_reverse_exception - ValueError: letters a-z only
=============================== 2 failed, 2 passed in 0.06s ===============================
I have two test failures: the new test case I created, and the original test_reverse. This is because test_reverse in my code contains the assertion assert sample.reverse_string("") == "". The empty string does not consist of the letters [a–z], so an exception is correctly raised.
This is an important lesson: as program code evolves, so too might the test code. Move the assertion on sample.reverse_string("") to the test_reverse_exception test case, where it logically belongs.
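Your test cases for reverse_string should now be split: test_reverse checks normal inputs such as "press" and "alice", and test_reverse_exception contains the calls that should raise an exception, including the one with the empty string.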
Verifying expected exceptions with pytest
Our assert statements only check the return values of functions. pytest provides a convenient helper function to check if an exception was raised.
First, add the line import pytest to the top of your test code file test_sample.py.
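Second, change test_reverse_exception so that each call that should raise an exception is placed inside a with pytest.raises(ValueError) as err: block, and then check the error message captured in err.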
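A sketch of what the new test case can look like. To keep it runnable on its own, it defines a stand-in reverse_string; in test_sample.py you would call sample.reverse_string instead. The second with-block uses the match= parameter of pytest.raises, which checks the exception message with a regular-expression search.

```python
import pytest


def reverse_string(s):
    # Stand-in for sample.reverse_string so this sketch runs on its own.
    if s == "" or not all("a" <= ch <= "z" for ch in s):
        raise ValueError('letters a-z only')
    return s[::-1]


def test_reverse_exception():
    # The with-block passes only if a ValueError is raised inside it.
    with pytest.raises(ValueError) as err:
        reverse_string("")
    # err is an ExceptionInfo object; err.value is the ValueError that was raised.
    assert str(err.value) == "letters a-z only"

    # match= searches the exception message with a regular expression.
    with pytest.raises(ValueError, match="letters a-z only"):
        reverse_string("abc1")
```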
A few things of note:
- pytest.raises(...) requires that you specify the type of exception. In our case, we expect a ValueError to be raised.
- We can optionally capture information about the exception. That’s what as err does. err is a variable (name it whatever you want) that holds an ExceptionInfo object; the exception itself is err.value.
- We can call str(err.value) to convert the exception to its message string. That message should be "letters a-z only", which comes from the line raise ValueError('letters a-z only') in sample.py.
This test case would fail if reverse_string() did not raise a ValueError.
Exercise
- Comment out the if-statement and exception-raising lines in reverse_string() and rerun pytest. How does the pytest output for an expected exception differ from a failed assert?
Checking exception values
Checking the exception message is useful because we may want our function to raise ValueErrors under different circumstances. For example, maybe we want to raise a ValueError for the empty string that says ‘string must not be empty’, and a different ValueError that says ‘letters a-z only’.
Why would you want to raise two different ValueErrors? Because it tells the caller of reverse_string() what they did wrong and how to fix it. It’s a similar rationale to why we split our assert statements and our test cases into multiple instances to get more precise info.
Exercise
- Put the if-statement and exception raising back in reverse_string(). Add an if-statement at the beginning of the function to check if the input parameter is the empty string. If so, raise ValueError('string must not be empty'). Re-run pytest. What happens?
- Modify your test_reverse_exception so that both with pytest.raises(...) calls capture the error with as err. Add or modify assert statements to verify that the appropriate error message is in the exception.
Recap
We accomplished a couple of significant things in this lab:
- We used pytest.raises(...) to verify that a function raises an expected exception, and captured the exception to check its error message.
- We saw that as program code evolves, the test code may need to evolve with it, and that tests for exceptional conditions belong in their own test cases.
Knowledge check
- Question: (True/False) Raising and throwing exceptions are two different things.
- Question: Why should you not test exception logic in the same test case where you test “normal” logic?
- Write a code block using pytest that checks that the determine_priority(str) function correctly throws a TypeError when passed anything other than a string.
- Question: What happens when running pytest and the program code raises an exception that you do not expect?
6.6 - Test coverage
Before you start
You must have completed the lab on Testing for exceptions.
Motivation
Software engineers need some measure of the quality of the tests they write. This is not a simple question to answer.
- Does a good test find bugs? Hopefully, but also, we should be writing our code to not have bugs!
- Do we count how many lines of test code we have? Is it more than source code? Maybe, but that doesn’t mean we are testing the right things.
- Do our tests check independent things in the code? How can we determine that automatically if so?
Measuring test case quality is not straightforward, but there is one generally agreed-upon measure used as a baseline: test coverage.
Test coverage
Test coverage is a measure of how much of the source code is executed when the tests run. There are three common measures of “how much”:
- Line coverage or statement coverage is the percentage of source lines of code executed by your test cases. We do not include test code lines when counting the percentage of code.
- Branch coverage is the percentage of branches (the possible outcomes of each decision, such as the True and False sides of an if-statement) executed by your test cases.
- Conditional coverage is the percentage of Boolean conditions executed by your test cases.
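Consider a (very poorly designed and implemented) function authorize(is_authenticated, user_id, caller) in my_module. It contains a single if-statement whose condition combines three Boolean conditions with or: is_authenticated is True, user_id.startswith('admin'), and caller == privileged.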
Now consider the following test case:
def test_authorize():
assert my_module.authorize(True, "bob", "privileged") is True
- This test case has 100% line coverage because all lines of code are executed.
- This test case has 50% branch coverage because only one of the two branches is executed: the one where the if-statement evaluates to True.
- This test case has 33% conditional coverage because only one Boolean condition is checked (is_authenticated is True), but the other expressions, user_id.startswith('admin') and caller == privileged, are not.
Line coverage is the least precise, and conditional coverage is the most precise.
Test coverage is computed over the union of all source lines, branches, and conditions executed by our test cases. So we can easily write additional test cases that, collectively, reach 100% statement, branch, and condition coverage.
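As a sketch of how additional test cases fill in the gaps, here is a hypothetical version of authorize() that matches the coverage numbers described above (three conditions joined by or inside one if-statement, with 'privileged' assumed to be a string), followed by the original test case and three more. Python’s or stops at the first True condition, which is why the original test evaluates only one condition. Together, the four tests execute every line, take both branches of the if-statement, and evaluate each condition as both True and False.

```python
def authorize(is_authenticated, user_id, caller):
    # Hypothetical reconstruction of my_module.authorize (poor design on purpose).
    authorized = False
    if is_authenticated is True or user_id.startswith('admin') or caller == 'privileged':
        authorized = True
    return authorized


def test_authorize():
    # Original test: 100% line, 50% branch, only the first condition evaluated.
    assert authorize(True, "bob", "privileged") is True


def test_authorize_admin_user_id():
    # First condition False, second condition True.
    assert authorize(False, "admin_alice", "guest") is True


def test_authorize_privileged_caller():
    # First two conditions False, third condition True.
    assert authorize(False, "bob", "privileged") is True


def test_authorize_denied():
    # All three conditions False: the if-statement's False branch.
    assert authorize(False, "bob", "guest") is False
```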
You want to target 100% condition coverage, but achieving 100% of any coverage can be challenging in a real system. Exception handling and user interface code in complex systems can be hard to test for a variety of reasons.
In practice, most organizations aim for 100% line coverage as a target.
Using pytest-cov to compute test coverage
Most test frameworks, like pytest and JUnit (for Java), also have tools for computing test coverage. Manually computing these measures would be too tedious. These tools compute line coverage, but not always branch coverage, and almost never condition coverage because of the technical challenges of automating that calculation.
We installed the pytest-cov tool when we installed pytest. Refer to the instructions for installing pytest and pytest-cov if you have not done so.
Running pytest-cov
Open a Terminal and run the following command in the directory with sample.py and test_sample.py from the previous labs:
pytest --cov .
- This tells pytest to run the tests, measure coverage of the code in the current directory, ., and generate the coverage report. You should see something similar to the following:
============================================================= test session starts ==============================================================
platform darwin -- Python 3.12.2, pytest-8.3.3, pluggy-1.5.0
rootdir: /Users/laymanl/git/uncw-seng201/content/en/labs/testing/coverage
plugins: cov-5.0.0
collected 4 items
test_sample.py .... [100%]
---------- coverage: platform darwin, python 3.12.2-final-0 ----------
Name Stmts Miss Cover
------------------------------------
sample.py 23 6 74%
test_sample.py 23 3 87%
------------------------------------
TOTAL 46 9 80%
============================================================== 4 passed in 0.03s ===============================================================
pytest executes your tests as well, so if any tests fail, you will see that output too. Note that failing tests can lower your test coverage, because a test case stops at its failing assert and any program code it would have called afterward never runs!
- The general format for the command is pytest --cov <target_directory>
- To get branch coverage, run the command pytest --cov --cov-branch <target_directory>
Generating a coverage report
You can also generate an HTML report with pytest --cov --cov-branch --cov-report=html <target_directory>. This will create a folder named htmlcov/ in the working directory. Open the htmlcov/index.html file in a web browser, and you will see an interactive report that shows you which lines are and are not covered.
Knowledge check
- Test coverage is a measure of how much _________________ is executed when the __________________ runs.
- Explain the difference between branch coverage and conditional coverage.
- Give an example of a function and a test case where you have 100% branch coverage but <100% conditional coverage.
- (True/False) Branch coverage is more precise than statement coverage.
7 - 07. Comprehensive example
We have covered quite a bit. Let’s go through an example from problem statement to implementation to test using what we’ve learned so far.
We’ll start with this high-level description of the problem:
You are tasked with writing a program that can read in a text file where each line has the name of a species of bird. Your program needs to count the number of times each species appears. An example of the input is below. Ask the user to type in the name of the file they wish to be processed.
White-eared Hummingbird
Townsend's Solitaire
Townsend's Solitaire
Yellow-fronted Canary
Chestnut-fronted Macaw
Your program must handle any text file in this format.
Crafting a problem statement
An ounce of planning is worth a pound of programming.
Implementation
We’ll start by doing the simplest thing that meets the requirements of the problem description.
Sample input files:
Most programs and their constituent functions can be thought of as having three parts: (1) read the input, (2) compute something, (3) generate output.
Testing
Time to test with pytest, using what we learned in the pytest and testing-for-exceptions labs.
Reworking the code to be testable
We cannot test our code as it is. We need to reorganize it for testability, because user interface code is difficult to unit test.
Writing pytest code
Finally time to test. When you write test cases and assertions, you are checking the actual computed result against the expected result for a given input.
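As a hypothetical sketch (not the lab’s own code), the program could be split into read, compute, and output functions, with the input() call isolated in main() so the rest can be unit tested. The function names, skipping blank lines, and sorting the output by species name are assumptions.

```python
from collections import Counter


def read_species(lines):
    """Read: return one species name per non-blank line, without surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def count_species(names):
    """Compute: map each species name to the number of times it appears."""
    return dict(Counter(names))


def format_counts(counts):
    """Output: build the report text, one "name: count" line per species, sorted by name."""
    return "\n".join(f"{name}: {count}" for name, count in sorted(counts.items()))


def main():
    # All user-interface code lives here, so the functions above stay easy to unit test.
    filename = input("Enter the name of the file to process: ")
    with open(filename) as f:
        print(format_counts(count_species(read_species(f))))


# Unit tests for the read and compute steps (run with pytest).
def test_read_species_skips_blank_lines():
    lines = ["Yellow-fronted Canary\n", "\n", "Chestnut-fronted Macaw\n"]
    assert read_species(lines) == ["Yellow-fronted Canary", "Chestnut-fronted Macaw"]


def test_count_species():
    names = ["Townsend's Solitaire", "White-eared Hummingbird", "Townsend's Solitaire"]
    assert count_species(names) == {"Townsend's Solitaire": 2, "White-eared Hummingbird": 1}
```

In a real project, you would call main() under if __name__ == "__main__": in the program file and move the test functions into their own test_<file>.py. Because read_species() accepts any iterable of lines, the tests can pass a plain list instead of creating a file.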
Recap
We went from a high-level problem description, to a problem statement, to an initial implementation, to reorganizing our code to be testable, to finally writing our tests.
You need to become comfortable with all these steps!
Ending files:
8 - 08. Control Flow Graphs
Motivation
We often use the term trace when we are trying to think about how a program is executing. Remember the stack trace?
When you try to understand code, you often stare at the lines and trace through what is happening. Sometimes, tracing can be quite difficult if you have many if-statements, loops, or function calls.
Wouldn’t it be useful if we could represent our code graphically to facilitate this tracing? That is exactly what the control flow graph helps us to do. These graphs can help us to understand what our code does, and they also give us a powerful analysis tool for designing test cases, as well as for many other applications in computer science.
Definition and uses
A control-flow graph (CFG) is a representation of all program paths that might be traversed through a program during its execution. A program path is a sequence of execution steps like we learned about in debugging.
The Rust Project Developers (Apache License 2.0 or MIT), via Wikimedia Commons
Frances (Fran) Allen was an IBM Fellow who devised the concept of control flow graphs in the 1960s. In 2006, she became the first woman to receive the Turing Award for her contributions to computer science.
Rama, CC BY-SA 2.0 FR, via Wikimedia Commons
Formal definition
(Credit to David Liu and Mario Badr for this section’s content).
Control flow graphs represent different blocks of code. A basic block is a sequence of non-compound statements and expressions in a program’s code that are guaranteed to execute together, one after the other.
Here are some examples and non-examples of basic blocks:
# A single statement is a basic block.
x = 1
# A sequence of multiple statements and function calls is a basic block.
x = 5
y = x + 2
z = f(x, y)
print(x + y + z)
# A basic block can end with a return or raise statement.
x = 5
y = x + 2
return f(x, y)
# But a sequence of statements with a return/raise in the middle is
# NOT a basic block, since the statements after the return/raise aren't
# going to execute.
x = 5
return x
y = x + 2 # Will never execute!
# An if statement is not a basic block, since it is a compound statement.
# The statements it contains aren't guaranteed to execute one after the other.
if x > 5:
    y = 3
else:
    y = 4
Typically we treat basic blocks as being maximal, i.e., as large as possible. So if we have a sequence of assignment statements (x = 5, y = x + 2, etc.), we treat them as one big block rather than as multiple single-statement blocks.
Now let’s look at that if statement example in more detail. We can divide it up into three basic blocks: one for the condition (x > 5), one for the if branch (y = 3), and one for the else branch (y = 4). We can now formalize this idea and extend it to other kinds of control flow statements, like loops.
Formally, a control flow graph (CFG) of a program is a graph \(G = (V,E)\) where:
- \(V\) is the set of all (maximal) basic blocks in the program code, plus one special element representing the \(end\) of a program.
- \(E\) is the set of edges, where:
- There is an edge from block \(b_1\) to block \(b_2\) if and only if the code in \(b_2\) can be executed immediately after the code in \(b_1\).
- There is an edge from block \(b\) to the special \(end\) block if and only if the program can stop immediately after executing the code in block \(b\). This occurs if there is no code written after \(b\), or if \(b\) ends in a return or raise statement.
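One way to make this definition concrete is to store a CFG as a Python dictionary that maps each block in \(V\) to the list of blocks it has edges to in \(E\). The sketch below encodes the if-statement example from this section (each block is named by its code) and lists every path from the condition block to end. The all_paths helper is hypothetical and assumes the graph has no cycles, so it does not handle loops.

```python
# Hypothetical encoding of the CFG for:
#     if x > 5:
#         y = 3
#     else:
#         y = 4
# Keys are the basic blocks in V (plus the special "end" block);
# each list holds the blocks reachable by one edge in E.
cfg = {
    "x > 5": ["y = 3", "y = 4"],  # condition block: either branch may run next
    "y = 3": ["end"],             # if branch: nothing runs after it, so it can end
    "y = 4": ["end"],             # else branch
    "end": [],                    # the special end block has no outgoing edges
}


def all_paths(graph, start, end="end"):
    """Return every path from start to end. Assumes no cycles, so loops are not handled."""
    if start == end:
        return [[end]]
    return [[start] + rest for nxt in graph[start] for rest in all_paths(graph, nxt, end)]


for path in all_paths(cfg, "x > 5"):
    print(" -> ".join(path))
# x > 5 -> y = 3 -> end
# x > 5 -> y = 4 -> end
```

The two printed paths are the two program paths through the if-else, which is why such a statement needs at least two test inputs to exercise both.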
Building a CFG
Here are the rules:
When you draw a node, you will write either the actual statements or the line numbers inside the rectangle.
Decision nodes: Draw as a diamond or a highlighted rectangle. These are blocks that either (a) transfer control by performing a function_call(), or (b) make a decision with if-else, try-except, for, or while. You do not create decision nodes for built-in functions like print() or input(). A try-except block is a decision node on the try; the except blocks are regular nodes (usually).
Regular nodes: Draw as a rectangle. These are blocks of code that execute in sequence without jumping. You group multiple lines of code together into one regular node when they execute in sequence.
End node: Draw two concentric circles with the inner one filled-in. This represents the “end” of the control flow that you are modeling. It does not represent a line of code.
Edges: Draw a line with an arrow at the end to represent the control flow passing from one node to another.
- Regular nodes will have a single incoming edge and a single outgoing edge indicating program control flows in and out of the code block.
- Decision nodes will have a single incoming edge. They will have either two outgoing edges in the case of if-else, for, and while statements, or one outgoing edge for a function_call() that activates a new function. Label the outgoing edge(s) of the decision node with the function_call() or the condition, e.g., x < 0 or x >= 0.
- For try nodes, you have a single incoming edge. You have one outgoing edge to the internal nodes of the try, and one outgoing edge to each except and finally block.
- The end node can have many incoming edges, and will have no outgoing edges.
We can model a CFG for an entire program, a selected block, or individual functions. CFGs can get lengthy quickly, so you are best off working with separate, small functions.
Example
Let’s start with a simple code snippet:
We will use line 1, def check_number(x):, as our start point. It is a regular node because no decision is made. Draw a rectangle at the top of a sheet of paper. Write either the line number or the entire line of code inside the node.
Below the first node, draw a diamond or highlighted rectangle to represent a decision node for line 2. Decision nodes are used when you encounter if-else statements, for or while loops, or a call to a user-defined function(). Draw an edge connecting the first node to the second.
Draw a regular node for line 3 as a rectangle next to the line 2 node. Regular nodes represent blocks of code (in this case only one line) that execute in sequence with no decisions or calls to other functions. Draw an edge from line 2 to line 3 and label it with the condition that transfers control to line 3.
Draw another regular node representing line 5 below the line 2 node. Draw an edge from line 2 to line 5 and label it with the condition that transfers control to line 5.
Note that we DO NOT draw a node for the else on line 4. It is part of the if decision node on line 2. However, if we had an if-elif, we would draw another decision node for the elif. We are just capturing the if comparisons in our graph.
Finally, we need an end node to indicate the end of the program paths. Draw two concentric circles below the other nodes. Connect lines 3 and 5 to this end node. This node does not represent a line of code, but indicates the end of the execution we care about.
Now we have a CFG for a very simple block of code. Tracing the execution of the program becomes a matter of tracing your pen through the nodes and, when you reach decision nodes, determining how the variables’ values determine the flow of control.
Identifying unique program paths
One of the most important uses of a CFG is that it enables us to identify all the unique program paths in the code. Again, a program path is a sequence of execution steps like we learned about in debugging.
Question: How many unique program paths are indicated by the CFG? What are they?
To answer this question, you trace the set of nodes executed during a single “run” of the code block. A path is the set of nodes executed. Note that we have a decision node (line 2). So when the program executes, we have to choose a path, going through either 3 or 5, because the program makes a choice based on the value of x.
So there are two unique program paths:
- The path (1,2,3)
- The path (1,2,5)
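Once a CFG is written down as data, finding its unique paths is a graph search. The sketch below encodes the check_number CFG using line numbers as nodes (1 goes to 2, 2 goes to 3 or 5, and both of those go to the end node) and lists every path with a depth-first search. find_paths is a hypothetical helper, and it assumes the graph has no cycles: an edge looping back to a for or while node would make it recurse forever, so loops need extra handling.

```python
def find_paths(cfg, node, path=None):
    """Return every path from node to the end node, as lists of nodes.

    Assumes the CFG is acyclic (no loops).
    """
    path = (path or []) + [node]
    if node == "end":
        return [path[:-1]]  # leave out "end": it is not a line of code
    paths = []
    for successor in cfg[node]:
        paths.extend(find_paths(cfg, successor, path))
    return paths


# CFG for check_number: line 2 is the decision node.
cfg = {1: [2], 2: [3, 5], 3: ["end"], 5: ["end"]}

for p in find_paths(cfg, 1):
    print(tuple(p))  # prints (1, 2, 3) then (1, 2, 5)
```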
Why do we care about the unique program paths? Because we can measure how good our unit tests are based on the number of unique program paths covered. So our goal becomes designing our test cases so that the set of tests hits every unique program path. Sometimes this is easier said than done. Test coverage is a measure of how many program paths are covered by a set of test cases, and test coverage is used throughout the industry as a measure of test quality. We will use a tool to calculate test coverage in a future lab.
Exercise: Loop example
Consider the following code that includes a loop.
Try to draw the CFG for this example. Some pointers:
- A loop is a decision node. In the case of this for loop, if there are still num values remaining in the list, you go to 3. Otherwise, the program block ends because there is nothing left after the for loop.
- Where do you go after lines 4 and 6? Back to the for loop.
Exercise: Multiple return paths
The following example has multiple ways to return out of the code block. You would treat raising an exception as returning.
Try to draw the CFG for this example. Some pointers:
- Lines 2 and 4 are both decision nodes.
- return statements are treated as regular nodes, but they all go to the end node.
- Make sure to label your decision nodes’ outgoing edges with the condition.
Knowledge Check
- Question: What is a program path, and how is a CFG related to program paths?
- Question: What do you label the outgoing edges of a decision node with?
- Question: How many unique program paths exist in the Loop example? What are they?
- Question: Write a set of test cases that exercise all unique paths in the Loop example.
- Question: How many unique program paths exist in the Multiple return paths example? What are they?
- Question: We didn’t model a try-except scenario. Apply your critical thinking and the rules at the top of this lab to create a CFG for the following more complex function:
9 - 09. Code Readability
Motivation
Learning to program often focuses on syntax and semantics – avoid errors and get the correct answer.
You may also have learned rules for how your code should look, and you were probably told that you should write good comments. Why?
A tremendous amount of research in programming language development and in software engineering focuses on program comprehension, a.k.a. understandability. How much effort does it take to understand your source code? Software engineers care deeply about understandability because most of the effort in software development is spent fixing bugs or adding functionality to existing code. To do that without breaking everything, you need to understand what the existing code does!
Understandable code is a function of several things:
- The programming language syntax and semantics. Python is objectively more human-friendly than assembly language.
- Coding conventions and documentation.
- Design and organization of the code.
We are going to focus on coding conventions and documentation first, and on design and organization in the future.
Both coding conventions and code documentation promote readability: how easy it is for someone to read your source code and understand it. Let’s look at these topics separately.
9.1 - Coding conventions
Motivation
“Readability counts.”
– Tim Peters, long-time Python contributor, The Zen of Python
You were probably taught to give your variables descriptive names, such as total = price + tax, as opposed to t = p + tax. But sometimes you are told there are traditional variable names, like:
for i in range(1, 4):  # i is the outer loop index
    for j in range(1, 4):  # j is the inner loop index
        print(i, j)
Consider the following code with poor variable names and improper spacing:
def bs(a,x):
    there=False
    fst,lst = 0,len(a)-1
    while fst<=lst and not there:
        mid=(fst+lst)//2
        if x<a[mid]:
            lst=mid-1
        elif x>a[mid]:
            fst=mid+1
        else:
            return True
    return False
As a developer, it would certainly take me a minute to figure out what this function does. Better names would go a long way for sure. But also, the improper spacing makes it needlessly difficult to see what each line is doing. In Python, most operators should have a single space on each side. For example, lst=mid-1 should be lst = mid - 1.
Now compare to a properly named, properly spaced solution:
def binary_search(lst, target):
    found = False
    first, last = 0, len(lst) - 1
    while first <= last and not found:
        mid = (first + last) // 2
        if target < lst[mid]:
            last = mid - 1
        elif target > lst[mid]:
            first = mid + 1
        else:
            return True
    return False
Coding conventions in Python
Coding conventions are the rules for naming, spacing, and commenting adopted by an organization. These conventions are often language-specific. Google has coding conventions for many languages that they expect their developers to follow, for example. Many organizations will use their own conventions. One of the nice things about coding conventions is that they can be checked by tools in the IDE to let you know if you’re violating them.
The creators of Python have published a set of coding conventions for the whole language, called PEP 8 - Style Guide for Python Code, which we will follow in this class.
The sections below are a subset of the rules that I consider the most impactful on readability.
Naming rules
- Variable and function names are lowercase_with_underscores only.
- Function names are verbs or begin with a verb.
- Variable and class names should be nouns.
- Class names are CamelCase beginning with an uppercase letter.
- File names (modules) are lowercase letters. You may use _ if it improves readability.
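Because naming rules are mechanical, tools can check them. Linters such as Pylint do this for real projects; the sketch below is a hypothetical, much smaller checker built on Python's standard ast module that flags function and class names breaking the rules above. The regular expressions are simplified assumptions, not PEP 8's exact definitions.

```python
import ast
import re

# Simplified patterns for the naming rules (assumptions, not PEP 8's full text).
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")   # lowercase_with_underscores
CAMEL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")  # CamelCase, leading capital


def check_names(source):
    """Return a list of naming-rule violations found in Python source code."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and not SNAKE_CASE.match(node.name):
            problems.append(f"function {node.name!r} should be lowercase_with_underscores")
        elif isinstance(node, ast.ClassDef) and not CAMEL_CASE.match(node.name):
            problems.append(f"class {node.name!r} should be CamelCase")
    return problems


sample = """
class bird_counter:
    def CountBirds(self, sightings):
        return len(sightings)
"""

for problem in check_names(sample):
    print(problem)
```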
Blank lines
- Surround top-level function and class definitions with two blank lines.
- Method definitions inside a class are surrounded by a single blank line.
- Extra blank lines may be used, sparingly, to separate groups of related functions.
- Use blank lines in functions, sparingly, to indicate logical sections.
- Otherwise, avoid unnecessary blank lines!
Whitespace within lines
- Do not put whitespace immediately inside parentheses, brackets, or braces.
  - Do: spam(ham[1], {eggs: 2})
  - No: spam( ham[ 1 ] , { eggs: 2 } )
- Do not put whitespace immediately before a comma, semicolon, or colon:
  - Do: if x == 4: print(x, y); x, y = y, x
  - No: if x == 4 : print(x , y) ; x , y = y , x
- Most operators get one space around them.
- Otherwise, avoid unnecessary whitespace!
Summary
Consistently applying coding conventions makes your code easier to understand.
We can use tools to help enforce coding conventions, and we will do so soon. For now, concentrate on learning the Python naming and spacing conventions above.
Knowledge check
- Define coding conventions.
- What are the PEP 8 violations in the following code block? How do you fix them?
class patient:
    def __init__(self,firstName,lastName,age):
        self.firstName=firstName
        self.lastName=lastName
        self.age=age
    def computeBill(self,fee,interest):
        return fee*(1+interest)
    def printRecord(self):
        print(f"{self.firstName} {self.lastName} {self.age}")
if __name__ == "__main__":
    bob = patient('bob', 'bobberton', 55)
    bob.printRecord()
9.2 - Documenting code
Motivation
Comments in code provide a way for you to leave notes to yourself and others about what your code does. These are very useful, if not essential, in a team setting. The term code documentation in general refers to the set of comments in source code that, hopefully, explain something about that code.
Code documentation is a double-edged sword. Done well, it helps you and others understand your code. Done poorly, it provides no value and can even mislead. Further, code documentation needs to be updated when the code is updated!
Three simple rules
We want our code documentation to be clear and concise, just like the code itself. Here is what we will focus on documenting.
- Code should be self-documenting to the greatest extent possible.
- Document the purpose of classes and modules (files).
- Document the purpose, parameters, return values, and exceptions of functions.
You can apply these rules to almost any language you encounter, and you will find that the recommendations for writing class and function comments differ by language.
Self-documenting code
Self-documenting code is a popular term for “I can look at the code and understand its purpose.” How do you achieve that?
Naming
Use descriptive variable, function, and class names according to your team’s coding conventions.
Variables and classes should be nouns that describe the data.
- Keep them short and concise, say, 16 characters max. Shorter is better.
- Use plural nouns to represent lists, sets, and other collections.
- Do not use built-in names for variables, like max, min, sum.
- Examples:
  - for name in birds: where birds is a list of strings.
  - total = sum(scores)
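Here is a small sketch of why built-in names make poor variable names; both function names are hypothetical. Inside average_with_shadowing, assigning to sum hides Python's built-in sum() function, so the later call fails with a TypeError. average_score uses a descriptive name instead and keeps sum() available.

```python
def average_with_shadowing(scores):
    sum = 0  # BAD: this local variable hides the built-in sum() function
    for score in scores:
        sum += score
    return sum(scores) / len(scores)  # TypeError: 'int' object is not callable


def average_score(scores):
    total = sum(scores)  # GOOD: descriptive name, built-in sum() still works
    return total / len(scores)


print(average_score([90, 85, 77]))  # 84.0

try:
    average_with_shadowing([90, 85, 77])
except TypeError as err:
    print("TypeError:", err)
```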
Functions should be verbs or start with a verb. They should describe what the function does.
- Again, strive to be concise.
- If a phrase better describes the function, split the words with underscores (Python convention), such as compute_average_score(). In Java, you would use camelCase, such as computeAverageScore().
Comments
In-line comments are useful but should not be abused. Use in-line comments to:
- Summarize a complex block of code.
- Explain an implementation or design choice.
Do not write a comment for every line. A programmer proficient in the language should be able to understand your code if you use good variable names and your logic is clear. In cases where the logic is unclear or convoluted, a code comment is warranted to explain your implementation.
Docstrings
In Python, we document modules (.py files), classes, and functions with docstrings. Docstrings are part of the Python language syntax.
Some tools look for docstring content in a particular format. These tools can give you pop-up information about a module, class, or function:
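Those pop-ups work because a docstring is stored on the object it documents, where any tool can read it at runtime. A minimal sketch with a hypothetical count_birds function: __doc__ holds the raw string, inspect.getdoc returns it with indentation cleaned up, and help() prints it the way the interactive help system does.

```python
import inspect


def count_birds(sightings):
    """Return the total number of birds in a list of sighting counts."""
    return sum(sightings)


print(count_birds.__doc__)          # the raw docstring
print(inspect.getdoc(count_birds))  # cleaned-up docstring, as tools use it
help(count_birds)                   # built-in help shows the docstring too
```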
Installing autoDocstring
We will install a Visual Studio Code extension to make writing docstrings simpler.
Go to the Extensions pane on the left side or press Ctrl+Shift+X.
Search for autoDocstring and install the extension by Nils Werner.
Creating docstrings for a module/file
On the first line of the file, put something similar to the following:
"""This module contains functions useful for counting birds."""
That’s it. You can add multi-line docstrings where needed like so:
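By the PEP 257 convention, a multi-line docstring starts with a one-line summary, then a blank line, then the details. The module below is a hypothetical bird-counting example; only its docstring layout is the point.

```python
"""Functions useful for counting birds.

This module summarizes bird sightings. Each sighting is a
(species, count) tuple, for example ("cardinal", 3).
"""


def total_by_species(sightings):
    """Return a dict mapping each species to its total count."""
    totals = {}
    for species, count in sightings:
        totals[species] = totals.get(species, 0) + count
    return totals


print(total_by_species([("cardinal", 3), ("robin", 1), ("cardinal", 2)]))
```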
Place your cursor on the first line of the file (for modules), just below the class name, or just below the function name. Then type """ and press Enter. autoDocstring will create a template for you.
Creating docstrings for a class
Place a blank line below the class name line and type """. autoDocstring will prepare a template for you.
Simply replace the word _summary_ with whatever you want to say. Be concise and state the purpose of the class. Use multiple lines if desired.
Creating docstrings for a function
Place a blank line below the function name and type """. autoDocstring will prepare a template for you.
def __init__(self, name, age, weight, height):
    """_summary_

    Args:
        name (_type_): _description_
        age (_type_): _description_
        weight (_type_): _description_
        height (_type_): _description_
    """
    self.name = name
    self.age = age
    self.weight = weight
    self.height = height
autoDocstring will create a _summary_ area to explain the purpose of the function. It will have an Args region for you to describe the types and purpose of each argument. It will also create a Returns region if your function returns a value, and a Raises region if your function explicitly raises exceptions.
Fill in the contents like so.
def __init__(self, name, age, weight, height):
    """Constructor for the Patient class.

    Args:
        name (str): first and last name
        age (int): age in years
        weight (int): weight in pounds
        height (int): height in inches
    """
    self.name = name
    self.age = age
    self.weight = weight
    self.height = height
Now with your docstrings set up, you will see helpful pop-ups in your IDE when you type class and function names!
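Functions that return values or raise exceptions get extra sections. Below is a sketch in the Google docstring style (autoDocstring's default format) for a hypothetical compute_average_score() function, showing all four parts of a function docstring: the summary, Args, Returns, and Raises. The exact section order autoDocstring generates may differ from this sketch.

```python
def compute_average_score(scores):
    """Compute the average of a list of test scores.

    Args:
        scores (list[float]): the scores to average

    Returns:
        float: the arithmetic mean of scores

    Raises:
        ValueError: if scores is empty
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores)


print(compute_average_score([80, 90, 100]))  # 90.0
```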
Knowledge check
- When are the two cases where an in-line comment is appropriate?
- In Python, why is sum a bad variable name?
- Why is doc() a bad function name?
- For which three Python program elements do you write docstrings?
- What are the four possible elements of a function docstring?
- Does the docstring go inside or above the program element?
10 - 10. Code-level Design
Motivation
We make references to “writing code the right way”, but that is secondary to getting the correct answer. After all, how can you get a good grade if it doesn’t work?
In software engineering, everything needs to work, but doing it the right way is equally important. Why?
- Because you are on a team, and someone else may have to understand and edit your code, including your future self. We call this understandability.
- Poorly-implemented solutions are more difficult to change without introducing bugs. We call this maintainability.
- Poorly-implemented solutions may work with small data, but become intolerable with millions of records. We call this efficiency.
- Overly-specific solutions that make assumptions about the data will break when encountering “the real world”. Avoiding this is called robustness.
The Rules
These characteristics are the result of your code design. The labs in these sections will go through code-level design principles that you, the developer, are responsible for when writing code.
We will start with a relatively simple program then add functionality to it.
In extending this program, we will implement the following design rules that will help improve the understandability, maintainability, efficiency, and robustness of the software:
- Separate input/output logic from business logic.
- Functions should have a single responsibility.
- Handle errors at the lowest sensible level, and re-raise/re-throw them otherwise.
- Raise specific errors and define your own if needed.
- Avoid magic literals.
- DRY (Don’t Repeat Yourself) and the Rule of Three.
You should write those down. We will explore them in-depth in turn.
Example program
Imagine you are a bank teller working an old command-line console that provides access to customers’ bank accounts. The program does not do much right now, but we will add to it.
Do the following:
- Create a subdirectory named bank-accounts/ in your seng-201/ directory.
- Download the following files and put them in the bank-accounts/ directory:
  - process_accounts.py: the main program file that you run.
  - bankaccount.py: defines a BankAccount class that is used by the program.
  - accounts.csv: a plain text file in Comma-Separated Values (CSV) format. Open it in a text editor (like Visual Studio Code) and also in a spreadsheet program like Excel or Google Sheets. CSV is a common way of sharing tabular data in plain text.
    - You can read the CSV file in Python the same way you do a plain text file.
    - The first line of the CSV file contains the column headers: descriptive names for each comma-separated value.
    - Each line of the accounts.csv file represents one bank account.
    - The format for each line is <AccountNumber>,<FirstName>,<LastName>,<AccountBalance>,<DateOpened>,<DateOfLastTransaction>
  - accounts_bad_numbers.csv: a data file with strings where there should be numbers.
  - accounts_missing_columns.csv: a data file that contains only <AccountNumber>,<FirstName>,<LastName>
  - accounts_expanded.csv: more columns added.
  - progress.jpg: a file we will use for testing.
- Run process_accounts.py. Select the menu option to view an account, then enter an account number from the CSV file. You should see the account data. Some sample accounts:
  - 796505
  - 872934
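As a sketch of what reading accounts.csv can look like, the code below uses Python's standard csv module, which splits each line on commas for you. It assumes the column order described above; the header names and the two data rows are made up for illustration and are not taken from the real file. With the real file you would pass open("accounts.csv", newline="") instead of the in-memory sample. read_accounts is a hypothetical helper, not part of process_accounts.py.

```python
import csv
import io

# Hypothetical sample in the documented column order (not the real file's data).
SAMPLE = """AccountNumber,FirstName,LastName,AccountBalance,DateOpened,DateOfLastTransaction
100001,Ada,Lovelace,250.00,2020-01-15,2024-09-30
100002,Alan,Turing,1200.50,2019-06-01,2024-10-02
"""


def read_accounts(file):
    """Return a dict mapping each account number to the rest of its row."""
    reader = csv.reader(file)
    next(reader)  # skip the header line
    accounts = {}
    for number, first, last, balance, opened, last_txn in reader:
        accounts[number] = (first, last, float(balance), opened, last_txn)
    return accounts


accounts = read_accounts(io.StringIO(SAMPLE))
print(accounts["100001"])
```

On a file like accounts_bad_numbers.csv, the float() call raises a ValueError, which is the kind of situation the error-handling rules listed earlier address.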
Rules 1–2
Make sure you have all the files from above in a bank-accounts/ directory.
End of Day 1
Here is the code for process_accounts.py at the end of the first lecture: process_accounts.py.
Rules 3–4
End of Day 2
Here is the code for process_accounts.py at the end of the second lecture: process_accounts.py.
accounts_expanded.csv: more columns added.
11 - 11. Version Control
Coding is an incremental activity. You write code, it’s a little broken, you fix it. You work on the next thing, it’s a little broken, you fix it. And so forth until you’re “done”.
During the coding process, you have probably done the following:
- Saved a copy of the file at a point when you know it just works. Then you keep coding.
- Wanted to go back in time to a point when everything did work so you can start over.
- Had to email or otherwise share your code files between computers.
Version Control Systems (VCSes) are systems that manage changes to source code, documents, and other files over time. VCSes are also how most teams store and share their code on a shared project. VCSes are essential to software engineering.
A VCS is a computer application, the most popular of which is Git, created by Linus Torvalds, the creator of Linux. All VCSes, including Git, have the following features:
- The ability to make a version: a snapshot of the project files at the current time.
- The ability to revert to an earlier version.
- The ability to compare versions of the project files to see their differences.
- The ability to share versions with a central repository that multiple people can access.
Importantly, it is up to the programmer to decide when to create a version, when to revert, and when to share. This is in contrast to your OS or an app like OneDrive or Google Drive, which do some of these things automatically.
We will use Git and GitHub in this class as our VCS. We will start by setting up these tools on your computer.
11.1 - Git and GitHub setup
Git is the world’s most popular version control system. GitHub is a cloud service that hosts shared code repositories.
We will set these up and then delve further.
Git installation
Git is available for all operating systems. We will install it for WSL/Linux and Mac.
Mac: Git is bundled with the Xcode Command Line Tools, which you may already have on your Mac. Open a Terminal and run:
git --version
If you don’t have it installed already, it will prompt you to install. If you do have it installed, you will see something like git version 2.39.5 (Apple Git-154).
WSL/Linux: Run the following in a Terminal:
sudo apt update # You will be prompted to enter your login password
sudo apt install git-all
Git configuration
Close any open Terminals. Run the following in a new Terminal.
git config --global user.name "John Doe" # Put your real name
git config --global user.email johndoe@example.com # Put a permanent email here
git config --global core.editor "code --wait" # Use Visual Studio Code as the editor for commit messages.
You only run these once when you install Git.
GitHub set up
We will use GitHub in this class to remotely store versions of our code. Many organizations use GitHub to store their code, including many popular open source projects.
Use a permanent, personal email account to register for a free GitHub account at https://github.com. You will eventually lose access to your UNCW email, but you will want to access your GitHub account long after you graduate.
That’s all you need to do for now. We will use GitHub soon.
11.2 - Git basics
The Git VCS stores versions in repositories. You will typically have one repository for each project. For example, you would have a repository for Assignment 2, a separate repository for Assignment 3, etc.
Git divides the world into three parts to facilitate tracking and sharing versions.
The workspace or working directory is the directory on your computer where the project resides, e.g., seng-201/assignment3/. You work on your files in this directory as usual.
A local repository is a special hidden directory within the workspace where Git stores the version history and other information. The local repository is created by the Git program. You interact with the local repository using git commands to create new versions, compare files, and revert back to earlier versions.
A remote repository is a copy of the local repository on a computer somewhere else. In this class, the copy will be kept on GitHub, but software companies may have their own hosts. The remote repository enables teams to share project changes and to restore the project if something terrible happens to someone’s computer.
You must learn and understand the relationship between these entities to master Git. Tools like OneDrive and Google Drive have similar concepts, but what distinguishes Git from those tools is that you decide when to save and share changes to your project between these entities.
Keeping a version history
We will start with the simplest use case for a VCS: we want to keep a historical timeline of versions. A version is a snapshot of files in the workspace at a point in time.
Step 1. Start with a directory
Create a subdirectory called speakeasy in your seng-201/ directory. Change into the speakeasy directory.
Open the directory in Visual Studio Code. Create a file named main.py with the following:
main.py
print("Welcome to the Speakeasy!")
print("Did you know? The term 'speakeasy' was coined during Prohibition in the United States.")
mocktails = ["Virgin Mojito", "Cucumber Lemonade", "Pineapple Ginger Beer", "Berry Spritzer"]
print("\nToday's Mocktail Menu:")
for drink in mocktails:
    print(f"- {drink}")
print("\nThank you for visiting! Come again soon.")
We have created only the workspace – no Git yet:
Step 2. git init
We need to initialize Git for each project. In the Terminal:
- Make sure you are in the speakeasy/ directory.
- Run the command git init
- You will see output like Initialized empty Git repository in /Users/laymanl/seng-201/speakeasy/.git/
This command initializes the local repository within the working directory. The local repository is created within a hidden .git/ subdirectory. Run the command ls -al to see the .git/ subdirectory. You may be able to see the .git/ subdirectory in your file browser, but it will not show up in Visual Studio Code.
Git is now monitoring the workspace for changes to files and subdirectories. You only need to run git init once to track a new project and any subdirectories under that project.
Undoing git init
First, you should not keep Git repositories in directories that are in OneDrive, Google Drive, or the like. You can run into weird authentication errors.
Second, do not nest Git local repositories.
If you ran git init in the wrong place, find that hidden .git/ directory and delete it. This will remove the Git repository (and all of its history), but will not change the workspace files.
Checking where you are: git status
Run the command git status. You should see something like:
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
main.py
nothing added to commit but untracked files present (use "git add" to track)
git status is useful for understanding the state of your workspace and local repository. Breaking down the contents:
- On branch main: we will discuss this in a future lab. Ignore for now.
- No commits yet: Git is telling us we have not created a version yet. We have to do this manually.
- Untracked files...: Git says there are new files in the workspace that it is not yet tracking, i.e., files that are not part of any version yet.
Step 3. Creating the first version
Creating a version always requires two separate commands. Run the following in the Terminal:
git add main.py
git commit -m "First commit of main.py"
- git add [file]: Adds a changed file to the index.
  - The index is the list of files that will be saved to the version.
  - It is possible to change, say, 10 files, but only save 5 of them to the version. The index lets you be selective if you need to.
- git commit -m "<message>": Commit your changes to a new version.
We have just created a new version: a snapshot of project files at a point in time. We have added and committed main.py to a new version in the Git local repository. We can now, if we want, restore main.py to this version in the future.
Step 4. Creating another version
Add the following line to main.py:
print("Don't forget to tip your server!")
Create a new file named README.md with the following contents:
README.md
This is my first project!
Run git status. You will see something like:
On branch main
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: main.py
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
no changes added to commit (use "git add" and/or "git commit -a")
This is the current status. We have created a new file, but we have neither added it to the index nor committed it. We also haven’t added or committed our changes to main.py yet. Remember, everything in Git is manual, and this is by design.
The Changes not staged for commit: section tells us which files have changed in the workspace but haven’t been added to the index. We also see the Untracked files: section, which tells us that README.md is a new file with no version history.
Let’s commit them both at once. Run the following:
git add .
git commit -m "Added message and README file"
The command git add . tells Git to add ALL changes, additions, and deletions in the current directory and its subdirectories. This is how you should get a snapshot of all changes to your project.
We have now created a new version. Our Git looks like this:
Differences
It is very important that you understand that Git does not store entire copies of files. You cannot go into the hidden .git/ directory and simply copy “version 1” of your files.
Git stores file differences. It compares Version 2 of your files to Version 1 to see what has changed, and stores the set of changes. This set of changes is called a difference or a diff for short. Storing only the differences makes Git more space efficient, and also enables some very useful comparison tools that we will explore later.
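To see what a diff looks like, here is a minimal sketch using Python's standard difflib module on a two-line excerpt of main.py, before and after adding the tip line in Step 4. It prints a unified diff, the same format git diff uses: lines starting with + were added, lines starting with - were removed, and unmarked lines are context. This only illustrates the idea of a diff; it is not how Git computes or stores anything internally.

```python
import difflib

# A two-line excerpt of main.py before Step 4 (version 1) ...
version_1 = [
    "for drink in mocktails:\n",
    '    print(f"- {drink}")\n',
]
# ... and after adding one line (version 2).
version_2 = version_1 + ["print(\"Don't forget to tip your server!\")\n"]

diff_lines = list(difflib.unified_diff(version_1, version_2,
                                       fromfile="a/main.py", tofile="b/main.py"))
print("".join(diff_lines), end="")
```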
Step 5: Viewing history with git log
In your Terminal, type git log. You will see something like this (press q to exit the log viewer):
commit b424cc472f7276dc35493abbd186563a191ca25b (HEAD -> main)
Author: Lucas Layman <laymanl@uncw.edu>
Date: Mon Oct 21 15:21:44 2024 -0400
Added message and README file
commit 8356ea035b8d6538f9ea4eabe2393d6cd6016553
Author: Lucas Layman <laymanl@uncw.edu>
Date: Mon Oct 21 15:13:00 2024 -0400
First commit of main.py
(END)
Each block is a version. The versions are not numbered 1, 2, 3, etc., but are identified by a unique hash like b424cc472f7276dc35493abbd186563a191ca25b. They are shown in reverse chronological order.
git log shows you the version history of the local repository only. It is useful for seeing what work has been done recently.
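Those hashes come from SHA-1 (Git's default; it can also be configured to use SHA-256). Git computes an ID from an object's contents, so identical contents always get the same ID and any change produces a different one. A commit ID like the ones above is the hash of the commit's own data (its project snapshot reference, parent commit, author, date, and message). The sketch below shows the simpler case of a single file's contents, computed the way git hash-object does it; git_blob_hash is a hypothetical helper name.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to a file with these contents."""
    header = f"blob {len(content)}\0".encode()  # Git prefixes the type and size
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
print(git_blob_hash(b"Welcome!\n"))
```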
Important concept review
The workspace is the directory on your filesystem that your project lives in. You code here. The files you see in Visual Studio Code and your File Explorer are the workspace. When you make changes to files, they are immediately saved in the workspace because the workspace is synonymous with your filesystem.
The local repository is Git’s history of versions. Versions are snapshots of the workspace files at a point in time. The developer must manually add and commit changes to create a version.
Git does not store entire copies of files, but rather the differences from one version to the next.
Summary of the process
To create a version history for a project (a directory), do the following:
1. Run git init to create the local repository.
2. Make changes to files: adding new files, editing existing files, deleting files.
3. Run git add . to stage all changes in the index.
4. Run git commit -m "<message>" to save the version to the local repository.
5. Repeat steps 2–4.
Knowledge Check
- (Question) Explain the purpose of the local repository and how it differs from the workspace.
- (Question) What is the function of a remote repository in Git?
- (Question) Describe the significance of the .git/ directory.
- (Question) What happens when you run git init in a directory?
- (Question) What does the git status command show you?
- (Question) What is the purpose of the Git index (staging area)?
- (Question) What command do you run to add something to the staging area?
- (Question) What command do you run to save a new version in the local repository?
- (Question) What happens if you try to save a new version without staging first?
- (Question) True or False: A version is a copy of the entire file that was changed.
- (Question) Versions in Git are not numbered sequentially as in Version 1, Version 2. How are versions uniquely identified in Git?
- (Challenge) Create a new directory, initialize it with Git, and create a new file. Commit changes to track the file.
- (Challenge) Make modifications to multiple files and use git status to see the changes. Commit only a subset of the changes.
- (Challenge) Describe how you would undo an incorrect git init operation.
11.3 - Undoing mistakes with Git
One of Git’s powers is being able to “go back in time” to a previous version to undo a terrible mistake or simply to start fresh.
How to identify the scenario that applies to you
We will walk through some common scenarios where you might want to undo your work and reset to a known safe state.
“Going back in time” depends on what you want to change and the current state of your repository in terms of (a) what’s changed in the workspace, (b) what is staged in the index, and (c) what has been committed to the local repository.
Use the git status command to identify staged and unstaged changes, and git log to check the local repository's version history.
Starting state
In Lab: Git Basics, we created a Git repository for a simple speakeasy/ project. We added two files, main.py and README.md, and committed two versions. We will pick up our example from this point.
Oops #1: Deleted something from the workspace
- Open Visual Studio Code for the speakeasy/ folder.
- Now delete main.py.

Let’s say you want to recover what you just deleted. This scenario may involve one file, many files, directories, or anything in the project folder. So when I use the word “file” below, I mean any of those things.
Your options depend on whether the file has been staged with git add or committed at some point in the past.
If the file has been staged before
- First try using your IDE’s undo feature: CTRL+Z or CMD+Z. If you see the file reappear, you are good to go.
- If undo doesn’t work, use git restore [name]. Git will place a copy in the workspace.
If the file has not been staged
- Try using your IDE’s undo feature.
- If that doesn’t work, check your operating system’s “trash can”.
- Sorry. It’s gone.
Oops #2: Undoing unstaged changes
Suppose you’re editing a file tracked by Git. You don’t like what you’ve done, and you want to start over from the most recent version.
- Make sure main.py is back in your workspace.
- Add the following code to main.py:

import random

def silly_compliment():
    compliments = [
        "You're as useful as a screen door on a submarine, but twice as fun!",
        "Your brain is like a sponge... except it soaks up memes more than facts!",
        "You're as rare as a unicorn at a hotdog stand."
    ]
    return random.choice(compliments)

- Save the file.
- Add the line I like working on it! to README.md and save the file.
- Make a new file hello.py and add print("Hello world!") to it.
- Run git status.

git status tells you that main.py and README.md have been modified but are not staged, and it tells you that hello.py is new and untracked.
Our changes are only in the workspace; they are not staged in the index yet.
Now, let’s undo some changes:
- Run the command git checkout -- main.py to reset the file to the most recent version, in this case the version b424cc.
  - The contents of main.py will change in the editor.
  - Notice that hello.py and README.md are unchanged. This is because we specified main.py as the target of git checkout --.
- Restore the changes to main.py by undoing with CTRL+Z or CMD+Z.
- Now run the command git checkout -- .
  - Notice that both main.py and README.md reset to their previous version. This is because we specified the target ., which is shorthand for “the current working directory”. Both main.py and README.md are tracked by Git, so they both reset.
  - However, hello.py is untracked by Git, so it is unaffected.
After running these commands, hello.py is a new file that Git is not tracking, and both README.md and main.py are as they were in the most recent committed version.
Now what if you want to get rid of an untracked, unstaged file like hello.py? Just delete the file!
The checkout -- command replaces the workspace files with the versions of those files in the index. Because nothing was staged, the index matches the most recent commit, so the files end up as they were in b424cc.
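Here is a toy model of git checkout -- <paths>. It assumes files are dictionary entries (name to text) and is not real Git. Tracked files, meaning those in the index, are overwritten with their index copy. Untracked files such as hello.py are skipped. The function name checkout_paths and the file contents are hypothetical, and passing every workspace file stands in for the . target.

```python
def checkout_paths(workspace: dict, index: dict, paths: list) -> None:
    """Toy model of `git checkout -- <paths>` (not real Git).

    Each named file that Git tracks (i.e., is in the index) is overwritten in
    the workspace with its index copy. Untracked files are left alone.
    """
    for path in paths:
        if path in index:
            workspace[path] = index[path]


# State after the Oops #2 edits; nothing is staged, so the index matches b424cc
index = {"main.py": "main.py as in b424cc\n", "README.md": "README as in b424cc\n"}
workspace = {
    "main.py": "main.py with silly_compliment()\n",
    "README.md": "README plus 'I like working on it!'\n",
    "hello.py": 'print("Hello world!")\n',
}

checkout_paths(workspace, index, ["main.py"])      # like: git checkout -- main.py
checkout_paths(workspace, index, list(workspace))  # like: git checkout -- .
```

After both calls, main.py and README.md match b424cc again, and hello.py is still there, unchanged.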
Oops #3: Undoing staged changes
Suppose you are adding, editing, or deleting files and you have run the git add . command to stage the changes in the index. You realize that you made a mistake, and you do not want to save those changes. You either want to work on them some more, or you simply want to start over.
We will start at the end of the previous scenario: main.py and README.md are unchanged and look like they do in the most recent version b424cc, while we added a new file hello.py that is not staged yet.
Run the following:
- Re-add the following code to main.py and save the file:

import random

def silly_compliment():
    compliments = [
        "You're as useful as a screen door on a submarine, but twice as fun!",
        "Your brain is like a sponge... except it soaks up memes more than facts!",
        "You're as rare as a unicorn at a hotdog stand."
    ]
    return random.choice(compliments)

- Run git add . to stage the changes to both main.py and the new hello.py file.
- Run git status.

main.py and hello.py are now in the index of changes we want to save to a new version, but we haven’t committed that new version to the local repository yet.
Suppose at this point that we need to do more work in hello.py and main.py. Maybe we’ve made a mistake, and we’re not ready to record these changes.
- Run the command git reset hello.py. This will unstage the file, meaning it will not be included in the commit until you run git add again.
- You can also run git reset . to unstage all staged changes. The files will be unchanged in your working directory.
The files still have all their changes in the workspace. You are ready to edit and fix up whatever you need.
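Toy dictionaries can also model git reset <paths>. For each path, the index entry is replaced with the copy from the most recent version (HEAD), or removed if the file is brand new. The workspace is never touched. This is an illustration, not real Git; the function unstage and the file contents are hypothetical.

```python
def unstage(index: dict, head: dict, paths: list) -> None:
    """Toy model of `git reset <paths>` (not real Git).

    Makes the index match the most recent version (HEAD) for each path.
    The workspace is not touched, so your edits are still there.
    """
    for path in paths:
        if path in head:
            index[path] = head[path]   # the staged edit is dropped from the index
        else:
            index.pop(path, None)      # a newly added file becomes untracked again


head = {"main.py": "main.py as in b424cc\n", "README.md": "README as in b424cc\n"}
workspace = {**head, "main.py": "main.py with silly_compliment()\n",
             "hello.py": 'print("Hello world!")\n'}
index = dict(workspace)              # after `git add .`, the index matches the workspace

unstage(index, head, ["hello.py"])   # like: git reset hello.py
unstage(index, head, list(index))    # like: git reset .
```

After both calls, the index matches HEAD again, while the workspace still holds the edited main.py and the new hello.py.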
Oops #4: Completely restart from the last version
This is a common scenario. You work for a bit and then decide that all the changes you have made are bad, and the easiest thing is just to start over.
You want to wipe out all the changes in both your workspace and the index. Be careful: once you do this, you can’t undo it.
Let’s start where we ended in the previous scenario: we’ve changed main.py and added the new file hello.py. These changes are not staged in the index yet.
Do the following:
- Run git status to see that we have unstaged and uncommitted changes.
- Run git reset --hard HEAD.
  - HEAD is a special reference that means “the most recent committed version”.
  - The --hard argument tells Git “destroy changes to tracked files in the workspace and the index”.

You should see output like:
HEAD is now at b424cc4 Added message and README file
b424cc4 is the most recent committed version in the local repository, and “Added message and README file” was the message for that version.
Run git status again. Notice that untracked files are unaffected. We have not added or committed hello.py, so it remains untouched. But main.py has been reset to its most recent version.
All together, git reset --hard HEAD says “reset the tracked files in the workspace and the index by replacing (--hard) their contents with the most recent version (HEAD)”.
Again, this is a destructive action. You cannot undo it once done. But, it is very useful for starting fresh. Your local repository is unaffected by the command.
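Here is a toy model of git reset --hard. It assumes files are dictionary entries (name to text) and is not real Git. Every tracked file, meaning one in the index or in the target version, is forced to match the target. A file that is only staged and absent from the target is deleted. The index is then replaced by the target snapshot. Untracked files are never looked at, which is why hello.py survives. The function name reset_hard and the file contents are hypothetical.

```python
def reset_hard(workspace: dict, index: dict, target: dict) -> None:
    """Toy model of `git reset --hard <target>` (not real Git).

    Tracked files (in the index or the target version) are forced to match the
    target: overwritten, or deleted if the target lacks them. Untracked files
    are left alone. The index is then replaced by the target snapshot.
    """
    tracked = set(index) | set(target)
    for path in tracked:
        if path in target:
            workspace[path] = target[path]
        else:
            workspace.pop(path, None)
    index.clear()
    index.update(target)


head = {"main.py": "main.py as in b424cc\n", "README.md": "README as in b424cc\n"}
index = dict(head)  # nothing is staged, so the index still matches HEAD
workspace = {
    "main.py": "main.py with silly_compliment()\n",
    "README.md": "README as in b424cc\n",
    "hello.py": 'print("Hello world!")\n',  # untracked: in neither index nor HEAD
}
reset_hard(workspace, index, head)   # like: git reset --hard HEAD
print(sorted(workspace))             # ['README.md', 'hello.py', 'main.py']
```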
Oops #5: Undoing the most recent commit
You have run git add . and then git commit -m "<message>". Committing saves a new version to the local repository.
Maybe you are unhappy with the version and you want to edit your work. Maybe you forgot to add a file that needed to be there. In these cases, the simplest thing is often to make the changes and just make another commit.
Your committed versions should be “good code”: bug free, compiles, works. However, sometimes you commit a mistake. You find a terrible bug in your code, or you committed a syntax error and didn’t notice. These scenarios call for you to undo the commit.
Starting from the previous scenario, we have hello.py in the workspace but untracked. Let’s introduce a bug to main.py:
- Open main.py and add the line tip = float(input("Enter a tip amount: "))
- Make sure to save main.py.
- Run git add .
- Run git commit -m "Enable user to type a tip amount"
You will see output like:
[main 81a55e5] Enable user to type a tip amount
2 files changed, 3 insertions(+)
create mode 100644 hello.py
We should now have three versions in our local repository. Run git log to see them.
We realize that we have committed a bug: tip = float(input("Enter a tip amount: ")) will crash the program with a ValueError if the user types a non-numeric value for the tip, like "one dollar". We want to undo the commit so we can fix the bug and keep our version history containing only “good code”.
You have two options here:
- You may have some changes to your workspace that you want to keep. Maybe you want to keep hello.py, or maybe your code in main.py is pretty good and you just want to fix it up a little bit.
- Your last commit was a total disaster. You don’t want to keep any changes you made to main.py or hello.py. You want to completely throw away the most recent version and go back to the one before it.
Option 1: Preserve your work, fix it, then make a new commit.
Run the command git reset HEAD~1. You will see output like:
Unstaged changes after reset:
M main.py
Now run git log. You will see something like:
commit b424cc472f7276dc35493abbd186563a191ca25b (HEAD -> main)
Author: Lucas Layman <laymanl@uncw.edu>
Date: Mon Oct 21 15:21:44 2024 -0400
Added message and README file
commit 8356ea035b8d6538f9ea4eabe2393d6cd6016553
Author: Lucas Layman <laymanl@uncw.edu>
Date: Mon Oct 21 15:13:00 2024 -0400
First commit of main.py
Notice that git log only shows two versions! What have we done?
The command git reset HEAD~1 tells the local repository to “forget” the most recent version, as if it never happened. HEAD~1 means “one version before HEAD”, so the active branch now points to that earlier version.
However, the files in your workspace are unchanged! All the edits and additions are still there for you to work with; they are just no longer staged or committed. That is why Git reported “Unstaged changes after reset”.
Now you have the opportunity to fix up those files, add them, and commit them.
Option 2: Disaster! Delete the last version and reset all the files
This is just like Oops #4 where you reset the tracked files, but you also want to destroy the most recent commit.
The command to do this is git reset --hard HEAD~1. This command is destructive and you cannot undo the consequences.
Assuming you have changes to main.py and hello.py from the previous scenario:
- Run git add . and git commit -m "Enabling the user to enter a tip" to stage and commit a new version.
- Run git reset --hard HEAD~1.
- Run git log to see the version history.
Because hello.py was included in the bad version, Git was tracking it, so git reset --hard HEAD~1 deletes it from the workspace. main.py and README.md are reset to their version 2 status. We’ve also deleted the bad version.
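The two options differ only in whether the workspace is touched. This toy model is not real Git. It uses a hypothetical function, reset_last_commit, and stores files as name-to-text dictionaries. It treats history as a list of snapshots whose last entry plays the role of HEAD. Both modes drop the last version and reset the index to HEAD~1. Only the hard mode also rewrites the workspace, deleting files that were tracked but do not exist in HEAD~1.

```python
def reset_last_commit(history: list, index: dict, workspace: dict, hard: bool = False) -> None:
    """Toy model of `git reset HEAD~1` (hard=False) and `git reset --hard HEAD~1` (hard=True).

    history is a list of snapshots (file name -> text), oldest first; its last
    entry plays the role of HEAD. This is an illustration, not real Git.
    """
    removed = history.pop()      # the branch now points one version back...
    target = history[-1]         # ...so the old HEAD~1 is the new HEAD
    if hard:
        # --hard: every tracked file is forced to match the target version;
        # a file tracked in the removed version but absent from the target is deleted
        for path in set(index) | set(removed) | set(target):
            if path in target:
                workspace[path] = target[path]
            else:
                workspace.pop(path, None)
    index.clear()
    index.update(target)         # both modes make the index match the new HEAD


v1 = {"main.py": "first main.py\n"}
v2 = {"main.py": "main.py as in b424cc\n", "README.md": "README as in b424cc\n"}
bad = {**v2, "main.py": "main.py with the tip bug\n", "hello.py": 'print("Hello world!")\n'}

# Option 1: git reset HEAD~1 keeps the work in the workspace, unstaged
history, index, workspace = [v1, v2, bad], dict(bad), dict(bad)
reset_last_commit(history, index, workspace)
print(len(history), sorted(workspace))    # 2 ['README.md', 'hello.py', 'main.py']

# Option 2: git reset --hard HEAD~1 throws the work away
history2, index2, workspace2 = [v1, v2, bad], dict(bad), dict(bad)
reset_last_commit(history2, index2, workspace2, hard=True)
print(len(history2), sorted(workspace2))  # 2 ['README.md', 'main.py']
```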
Recap
Git has even more functionality for “going back in time”, such as going back two, three, or more versions in the past. Or undoing multiple commits at once. Those use cases can be tricky to do correctly without unintended consequences.
For now, the “Oops” scenarios above will be sufficient 95% of the time as you develop your Git skills:
- Deleted a file from the workspace: Undo (CTRL+Z/CMD+Z) or git restore <filename>
- Undoing unstaged (not added) changes: git checkout -- <filename>
- Undoing staged (added) changes: git reset <filename>
- Completely restart from the last version: git reset --hard HEAD. This is destructive!
- Undoing the most recent commit:
  - and keep your work: git reset HEAD~1
  - and throw away work: git reset --hard HEAD~1. This is destructive!
Knowledge check
- (Question) Describe how git status and git log help identify a repository’s state.
- (Question) What command would you use to recover a deleted file that was previously staged or committed?
- (Question) How does git checkout -- <file> differ from git restore?
- (Question) Explain how to undo changes that are staged but not committed.
- (Question) What happens to untracked files when you run git checkout -- .?
- (Question) Which command do you run to completely reset your working directory to the most recent version?
- (Question) Which command do you run to destroy/remove the last version in the local repository?
- (Challenge) Simulate deleting a file and use Git commands to recover it.
- (Challenge) Experiment with staging changes, then undo them.
11.4 - Branching and Merging, Part 1
One of Git’s main features is branching: the ability to create parallel timelines in version history, and then merge them together later.
The circles in the illustration represent versions. The lines indicate different branches. We will build a similar diagram below while introducing branching concepts.
Why branching? It allows version histories to be a little dirty, or only incrementally complete. Then we share when we’re happy and done.
This feature is essential for working on a team, and also by yourself to preserve a “clean” main branch while updating functionality in parallel.
The active branch
Git has a notion of the active branch, which is the branch you are currently committing to. So far, you have only been committing to the main branch in our examples.
The main branch
Let’s create a new project:
- Create a directory git-branching in your seng-201/ directory.
- Change into the git-branching directory and run git init to initialize a new Git repo.
- Create the file app.py with the following content:

def main():
    print("Welcome to the main branch!")


if __name__ == "__main__":
    main()

- Run git add .
- Run git commit -m "first version"
Every Git repository has a default branch, usually called main. Older repositories, and Git installations that have not set the init.defaultBranch option, call it master. This branch becomes the active branch when you run git init.
In the Terminal window, you may see the text (main) in the command prompt, indicating that main is the active branch.
Visual Studio Code also displays the active branch in the bottom left corner.
Most software groups treat the main branch as the place where only robust, finished, shippable code lives. In many organizations, you are not allowed to commit directly to main. Instead, the expectation is that you work in a different branch and integrate with main when finished and approved.
Committing directly to main is fine for small personal projects that you don’t expect anyone else to use or that won’t live long. Most short class assignments fall into this category.
But you should use branches for any other scenario, even if working by yourself!
What is a branch?
Remember how we said that the special variable HEAD in Git is a pointer or reference to a specific version in the commit history? Usually, HEAD points to the most recent version of the active branch.
Branches, including the main branch, are additional named variables that point to a specific version. When you run git init, Git makes main the active branch, but main does not point to anything yet because there are no versions. When you make your first commit, main will point to the first version in your repository:
To branch or not to branch
Before you create a branch, you must decide what to do with any unstaged and staged changes.
When you create a new branch, uncommitted changes (unstaged and staged) are brought into the new branch. This is often desirable.
Suppose you start working on code and you realize “this is more complicated than I thought and going to take a lot of effort.” You can move these changes to a new branch, and the version history of your current branch will be unchanged.
You have several options:
- If you have no changes in the working directory, then you’re good to create a new branch.
- Stage and commit your changes if you want to save them as a new version in the active branch.
- Create a new branch if you want your staged and unstaged changes to come with you to the new branch, but you want the old branch, e.g., main, to be unchanged for now.
- Undo those changes using git reset or something similar.
You decide what’s best.
Creating a new branch
Run the command git checkout -b feature-1. You will see something similar to:
Switched to a new branch 'feature-1'
You have created a new branch named feature-1, and you have set the active branch to feature-1. The checkout command tells HEAD to point to feature-1, which makes feature-1 the active branch.
This means any committed changes will be saved to the version history of feature-1 but not to main. Your workspace state looks like the following:
We have not yet committed a new version, so all three variables (HEAD, main, and feature-1) are pointing to the first version.
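Because branches are just named pointers, a toy model needs only a dictionary from branch names to version IDs, plus a record of which branch HEAD names. This sketch is not Git's implementation. The class ToyBranches and the IDs v1, v2, v3 are hypothetical. It replays this lab: one commit on main, git checkout -b feature-1, two commits, then git checkout main.

```python
class ToyBranches:
    """Toy model (not real Git): branches are names that point at versions."""

    def __init__(self):
        self.parent = {}        # version id -> parent version id (None for the first)
        self.branches = {}      # branch name -> version id
        self.head = "main"      # HEAD holds the name of the active branch
        self.count = 0

    def commit(self) -> str:
        """Create a new version on the active branch and move that branch to it."""
        self.count += 1
        new_id = f"v{self.count}"
        self.parent[new_id] = self.branches.get(self.head)  # None before the first commit
        self.branches[self.head] = new_id
        return new_id

    def checkout_new(self, name: str) -> None:
        """Like `git checkout -b name`: a new pointer at the current version, made active."""
        self.branches[name] = self.branches[self.head]
        self.head = name

    def checkout(self, name: str) -> None:
        """Like `git checkout name`: make an existing branch the active one."""
        self.head = name


repo = ToyBranches()
repo.commit()                   # "first version" on main -> v1
repo.checkout_new("feature-1")  # HEAD, main, and feature-1 all refer to v1
repo.commit()                   # v2: only feature-1 moves
repo.commit()                   # v3: only feature-1 moves
repo.checkout("main")
print(repo.branches)            # {'main': 'v1', 'feature-1': 'v3'}
```

Committing moves only the active branch's pointer, which is why main stays at the first version while feature-1 advances.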
Remember: Why do we want to use branches? It allows version histories to be a little dirty, or only incrementally complete. Then we share when we’re happy and done. This feature is essential for working on a team, and also by yourself to preserve a “clean” main branch while updating functionality in parallel.
Committing a new version to the branch
Change app.py to the following:

def main():
    print("Welcome to the main branch!")
    feature_1()


def feature_1():
    print("Feature 1 activated!")


if __name__ == "__main__":
    main()
Add and commit the change:
git add app.py
git commit -m "Add feature 1 function"
Run git log, and you will see something like this:
commit 89c5985701b1a6b188d1c23fef3b0196dd17b34e (HEAD -> feature-1)
Author: Lucas Layman <laymanl@uncw.edu>
Date: Tue Oct 29 11:29:37 2024 -0400
Add feature 1 function
commit e436c51cd2760e9ef0d49a65472a404044c2d3c0 (main)
Author: Lucas Layman <laymanl@uncw.edu>
Date: Tue Oct 29 11:19:05 2024 -0400
first version
You are looking at the version history of the feature-1 branch. Note that the history is based on the first version from main.
Conceptually, our branch history looks like this:
The local repository looks like this:
A second commit
Let’s make another change and commit it to the feature-1 branch. Do the following:
- Replace app.py with the following:

import random


def main():
    print("Welcome to main!")
    feature_1()


def feature_1():
    print("Feature 1 activated!")
    print(f"Your random number is {random.randint(1,100)}.")


if __name__ == "__main__":
    main()

- Run git add .
- Run git commit -m "adding random number generation"
We now have two new versions in our feature-1
branch. Our repo and branch history look like this:
Switching between branches
Run the command git checkout main to switch back to the main branch. Notice there is no -b.
Question: What happens to the code in your IDE?
You should see that the contents of app.py are replaced with the contents as they were in the first version. Here is the current state of the repo:
Several things happened:
- checkout tells HEAD to point to main, and therefore to the same version as the main variable. This makes the main branch the active branch again.
- Git replaces the contents of the workspace with the files as they were at the main version.
- feature-1 is unaffected. The versions committed to feature-1 are still in the local repository, so we can go back to the files at that version by checking out the feature-1 branch.
Exercise: Check out feature-1 to verify that all your changes have been saved in that branch. Switch back to main when you are done.
Merging
Our repo reflects the most common use case for branches: you work on something in a branch for a while, you make it perfect, and you are now ready to bring your work into main. Remember, main should only contain clean, complete, “good” code.
You now want to merge your feature-1 branch into the main branch. Merging is the process of combining the histories of two branches.
Run the following:
- git checkout main to ensure that main is the active branch.
- git merge feature-1 to merge the feature-1 versions into main.
You will see output similar to:
(3.12.2) ➜ git-branching git:(main) git merge feature-1
Updating e436c51..b2f5622
Fast-forward
app.py | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
You will also see that your IDE’s editor contents for app.py contain all the changes from the most recent version of feature-1. Run the git log command and you will see that HEAD, main, and feature-1 all point to the most recent version from feature-1.
Here is the state of our repo:
Conceptually, we have created a new version of main that includes all the changes from the feature-1 branch. I say conceptually because we have not actually created a new version in the repo, but have updated the main variable to point to the same version as feature-1.
The feature-1 branch is still alive and well, and we can check it out and code against it.
How does merging work?
- Find the most recent common ancestor: Git first identifies the most recent common ancestor (base commit) of the two branches. This is where the branches diverged from each other. In the illustration, this was the first commit, e436c5.
- Analyze changes: Git then looks at the changes that have been made in both branches since that common ancestor.
- Apply changes:
  - If the changes are non-conflicting (meaning they don’t overlap), Git automatically combines them. This is what happened here.
  - If there are conflicting changes (meaning the same parts of a file have been modified differently in each branch), Git pauses and marks the conflicts. You’ll need to resolve these conflicts manually before completing the merge.
- (Sometimes) Create a merge commit: Once all changes are applied, Git creates a new commit (called a “merge commit”) on the active branch. This merge commit has two parents, one from each branch being merged, and represents the integration of both sets of changes.
  - I say “sometimes” because in cases where main has not changed, like in this lab example, a merge commit on main is not created. main is simply “fast-forwarded” (that is the actual Git term) to the latest version of feature-1 by moving the main pointer.
  - However, if changes were made to both main and feature-1, we would see a merge commit.
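The fast-forward rule can be sketched with the same pointer idea. In this toy model, each version lists its parents. The functions are hypothetical, this is not real Git, and file contents are ignored. If the active branch's version is an ancestor of the other branch's version, merging just moves the pointer. Otherwise, a merge commit with two parents is created. The sample history reuses the abbreviated IDs from this lab's output.

```python
def reachable(parents: dict, start: str) -> set:
    """All versions reachable from start by following parent links."""
    seen, stack = set(), [start]
    while stack:
        vid = stack.pop()
        if vid not in seen:
            seen.add(vid)
            stack.extend(parents[vid])   # each version lists 0, 1, or 2 parents
    return seen


def merge(parents: dict, branches: dict, active: str, other: str) -> str:
    """Toy model of `git merge other` while `active` is checked out (not real Git)."""
    ours, theirs = branches[active], branches[other]
    if theirs in reachable(parents, ours):
        return "already up to date"          # nothing new to bring in
    if ours in reachable(parents, theirs):
        branches[active] = theirs            # fast-forward: just move the pointer
        return "fast-forward"
    merge_id = f"merge of {ours} and {theirs}"
    parents[merge_id] = [ours, theirs]       # a merge commit has two parents
    branches[active] = merge_id
    return "merge commit"


# The lab's history: e436c5 <- 89c598 <- b2f562 (abbreviated IDs)
parents = {"e436c5": [], "89c598": ["e436c5"], "b2f562": ["89c598"]}
branches = {"main": "e436c5", "feature-1": "b2f562"}
print(merge(parents, branches, "main", "feature-1"))  # fast-forward
```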
In our case, we had a non-conflicting merge. This is the best case scenario. In a real project involving multiple engineers editing the same parts of code, you will very likely have conflicting changes.
We will discuss handling merge conflicts in the next lab.
Exercise
- Create a new practice branch.
- Make at least three separate commits to the practice branch. Add code of your choosing. It can be trivial or non-trivial. You can modify existing lines or delete them. Follow the rules of good commit behavior:
  - Commit early and often, but only commit working code. Comment out code that has syntax or semantic errors.
  - Write a concise, descriptive commit message.
- Merge the practice branch into the main branch.
- Make a commit to the main branch.
- Merge the main branch into the practice branch.
Summary and Key Commands
Git enables you to create branches and switch between them. When you switch branches, Git replaces the contents of your working directory with the most recent version in the branch. The version histories of all branches are kept in the local repository. This allows you to work on different things in parallel.
- Create a new branch: git checkout -b [name]
- Switch between branches: git checkout [name]
- Merge [branch-name] into the active branch: git merge [branch-name]
Knowledge Check
- Question: What is the purpose of branching in Git, and why is it useful?
- Question: What are two ways that you can identify the active branch you are currently working in?
- Question: What is the name of the default branch created when you initialize a new Git repository?
- Question: When you change the code in a branch, is main affected?
- Question: Briefly describe what the special HEAD variable in Git refers to.
- Question: Suppose you have three branches: main, dev, and release. Fill in the blank: the branch names are __________________ inside Git that point to specific _____________________ in the repository.
- Question: When you run git checkout feature-1, you are making the _____________ variable point to the ________________ variable.
- Challenge: Create a new Git project, create and switch to a new branch, and modify a file with a new feature. Commit the change to this branch.
11.5 - Branching and Merging, Part 2
The previous lab explained the concept of branching, which creates parallel version histories. Merging is the process of unifying parallel version histories back into a single history.
For example, you create a branch to implement a long and complicated feature. Once the feature is complete and tested, you merge it back into the main branch.
Merge conflicts occur when Git cannot automatically resolve differences between branches. This usually happens when:
- Two branches modify the same line in a file.
- One branch deletes a file while the other modifies it.
Merge conflicts occur frequently in real projects. Our goal is to learn how to recognize a conflict and resolve it.
Example 1: Simple Text Conflict
Do the following:
- Make a new subdirectory called merge-conflicts in your seng-201/ directory.
- Change into merge-conflicts and run git init to initialize a new Git repository.
- Create the file stats.py and paste in the following code:

def calculate_stats(numbers):
    total = sum(numbers)
    count = len(numbers)
    mean = total / count
    return {"total": total, "mean": mean, "count": count}

- Run git add . to stage the changes.
- Run git commit -m "elementary stats added" to commit the changes.
Create conflicting changes
- Run git checkout -b stddev to create a new branch called stddev from your default branch (main or master).
- Modify stats.py to contain the following:

import math

def calculate_stats(numbers):
    total = sum(numbers)
    count = len(numbers)
    mean = total / count
    variance = sum((x - mean) ** 2 for x in numbers) / count
    std_dev = math.sqrt(variance)
    return {"total": total, "mean": mean, "count": count, "std_dev": std_dev}

- Now stage and commit the change.
- Run git checkout main (or master) to switch back to your default branch. stats.py will show the “old” code from the default branch.
- Change stats.py to the following:

def calculate_stats(numbers):
    total = sum(numbers)
    count = len(numbers)
    mean = total / count
    min_val = min(numbers)
    max_val = max(numbers)
    return {"total": total, "mean": mean, "count": count, "min": min_val, "max": max_val}
- Stage and commit this change.
Now we have a conflicting change. We changed the last few lines of calculate_stats() differently in each branch.
main is the active branch, and we have changes to stats.py in both branches that edit the same lines.
Understanding a merge conflict
Now, let’s merge in an attempt to join our two branches. Make sure you are in the main branch, and run git merge stddev.
You will see output similar to the following in the Terminal:
Auto-merging stats.py
CONFLICT (content): Merge conflict in stats.py
Automatic merge failed; fix conflicts and then commit the result.
(3.12.2) ➜ merge-conflicts git:(main) ✗
Git has attempted to merge the two version histories, but this process failed because both branches edited the same lines of code. We are now in a conflicted state. You can think of the conflicted state as an unfinished commit. You can either abandon the merge with git merge --abort, or you can resolve the issues and finish the new commit.
If Visual Studio Code is configured as your Git editor, you will see a screen similar to the following:
Notice that the content of stats.py has physically changed! Git has inserted special marker lines into the code, so the code will no longer run.
To resolve a merge conflict, you must decide what to keep. Our example has three conflicting lines on each side. The lines from the main branch, pointed to by HEAD, are marked with:
<<<<<<< HEAD
    min_val = min(numbers)
    max_val = max(numbers)
    return {"total": total, "mean": mean, "count": count, "min": min_val, "max": max_val}
=======
The lines changed from the stddev branch are marked with:
=======
    variance = sum((x - mean) ** 2 for x in numbers) / count
    std_dev = math.sqrt(variance)
    return {"total": total, "mean": mean, "count": count, "std_dev": std_dev}
>>>>>>> stddev
Remember, we ran the command git merge stddev while on main, so HEAD is the main branch and the “incoming change” is from the stddev branch.
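Step 3 of the merge process comes down to a three-way comparison against the common ancestor. The sketch below applies that rule to ONE region of lines that has already been lined up; Git's real algorithm also has to find those regions. merge_region is a hypothetical name. The lines are the ones from this example, with calculate_stats()'s original return line as the common ancestor.

```python
def merge_region(base, ours, theirs, ours_label="HEAD", theirs_label="stddev"):
    """Three-way merge of ONE region of lines (toy model, not Git's full algorithm).

    base is the region in the common ancestor; ours and theirs are the same
    region in the two branches being merged.
    """
    if ours == theirs:      # both branches made the same change (or none)
        return ours
    if ours == base:        # only their branch changed this region
        return theirs
    if theirs == base:      # only our branch changed this region
        return ours
    # Both changed it differently: Git cannot choose, so it writes both with markers
    return ([f"<<<<<<< {ours_label}"] + ours + ["======="] + theirs
            + [f">>>>>>> {theirs_label}"])


base = ['    return {"total": total, "mean": mean, "count": count}']
ours = ['    min_val = min(numbers)',
        '    max_val = max(numbers)',
        '    return {"total": total, "mean": mean, "count": count, "min": min_val, "max": max_val}']
theirs = ['    variance = sum((x - mean) ** 2 for x in numbers) / count',
          '    std_dev = math.sqrt(variance)',
          '    return {"total": total, "mean": mean, "count": count, "std_dev": std_dev}']

print("\n".join(merge_region(base, ours, theirs)))
```

If only one branch had touched the return line, the function would quietly take that branch's version. A conflict appears only when both sides changed the same region differently.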
Resolving a merge conflict
Resolving a merge conflict entails three things:
- Edit the code to keep what you want.
- Remove any lingering Git lines beginning with <<<<<<<, =======, or >>>>>>>.
- Add and commit the changes.
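Before committing a resolution, it is worth checking that no marker lines survived. This small Python helper is hypothetical and not part of Git. It reports the line numbers of any remaining <<<<<<<, =======, or >>>>>>> marker lines, and it assumes Git's default marker length of seven characters.

```python
import re

# A conflict-marker line is 7 '<' or 7 '>' optionally followed by a space and a
# label (e.g. "<<<<<<< HEAD"), or exactly 7 '='.
MARKER = re.compile(r"(?:<{7}|>{7})(?: .*)?|={7}")


def conflict_marker_lines(text: str) -> list:
    """Return the 1-based line numbers that still contain Git conflict markers."""
    return [number for number, line in enumerate(text.splitlines(), start=1)
            if MARKER.fullmatch(line)]


sample = "a = 1\n<<<<<<< HEAD\nb = 2\n=======\nb = 3\n>>>>>>> stddev\n"
print(conflict_marker_lines(sample))  # [2, 4, 6]
```

To check a real file, you could call conflict_marker_lines(open("stats.py").read()). An empty list means no markers remain.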
Visual Studio Code provides you with some shortcuts and a merge editor. I find these to be dangerous. You really want to think about the code and what you want to keep in most cases.
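Let’s resolve the merge conflicts manually. Here is the entire current content of stats.py, with line numbers on the left:
1   import math
2
3   def calculate_stats(numbers):
4       total = sum(numbers)
5       count = len(numbers)
6       mean = total / count
7   <<<<<<< HEAD
8       min_val = min(numbers)
9       max_val = max(numbers)
10      return {"total": total, "mean": mean, "count": count, "min": min_val, "max": max_val}
11  =======
12      variance = sum((x - mean) ** 2 for x in numbers) / count
13      std_dev = math.sqrt(variance)
14      return {"total": total, "mean": mean, "count": count, "std_dev": std_dev}
15  >>>>>>> stddev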
As the developer, I actually want to keep both changes because I want the min, max, and standard deviation values.
I leave lines 8-9 (min and max) and lines 12-13 (standard deviation) as-is. I’ll delete lines 7, 11, and 15 containing the Git special characters.
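Now the problem is with the return lines (10 and 14): I want a combination of them. There is no shortcut to do this. I will simply replace them with my own return line that amalgamates the old ones.
My code looks like this after resolving the conflicts:

import math

def calculate_stats(numbers):
    total = sum(numbers)
    count = len(numbers)
    mean = total / count
    min_val = min(numbers)
    max_val = max(numbers)
    variance = sum((x - mean) ** 2 for x in numbers) / count
    std_dev = math.sqrt(variance)
    return {"total": total, "mean": mean, "count": count, "min": min_val, "max": max_val, "std_dev": std_dev}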
I’m happy with my code. I should run and test it.
The last step is to stage and commit my changes:
git add .
git commit -m "Resolving merge conflicts with min, max, and stddev"
I now have a new merge commit on the main branch that contains these changes. This version acts like any other version in your local repo, and HEAD will be pointing to it. You will notice that all the angry red and ! markers are gone from Visual Studio Code. I now have three versions in main’s history.
Example 2: Conflicts in multiple files
Let’s work through merge conflicts in multiple files.
Create a new file
In the main branch, create the file app.py with the following:

import stats

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    print(stats.calculate_stats(numbers))
Stage and commit the change to main. We now have four versions in the main branch history.
Checkout a new branch
Run git checkout -b mode. Make the following changes:
- In the Explorer pane, right-click app.py and rename it to main.py.
- Set main.py to:

import stats

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    print(stats.calculate_stats(numbers))
    numbers = [8, 9, 10, 11, 12, 13, 14]
    print(stats.calculate_stats(numbers))

- Set stats.py to:

import math

def calculate_stats(numbers):
    total = sum(numbers)
    count = len(numbers)
    mean = total / count
    min_val = min(numbers)
    max_val = max(numbers)
    mode = max(numbers, key=numbers.count)
    median = sorted(numbers)[len(numbers) // 2] if len(numbers) % 2 != 0 else (sorted(numbers)[len(numbers) // 2 - 1] + sorted(numbers)[len(numbers) // 2]) / 2
    variance = sum((x - mean) ** 2 for x in numbers) / count
    std_dev = math.sqrt(variance)
    return {"total": total, "mean": mean, "median": median, "mode": mode, "count": count, "min": min_val, "max": max_val, "std_dev": std_dev}

- Stage and commit the changes.
We renamed the “main” file and added some code, and we also added median and mode to stats.
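Hand-written formulas like the median one-liner are easy to get subtly wrong. One way to test them is to compare them against Python's standard statistics module (Python 3.8 or later, where statistics.mode returns the first mode on ties). std_dev divides by count, which makes it the population standard deviation. The matching function is therefore statistics.pstdev, not statistics.stdev, which divides by count - 1. The helper check_against_statistics is hypothetical. calculate_stats below is the mode branch's version with the median split over a few lines for readability.

```python
import math
import statistics


def calculate_stats(numbers):
    """The mode branch's calculate_stats(), with the median split over a few lines."""
    total = sum(numbers)
    count = len(numbers)
    mean = total / count
    min_val = min(numbers)
    max_val = max(numbers)
    mode = max(numbers, key=numbers.count)
    ordered = sorted(numbers)
    middle = count // 2
    if count % 2 != 0:
        median = ordered[middle]
    else:
        median = (ordered[middle - 1] + ordered[middle]) / 2
    variance = sum((x - mean) ** 2 for x in numbers) / count
    std_dev = math.sqrt(variance)
    return {"total": total, "mean": mean, "median": median, "mode": mode,
            "count": count, "min": min_val, "max": max_val, "std_dev": std_dev}


def check_against_statistics(numbers):
    """Compare the hand-written results with Python's statistics module."""
    result = calculate_stats(numbers)
    assert math.isclose(result["mean"], statistics.mean(numbers))
    assert math.isclose(result["median"], statistics.median(numbers))
    assert result["mode"] == statistics.mode(numbers)
    # std_dev divides by count, so it is the population standard deviation
    assert math.isclose(result["std_dev"], statistics.pstdev(numbers))


for sample in ([1, 2, 3, 4, 5], [8, 9, 10, 11, 12, 13, 14], [2, 2, 2], [1, 2, 3, 4]):
    check_against_statistics(sample)
print("All samples match the statistics module.")
```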
Concurrent changes to the main branch
Now checkout main
again with git checkout main
.
- We are going to streamline stats.py. Edit stats.py and change it to the following:
import math

def calculate_stats(numbers):
    count = len(numbers)
    mean = sum(numbers) / count
    variance = sum((x - mean) ** 2 for x in numbers) / count
    std_dev = math.sqrt(variance)
    return {"mean": mean, "std_dev": std_dev}
- Open app.py and add another sample:
import stats

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    print(stats.calculate_stats(numbers))
    numbers = [2, 2, 2]
    print(stats.calculate_stats(numbers))
- Stage and commit the changes.
So we now have conflicting, concurrent changes in main that will cause a problem with the changes in the mode branch.
Resolving merge conflicts in multiple files
Now, let’s create and deal with the inevitable merge conflicts. Run the following to merge the mode branch into main:
git checkout main
git merge mode
Both the Terminal and Visual Studio Code will indicate that you have conflicts in multiple files. You simply need to deal with them one at a time.
First, notice how the rename from app.py to main.py happened automatically. If you’re unhappy with this change, simply right-click and rename it back.
Let’s look first at main.py:
We have a conflict because the sample lines were changed concurrently. Remember the process:
- Edit the code to be the way you like
- Remove the special Git characters
I like more samples, so edit the file to keep both new samples and print them both out. Your final result should look like this:
Now let’s go to stats.py
, which looks like this:
Visual Studio Code provides you with some shortcuts for resolving merge conflicts:
- Accept Current Change: Keep only the changes in main.
- Accept Incoming Change: Keep only the changes in mode.
- Accept Both Changes: Keep all the changed lines from both branches.
- Compare Changes: Provide another text view of the changes.
- Resolve in Merge Editor: I recommend skipping this.
In this case, I decide that I don’t care at all about the median and mode any more. I just want to keep the streamlined version.
Click on the “Accept Current Change” link. You will see that only the changes to main (the HEAD) are kept, and all incoming changes from mode are discarded.
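Each of these shortcuts is just a text edit. To make that concrete, here is a small sketch of what Accept Current, Accept Incoming, and Accept Both do to the text of a file. resolve_conflict is a hypothetical helper written for illustration; it is not part of Git or Visual Studio Code, and it assumes the default two-way markers (<<<<<<<, =======, >>>>>>>) with no nested conflicts.

```python
def resolve_conflict(text, choice):
    """Resolve every conflict region in text.

    choice is "current" (keep HEAD's lines), "incoming" (keep the other
    branch's lines), or "both" (current lines followed by incoming lines).
    """
    if choice not in ("current", "incoming", "both"):
        raise ValueError(f"unknown choice: {choice}")
    output = []
    current, incoming = [], []
    state = "outside"  # one of: outside, in_current, in_incoming
    for line in text.splitlines(keepends=True):
        if state == "outside" and line.startswith("<<<<<<<"):
            state = "in_current"
        elif state == "in_current" and line.startswith("======="):
            state = "in_incoming"
        elif state == "in_incoming" and line.startswith(">>>>>>>"):
            # End of one conflict region: keep the side(s) we chose.
            if choice in ("current", "both"):
                output.extend(current)
            if choice in ("incoming", "both"):
                output.extend(incoming)
            current, incoming = [], []
            state = "outside"
        elif state == "in_current":
            current.append(line)
        elif state == "in_incoming":
            incoming.append(line)
        else:
            output.append(line)  # ordinary, non-conflicting line
    return "".join(output)


# A hypothetical conflicted snippet, similar in shape to stats.py.
SAMPLE = (
    "import math\n"
    "<<<<<<< HEAD\n"
    'return {"mean": mean, "std_dev": std_dev}\n'
    "=======\n"
    'return {"mean": mean, "median": median}\n'
    ">>>>>>> mode\n"
)

if __name__ == "__main__":
    print(resolve_conflict(SAMPLE, "current"))
```

Notice that the markers disappear in every case. The hard part Git cannot do for you is deciding which choice is correct, which is why the lab has you read the code first.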
P.S. If you make a mistake, remember that all you’re doing is editing text files at this point. Just hit CTRL+Z/CMD+Z to undo.
Finally, make sure all your files are saved, then stage and commit the changes. Our final branch history looks like this:
Summary
Merge conflicts don’t have to be scary, but they can be annoying. Keeping your commits in all branches small and incremental will make merging easier.
The process for resolving merge conflicts is:
- Look for the conflicting changes and decide what to do.
- Remove the Git special characters.
- Save, stage, and commit the merge conflict resolution.
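Removing the special characters is the step people most often get wrong: Git does not stop you from committing a file that still contains conflict markers. Git can help here, since git diff --check warns when changes introduce conflict markers, and the sketch below shows the same idea in Python. find_conflict_markers and the file name check_markers.py are hypothetical, for illustration only; the sketch assumes markers are exactly seven characters at the start of a line, which is Git’s default.

```python
import re
import sys

# "<<<<<<<" and ">>>>>>>" are followed by a space and a label (e.g. HEAD);
# "=======" stands alone on its line.
MARKER = re.compile(r"^(<{7} |={7}$|>{7} )")


def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that look like conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if MARKER.match(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    # Usage: python3 check_markers.py main.py stats.py
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for number, line in find_conflict_markers(f.read()):
                print(f"{path}:{number}: {line}")
```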
Take your time with merge conflicts. Quickly hitting “Accept Incoming Change” or “Accept Current Change” without a thought is what gets you in trouble. Taking your time may mean you manually edit the code, and that’s not a bad thing.
I strongly encourage you to avoid GUI-based merge editors, of which there are a few, until you master the process. It’s just text editing. Editing the code manually will help ensure each decision you make is intentional and easy to undo in the text editor. Once you have mastered merging manually, feel free to move on to the GUI programs.
Knowledge Check
- What causes a merge conflict in Git?
- Suppose you want to merge a branch named bug-fix into the main branch. What git command do you run to perform the merge?
- How can you identify merge conflicts using Git commands?
- Describe the purpose of the conflict markers <<<<<<<, =======, and >>>>>>>.
- (True/False) You can have multiple conflicting regions in a single file.
- (True/False) You can have multiple files with conflicts.
- Suppose the branch delicious is created from the main branch. The file cheese.py exists in both branches. cheese.py is edited in the delicious branch and deleted in the main branch. Will there be a merge conflict if main is merged into delicious? Will there be a merge conflict if delicious is merged into main?
- What are the three steps to resolving a merge conflict?
- What rule of thumb will make merging easier in the long run?
11.6 - GitHub CLI setup
Sign up for GitHub
Sign up for a free GitHub account if you haven’t already. I recommend that you use a permanent, personal email.
Install the GitHub CLI
Let’s install the GitHub CLI, which will make working with remote GitHub repositories easier.
On MacOS
- Install Homebrew if you do not have it already. Run the following in the Terminal and follow the on-screen instructions:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Run brew install gh and follow the on-screen instructions.
On WSL or Ubuntu
- Paste the following into a Terminal:
(type -p wget >/dev/null || (sudo apt update && sudo apt-get install wget -y)) \
&& sudo mkdir -p -m 755 /etc/apt/keyrings \
&& wget -qO- https://cli.github.com/packages/githubcli-archive-keyring.gpg | sudo tee /etc/apt/keyrings/githubcli-archive-keyring.gpg > /dev/null \
&& sudo chmod go+r /etc/apt/keyrings/githubcli-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main" | sudo tee /etc/apt/sources.list.d/github-cli.list > /dev/null \
&& sudo apt update \
&& sudo apt install gh -y
- You will be prompted for your Ubuntu password.
Login with the GitHub CLI
Run gh auth login and follow the onscreen instructions to register your computer with GitHub.
- Leave the default options selected in the CLI. You will hit Enter to open a web browser. Sign into GitHub with your GitHub credentials.
- If the browser does not open: manually open a browser to https://github.com/login/device. Sign into GitHub with your GitHub credentials if needed.
- Enter the code shown in the Terminal window.
- Complete the authorization and leave the default options as-is.
Once you have finished, your Terminal and Browser should look like this:
11.7 - Remote repos
Remote repositories in Git are repositories stored elsewhere than on your computer, usually on a site like GitHub or a private enterprise server for your company. Remote repositories have a few key purposes:
- Remote repositories are the mechanism by which versions can be shared between computers, e.g., between a lab and home computer or between the computers of multiple teammates collaborating on code.
- Remote repos maintain a copy of your version control history so that if disaster strikes your computer, you have a backup of your project.
Remote repositories are a hub to which multiple local repositories are linked. They function the same as a local repo, but the user takes extra steps to share changes with the remote and to retrieve changes, perhaps made by teammates, from the remote.
11.7.1 - Scenario 1 - Sharing a new project
Scenario: You are on your computer. You make a new project and begin working. You decide you want to keep the project under version control with Git.
Create the local repo and save an initial version
- Create a new directory called remote-sample in your seng-201/ directory.
- Open the remote-sample/ directory in Visual Studio Code.
- Create a file named test.py. Put some code in there, like print("We are going to share our new repository")
- Run git init to create a local repository.
- Now stage and commit the changes.
You now have one version in the local repository, and both the main branch and the HEAD point to that version. I have left the INDEX and the HEAD out of the illustrations since we will not need them for this lab.
Create a “blank” remote repo on GitHub
- Browse to https://github.com and log in if necessary.
- Find and click the green button to Create a New Repository:
- On the “Create a new repository” form, enter remote-sample for the Repository name.
- Leave all the rest of the options as-is.
- Click the green Create repository button at the bottom.
You will see a page that looks like this:
Make a note of the URL in your browser bar. Your repo can be accessed from this address.
Leave the browser window open. We will return to it in a minute.
Public vs. Private Repos: You have the choice to make your repo Public or Private when creating it, and you can change this setting later.
- Public repos are visible on the Internet. Anyone can view the website and clone your code, but only you can commit code.
- Private repos are visible only to you when signed in. Only you can clone and commit to the repo. Through the Settings tab on the GitHub repo website, you can give specific users read or write access to your repo.
Connecting the local repo to the remote repo
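We have created a local repo with git init and created a “bare” remote repo using the GitHub website, but the two are not yet connected!
On your GitHub page in the browser, you have a section titled “…or push an existing repository from the command line” containing three commands similar to the following:
git remote add origin https://github.com/<your-github-username>/remote-sample.git
git branch -M main
git push -u origin main
Copy that code for your repo and paste it into the Terminal. Run those instructions in the Terminal.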
You should see output similar to the following:
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Writing objects: 100% (3/3), 260 bytes | 260.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/llayman/remote-sample.git
* [new branch] main -> main
branch 'main' set up to track 'origin/main'.
That means you are good and your local repo is connected to the remote repo on GitHub.
If you see an error like this:
error: src refspec main does not match any
error: failed to push some refs to 'https://github.com/llayman/remote-sample.git'
You forgot to git add and git commit your first version.
Viewing the remote repo
Refresh the GitHub page in your web browser. You should see something like this now:
This is GitHub’s rendering of your remote repository! In Git, the remote repo looks just like the local repo on your computer. This is just how GitHub chooses to display it.
- You can click on test.py to see the code.
- Note that we are in the main branch as indicated in the top left dropdown.
- You can click on the commit version, e.g., fb080da, to see all the changes in the most recent commit.
- You can click on the history-clock icon next to the version name to see the main branch’s version history. There’s only 1 version right now.
Understanding the commands
You pasted three separate commands in the Terminal.
git remote add is what actually creates a link between your local repo and the remote repository. Creating the remote repo link does not automatically share any version history or changes.
git branch -M main made sure the name of your default branch was main as opposed to master.
git push is what shared the version history from your local repo to the remote repo:
A few things happened to the repo state during this process.
- Your local repo now has a notion of an “upstream” remote repo that it is linked to.
- The version history of your local repo was pushed to the remote repo, including the branch name main.
- The remote repo on GitHub now has the entire version history of the main branch, and knows which version main refers to.
Again, the remote repo behaves exactly the same as your local repo internally. It’s just that it is saved on a GitHub server, and you need to run an additional command, git push, to share your changes with the remote repo.
Knowledge Check
- (Question) What is the purpose of running git init?
- (Question) How do you connect a local Git repository to a remote repository?
- (Question) Explain the function of git remote add.
- (Challenge) Create a local repository and link it to a newly created GitHub remote repository.
- (Challenge) Stage, commit, and push an initial version of a project to a remote repository, verifying success through the GitHub interface.
11.7.2 - git push
We showed in Scenario 1 that the git push command was necessary to share the version history from the local repo to the remote repo.
Sending changes to and pulling changes from the remote repo is always manual, just like staging, committing, and merging are. This is a good thing because it allows you to decide when to share changes or integrate changes from your teammates.
Let’s illustrate the sharing process.
Create a second version
- Edit your test.py file. Make a change to the code. What you change is up to you.
- Save the file, stage, and commit your change.
- Run git log
The repos now look like this:
Your git log clearly shows the new version saved to the local repo.
However, open your remote repository’s GitHub page in your browser. You will see that it is still showing the previous version. Your local main branch is linked to the remote main branch, but the latter is not up-to-date.
Again, sharing with and retrieving from the remote requires a manual command.
git push
Run the command git push. This sends the new commits on your local main branch to the remote.
Refresh the GitHub page in your browser, and you will see that the version name and the content of test.py are updated to the latest version. You will also see two versions now in the commit history.
Now everything is up to date!
Running git push always pushes the active branch, which is main in our case. Suppose you have two local branches, main and rand. If you have parallel commits in multiple branches, you will either need to checkout and git push each branch, or run git push --all.
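One way to picture what git push does is as a comparison of two lists of commits. The sketch below is a deliberately simplified model, not how Git actually stores data: each repo is a dict mapping branch names to lists of commit IDs (oldest first), and push and push_all are hypothetical helpers. push only succeeds when the remote branch is a prefix of the local one, a so-called fast-forward. Real Git behaves the same way in this respect: if the remote has commits you don’t have, it rejects the push and tells you to integrate the remote changes (e.g., with git pull) first.

```python
def push(local, remote, branch):
    """Copy branch's commits from local to remote if it is a fast-forward.

    local and remote are dicts like {"main": ["v1", "v2"]}, oldest commit first.
    Returns a short status string.
    """
    local_commits = local[branch]
    remote_commits = remote.get(branch, [])
    if local_commits[:len(remote_commits)] != remote_commits:
        # The remote has commits we don't have: refuse, like Git does.
        return "rejected: pull first"
    if local_commits == remote_commits:
        return "Everything up-to-date"
    remote[branch] = list(local_commits)
    return "pushed"


def push_all(local, remote):
    """Rough analogue of `git push --all`: push every local branch."""
    return {branch: push(local, remote, branch) for branch in local}


if __name__ == "__main__":
    local = {"main": ["v1", "v2"]}
    remote = {"main": ["v1"]}
    print(push(local, remote, "main"))  # pushed
    print(remote)                       # {'main': ['v1', 'v2']}
    # Bob pushed "bob" first, and we committed "v3" locally: rejected.
    print(push({"main": ["v1", "v3"]}, {"main": ["v1", "bob"]}, "main"))
```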
Knowledge Check
- (Question) What does the git push command do?
- (Question) Why is sending and pulling changes from the remote repository a manual process?
- (Question) How does the local main branch stay linked to the remote main branch?
- (Question) What happens if there are changes in the remote branch that are not present in your local branch before you push?
- (Question) How can you verify that your push was successful?
- (Challenge) Make a change to a file in your local repository, commit it, and then push it to the remote repository.
- (Challenge) View the commit history and confirm changes appear both locally and on the remote.
11.7.3 - Scenario 2 - Clone an existing project
Scenario: A remote repository already exists, and you need a copy of the version history on your computer. You could be a part of a team working on the same project, or maybe you created a new project in lab and you need to check it out from your home computer.
git clone
We already ran through this scenario when setting up Assignment 4 in class. I put a sample repository on GitHub, and you “cloned” it in class.
Let’s start a new project to illustrate the process.
- In your Terminal, navigate to your seng-201/ directory. When you clone, it will create a new subdirectory for you. So you need to be in the parent of where you want the workspace to live. We want to be in seng-201/ for this example.
- Run git clone https://github.com/llayman/git-remote-clone
You will see output similar to:
➜ ~ git clone https://github.com/llayman/git-remote-clone
Cloning into 'git-remote-clone'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (4/4), done.
➜ ~
You will also have a new subdirectory named git-remote-clone inside seng-201/.
What happened?
- git clone went to the target URL looking for a repo. It found it, and made a copy of the version history on your local computer in the git-remote-clone/ subdirectory.
- Git created a local copy of the main branch, which is linked to the remote main branch.
- Git checked out the main branch into the workspace folder git-remote-clone/.
You are now ready to open git-remote-clone/ in Visual Studio Code or another editor and start working. You edit, stage, commit, make branches, and push as usual.
Do not edit the files yet. Leave them in their initial version to illustrate the next lab.
Knowledge Check
- (Question) What does the git clone command do?
- (Question) How does git clone handle creating a subdirectory for the repository?
- (Question) After cloning, what branch is typically checked out in your local copy?
- (Question) Does git clone also copy files into your workspace?
- (Question) How is the local main branch linked to the remote main branch after cloning?
- (Challenge) Clone an existing repository to your local machine and verify the directory structure.
- (Challenge) Open the cloned project in an editor and review its initial state without making changes.
11.7.4 - Scenario 3 - Retrieving changes
Scenario: You started work on an assignment in the computer lab and pushed your changes to the remote. You went home and cloned the repo, worked some more, then pushed your changes to the remote. Now you are back in lab, and you need to get the latest changes from the remote. Or, perhaps a teammate pushed changes to the remote and you need to retrieve them.
Remote changes
I will make some changes to hello.py and push them, so the repos now look like this:
The remote repo has a new version, but your local repo is not up-to-date. You need to manually retrieve the changes. This is a good thing! You don’t want changes to automatically be applied whenever someone else on your team sends them to the remote repo. They could conflict!
Super important point
Before you retrieve changes from the remote, you almost always want to either:
- Stage and commit any uncommitted changes you have.
- Undo, reset, or discard any uncommitted changes you have.
Ideally, you should have a “clean” workspace before you retrieve changes. It will make life easier on you.
git pull
Run the command git pull. A few things happen:
- The changes from the remote repository on the active branch, main, are fetched and integrated into your local repo.
- Any changes are automatically merged into your workspace. This is why we wanted our workspace to be “clean.”
You now have the most recent version of main in your workspace. You are ready to edit it, commit, and push as usual.
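Continuing the simplified list-of-commits picture (a model for intuition, not Git’s actual storage), the common case of git pull when you have no new local commits is a fast-forward: your branch simply catches up to the remote. When both sides have new commits, Git has to merge instead, which is where conflicts can appear. pull is a hypothetical helper for illustration.

```python
def pull(local, remote, branch):
    """Bring branch up to date from remote in a simplified commit-list model.

    Returns "fast-forward", "Already up to date.", or "needs merge".
    """
    local_commits = local.get(branch, [])
    remote_commits = remote[branch]
    if local_commits == remote_commits:
        return "Already up to date."
    if remote_commits[:len(local_commits)] == local_commits:
        # Local is behind: just take the remote's newer commits.
        local[branch] = list(remote_commits)
        return "fast-forward"
    if local_commits[:len(remote_commits)] == remote_commits:
        # Local is ahead: nothing new to pull (you would push instead).
        return "Already up to date."
    # Both sides have commits the other lacks: Git would create a merge commit.
    return "needs merge"


if __name__ == "__main__":
    local = {"main": ["v1"]}
    remote = {"main": ["v1", "v2"]}
    print(pull(local, remote, "main"))  # fast-forward
    print(local)                        # {'main': ['v1', 'v2']}
```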
Concurrent changes to the local and the remote
All of this is relatively straightforward when you are the only one working on a project. The version history of branches remains somewhat linear: you are the only one committing, pushing, and pulling, so you are always (probably) working on the latest version.
Life gets considerably more challenging when you have a team of developers all pushing and pulling from the same repo. If you commit a change to main in your local repo, but then Bob pushes a new version of main to the remote repo, what happens when you try to push or pull? Git will protect us from losing work, but we will likely end up with merge conflicts.
Team coordination through Git remote repos can be smooth if we follow a good process. We will discuss this in the next lab.
Knowledge Check
- (Question) What does the git pull command do?
- (Question) Why is it important to have a “clean” workspace before running git pull?
- (Question) What happens if there are conflicting changes on the local and remote repositories when using git pull?
- (Challenge) Create a scenario where you make changes locally and have conflicting changes on the remote repository. Use git pull and resolve any conflicts.
- (Challenge) Demonstrate how to ensure your workspace is clean before pulling changes.
11.7.5 -
Scenario 1: Sharing a new project
- Do GitHub CLI setup
- [WORKSHEET] run through 1-3
- create remote-sample/ and open in Visual Studio Code
- Create test.py. print(“We are going to share our new repository”)
- git init
- git add . + git commit
- Create a “blank remote repo”. Go to github.com, new, remote-sample as name
- Show “success” page
- Comment on public vs. private
- copy the “…push an existing repository from the command line”
- View the remote repo in the browser
- [WORKSHEET] run through 4-6
Subsequent versions
- Edit test.py
- add and commit
- git log. Point out local repo vs. remote repo
- [WORKSHEET] add local main and remote main to pg 2 top picture
- git push
- [WORKSHEET] add 2nd version to remote, update main refs, label git push arrow
- Refresh the browser. Show the history.
- [WORKSHEET] fill bottom of page 2.
Scenario 2: Clone an existing project
- [WORKSHEET] Walk through 1-3
- Have students open https://github.com/llayman/git-remote-clone in browser.
- Terminal, cd into seng-201
- git clone https://github.com/llayman/git-remote-clone
- [WORKSHEET] Fill in drawing
- Create workspace folder, then local repo.
- Right to left. Cloned version into local. Link remote main to local main.
- Clone Version into workspace.
- [WORKSHEET] Fill in bottom.
- DO NOT EDIT FILES YET.
Scenario 3: Retrieving changes.
- [WORKSHEET] Explain the scenario at top.
- Or, scenario where a teammate makes a change.
- [YOU CODE] Edit git-remote-clone/hello.py and push a new version.
- Have students refresh the repo in their browser.
- [WORKSHEET] add main refs to the top.
- Have everyone run git pull. Point out how the code changes.
- Run git log
- [WORKSHEET] Fill out the middle bullet points and the bottom diagram.
12 - 12. Remote Servers
Most software that you use is a combination of a client software system and a server software system. “The cloud” is a generic term for a group of servers that do the same thing.
For example:
- You use the TikTok app on your phone, which performs searches and recommends videos in the cloud.
- You play a multiplayer game on your Xbox, but a server controls people entering and leaving, tracking scores, and managing lag.
- You have used pip to install Python libraries, but pip talks to a remote server to find the package and retrieve the bytes.
In the final weeks of SENG 201, we will connect to a remote server to host a network application. You will edit and deploy the application.
12.1 - Connecting to ada
We will use an on-premises (on-prem) server called Ada, named after Ada Lovelace, who wrote the first algorithm for the precursor to modern computers, Babbage’s Analytical Engine.
Offsite - use the VPN client
The ada server is accessible only from the UNCW network.
You will need to use UNCW’s Virtual Private Network (VPN) client software to reach the server while offsite.
- Install the VPN client software. You can only install the VPN client while offsite.
- Windows or Mac: Follow the instructions at https://uncw.teamdynamix.com/TDClient/1875/Portal/KB/ArticleDet?ID=12377. Do this if you are running WSL.
- Native Linux: Point your web browser at https://vpn.uncw.edu and follow the prompts.
- Open the Cisco AnyConnect VPN program and connect to the pre-configured UNCW VPN.
- I recommend that you disconnect from the VPN when you don’t need it because it can slow your connection.
Connecting to ada via SSH
We will use the Secure Shell (SSH) program to connect to ada. SSH is an extremely popular software tool for creating client-server connections. SSH will connect you to ada’s Linux CLI, which will function like a WSL or MacOS Terminal.
SSH is pre-installed on MacOS, Ubuntu, and WSL. Open a Terminal and enter the following:
ssh <your-uncw-id>@ada.cis.uncw.edu
# for example, ssh laymanl@ada.cis.uncw.edu
Enter your UNCW login password when prompted. Choose “yes” when prompted to trust the connected machine.
You should see something like the following after successfully signing in:
You are now logged into the ada server. ada is running Ubuntu Linux, and understands all the standard Linux CLI commands.
There are many commands at your disposal, including python and git.
Type pwd to see your home directory location.
Rules for using ada
ada is a shared server. As such:
- Do not read, write, or edit files outside your home directory.
- Do not change the permissions on your home directory using chmod or any other command.
- Follow the Seahawk Respect Compact at all times.
- Do not intentionally do anything to harm the server, such as fill up the hard disk or overload the CPU.
Activity on the server is logged. Any intentional or negligent violation of these rules will result in a grade of 0 for the course and a violation of the Student Code of Conduct reported to the Dean of Students.
When in doubt if you are allowed to do something, ask the instructor first.
Next
Once you are done, move on to the Working on ada lab.
12.2 - Working on ada
Class recording
The recording covers this lab and the previous lab on Server Setup as well as an introduction to Computer Networking concepts.
Part 1: Starting out
ada is running Ubuntu Linux, and understands all the standard Linux CLI commands. There are additional commands at your disposal, including python and git.
Make sure you are connected to ada using SSH. Type the following commands:
ll - what do you see?
mkdir dev
cd dev
pwd
Briefly summarize what you just did.
Part 2: Editing a file
When connected to a server like ada, you typically only interface through the CLI. In ada’s case, there is no window-like GUI.
Linux uses the ~ character as shorthand for your home folder, i.e., /home/<your_id>. So ~/dev is shorthand for /home/<your_id>/dev.
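Python does not expand ~ on its own: open("~/dev/hello.py") looks for a folder literally named ~. If a script on ada needs a path in your home folder, os.path.expanduser (or pathlib’s equivalent) performs the same expansion the shell does. A minimal sketch; dev_dir is just an illustrative name.

```python
import os
from pathlib import Path

# "~" expands to your home folder, e.g. /home/<your_id> on ada.
dev_dir = os.path.expanduser("~/dev")
print(dev_dir)

# pathlib offers the same expansion.
print(Path("~/dev").expanduser())
print(Path.home() / "dev")
```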
Make sure you are in your ~/dev folder. Do the following:
- Run nano hello.py. A text editor called Nano will open in the Terminal looking like this:
- Type in print("Hello World!")
- Hit CTRL+X to exit, then Y to save the changes, then Enter to confirm the file name.
- You will see the ada Terminal again. Type ll and you should see the hello.py file.
- Run python3 hello.py and you should see your “Hello World” message.
The Nano editor is quite handy for editing files on the server quickly. But, we are spoiled by the ease-of-use of IDEs like Visual Studio Code and PyCharm.
Editing a full-blown program with many different files using nano would be painful. In practice, software engineers don’t do much, if any, editing on servers. Instead, software engineers develop on their own machines and deploy their software programs to servers.
Deploying software to ada
Deployment is the act of making your software available for use. You could deploy your software to your own computer (you do this while testing). For other people to use your software, you need to make your computer accessible via a network, keep the program running all the time, and ensure that your computer has enough resources to handle thousands of people using it all at once.
Hence, servers. Servers are network accessible and all they do (usually) is serve software programs that users can connect to.
So, how can you get a program to ada? You can use file transfer tools like scp, but we will use git.
Initializing Git and GitHub on ada
We need to authorize your account on ada to clone your remote repositories. Do the following:
- ssh onto ada.
- Run gh auth login. Accept the default options.
- The step Press Enter to open https://github.com/login/device in your browser... will not open a browser because ada doesn’t have a GUI.
- On your computer, open a browser to https://github.com/login/device and type in the 8-character code shown in ada’s terminal.
- In the browser, accept the authorization options:
- You should see a “Logged in as” message in the ada Terminal. You are done.
git clone a repository
You are now ready to use git on ada.
- git clone one of your existing remote repositories to ada. You can use any of the ones from class or a homework assignment.
- Use python3 to run your code. Does it work?
Suppose you are actively developing that Python project. How would you deploy the changes to the server?
You would develop the program on your PC, then commit and push to GitHub. Then ssh to ada, pull the latest version, and restart the program!
Most software deployed in this manner relies on the main branch of the repository. This is why it is critical that main contain only “good, clean, working code”: main is what users will see!
Next
We will put a program on ada that is accessible via the network so that other people can use it.
Try browsing to http://152.20.12.250:23456/, but the app may not be running.
13 - 13. Server and Client App Samples
Intro
In the previous lab, we connected to the ada server and used its CLI to create folders.
I also asked you to browse to http://152.20.12.250:23456/ to connect to a web application written in Python. You must be on a UNCW network or the VPN to connect, and it’s possible the app isn’t running.
Lab recordings
Goals
The goal of this lab is to create a web app server, which is a copy of the one above. You will then interact with it through two clients: (1) your web browser, and (2) a simple Python client.
13.1 - Flask server app
Intro
The Python web application above is written using the Flask framework. Flask is used by companies including Netflix, Uber, and LinkedIn to create web applications. It is installed as a Python library with the pip tool.
Webapp setup
Deploying this Flask web application to ada is your Assignment #7. Follow these steps to check out and run the project on your computer:
- Accept the GitHub Classroom assignment #7: https://classroom.github.com/a/wbeITctx. This is an individual assignment.
- Clone the repo to your local computer. This should create a project directory called assn7-<your_name> or something similar.
- Using your Terminal, cd into the project directory.
- Open Visual Studio Code in the working directory by running code . in the Terminal. It is essential that your assn7-<your-name>/ directory is the top level of Visual Studio Code.
- In the menu bar, select View → Command Palette
- Search for “environment” and select Python: Create Environment…
- Select Venv
- Select a recent Python version.
- On “Select dependencies to install”, check the box next to requirements.txt. Click “Okay”.
It may take a minute. Visual Studio Code will create a copy of Python in the .venv/ subdirectory of your project. This is considered best practice in Python development when you need to install libraries, like Flask, so that you do not “pollute” the system Python directory with many libraries that are not needed for all your programs.
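What Visual Studio Code does here is roughly what python3 -m venv .venv followed by .venv/bin/pip install -r requirements.txt would do in a Terminal. The standard library’s venv module exposes the same step to Python code. The sketch below creates an environment in a throwaway temporary folder only to show what lands on disk; it is for illustration, and the Command Palette steps are what you should use for the assignment.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    # with_pip=False keeps this quick. VS Code's version also installs pip
    # and the packages listed in requirements.txt.
    venv.create(env_dir, with_pip=False)
    created = sorted(p.name for p in env_dir.iterdir())

# Typically includes pyvenv.cfg, bin/ (Scripts/ on Windows), and lib/ (Lib/ on Windows).
print(created)
```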
Project structure
You will see several files in the project folder:
- app.py: This is the main Python file that defines the Flask application. It specifies what types of requests to respond to. It calls the other files to handle the logic. Think of it as the user interface of the application.
- quizzer.py: a plain Python file that has some functions related to quiz questions and answers. These functions are called by app.py.
- questions.py: contains a Python class definition for a MultipleChoiceQuestion and initializes a list of QUESTIONS the app serves.
- test_quizzer.py: unit tests for quizzer.py. You can run pytest in the Terminal to try them.
- templates/: website files go in here to be sent to a browser. For now, there is only index.html, which app.py sends back to clients that browse to the server’s home page.
- Other things:
  - .venv/: the Python virtual environment used to run the app. Ignore this.
  - .gitignore: tells Git to ignore specific files.
  - requirements.txt: tells the virtual environment which pip libraries are needed to run the project.
Running the webapp
We need to run the Flask web application from Visual Studio Code’s integrated terminal.
Note: Flask will only run with the “virtual environment” in .venv/ active. Visual Studio Code will activate it for you automatically. If you want to run from your system Terminal, you will need to run source .venv/bin/activate first from your project directory.
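If you are ever unsure whether the virtual environment is active, one quick check is to ask Python itself. This sketch relies on the documented behavior that sys.prefix differs from sys.base_prefix inside a venv; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv such as .venv/."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    print("python executable:", sys.executable)
```

With .venv/ active, the executable path printed should point inside your project’s .venv/ folder.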
Run flask --app app run --debug to start the Flask webapp. You should see output similar to the following in your Terminal:
You may be prompted by your OS to allow connections. You do not need to allow external connections for it to work.
Open a web browser to http://127.0.0.1:5000
You should see the Welcome Page:
Great! You are now running a web application built in Python using the Flask library.
Interacting with the web app
Your web browser is a client and the Flask app is a server. Web browsers issue HTTP requests to servers, and the servers send an HTTP response.
Think of HTTP requests and responses as another envelope. The envelope is merely a string of text in a particular format. The contents of the envelope are bits that can represent strings, images, videos, audio, integers, floats, etc.
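To make the envelope idea concrete, here is a simplified sketch that splits a raw HTTP/1.1 response into its status line, headers, and body. The response text is made up for illustration; real responses carry more headers (such as Content-Length and Date), and real clients should rely on a library such as http.client or urllib.request rather than parsing by hand.

```python
def split_http_response(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a raw HTTP/1.1 response into (status line, headers, body).

    HTTP separates lines with CRLF ("\r\n") and marks the end of the
    headers with an empty line; everything after it is the body.
    """
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


# A made-up response: the "envelope" is just text.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{"question": "What is 2 + 2?"}'
)
status, headers, body = split_http_response(raw_response)
print(status)                   # HTTP/1.1 200 OK
print(headers["Content-Type"])  # application/json
print(body)                     # {"question": "What is 2 + 2?"}
```

The Content-Type header is what tells the client how to interpret the body, which is how a browser knows whether it received HTML, an image, or JSON.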
This Flask web app sends its contents as strings in JSON format. JSON is very similar to a Python dictionary: it has keys and values.
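The standard json module converts between JSON text and Python objects in both directions. The keys in this sketch are hypothetical; they are not necessarily the ones the Flask app uses.

```python
import json

# A JSON string like one a web app might send (hypothetical keys).
payload = '{"question": "What is 2 + 2?", "choices": ["3", "4", "5"], "answer": 1}'

data = json.loads(payload)               # JSON text -> Python dict
print(type(data).__name__)               # dict
print(data["choices"][data["answer"]])   # 4

# Python dict -> JSON text; True and None become true and null in JSON.
reply = {"correct": True, "score": 0, "next": None}
print(json.dumps(reply))                 # {"correct": true, "score": 0, "next": null}
```

The differences are mostly spelling: JSON requires double-quoted strings and writes true, false, and null where Python writes True, False, and None.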
Key commands
Make sure you have the project open in Visual Studio Code and are using the Integrated Terminal.
- To start: flask --app app run --debug
- To stop: press CTRL+C with the Terminal selected.
13.2 - PyGame client app
Intro
In the previous lab, we checked out and ran a Flask web app.
We saw that a web browser can work as a client for the Flask web app. Let’s use another client: a game. After all, the Flask app is just sending JSON data, which is basically a dictionary, and Python can handle dictionaries.
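A Python client only needs to send a request and decode the JSON that comes back. This sketch uses only the standard library (urllib.request and json). The route /question is a hypothetical placeholder, because this page does not say which URL returns JSON (the home page returns index.html), so check app.py for the real routes. The PyGame client may fetch its data differently.

```python
import json
from urllib.request import urlopen

# Hypothetical route: replace it with a JSON route that app.py actually defines.
QUESTION_URL = "http://127.0.0.1:5000/question"


def decode_json_body(body: bytes) -> dict:
    """Turn the raw bytes of an HTTP response body into a Python object.

    Assumes the top-level JSON value is an object, so the result is a dict.
    """
    return json.loads(body.decode("utf-8"))


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Send a GET request to `url` and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return decode_json_body(response.read())


# Usage, with the Flask server from the previous lab running:
#     data = fetch_json(QUESTION_URL)
#     print(data.keys())
```

If the server is not running, urlopen raises urllib.error.URLError (connection refused), which is a good reminder to start Flask first.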
The app below is a game with minimal functionality that enables you to answer quiz questions.
Pygame app setup
- Accept the PyGame Quizzer assignment: https://classroom.github.com/a/dPKVKNki.
- Clone the repo to your local computer. This should create a project directory called pygame-quizzer-<your-name> or something similar.
- Using your Terminal, cd into the project directory.
- Open Visual Studio Code in the working directory by running code . there. It is essential that your pygame-quizzer-<your-name>/ directory is the top level of Visual Studio Code.
- In the menu bar, select View → Command Palette.
- Search for “environment” and select Python: Create Environment…
- Select Venv.
- Select a recent Python version.
- On “Select dependencies to install”, check the box next to requirements.txt. Click “Okay”.
Visual Studio Code will take a minute to create a .venv/ subdirectory and install all the pygame libraries into it.
WSL users
You need WSL2 for GUI applications to work from WSL. On the Windows side, open a Command Prompt or PowerShell (not Ubuntu) and run:
wsl --list --verbose
You will see something like:
  NAME            STATE           VERSION
* Ubuntu-24.04    Running         1
If you see VERSION 2, you are good.
If you see VERSION 1, run the following, replacing <Ubuntu name> with the NAME shown in the list (e.g., Ubuntu-24.04):
wsl --set-version <Ubuntu name> 2
wsl --update
This will take some time.
Finally, open a new Ubuntu terminal and run:
sudo apt update
sudo apt install libsdl2-2.0-0 libsdl2-dev libsdl2-image-2.0-0 libsdl2-image-dev
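You can also double-check from inside the Ubuntu terminal. The sketch below is a heuristic, not an official API: it only inspects the Linux kernel release string (the same value uname -r prints), and a custom WSL kernel could report something different. Running wsl --list --verbose again on the Windows side remains the authoritative check.

```python
import platform
from typing import Optional


def looks_like_wsl2(release: Optional[str] = None) -> bool:
    """Heuristic check for the WSL2 kernel, based on the kernel release string.

    Microsoft's WSL2 kernels report releases such as
    "5.15.153.1-microsoft-standard-WSL2" (older ones end in
    "-microsoft-standard"), while WSL1 reports something like
    "4.4.0-19041-Microsoft".
    """
    if release is None:
        release = platform.release()  # kernel release of the running system
    release = release.lower()
    return "wsl2" in release or "microsoft-standard" in release


print("WSL2 kernel detected:", looks_like_wsl2())
```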
Project structure
You will see a few files in the project folder:
- quiz_game.py: the only actual Python file. You will run this.
- Other things:
  - .venv/: the Python virtual environment used to run the app. Ignore this.
  - .gitignore: tells Git to ignore specific files.
  - requirements.txt: tells the virtual environment which pip libraries are needed to run the project.
Running the game
We need to run the game from Visual Studio Code’s integrated terminal.
Note: The game will only run with the “virtual environment” in .venv/ active. Visual Studio Code will activate it for you automatically. If you want to run from your system Terminal, you will first need to run source .venv/bin/activate from your project directory.
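The game also needs the libraries from requirements.txt installed in the active environment. As a quick check, this sketch asks Python whether each top-level module can be found, using importlib.util.find_spec, which returns None for a missing top-level module. The list ["pygame"] is an assumption based on this lab; add any other top-level module names your requirements.txt implies.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level module names that the active interpreter cannot find.

    Only pass top-level names: for a dotted name whose parent package is
    missing, find_spec raises ModuleNotFoundError instead of returning None.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means the active environment can import everything listed.
    print("missing:", missing_packages(["pygame"]))
```

If pygame shows up as missing, the virtual environment is probably not active, or the dependencies were not installed when it was created.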
To run the game:
- First, make sure your Flask web server is also running. You will need two Visual Studio Code windows running (or two system Terminals with the virtual environments activated). To open a second Visual Studio Code window:
  - In Code, File → New Window will open a second IDE. From the second IDE, you can do File → Open Folder to open the server project directory.
  - You can also type code . in each of the game and server directories to open a separate IDE for each project.
- From the client game’s IDE terminal, run python quiz_game.py
You should see a screen like this:
- Use the arrow keys to make a choice.
- Press Enter to check the answer:
  - The app will do nothing if you are wrong.
  - The game will display a new question if you are right. There are only two questions, so there is a 50/50 chance that you will see a different one.
- Press q or close the window to quit the game. Your score will always be 0.
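The controls above boil down to a tiny piece of state: which choice is highlighted, and whether it is the right answer. This is a hypothetical, Pygame-free sketch of that logic, not the code in quiz_game.py; the class name QuizState and its fields are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class QuizState:
    """Hypothetical game state: the answer choices and which one is highlighted."""
    choices: list[str]
    answer_index: int  # index of the correct choice
    selected: int = 0  # index of the highlighted choice

    def move(self, step: int) -> None:
        """Move the highlight by `step` (-1 = previous, +1 = next), wrapping around."""
        self.selected = (self.selected + step) % len(self.choices)

    def is_correct(self) -> bool:
        """What pressing Enter checks: is the highlighted choice the answer?"""
        return self.selected == self.answer_index


state = QuizState(["3", "4", "5"], answer_index=1)
state.move(+1)              # arrow key: highlight "4"
print(state.is_correct())   # True
state.move(+1)
state.move(+1)              # past the last choice, wraps back to "3"
print(state.selected)       # 0
```

In a Pygame event loop, move() would typically be called when a pygame.KEYDOWN event carries pygame.K_UP or pygame.K_DOWN, and is_correct() on pygame.K_RETURN; keeping this logic outside the drawing code makes it testable with pytest.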