Python Basics

Many of my friends want to be Data Analysts. However, they are afraid of programming because it sounds like an advanced skill. Therefore, the first post in this series is about Python, one of the most popular programming languages in scientific computing and data analysis. This post explains how Python works and gives a brief introduction to Python basics. In each section, I also link to pages and tutorials that help you learn the fundamentals. Other programming languages, including R and Matlab, are also used in some organizations. However, Python is among the most widely used, and many beginners find it a bit easier to learn than the others. Once you have mastered Python, it will be much easier to learn another language such as R or Matlab.

Set up Python environment

To set up your Python programming environment, you should identify your operating system first. Each system (Windows/Linux/macOS) has its own Python installation procedure. You can visit https://www.python.org/ in order to download Python for your operating system and follow the installation guide.

For more details, see this example of installing Python 3.10 on Windows 10. Source: Amit Thinks.

After installing Python successfully, open the command prompt/terminal to check it. Run ``python --version``; if Python is installed correctly, the output should look like:

Python 3.10.0

Now you can start the Python console with the command ``python``:

This is an interactive Python console. Because Python is a scripting language, you can type commands into this console line by line and see each result immediately.

You may wonder why the command in the image above is ``python3``. Python has two major versions that differ in incompatible ways: Python 2 and Python 3. Python 2 reached end of life on January 1, 2020 and is no longer supported. However, some legacy systems still use it, especially on Linux and macOS, where Python 2 was for a long time the preinstalled version and owned the ``python`` command. Therefore, on Linux and macOS, the usual command for Python 3 is ``python3``.
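
If you are unsure which version a given command starts, you can also ask Python itself. The snippet below is a minimal sketch that uses only the standard ``sys`` module; the variable names ``major`` and ``minor`` are my own choice for illustration.

```python
import sys

# sys.version_info holds the version of the interpreter running this code.
major = sys.version_info.major
minor = sys.version_info.minor

print("Running Python {}.{}".format(major, minor))

if major < 3:
    # Reached only when the script is started with a Python 2 interpreter.
    print("This is Python 2; try the python3 command instead.")
```

Running it with ``python`` and then with ``python3`` shows whether the two commands point to different versions on your machine.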

Using Python with Visual Studio Code

Python code is stored in files with the .py extension, for instance test.py or hello.py. These are plain text files with a special filename extension. To edit them, you can use any plain-text editor, such as Notepad or Sublime Text (a word processor like Microsoft Word is a poor fit because it does not save plain text by default). Visual Studio Code (VS Code), however, is a code editor designed for productive programming, so it is a good idea to start by downloading it.

After installing VS Code, you can open it and write your first Python file.

The very first Python program

Create a new file ``hello.py`` and type in the script shown above. Then open a command prompt/terminal and ``cd`` to the directory where your new file is located.

The command to run a Python file is ``python <file_name>.py``, for example ``python hello.py``.
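
In case the screenshot of the script is not visible, here is a small stand-in you could type into ``hello.py``. Its contents are an assumption on my part, not the exact script from the image; it only reuses the variable names ``a``, ``b``, and ``c`` that are discussed later in this post.

```python
# hello.py: a hypothetical stand-in for the script in the screenshot
a = 1          # store the integer 1 in the variable a
b = 2          # store the integer 2 in the variable b
c = a + b      # add them and store the result in c

print("Hello, Python!")
print(c)       # 3
```

Saving this file and running ``python hello.py`` should print the greeting followed by ``3``.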

Hooray!! You've got your first Python program, and it runs very well.

Using Python with Jupyter Notebook

VS Code is one of the most popular code editors. Jupyter Notebook, on the other hand, is more favoured by researchers/data analysts in the realm of scientific computing and data analysis.

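To install Jupyter Notebook, we first need pip, the Python package installer, which downloads packages from PyPI (the Python Package Index). Linux/macOS users can install pip from the command line:

In Linux (Debian/Ubuntu):

$ sudo apt-get install python3-pip

In macOS (Homebrew's Python formula already includes pip):

$ brew install python

For Windows users, you can follow this instruction to install pip or use Anaconda to handle it for you. Once pip is available, you can install Jupyter Notebook by running ``pip install notebook``.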

Now open Jupyter Notebook and start coding. If you installed Jupyter Notebook from the command line, open another terminal and type ``jupyter notebook``. A URL should then open in your web browser, like below:

Jupyter Notebook is a web application. It is easier to start if you installed it with Anaconda: open it with a click in the Anaconda application window.

Create a new file in Jupyter Notebook and put your code in a "Code" cell. Then press Shift + Enter (Shift + Return on Mac) to execute the cell. You will see the result as follows.

Code in Jupyter Notebook is organized into cells, and each cell can be executed independently. You can also add notes and explanations of your code in text cells, which hold plain text instead of code. Learn more about Jupyter Notebook here.


Python basics

Syntax

Python syntax is fairly simple; like other programming languages, it uses pre-defined keywords to describe what you want to do. However, unlike many other languages, where indentation is only for readability, indentation in Python is part of the syntax: Python uses it to mark a block of code. So you should be careful with indentation and spaces in your Python code. Learn more here.
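
To make the role of indentation concrete, here is a minimal sketch. The variable names and the value 30 are made up for illustration; the point is only which lines belong to the ``if`` block.

```python
temperature = 30  # hypothetical example value
advice = "No advice needed."

if temperature > 25:
    # Both indented lines belong to the if block and run only when
    # the condition is true.
    print("It is warm.")
    advice = "Drink some water."

# Not indented: this line runs whether or not the condition holds.
print(advice)
```

If the line that sets ``advice`` were moved back to the left margin, it would no longer be part of the ``if`` block and would run every time.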

Variables

As you can see in my previous Python code example, ``a``, ``b``, and ``c`` are three Python variables. A variable is a name for a stored value, and that value can be changed by calculations and assignment operations. For example, consider the code below:

a = 1
b = a
a = 2
c = a + b

print(a) # 2
print(b) # 1
print(c) # 3

The output should be ``2``, ``1``, and ``3``. First, ``a`` is assigned 1. Then ``b`` is assigned the value of ``a``, so both ``a`` and ``b`` equal 1. After that, ``a`` is assigned a new value, 2, so now ``a`` is 2 while ``b`` is still 1. Finally, ``c`` is assigned ``a + b``, which equals 3.

Data Types

Every value has a type; in the previous example, the values are integers. Python has several other built-in data types, such as float (for real numbers), complex (for complex numbers), set, and bool. Each one is suited to specific purposes. Learn more here.

For example, we can use the string data type as below:
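
One way to explore data types is the built-in ``type`` function, which reports the type of any value. The sketch below uses made-up variable names and values purely for illustration.

```python
count = 3              # int: a whole number
price = 2.5            # float: a real number
z = 1 + 2j             # complex: a complex number
is_ready = True        # bool: either True or False
letters = {"a", "b"}   # set: a collection of unique items
name = "Ada"           # str: text

print(type(count))     # <class 'int'>
print(type(price))     # <class 'float'>
print(type(z))         # <class 'complex'>
print(type(is_ready))  # <class 'bool'>
print(type(letters))   # <class 'set'>
print(type(name))      # <class 'str'>
```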

a = "Hello"
b = "World"
c = a + b

print(c) # HelloWorld (the + operator adds no separator)
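
Note that ``+`` joins two strings exactly as they are, without inserting a space or a comma. The sketch below shows two common ways to add a separator; the names ``joined``, ``with_comma``, and ``formatted`` are only illustrative.

```python
a = "Hello"
b = "World"

joined = a + b               # no separator is added
with_comma = a + ", " + b    # add the separator yourself
formatted = f"{a}, {b}!"     # f-string: values are inserted into the text

print(joined)      # HelloWorld
print(with_comma)  # Hello, World
print(formatted)   # Hello, World!
```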

Conditions

A conditional statement is an important concept in programming. It tells the program to do a specific task only when a given condition holds. Let's say we need a program that prints "even" if the input number is even and "odd" if it is odd. An integer is even if it is divisible by 2; otherwise, it is odd. So "is it divisible by 2" is our condition. We can implement this idea in Python with an if..else statement. For example:

Conditional diagram. Source: Wikipedia

n = int(input()) # get input number from console

if n % 2 == 0:
    print("even")
else:
    print("odd")

``n % 2 == 0`` is the condition that checks whether ``n`` is divisible by 2 (``%`` is the modulo operator in Python, which gives the remainder of a division; you can learn more about Python operators here). In this script, if the condition is met, ``print("even")`` is executed; otherwise, the statement after ``else``, which is ``print("odd")``, is executed.
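
To see the remainder at work without typing any input, here is a variation with a fixed number. The value 7 and the names ``remainder`` and ``result`` are assumptions made for this sketch.

```python
n = 7  # fixed example value instead of input()

remainder = n % 2   # 7 = 2 * 3 + 1, so the remainder is 1

if remainder == 0:
    result = "even"
else:
    result = "odd"

print(n, "is", result)  # 7 is odd
```

Changing ``n`` to 8 makes the remainder 0, so the first branch runs instead.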

Another intuitive example of the conditional statement is as follows (you should try to understand what this script is doing by yourself):

point = float(input()) # get a float number in the range [0..10]

if point > 8:
    print("Excellent")
elif point > 6.5: # Python spells "else if" as elif
    print("Good")
elif point > 5.0:
    print("Medium")
else:
    print("Bad")
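
One detail worth noticing in a chain like this is that Python checks the conditions from top to bottom and runs only the first branch that matches. The sketch below uses a fixed, made-up score of 7.0 and stores the label in a variable called ``grade`` so the effect is easy to see.

```python
point = 7.0  # fixed example value instead of input()

if point > 8:
    grade = "Excellent"
elif point > 6.5:
    grade = "Good"      # 7.0 > 6.5, so this branch runs
elif point > 5.0:
    grade = "Medium"    # also true for 7.0, but never reached
else:
    grade = "Bad"

print(grade)  # Good
```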

Loops

A loop is a set of instructions that is repeated until a given condition is met. For example, suppose your task is to calculate the sum of the integers from 1 to 10. Forget the mathematical shortcut; you need to write a program that performs the additions one by one. A naive program would look like this:

a = 0
a = a + 1
a = a + 2
a = a + 3
a = a + 4
a = a + 5
a = a + 6
a = a + 7
a = a + 8
a = a + 9
a = a + 10

This program works. However, writing code this way does not scale: if the range changes and we now need the sum from 1 to 100, we would have to write about 100 lines of code, which is ridiculous. With a loop, we can do it much more easily.

a = 0
n = 10

for i in range(1, n + 1): # i takes the values 1, 2, ..., n
    a = a + i

This program iterates through 1 to 10 one by one, as in the naive example. However, to sum up to 100 instead, we only need to change n = 10 to n = 100. This is how loops save us time.

As in other programming languages, Python offers two kinds of loops: the for loop and the while loop. A for loop is used when the number of iterations is known in advance, for example when going through a fixed range of numbers. A while loop, in contrast, keeps running as long as a given condition is true and stops once it becomes false (source).

With a while loop, the above example becomes:
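
The numbers a ``for`` loop visits depend on how ``range`` is called, which is easy to get wrong. The sketch below shows the two forms and checks the loop result against the built-in ``sum``; the name ``total`` is my own choice.

```python
# range(stop) starts at 0 and stops before stop.
print(list(range(5)))      # [0, 1, 2, 3, 4]

# range(start, stop) starts at start and stops before stop.
print(list(range(1, 6)))   # [1, 2, 3, 4, 5]

n = 10
total = 0

for i in range(1, n + 1):  # i takes the values 1, 2, ..., 10
    total = total + i

print(total)                 # 55
print(sum(range(1, n + 1)))  # 55, the same result from the built-in sum
```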

a = 0
n = 10
i = 1

while i < n + 1:
    a = a + i
    i = i + 1
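
A while loop is most useful when we cannot tell in advance how many iterations are needed. As a made-up illustration, the sketch below counts how many times 100 can be halved (with integer division) before it reaches 1; the names ``n`` and ``steps`` are only for this example.

```python
n = 100
steps = 0

while n > 1:
    n = n // 2         # integer division: 50, 25, 12, 6, 3, 1
    steps = steps + 1  # count one more halving

print(steps)  # 6
```

Here the loop stops as soon as the condition ``n > 1`` becomes false, without us having to work out the number of steps beforehand.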

Loops are especially convenient for working with an important data type in Python: the list. A list is a sequence of elements and can be of any length. The elements of a Python list do not need to share the same data type, so a list can mix types as follows:

a = [1, 2, 3, 4, 5] # list of integers
b = ["A", "B", "C", "D"] # list of strings
c = [1, "A", 2, "B"] # list of integers and strings

Using a for loop, we can easily access each element in the list one by one, for example:

a = [1, 2, 3, 4, 5] # list of integer

for i in range(len(a)):
    print(a[i])
    
# output should be:
# 1
# 2
# 3
# 4
# 5
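
Indexes are not the only way to walk through a list. The sketch below shows two alternatives that are common in Python code: looping over the elements directly, and using the built-in ``enumerate`` when the position is needed as well. The names ``value`` and ``index`` are only illustrative.

```python
a = [1, 2, 3, 4, 5]

# Loop over the elements directly, without indexes.
for value in a:
    print(value)

# enumerate yields (index, element) pairs when the position is also needed.
for index, value in enumerate(a):
    print(index, value)  # the first line printed is: 0 1

# A list can grow: append adds an element at the end.
a.append(6)
print(len(a))  # 6
```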

Learn more about Python loops here and about Python lists.

Exercises

The sections above cover basic concepts in Python and programming. Before moving on to more advanced Python skills, you should try to solve the exercises below:

  • Write a Python program that receives a list of numbers and calculates their sum.
  • Write a Python program that receives a list of integers and calculates the sum of all even numbers in the list.

Learning Resources