Recap:
We know how to start and exit the Python interactive shell. We know that Python is dynamically typed: the types of constants and variables are determined automatically at execution time. We know that Python is a strongly-typed language, i.e., it will gripe if you use the wrong type of object. We've learned a couple of operators, + and *, and that those operators are "overloaded", i.e., they behave differently depending on their operands. We know how to assign constants to a variable, and how to explicitly convert (re-cast) a constant or variable to a different type.
Today, we're going to build on that foundation, dip our toes into the murky waters of modules and functions, and maybe even write our first Python program.
I see a couple of you eager beavers have already opened your Python interpreter. A+ for effort. Now, CTRL-d out of it and let's make a new home for some of our projects. You more cautious types who haven't even opened a terminal window should do so now.
I'm going to recommend putting our new home in your home directory. If you have a good reason for putting it in another directory, please cd there now - otherwise, type cd ~ and press enter.
Now, copy and paste the following command into your terminal window, then press enter.
mkdir -p pyedu/{lesson01/notes,lesson02/notes,lesson03/notes,lesson04/notes,lesson05/notes}
Yes, I'm a lazy sod who won't type 'mkdir' ten times when once will do. BTW - the number of folders is not a commitment to fill them, nor an indication that we won't overflow them. If we need more, we'll make more, and if we don't, no loss.
Now, type cd pyedu/lesson01 and press enter.
Start up the Python interpreter shell (as always, your input is what you type after the prompt):
[zccw01@LapNix lesson01]$ python
Python 2.5.2 (r252:60911, Jun 28 2008, 14:11:09)
[GCC 4.1.1 20060724 (prerelease) (4.1.1-4pclos2007)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
We touched very briefly on several variable types in the last installment. Let's dig a little deeper into one of those types, the string. Python considers strings to be immutable sequential objects. Immutable means they can't be altered. So we can skip trying to alter them. Sequential means they are ordered, indexable, sliceable, have length, and can be iterated over. Ordered is a pretty obvious attribute - after all, if you entered a customer name like 'Mr. James Smythe', and it came out as "yMams m. tJeshre", you'd properly consider that to be a fairly useless data type. So the next attribute is indexable - let's do some indexing. In your python interpreter shell, type the following lines, pressing enter after each one:
'abcde'[ 0 ]
'abcde'[ 3 ]
'abcde'[ 9 ]
Here's what I got:
>>> 'abcde'[ 0 ]
'a'
>>> 'abcde'[ 3 ]
'd'
>>> 'abcde'[ 9 ]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>>
So, what do we now know? To index a sequential object, you append the index enclosed in square brackets, as [n]. We know that the index of the first element of a sequential object is 0. And we know that Python gets mad when your index goes off the end of the object.
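Here is a small sketch that checks those rules, plus the immutability claim from the start of this section. It assumes Python 3 (for the print() and except ... as syntax), so it won't run as-is in the 2.5 shell shown above.

```python
s = 'abcde'

print(s[0])            # a: the first element is at index 0
print(s[len(s) - 1])   # e: the last valid index is len(s) - 1

try:
   s[9]                # anything past the end raises IndexError
except IndexError as err:
   print('IndexError:', err)

try:
   s[0] = 'A'          # strings are immutable, so item assignment is refused
except TypeError as err:
   print('TypeError:', err)

print(s)               # abcde: the string is unchanged
```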
I can hear the wheels turning now. So, you're thinking, if zero is the lowest index of the object, and the highest index can never be greater than the length of the object minus 1... What about negative indices? As it turns out, negative indices are fine with Python, and are fairly useful. Try the following, pressing enter after each line:
'abcde'[-1]
'abcde'[-3]
'abcde'[-9]
'abcde'[-0]
You should see something like this:
>>> 'abcde'[-1]
'e'
>>> 'abcde'[-3]
'c'
>>> 'abcde'[-9]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> 'abcde'[-0]
'a'
>>>
When you feed Python a negative index, it assumes that you want to work backwards from the end of the sequential object. This is actually quite useful, and it's very common to see string[-1] used to examine the last character of a string, which may be a null, or carriage return, or line feed, or... This notation is not symmetrical: 'abcde'[-5] returns 'a', whereas 'abcde'[5] fails with an 'index out of range' error. This is a necessary evil: -0 and 0 are the same integer, so counting backwards from the end has to start at -1.
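Put another way, for any i from 1 up to len(s), s[-i] is the same element as s[len(s) - i]. A quick sketch to check that (assumes Python 3 for the print() output):

```python
s = 'abcde'

# Each negative index -i lands on the same element as len(s) - i.
for i in [1, 2, 3, 4, 5]:
   print(-i, s[-i], len(s) - i, s[len(s) - i])

# -5 is len(s) - 5, i.e. 0; and -0 is just the integer 0.
print(s[-5], s[-0])   # a a
```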
So, that's indexing of a sequential object. What about slicing? A slice is simply a beginning and an ending index separated by a colon, enclosed in square brackets. The slice includes the element at the beginning index and runs up to, but not including, the element at the ending index. If the beginning index is blank, it is assumed to be the first element of the object. If the ending index is blank, the slice runs through the last element of the object. So let's slice:
'abcde'[2:4]
'abcde'[:3]
'abcde'[3:]
'abcde'[:]
'abcde'[4:99]
You should have:
>>> 'abcde'[2:4]
'cd'
>>> 'abcde'[:3]
'abc'
>>> 'abcde'[3:]
'de'
>>> 'abcde'[:]
'abcde'
>>> 'abcde'[4:99]
'e'
>>>
Note that in the last example, the out-of-range index did not cause an error. Slice notation is a little more forgiving than index notation. In this case, Python simply set the out-of-range ending index back to the end of the string.
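A sketch of the slice rules seen so far, assuming Python 3 for print(): the beginning index is included, the ending index is excluded, and an out-of-range ending index is clipped to the end of the string.

```python
s = 'abcde'

print(s[2:4])          # cd: elements 2 and 3, but not 4
print(len(s[1:4]))     # 3: with in-range indices, the length is end - start
print(s[:3] + s[3:])   # abcde: splitting at any index and rejoining loses nothing
print(s[4:99])         # e: the ending index 99 is clipped to len(s)
```

Because the ending index is excluded, s[:n] and s[n:] fit together with no overlap and no gap, which is a handy property once you start chopping strings apart.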
Python is also fairly forgiving about invalid starting indices. Try:
'abcde'[45:99]
'abcde'[4:2]
Both of them return an empty, or null, string: ''. In the first case the starting index was out of range; in the second case the starting index was greater than the ending index.
So, are negative indices valid in slice notation? You bet:
'abcde'[1:-1] # everything except the first and last character
'abcde'[-2:] # just the last two characters
'abcde'[-3:4] # the third from last through the fourth character
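Here are the same three slices with their results, as a sketch you can run in a Python 3 interpreter:

```python
s = 'abcde'

print(s[1:-1])   # bcd: everything except the first and last character
print(s[-2:])    # de: just the last two characters
print(s[-3:4])   # cd: from the third-from-last ('c') up to, not including, index 4
```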
We've covered indexing and slicing - what about iteration? Iteration is a common programming task, where you walk, stepwise, through an object or group of objects. To demonstrate classical iteration, we're going to need a couple of functions, len(), and range().
The len() function is fairly self-explanatory. It returns an integer representing the length of its argument. Try entering the following:
len('A String')
len('A Longer String')
len('')
The results are much as you'd expect. No surprises here:
>>> len('A String')
8
>>> len('A Longer String')
15
>>> len('')
0
>>>
The range() function is a little trickier. In its simplest form, range(n), where n is an integer, returns a list of integers beginning with zero and going up to, but not including, n.
Give it a try:
>>> range(5)
[0, 1, 2, 3, 4]
>>> range(9)
[0, 1, 2, 3, 4, 5, 6, 7, 8]
>>>
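A heads-up if you're following along on a newer interpreter: in Python 3, range() returns a range object rather than a list, so typing range(5) at the prompt shows range(0, 5). Wrapping it in list() gives the same output as the Python 2.5 session above. A short sketch, assuming Python 3:

```python
r = range(5)
print(r)          # range(0, 5)
print(list(r))    # [0, 1, 2, 3, 4]

# Either way it is sequential: it has a length, can be indexed, and iterated.
print(len(r), r[0], r[-1])   # 5 0 4
```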
The list that range() returns is a new data type, and one that we'll explore later, but for now, the important attribute of a list is that it is a sequential data type, which means we can iterate over it. So now, all we need is a statement that does the iterating, and the classic choice in many languages is the for statement.
Let's put some pieces together. As usual, press enter after each line, noting that before the print() line you will need to press the space bar three times, and that you'll need to press enter twice after it:
>>> my_str='Python'
>>> l=len(my_str)
>>> r=range(l)
>>> for x in r:
...    print(my_str[ x ])
...
P
y
t
h
o
n
>>>
There are a couple of new things here. First, the for statement. The for statement is a type of flow control statement known as a loop. All of the statements we've seen up til now will execute, then proceed to the next statement in line. Flow control statements allow you to conditionally execute code blocks, skip code blocks, or repeat code blocks. If you've worked in other languages, you may have seen a for statement something like this:
for (i=1;i<10;i++)
In other words, for (initialize; loop-test; increment).
Not with Python. The Python for statement is an iterator, pure and simple: it starts at the beginning of an iterable and works its way to the end, at each pass assigning the next element to the target variable.
for target in iterable-object:
So in our example, where the iterable was a list of integers returned from the range() function, on the first pass x will be assigned a value of 0, on the next pass a value of 1, and so forth until the end of the list is reached.
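For comparison with the C-style loop that counts i from 1 while i < 10, here is a sketch of the Python equivalent. It uses a form of range() we haven't covered yet, range(start, stop), which starts at start instead of 0 and, like range(n), stops before stop. Assumes Python 3 for print().

```python
visited = ''
for i in range(1, 10):
   # On each pass the target i is assigned the next value from the range.
   visited = visited + str(i)

print(visited)   # 123456789
```

There is no loop test or increment to write: the for statement just walks the sequence it is given.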
To illustrate better, try the following, remembering that the print statement must be indented by three spaces, and you must press enter twice after.
>>> for x in ['spam', 'Spam', 'SPAM', 'rat', 42]:
...    print(x)
...
spam
Spam
SPAM
rat
42
>>>
So, back to our "Python" string iteration: can we make it better? Sure. Since everything in Python is an object, it's permissible to pass the return value of one function directly to another, like this:
>>> my_str='Python'
>>> for x in range(len(my_str)):
...    print(my_str[ x ])
...
P
y
t
h
o
n
>>>
But if we're going to use Python functionality, let's go all the way. Remember when I said this was a classical iteration example? Well, since we know that strings are sequential objects, and we know that all sequential objects are iterable, maybe we can dispense with the classical iteration trappings, and just iterate directly on the string. And so we can:
>>> for x in 'Python':
...    print(x)
...
P
y
t
h
o
n
>>>
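To convince yourself that the classical and direct forms really are interchangeable, this sketch builds a string both ways and compares them (assumes Python 3 for print()):

```python
my_str = 'Python'

# Classical iteration: walk the indices, then index into the string.
by_index = ''
for x in range(len(my_str)):
   by_index = by_index + my_str[x]

# Direct iteration: the for statement hands us each character itself.
direct = ''
for x in my_str:
   direct = direct + x

print(by_index == direct)   # True
```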
With these tools tucked firmly in our workbelt, let's solve a family problem. It seems our dear sister, Dolly, suffers from a bizarre form of dyslexia. Apparently, she can only read backward. Since spelling backward is such a royal pain, poor Dolly never gets any mail from her family.
CTRL-d to close the python interpreter shell.
Enter the following in your terminal window and press enter:
touch text_utils.py
Then, substituting your favorite programmer's editor for kwrite, enter the following:
kwrite text_utils.py &
Enter the following code into your text_utils.py file and save it. Don't close the editor window yet; we'll be needing it shortly. Note that each level of indentation is three spaces, so every indented line starts with a multiple of three spaces. This matters. If you just copy-paste the whole body of code, it should align correctly. Don't be tempted to format with tabs; we'll find out why later.
def flip_txt(message):
   rvs_msg = ""
   for x in message:
      rvs_msg = x + rvs_msg
   return rvs_msg
There you go. Your first module, text_utils, and your first function, flip_txt(). So let's send Dolly a message.
Be sure you've saved text_utils.py, then, back in your terminal window, restart the Python interpreter:
[zccw01@LapNix lesson01]$ python
Python 2.5.2 (r252:60911, Jun 28 2008, 14:11:09)
[GCC 4.1.1 20060724 (prerelease) (4.1.1-4pclos2007)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Enter the following statement:
>>> import text_utils
>>>
Notice that nothing happened, except that you got another >>> prompt. That's Python for you. You screw up and it yells at you, you get it right, and there's just silence. You'll have to get your positive reinforcement elsewhere. So, how do we send Dolly a message? Easy:
>>> text_utils.flip_txt("Dear Dolly. I hope you are doing well. Yours Truly.")
'.ylurT sruoY .llew gniod era uoy epoh I .ylloD raeD'
>>>
When you import a module, you bring its name into your current scope, or namespace. At that point, all of its public objects, such as its functions, are accessible using the '.' notation. There is a variant of the import statement: if you hate the idea of typing text_utils.whatever(), you can use import module as name, for example, import text_utils as tu. Then we could say tu.flip_txt(). That can save on typing, but it's generally not a good idea to alias module names if you're working with a number of people who have to read your code, and are expecting known modules.
The final variant of the import statement, which is seriously, seriously not recommended, is: from module import x. x may be an asterisk, in which case everything from that module gets dragged into your namespace, or it may be specific, as in from text_utils import flip_txt. At that point, the objects may be invoked directly, just like any other object in your code, i.e. instead of text_utils.flip_txt(), we could just call flip_txt(). The reason this form is strongly discouraged is that with a large module you stand a fairly good chance of name collisions with your code: whichever name was imported or defined last silently replaces the earlier one, with unpredictable results. It also obscures the source of the objects you are referencing, and we don't like obscurity.
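Here's a sketch of all three import forms, using the standard math module in place of text_utils so it runs anywhere (assumes Python 3). The variable pi is a hypothetical stand-in for one of your own names; the last import quietly replaces it, which is exactly the kind of collision described above.

```python
import math
print(math.sqrt(16))      # 4.0: qualified with the module name

import math as m          # alias: works, but readers must know m means math
print(m.sqrt(16))         # 4.0

pi = 'apple'              # our own variable, minding its own business...
from math import pi       # ...silently replaced by math's pi
print(pi)                 # 3.141592653589793
```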
Finally, let's go back to the editor window which has text_utils.py. Notice the beautifully indented format? Remember how I've been banging on about "three spaces"? Well, I've been lying. Not about the need to indent, but about three spaces - that part is totally arbitrary. You can indent one space, or ten spaces. It's completely up to you. But the indentation has to be consistent for every line of the code block. That's because whitespace is literally part of the Python syntax. C programmers hate this, and here's why. Below is a sample C++ program.
#include <iostream>
using namespace std;

int main ()
{
   cout << "Hello World!";
   return 0;
}
Notice that it's nicely indented? But that's just a matter of preference. I've seen a variety of styles, like:
int main () {
   cout << "Hello World!";
   return 0;
}
You could even, as one of our classmates that I shamelessly ripped off demonstrated, write it as:
int main () { cout << "Hello World!"; return 0; }
C and many other languages use special characters (semicolons and braces) to delimit statements and code blocks, but leave the formatting totally at the programmer's discretion. And that's the problem. Programmers have no discretion. I swear on a stack of holy books that if you put five C programmers on a project, they will have six different indentation styles, and by the end of the project, they will be spending 90% of their time reformatting each other's code to match their indentation quirks.
Worse, I've seen C programmers get tangled up for hours on something like this:
if (something)
   if (something_else)
      do_stuff(stuff_params);
else
   this_blows(something_wrong);
It's not immediately obvious, even in this small example, what went wrong in the above code, and when you're dealing with much larger functions, with many lines of code, it can drive you crazy. The problem, of course, is that the indentation does not match what the code is actually doing. A C compiler ignores indentation and always pairs an else with the nearest unmatched if, so this else belongs to the second if statement: it should be lined up under that if, and its code block indented to match. Python simply can't have these problems.
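In Python the indentation is the structure, so the two possible meanings of that C fragment have to be written differently. This sketch (assumes Python 3; the function names and return strings are hypothetical) shows both, and what each does when the outer test is true and the inner one is false:

```python
def else_with_outer(something, something_else):
   if something:
      if something_else:
         return 'do_stuff'
   else:                   # lined up with, and belonging to, the outer if
      return 'this_blows'
   return 'nothing'

def else_with_inner(something, something_else):
   if something:
      if something_else:
         return 'do_stuff'
      else:                # lined up with, and belonging to, the inner if
         return 'this_blows'
   return 'nothing'

print(else_with_outer(True, False))   # nothing
print(else_with_inner(True, False))   # this_blows
```

What you see is what runs: there is no way to indent the else one way while the interpreter reads it another.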
Python takes the position that indenting code blocks is the right way to do things, and by making it part of the syntax, the language is also able to get rid of most of the syntactic delimiter clutter, like squirrelly braces and hemi-colons.
Python has retained a colon at the end of flow control statements and function declarations to designate the beginning of a new code block. You see two colons in our flip_txt() function: one at the end of the function declaration, and one at the end of the for statement. In both cases, that's your cue to indent another level. Notice also that it's easy to see the end of a code block: the indentation takes a step to the left. The for statement in our example controls a code block of one line, and you can see at a glance where the indentation, and hence the code block, ends.
Which is why I cautioned you about tabs above. Let me emphasize that. Tabs are evil. Tabs display at different widths on different systems and editors, they sometimes get translated in strange ways during copy/paste operations, and the absolute worst is tabs and spaces mixed together. The behavior of spaces, on the other hand, is consistent and safe across all platforms and editors. I strongly suggest configuring your editor to convert all tabs to spaces, and to produce a set number of spaces instead of a tab character when the tab key is pressed.
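Here's a sketch of the worst case, tabs and spaces mixed in one block, assuming Python 3. The first body line is indented with a tab, the second with eight spaces, so they look level in an editor that shows tabs as eight columns. Python 3 refuses to guess and raises TabError; Python 2 only treats this as an error when started with the -tt option.

```python
# A tab-indented line followed by an eight-space line in the same block.
source = 'if True:\n\tx = 1\n        y = 2\n'

try:
   compile(source, '<mixed>', 'exec')
   result = 'compiled fine'
except TabError as err:
   result = 'TabError'
   print('TabError:', err.msg)   # Python's own description of the problem
```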
Meanwhile, back to our text_utils.py. There are only two really new things here. The first is the function declaration: the keyword def, followed by the function name with any expected parameters in parentheses, followed by a colon. The second is the return statement. The return statement ends the execution of the function, yields control back to the caller, and optionally returns any Python object (or, as we'll see later, combination of objects) to the caller.
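A sketch of both behaviours of return, using two hypothetical functions (assumes Python 3): return ends the function immediately, even from inside a loop, and a function that never executes a return statement hands back None.

```python
def first_char(message):
   for x in message:
      return x          # leaves the function on the very first pass
   return ''            # only reached when message is empty

def say(message):
   print(message)       # no return statement at all

print(first_char('Python'))   # P
print(say('hi'))              # prints hi, then None
```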
Other than that, with a bit of thought, it should be possible to puzzle out just how the text is getting reversed. Scream loudly if not.
Well, that should keep some of you scratching your heads for a few minutes.
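If you're still scratching, here's a copy of flip_txt() with one extra print line (the name flip_txt_traced is just for illustration; assumes Python 3). Each character is glued onto the front of the result built so far, and that is what reverses the message.

```python
def flip_txt_traced(message):
   rvs_msg = ""
   for x in message:
      rvs_msg = x + rvs_msg                  # new character goes in front
      print(repr(x), '->', repr(rvs_msg))    # show the result after each pass
   return rvs_msg

flip_txt_traced('abc')
# 'a' -> 'a'
# 'b' -> 'ba'
# 'c' -> 'cba'
```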
Next up: There's methods to my madness.