Ruby Full Immersion
LUG Programming Course, 18th February 2008
This week we move on from JavaScript to the Ruby programming language. Ruby is a dynamically typed, interpreted programming language. It's available for the Windows, Mac OS X, and Linux operating systems.
This lesson will give a very brief overview of the language, noting important differences from JavaScript where necessary. The first, and probably most important difference from JavaScript, is that Ruby runs on the operating system itself, rather than within a browser.
Ruby can be used to create classic glue language scripts, much like AWK or Perl. It can also be used to create desktop applications, using a variety of bindings to underlying libraries, such as Fox, wxWidgets, Korundum and Qt, and Tk. You can also collect your Ruby scripts together to make a stand-alone executable using RubyScript2Exe. Undoubtedly, Ruby on Rails has made Ruby widely known as a web application programming language. The Ruby Italia site provides guides and other information in Italian.
There is a wealth of documentation available, including the core and standard libraries. There is an online interpreter and tutorial, which is a clever idea indeed.
I managed to cover the material in the allotted time frame, but I'm still leaving a few students behind. The part about iterators and code blocks left a lot of confused looks, much as it did in JavaScript. I'll save our 'bonus' lesson to go over this ground again in a couple of weeks.
Ruby Basics
You can try out the following examples using the Interactive Ruby Shell, called irb. Just type irb at the command line, and type exit when you've had enough. Ruby also has ri, the documentation reader. You can look up documentation for a given class or method using this command line program. For example, to look up the String class, type ri String. You can find out more about ri by typing ri --help.
Ruby has several built-in types: numbers, strings, arrays, hashes, ranges, symbols and regular expressions.
Unlike JavaScript, Ruby has integer numbers as well as floating point numbers. The Bignum class can handle truly enormous numbers, which can be useful for astronomers and bankers. You can use the underscore (_) to separate the digits of big numbers, such as 35_000_000.
Strings are a little more tricky. Just like JavaScript you can use single quotes (') or double quotes ("), but single quoted strings only accept \\ and \' as escape sequences. Double quoted strings can accept a much richer set of backslash (\) sequences, as well as embedded expressions, for example "Your name is #{name}". You can use different single character delimiters using %q and %Q, and multi-character delimiters using here documents:
%q{String definition},
%Q<String\tdefinition>, or
str = <<xXx
<!-- saved from url=(0014)about:internet -->
<html xmlns="http://www.w3.org/1999/xhtml"></html>
xXx
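As a quick sketch of the quoting rules above, the following compares the different string literals side by side. The variable names and sample text are purely illustrative and are not part of the lesson's source files.

```ruby
name = 'Oscar'

single = 'Your name is #{name}\t!'   # no interpolation; \t stays as two characters
double = "Your name is #{name}\t!"   # interpolation and the tab escape are processed

alt_single = %q{It's "quoted" freely}   # %q behaves like single quotes
alt_double = %Q<Name:\t#{name}>         # %Q behaves like double quotes

# Here document: everything up to the terminator line becomes the string.
text = <<EOS
First line
Second line for #{name}
EOS

puts single
puts double
puts alt_single
puts alt_double
puts text
```

Note that a here document introduced with a bare terminator, as here, behaves like a double quoted string, so embedded expressions are evaluated.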
Arrays have a similar syntax to JavaScript:
a = [ 'bee', 'wasp', 3.14159 ]
As with JavaScript, array elements are accessed by index value (in the range 0..array.length - 1), but Ruby adds a little magic by allowing negative index values, which refer to the end of the array. So
a.last == a[-1]
a[a.length - 1] == a[-1]
both give the result:
true
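A small sketch of negative indexing, reusing the array from above, may help here. It also shows what happens with an index beyond the end of the array.

```ruby
a = ['bee', 'wasp', 3.14159]

puts a[0]     # first element
puts a[-1]    # last element, the same as a.last
puts a[-2]    # second element from the end
p a[5]        # an index beyond the end gives nil rather than raising an error
```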
Hashes (or associative arrays) have a slightly different syntax from JavaScript:
b = { 'butterfly' => 'float like a butterfly', 'bee' => 'sting like a bee' }
Hash elements are accessed by key:
b['butterfly']
gives:
"float like a butterfly"
Ranges are very useful for creating sequences:
('a'..'f').to_a
gives:
=> ["a", "b", "c", "d", "e", "f"]
whereas (note there are three dots, not two as before):
(0...10).to_a
gives:
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
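To make the difference between the two range forms easier to see, here is a short sketch using the same start and end values for both. The variable names are only for illustration.

```ruby
inclusive = (1..5)    # two dots: the end value is included
exclusive = (1...5)   # three dots: the end value is excluded

p inclusive.to_a
p exclusive.to_a
p inclusive.include?(5)
p exclusive.include?(5)

# A range can also be used to take a slice of an array.
letters = ('a'..'f').to_a
p letters[1..3]
```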
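Symbols are unique strings, and are created by prefixing the string or name with a colon (:), such as :name, or :"Name is #{name}". Two symbols with the same name are guaranteed to be the same object, and so have the same identifier, whereas two equal strings are normally separate objects: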
"hello".object_id == "hello".object_id
gives:
false
Whereas:
:"hello".object_id == :"hello".object_id
gives:
true
Regular expressions also have a similar syntax to JavaScript, re = /[aeiou]/, as well as the %r alternative, re = %r{[aeiou]}.
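As a brief sketch of how these regular expressions might be used, the example below applies the vowel pattern with the match operator and with String#scan. The sample words are arbitrary.

```ruby
re  = /[aeiou]/
alt = %r{[aeiou]}

p re == alt               # both literals build the same pattern
p 'rhythm' =~ re          # nil, because there is no vowel to match
p 'wasp' =~ re            # index of the first match
p 'butterfly'.scan(re)    # every match, returned as an array
```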
As you've probably guessed, a local variable is created simply by assigning a value to its name (it does not require var or any other keyword), and the semicolon statement terminator (;) is optional when there is only one statement per line.
Names can be letters followed by letters, numbers or the underscore. Naming style is different from JavaScript in that names, including filenames, are normally lowercase, and words are separated by underscores, for example word_count. Constants always start with an uppercase letter (and usually are all uppercase, such as PI), and class names follow the UpperCamelCase naming style.
Comments start with a hash sign (#) and continue to the end of the line. Multiple line comments can be created between a line starting with =begin and a subsequent line starting with =end. For example:
# normal comment
a = 1
=begin
a = 2
=end
puts a
gives:
1
The boolean values are true and false, but a non-existent object is nil, and not null as in JavaScript. Ruby has no equivalent of undefined. The nil object also has methods, so to test for nil (on any object) we can call obj.nil?, which only returns true if the object is nil.
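The following sketch shows nil? in action, and also how Ruby decides what counts as true in a condition. JavaScript programmers should note that 0 and the empty string are treated as true in Ruby. The sample values are arbitrary.

```ruby
value = nil

p value.nil?   # true
p 0.nil?       # false
p ''.nil?      # false

# Only false and nil count as false in a condition.
[false, nil, 0, '', []].each do |obj|
  puts "#{obj.inspect} is #{obj ? 'truthy' : 'falsy'}"
end
```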
Ruby makes accessing operating system commands very easy using the back quotes (`) or using %x. For example, in irb, the following will print the list of files in the current working directory:
puts `ls` # Unix
puts `cmd /C "dir /B"` # Windows
A Ruby Program Dissected
We'll create a short Ruby program to count the number of words in a file. To add a little spice, and to demonstrate some of the power of Ruby, we'll also calculate the unique word count.
The text for this exercise comes from “The Happy Prince and Other Tales”, by Oscar Wilde, available from Project Gutenberg.
The code consists of a single class WordCount, and a small command line program which uses the class. Here then is the code in the word_count.rb file, with explanations (you can download the source files at the end of this discussion):
01 #!/usr/local/bin/ruby -w
02
For any script that will be run as a program, put a shebang on the first line. The -w option forces the Ruby interpreter to display warnings. Removing all warnings from our code allows us to write better idiomatic Ruby. The interpreter will tell us when we've got things wrong.
03 # == Synopsis
04 # word_count: Counts the number of words in a file
05 #
06 # == Usage
07 # word_count [options] -- file_path
08 #
09 # -h, --help:
10 # show help
11 #
12 # -m n, --min n:
13 # minimum of n letters in the word (default is 2)
14 #
15 # file_path: the file path of the file to read the words from.
16
The header comment can be parsed by RDoc to produce documentation of our code. This can also be used inside the program itself, as we'll see a little later.
17 # The WordCount class calculates unique and word counts in
18 # a file.
19 class WordCount
20
This is the class definition, which is terminated by the end statement on line 62. A class can inherit methods and variables from another class, called the super class, by specifying < name after the class name. In our case we have not specified a super class, so Object (the base class in Ruby) is inferred. Line 19 is equivalent to class WordCount < Object.
21 # The minimum number of letters which make a word.
22 MIN = 2
23
We define a class constant for the default minimum number of characters in a word. This can be referenced inside the class by the name MIN, and outside the class using scope resolution: WordCount::MIN. Since this is a constant, its value must be initialised.
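Here is a minimal sketch of a class constant being used from inside and outside a class. The Circle class and its area method are hypothetical, chosen only to illustrate scope resolution, and are not part of word_count.rb.

```ruby
# Circle is a hypothetical class used only for this illustration.
class Circle
  PI = 3.14159   # constants start with an uppercase letter

  def self.area(radius)
    PI * radius * radius   # inside the class the bare name is enough
  end
end

puts Circle::PI        # outside the class, use scope resolution
puts Circle.area(2)
```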
24 attr_reader :file, :title, :min, :count, :word_hash
25
The Module.attr_reader method is a piece of Ruby magic designed to make our lives a little easier. It uses metaprogramming to produce an instance variable, and a read accessor method for file, title, min, count and word_hash.
Also notice that line 24 is a method call. In Ruby the parentheses are optional in certain circumstances, particularly when calling a method in a simple statement, such as in this example. It is normal Ruby practice to leave off the parentheses, except where this would cause confusion. So line 24 is equivalent to attr_reader(:file, :title, :min, :count, :word_hash).
Just taking the first parameter into consideration, attr_reader produces something like:
@file
def file
@file
end
We'll look at what that means in a moment.
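To make the equivalence concrete, here is a small sketch with two hypothetical classes, one using attr_reader and one with the accessor written by hand. Neither class is part of word_count.rb.

```ruby
# WithMacro and ByHand are hypothetical classes for comparison only.
class WithMacro
  attr_reader :file

  def initialize(file)
    @file = file
  end
end

class ByHand
  def initialize(file)
    @file = file
  end

  # Hand-written equivalent of attr_reader :file
  def file
    @file
  end
end

p WithMacro.new('a.txt').file
p ByHand.new('a.txt').file
p WithMacro.new('a.txt').respond_to?(:file=)   # no writer method is created
```

Because only a reader is generated, code outside the class can read the value but cannot assign to it.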
26 # Creates a new WordCount object for the given file name
27 # and minimum word length.
28 def initialize(file, min = 1)
29 @file = file
30 @min = min <= 0 ? MIN : min
31 @count = 0
32 @title = nil
33 @word_hash = Hash.new(0)
34 end
35
Methods are defined using the def keyword, terminating with the end keyword (line 34). The initialize method is special in a Ruby class, because it is the constructor method. Every time a new WordCount instance is created, this method is called.
Our initialize method expects two parameters: file and min (line 28). The second parameter, min, is optional; if it is not specified it takes the value 1. Required parameters must appear before optional parameters. In the body of our method we initialise the instance variables @file, @min, @count, @title, and @word_hash. Instance variables have an @ prefix symbol.
We use a ternary expression on line 30 to ensure that @min cannot be less than 1: if min is zero or negative, the class default MIN is used instead.
Whereas in JavaScript new is a keyword, in Ruby new is a class method. So line 33 assigns a new Hash instance to the instance variable @word_hash. The argument value 0 is the default value of the hash: it is the value returned for any key that has not yet been stored, which lets us increment a word's count without first checking whether the word is already present.
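A short sketch of how a hash with a default value of 0 behaves when counting may be useful here. The words and variable names are made up for the example.

```ruby
counts = Hash.new(0)   # keys that are not present read as 0
plain  = {}            # keys that are not present read as nil

%w[bee wasp bee].each { |word| counts[word] += 1 }

p counts
p counts['ant']        # the default value; reading does not create the key
p counts.key?('ant')
p plain['ant']
p counts.length
```

With the plain hash, plain['ant'] += 1 would fail, because nil has no + method.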
36 # Parses the file.
37 def parse
38 re = /\w{#{@min},}/
39 File.open(@file, "r") do |file|
The parse method does the real work. Since it takes no arguments we don't need to write the parentheses. On line 38 we create a local variable re and assign a regular expression to it. Inside the regular expression we can use an embedded expression to set the minimum number of word characters to match. If @min has a value of 2, then the regular expression re becomes /\w{2,}/.
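As a sketch of the embedded expression at work, the following builds the same kind of pattern from a local variable rather than from @min, and applies it to an arbitrary sentence.

```ruby
min = 3
re = /\w{#{min},}/   # the embedded expression is evaluated when the pattern is built

p re.source
p 'He is a very happy prince'.scan(re)   # only words of 3 or more characters
```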
We make good use of the Ruby File and String classes in our little program, and on line 39 we use the File.open class method to open the file for reading. It not only opens the file, but it also closes the file automatically at the end of the code block, on line 49. I'll leave the discussion of Ruby code blocks till a little later.
40 while line = file.gets
41 line.strip!
42 if line.length > 0
43 @title = line unless @title
44 words = line.scan(re)
45 @count += words.length
46 words.each { |word| @word_hash[word] += 1 }
47 end
48 end
49 end
Line 40 uses a while expression ... end code block to read each line of text. The while loop is terminated when the expression evaluates to false. The File.gets method either returns a string (the line of text) or nil if there are no more lines. In Ruby everything is true, except false and nil. So the while loop will terminate when no more lines of text are available.
The File.gets method returns the line including end of line characters. We can remove these characters with the String.strip! method (line 41). In Ruby, methods which end with an exclamation mark (!) usually modify the object itself, rather than returning a modified object.
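The loop on lines 40 to 48 can be tried out without a real file. In this sketch a StringIO object stands in for the open file (an assumption made only to keep the example self-contained), and the sample text is invented.

```ruby
require 'stringio'

io = StringIO.new("  first line \n\nlast line\n")   # stands in for an open file
lines = []

while line = io.gets     # gets returns nil at the end of input, ending the loop
  line.strip!            # removes surrounding whitespace from line itself
  lines << line if line.length > 0
end

p lines

# The non-destructive form returns a new string and leaves the original alone.
padded = '  text  '
p padded.strip
p padded
```

One thing to watch for is that strip! returns nil when there was nothing to remove, so its return value should not be used in place of the string.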
Since we have quite a lot of work to do on the line, we use a classic if test on line 42 to only check lines which contain text.
Line 43 however, uses a statement modifier. Statement modifiers can be read as “execute the statement if the expression is true”. But unlike JavaScript, Ruby also offers a complementary test to if, called unless. So the statement modifier reads “execute the statement unless the expression is true”. In other words set the @title to line unless the @title has already been set - @title was set to nil in the initialize method on line 32.
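Here is a small sketch of the unless statement modifier, following the same pattern as line 43. The sample lines are invented.

```ruby
title = nil

['The Devoted Friend', 'One morning the old Water-rat'].each do |line|
  title = line unless title   # only assigns while title is still nil
end

puts title
```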
The actual word parsing is handled by the String.scan method which returns an array of words that match a string or regular expression (line 44).
Now we get to another interesting Ruby construct, on line 46. We need to examine each scanned word, and add it to our unique @word_hash. This is handled by using an iterator method each.
Iterators and code blocks are a fundamental construct of Ruby. An iterator, such as each in our case, is responsible for passing each element in its container to a code block that we supply (the code within the curly braces).
The problem is that the iterator has to pass one or more parameters to the code block. Ruby handles this by allowing the code block to define an argument list using the vertical bar (|) to delimit the list. In JavaScript, if you remember, we had to use an anonymous function.
So our code block on line 46 (which terminates with the closing curly brace on the same line) receives a single argument called word. An important point to note here is that the start of the code block - the do or opening curly brace ({) - must begin on the same line as the method's last parameter or closing parenthesis. In our case, there are neither, so we have to stay on the same line as the method call itself.
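It may help to see an iterator from the other side. The sketch below defines a hypothetical iterator method, each_word, which is not part of Ruby or of word_count.rb, and calls it with both block syntaxes. The yield keyword passes a value to the code block supplied by the caller.

```ruby
# each_word is a hypothetical iterator: it passes each word to the caller's block.
def each_word(sentence)
  sentence.split.each { |word| yield word }
end

lengths = []

# Multi-line form, using do ... end
each_word('float like a butterfly') do |word|
  lengths << word.length
end

p lengths

# Single-line form, using curly braces
each_word('sting like a bee') { |word| puts word.upcase }
```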
In reality we've already seen another code block, on line 39. This code block uses the do ... end syntax, which is the preferred syntax in Ruby for multi-line code blocks. This is how File.open can guarantee that the file is closed when the code block terminates. How does that work exactly? I'll explain using pseudo-code:
def File.open(name, mode, code_block = nil)
file = _open(name, mode)
code_block(file) if code_block
_close(file)
end
In Ruby, the code block can be passed to the method as the last parameter. This allows File.open to open the file, pass the open file to the code block (if there is one), and then close the file itself.
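The pseudo-code can be turned into a runnable sketch. The with_file method below is a hypothetical helper, not the real implementation of File.open; it assumes that yield and an ensure clause are a reasonable way to model the behaviour described. A temporary file is created only so the example can run anywhere.

```ruby
require 'tempfile'

# with_file is a hypothetical stand-in for File.open, for illustration only.
def with_file(name, mode = 'r')
  file = File.new(name, mode)
  return file unless block_given?   # no block: hand back the open file
  begin
    yield file                      # pass the open file to the caller's block
  ensure
    file.close                      # runs even if the block raises an error
  end
end

# Create a small temporary file to read back.
tmp = Tempfile.new('demo')
tmp.write("hello\n")
tmp.close

handle = nil
first_line = with_file(tmp.path) do |f|
  handle = f
  f.gets
end

p first_line
p handle.closed?
```

The ensure clause is what makes the guarantee hold: the file is closed whether the block finishes normally or fails part way through.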
At this point we have almost finished the parse method.
50 self
51 end
52
Although it doesn't need to return any value, it is normal Ruby practice whenever possible to return the object itself, as this allows us to use method call chaining. This is exactly what we do on line 50. In Ruby self refers to the current object, whereas in JavaScript we used this. Unless specified by using return explicitly, Ruby will always return the last expression executed in a method. So line 50 is equivalent to return self.
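A minimal sketch of method call chaining follows. The Counter class is hypothetical and exists only to show why returning self is useful.

```ruby
# Counter is a hypothetical class used only to demonstrate chaining.
class Counter
  attr_reader :count

  def initialize
    @count = 0
  end

  def increment
    @count += 1
    self          # returning the object itself allows further calls to be chained
  end
end

p Counter.new.increment.increment.count
```

If increment ended with @count += 1 instead, it would return an integer, and the second increment call in the chain would fail.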
53 # The unique word count
54 def unique
55 @word_hash.length
56 end
57
We've already provided read accessor methods to our instance variables, but we also need to provide a read accessor method for the unique word count.
58 # Returns the count summary.
59 def to_s
60 "File: #{@file} title: '#{@title}' min: #{@min} total: #{@count} unique: #{unique}"
61 end
62 end
63
Similarly, we use the “convert object to string” method, to_s, to return a summary of the result of parsing the file. This is extremely simple in Ruby through the use of embedded expressions.
Line 62 brings us to the end of our class definition.
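As a short sketch of why to_s is worth defining, the example below uses a hypothetical Point class. Both puts and embedded expressions call to_s automatically.

```ruby
# Point is a hypothetical class used only to demonstrate to_s.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  def to_s
    "(#{@x}, #{@y})"
  end
end

pt = Point.new(1, 2)
puts pt                    # puts calls to_s for us
puts "The point is #{pt}"  # so do embedded expressions
```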
64 # Run the application.
65 if __FILE__ == $PROGRAM_NAME # aka $0
66
Next we'll write the code which will run our program. Our WordCount class could be used by other programs, so line 65 tests that the current file (__FILE__) is also the program name ($PROGRAM_NAME or $0). If this is the case, the program code will be executed, otherwise it will be ignored. This test is used frequently in library code to execute tests when the file is run as a program.
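The same test can be sketched in a much smaller file. The file name greeter.rb and the greet method are hypothetical, chosen only for this illustration.

```ruby
# greeter.rb (hypothetical file name)

def greet(name)
  "Hello, #{name}!"
end

if __FILE__ == $PROGRAM_NAME
  # Runs for "ruby greeter.rb", but not when another script requires this file.
  puts greet(ARGV.first || 'world')
end
```

Another script can then require the file and call greet without triggering the code inside the if block.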
67 require 'getoptlong'
68 require 'rdoc/usage'
69
The Kernel module provides several helper methods, including require, which loads the specified script into memory, which the interpreter then parses. This method allows us to add code that we require, but that is not part of the core Ruby library.
70 min = WordCount::MIN
71 path = nil
72 opts = GetoptLong.new(
73 [ '--help', '-h', GetoptLong::NO_ARGUMENT ],
74 [ '--min', '-m', GetoptLong::REQUIRED_ARGUMENT ]
75 )
Lines 72 to 75 make use of a Ruby class called GetoptLong which handles command line options in a uniform way. The format is well known in the Linux world, but may be new to Windows users. Each option can have two formats, long and short, possibly followed by an argument.
76 opts.each do |opt, arg|
77 case opt
78 when '--help'
79 RDoc.usage
80 exit 0
81 when '--min'
82 min = arg.to_i
83 end
84 end
85
Lines 76 to 84 handle the options received. Here again, we use an iterator and a code block. Inside the code block there is a case ... when ... end statement. This is similar to the JavaScript switch ... case statement. Irrespective of the option style used, GetoptLong always returns the long version. Here we test for the two predefined options. If the user asked for --help we can make use of our documentation comments at the start of the file to explain what the user needs to do. This is achieved using the RDoc.usage method on line 79. The Kernel module also provides an exit method which exits the program, returning an exit code, which we give as argument to the method (line 80). By convention, programs that terminate successfully return 0, and use non-zero values to indicate that an error occurred.
Line 82 converts the string argument to an integer. Note that if a non integer value is given, the value returned is 0. See ri String#to_i for more information.
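A quick sketch of how String#to_i treats different inputs is shown below. For contrast it also uses Kernel#Integer, which is stricter; that method is not used in word_count.rb and is included here only as a comparison.

```ruby
p '5'.to_i       # 5
p '12abc'.to_i   # 12: leading digits are used, the rest is ignored
p 'abc'.to_i     # 0: no leading digits at all
p ''.to_i        # 0

# Integer() raises an error instead of quietly returning 0.
begin
  Integer('abc')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

This is why the initialize method on line 30 is useful: a mistyped --min value arrives as 0 and is replaced by the default.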
86 if ARGV.length != 1
87 puts "Missing file_path argument (try --help)"
88 exit(-1)
89 end
90 path = ARGV.shift
91 unless File.exists?(path) && File.file?(path) && File.readable?(path)
92 puts "File path #{path} doesn't exist, isn't a file, or can't be read"
93 exit(-1)
94 end
95
Lines 86 to 89 check that the user has specified a file path, and if not, exit after printing an error message. The command line arguments are stored in the ARGV array, and by the time GetoptLong has finished, we should be left with just the file path. The exit statement on line 88 uses parentheses, because without them Ruby generates a warning. Line 90 removes the file path name from ARGV and stores it in the local variable path.
Line 91 checks that the path exists, is a file, and can be read, using the unless keyword. Just to remind you:
91 unless File.exists?(path) && File.file?(path) && File.readable?(path)
could also be expressed as:
91 if !File.exists?(path) || !File.file?(path) || !File.readable?(path)
I hope you agree with me that the unless statement is more readable.
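For anyone who wants to convince themselves that the two forms really are equivalent, here is a sketch that checks every combination of true and false. The method names are hypothetical, and plain boolean values stand in for the three File tests.

```ruby
# Both methods are hypothetical; the booleans stand in for the three File tests.
def rejected_with_unless?(exists, is_file, readable)
  unless exists && is_file && readable
    return true
  end
  false
end

def rejected_with_if?(exists, is_file, readable)
  if !exists || !is_file || !readable
    return true
  end
  false
end

flags = [true, false]
combinations = flags.product(flags, flags)   # all 8 combinations of the three flags

agree = combinations.all? do |combo|
  rejected_with_unless?(*combo) == rejected_with_if?(*combo)
end

puts "Both forms agree on all #{combinations.length} combinations: #{agree}"
```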
96 puts WordCount.new(path, min).parse.to_s
97 exit 0
98
99 end
Finally, we can get down to doing the job. Line 96 creates a WordCount instance, parses the text, and prints the summary. Because our parse method returns self, we can chain these three method calls (new, parse and to_s) together on one line. Again, this is normal practice in Ruby.
Now that we've been through the program code, let's see what it does. On the command line (and in the directory where word_count.rb resides), type:
word_count -h
On Linux and Mac OS X, you will have to make word_count.rb executable, and possibly use the ./ prefix, as in ./word_count. The result should be something like:
C:\Courses\LUGPC6\WordCount>word_count -h
Synopsis
--------
word_count: Counts the number of words in a file
Usage
-----
word_count [options] -- file_path
-h, --help:
show help
-m n, --min n:
minimum of n letters in the word (default is 2)
file_path: the file path of the file to read the words from.
C:\Courses\LUGPC6\WordCount>
Now type:
word_count
the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count
Missing file_path argument (try --help)
C:\Courses\LUGPC6\WordCount>
Next type:
word_count -- ./text
the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count -- ./text
File path ./text doesn't exist, isn't a file, or can't be read
C:\Courses\LUGPC6\WordCount>
Now, we'll actually do some parsing, type:
word_count -- ./text/TheDevotedFriend.txt
the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count -- ./text/TheDevotedFriend.txt
File: ./text/TheDevotedFriend.txt title: 'The Devoted Friend' min: 2 total: 4154 unique: 926
C:\Courses\LUGPC6\WordCount>
Finally, let's set the minimum word length to 5. Type
word_count -m 5 -- ./text/TheSelfishGiant.txt
and the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count -m 5 -- ./text/TheSelfishGiant.txt
File: ./text/TheSelfishGiant.txt title: 'The Selfish Giant' min: 5 total: 518 unique: 253
C:\Courses\LUGPC6\WordCount>
Feel free to use this code as a template for your own command line programs in Ruby.
Source Files
All the source files for this lesson, including the Aptana Studio project file and test text files, can be found in the LUGPC6.zip archived file, distributed under the GNU Lesser General Public License.
What's Next?
Next we'll take our knowledge of Ruby a step further by looking at a Ruby and Ajax example, creating our first web application.