Ruby Full Immersion

LUG Programming Course, 18th February 2008
This week we move on from JavaScript to the Ruby programming language. Ruby is a dynamically typed, interpreted programming language. It's available for the Windows, Mac OS X, and Linux operating systems.
This lesson gives a very brief overview of the language, noting important differences from JavaScript where necessary. The first, and probably most important, difference from JavaScript is that Ruby runs directly on the operating system, rather than within a browser.
Ruby can be used to create classic glue language scripts, much like AWK or Perl. It can also be used to create desktop applications, using a variety of bindings to underlying GUI libraries, such as Fox, wxWidgets, Korundum and Qt, and Tk. You can also bundle your Ruby scripts into a standalone executable using RubyScript2Exe. Undoubtedly, Ruby on Rails has made Ruby widely known as a web application programming language. The Ruby Italia site provides guides and other information in Italian.
There is a wealth of documentation available, including references for the core and standard libraries. There is even an online interpreter and tutorial, which is a clever idea indeed.
I managed to cover the material in the allotted time frame, but I'm still leaving a few students behind. The part about iterators and code blocks left a lot of confused looks, much as it did in JavaScript. I'll save our 'bonus' lesson to go over this ground again in a couple of weeks.

Ruby Basics

You can try out the following examples using the Interactive Ruby Shell, irb. Just type irb at the command line, and type exit when you've had enough. Ruby also comes with ri, the documentation reader. You can use this command line program to look up the documentation for a given class or method. For example, to look up the String class, type ri String. You can find out more about ri by typing ri --help.
Ruby has several built-in types: numbers, strings, arrays, hashes, ranges, symbols and regular expressions.
Unlike JavaScript, Ruby has integer numbers as well as floating point numbers. The Bignum class can handle truly enormous numbers, which can be useful for astronomers and bankers. You can use the underscore (_) to separate big numbers, such as 35_000_000.
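Here is a quick irb-style sketch of the numeric types described above. One point to flag: modern Ruby (2.4 and later) merged the Fixnum and Bignum classes into a single Integer class, so class reports Integer for both small and huge values, whereas Ruby 1.8, which this lesson uses, reports Fixnum or Bignum.

```ruby
# Integers and floats are distinct types in Ruby
small = 42
pi    = 3.14159
big   = 2**100             # far beyond 64 bits, yet still exact

population = 35_000_000    # underscores are ignored; they only aid readability

puts small.class           # Integer in Ruby 2.4+ (Fixnum in Ruby 1.8)
puts pi.class              # Float
puts big                   # 1267650600228229401496703205376
puts big.class             # Integer in Ruby 2.4+ (Bignum in Ruby 1.8)
puts population == 35000000  # true
puts 7 / 2                 # 3   (integer division)
puts 7 / 2.0               # 3.5 (float division)
```

Note that, unlike JavaScript, dividing two integers gives an integer result.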
Strings are a little trickier. Just like JavaScript you can use single quotes (') or double quotes ("), but single quoted strings only accept \\ and \' as escape sequences. Double quoted strings accept a much richer set of backslash (\) sequences, as well as embedded expressions, for example "Your name is #{name}". %q and %Q let you choose your own delimiter character (%q behaves like a single quoted string, %Q like a double quoted one), and here documents let you use a multi-character terminator: %q{String definition}, %Q<String\tdefinition>, or
str = <<xXx
 <!-- saved from url=(0014)about:internet -->
 <html xmlns="http://www.w3.org/1999/xhtml"></html>
xXx
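To compare the quoting styles side by side, here is a small sketch; the variable names are purely illustrative.

```ruby
name = 'Oscar'

single = 'Hello\t#{name}'   # no interpolation: \t and #{name} stay as written
double = "Hello\t#{name}"   # \t becomes a tab and #{name} is interpolated

q_form  = %q{It's easy: no need to escape the apostrophe}
qq_form = %Q<Hello, #{name}!> # %Q behaves like a double quoted string

html = <<EOS
  <p>#{name}</p>
EOS

puts single    # Hello\t#{name}
puts double    # Hello<TAB>Oscar
puts q_form    # It's easy: no need to escape the apostrophe
puts qq_form   # Hello, Oscar!
puts html      #   <p>Oscar</p>
```

A plain <<EOS here document behaves like a double quoted string, so embedded expressions work inside it too.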
Arrays have a similar syntax to JavaScript:
a = [ 'bee', 'wasp', 3.14159 ]
As with JavaScript, array elements are accessed by index value (in the range 0..array.length - 1), but Ruby adds a little magic by allowing negative index values, which refer to the end of the array. So
a.last == a[-1]
a[a.length - 1] == a[-1]
both give the result:
true
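A few more array accesses, using the same array a, show how negative indexes and ranges combine:

```ruby
a = ['bee', 'wasp', 3.14159]

puts a[0]     # bee
puts a[-1]    # 3.14159 (the same element as a.last)
puts a[-2]    # wasp
p a[1..-1]    # ["wasp", 3.14159]: a range can end with a negative index
p a[5]        # nil: reading past the end is not an error
```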
Hashes (or associative arrays) have a slightly different syntax from JavaScript:
b = { 'butterfly' => 'float like a butterfly', 'bee' => 'sting like a bee' }
Hash elements are accessed by key:
b['butterfly']
gives:
"float like a butterfly"
Ranges are very useful for creating sequences:
('a'..'f').to_a
gives:
=> ["a", "b", "c", "d", "e", "f"]
whereas (note there are three dots, not two as before):
(0...10).to_a
gives:
=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
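Symbols are immutable, unique names, created by prefixing a name or string with a colon (:), such as :name or :"Name is #{name}". Every occurrence of the same symbol refers to the same object, and so has the same object identifier, whereas each string literal creates a new object: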
"hello".object_id == "hello".object_id
gives:
false
Whereas:
:"hello".object_id == :"hello".object_id
gives:
true
Regular expressions also have a similar syntax to JavaScript, re = /[aeiou]/, as well as the %r alternative, re = %r{[aeiou]}.
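The two regular expression forms are interchangeable. This sketch shows the common ways of using them; %r is handy when the pattern itself contains slashes.

```ruby
re  = /[aeiou]/
alt = %r{[aeiou]}             # same pattern, written with %r

puts re == alt                # true: same source and options
p 'wasp' =~ re                # 1: index of the first vowel
p 'rhythm' =~ re              # nil: no match
p 'butterfly'.scan(re)        # ["u", "e"]: every match, as an array
p %r{/usr/local/bin}.match('#!/usr/local/bin/ruby')[0]  # "/usr/local/bin"
```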
As you've probably guessed, a local variable is created simply by assigning a value to a name (there is no var or other keyword), and the semicolon statement terminator (;) is optional when there is only one statement per line.
Names start with a letter or an underscore, followed by letters, numbers or underscores. Naming style differs from JavaScript in that names (including filenames) are normally lowercase, with words separated by underscores, for example word_count. Constants always start with an uppercase letter (and are usually all uppercase, such as PI), and class names follow the UpperCamelCase naming style.
Comments start with a hash sign (#) and continue to the end of the line. Multi-line comments can be written between a line starting with =begin and a later line starting with =end (both must begin in the first column). For example:
# normal comment
a = 1
=begin
a = 2
=end
puts a
gives:
1
The boolean values are true and false, and the absence of a value is represented by nil, not null as in JavaScript. Ruby has no equivalent of undefined. The nil object has methods of its own, so to test for nil (on any object) we can call obj.nil?, which returns true only if the object is nil.
Ruby makes running operating system commands very easy, using backquotes (`) or %x. For example, in irb, the following will print the list of files in the current working directory:
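Where JavaScript would give undefined, Ruby usually gives nil, for example when looking up a missing hash key. A short sketch:

```ruby
b = { 'bee' => 'sting like a bee' }

p b['wasp']          # nil: a missing key gives nil
p b['wasp'].nil?     # true
p 'bee'.nil?         # false
p nil.class          # NilClass
p nil.to_s           # "": nil is an object, so it has methods
```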
puts `ls` # Unix
puts `cmd /C "dir /B"` # Windows
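The %x form works exactly like backquotes: it runs the command in a shell and returns its standard output as a string. This minimal sketch assumes a Unix-like system (Linux or Mac OS X) where echo is available; the special variable $? then holds the status of the command.

```ruby
# Assumption: run on a Unix-like system where the echo command exists
listing = %x{echo hello}

puts listing.chomp    # hello (chomp removes the trailing newline)
puts $?.success?      # true
puts $?.exitstatus    # 0
```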

A Ruby Program Dissected

We'll create a short Ruby program to count the number of words in a file. To add a little spice, and to demonstrate some of the power of Ruby, we'll also calculate the unique word count.
The text for this exercise comes from “The Happy Prince and Other Tales”, by Oscar Wilde, available from Project Gutenberg.
The code consists of a single class WordCount, and a small command line program which uses the class. Here then is the code in the word_count.rb file, with explanations (you can download the source files at the end of this discussion):
01 #!/usr/local/bin/ruby -w
02
For any script that will be run as a program, put a shebang line first. The -w option makes the Ruby interpreter display warnings. Eliminating all warnings from our code helps us write better, more idiomatic Ruby, because the interpreter tells us when we've got things wrong.
03 # == Synopsis
04 # word_count: Counts the number of words in a file
05 #
06 # == Usage
07 # word_count [options] -- file_path
08 #
09 # -h, --help:
10 #    show help
11 #
12 # -m n, --min n:
13 #    minimum of n letters in the word (default is 2)
14 #
15 # file_path: the file path of the file to read the words from.
16 
The header comment can be parsed by RDoc to produce documentation of our code. This can also be used inside the program itself, as we'll see a little later.
17 # The WordCount class calculates unique and word counts in
18 # a file.
19 class WordCount
20 
This is the class definition, which is terminated by the end statement on line 62. A class can inherit methods and variables from another class, called the super class, by specifying < name after the class name. In our case we have not specified a super class, so Object (the base class in Ruby) is inferred. Line 19 is equivalent to class WordCount < Object.
21   # The minimum number of letters which make a word.
22   MIN = 2
23
We define a class constant for the default minimum number of characters in a word. It can be referenced inside the class by the name MIN, and outside the class using scope resolution: WordCount::MIN. Since this is a constant, its value must be assigned when it is defined.
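To see superclasses and class constants in action, here is a small sketch using two hypothetical classes, Counter and StrictCounter (they are not part of word_count.rb):

```ruby
class Counter                   # no superclass given, so Object is used
  MIN = 2                       # class constant

  def minimum
    MIN                         # inside the class, the plain name works
  end
end

class StrictCounter < Counter   # explicit superclass
end

p Counter.superclass            # Object
p StrictCounter.superclass      # Counter
p Counter::MIN                  # 2 (scope resolution from outside the class)
p StrictCounter.new.minimum     # 2 (the inherited method still sees MIN)
```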
24   attr_reader :file, :title, :min, :count, :word_hash
25
The Module.attr_reader method is a piece of Ruby magic designed to make our lives a little easier. It uses metaprogramming to define a read accessor method for each of file, title, min, count and word_hash; each method simply returns the value of the matching instance variable (@file, @title and so on).
Also notice that line 24 is a method call. In Ruby the parentheses are optional in certain circumstances, particularly when calling a method in a simple statement, such as in this example. It is normal Ruby practice to leave off the parentheses, except where this would cause confusion. So line 24 is equivalent to attr_reader(:file, :title, :min, :count, :word_hash).
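Taking just the first argument, attr_reader defines a method equivalent to:
  def file
    @file
  end
The instance variable @file itself only comes into existence when a value is assigned to it, which our class does in the initialize method.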
We'll look at what that means in a moment.
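The following sketch compares a hypothetical Book class that uses attr_reader with a hand-written equivalent; both expose the same read-only title method.

```ruby
class Book                       # hypothetical class for illustration
  attr_reader :title             # defines a reader method named title

  def initialize(title)
    @title = title               # the instance variable is created here
  end
end

class HandWrittenBook            # the same behaviour, written out by hand
  def initialize(title)
    @title = title
  end

  def title
    @title
  end
end

p Book.new('The Happy Prince').title             # "The Happy Prince"
p HandWrittenBook.new('The Happy Prince').title  # "The Happy Prince"
p Book.instance_methods(false)                   # [:title] in Ruby 1.9 and later
p Book.new('x').respond_to?(:title=)             # false: no writer is defined
```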
26   # Creates a new WordCount object for the given file name
27   # and minimum word length.
28   def initialize(file, min = 1)
29     @file = file
30     @min = min <= 0 ? MIN : min
31     @count = 0
32     @title = nil
33     @word_hash = Hash.new(0)
34   end
35
Methods are defined using the def keyword, terminating with the end keyword (line 34). The initialize method is special in a Ruby class, because it is the constructor method. Every time a new WordCount instance is created, this method is called.
Our initialize method expects two parameters: file and min (line 28). The second parameter, min, is optional: if it is not specified, it takes the value 1. Required parameters must appear before optional parameters. In the body of the method we initialise the instance variables @file, @min, @count, @title and @word_hash. Instance variables are marked by the @ prefix.
We use a ternary expression on line 30 to ensure that @min cannot be less than 1.
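The default argument and the ternary expression from lines 28 and 30 can be tried on their own. This sketch pulls that logic into a hypothetical stand-alone method, effective_min, with a hypothetical DEFAULT_MIN constant playing the role of MIN:

```ruby
DEFAULT_MIN = 2   # hypothetical stand-in for WordCount::MIN

# Hypothetical stand-alone version of the logic on lines 28 and 30
def effective_min(min = 1)
  min <= 0 ? DEFAULT_MIN : min   # condition ? value_if_true : value_if_false
end

p effective_min        # 1 (the default argument is used)
p effective_min(5)     # 5
p effective_min(0)     # 2 (falls back to DEFAULT_MIN)
p effective_min(-3)    # 2
```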
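Whereas in JavaScript new is a keyword, in Ruby new is a class method. So line 33 assigns a new Hash instance to the instance variable @word_hash. The argument 0 is the hash's default value: it is returned whenever we look up a key that is not yet in the hash. This is what allows line 46 to increment a word's count without first checking whether the word has been seen before.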
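This sketch contrasts a plain hash with one created by Hash.new(0), and shows the counting idiom used on line 46:

```ruby
plain   = {}
counted = Hash.new(0)      # 0 is returned for any key not in the hash

p plain['bee']             # nil
p counted['bee']           # 0
p counted.key?('bee')      # false: looking up a missing key does not store it

%w[bee wasp bee].each { |word| counted[word] += 1 }
p counted['bee']           # 2
p counted['wasp']          # 1

# plain['bee'] += 1 would raise NoMethodError, because nil has no + method
```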
36   # Parses the file.
37   def parse
38     re = /\w{#{@min},}/
39     File.open(@file, "r") do |file|
The parse method does the real work. Since it takes no arguments, we don't need to write the parentheses. On line 38 we create a local variable re and assign a regular expression to it. Inside the regular expression we use an embedded expression to set the minimum number of word characters (\w matches letters, digits and the underscore). If @min has a value of 2, then the regular expression becomes /\w{2,}/.
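Here is the same technique outside the class, with a hypothetical minimum of 4. It also previews String.scan, which line 44 uses to collect the matching words:

```ruby
min  = 4
re   = /\w{#{min},}/       # the embedded expression makes this /\w{4,}/
line = 'The Happy Prince and Other Tales'

p re.source                # "\\w{4,}"
p line.scan(re)            # ["Happy", "Prince", "Other", "Tales"]
p 'room_101 ok'.scan(re)   # ["room_101"]: \w also matches digits and _
```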
We make good use of the Ruby File and String classes in our little program, and on line 39 we use the File.open class method to open the file for reading. It not only opens the file, but it also closes the file automatically at the end of the code block, on line 49. I'll leave the discussion of Ruby code blocks till a little later.
40       while line = file.gets
41         line.strip!
42         if line.length > 0
43           @title = line unless @title
44           words = line.scan(re)
45           @count += words.length
46           words.each { |word| @word_hash[word] += 1 }
47         end
48       end
49     end
Line 40 uses a while ... end loop to read each line of text. The loop ends when its condition evaluates to false or nil. The File.gets method returns either a string (the line of text) or nil if there are no more lines. In Ruby every value counts as true, except false and nil, so the while loop terminates when no more lines of text are available.
The File.gets method returns the line including its end of line characters. We remove these characters, along with any other leading or trailing whitespace, with the String.strip! method (line 41). In Ruby, methods which end with an exclamation mark (!) usually modify the object itself, rather than returning a modified copy.
Since we have quite a lot of work to do on each line, we use a classic if test on line 42 to process only lines which contain text.
Line 43, however, uses a statement modifier. A statement modifier can be read as “execute the statement if the expression is true”. Unlike JavaScript, Ruby also offers a complement to if, called unless, so the statement modifier on line 43 reads “execute the statement unless the expression is true”. In other words, set @title to line unless @title has already been set (@title was set to nil in the initialize method on line 32).
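To try the loop without a file on disk, this sketch uses StringIO from the standard library as a stand-in for the open File (both respond to gets). It also lists which values count as true:

```ruby
require 'stringio'

# StringIO stands in for a File here; both respond to gets
file = StringIO.new("The Happy Prince\n\nHigh above the city\n")

lines = []
while line = file.gets     # gets returns nil at the end, which ends the loop
  lines << line
end

p lines.length             # 3 (the empty line still contains "\n")
p file.gets                # nil

# Only false and nil are treated as false in a condition
p [nil, false, 0, ''].map { |v| v ? 'true' : 'false' }  # ["false", "false", "true", "true"]
```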
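The difference between strip and strip! is easy to see in irb. Note that strip! returns nil when there is nothing to remove, which is why line 41 calls it as a statement rather than using its return value:

```ruby
line = "  The Selfish Giant \r\n"

copy = line.strip     # returns a new, stripped string
p copy                # "The Selfish Giant"
p line                # "  The Selfish Giant \r\n" (unchanged)

line.strip!           # modifies line itself
p line                # "The Selfish Giant"

p line.strip!         # nil: nothing was left to remove
```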
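This short sketch shows the if and unless statement modifiers, using the same “set it only once” idea as line 43:

```ruby
title = nil

title = 'The Happy Prince' unless title   # title is nil, so this runs
title = 'The Nightingale' unless title    # title is already set: skipped

puts title                                # The Happy Prince
puts 'Long title' if title.length > 10    # prints: Long title

# The same unless test written as a full if statement
# (it does nothing here, because title is already set)
if !title
  title = 'The Nightingale'
end
```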
The actual word parsing is handled by the String.scan method which returns an array of words that match a string or regular expression (line 44).
Now we get to another interesting Ruby construct, on line 46. We need to examine each scanned word and add it to our @word_hash of unique words. This is handled by the iterator method each. Iterators and code blocks are a fundamental construct of Ruby. An iterator, such as each in our case, is responsible for passing each element in its container to a code block that we supply (the code within the curly braces).
The problem is that the iterator has to pass one or more parameters to the code block. Ruby handles this by allowing the code block to define an argument list using the vertical bar (|) to delimit the list. In JavaScript, if you remember, we had to use an anonymous function.
So our code block on line 46 (which ends with the closing curly brace on the same line) receives a single argument called word. An important point to note here is that the start of the code block (the do keyword or the opening curly brace) must be on the same line as the method's last argument or closing parenthesis. In our case there is neither, so the block has to start on the same line as the method call itself.
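Both block forms are shown below with a small hypothetical word list. each_with_index passes two arguments to its block, so the block declares two parameters between the vertical bars:

```ruby
counts = Hash.new(0)
words  = %w[the happy prince the swallow]

# Brace form: the block starts on the same line as the call to each
words.each { |word| counts[word] += 1 }

# do ... end form, usual for multi-line blocks
words.each_with_index do |word, index|
  puts "#{index}: #{word}"
end

p counts['the']   # 2
```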
In reality we've already seen another code block, on line 39. This code block uses the do ... end syntax, which is the preferred syntax in Ruby for multi-line code blocks. This is how File.open can guarantee that the file is closed when the code block terminates. How does that work exactly? Here is a simplified sketch, using a hypothetical method name, open_file (the real File.open is implemented differently):
  def open_file(name, mode, &code_block)
    file = File.new(name, mode)
    return file unless code_block
    begin
      code_block.call(file)
    ensure
      file.close
    end
  end
In Ruby, the code block can be passed to the method as its last parameter, marked with an ampersand (&). This allows open_file to open the file, pass the open file to the code block (if there is one), and then close the file itself. The ensure clause makes sure the file is closed even if the code block raises an exception.
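The same open, use and close pattern is more often written with yield and block_given?, without naming the block at all. Here is a minimal sketch using a hypothetical with_open_file helper and a temporary file from the standard tempfile library:

```ruby
require 'tempfile'

# Hypothetical helper: opens a file, yields it to the block, always closes it
def with_open_file(path, mode = 'r')
  file = File.open(path, mode)       # without a block, File.open returns the file
  return file unless block_given?    # no block: the caller must close the file
  begin
    yield file                       # run the caller's block with the open file
  ensure
    file.close                       # runs even if the block raises an exception
  end
end

temp = Tempfile.new('word_count')
temp.write("The Happy Prince\n")
temp.close

seen = nil
first_line = with_open_file(temp.path) do |f|
  seen = f
  f.gets                             # the block's value becomes the method's value
end

p first_line      # "The Happy Prince\n"
p seen.closed?    # true
temp.unlink
```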
At this point we have almost finished the parse method.
50     self
51   end
52
Although parse doesn't need to return any value, it is normal Ruby practice to return the object itself where possible, as this allows method call chaining. This is exactly what we do on line 50. In Ruby self refers to the current object, whereas in JavaScript we used this. Unless a value is returned explicitly with return, a Ruby method returns the value of the last expression evaluated. So line 50 is equivalent to return self.
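A hypothetical Tally class makes the benefit of returning self concrete: because add returns the object, calls can be chained just like new, parse and to_s are on line 96.

```ruby
class Tally                        # hypothetical class for illustration
  attr_reader :total

  def initialize
    @total = 0
  end

  def add(n)
    @total += n
    self                           # return the object itself to allow chaining
  end

  def to_s
    "total: #{@total}"             # the last expression is the return value
  end
end

puts Tally.new.add(2).add(3).to_s  # total: 5
```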
53   # The unique word count
54   def unique
55     @word_hash.length
56   end
57
We've already provided read accessor methods for our instance variables, but the unique word count is not stored in an instance variable: it is simply the number of keys in @word_hash. So we write a read accessor method for it by hand.
58   # Returns the count summary.
59   def to_s
60     "File: #{@file} title: '#{@title}' min: #{@min} total: #{@count} unique: #{unique}"
61   end
62 end
63
Similarly, we use the “convert object to string” method, to_s, to return a summary of the result of parsing the file. This is extremely simple in Ruby through the use of embedded expressions.
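puts and string interpolation both call to_s automatically, so on line 96 the explicit .to_s is optional. A small sketch with a hypothetical Summary class:

```ruby
class Summary                    # hypothetical class for illustration
  def initialize(title, count)
    @title = title
    @count = count
  end

  def to_s
    "title: '#{@title}' total: #{@count}"
  end
end

s = Summary.new('The Devoted Friend', 4154)
puts s                  # title: 'The Devoted Friend' total: 4154
puts "Result: #{s}"     # Result: title: 'The Devoted Friend' total: 4154
```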
Line 62 brings us to the end of our class definition.
64 # Run the application.
65 if __FILE__ == $PROGRAM_NAME # aka $0
66
Next we'll write the code which will run our program. Our WordCount class could be used by other programs, so line 65 tests that the current file (__FILE__) is also the program name ($PROGRAM_NAME or $0). If this is the case, the program code will be executed, otherwise it will be ignored. This test is used frequently in library code to execute tests when the file is run as a program.
67   require 'getoptlong'
68   require 'rdoc/usage'
69
The Kernel module provides several helper methods, including require, which loads and runs the specified library file (only once, however many times it is required). This allows us to use code that is not part of the core Ruby classes, such as the getoptlong and rdoc/usage libraries from the standard library.
70   min = WordCount::MIN
71   path = nil
72   opts = GetoptLong.new(
73     [ '--help', '-h', GetoptLong::NO_ARGUMENT ],
74     [ '--min', '-m', GetoptLong::REQUIRED_ARGUMENT ]
75   )
Lines 72 to 75 make use of a Ruby class called GetoptLong which handles command line options in a uniform way. The format is well known in the Linux world, but may be new to Windows users. Each option can have two formats, long and short, possibly followed by an argument.
76   opts.each do |opt, arg|
77     case opt
78       when '--help'
79         RDoc.usage
80         exit 0
81       when '--min'
82         min = arg.to_i
83     end
84   end
85
Lines 76 to 84 handle the options received. Here again, we use an iterator and a code block. Inside the code block there is a case ... when ... end statement, which is similar to the JavaScript switch ... case statement. Whichever option style is used, GetoptLong always returns the long version, so we only test for the two long option names. If the user asked for --help, we can make use of our documentation comments at the start of the file to explain what the user needs to do. This is achieved using the RDoc.usage method on line 79. The Kernel module also provides an exit method, which exits the program with the exit code given as its argument (line 80). By convention, a program that terminates successfully returns 0, and any non-zero value indicates that an error occurred.
Line 82 converts the string argument to an integer. Note that String#to_i returns 0 if the string does not start with an integer, and ignores anything after the leading digits. See ri String#to_i for more information.
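Unlike a JavaScript switch, a Ruby case needs no break: only the first matching when branch runs. This sketch mirrors the option handling in a hypothetical option_action method, and also shows that when compares values using ===, so a range matches any value it contains:

```ruby
# Hypothetical helper mirroring the option handling on lines 77 to 83
def option_action(opt)
  case opt
  when '--help'
    'show help'
  when '--min', '-m'          # several values can share one branch
    'set minimum length'
  else
    'unknown option'
  end
end

p option_action('--help')     # "show help"
p option_action('-m')         # "set minimum length"
p option_action('--max')      # "unknown option"

size = case 7
       when 1..5 then 'small'
       else 'big'
       end
p size                        # "big"
```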
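A few conversions show the behaviour of to_i. In our program, a command such as word_count -m abc therefore passes 0 to WordCount.new, and the ternary expression on line 30 replaces it with MIN.

```ruby
p '5'.to_i       # 5
p '12abc'.to_i   # 12: conversion stops at the first non-digit
p 'abc'.to_i     # 0: the string does not start with an integer
p '-3'.to_i      # -3
```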
86   if ARGV.length != 1
87     puts "Missing file_path argument (try --help)"
88     exit(-1)
89   end
90   path = ARGV.shift
91   unless File.exist?(path) && File.file?(path) && File.readable?(path)
92     puts "File path #{path} doesn't exist, isn't a file, or can't be read"
93     exit(-1)
94   end
95
Lines 86 to 89 check that the user has specified a file path, and if not, exits after printing an error message. The command line arguments are stored in the ARGV array, and by the time GetoptLong has finished, we should be left with just the file path. The exit statement on line 88 uses parentheses, because without them Ruby generates a warning. Line 90 removes the file path name from ARGV and stores it in the local variable path.
Line 91 checks that the path exists, is a file, and can be read, using the unless keyword. Just to remind you:
91   unless File.exist?(path) && File.file?(path) && File.readable?(path)
could also be expressed as:
91   if !File.exist?(path) || !File.file?(path) || !File.readable?(path)
I hope you agree with me that the unless statement is more readable.
96   puts WordCount.new(path, min).parse.to_s
97   exit 0
98
99 end
Finally, we can get down to doing the job. Line 96 creates a WordCount instance, parses the text, and prints the summary. Because our parse method returns self, we can chain these three method calls (new, parse and to_s) together on one line. Again, this is normal practice in Ruby.
Now that we've been through the program code, let's see what it does. On the command line (and in the directory where word_count.rb resides), type:
word_count -h
On Linux and Mac OS X, you will have to make word_count.rb executable (chmod +x word_count.rb) and possibly use the ./ prefix and the full file name, as in ./word_count.rb -h. The result should be something like this (shown here on Windows):
C:\Courses\LUGPC6\WordCount>word_count -h

Synopsis
--------
word_count: Counts the number of words in a file


Usage
-----
word_count [options] -- file_path

-h, --help:

   show help

-m n, --min n:

   minimum of n letters in the word (default is 2)

file_path: the file path of the file to read the words from.
C:\Courses\LUGPC6\WordCount>
Now type:
word_count
the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count
Missing file_path argument (try --help)
C:\Courses\LUGPC6\WordCount>
Next type:
word_count -- ./text
the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count -- ./text
File path ./text doesn't exist, isn't a file, or can't be read
C:\Courses\LUGPC6\WordCount>
Now, we'll actually do some parsing, type:
word_count -- ./text/TheDevotedFriend.txt
the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count -- ./text/TheDevotedFriend.txt
File: ./text/TheDevotedFriend.txt title: 'The Devoted Friend' min: 2 total: 4154 unique: 926
C:\Courses\LUGPC6\WordCount>
Finally, let's set the minimum word length to 5. Type
word_count -m 5 -- ./text/TheSelfishGiant.txt
and the result should be something like:
C:\Courses\LUGPC6\WordCount>word_count -m 5 -- ./text/TheSelfishGiant.txt
File: ./text/TheSelfishGiant.txt title: 'The Selfish Giant' min: 5 total: 518 unique: 253
C:\Courses\LUGPC6\WordCount>
Feel free to use this code as a template for your own command line programs in Ruby.

Source Files

All the source files for this lesson, including the Aptana Studio project file and test text files, can be found in the LUGPC6.zip archived file, distributed under the GNU Lesser General Public License.

What's Next?

Next we'll take our knowledge of Ruby a step further by looking at a Ruby and Ajax example, creating our first web application.