07.04.06

Ruby Tutorials for TextMate hackers

Posted in TextMate, ruby/rails at 11:45 am by Haris

Ever wanted to hack TextMate by creating new commands, but didn’t know how because you don’t know any scripting languages or how to work magic in the shell? Fear not: this post will show you all (or at least a small fraction of all) you need to know to use Ruby to write new and wonderful commands. No prior knowledge of Ruby is assumed.

To get things started, let’s begin with a simple command. All it will do is read every line of the selected text and add an ever-increasing number in front of it. Not the most ingenious of commands, but hey, this is a tutorial after all.

Getting Our Feet Wet

So go ahead and create a new command. Set its input to “Selected Text or Document” and its output to “Replace Selected Text”. This means our command receives as input either the current selection, if there is one, or the entire document, and TextMate replaces that text with whatever the command outputs. This is the code for the command:

#!/usr/bin/env ruby
text = STDIN.read
lines = text.split("\n")
i = 1
for line in lines do
  puts "#{i}. " + line
  i += 1
end

Let’s take things one at a time. The first line, often called the “shebang” line, is a “secret code”. It tells the system to look for a program called “ruby” (that is what /usr/bin/env does) and use that program to process what follows. In simple terms, we just told the computer that we are writing a Ruby program.

The next line tells Ruby to read everything that STDIN has to offer and store it in the variable text. STDIN is a magical contraption that handles anything to do with the “STandarD INput”, which is exactly what we told TextMate to provide as input. In general, in Ruby you communicate with “objects”, like STDIN above, by telling them to do things. You do this by writing the object, then a dot, and then the command you want the object to perform (called a “method”). The object does its thing and returns a value. We can then store this value using the equals sign, as we did in the second line.

The third line has a similar flavor. We tell the object text to split itself. Now, you may be thinking that text is just a string, not an object, but in Ruby everything is an object! So we are telling the string text to split itself wherever it meets \n, the symbol for the newline character. The text object does just that and returns an array whose elements are the lines of the string text. An array is just a list of things, nothing more, nothing less. We store this list in the variable lines.
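To see split at work outside TextMate, you can run a few lines like these in a plain Ruby file. This is only a sketch: the hard-coded string is made-up sample text standing in for what STDIN.read would return.

```ruby
# Made-up sample text standing in for what STDIN.read would return.
text = "first line\nsecond line\nthird line"

# Even a plain string is an object, so we can ask it which class it belongs to.
puts text.class          # prints String

# split cuts the string at every "\n" and returns an array of the pieces.
lines = text.split("\n")
puts lines.class         # prints Array
puts lines.length        # prints 3
p lines                  # prints ["first line", "second line", "third line"]
```

Note that the "\n" characters themselves are gone from the pieces, which matters when we print the lines again.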

Now we create a new variable, called i, for storing the ever increasing number that we want to appear in front of every row. Nothing special about that.

The fifth line of the code tells Ruby to go through each element of the lines array and execute the code between do and end once for each element, with the variable line holding that element each time. So effectively we are saying “Do this for each line in the list of lines”.

OK, so what is “this”? Line 6 tells Ruby to print something. There are two methods that instruct Ruby to print something: print and puts. Both are followed by whatever we want them to print. The difference is that puts adds a newline at the end, which is perfect here, since our lines lost their newlines when we split them. So what do we tell Ruby to print? The string "#{i}. " + line. First, let’s get the plus out of the way: it tells Ruby to concatenate the two strings on either side of it. So we are just adding the "#{i}. " bit in front of the line.
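Here is a small sketch of the difference between print and puts, and of + on strings. To make the output easy to inspect, it writes into a StringIO (from Ruby’s standard library), which has the same print and puts methods as the standard output; that substitution is the only assumption here.

```ruby
require 'stringio'

# Collect output in a StringIO instead of on the screen, so we can inspect it.
out = StringIO.new

out.print "no newline"   # print writes exactly what it is given
out.puts  " then one"    # puts adds a newline at the end
out.print "1. " + "abc"  # + concatenates the two strings into "1. abc"
out.puts                 # puts with nothing to print just writes a newline

p out.string  # prints "no newline then one\n1. abc\n"
```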

Now, what is all this #{i} gobbledygook? Inside a double-quoted string, like ours, #{expr} tells Ruby to compute the expression expr and insert the result at that spot. In our case we ask it to compute i, so it just inserts the number. We’ll up the ante a bit in our second iteration of the command.
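A quick sketch of interpolation: any Ruby expression can go inside #{...}, not just a variable name. The single-quoted string at the end is included only for contrast, to show that interpolation happens in double-quoted strings only.

```ruby
i = 3

label   = "#{i}. "      # the value of i is inserted: "3. "
doubled = "#{i * 2}. "  # any expression works: "6. "
literal = '#{i}. '      # single quotes keep the characters exactly as typed

puts label    # prints 3.
puts doubled  # prints 6.
puts literal  # prints #{i}.
```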

The next line, i += 1, is shorthand for i = i + 1. It just increases i by one, so that the next time around it will be a tad bigger.

That’s it, really: the end on the next line tells Ruby where the for loop that started three lines above finishes. Our command is ready! Go ahead and enjoy it, and when you are ready to proceed, read on.
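If you’d like to try the command outside TextMate first, here is a sketch of the same logic with one StringIO standing in for STDIN and another collecting what puts would have written. The StringIO objects and the sample text are assumptions made for testing; inside TextMate you would use the original code as is.

```ruby
require 'stringio'

# StringIO objects standing in for TextMate's input and output (made-up text).
input  = StringIO.new("apples\npears\nplums")
output = StringIO.new

text = input.read
lines = text.split("\n")
i = 1
for line in lines do
  output.puts "#{i}. " + line   # number the line, then end it with a newline
  i += 1
end

print output.string   # shows the three numbered lines
```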

Knee Deep into the Murky Waters

The above is the long version of the command. We will eventually shorten it, hopefully down to a single line. But first, we’ll start with a simple change. Replace the two lines in the for loop with:

puts "#{i += 1}. " + line

and also change the line before the for loop so that i starts at 0 instead of 1. Now try your command; you’ll see it still works!

Now, how is that possible? The key trick is that in Ruby every bit of code is an expression that returns a value, and assignments are no exception. So here the expression i += 1 not only adds one to i but also returns the new value of i, which #{} then inserts into the string. That is also why i now starts at 0: it is increased before its value is used, so the first line still gets 1.
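The claim that an assignment returns the value it assigns is easy to check with a sketch like this one; the variable names are just for illustration.

```ruby
i = 0

# An assignment is an expression: its value is the value just assigned.
first  = (i += 1)   # i becomes 1, and first gets 1 as well
second = (i += 1)   # i becomes 2, and second gets 2

# The same happens inside #{...}: i is increased, then its new value inserted.
label = "#{i += 1}. "

puts first    # prints 1
puts second   # prints 2
puts label    # prints 3.
```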

The other small improvement we’ll make is to combine the second and third lines into one. There is really no need for the text variable; we only care about the lines. There are two ways to go about it. One is to use the line:

lines = STDIN.read.split("\n")

Here, instead of storing the string returned by STDIN.read, we immediately ask it to split itself at the newlines and store the result in the variable lines. STDIN offers a slightly more convenient way to do the same thing with a single method call:

lines = STDIN.readlines

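There is one small difference, though: readlines keeps the newline at the end of each line. puts only adds a newline to a string that doesn’t already end with one, so it would mostly cope, but it would still add a newline after the last line even if the input didn’t end with one. print never adds anything, so it reproduces our lines exactly as they came in, and we will use it instead. So the overall command now looks like this: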
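A quick way to see what readlines hands back, and how print and puts treat lines that already end in a newline, is the sketch below. It uses StringIO objects in place of STDIN and STDOUT (an assumption for testing), and the sample input deliberately has no newline after its last line.

```ruby
require 'stringio'

# StringIO stands in for STDIN; note the last line has no trailing newline.
lines = StringIO.new("apples\npears").readlines
p lines   # prints ["apples\n", "pears"]

# print adds nothing, so each line keeps exactly the ending it came with.
with_print = StringIO.new
i = 0
for line in lines do
  with_print.print "#{i += 1}. " + line
end
p with_print.string   # prints "1. apples\n2. pears"

# puts skips lines that already end in "\n", but adds one to the last line.
with_puts = StringIO.new
i = 0
for line in lines do
  with_puts.puts "#{i += 1}. " + line
end
p with_puts.string    # prints "1. apples\n2. pears\n"
```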

#!/usr/bin/env ruby
lines = STDIN.readlines
i = 0
for line in lines do
  print "#{i += 1}. " + line
end

Well, that’s still not very satisfactory. After all, we only use the lines variable in one place. So why not just place the STDIN.readlines bit right there:

#!/usr/bin/env ruby
i = 0
for line in STDIN.readlines do
  print "#{i += 1}. " + line
end

There, that’s better! Still not perfect, though: the i = 0 line is less than ideal. One way to deal with that is to replace the whole thing with:

#!/usr/bin/env ruby
for line in STDIN.readlines do
  print "#{i = (i || 0) + 1}. " + line
end

i || 0 is a common programming idiom. The || is the OR operator. It first evaluates its left-hand side and, if that is true, returns it; otherwise it evaluates and returns the right-hand side. In Ruby, false and nil are the only values that count as false; every other object, including 0 and the empty string, counts as true. nil is the special object meaning “nothing”. Before i has been assigned anything, Ruby treats it as nil, so (i || 0) returns the number i once it has a value, and 0 the first time around.
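The rules above can be checked with a few one-liners. This sketch only uses plain values; the last line repeats the i || 0 trick with a fresh variable x.

```ruby
a = nil || 0     # nil counts as false, so the right side is returned: 0
b = false || 0   # false counts as false too: 0
c = 5 || 0       # 5 counts as true, so it is returned: 5
d = 0 || 7       # 0 also counts as true in Ruby, so the result is 0

# Before its first assignment, x reads as nil, so this sets x to 1.
x = (x || 0) + 1

p [a, b, c, d, x]  # prints [0, 0, 5, 0, 1]
```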

OK, that’s short enough. Now let’s look at other ways of writing the same command.

Holding on to Blocks

Ruby has a wonderful thing called “blocks”, which you’ve already been using in this tutorial without knowing it. In fact, a slightly different way of writing the command would have been:

#!/usr/bin/env ruby
i = 0
STDIN.readlines.each do |line|
  print "#{i += 1}. " + line
end

each is a method that “enumerable” objects such as lists (arrays) provide. It accepts a block and executes the block once for each item in the list. The block is the part from do to end. The |line| part tells Ruby that within the block the variable line should refer to the current element of the array. So this code is functionally exactly the same as the for versions above.

Actually, I just lied to you: it is not. To see this, try the i || 0 version of the command with each and see what happens. Simply put, a for loop does not create a new scope for variables, but a block does: a variable that is first assigned inside a block belongs to that block and starts out afresh (as nil) on every iteration, so its value is not retained between iterations. With the i || 0 trick inside each, every line would be numbered 1. In our case, one way around this is to use another method, each_with_index, instead of each, like so:

#!/usr/bin/env ruby
STDIN.readlines.each_with_index do |line,index|
  print "#{index + 1}. " + line
end

Here the block receives two pieces of information: the actual element of the array, captured in the variable line, and the index of that element in the array, counting from 0. This can be shortened further using the brace syntax for blocks: instead of do and end, you can surround the block with braces, like so:

#!/usr/bin/env ruby
STDIN.readlines.each_with_index { |line,index| print "#{index + 1}. " + line }

There you have it, our first one-line version of the command.
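The scoping difference between for and each, and what each_with_index passes to its block, can both be checked with the sketch below. The color list is made-up sample data, and StringIO stands in for STDIN and STDOUT so the one-liner can run outside TextMate.

```ruby
require 'stringio'

items = ["red", "green", "blue"]

# A for loop does not start a new variable scope, so count survives between passes.
from_for = []
for item in items do
  count = (count || 0) + 1
  from_for << count
end

# A block does: block_count is created afresh (as nil) on every pass.
from_each = []
items.each do |color|
  block_count = (block_count || 0) + 1
  from_each << block_count
end

p from_for   # prints [1, 2, 3]
p from_each  # prints [1, 1, 1]

# each_with_index also hands the block each element's position, counting from 0.
input  = StringIO.new("red\ngreen\nblue\n")
output = StringIO.new
input.readlines.each_with_index { |line, index| output.print "#{index + 1}. " + line }
p output.string  # prints "1. red\n2. green\n3. blue\n"
```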

Drowning in Regular Expressions

We’ll now start afresh and approach the problem from a different angle: instead of working line by line, we scan through the entire input string and add something at the beginning of each line. We can do this with a simple use of regular expressions. A regular expression, affectionately called a “regexp”, is a succinct way of describing to the computer a complicated string match we might want to do, for instance: “any word followed by any number of spaces, followed by another word and then optionally a question mark”. That is only a simple example of what regexps can do. So here is the code for our command using regexps:

#!/usr/bin/env ruby
i = 0
print STDIN.read.gsub(/^/) { |text| "#{i += 1}. "}

Let’s see what happened here. STDIN.read returns the input string. Then we tell the string to do a gsub on itself. That means: search for a particular regular expression and substitute every match with something. In this case the regexp is the mysterious three-character part /^/. The slashes mark the beginning and end of a regexp. Within a regexp, some characters have a special meaning, ^ being one of them: it matches the beginning of a line. So this tells the string to look for the beginning of every line. The block that follows tells gsub what to insert there. It receives in the variable text the text that was matched by the regular expression (here always the empty string, since ^ matches a position rather than any characters), and all it does is produce the appropriate number. The last expression in a block is the value the block returns, and in our case that is the string.
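Here is a sketch of the same gsub call on a made-up string instead of STDIN.read, also recording what the block receives each time, to confirm that ^ matches an empty position at the start of each line.

```ruby
# Made-up sample text standing in for STDIN.read.
text = "alpha\nbeta\ngamma"

i = 0
matches = []
numbered = text.gsub(/^/) do |match|
  matches << match   # ^ matches a position, so the matched text is ""
  "#{i += 1}. "      # the block's last value is what gets inserted
end

p matches    # prints ["", "", ""]
p numbered   # prints "1. alpha\n2. beta\n3. gamma"
```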

To show a bit more of the power of regular expressions, we now move on to a different command. We’ll construct a command that wraps the word “the” in asterisks wherever it encounters it, like so: “here is *the* theory of *The* one”. In order to do that, we will look for a whitespace character (such as a space), followed by the word “the” with any capitalization, followed by another whitespace character. The code to do that is here:

#!/usr/bin/env ruby
print STDIN.read.gsub(/(\s)([tT][hH][eE])(\s)/,'\1*\2*\3')

OK, so what do we have here? First, let’s talk about the regexp, /(\s)([tT][hH][eE])(\s)/. Start by reading it without the parentheses: /\s[tT][hH][eE]\s/. This says: first, match a whitespace character (\s). Then match either a t or a T (that is what the brackets do: they match any one of the characters inside them). Then match either an h or an H, and so on. The parentheses tell Ruby to remember each of these matched pieces for later use. They are used in the replacement string, '\1*\2*\3', where \1, \2 and \3 stand for the three remembered pieces. The replacement is written in single quotes on purpose: in a double-quoted string Ruby would read \1 as an escape for a control character, so you would have to write "\\1" instead. You should now be able to see what is going on: when “ the ” is encountered, the preceding whitespace, the word “the” and the following whitespace are stored in the “variables” \1, \2 and \3 respectively. The matched text is then replaced by \1, an asterisk, \2 (the word “the” itself), another asterisk and finally \3.
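To check this behaviour on a made-up sentence, here is a sketch. The first substitution is the one from the command; it skips “The-end”, because a hyphen is not whitespace. The second is a variant not used above, shown only for comparison: \b matches a word boundary, and the i after the closing slash makes the match ignore case, so it also catches a “The” next to punctuation.

```ruby
# Made-up sample sentence standing in for STDIN.read.
text = "here is the theory of The one, and The-end"

# The command's regexp: whitespace, "the" in any case, whitespace.
starred = text.gsub(/(\s)([tT][hH][eE])(\s)/, '\1*\2*\3')
p starred       # prints "here is *the* theory of *The* one, and The-end"

# Variant: whole word "the" in any case, wherever word boundaries surround it.
word_starred = text.gsub(/\b(the)\b/i, '*\1*')
p word_starred  # prints "here is *the* theory of *The* one, and *The*-end"
```

Neither version touches “theory”: the first needs whitespace after “the”, and the second needs a word boundary there.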

Right at the bottom

Well, that’s it for now. Let me know if you would like to see more posts like this one. If there is a next installment, we’ll see how to use TextMate’s built-in Ruby libraries for receiving input from the user.

I’ll leave you with some exercises. For most of these, there is more than one way to go about it.

  1. Write a command that adds an increasing number of asterisks in front of each line.
  2. Write a command that doubles every word, i.e. “the fox” will become “thethe foxfox”. (Hint: \w+ matches a sequence of one or more word characters. It will try to match as many as it can.)
  3. Write a command that looks for any appearance of the word Ruby and surrounds it with an ever-increasing number of asterisks, as in: *Ruby* is as **Ruby** does.
  4. Write a command that counts the number of lines and the number of words in the text. We assume here that words in a line are separated by spaces. (Hint: if lines is an array, then lines.length returns the number of items in it.)

Later

3 Comments »

  1. Mark Eli Kalderon said,

    July 5, 2006 at 2:25 pm

    Thanks Haris. Great timing for me as learning Ruby is one of my summer projects. Best Mark

  2. After thought » Ruby Tutorials for TextMate hackers, part 2 said,

    July 6, 2006 at 12:23 am

    […] In the previous article from this series we learned some basic things about Ruby and how to use it to make TextMate do our bidding. In this post, we’ll see how to use some of TextMate’s Ruby libraries to do more stuff. […]

  3. links for 2007-08-24 « Amy G. Dala said,

    August 24, 2007 at 7:17 am

    […] After thought » Ruby Tutorials for TextMate hackers (tags: ruby textmate osx software geekery) […]
