RegEx Subpatterns in Vim

When building a regular expression (regex), match a little at a time so you know each part works before you add the next. That makes it much easier to see what is going wrong. Try it out first on test data where mistakes are easy to undo.

For example, if you come across some text you are not sure how to grab properly, don’t try to write the whole regex at once. Start with just the first part you are unsure of. If it doesn’t work right, look at what it did, undo the change (u in vim), and modify the regex. Once that part works, undo your change on the test data and move on to the next part, slowly making the regex more complicated and complete.
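The same piece-by-piece approach can be sketched outside vim. The snippet below is a rough illustration in Python's re module, not part of the original workflow: it grows the pattern used later in this article in three stages and prints what each stage matches on one test line. The stage names are made up for the example.

```python
import re

# One row of test data, copied from the table source shown further down.
line = ('<TD HEIGHT=19 ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General">'
        '<FONT COLOR="#000000">150-95566</FONT></TD>')

# Grow the pattern one piece at a time (hypothetical stage names).
stages = [
    ("prefix only", r'<TD HEIGHT[^$]*000000">'),
    ("prefix + model", r'<TD HEIGHT[^$]*000000">[^<]*'),
    ("whole line", r'<TD HEIGHT[^$]*000000">[^<]*<[^$]*'),
]

matched = {}
for name, pattern in stages:
    m = re.search(pattern, line)
    # Record what this stage grabbed, or None if it failed to match.
    matched[name] = m.group(0) if m else None
    print(f"{name}: {matched[name]!r}")
```

If a stage prints None or grabs too much, only the piece added in that stage needs fixing.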

I had a table with many entries just like this:

Model Number    Description
150-95566       Super Widget
1062500-101     Regular Widget

I wanted to make the model numbers into links which contain the model numbers in the URL. To do this I used pattern substitution.

In the source (loaded in vim) I had a whole lot of:

<TR>
<TD HEIGHT=19 ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">150-95566</FONT></TD>
<TD ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">Super Widget</FONT></TD>
</TR>
<TR>
<TD HEIGHT=19 ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">1062500-101</FONT></TD>
<TD ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">Regular Widget</FONT></TD>
</TR>

and wanted to make them:

<TR>
<TD HEIGHT=19 ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000"><a href="/pn/150-95566">150-95566</a></FONT></TD>
<TD ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">Super Widget</FONT></TD>
</TR>
<TR>
<TD HEIGHT=19 ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000"><a href="/pn/1062500-101">1062500-101</a></FONT></TD>
<TD ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">Regular Widget</FONT></TD>
</TR>

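The model and description lines are almost identical, but only the model line has a HEIGHT attribute. I want to take advantage of that: a pattern that starts at <TD HEIGHT will leave the description lines alone. Woo, looks tough!
Here is what I used: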

:% s/\(<TD HEIGHT[^$]*000000">\)\([^<]*\)\(<[^$]*\)/\1<a href="\/pn\/\2">\2<\/a>\3/g

I hope you are roughly familiar with vim's substitute command. In its simplest form it looks like this:

:% s/search text/replacement text/g

The % means all lines, and the trailing g means every match on a line is replaced, not just the first. If you wanted a range instead, you would use:

:100,150 s/search text/replacement text/g

and it would only do the regex replacement on lines 100-150.
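To make the range idea concrete, here is a small Python sketch that mimics a line range by applying the replacement only to lines whose 1-based number falls inside it. The helper name substitute_range is made up for this illustration; it is not a vim or Python built-in.

```python
import re

def substitute_range(lines, first, last, pattern, replacement):
    """Apply re.sub only to lines first..last (1-based, inclusive)."""
    return [
        re.sub(pattern, replacement, text) if first <= number <= last else text
        for number, text in enumerate(lines, start=1)
    ]

lines = ["widget", "widget", "widget", "widget"]
# Roughly what :2,3 s/widget/gadget/g would do in vim.
result = substitute_range(lines, 2, 3, r"widget", "gadget")
print(result)  # ['widget', 'gadget', 'gadget', 'widget']
```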

Okay, let’s break this down into the subpatterns so it is easier to see.
Subpatterns are defined by \( and \). Everything in between is the subpattern.

To match the first part (everything before the model number):

\(<TD HEIGHT[^$]*000000">\)

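That matches everything from <TD HEIGHT up to and including the 000000">. Strictly speaking, [^$]* matches any run of characters other than a literal $, because $ has no special meaning inside square brackets. It still stays on one line, because a vim [^...] collection does not match the end of the line unless you write it as \_[^...]. The * is greedy, so it first grabs the rest of the line and then backs up until the 000000"> can match. On these lines a plain .* would do the same job.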
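A quick way to see how [^$]* behaves is to try it in isolation. This is a sketch in Python's re module, where a $ inside brackets is likewise an ordinary character. One difference to keep in mind: a negated class in Python can match a newline, which a vim collection does not do unless it is written \_[^...]. The sample strings are invented for the check.

```python
import re

# Inside [...] the $ is an ordinary character, not an end-of-line anchor,
# so the run stops in front of the first literal dollar sign.
stops_at_dollar = re.match(r'[^$]*', 'price: $5 each').group(0)
print(repr(stops_at_dollar))  # 'price: '

# Greedy matching with backtracking: the run first swallows the whole line,
# then gives characters back until the literal tail 000000"> fits.
line = '<TD HEIGHT=19><FONT COLOR="#000000">150-95566</FONT></TD>'
prefix = re.match(r'<TD HEIGHT[^$]*000000">', line).group(0)
print(prefix)  # <TD HEIGHT=19><FONT COLOR="#000000">
```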

Then the model number itself:

\([^<]*\)

That means everything up to, but not including, the next <.

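The last part matches the < and the rest of the line. Inside square brackets $ is a literal dollar sign rather than the end of the line, so [^$]* really means "any characters except $", which on these lines is everything up to the end.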

\(<[^$]*\)

Finally, the replacement side:

\1<a href="\/pn\/\2">\2<\/a>\3

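uses \1, \2 and \3 (the subpatterns above are numbered in the order they appear). Notice it uses \2 twice, since I want the model number in both the URL and the link text. The slashes are written as \/ because a bare / would end that part of the substitute command.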
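For anyone who wants to check the whole substitution outside vim, here is a rough Python equivalent. It is a sketch, not the command I actually ran: Python writes groups as ( ) rather than \( \) and does not need the slashes escaped, and the names PATTERN and REPLACEMENT are made up for the example.

```python
import re

# Same three groups as the vim pattern, in Python syntax.
PATTERN = re.compile(r'(<TD HEIGHT[^$]*000000">)([^<]*)(<[^$]*)')
# \1 = everything before the model number, \2 = the model number, \3 = the rest.
REPLACEMENT = r'\1<a href="/pn/\2">\2</a>\3'

source = '''<TR>
<TD HEIGHT=19 ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">150-95566</FONT></TD>
<TD ALIGN=LEFT VALIGN=BOTTOM SDNUM="1033;1033;General"><FONT COLOR="#000000">Super Widget</FONT></TD>
</TR>'''

# vim's :s works one line at a time, so do the same here.
linked = [PATTERN.sub(REPLACEMENT, row) for row in source.splitlines()]
print("\n".join(linked))
```

Only the line containing <TD HEIGHT changes; the description line and the <TR> lines pass through untouched.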

Replacing unusual characters in vim.

I had a lot of

â€<9d>

that needed to be just plain ” (double quote).

Vim displayed the <9d> in blue, which means it is a single unprintable character with hex code 9d, not the four characters <, 9, d and >.
(If you type ga in normal mode you get info about the character under the cursor.)

So, to search and replace you type:

:% s/â€

then

<C-V>x9d

(which means Ctrl+V then the keys x 9 d)
then

/"/g

and it works!
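The same cleanup can be sketched in Python for files you would rather fix in a script. This assumes the text has been read so that the garbled quote appears as the three characters vim showed: â (U+00E2), € (U+20AC) and the control character U+009D. That sequence is typically what the UTF-8 bytes of a curly closing quote look like when read as Windows-1252, though that cause is a guess on my part. The function name fix_quotes is made up for the example.

```python
# The three characters vim showed: â, € and the unprintable
# control character that appeared as <9d>.
GARBLED_QUOTE = "\u00e2\u20ac\u009d"

def fix_quotes(text: str) -> str:
    """Replace every garbled sequence with a plain double quote."""
    return text.replace(GARBLED_QUOTE, '"')

sample = "She said \u00e2\u20ac\u009dhello\u00e2\u20ac\u009d."
fixed = fix_quotes(sample)
print(fixed)  # She said "hello".
```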