Sed, AWK and Creating HTML Tables

Anyone who does even a modicum of web development finds themselves taking tabular text, sometimes in the most horrid format, and having to turn it into some sort of clean HTML.

A recent experience of this type was transferring material from a TWiki site (good for collaborative projects) to a Drupal site (good for an internal and external website). TWiki likes to introduce a lot of inline style information and similar markup, which makes it a bit of a pain to transfer tables to other systems. This is where those old-fashioned UNIX tools, sed (stream editor, 1973) and AWK (Aho, Weinberger, and Kernighan, 1977), still prove their resilience, power and simplicity.

I'll start with a generic method for working with a CSV file. After that I'll give a shorter example specifically for TWiki to HTML.

Generic: From CSV to HTML

1. Start With A CSV File. Turn Every Delimiter Into A New Line

Let's presume that we start with a CSV (comma-separated values) file of a table, probably obtained through a process as ugly as screen-scraping or stripping tags.

The first action will be to use sed to turn delimiters into new lines, so that every future table cell is now on its own line.

sed -i 's/,/\n/g' file.csv

The "-i" stands for "in place", i.e., sed changes file.csv itself rather than writing the result to standard output. The "s" is the substitute command, the "g" flag means global (replace every match on the line, not just the first), the "," is what we are searching for, and "\n" is a newline: the backslash ("\") tells sed to treat the "n" as a newline rather than a literal letter.
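A small sketch of this step on a single made-up row. Treating "\n" in the replacement as a newline is a GNU sed feature; if your sed prints a literal "n" instead, as some non-GNU versions do, tr (which translates one character into another) is a portable alternative. The row variable is only sample data.

```shell
#!/bin/sh
# Sample data: one table row with three comma-separated cells.
row='Apples,Oranges,Pears'

# GNU sed: "\n" in the replacement produces a newline.
printf '%s\n' "$row" | sed 's/,/\n/g'

# Portable alternative: tr turns every comma into a newline.
printf '%s\n' "$row" | tr ',' '\n'
```

Both commands print Apples, Oranges and Pears on separate lines.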

You must also make sure that the field delimiter your CSV file uses (like a comma!) does not also appear inside the table cells; otherwise a single cell will end up split across two lines.
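One way to catch this before splitting is to count the fields on each line. The sketch below assumes each table row sits on its own line of the CSV file, and check_fields is a hypothetical helper name. awk's -F option sets the field separator and NF is the number of fields on the current line.

```shell
#!/bin/sh
# check_fields (hypothetical helper): report every line whose number of
# comma-separated fields differs from the expected column count ($1).
# Reads the file named in $2, or standard input if no file is given.
check_fields() {
    awk -F',' -v cols="$1" 'NF != cols { print "line " NR ": " NF " fields" }' "${2:--}"
}

# Example for an eight-column table; no output means every line looks fine.
# check_fields 8 file.csv
```

A line reporting more fields than expected usually means a cell contains the delimiter.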

If the fields are wrapped in quotation marks, which is common, you'll also need to strip them:

sed -i 's/"//g' file.csv

Note that if you leave out the "g" (global), sed replaces only the first match on each line. It is a stream editor, after all.
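A quick way to see the difference the "g" flag makes, using a made-up quoted cell:

```shell
#!/bin/sh
# Sample data: a single cell wrapped in quotation marks.
cell='"Apples"'

# Without g, only the first quotation mark on the line is removed.
printf '%s\n' "$cell" | sed 's/"//'    # prints: Apples"

# With g, every quotation mark on the line is removed.
printf '%s\n' "$cell" | sed 's/"//g'   # prints: Apples
```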

2. Add a table cell open and table cell close to every line

sed -i 's/^/<td>/' file.csv

The caret ("^") matches the start of each line, so "<td>" is inserted before the first character.

sed -i 's/$/<\/td>/' file.csv

The dollar sign ("$") matches the end of each line. The slash in the closing tag is escaped ("<\/td>") because "/" also separates the parts of the s command.

Now every future table cell sits on its own line, wrapped in opening and closing HTML tags. We have a file of table cells.
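The two substitutions can also be done in a single pass. In this sketch, "&" in the replacement stands for whatever the pattern matched (here the whole line), and using "|" instead of "/" as the s command's delimiter removes the need to escape the slash in "</td>". The sample cells are made up.

```shell
#!/bin/sh
# Wrap every line in <td>...</td> with one substitution.
# ".*" matches the whole line; "&" re-inserts that match in the replacement.
printf 'Apples\nOranges\n' | sed 's|.*|<td>&</td>|'
# prints:
# <td>Apples</td>
# <td>Oranges</td>

# On the real file this would be: sed -i 's|.*|<td>&</td>|' file.csv
```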

3. Add a table row tag after every 8th line

Table cells need to be organised into rows. To do this we use a small AWK program. In this example the table has eight columns, so the program prints an opening table row tag after every eighth line (NR is AWK's current line number). I am sufficiently paranoid to create a new file for this.


#!/bin/bash
# Print every cell line; after every 8th line (a full row), print <tr>.
awk '{
    print $0
    if (NR % 8 == 0)
        print "<tr>"
}' file.csv > file1.csv

Or, directly on the command line: awk ' {print $0; if(NR % 8 == 0 ) print "<tr>"} ' file.csv > file1.csv

(Note the semicolon after print $0: it separates the two statements when they share one line.)
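To see what this produces, here is the same program with the column count passed in through awk's -v option (cols is my own variable name, not part of the original), run on four made-up cells laid out in two columns. Note the <tr> printed after the last row, too.

```shell
#!/bin/sh
# Step 3 with the column count as a parameter instead of a hard-coded 8.
printf '<td>a</td>\n<td>b</td>\n<td>c</td>\n<td>d</td>\n' |
    awk -v cols=2 '{ print $0; if (NR % cols == 0) print "<tr>" }'
# prints:
# <td>a</td>
# <td>b</td>
# <tr>
# <td>c</td>
# <td>d</td>
# <tr>
```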

4. Add a closing table row tag after every 9th line

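First, open file1.csv and add a <tr> tag as its first line, since step 3 only opens the rows after the first. Each row now takes nine lines (the <tr> plus eight cells), so a closing tag belongs after every ninth line. Step 3 also leaves a stray <tr> at the very end of the file, which you should delete. There's probably a better way to do this.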


# Print every line; after every 9th line (<tr> plus eight cells), print </tr>.
awk '{
    print $0
    if (NR % 9 == 0)
        print "</tr>"
}' file1.csv > file2.csv

Or, as a single line:

awk ' {print $0; if(NR % 9 == 0 ) print "</tr>"} ' file1.csv > file2.csv
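As for a better way: a single awk pass can open and close each row itself, which avoids editing file1.csv by hand and never produces the stray <tr>. This is a sketch rather than the method described above; rows_from_cells is a hypothetical function name, and its one argument is the number of columns.

```shell
#!/bin/sh
# rows_from_cells (hypothetical helper): group one-cell-per-line input
# into <tr> rows. $1 is the number of columns, e.g. 8.
rows_from_cells() {
    awk -v cols="$1" '
        (NR - 1) % cols == 0 { print "<tr>" }      # before the first cell of a row
        { print }                                  # the cell itself
        NR % cols == 0 { print "</tr>" }           # after the last cell of a row
        END { if (NR % cols != 0) print "</tr>" }  # close a short final row
    '
}

# Usage: rows_from_cells 8 < file.csv > file2.csv
```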

Voilà! Add the table, table header and table body tags, along with any formatting attributes, and you're done.
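For the final wrapping, one option is a small shell function that prints the surrounding tags around whatever rows it reads. This sketch leaves out the table header and any attributes; wrap_table is a hypothetical name.

```shell
#!/bin/sh
# wrap_table (hypothetical helper): surround rows read from standard
# input with <table> and <tbody> tags.
wrap_table() {
    echo '<table>'
    echo '<tbody>'
    cat
    echo '</tbody>'
    echo '</table>'
}

# Usage: wrap_table < file2.csv > table.html
```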

TWiki to HTML

Open the topic in Edit or Raw View, or copy the appropriate text file from the TWiki site, and save the table markup as test.txt.

1. Remove the table header line (convert it separately).
2. Remove the first "|" on each line (the expression drops the first character): sed -i 's/.\(.*\)/\1/' test.txt
3. Remove the last "|" on each line (the expression drops the last character): sed -i 's/\(.*\)./\1/' test.txt
4. Add a table row and table data opening tag at the start of each line: sed -i 's/^/<tr><td>/' test.txt
5. Add a table data and table row closing tag at the end of each line: sed -i 's/$/<\/td><\/tr>/' test.txt
6. Turn all the remaining delimiters into table data boundaries: sed -i 's/|/<\/td><td>/g' test.txt
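To make the order of operations concrete, here is a made-up TWiki row pushed through the same five substitutions, with the intermediate result noted after each one. It works on a shell variable rather than editing test.txt in place.

```shell
#!/bin/sh
# A sample TWiki table row.
line='| Apples | 3 |'

line=$(printf '%s\n' "$line" | sed 's/.\(.*\)/\1/')       # " Apples | 3 |"
line=$(printf '%s\n' "$line" | sed 's/\(.*\)./\1/')       # " Apples | 3 "
line=$(printf '%s\n' "$line" | sed 's/^/<tr><td>/')       # "<tr><td> Apples | 3 "
line=$(printf '%s\n' "$line" | sed 's/$/<\/td><\/tr>/')   # "<tr><td> Apples | 3 </td></tr>"
line=$(printf '%s\n' "$line" | sed 's/|/<\/td><td>/g')    # "<tr><td> Apples </td><td> 3 </td></tr>"

printf '%s\n' "$line"
```

The spaces around each cell's content survive, which is harmless in HTML.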

Surprise! You're finished already. You might even want to turn it into a small script!
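A minimal sketch of such a script, assuming GNU sed. It differs from the steps above in two ways: it reads standard input instead of editing test.txt in place, and it drops header lines automatically on the assumption that they start with "| *" (TWiki's bold header cells). twiki2html is a hypothetical name.

```shell
#!/bin/sh
# twiki2html (hypothetical helper): convert TWiki table rows read from
# standard input into HTML table rows wrapped in <table> tags.
# The sed expressions, in order:
#   1. delete header lines (assumed to start with "| *")
#   2. drop the first character (the leading "|")
#   3. drop the last character (the trailing "|")
#   4. open the row and first cell
#   5. close the last cell and the row
#   6. turn every remaining "|" into a cell boundary
twiki2html() {
    echo '<table>'
    sed -e '/^| \*/d' \
        -e 's/.\(.*\)/\1/' \
        -e 's/\(.*\)./\1/' \
        -e 's/^/<tr><td>/' \
        -e 's/$/<\/td><\/tr>/' \
        -e 's/|/<\/td><td>/g'
    echo '</table>'
}

# Usage: twiki2html < test.txt > table.html
```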