- perl-utils
- Preamble
- Paragraph processing utilities
- Other text-oriented utilities
- File-oriented utilities
- To be continued...
perl-utils is a set of text- and file-oriented utilities. The text-oriented scripts are meant mostly for processing paragraphs. By default, a paragraph is a run of text lines delimited by empty or blank lines.
Treating a text file as a set of paragraphs makes it easier to sort, merge and filter files without losing the links between the lines of a paragraph.
For example, multiline log entries in log files can carry useful additional information. Using grep -C (or grep -A, or grep -B) does not guarantee that a particular log entry is extracted completely, and it can also pull in other log entries that are not needed at the moment.
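The difference can be seen with a minimal sketch. It is not part of perl-utils: para_filter is a hypothetical helper that relies on awk's paragraph mode (RS set to the empty string), so it treats only completely empty lines as delimiters.

```shell
# para_filter PATTERN [FILE...]: hypothetical helper, not part of perl-utils.
# Prints every paragraph that matches PATTERN (an awk extended regexp).
para_filter() (
    pattern=$1
    shift
    # RS="" switches awk to paragraph mode: records are separated by empty lines.
    # ORS="\n\n" puts one empty line back after each printed paragraph.
    awk -v RS="" -v ORS="\n\n" -v pattern="$pattern" '$0 ~ pattern' "$@"
)

printf '%s\n' \
    'ERROR: disk full' \
    '  device: /dev/sda1' \
    '' \
    'INFO: backup finished' \
    '' \
    'ERROR: timeout' \
    '  host: db1' |
para_filter 'device'
```

Only the first paragraph is printed, with both of its lines. A plain grep 'device' would print just the second line and lose the ERROR line it belongs to.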
paragrep - grep-like filter for searching matches in paragraphs.
paragrep assumes the input consists of paragraphs and prints the paragraphs matching a pattern. A paragraph is identified as a block of text delimited by empty or blank lines.
The initial version was very simple: a shell function invoking an inline Perl script for grepping log files:
paragrep() {
    perl -ne '
        if ( m/$break_of_para/ ) {
            print $para if defined $para && $para =~ /$regexp/;
            $para = "";
        }
        $para .= $_;
        END {
            print $para if defined $para && $para =~ /$regexp/;
        }
    ' -s -- -break_of_para="$1" -regexp="$2" "${@:3}"
}
or
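paragrep() {
    perl -ne '
        # A break line starts a new paragraph: flush the previous one.
        if ( m/$break_of_para/ ) {
            print $para if defined $para && $para =~ /$regexp/;
            $para = "";
        }
        $para .= $_;
        # At the end of each file flush the last paragraph, current line included.
        if ( eof ) {
            print $para if $para =~ /$regexp/;
            $para = "";
        }
    ' -s -- -break_of_para="$1" -regexp="$2" "${@:3}"
}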
Later I decided to implement it as a standalone script with more functionality and flexibility.
Example
Each entry in a log file usually begins with a timestamp in the generalized numeric form "date time". A single pattern can cover it regardless of which date format was used to write the dates:
paragrep -Pp '^\d+[/-]\d+[/-]\d+ \d+:\d+:\d+' PATTERN FILENAME
The following aliases are handy for parsing log files and INI-like configuration files:
alias lgrep="paragrep -Pp '^\d+[/-]\d+[/-]\d+ \d+:\d+:\d+'"
alias cgrep="paragrep -Pp '^(#@ |#-> )?\['"
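The way a break pattern splits a log into entries can be sketched in awk: a line that matches the pattern flushes the entry collected so far. This is an illustration under my own assumptions, not the paragrep script. log_entries is a hypothetical name, and since POSIX awk has no \d, the timestamp pattern is spelled with [0-9].

```shell
# log_entries REGEXP [FILE...]: hypothetical awk sketch, not the paragrep script.
# Prints every log entry (timestamp line plus continuation lines) matching REGEXP.
log_entries() (
    regexp=$1
    shift
    awk -v regexp="$regexp" '
        BEGIN { stamp = "^[0-9]+[/-][0-9]+[/-][0-9]+ [0-9]+:[0-9]+:[0-9]+" }
        # A "date time" line opens a new entry: flush the previous one first.
        $0 ~ stamp {
            if (entry ~ regexp) printf "%s", entry
            entry = ""
        }
        { entry = entry $0 "\n" }
        # Flush the last entry at the end of input.
        END { if (entry ~ regexp) printf "%s", entry }
    ' "$@"
)

printf '%s\n' \
    '2024-01-02 10:00:00 INFO start' \
    '2024-01-02 10:00:01 ERROR request failed' \
    '  caused by: timeout' \
    '2024/01/02 10:00:02 INFO done' |
log_entries 'timeout'
```

The ERROR line is printed together with its continuation line, although only the continuation line contains the word being searched for.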
Similar tools
While working on the script I found a lot of interesting implementations of the task in different languages. Here is a short selection of the ones that interested me:
- paragrep in Python
- paragrep in Haskell
- Ack
- greple
- Example from Perl Cookbook, Chapter 6
- grep(1)
- perlre(1)
A small and powerful script that merges two or more log files so that multiline entries appear in the correct chronological order and no log entry is broken apart.
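One possible way to get this effect with standard tools is sketched below. It is not the script described here: merge_logs is a hypothetical name. The sketch assumes that every entry starts with a "date time" stamp, that the stamps sort correctly as plain text (for example YYYY-MM-DD hh:mm:ss), and that the byte \001 never occurs in the logs.

```shell
# merge_logs FILE...: hypothetical sketch, not the perl-utils script.
merge_logs() (
    awk '
        BEGIN { stamp = "^[0-9]+[/-][0-9]+[/-][0-9]+ [0-9]+:[0-9]+:[0-9]+" }
        # A stamped line starts a new entry on a new output line.
        $0 ~ stamp { if (NR > 1) printf "\n"; printf "%s", $0; next }
        # Continuation lines are glued to their entry with the byte \001.
        { printf "\001%s", $0 }
        END { if (NR > 0) printf "\n" }
    ' "$@" |
    LC_ALL=C sort -s -k1,2 |    # stable sort on the date and time fields
    tr '\001' '\n'              # unfold the entries again
)

dir=$(mktemp -d)
printf '%s\n' '2024-01-01 10:00:00 a1' '  a1 details' '2024-01-01 10:00:05 a2' > "$dir/a.log"
printf '%s\n' '2024-01-01 10:00:03 b1' '  b1 details' > "$dir/b.log"
merge_logs "$dir/a.log" "$dir/b.log"
```

Folding each entry into one line before sorting is what keeps the continuation lines attached to their timestamp.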
sponge is a Perl version of sponge from the Debian package moreutils.
It reads standard input into memory and writes it out to the specified file. Unlike a shell redirect, the script soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file. If no file is specified, it writes to STDOUT.
My first release was an inline Perl script within a shell function:
sponge() {
    perl -ne '
        push @lines, $_;
        END {
            open(OUT, ">$file")
                or die "sponge: cannot open $file: $!\n";
            print OUT @lines;
            close(OUT);
        }
    ' -s -- -file="$1"
}
Perl has many ways to do it. Here is a slightly different one that also supports the -a option for appending to the file:
sponge() {
    perl -e '
        $file = shift || "-";
        @lines = <>;
        open OUT, ( defined $a ? ">>" : ">" ) . $file
            or die "sponge: cannot open $file: $!\n";
        print OUT @lines;
        close OUT;
    ' -s -- "$@"
}
Awk can do sponge as well:
#!/usr/bin/awk -f
# slurp a stuff and burp...
# ... | awk -f sponge.awk [-v ORS="\r\n"] [-v append=1] [-v file=file]
NR == 1 { lines = $0 }
NR != 1 { lines = lines ORS $0 }
END {
    if ( ! file ) { file = "-" }
    if ( append ) {
        print lines >> file;
    } else {
        print lines > file;
    }
}
or the same, wrapped more conveniently in a shell function:
#!/bin/sh
# slurp a stuff and burp...
# ... | sponge [-a] file
sponge() (
    case "$1" in
        -a | --append )
            append=1
            file="$2"
            ;;
        * )
            append=""
            file="$1"
            ;;
    esac
    awk -v append="$append" -v file="$file" '
        NR == 1 { lines = $0 }
        NR != 1 { lines = lines ORS $0 }
        END {
            if ( ! file ) { file = "-" }
            if ( append ) {
                print lines >> file;
            } else {
                print lines > file;
            }
        }'
)
sponge "$@"
Example
An abstract example of usage is described in the tool's help and shown below:
sed '...' file | grep '...' | sponge [-a] file
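A concrete run may make the point clearer. The sketch below uses soak, a hypothetical minimal stand-in written in plain shell rather than any of the implementations above. It handles text input only, because a shell variable cannot hold NUL bytes.

```shell
# soak FILE: hypothetical stand-in for sponge, text input only.
soak() (
    # Read all of stdin first. The trailing "x" keeps command substitution
    # from stripping the final newlines; it is removed again below.
    data=$(cat; printf x)
    # The output file is opened only now, after stdin has reached end of file.
    printf '%s' "${data%x}" > "$1"
)

file=$(mktemp)
printf '%s\n' pear apple fig > "$file"

# sort "$file" > "$file" would leave the file empty: the shell truncates it
# before sort gets to read it.
sort "$file" | soak "$file"
cat "$file"
```

After the pipeline the file holds the three lines in sorted order, because nothing is written until sort has finished reading.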
This is a Perl implementation of the following AWK script, which transposes the input file so that rows become columns and columns become rows.
#!/usr/bin/awk -f
{
    for (i = 1; i <= NF; i++) {
        a[NR,i] = $i
    }
}
NF > p {
    p = NF
}
END {
    for (j = 1; j <= p; j++) {
        str = a[1,j]
        for (i = 2; i <= NR; i++) {
            str = str OFS a[i,j];
        }
        print str
    }
}
Example
( echo {1..5} ; echo {100..104} ) | ./transpose
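To show what the result looks like without the Perl script at hand, here is the AWK version from above wrapped in a shell function and fed the same two rows. The function name transpose_awk is my own choice for this sketch, and printf replaces the brace expansion so that it also runs in a plain POSIX shell.

```shell
# transpose_awk [FILE...]: the AWK script from above as a shell function.
transpose_awk() {
    awk '
        # Remember every field by (row, column).
        { for (i = 1; i <= NF; i++) a[NR, i] = $i }
        # Track the widest row seen so far.
        NF > p { p = NF }
        END {
            # Each input column becomes one output row.
            for (j = 1; j <= p; j++) {
                str = a[1, j]
                for (i = 2; i <= NR; i++) str = str OFS a[i, j]
                print str
            }
        }
    ' "$@"
}

printf '%s\n' '1 2 3 4 5' '100 101 102 103 104' | transpose_awk
# 1 100
# 2 101
# 3 102
# 4 103
# 5 104
```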
file-rename renames the files supplied according to the rule given as the first argument. It supports several ways to rename files: applying Perl code to copy or move files; rotating names cyclically left or right; swapping two names; flipping the whole list of files.
Example
file-rename 's/\.bak$//' *.bak
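For comparison, two of these operations can be written in plain shell. This is only a sketch of the effect, not how file-rename works inside; strip_suffix and swap_names are hypothetical helper names.

```shell
# Hypothetical plain-shell helpers, not the file-rename script itself.

# strip_suffix SUFFIX FILE...: drop SUFFIX from the end of each name,
# like the rule s/\.bak$// in the example above.
strip_suffix() (
    suffix=$1
    shift
    for f in "$@"; do
        case $f in
            *"$suffix") mv -- "$f" "${f%"$suffix"}" ;;
        esac
    done
)

# swap_names A B: exchange the names of two files through a temporary name.
swap_names() (
    tmp=$(mktemp "$1.XXXXXX") || exit 1
    mv -- "$1" "$tmp" && mv -- "$2" "$1" && mv -- "$tmp" "$2"
)

dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'one\n' > a.txt.bak
printf 'two\n' > b.txt.bak
strip_suffix .bak ./*.bak    # leaves a.txt and b.txt
swap_names a.txt b.txt       # a.txt now holds "two", b.txt holds "one"
ls
```

A dedicated tool is still more convenient, since the rule is an arbitrary Perl expression rather than a fixed suffix.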