$ wget http://openbsd.secsup.org/3.6/i386/cd36.iso
--16:12:25-- http://openbsd.secsup.org/3.6/i386/cd36.iso
           => `cd36.iso'
Resolving openbsd.secsup.org... done.
Connecting to openbsd.secsup.org[208.209.50.18]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4,843,520 [text/plain]

100%[========================================================================================>] 4,843,520 1.08M/s ETA 00:00

16:12:29 (1.08 MB/s) - `cd36.iso' saved [4843520/4843520]
$ wget -q -O - http://www.uuasc.org/
<!doctype HTML public "-//IETF//DTD HTML//EN">
<HTML>
<head>
<title>UNIX Users Association of Southern California</title>
</head>
<body bgcolor="#FFFFFF" text="#000080">
<center>
<h1><img src="uuasc.gif" alt="UUASC"></h1>
<h2>UNIX Users Association of Southern California</h2>
</center>
...
Sys Admin Magazine (p1 of 16)
[logo_cmp_black.gif]
[Type=count&AdID=50446&FlightID=31994&TargetID=2715&Segments=1411,1462
,1628,3158,3448,3977,4875&Targets=1215,2715,2878&Values=34,46,51,63,77
,80,92,101,140,203,442,646,944,945,963,1104,1184,1388,1405,1426,1431,1
736,1766,1785,1944,1970,2310,2352&RawValues=&random=cjRkbtk,baRdiupcea
imw]
[logo_sa_new.jpg] [spacer_319cce.gif]
____________________ Search
[Jump to:_________]
[spacer_999999.gif]
[spacer_999999.gif]
January 2005
[Type=count&AdID=50783&FlightID=32184&TargetID=2587&Segments=1411,3035
,3448,4875&Targets=2587,2878&Values=34,46,51,63,77,80,92,101,140,290,4
42,646,918,944,945,963,1184,1388,1405,1426,1431,1736,1766,1785,1944,19
70,2310,2352&RawValues=&random=dfnKozk,baRdiupceaimr]
Feature Article
(NORMAL LINK) Use right-arrow or <return> to activate.
Arrow keys: Up and Down to move. Right to follow a link; Left to go back.
H)elp O)ptions P)rint G)o M)ain screen Q)uit /=search [delete]=history list
Sys Admin Magazine (1/10)
[IMG] _____________________ [ Search ]
January 2005 Current Issue
Feature Article [IMG]
Table of contents
Open Source Anti-Virus for the Whole Network: ClamAV Buy this issue.
* James Mikusi
Unix Review Spotlight
Mikusi provides an overview of the ClamAV anti-virus
tool, which filters any given input and outputs a Changes to the CIW
basic summary stating whether a virus was detected. Associate Certificatio
Exam
Columns The CIW (Certified
Internet Webmaster)
Checking Your Bookmarks * Randal L. Schwartz Foundation exam has
recently upgraded to
Questions and Answers * Amy Rich version 5. This
vendor-neutral exam is
------------------------------------------------- at the core of all the
http://www.samag.com/
$ websnarf 'http://docs.sun.com/app/docs/doc/806-2221-10/6jbf1novc?a=view'
Snarfing http://docs.sun.com/app/docs/doc/806-2221-10/6jbf1novc?a=view...to docs.sun.com: Solaris 8 Sun Hardware Platform Guide.txt
$ cat 'docs.sun.com: Solaris 8 Sun Hardware Platform Guide.txt'
http://docs.sun.com/app/docs/doc/806-2221-10/6jbf1novc?a=view

sun.com How To Buy | My Sun | Worldwide Sites | Search sun.com
Sun Microsystems Logo
[IMG]Products and [IMG]Support and Services Training
docs.sun.com - Sun Product Documentation
...
Table 1-1 Platform Names for Sun Systems

+------------------------------------------------------------------------+
| System                    | Platform Name            | Platform Group  |
|---------------------------+--------------------------+-----------------|
| SPARCclassic              | SUNW,SPARCclassic        | sun4m           |
|---------------------------+--------------------------+-----------------|
| SPARCstation LX           | SUNW,SPARCstation-LX     | sun4m           |
|---------------------------+--------------------------+-----------------|
| SPARCstation LX+          | SUNW,SPARCstation-LX+    | sun4m           |
|---------------------------+--------------------------+-----------------|
| SPARCstation 4            | SUNW,SPARCstation-4      | sun4m           |
|---------------------------+--------------------------+-----------------|
| SPARCstation 5            | SUNW,SPARCstation-5      | sun4m           |
|---------------------------+--------------------------+-----------------|
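#!/bin/sh
# websnarf: save a text rendering of each URL to "<page title>.txt"
if [ $# -lt 1 ]; then
    echo "Usage: `basename $0` url..." 1>&2
    exit 1
fi
for url in "$@"; do
    echo -n "Snarfing $url..." 1>&2
    # html_title falls back to the URL itself, so replace any "/" that
    # would otherwise be read as a directory separator
    title=`html_title "$url" | tr '/' '_'`
    echo "to $title.txt" 1>&2
    echo "$url" > "$title.txt"
    echo >> "$title.txt"
    elinks -dump "$url" >> "$title.txt"
done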
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Basename;
use LWP::Simple;
use HTML::TokeParser;
my $program = basename $0;
my $url = shift or die "Usage: $program url\n";
my $html = get($url) or die "Can't retrieve $url\n";
my $p = HTML::TokeParser->new(\$html);
if ($p->get_tag("title")) {
    print $p->get_trimmed_text, "\n";
} else {
    print $url, "\n";
}
$ telnet www.freebsd.org 80
Trying 216.136.204.117...
Connected to www.freebsd.org (216.136.204.117).
Escape character is '^]'.
GET /
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1" />
<title>The FreeBSD Project</title>
<meta name="description" content="The FreeBSD Project" />
<meta name="keywords"
content="FreeBSD, BSD, UNIX, Support, Gallery, Release, Application, Software, Handbook, FAQ, Tutorials, Bugs, CVS, CVSup, News, Commercial Vendors, homepage, CTM, Unix" />
...
$ telnet www.uuasc.org 80
Trying 216.237.5.34...
Connected to compata.com (216.237.5.34).
Escape character is '^]'.
GET /
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Compata - Advanced Computer Applications</title>
$ telnet www.uuasc.org 80
Trying 216.237.5.34...
Connected to compata.com (216.237.5.34).
Escape character is '^]'.
GET / HTTP/1.0
Host: www.uuasc.org
HTTP/1.1 200 OK
Date: Sun, 09 Jan 2005 22:37:55 GMT
Server: Apache/2.0.51 (Fedora)
Last-Modified: Wed, 15 Dec 2004 06:49:44 GMT
ETag: "b04563-1181-f14eb200"
Accept-Ranges: bytes
Content-Length: 4481
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-Language: en
<!doctype HTML public "-//IETF//DTD HTML//EN">
<HTML>
<head>
<title>UNIX Users Association of Southern California
GET http://www.slashdot.org/
instead of
GET /
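The two request forms above, together with the Host: header from the earlier telnet session, can be sketched in a few lines of Python. This is only an illustration: build_request and its via_proxy flag are names made up for this sketch, and nothing is sent over the network.

```python
def build_request(host, path="/", via_proxy=False):
    """Return the bytes of a minimal HTTP/1.0 GET request.

    build_request and via_proxy are hypothetical names for this sketch.
    With via_proxy=True the request line carries the absolute URL, which
    is the form a proxy expects; otherwise it carries only the path.
    """
    target = "http://%s%s" % (host, path) if via_proxy else path
    lines = [
        "GET %s HTTP/1.0" % target,
        "Host: %s" % host,  # lets a server with several sites on one IP pick the right one
        "",                 # an empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_request("www.uuasc.org").decode("ascii"))
    print(build_request("www.slashdot.org", via_proxy=True).decode("ascii"))
```

Writing the returned bytes to a socket connected to port 80 is the scripted equivalent of typing the request into telnet.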
$ ./bin/dump-proxy
listening on http://vanadium.sabren.com:42513/
$VAR1 = bless( {
'_protocol' => 'HTTP/1.1',
'_content' => '',
'_uri' => bless( do{\(my $o = 'http://www.uuasc.org/')}, 'URI::http' ),
'_headers' => bless( {
'proxy-connection' => 'keep-alive',
'accept-charset' => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'user-agent' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0',
'keep-alive' => '300',
'accept' => 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
'accept-language' => 'en-us,en;q=0.5',
'accept-encoding' => 'gzip,deflate',
'host' => 'www.uuasc.org'
}, 'HTTP::Headers' ),
'_method' => 'GET'
}, 'HTTP::Request' );
$VAR1 = bless( {
'_protocol' => 'HTTP/1.1',
'_content' => '<!doctype HTML public "-//IETF//DTD HTML//EN">
<HTML>
<head>
<title>UNIX Users Association of Southern California</title>
</head>
...
'_rc' => '200',
'_headers' => bless( {
'client-date' => 'Sun, 09 Jan 2005 23:14:59 GMT',
'etag' => '"b04563-1181-f14eb200"',
'content-type' => 'text/html; charset=ISO-8859-1',
'connection' => 'close',
'client-response-num' => 1,
'last-modified' => 'Wed, 15 Dec 2004 06:49:44 GMT',
'content-language' => 'en',
'accept-ranges' => 'bytes',
'date' => 'Sun, 09 Jan 2005 23:14:59 GMT',
'title' => 'UNIX Users Association of Southern California',
'client-peer' => '216.237.5.34:80',
'content-length' => '4481',
'server' => 'Apache/2.0.51 (Fedora)'
}, 'HTTP::Headers' ),
'_msg' => 'OK',
#!/usr/bin/perl -w
use strict;
use Socket;
use HTTP::Daemon;
use LWP::UserAgent;
use Data::Dumper;
$| = 1;
my $port = $ARGV[0] || 0;
my $daemon = HTTP::Daemon->new(LocalPort => $port, Reuse => 1);
my $agent = LWP::UserAgent->new;
warn "listening on @{[$daemon->url]}\n";
my $conn;
while ($conn = $daemon->accept) {
    my $request = $conn->get_request;
    print Dumper($request);
    my $response = $agent->request($request);
    print Dumper($response);
    $conn->send_response($response);
}
HTTP/1.1 200 OK
Date: Sun, 09 Jan 2005 22:37:55 GMT
Server: Apache/2.0.51 (Fedora)
Last-Modified: Wed, 15 Dec 2004 06:49:44 GMT
ETag: "b04563-1181-f14eb200"
Accept-Ranges: bytes
Content-Length: 4481
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-Language: en
if (m{<title>(.*)</title>}i) {
    print "Title is '$1'\n";
}
<title>The title of this page is
very long</title>
<a href="foo.html">foo</a><a href="bar.html">bar</a>
instead of
<a href="foo.html">foo</a><a href="bar.html">bar</a>
if ($page =~ m{<title>(.*?)</title>}ims) {
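The difference between the single-line pattern and the non-greedy, dot-matches-newline pattern is easy to check on the wrapped title shown earlier. The following is a rough Python equivalent, not the original Perl; PAGE and title_of are names invented for this sketch.

```python
import re

# A page whose title wraps across two lines, as in the example above.
PAGE = """<html><head>
<title>The title of this page is
very long</title>
</head></html>"""

# Without DOTALL, "." stops at a newline, so the wrapped title never matches.
single_line = re.search(r"<title>(.*)</title>", PAGE, re.IGNORECASE)


def title_of(page):
    """Return the page title with runs of whitespace collapsed, or None."""
    # DOTALL lets "." cross newlines; the non-greedy ".*?" stops at the
    # first closing tag instead of running on to the last one.
    match = re.search(r"<title>(.*?)</title>", page, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    return " ".join(match.group(1).split())


if __name__ == "__main__":
    print(single_line)
    print(title_of(PAGE))
```

Regular expressions like this remain fragile against markup the author did not anticipate, which is why the parser-based approaches are generally the safer choice.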
$page =~ m{</P><H2>(.*?)</H2>.*?<FONT SIZE="2"><B>(.*?)</B></FONT><BR>\s*(.*?)<P>\s*<script LANGUAGE="JAVASCRIPT"}ms;
print "$1\n\n"; # title
print "$2\n\n"; # dept
my $story = $3;
$story =~ s/^\s*//;
$story =~ s/\s*$//;
$story =~ s/<[^>]*>//gs;
print "$story\n\n";
while ($page =~ m{\)</FONT>\s*<P>(.*?)</TD>}msg) {
...
$ ./bin/summarize slashdot-article.html | fmt
Classic Gerald Weinberg Essay Reprinted
from the talking-to-fatso dept.
danielread writes "Programmer abuse has been a popular topic recently,
especially within the gaming industry. However, excessive overtime and
overwork are not new problems for software professionals. Twenty years
ago, acclaimed author Gerald Weinberg wrote an essay called 'Personal
Chemistry and the Healthy Body,' which is as relevant for programmers
today as it was two decades ago. Given this topic's recent resurgence,
Mr. Weinberg was generous enough to let developer.* Magazine reprint
this classic essay."
------------------------------
I read the essay, but I couldn't find the passage where it talks about
how essential caffeine is to programming. I think I'm going to have to
go back and look harder...
...
use HTML::TokeParser;
my $parser = HTML::TokeParser->new($FILENAME)
or die "Can't open $FILENAME: $!\n";
while (my $token = $parser->get_token( )) {
    my $type = $token->[0];
    if ($type eq 'S') { ... } # start tag
    elsif ($type eq 'E') { ... } # end tag
    elsif ($type eq 'T') { ... } # text
    elsif ($type eq 'C') { ... } # comment
    elsif ($type eq 'D') { ... } # declaration
    elsif ($type eq 'PI') { ... } # processing instruction
    else { die "$type isn't a valid HTML token type" }
}
from Perl Cookbook, Second Edition, O'Reilly and Associates, 2003.
use strict;
use File::Basename;   # basename
use LWP::Simple;      # get
use HTML::Parser;

my $program = basename $0;
my $url = shift or die "Usage: $program url\n";
my $html = get($url) or die "Can't retrieve $url\n";
my $found_title = 0;

package TitleParser;
use base 'HTML::Parser';

my $p = TitleParser->new;
$p->parse($html);
$p->eof;
print "$url\n";   # only reached when no title was found

# called for every start tag
sub start {
    my ($self, $tag, $attr) = @_;
    if ($tag eq 'title') {
        $found_title = 1;
    }
}

# called for every run of text; print the first one after <title>
sub text {
    my ($self, $text) = @_;
    if ($found_title) {
        print "$text\n";
        exit 0;
    }
}
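For comparison, the same callback style exists in Python's standard library as html.parser.HTMLParser. The sketch below mirrors the TitleParser idea; the helper html_title is a name borrowed from the earlier script for illustration, and it takes an HTML string rather than a URL.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text that appears between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased.
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate it.
        if self.in_title:
            self.parts.append(data)


def html_title(html):
    """Return the title text with whitespace collapsed, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None


if __name__ == "__main__":
    page = "<HTML><head><title>UNIX Users Association of Southern California</title></head></HTML>"
    print(html_title(page))
```

Unlike the Perl version, this one tracks the closing tag as well, so text after the title is not mistaken for part of it.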
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Basename;
use LWP::Simple;
use HTML::TreeBuilder;
my $program = basename $0;
my $url = shift or die "Usage: $program url\n";
my $html = get($url) or die "Can't retrieve $url\n";
my $root = HTML::TreeBuilder->new;
$root->parse($html);
$root->eof;
my $title = $root->look_down(_tag => 'title');
if ($title) {
    print $title->as_text;
} else {
    print $url;
}
print "\n";
#!/usr/bin/perl -w
use strict;
use HTML::TreeBuilder 3; # make sure our version isn't ancient
my $root = HTML::TreeBuilder->new;
$root->parse( # parse a string...
q{
<ul>
<li>Ice cream.</li>
<li>Whipped cream.
<li>Hot apple pie <br>(mmm pie)</li>
</ul>
});
$root->eof( ); # done parsing for this tree
$root->dump; # print( ) a representation of the tree
$root->delete; # erase this tree because we're done with it
<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)
    <ul> @0.1.0
      <li> @0.1.0.0
        "Ice cream."
      <li> @0.1.0.1
        "Whipped cream. "
      <li> @0.1.0.2
        "Hot apple pie "
        <br> @0.1.0.2.1
        "(mmm pie)"
from Perl & LWP, O'Reilly and Associates, 2002.
<table border="1">
  <tr>
    <td><i>foo</i></td>
  </tr>
  <tr>
    <td>bar</td>
  </tr>
  <tr>
    <td><i>baz</i></td>
  </tr>
</table>
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0">
<channel>
  <title>MacNN | The Macintosh News Network: Linux/Unix</title>
  <link>http://www.macnn.com/</link>
  <description>MacNN is the leading source for news about Apple and the Mac industry. It offers news, reviews, discussion, tips, troubleshooting, links, and reviews every day. The best place for Mac News. Period.</description>
  <language>en-us</language>
  <lastBuildDate>Mon, 10 Jan 2005 00:55:02 -0500</lastBuildDate>
  <image>
    <title>The Macintosh News Network</title>
    <url>http://www4.macnn.com/macnn/MacNN_120x50_BW_w_DS.gif</url>
    <link>http://www.macnn.com</link>
  </image>
  <item>
    <title>Portlock now supports Yellow Dog Linux on PowerPC</title>
    <link>http://www.macnn.com/news/26877</link>
    <description>Portlock today announced the latest release of Portlock Storage Manager, which adds support for Yell...</description>
    <pubDate>Mon, 8 Nov 2004 09:25:00 -0500</pubDate>
  </item>
  <item>
    <title>Sun reclaims Apple exec for Solaris marketing</title>
    <link>http://www.macnn.com/news/26832</link>
    <description>Sun Microsystems has hired a new vice president of marketing for its Solaris operating system, lurin...</description>
    <pubDate>Tue, 2 Nov 2004 18:50:00 -0500</pubDate>
  </item>
<rss version="0.91">
<channel>
<title>XML.com</title>
<link>http://www.xml.com/</link>
<description>XML.com features a rich mix of information and services for the XML community.</description>
<language>en-us</language>
<item>
<title>Normalizing XML, Part 2</title>
<link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
<description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
</item>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<channel rdf:about="http://www.xml.com/cs/xml/query/q/19">
<title>XML.com</title>
<link>http://www.xml.com/</link>
<description>XML.com features a rich mix of information and services for the XML community.</description>
<language>en-us</language>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/normalizing.html"/>
<rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/som.html"/>
<rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/svg.html"/>
</rdf:Seq>
</items>
</channel>
<item rdf:about="http://www.xml.com/pub/a/2002/12/04/normalizing.html">
<title>Normalizing XML, Part 2</title>
<link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
<description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
<dc:creator>Will Provost</dc:creator>
<dc:date>2002-12-04</dc:date>
</item>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>XML.com</title>
<link>http://www.xml.com/</link>
<description>XML.com features a rich mix of information and services for the XML community.</description>
<language>en-us</language>
<item>
<title>Normalizing XML, Part 2</title>
<link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
<description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
<dc:creator>Will Provost</dc:creator>
<dc:date>2002-12-04</dc:date>
</item>
Examples from "What is RSS?" by Mark Pilgrim, http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
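Because RSS 0.91 and 2.0 are plain XML without a default namespace, the item titles and links in the samples above can be pulled out with a generic XML parser. The following is a small sketch using Python's standard xml.etree.ElementTree; SAMPLE is a trimmed copy of the RSS 2.0 example, and rss_items is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the RSS 2.0 example above, closed so that it parses.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
    </item>
  </channel>
</rss>"""


def rss_items(xml_text):
    """Return (title, link) pairs for each <item> in an RSS 0.91 or 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in rss_items(SAMPLE):
        print(title, "(", link, ")")
```

The RSS 1.0 sample would need extra handling, since its elements live in the http://purl.org/rss/1.0/ namespace and a bare "item" lookup would not find them. Real-world feeds are also frequently malformed, which is a reason to prefer a tolerant feed library over a strict XML parser.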
# create an RSS 0.91 file
use XML::RSS;
my $rss = new XML::RSS (version => '0.91');
$rss->channel(title => 'freshmeat.net',
link => 'http://freshmeat.net',
language => 'en',
description => 'the one-stop-shop for all your Linux software needs',
rating => '(PICS-1.1 "http://www.classify.org/safesurf/" 1 r (SS~~000 1))',
copyright => 'Copyright 1999, Freshmeat.net',
pubDate => 'Thu, 23 Aug 1999 07:00:00 GMT',
lastBuildDate => 'Thu, 23 Aug 1999 16:20:26 GMT',
docs => 'http://www.blahblah.org/fm.cdf',
managingEditor => 'scoop@freshmeat.net',
webMaster => 'scoop@freshmeat.net'
);
$rss->image(title => 'freshmeat.net',
url => 'http://freshmeat.net/images/fm.mini.jpg',
link => 'http://freshmeat.net',
width => 88,
height => 31,
description => 'This is the Freshmeat image stupid'
);
$rss->add_item(title => "GTKeyboard 0.85",
link => "http://freshmeat.net/news/1999/06/21/930003829.html",
description => 'blah blah'
);
$rss->skipHours(hour => 2);
$rss->skipDays(day => 1);
$rss->textinput(title => "quick finder",
description => "Use the text input below to search freshmeat",
name => "query",
link => "http://core.freshmeat.net/search.php3"
);
# print the RSS as a string
print $rss->as_string;
# or save it to a file
$rss->save("fm.rdf");
#!/usr/bin/env python
import sys, feedparser

def main():
    # print the title and link of every entry in each feed given on the command line
    for url in sys.argv[1:]:
        feed = feedparser.parse(url)
        for entry in feed['entries']:
            print(entry['title'], ' (', entry['link'], ')')

if __name__ == '__main__':
    main()
$ ./bin/rssdump http://xml.metafilter.com/rss.xml
Torture Tapes ( http://www.metafilter.com/mefi/38493 )
If SEPTA is still around in six months... ( http://www.metafilter.com/mefi/38492 )
We Don't Need No Stinking Drummer! ( http://www.metafilter.com/mefi/38491 )
music ( http://www.metafilter.com/mefi/38490 )
Al Hartley ( http://www.metafilter.com/mefi/38489 )
use CGI qw(path_info);
use LWP::Simple;
use XML::RSS;
use Template::Extract;

print "Content-type: text/xml\n\n";
my $x = Template::Extract->new();
my %params;
path_info() =~ /(\w+)/ or die "No file name given!";
my $file = "rss/$1";
open IN, $file or die "Can't open $file: $!";
# "key: value" header lines up to the first blank line; the rest is the template
while (<IN>) { /(\w+): (.*)/ and $params{$1} = $2; last if !/\S/; }
my $template = do {local $/; <IN>;};
my $rss = new XML::RSS;
$rss->channel( title => $params{title}, link => $params{link},
               description => $params{description} );
my $doc = join "\n", grep { /\S/ } split /\n/, get($params{link});
$doc =~ s/\r//g;
$doc =~ s/^\s+//g;
for (@{$x->extract($template, $doc)->{records}}) {
    $rss->add_item(
        title => $_->{title},
        link => $_->{url},
        description => $_->{content}
    );
}
print $rss->as_string;
[% FOR records %]
<!--START OF ABSTRACT OF NEWSITEM-->
[% ... %]
<a href="[% url %]"><acronym title="Click here to read this article">
[% title %]</acronym></a> ([% date %]) <BR>
[% ... %]<font size="2">[% content %]</font></font></div>
[% ... %]