The end goal is a script that searches a library catalog.

Downloading whatever a given URL points to

Using LWP::Simple:

# === Libraries ===
use strict;
use warnings;
# LWP & Encode modules
use LWP::Simple;
use utf8;
use Encode;
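# The page is printed to STDOUT, so set the UTF-8 layer there as well as on STDERR.
binmode(STDOUT, ':raw :encoding(utf8)');
binmode(STDERR, ':raw :encoding(utf8)');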
# === Main part ===
# Here set the url.
my $url = "http://www.klnet.pref.kanagawa.jp/opac/OPP0200";
# Get the content of the URL (get() returns undef on failure).
my $content = get($url);
die "Could not fetch $url" unless defined $content;
# Decode the raw bytes into a Perl character string.
# Usage: decode(ENCODING_OF_THE_BYTES, BYTES)
$content = decode('utf-8', $content);
print($content);
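
The decode step is easier to follow in isolation. Here is a minimal offline sketch, assuming a short sample string in place of a downloaded page (the sample text is made up for illustration). It shows that decode() turns UTF-8 bytes into characters, which is why length() reports different numbers before and after.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

# Hypothetical sample: the UTF-8 bytes a server might send for a three-character word.
my $bytes = encode('UTF-8', '図書館');

# decode(ENCODING_OF_THE_BYTES, BYTES) returns a character string.
my $chars = decode('UTF-8', $bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print length($bytes), " bytes\n";        # 9 bytes (3 bytes per character)
print length($chars), " characters\n";   # 3 characters
print $chars, "\n";
```

Printing the decoded string without an encoding layer on STDOUT triggers a "Wide character" warning, which is the reason for the binmode call.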

LWP::UserAgent

# === Libraries ===
use strict;
use warnings;
# LWP & Encode modules
use LWP 5.64;
use utf8;
use Encode;
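# The page is printed to STDOUT, so set the UTF-8 layer there as well as on STDERR.
binmode(STDOUT, ':raw :encoding(utf8)');
binmode(STDERR, ':raw :encoding(utf8)');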
# === Main part ===
# Here set the url.
my $url = "http://www.klnet.pref.kanagawa.jp/opac/OPP0200";
# Get the content of the URL.
my $browser = LWP::UserAgent->new;
my $response = $browser->get($url);
die "Could not fetch $url: ", $response->status_line
unless $response->is_success;
# scalar() keeps content_type from also returning its parameters in list context.
die "Expected HTML but got ", scalar($response->content_type)
unless $response->content_type eq 'text/html';
# Decode the raw bytes into a Perl character string.
# Usage: decode(ENCODING_OF_THE_BYTES, BYTES)
my $content = decode('utf8', $response->content);
print $content;
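
The scripts here hard-code the page encoding. As an alternative, HTTP::Response can pick the charset from the Content-Type header through decoded_content(). Here is a minimal offline sketch, assuming a hand-built response object as a stand-in for a real server reply (the body and header values are made up for illustration).

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use HTTP::Response;

# Hypothetical stand-in for a server reply: EUC-JP bytes plus a matching header.
my $body     = encode('euc-jp', '<title>図書館</title>');
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=euc-jp' ],
    $body,
);

# decoded_content() reads the charset from Content-Type and returns characters.
my $text = $response->decoded_content;

binmode(STDOUT, ':encoding(UTF-8)');
print scalar($response->content_type), "\n";
print $text, "\n";
```

With a real request, the same call would replace the explicit decode() line, provided the server declares its charset correctly.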

Sending browser information (request headers) along with the request:

use strict;
use warnings;
# LWP module
use LWP 5.64;
# Character Encoding
use Encode;
use utf8;
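# The page is printed to STDOUT, so set the UTF-8 layer there as well as on STDERR.
binmode( STDOUT, ':raw :encoding(utf8)' );
binmode( STDERR, ':raw :encoding(utf8)' );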
my $url = 'http://tv.yahoo.co.jp/vhf/kanagawa/realtime.html';
# Get the content of the URL, passing extra request headers.
my $browser  = LWP::UserAgent->new;
my $response = $browser->get(
$url,
'User-Agent' => 'Mozilla/4.77 [en] (Win98; U)',
'Accept' =>
'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*',
'Accept-Encoding' => 'gzip',
'Accept-Language' => 'ja,en',
'Accept-Charset'  => 'iso-8859-1, *, utf-8',
);
die "Could not fetch $url: ", $response->status_line
unless $response->is_success;
# scalar() keeps content_type from also returning its parameters in list context.
die "Expected HTML but got ", scalar($response->content_type)
unless $response->content_type eq 'text/html';
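# Decoding.
# The request may ask for a gzip-compressed body, so use
# decoded_content( charset => 'none' ): it undoes any Content-Encoding
# but leaves the bytes in their original character encoding.
# Usage: decode(ENCODING_OF_THE_BYTES, BYTES)
my $content = decode( 'euc-jp', $response->decoded_content( charset => 'none' ) );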
print($content);
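
When several requests should carry the same browser information, the headers can be set once on the LWP::UserAgent object instead of on every get() call. Here is a minimal sketch, assuming the same header values as above. The timeout value is a hypothetical choice, and nothing is sent over the network here.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;

# agent() sets the User-Agent header sent with every request.
$browser->agent('Mozilla/4.77 [en] (Win98; U)');

# default_header() registers headers that are added to every request.
$browser->default_header( 'Accept-Language' => 'ja,en' );
$browser->default_header( 'Accept-Charset'  => 'iso-8859-1, *, utf-8' );

# Hypothetical choice: give up after 10 seconds without a response.
$browser->timeout(10);

# Read the settings back to confirm what will be sent.
print $browser->agent, "\n";
print scalar( $browser->default_header('Accept-Language') ), "\n";
```

A later $browser->get($url) would then include these headers without listing them again.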

Spidering hacks―ウェブ情報ラクラク取得テクニック101選

Authors: Kevin Hemenway, Tara Calishain, 村上雅章
Publisher: O'Reilly Japan
Release date: 2004/05
Format: Book