---
title: Scraping Starbucks Store Information
author: kazu634
date: 2009-09-04
url: /2009/09/04/_1323/
wordtwit_post_info:
- 'O:8:"stdClass":13:{s:6:"manual";b:0;s:11:"tweet_times";i:1;s:5:"delay";i:0;s:7:"enabled";i:1;s:10:"separation";s:2:"60";s:7:"version";s:3:"3.7";s:14:"tweet_template";b:0;s:6:"status";i:2;s:6:"result";a:0:{}s:13:"tweet_counter";i:2;s:13:"tweet_log_ids";a:1:{i:0;i:4767;}s:9:"hash_tags";a:0:{}s:8:"accounts";a:1:{i:0;s:7:"kazu634";}}'
categories:
- Perl
- scraper
---
<div class="section">
<p>
I have been dabbling in this for a while, and now that I am fairly comfortable with it, I am going to tackle it in earnest.
</p>
<p>
For now, I will just jot down my rough notes here.
</p>
<h4>
Getting the links
</h4>
<p>
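Grab the links from <a href="http://www.starbucks.co.jp/search/index.html/" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.starbucks.co.jp/search/index.html/', 'スターバックス コーヒー | 店舗検索');" target="_blank">Starbucks Coffee | Store Search</a>. The two XPath expressions below should make this easy: one picks up the <code>area</code> elements with <code>shape="RECT"</code> (stored as <code>prefs</code>), the other the links inside the <code>SelectFromPlace</code> cell (stored as <code>citys</code>).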
</p>
<pre class="syntax-highlight">
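<span class="synPreProc">#!/usr/bin/perl</span>
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">use </span>Web::Scraper;
<span class="synStatement">use </span>URI;
<span class="synStatement">use </span>YAML;
<span class="synComment"># Top page of the store search</span>
<span class="synStatement">my</span> <span class="synIdentifier">$uri</span> = URI-&#62;<span class="synStatement">new</span>(<span class="synConstant">&#34;http://www.starbucks.co.jp/search/index.html/&#34;</span>);
<span class="synComment"># prefs: href of every RECT area; citys: href of every link in the SelectFromPlace cell</span>
<span class="synStatement">my</span> <span class="synIdentifier">$scraper</span> = scraper {
    process <span class="synConstant">'//area[@shape=&#34;RECT&#34;]'</span>, <span class="synConstant">'prefs[]'</span> =&#62; <span class="synConstant">'@href'</span>;
    process <span class="synConstant">'//td[@class=&#34;SelectFromPlace&#34;]//a'</span>, <span class="synConstant">'citys[]'</span> =&#62; <span class="synConstant">'@href'</span>;
};
<span class="synComment"># Fetch the page and apply both rules</span>
<span class="synStatement">my</span> <span class="synIdentifier">$result</span> = <span class="synIdentifier">$scraper</span>-&#62;scrape(<span class="synIdentifier">$uri</span>);
<span class="synComment"># Assumption: the original listing omits the print step; the output below looks like a YAML dump</span>
<span class="synStatement">print</span> Dump(<span class="synIdentifier">$result</span>);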
</pre>
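<p>
To try the two XPath expressions without hitting the live site, the same scraper can be run against an inline HTML string. This is only a minimal sketch: the HTML fragment, its <code>href</code> values, and the base URL passed as the second argument are made up for illustration and only mimic the structure the XPath expressions imply, not the real page.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup: one RECT area, one non-RECT area, one link in the cell.
my $html = <<'HTML';
<html><body>
<map name="japan">
  <area shape="RECT" coords="0,0,10,10" href="/search/tokyo.php">
  <area shape="CIRCLE" coords="5,5,3" href="/search/ignored.php">
</map>
<table><tr><td class="SelectFromPlace">
  <a href="/search/osaka.php">Osaka</a>
</td></tr></table>
</body></html>
HTML

# Same rules as in the script above.
my $scraper = scraper {
    process '//area[@shape="RECT"]', 'prefs[]' => '@href';
    process '//td[@class="SelectFromPlace"]//a', 'citys[]' => '@href';
};

# Passing a scalar reference scrapes the string itself; the second
# argument (an assumed base URL) is used to make the hrefs absolute.
my $result = $scraper->scrape(\$html, 'http://www.starbucks.co.jp/search/index.html');

print "prefs: $_\n" for @{ $result->{prefs} };
print "citys: $_\n" for @{ $result->{citys} };
```

<p>
Only the <code>RECT</code> area should end up in <code>prefs</code>, which makes it easy to check that an expression is neither too loose nor too strict before pointing it at the real page.
</p>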
<h4>
Output
</h4>
<pre class="syntax-highlight">
kazu634@srv634% perl 20090904225211_starbucks.pl
---
citys:
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/tokyo.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/kanagawa.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/osaka.php
prefs:
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%8C%<span class="synConstant">97</span>%E6%B5%B7%E9%<span class="synConstant">81</span>%<span class="synConstant">93</span>
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%9D%<span class="synConstant">92</span>%E6%A3%AE%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B2%A9%E6%<span class="synConstant">89</span>%8B%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%AE%AE%E5%9F%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A7%8B%E7%<span class="synConstant">94</span>%B0%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B1%B1%E5%BD%A2%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A6%8F%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E8%8C%A8%E5%9F%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%A0%<span class="synConstant">83</span>%E6%9C%A8%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%BE%A4%E9%A6%AC%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%9F%BC%E7%8E%<span class="synConstant">89</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%8D%<span class="synConstant">83</span>%E8%<span class="synConstant">91</span>%<span class="synConstant">89</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/tokyo.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/kanagawa.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B1%B1%E6%A2%A8%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%<span class="synConstant">95</span>%B7%E9%<span class="synConstant">87</span>%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%<span class="synConstant">96</span>%B0%E6%BD%9F%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%AF%8C%E5%B1%B1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%9F%B3%E5%B7%9D%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A6%8F%E4%BA%<span class="synConstant">95</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%BB%8B%E8%B3%<span class="synConstant">80</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E4%BA%AC%E9%<span class="synConstant">83</span>%BD%E5%BA%9C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/osaka.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%<span class="synConstant">85</span>%B5%E5%BA%AB%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%<span class="synConstant">92</span>%8C%E6%AD%8C%E5%B1%B1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%A5%<span class="synConstant">88</span>%E8%<span class="synConstant">89</span>%AF%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%B3%A5%E5%8F%<span class="synConstant">96</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B3%B6%E6%A0%B9%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B2%A1%E5%B1%B1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%BA%<span class="synConstant">83</span>%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B1%B1%E5%8F%A3%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A6%8F%E5%B2%A1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E4%BD%<span class="synConstant">90</span>%E8%B3%<span class="synConstant">80</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%<span class="synConstant">95</span>%B7%E5%B4%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%<span class="synConstant">86</span>%8A%E6%9C%AC%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%A4%A7%E5%<span class="synConstant">88</span>%<span class="synConstant">86</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%AE%AE%E5%B4%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%B9%BF%E5%<span class="synConstant">85</span>%<span class="synConstant">90</span>%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%B2%<span class="synConstant">96</span>%E7%B8%<span class="synConstant">84</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%BE%B3%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%A6%<span class="synConstant">99</span>%E5%B7%9D%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%<span class="synConstant">84</span>%9B%E5%AA%9B%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%AB%<span class="synConstant">98</span>%E7%9F%A5%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%9D%<span class="synConstant">99</span>%E5%B2%A1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%<span class="synConstant">84</span>%9B%E7%9F%A5%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B2%<span class="synConstant">90</span>%E9%<span class="synConstant">98</span>%9C%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E4%B8%<span class="synConstant">89</span>%E9%<span class="synConstant">87</span>%8D%E7%9C%8C
</pre>
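<p>
The <code>SearchPerfecture</code> values in the output above are percent-encoded UTF-8, so the prefecture names are hard to read at a glance. The following is a rough sketch of how they could be decoded using only core modules; the helper name <code>prefecture_name</code> is made up for this example, and <code>uri_unescape</code> from URI::Escape could replace the hand-written substitution.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the decoded SearchPerfecture value of a URL,
# or undef when the URL has no such parameter (e.g. tokyo.php).
sub prefecture_name {
    my ($url) = @_;
    my ($raw) = $url =~ /[?&]SearchPerfecture=([^&#]*)/ or return;
    $raw =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # percent-decode into bytes
    return decode('UTF-8', $raw);                   # bytes into characters
}

# Two URLs copied from the output above.
my @urls = (
    'http://www.starbucks.co.jp/search/result_city2.php?SearchPerfecture=%E5%8C%97%E6%B5%B7%E9%81%93',
    'http://www.starbucks.co.jp/search/tokyo.php',
);

binmode STDOUT, ':encoding(UTF-8)';
for my $url (@urls) {
    my $name = prefecture_name($url);
    print defined $name ? "$name\t$url\n" : "(no parameter)\t$url\n";
}
```

<p>
For the first URL the decoded value is 北海道 (Hokkaido). Tokyo, Kanagawa, and Osaka have dedicated pages without the parameter, so they would need to be handled separately.
</p>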
<h4>
Recent entries related to "Starbucks"
</h4>
<ul>
<li>
<a href="http://d.hatena.ne.jp/sirocco634/20081214/1229221635" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://d.hatena.ne.jp/sirocco634/20081214/1229221635', ' この考えに同意 &#8211; 武蔵の日記');" target="_blank">I agree with this idea &#8211; 武蔵の日記</a>
</li>
<li>
<a href="http://d.hatena.ne.jp/sirocco634/20080423/1208960605" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://d.hatena.ne.jp/sirocco634/20080423/1208960605', ' 戸塚modi &#8211; 武蔵の日記');" target="_blank">Totsuka modi &#8211; 武蔵の日記</a>
</li>
</ul>
</div>