---
title: Scraping Starbucks Store Information
author: kazu634
date: 2009-09-04
url: /2009/09/04/_1323/
categories:
- Perl
- scraper
---
<div class="section">
<p>
I have been dabbling in scraping for a while, and now that I am fairly comfortable with it, I am going to tackle it in earnest.
</p>
<p>
For now, I will just jot down rough notes here.
</p>
<h4>
Getting the links
</h4>
<p>
The links come from <a href="http://www.starbucks.co.jp/search/index.html/" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.starbucks.co.jp/search/index.html/', 'スターバックス コーヒー | 店舗検索');" target="_blank">スターバックス コーヒー | 店舗検索</a> (the Starbucks Coffee store search page). With the XPath expressions below, this should be easy.
</p>
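<pre class="syntax-highlight">
<span class="synPreProc">#!/usr/bin/perl</span>
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">use </span>Web::Scraper;
<span class="synStatement">use </span>URI;
<span class="synStatement">use </span>YAML; <span class="synComment"># assumption: the output shown below looks like a YAML dump</span>
<span class="synComment"># Top page of the store search</span>
<span class="synStatement">my</span> <span class="synIdentifier">$uri</span> = URI-&#62;<span class="synStatement">new</span>(<span class="synConstant">&#34;http://www.starbucks.co.jp/search/index.html/&#34;</span>);
<span class="synStatement">my</span> <span class="synIdentifier">$scraper</span> = scraper {
<span class="synComment"># Prefecture links: rectangular areas of the image map</span>
process <span class="synConstant">'//area[@shape=&#34;RECT&#34;]'</span>, <span class="synConstant">'prefs[]'</span> =&#62; <span class="synConstant">'@href'</span>;
<span class="synComment"># City links: anchors inside the "SelectFromPlace" table cells</span>
process <span class="synConstant">'//td[@class=&#34;SelectFromPlace&#34;]//a'</span>, <span class="synConstant">'citys[]'</span> =&#62; <span class="synConstant">'@href'</span>;
};
<span class="synStatement">my</span> <span class="synIdentifier">$result</span> = <span class="synIdentifier">$scraper</span>-&#62;scrape(<span class="synIdentifier">$uri</span>);
<span class="synComment"># Print the collected links (the original listing omitted this step)</span>
<span class="synStatement">print</span> Dump(<span class="synIdentifier">$result</span>);
</pre>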
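<p>
To see what the two XPath expressions in the script above select without hitting the live site, here is a minimal offline sketch. The HTML fragment is hypothetical: it is only a guess at the page structure, inferred from the XPath expressions themselves, and the example.com link is a placeholder. The real page may well look different.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical markup: a cut-down guess at the page structure, inferred
# only from the two XPath expressions. The real page may differ.
my $html = <<'HTML';
<html><body>
<map name="japan">
  <area shape="RECT" coords="0,0,10,10"
        href="http://www.starbucks.co.jp/search/result_city2.php?SearchPerfecture=%E5%8C%97%E6%B5%B7%E9%81%93">
  <area shape="POLY" coords="0,0,5,5,9,9" href="http://example.com/ignored">
</map>
<table><tr><td class="SelectFromPlace">
  <a href="http://www.starbucks.co.jp/search/tokyo.php">Tokyo</a>
</td></tr></table>
</body></html>
HTML

my $scraper = scraper {
    # Only <area> elements whose shape attribute is exactly "RECT"
    process '//area[@shape="RECT"]', 'prefs[]' => '@href';
    # Every <a> anywhere below a <td class="SelectFromPlace">
    process '//td[@class="SelectFromPlace"]//a', 'citys[]' => '@href';
};

# A scalar reference makes scrape() parse the given HTML instead of
# fetching a URL; the second argument is the base URL of the document.
my $result = $scraper->scrape(\$html, 'http://www.starbucks.co.jp/search/');

print "pref: $_\n" for @{ $result->{prefs} };
print "city: $_\n" for @{ $result->{citys} };
```

<p>
The POLY area is there only to show that the <code>@shape</code> predicate filters it out. The trailing <code>[]</code> in <code>prefs[]</code> and <code>citys[]</code> is what makes each key collect every match into an array.
</p>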
<h4>
Output
</h4>
<pre class="syntax-highlight">
kazu634@srv634% perl 20090904225211_starbucks.pl ~/work/tmp_perl/scrap <span class="synStatement">[</span><span class="synConstant">4861</span><span class="synStatement">]</span>
---
citys:
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/tokyo.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/kanagawa.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/osaka.php
prefs:
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%8C%<span class="synConstant">97</span>%E6%B5%B7%E9%<span class="synConstant">81</span>%<span class="synConstant">93</span>
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%9D%<span class="synConstant">92</span>%E6%A3%AE%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B2%A9%E6%<span class="synConstant">89</span>%8B%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%AE%AE%E5%9F%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A7%8B%E7%<span class="synConstant">94</span>%B0%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B1%B1%E5%BD%A2%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A6%8F%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E8%8C%A8%E5%9F%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%A0%<span class="synConstant">83</span>%E6%9C%A8%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%BE%A4%E9%A6%AC%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%9F%BC%E7%8E%<span class="synConstant">89</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%8D%<span class="synConstant">83</span>%E8%<span class="synConstant">91</span>%<span class="synConstant">89</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/tokyo.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/kanagawa.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B1%B1%E6%A2%A8%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%<span class="synConstant">95</span>%B7%E9%<span class="synConstant">87</span>%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%<span class="synConstant">96</span>%B0%E6%BD%9F%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%AF%8C%E5%B1%B1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%9F%B3%E5%B7%9D%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A6%8F%E4%BA%<span class="synConstant">95</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%BB%8B%E8%B3%<span class="synConstant">80</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E4%BA%AC%E9%<span class="synConstant">83</span>%BD%E5%BA%9C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/osaka.php
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%<span class="synConstant">85</span>%B5%E5%BA%AB%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%<span class="synConstant">92</span>%8C%E6%AD%8C%E5%B1%B1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%A5%<span class="synConstant">88</span>%E8%<span class="synConstant">89</span>%AF%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%B3%A5%E5%8F%<span class="synConstant">96</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B3%B6%E6%A0%B9%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B2%A1%E5%B1%B1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%BA%<span class="synConstant">83</span>%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B1%B1%E5%8F%A3%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%A6%8F%E5%B2%A1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E4%BD%<span class="synConstant">90</span>%E8%B3%<span class="synConstant">80</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%<span class="synConstant">95</span>%B7%E5%B4%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E7%<span class="synConstant">86</span>%8A%E6%9C%AC%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%A4%A7%E5%<span class="synConstant">88</span>%<span class="synConstant">86</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%AE%AE%E5%B4%8E%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%B9%BF%E5%<span class="synConstant">85</span>%<span class="synConstant">90</span>%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%B2%<span class="synConstant">96</span>%E7%B8%<span class="synConstant">84</span>%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%BE%B3%E5%B3%B6%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%A6%<span class="synConstant">99</span>%E5%B7%9D%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%<span class="synConstant">84</span>%9B%E5%AA%9B%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%AB%<span class="synConstant">98</span>%E7%9F%A5%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E9%9D%<span class="synConstant">99</span>%E5%B2%A1%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E6%<span class="synConstant">84</span>%9B%E7%9F%A5%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E5%B2%<span class="synConstant">90</span>%E9%<span class="synConstant">98</span>%9C%E7%9C%8C
- <span class="synStatement">!!</span>perl/scalar:URI::http http://www.starbucks.co.jp/search/result_city2.php?<span class="synIdentifier">SearchPerfecture</span>=%E4%B8%<span class="synConstant">89</span>%E9%<span class="synConstant">87</span>%8D%E7%9C%8C
</pre>
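<p>
The <code>prefs</code> entries in the output above carry the prefecture name as a percent-encoded UTF-8 value of the <code>SearchPerfecture</code> parameter (the spelling is the site's own). As a rough sketch of how to turn that back into readable text, the helper below, <code>prefecture_from_url</code>, is a hypothetical name introduced only for this example. It uses core modules only.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of the original script): pull the
# SearchPerfecture value out of a URL and return it as a Perl string.
sub prefecture_from_url {
    my ($url) = @_;
    # URLs such as tokyo.php carry no such parameter; return nothing then.
    my ($raw) = $url =~ /[?&]SearchPerfecture=([^&#]*)/
        or return;
    # Undo the percent-encoding to recover the raw UTF-8 bytes ...
    (my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    # ... then decode those bytes into characters.
    return decode('UTF-8', $bytes);
}

my $url = 'http://www.starbucks.co.jp/search/result_city2.php'
        . '?SearchPerfecture=%E5%8C%97%E6%B5%B7%E9%81%93';

binmode STDOUT, ':encoding(UTF-8)';
print prefecture_from_url($url), "\n";    # prints 北海道
```

<p>
This only matters for display or for building a lookup table keyed by prefecture name. When requesting the pages themselves, the URLs can be used exactly as scraped.
</p>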
<h4>
Recent entries related to "Starbucks"
</h4>
<ul>
<li>
<a href="http://d.hatena.ne.jp/sirocco634/20081214/1229221635" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://d.hatena.ne.jp/sirocco634/20081214/1229221635', ' この考えに同意 &#8211; 武蔵の日記');" target="_blank">I agree with this idea &#8211; 武蔵の日記</a>
</li>
<li>
<a href="http://d.hatena.ne.jp/sirocco634/20080423/1208960605" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://d.hatena.ne.jp/sirocco634/20080423/1208960605', ' 戸塚modi &#8211; 武蔵の日記');" target="_blank"> 戸塚modi &#8211; 武蔵の日記</a>
</li>
</ul>
</div>