---
title: 『新版Perl言語プログラミングレッスン入門編』 Chapter 8
author: kazu634
date: 2008-02-25
url: /2008/02/25/_877/
wordtwit_post_info:
- 'O:8:"stdClass":13:{s:6:"manual";b:0;s:11:"tweet_times";i:1;s:5:"delay";i:0;s:7:"enabled";i:1;s:10:"separation";s:2:"60";s:7:"version";s:3:"3.7";s:14:"tweet_template";b:0;s:6:"status";i:2;s:6:"result";a:0:{}s:13:"tweet_counter";i:2;s:13:"tweet_log_ids";a:1:{i:0;i:3771;}s:9:"hash_tags";a:0:{}s:8:"accounts";a:1:{i:0;s:7:"kazu634";}}'
categories:
- Perl
- Programming
---
<div class="section">
<p>
This chapter is titled "More Regular Expressions."
</p>
<p>
<a name="seemore"></a>
</p>
<h4>
Simple matching
</h4>
<p>
The "=~" operator binds a string to a pattern and tests whether the pattern matches.
</p>
<pre class="syntax-highlight">
<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">\d+</span><span class="synStatement">/</span>
</pre>
<p>
This expression is true if $str contains a run of digits and false otherwise.
</p>
<h4>
Extracting the matched text
</h4>
<p>
After a successful match, the special variable "$&amp;" holds the part of the string that matched.
</p>
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;The price is 300yen.&#34;</span>;
<span class="synStatement">if</span> (<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">\d+</span><span class="synStatement">/</span>) {
<span class="synStatement">print</span> <span class="synConstant">&#34;</span><span class="synIdentifier">$&#38;</span><span class="synConstant"> is the number</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
}
</pre>
<p>
In this case, "$&amp;" holds "300".
</p>
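<h4>
Extracting several matched parts
</h4>
<p>
The special variable "$&amp;" returns the whole match, so it suits cases where you want a single piece of the string. To pull out several pieces at once, use $1, $2, $3, &#8230;, $n. Each of these is set automatically to the text matched by the corresponding pair of parentheses in the regular expression.
</p>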
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;168, 57, 37&#34;</span>;
<span class="synStatement">if</span> (<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synStatement">/</span>) {
<span class="synStatement">my</span> <span class="synIdentifier">$height</span> = <span class="synIdentifier">$1</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$weight</span> = <span class="synIdentifier">$2</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$age</span> = <span class="synIdentifier">$3</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;Height: </span><span class="synIdentifier">$height</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;Weight: </span><span class="synIdentifier">$weight</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;Age: </span><span class="synIdentifier">$age</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
}
</pre>
<p>
The same thing can also be written as follows.
</p>
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;168, 57, 37&#34;</span>;
<span class="synStatement">if</span> (<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synStatement">/</span>) {
<span class="synStatement">my</span> (<span class="synIdentifier">$height</span>, <span class="synIdentifier">$weight</span>, <span class="synIdentifier">$age</span>) = (<span class="synIdentifier">$1</span>, <span class="synIdentifier">$2</span>, <span class="synIdentifier">$3</span>);
<span class="synStatement">print</span> <span class="synConstant">&#34;Height: </span><span class="synIdentifier">$height</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;Weight: </span><span class="synIdentifier">$weight</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;Age: </span><span class="synIdentifier">$age</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
}
</pre>
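<p>
Both programs above take the view of "extracting runs of digits separated by commas." If we instead take the view of "extracting strings separated by commas" (not necessarily digits), the program becomes the following.
</p>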
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;168, 57, 37&#34;</span>;
<span class="synStatement">my</span> (<span class="synIdentifier">$height</span>, <span class="synIdentifier">$weight</span>, <span class="synIdentifier">$age</span>) = <span class="synStatement">split</span>(<span class="synStatement">/</span><span class="synConstant">,</span><span class="synSpecial">\s*</span><span class="synStatement">/</span>, <span class="synIdentifier">$str</span>);
<span class="synStatement">print</span> <span class="synConstant">&#34;Height: </span><span class="synIdentifier">$height</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;Weight: </span><span class="synIdentifier">$weight</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;Age: </span><span class="synIdentifier">$age</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
</pre>
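<p>
Incidentally, "$&amp;" always holds the entire match, so it equals $1 only when the first pair of parentheses encloses the whole pattern. The variables $1, $2, $3, &#8230;, $n are numbered in the order in which their opening parentheses appear, counting from the left.
</p>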
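<p>
As a rough illustration of the numbering rule, the sketch below uses nested parentheses and a trailing "\." outside every group, so that "$&amp;" and $1 end up different. The sample string and the variable names are made up for this example.
</p>

```perl
use strict;
use warnings;

# Groups are numbered by the position of their opening parenthesis.
my $str = "The price is 300yen.";
my ($whole, $outer, $digits, $unit);

if ($str =~ /((\d+)([A-Za-z]+))\./) {
    # Copy the special variables right away;
    # the next successful match overwrites them.
    ($whole, $outer, $digits, $unit) = ($&, $1, $2, $3);
}

print "whole match: $whole\n";   # 300yen.
print "\$1: $outer\n";           # 300yen
print "\$2: $digits\n";          # 300
print "\$3: $unit\n";            # yen
```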
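<h4>
The \1 notation
</h4>
<p>
Inside a regular expression, \1 refers to the text matched by the first pair of parentheses, so it matches a repeat of that same text. For example, the pattern "/[bcdfghjklmnpqrstvwxyz]([aeiou])\1/" matches a consonant followed by the same vowel twice, as in words such as
</p>
<ul>
<li>
see
</li>
<li>
too
</li>
</ul>
<p>
The "()" is required, because without it \1 has nothing to refer to.
</p>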
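<p>
A small sketch of the same pattern applied to a word list. The words other than "see" and "too" are added here only to show cases that do and do not match.
</p>

```perl
use strict;
use warnings;

# \1 must match the same text that the first group captured,
# so only a doubled vowel after a consonant qualifies.
my @words   = qw(see too sea tie book);
my @doubled = grep { /[bcdfghjklmnpqrstvwxyz]([aeiou])\1/ } @words;

print "@doubled\n";   # see too book
```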
<h4>
The variable $_ and pattern matching
</h4>
<p>
If you write just "/^From:/" without "=~", the pattern is matched against the default variable $_.
</p>
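<p>
One place this tends to show up is a loop that leaves its loop variable implicit. The sketch below assumes a few hypothetical mail header lines and collects the ones that start with "From:".
</p>

```perl
use strict;
use warnings;

# Hypothetical mail header lines, used only for illustration.
my @lines = (
    'From: alice@example.com',
    'To: bob@example.com',
    'From: carol@example.com',
);

my @from;
for (@lines) {                     # each element is aliased to $_ in turn
    push @from, $_ if /^From:/;    # same as: $_ =~ /^From:/
}

print "$_\n" for @from;
```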
<h4>
Simple substitution
</h4>
<p>
The basic form is
</p>
<pre class="syntax-highlight">
<span class="synStatement">s/</span><span class="synConstant">PATTERN</span><span class="synStatement">/</span><span class="synConstant">REPLACEMENT</span><span class="synStatement">/</span>
</pre>
<p>
For example:
</p>
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synConstant">what</span><span class="synStatement">/</span><span class="synConstant">who</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>
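<p>
This replaces only the first match, even if the pattern matches in several places. To replace every match, add the /g modifier and write "s/PATTERN/REPLACEMENT/g".
</p>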
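<p>
The difference is easy to see on a string that contains the pattern twice. This is a minimal sketch with a made-up sample string; it also relies on the fact that s/// returns the number of substitutions it made.
</p>

```perl
use strict;
use warnings;

my $once = "what is what";
my $all  = $once;                      # work on a copy so both results can be compared

$once =~ s/what/who/;                  # replaces only the first match
my $count = ($all =~ s/what/who/g);    # replaces every match and returns how many

print "$once\n";    # who is what
print "$all\n";     # who is who
print "$count\n";   # 2
```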
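<h4>
Using the special variable $&amp; in a substitution
</h4>
<p>
As before, "$&amp;" refers to the matched text, and it can also be used in the replacement part.
</p>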
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synConstant">what</span><span class="synStatement">/</span><span class="synSpecial">\*</span><span class="synIdentifier">$&#38;</span><span class="synSpecial">\*</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>
<p>
This wraps the matched text in "*" characters.
</p>
<h4>
Deleting the matched text
</h4>
<p>
Leave the replacement part empty and write "s/PATTERN//".
</p>
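<p>
A short sketch of deletion, using made-up sample strings. Without /g only the first match is removed; with /g every match is removed.
</p>

```perl
use strict;
use warnings;

my $first = "The price is 300yen.";
$first =~ s/\d+//;          # delete the first run of digits only

my $every = "a1b22c333";
$every =~ s/\d+//g;         # delete every run of digits

print "$first\n";   # The price is yen.
print "$every\n";   # abc
```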
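<h4>
Using the special variables $1, $2, $3, &#8230;
</h4>
<p>
Just as with matching, the text matched by each parenthesized part of the pattern is available in the special variables $1, $2, $3, &#8230;, numbered in the order of the opening parentheses, and these can be used in the replacement part.
</p>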
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;The price is 300yen. The distance is 120km.</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synSpecial">((\d+)([A-Za-z]+))</span><span class="synStatement">/</span><span class="synIdentifier">$2</span><span class="synConstant">&#60;</span><span class="synIdentifier">$3</span><span class="synConstant">&#62;</span><span class="synStatement">/g</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>
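<h4>
The /e modifier: evaluating an expression
</h4>
<p>
With the "/e" modifier, the replacement part is evaluated as a Perl expression, and the result is used as the replacement text.
</p>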
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synSpecial">\w+</span><span class="synStatement">/</span><span class="synConstant">ucfirst(</span><span class="synIdentifier">$&#38;</span><span class="synConstant">)</span><span class="synStatement">/eg</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>
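<h4>
The metacharacter \b
</h4>
<p>
In a regular expression, the metacharacter \b matches at a word boundary, that is, a position between a word character (\w) and either a non-word character or the start or end of the string. It matches a position and consumes no characters.
</p>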
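<p>
The effect is clearest when the word being searched for also occurs inside a longer word. This is a minimal sketch with a made-up sample string; the variable names are hypothetical.
</p>

```perl
use strict;
use warnings;

my $str = "cat concatenate cat.";

# Without \b, the "cat" inside "concatenate" is replaced too.
(my $plain = $str) =~ s/cat/dog/g;

# With \b on both sides, only the standalone word is replaced.
(my $bounded = $str) =~ s/\bcat\b/dog/g;

print "$plain\n";     # dog condogenate dog.
print "$bounded\n";   # dog concatenate dog.
```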
<h4>
An example of replacement with tr///
</h4>
<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">&#34;How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">A-Za-z</span><span class="synStatement">/</span><span class="synConstant">a-zA-Z</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>
<h4>
How to use tr///
</h4>
<p>
The "tr///" operator used above replaces or deletes individual characters. Its general form is
</p>
<pre class="syntax-highlight">
<span class="synStatement">tr/</span><span class="synConstant">SEARCHLIST</span><span class="synStatement">/</span><span class="synConstant">REPLACEMENTLIST</span><span class="synStatement">/</span>OPTIONS
</pre>
<p>
The return value of tr/// is the number of characters it replaced (or deleted).
</p>
<h4>
tr/// idiom (1): counting uppercase letters
</h4>
<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">'This is Perl.'</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$count</span> = (<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">A-Z</span><span class="synStatement">/</span><span class="synConstant">A-Z</span><span class="synStatement">/</span>);
<span class="synStatement">print</span> <span class="synIdentifier">$count</span>, <span class="synConstant">&#34;</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
</pre>
<h4>
tr/// idiom (2): copying $name into $uname and uppercasing only $uname
</h4>
<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$name</span> = <span class="synConstant">'Hiroshi'</span>;
(<span class="synStatement">my</span> <span class="synIdentifier">$uname</span> = <span class="synIdentifier">$name</span>) =~ <span class="synStatement">tr/</span><span class="synConstant">a-z</span><span class="synStatement">/</span><span class="synConstant">A-Z</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;</span><span class="synSpecial">\$</span><span class="synConstant">name = </span><span class="synIdentifier">$name</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;</span><span class="synSpecial">\$</span><span class="synConstant">uname = </span><span class="synIdentifier">$uname</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
</pre>
<h4>
tr/// idiom (3): deleting lowercase letters
</h4>
<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">'Yahoo!JAPAN &#38; Yahoo! (USA)'</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">a-z</span><span class="synStatement">//d</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;|</span><span class="synIdentifier">$str</span><span class="synConstant">|</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
</pre>
<h4>
tr/// idiom (4): replacing each run of non-alphanumeric characters with a single space
</h4>
<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">'Yahoo!JAPAN &#38; Yahoo! (USA)'</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">A-Za-z0-9</span><span class="synStatement">/</span><span class="synConstant"> </span><span class="synStatement">/cs</span>;
<span class="synStatement">print</span> <span class="synConstant">&#34;|</span><span class="synIdentifier">$str</span><span class="synConstant">|</span><span class="synSpecial">\n</span><span class="synConstant">&#34;</span>;
</pre>
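<p>
To see what each option in idiom (4) contributes, the sketch below runs the same transliteration once with /c alone and once with /cs. The /c option complements the search list (everything except A-Za-z0-9), and /s squeezes each run of identical replaced characters into one. The variable names are made up for this comparison.
</p>

```perl
use strict;
use warnings;

my $src = 'Yahoo!JAPAN & Yahoo! (USA)';

# /c only: every non-alphanumeric character becomes its own space.
(my $c_only = $src) =~ tr/A-Za-z0-9/ /c;

# /cs: each run of non-alphanumeric characters becomes a single space.
(my $cs = $src) =~ tr/A-Za-z0-9/ /cs;

print "|$c_only|\n";   # |Yahoo JAPAN   Yahoo   USA |
print "|$cs|\n";       # |Yahoo JAPAN Yahoo USA |
```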
</div>