---
title: 『新版Perl言語プログラミングレッスン入門編』 Chapter 8
author: kazu634
date: 2008-02-25
wordtwit_post_info:
- 'O:8:"stdClass":13:{s:6:"manual";b:0;s:11:"tweet_times";i:1;s:5:"delay";i:0;s:7:"enabled";i:1;s:10:"separation";s:2:"60";s:7:"version";s:3:"3.7";s:14:"tweet_template";b:0;s:6:"status";i:2;s:6:"result";a:0:{}s:13:"tweet_counter";i:2;s:13:"tweet_log_ids";a:1:{i:0;i:3771;}s:9:"hash_tags";a:0:{}s:8:"accounts";a:1:{i:0;s:7:"kazu634";}}'
categories:
- Perl
- Programming
---
<div class="section">
<p>
The section this time is called "More Regular Expressions".
</p>

<p>
<a name="seemore"></a>
</p>

<h4>
Simple matching
</h4>

<p>
The "=~" operator binds a string to a pattern, so you can test the string, pull pieces out of it, or rewrite it.
</p>

<pre class="syntax-highlight">
<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">\d+</span><span class="synStatement">/</span>
</pre>

<p>
This evaluates to true if $str contains a run of digits, and to false if it does not.
</p>
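
To see the true/false result in action, here is a minimal sketch. The helper name has_digits and the two sample strings are made up for this note; they are not from the book.

```perl
use strict;
use warnings;

# has_digits is a hypothetical helper name used only for this sketch.
sub has_digits {
    my ($str) = @_;
    # In boolean context, =~ is true when the pattern matches anywhere in $str.
    return $str =~ /\d+/ ? 1 : 0;
}

for my $s ("The price is 300yen.", "No numbers here.") {
    print "$s => ", (has_digits($s) ? "match" : "no match"), "\n";
}
```

The first string prints "match" and the second prints "no match".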

<h4>
Pulling out the matched part!
</h4>

<p>
The special variable "$&" holds the part of the string that the pattern matched.
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"The price is 300yen."</span>;
<span class="synStatement">if</span> (<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">\d+</span><span class="synStatement">/</span>) {
    <span class="synStatement">print</span> <span class="synConstant">"</span><span class="synIdentifier">$&</span><span class="synConstant"> is the number</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
}
</pre>

<p>
In this case "$&" ends up holding "300".
</p>
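
One thing worth keeping in mind: "$&" is only meaningful after a successful match. The sketch below wraps that check in a helper; first_number is a hypothetical name of my own, not something from the book.

```perl
use strict;
use warnings;

# first_number is a hypothetical helper: it returns the first run of digits,
# or undef when the string has none.
sub first_number {
    my ($str) = @_;
    # Read $& only when the match succeeded; after a failed match it may
    # still hold text left over from an earlier successful match.
    return $str =~ /\d+/ ? $& : undef;
}

my $n = first_number("The price is 300yen.");
print defined $n ? "$n\n" : "no number\n";
```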

<h4>
Pulling out several matched parts!
</h4>

<p>
The special variable "$&" is handy for pulling out the matched text, but it only helps when there is a single piece you want. When you want several pieces out of one string, use $1, $2, $3, ..., $n. Each of them is set automatically to whatever matched inside the corresponding pair of () in the regular expression.
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"168, 57, 37"</span>;
<span class="synStatement">if</span> (<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synStatement">/</span>) {
    <span class="synStatement">my</span> <span class="synIdentifier">$height</span> = <span class="synIdentifier">$1</span>;
    <span class="synStatement">my</span> <span class="synIdentifier">$weight</span> = <span class="synIdentifier">$2</span>;
    <span class="synStatement">my</span> <span class="synIdentifier">$age</span> = <span class="synIdentifier">$3</span>;
    <span class="synStatement">print</span> <span class="synConstant">"Height: </span><span class="synIdentifier">$height</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
    <span class="synStatement">print</span> <span class="synConstant">"Weight: </span><span class="synIdentifier">$weight</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
    <span class="synStatement">print</span> <span class="synConstant">"Age: </span><span class="synIdentifier">$age</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
}
</pre>

<p>
Another way to write the same thing is to assign all three at once:
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"168, 57, 37"</span>;
<span class="synStatement">if</span> (<span class="synIdentifier">$str</span> =~<span class="synStatement"> /</span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synConstant">, </span><span class="synSpecial">(\d+)</span><span class="synStatement">/</span>) {
    <span class="synStatement">my</span> (<span class="synIdentifier">$height</span>, <span class="synIdentifier">$weight</span>, <span class="synIdentifier">$age</span>) = (<span class="synIdentifier">$1</span>, <span class="synIdentifier">$2</span>, <span class="synIdentifier">$3</span>);
    <span class="synStatement">print</span> <span class="synConstant">"Height: </span><span class="synIdentifier">$height</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
    <span class="synStatement">print</span> <span class="synConstant">"Weight: </span><span class="synIdentifier">$weight</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
    <span class="synStatement">print</span> <span class="synConstant">"Age: </span><span class="synIdentifier">$age</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
}
</pre>

<p>
Both programs above take the view of "pull out the runs of digits separated by commas". If you instead think of it as "pull out the strings separated by commas" (not necessarily digits), it looks like the following. The split pattern also eats the spaces after each comma, so the fields come back without leading spaces.
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"168, 57, 37"</span>;
<span class="synStatement">my</span> (<span class="synIdentifier">$height</span>, <span class="synIdentifier">$weight</span>, <span class="synIdentifier">$age</span>) = <span class="synStatement">split</span>(<span class="synStatement">/</span><span class="synConstant">,</span><span class="synSpecial">\s*</span><span class="synStatement">/</span>, <span class="synIdentifier">$str</span>);
<span class="synStatement">print</span> <span class="synConstant">"Height: </span><span class="synIdentifier">$height</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synStatement">print</span> <span class="synConstant">"Weight: </span><span class="synIdentifier">$weight</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synStatement">print</span> <span class="synConstant">"Age: </span><span class="synIdentifier">$age</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
</pre>
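
<p>
By the way, "$&" is always the whole match, so $1 equals "$&" only when the entire pattern is wrapped in one pair of parentheses. Also, $1, $2, $3, ..., $n are assigned in the order of their opening parentheses.
</p>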
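
The numbering is easiest to see with nested parentheses. Here is a small sketch; date_parts is a hypothetical helper name and the date string is just sample data.

```perl
use strict;
use warnings;

# date_parts is a hypothetical helper; the input below is sample data.
sub date_parts {
    my ($str) = @_;
    # Opening parentheses, counted left to right:
    # 1st wraps the whole date, 2nd the year, 3rd the month, 4th the day.
    return $str =~ /((\d+)-(\d+)-(\d+))/ ? ($&, $1, $2, $3, $4) : ();
}

my ($whole, $g1, $g2, $g3, $g4) = date_parts("Posted on 2008-02-25.");
print "\$& = $whole\n";  # 2008-02-25
print "\$1 = $g1\n";     # 2008-02-25 (same as $&, because group 1 wraps everything)
print "\$2 = $g2\n";     # 2008
print "\$3 = $g3\n";     # 02
print "\$4 = $g4\n";     # 25
```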

<h4>
The \1 notation
</h4>

<p>
Inside a regular expression, \1 stands for the same string that the first pair of () just matched. For example, "/[bcdfghjklmnpqrstvwxyz]([aeiou])\1/" (a consonant followed by the same vowel twice) matches words such as
</p>

<ul>
<li>
see
</li>
<li>
too
</li>
</ul>

<p>
and so on. The "()" is required, because \1 needs a group to refer back to.
</p>
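
A quick way to convince yourself is to run the pattern against a few words. In this sketch has_doubled_vowel is a hypothetical helper name, and "sea" and "toe" are extra test words I added for contrast.

```perl
use strict;
use warnings;

# has_doubled_vowel is a hypothetical helper name.
sub has_doubled_vowel {
    my ($word) = @_;
    # ([aeiou]) captures one vowel; \1 then requires that same vowel again.
    return $word =~ /[bcdfghjklmnpqrstvwxyz]([aeiou])\1/ ? 1 : 0;
}

for my $word (qw(see too sea toe)) {
    printf "%-3s %s\n", $word,
        has_doubled_vowel($word) ? "matches" : "does not match";
}
```

"see" and "too" match; "sea" and "toe" do not, because the second vowel differs from the captured one.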

<h4>
The variable $_ and pattern matching
</h4>

<p>
If you write just "/^From:/" with no "=~", the default variable $_ becomes the target of the pattern match.
</p>
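
This is most useful with constructs that set $_ for you, such as grep and for. A minimal sketch, with header lines invented for the example:

```perl
use strict;
use warnings;

# Sample header lines, made up for this sketch.
my @lines = (
    'From: alice@example.com',
    'To: bob@example.com',
    'Subject: From: is not at the start here',
);

# grep sets $_ to each element in turn, so the bare /^From:/ tests that element.
my @from = grep { /^From:/ } @lines;

# The statement-modifier form of for also sets $_.
print "$_\n" for @from;
```

Only the first line is printed, since the pattern is anchored to the start of the string.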

<h4>
Simple substitution
</h4>

<p>
The basic form is
</p>

<pre class="syntax-highlight">
<span class="synStatement">s/</span><span class="synConstant">pattern</span><span class="synStatement">/</span><span class="synConstant">replacement</span><span class="synStatement">/</span>
</pre>

<p>
For example:
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synConstant">what</span><span class="synStatement">/</span><span class="synConstant">who</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>

<p>
Even if the pattern matches in several places, this replaces only the first one. To replace all of them, write "s/pattern/replacement/g".
</p>
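
Here is the difference side by side, as a small sketch with a sample sentence of my own:

```perl
use strict;
use warnings;

my $first = "you are what you are";
my $all   = $first;

$first =~ s/you/we/;   # replaces only the first "you"
$all   =~ s/you/we/g;  # replaces every "you"

print "$first\n";  # we are what you are
print "$all\n";    # we are what we are
```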

<h4>
Using the special variable $& in a substitution
</h4>

<p>
As before, "$&" refers to the matched part, and it can be used in the replacement as well.
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synConstant">what</span><span class="synStatement">/</span><span class="synSpecial">\*</span><span class="synIdentifier">$&</span><span class="synSpecial">\*</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>

<p>
This one wraps the matched part in "*", so "what" becomes "*what*".
</p>

<h4>
Deleting the matched part
</h4>

<p>
Write "s/pattern//", leaving the replacement empty.
</p>
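
As a minimal sketch (the sample string is borrowed from the earlier price example, and the variable names are my own), this deletes every run of digits from a copy of the string:

```perl
use strict;
use warnings;

my $str = "The price is 300yen.";

# Copy $str into $no_digits, then delete every run of digits from the copy.
# The empty replacement is what turns the substitution into a deletion.
(my $no_digits = $str) =~ s/\d+//g;

print "$str\n";        # The price is 300yen.
print "$no_digits\n";  # The price is yen.
```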

<h4>
Using the special variables $1, $2, $3, ...
</h4>

<p>
Just as with matching, the strings matched by the patterns wrapped in () are available as the special variables $1, $2, $3, ..., numbered in order of their opening parentheses.
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"The price is 300yen. The distance is 120km.</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synSpecial">((\d+)([A-Za-z]+))</span><span class="synStatement">/</span><span class="synIdentifier">$2</span><span class="synConstant">&lt;</span><span class="synIdentifier">$3</span><span class="synConstant">&gt;</span><span class="synStatement">/g</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>
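
Captures are also handy for reordering pieces. The following sketch rewrites a day/month/year date; the date string is sample data, and s{...}{...} is just the standard alternative delimiter, used here so the slashes need no escaping.

```perl
use strict;
use warnings;

my $date = "25/02/2008";

# $1 = day, $2 = month, $3 = year; the replacement puts them back in a new order.
$date =~ s{(\d+)/(\d+)/(\d+)}{$3-$2-$1};

print "$date\n";  # 2008-02-25
```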

<h4>
The /e modifier: evaluating an expression
</h4>

<p>
With the "/e" modifier, the replacement part is evaluated (computed) as a Perl expression, and the result is used as the replacement text.
</p>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">s/</span><span class="synSpecial">\w+</span><span class="synStatement">/</span><span class="synConstant">ucfirst(</span><span class="synIdentifier">$&</span><span class="synConstant">)</span><span class="synStatement">/eg</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>
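
Since the replacement is ordinary Perl code under /e, arithmetic works too. A small sketch that doubles every number in a sample string:

```perl
use strict;
use warnings;

my $str = "The price is 300yen. The distance is 120km.";

# With /e the replacement "$1 * 2" is run as code for each match.
$str =~ s/(\d+)/$1 * 2/eg;

print "$str\n";  # The price is 600yen. The distance is 240km.
```

Without /e, the same replacement would insert the literal text "300 * 2" instead of computing it.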
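
<h4>
The metacharacter \b
</h4>

<p>
In a regular expression, the metacharacter \b matches at a word boundary: the position between a word character (\w) and a non-word character, or the start or end of the string next to a word character. It does not consume any characters itself.
</p>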
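
The effect is easiest to see by counting matches with and without \b. This is a sketch with a sample string of my own:

```perl
use strict;
use warnings;

my $str = "cat concatenate cat.";

my @anywhere   = $str =~ /cat/g;      # also finds the "cat" inside "concatenate"
my @whole_word = $str =~ /\bcat\b/g;  # only "cat" standing alone as a word

print scalar(@anywhere), "\n";    # 3
print scalar(@whole_word), "\n";  # 2
```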

<h4>
An example of substitution with tr///
</h4>

<pre class="syntax-highlight">
<span class="synStatement">use strict</span>;
<span class="synStatement">use warnings</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">"How I wonder what you are.</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">A-Za-z</span><span class="synStatement">/</span><span class="synConstant">a-zA-Z</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synIdentifier">$str</span>;
</pre>

<h4>
How to use tr///
</h4>

<p>
The "tr///" used here replaces or deletes individual characters; it works on character lists, not regular expressions. Its general form is
</p>

<pre class="syntax-highlight">
<span class="synStatement">tr/</span><span class="synConstant">characters to replace</span><span class="synStatement">/</span><span class="synConstant">replacement characters</span><span class="synStatement">/</span>options
</pre>

<p>
The return value of tr/// is the number of characters it replaced (or deleted).
</p>
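
Because of that return value, tr/// doubles as a character counter. A minimal sketch, reusing the comma-separated sample string from earlier in this post:

```perl
use strict;
use warnings;

my $str = "168, 57, 37";

# With an empty replacement list and no /d option, tr leaves $str unchanged
# and simply returns how many characters from the first list it found.
my $commas = ($str =~ tr/,//);

print "commas: $commas, fields: ", $commas + 1, "\n";  # commas: 2, fields: 3
```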

<h4>
tr/// idiom (1): counting uppercase letters
</h4>

<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">'This is Perl.'</span>;
<span class="synStatement">my</span> <span class="synIdentifier">$count</span> = (<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">A-Z</span><span class="synStatement">/</span><span class="synConstant">A-Z</span><span class="synStatement">/</span>);
<span class="synStatement">print</span> <span class="synIdentifier">$count</span>, <span class="synConstant">"</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
</pre>

<h4>
tr/// idiom (2): copy $name into $uname and uppercase only $uname
</h4>

<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$name</span> = <span class="synConstant">'Hiroshi'</span>;
(<span class="synStatement">my</span> <span class="synIdentifier">$uname</span> = <span class="synIdentifier">$name</span>) =~ <span class="synStatement">tr/</span><span class="synConstant">a-z</span><span class="synStatement">/</span><span class="synConstant">A-Z</span><span class="synStatement">/</span>;
<span class="synStatement">print</span> <span class="synConstant">"</span><span class="synSpecial">\$</span><span class="synConstant">name = </span><span class="synIdentifier">$name</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
<span class="synStatement">print</span> <span class="synConstant">"</span><span class="synSpecial">\$</span><span class="synConstant">uname = </span><span class="synIdentifier">$uname</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
</pre>

<h4>
tr/// idiom (3): deleting lowercase letters
</h4>

<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">'Yahoo!JAPAN & Yahoo! (USA)'</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">a-z</span><span class="synStatement">//d</span>;
<span class="synStatement">print</span> <span class="synConstant">"|</span><span class="synIdentifier">$str</span><span class="synConstant">|</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
</pre>

<h4>
tr/// idiom (4): replacing each run of non-alphanumeric characters with a single space
</h4>

<pre class="syntax-highlight">
<span class="synStatement">my</span> <span class="synIdentifier">$str</span> = <span class="synConstant">'Yahoo!JAPAN & Yahoo! (USA)'</span>;
<span class="synIdentifier">$str</span> =~ <span class="synStatement">tr/</span><span class="synConstant">A-Za-z0-9</span><span class="synStatement">/</span><span class="synConstant"> </span><span class="synStatement">/cs</span>;
<span class="synStatement">print</span> <span class="synConstant">"|</span><span class="synIdentifier">$str</span><span class="synConstant">|</span><span class="synSpecial">\n</span><span class="synConstant">"</span>;
</pre>
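
To see what each option contributes in idiom (4), here is a sketch that applies /c alone and then /c together with /s to copies of the same string. The variable names $c_only and $c_and_s are my own.

```perl
use strict;
use warnings;

my $str = 'Yahoo!JAPAN & Yahoo! (USA)';

# /c alone: every character NOT in A-Za-z0-9 becomes a space, one for one.
(my $c_only = $str) =~ tr/A-Za-z0-9/ /c;

# /c plus /s: each run of those replaced characters is squeezed into one space.
(my $c_and_s = $str) =~ tr/A-Za-z0-9/ /cs;

print "|$c_only|\n";   # |Yahoo JAPAN   Yahoo   USA |
print "|$c_and_s|\n";  # |Yahoo JAPAN Yahoo USA |
```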
</div>