{"id":8056,"date":"2017-09-26T08:43:14","date_gmt":"2017-09-25T23:43:14","guid":{"rendered":"http:\/\/www.gesource.jp\/weblog\/?p=8056"},"modified":"2017-09-19T23:45:23","modified_gmt":"2017-09-19T14:45:23","slug":"python%e3%81%a7janome%e3%82%92%e4%bd%bf%e3%81%a3%e3%81%a6%e5%bd%a2%e6%85%8b%e7%b4%a0%e8%a7%a3%e6%9e%90","status":"publish","type":"post","link":"https:\/\/www.gesource.jp\/weblog\/?p=8056","title":{"rendered":"Morphological Analysis with Janome in Python"},"content":{"rendered":"<p><a href=\"http:\/\/mocobeta.github.io\/janome\/\">Janome<\/a> is a morphological analysis library written in Python.<br \/>\nIts appeal is that it is easy to install and works even in environments where MeCab cannot be installed.<\/p>\n<h2>Installing Janome<\/h2>\n<p>Install it with the following command.<\/p>\n<pre><code>pip install janome\n<\/code><\/pre>\n<h3>Using Janome<\/h3>\n<p>Import Tokenizer.<\/p>\n<pre><code>&gt;&gt;&gt; from janome.tokenizer import Tokenizer\n<\/code><\/pre>\n<p>Create a Tokenizer object.<\/p>\n<pre><code>&gt;&gt;&gt; tokenizer = Tokenizer()\n<\/code><\/pre>\n<p>Pass the string to analyze to the tokenize() method of the Tokenizer object.<br \/>\nThe return value is a list of Token objects.<\/p>\n<pre><code>&gt;&gt;&gt; for token in tokenizer.tokenize(\"\u3059\u3082\u3082\u3082\u3082\u3082\u3082\u3082\u3082\u306e\u3046\u3061\"):\n...     
print(token)\n...\n\u3059\u3082\u3082  \u540d\u8a5e,\u4e00\u822c,*,*,*,*,\u3059\u3082\u3082,\u30b9\u30e2\u30e2,\u30b9\u30e2\u30e2\n\u3082      \u52a9\u8a5e,\u4fc2\u52a9\u8a5e,*,*,*,*,\u3082,\u30e2,\u30e2\n\u3082\u3082    \u540d\u8a5e,\u4e00\u822c,*,*,*,*,\u3082\u3082,\u30e2\u30e2,\u30e2\u30e2\n\u3082      \u52a9\u8a5e,\u4fc2\u52a9\u8a5e,*,*,*,*,\u3082,\u30e2,\u30e2\n\u3082\u3082    \u540d\u8a5e,\u4e00\u822c,*,*,*,*,\u3082\u3082,\u30e2\u30e2,\u30e2\u30e2\n\u306e      \u52a9\u8a5e,\u9023\u4f53\u5316,*,*,*,*,\u306e,\u30ce,\u30ce\n\u3046\u3061    \u540d\u8a5e,\u975e\u81ea\u7acb,\u526f\u8a5e\u53ef\u80fd,*,*,*,\u3046\u3061,\u30a6\u30c1,\u30a6\u30c1\n<\/code><\/pre>\n<p>Passing True for the stream argument of the tokenize() method makes it return a generator instead of a list.<\/p>\n<p>Passing True for the wakati argument of the tokenize() method makes it return only the surface forms (surface).<\/p>\n<pre><code>&gt;&gt;&gt; for token in tokenizer.tokenize(\"\u3059\u3082\u3082\u3082\u3082\u3082\u3082\u3082\u3082\u306e\u3046\u3061\", wakati=True):\n...     
print(token)\n...\n\u3059\u3082\u3082\n\u3082\n\u3082\u3082\n\u3082\n\u3082\u3082\n\u306e\n\u3046\u3061\n<\/code><\/pre>\n<p>A Token object has the following instance variables.<\/p>\n<pre><code>surface (surface form)\npart_of_speech (part of speech)\ninfl_type (inflection type)\ninfl_form (inflection form)\nbase_form (base form)\nreading (reading)\nphonetic (pronunciation)\nnode_type\n<\/code><\/pre>\n<p>Example session<\/p>\n<pre><code>&gt;&gt;&gt; tokens = tokenizer.tokenize(\"\u543e\u8f29\u306f\u732b\u3067\u3042\u308b\")\n&gt;&gt;&gt; print(tokens[0])\n\u543e\u8f29    \u540d\u8a5e,\u4ee3\u540d\u8a5e,\u4e00\u822c,*,*,*,\u543e\u8f29,\u30ef\u30ac\u30cf\u30a4,\u30ef\u30ac\u30cf\u30a4\n&gt;&gt;&gt; tokens[0].surface\n'\u543e\u8f29'\n&gt;&gt;&gt; tokens[0].part_of_speech\n'\u540d\u8a5e,\u4ee3\u540d\u8a5e,\u4e00\u822c,*'\n&gt;&gt;&gt; tokens[0].infl_type\n'*'\n&gt;&gt;&gt; tokens[0].infl_form\n'*'\n&gt;&gt;&gt; tokens[0].base_form\n'\u543e\u8f29'\n&gt;&gt;&gt; tokens[0].reading\n'\u30ef\u30ac\u30cf\u30a4'\n&gt;&gt;&gt; tokens[0].phonetic\n'\u30ef\u30ac\u30cf\u30a4'\n<\/code><\/pre>\n<p>part&#95;of&#95;speech is a comma-separated string, so its individual values can be extracted as follows.<\/p>\n<pre><code>&gt;&gt;&gt; tokens[0].part_of_speech.split(',')[0]\n'\u540d\u8a5e'\n&gt;&gt;&gt; tokens[0].part_of_speech.split(',')[1]\n'\u4ee3\u540d\u8a5e'\n&gt;&gt;&gt; tokens[0].part_of_speech.split(',')[2]\n'\u4e00\u822c'\n&gt;&gt;&gt; tokens[0].part_of_speech.split(',')[3]\n'*'\n<\/code><\/pre>\n<h2>In Closing<\/h2>\n<p>Execution environment<\/p>\n<ul>\n<li>Linux Mint<\/li>\n<li>Python version 3.5.2<\/li>\n<li>Janome 
version 0.3.5<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Janome is a morphological analysis library written in Python. Its appeal is that it is easy to install and works even in environments where MeCab cannot be installed. Installing Janome Install it with the following command.  &#8230;<\/p>\n<p><a href=\"https:\/\/www.gesource.jp\/weblog\/?p=8056\" class=\"more-link\">Continue reading &lsquo;Morphological Analysis with Janome in Python&rsquo; &raquo;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[7],"tags":[],"class_list":["post-8056","post","type-post","status-publish","format-standard","hentry","category-python"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=\/wp\/v2\/posts\/8056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"repli
es":[{"embeddable":true,"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8056"}],"version-history":[{"count":0,"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=\/wp\/v2\/posts\/8056\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gesource.jp\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}