How to Read XML Files in Python: Coding Tips

2024-02-10

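Hi, this is Nori, the engineer who loves a challenge.

Do you ever need to work with XML files when writing Python programs?
XML files are plain text, but they can be a pain if you don't know how to use the library.

In this post, I'll explain how to read XML files in Python.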

Introduction

This article explains how to read an XML file in a Python program and retrieve values from it.

The steps shown here were run in my environment (Windows 11 23H2, Python 3.12).
The library used is part of the Python 3 standard library, so nothing extra needs to be installed.

Reference site
https://docs.python.org/ja/3/library/index.html

Give it a try!

Sample Program for Reading an XML File (Part 1)

First, here is the XML file to be read.

<?xml version="1.0" encoding="utf-8"?>
<root>
  <group1>
    <setting1>データ１</setting1>
    <setting2>データ２</setting2>
  </group1>
  <group2>
    <settings>データ３</settings>
    <settings>データ４</settings>
  </group2>
</root>

Next, here is the program that reads it.

import xml.etree.ElementTree as ET

xmlfilepath: str = 'sample01.xml'
tree = ET.parse(xmlfilepath)

# Get the root element
root = tree.getroot()
print('root.tag=' + root.tag)

# Get a child node (1)
child = root.find('group1')
print('child.tag=' + child.tag)

# Get a grandchild node (1)
grandchild = child.find('setting1')
print('grandchild.tag=' + grandchild.tag)
print('grandchild.text=' + grandchild.text)

# Get a child node (2)
# Note: no exception is raised if the tag is not found
# (find returns None, so child.tag is what fails)
child = root.find('group2')
print('child.tag=' + child.tag)

# Get the grandchild nodes (2)
grandchildren = child.findall('settings')
print('grandchildren=' + str(grandchildren))

# Get the text of a grandchild node
grandchildrentext = child.findtext('settings')
print('grandchildrentext=' + grandchildrentext)

And here is the output.

root.tag=root
child.tag=group1
grandchild.tag=setting1
grandchild.text=データ１
child.tag=group2
grandchildren=[<Element 'settings' at 0x0000029FB5B32250>, <Element 'settings' at 0x0000029FB5B322F0>]
grandchildrentext=データ３

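Now let's walk through sample program (Part 1).

A Typical Configuration-File XML

The first sample is the kind of XML often seen in configuration files.

Starting from the "root" tag, through the child node "group1" to the next node "setting1", each tag appears only once, so the path of tags is unique.
For this type of XML file, values are retrieved with the following steps.

Importing the library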

import xml.etree.ElementTree as ET

Import the Python 3 standard library module "xml.etree.ElementTree".
With it, you can read an XML file and retrieve values.

Getting the ElementTree

xmlfilepath: str = 'sample01.xml'
tree = ET.parse(xmlfilepath)

Specify the XML file name, including its path, to load the data. ET.parse returns an ElementTree object.
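If the file might be missing or malformed, it can help to catch the errors from ET.parse. The following is only a minimal sketch: "load_root" is a hypothetical helper name I made up for illustration, and "no_such_file.xml" is assumed not to exist.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def load_root(xmlfilepath: str) -> Optional[ET.Element]:
    """Return the root element, or None if the file cannot be read or parsed.

    load_root is a hypothetical helper for illustration only.
    """
    try:
        return ET.parse(xmlfilepath).getroot()
    except FileNotFoundError:
        # Raised when the path does not point to an existing file
        print('File not found: ' + xmlfilepath)
    except ET.ParseError as error:
        # Raised when the file content is not well-formed XML
        print('Invalid XML: ' + str(error))
    return None


# 'no_such_file.xml' is assumed not to exist
root = load_root('no_such_file.xml')
print(root)
```

Returning None here is just one possible design choice; re-raising the exception may suit other programs better.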

Getting the root

root = tree.getroot()

From the loaded tree, get the root element, which gives access to everything under "root".

Getting a child node (1)

child = root.find('group1')

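This code gets the child node "group1" from the root element.
Pass the tag name of the child node, "group1", as the argument to the "find" method.
Note that if the tag name is not found, no exception is raised; "find" simply returns None.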
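Since "find" returns None when nothing matches, checking the result before using it avoids an AttributeError. Here is a minimal sketch, assuming an inline XML string in place of sample01.xml and a hypothetical tag name "group9" that does not exist.

```python
import xml.etree.ElementTree as ET

# Inline XML used for illustration (stands in for sample01.xml)
xml_text = """<root>
  <group1>
    <setting1>データ１</setting1>
  </group1>
</root>"""

root = ET.fromstring(xml_text)

# 'group9' is a hypothetical tag that does not exist in the XML
missing = root.find('group9')
if missing is None:
    print('group9 was not found')
else:
    print('child.tag=' + missing.tag)

# An existing tag returns an Element as usual
found = root.find('group1')
if found is not None:
    print('child.tag=' + found.tag)
```

Comparing with "is None" is the safer check, because an element that has no children has historically evaluated as false in a plain "if element:" test.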

Getting a grandchild node (1)

grandchild = child.find('setting1')
print('grandchild.text=' + grandchild.text)

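This code gets a child of the child node (a grandchild node) and reads its data.
Pass the tag name of the grandchild node, "setting1", as the argument to the "find" method.
The value "データ１" is available from "text".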
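When the path of tags is unique, "find" and "findtext" also accept a path such as "group1/setting1", so the grandchild can be reached in one call. This is a minimal sketch, assuming an inline XML string in place of sample01.xml.

```python
import xml.etree.ElementTree as ET

# Inline XML used for illustration (stands in for sample01.xml)
xml_text = """<root>
  <group1>
    <setting1>データ１</setting1>
    <setting2>データ２</setting2>
  </group1>
</root>"""

root = ET.fromstring(xml_text)

# A path relative to root reaches the grandchild in a single call
grandchild = root.find('group1/setting1')
print('grandchild.text=' + grandchild.text)

# findtext with the same kind of path returns the text directly
value = root.findtext('group1/setting2')
print('value=' + value)
```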

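Next is a bonus section on other ways to use the find methods.

Bonus: Other Ways to Use the find Methods

There are other ways to use the find methods.

In the same XML file, the path from the "root" tag through the child node "group2" to the next node "settings" is not unique, because "settings" appears more than once.

Getting a child node (2)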

child = root.find('group2')
print('child.tag=' + child.tag)

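Get the child node "group2" from the root element.
Pass the tag name of the child node, "group2", as the argument to the "find" method.
Up to this point, it is the same as before.

Getting the grandchild nodes (2)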

grandchildren = child.findall('settings')
print('grandchildren=' + str(grandchildren))

Get the children of the child node (the grandchild nodes).
Pass the tag name of the grandchild nodes, "settings", as the argument to the "findall" method.
Because there are several "settings" tags, "findall" returns all of them as a list.
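The list returned by "findall" holds Element objects, so the values still have to be read from each one. Here is a minimal sketch, assuming an inline XML string in place of sample01.xml.

```python
import xml.etree.ElementTree as ET

# Inline XML used for illustration (stands in for sample01.xml)
xml_text = """<root>
  <group2>
    <settings>データ３</settings>
    <settings>データ４</settings>
  </group2>
</root>"""

root = ET.fromstring(xml_text)
child = root.find('group2')

# findall returns a list of Element objects; read .text from each one
texts = [element.text for element in child.findall('settings')]
print(texts)
```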

Getting the text of a grandchild node

grandchildrentext = child.findtext('settings')
print('grandchildrentext=' + grandchildrentext)

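Get the data of a grandchild node directly from the child node.
Pass the tag name of the grandchild node, "settings", as the argument to the "findtext" method.
Although there are several "settings" tags, "findtext" returns the data of the first one only.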
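"findtext" also takes a "default" argument, which is returned when the tag is not found. This is a minimal sketch, assuming an inline XML string in place of sample01.xml; "no_such_tag" and the fallback string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inline XML used for illustration (stands in for sample01.xml)
xml_text = """<root>
  <group2>
    <settings>データ３</settings>
    <settings>データ４</settings>
  </group2>
</root>"""

root = ET.fromstring(xml_text)
child = root.find('group2')

# Only the first matching element's text is returned
first = child.findtext('settings')
print('first=' + first)

# 'no_such_tag' is hypothetical; the default is returned instead of None
fallback = child.findtext('no_such_tag', default='(not set)')
print('fallback=' + fallback)
```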

Next, let's look at sample program (Part 2) and its walkthrough.

Sample Program for Reading an XML File (Part 2)

First, here is the XML file to be read.

<?xml version="1.0" encoding="utf-8"?>
<root_tag>
    <child_tag child_attrib= "A">
        <grandchild_tag grandchild_attrib="a1">データ１
        </grandchild_tag>
        <grandchild_tag grandchild_attrib="a2">データ２
        </grandchild_tag>
    </child_tag>
    <child_tag child_attrib= "B">
        <grandchild_tag grandchild_attrib="b1">データ３
        </grandchild_tag>
        <grandchild_tag grandchild_attrib="b2">データ４
        </grandchild_tag>
    </child_tag>
</root_tag>

Next, here is the program that reads it.

import xml.etree.ElementTree as ET

xmlfilepath: str = 'sample02.xml'

tree = ET.parse(xmlfilepath)

# Get the root element
root = tree.getroot()

print('root.tag=' + root.tag)
print('root.attrib=' + str(root.attrib))

# Loop over the child nodes
for child in root:
    print('child.tag=' + child.tag)
    print('child.attrib=' + str(child.attrib['child_attrib']))

    # Loop over the grandchild nodes
    for grandchild in child:
        print('grandchild.tag=' + grandchild.tag)
        print('grandchild.attrib=' + str(grandchild.attrib['grandchild_attrib']))
        print('grandchild.text=' + grandchild.text)

And here is the output.

root.tag=root_tag
root.attrib={}
child.tag=child_tag
child.attrib=A
grandchild.tag=grandchild_tag
grandchild.attrib=a1
grandchild.text=データ１

grandchild.tag=grandchild_tag
grandchild.attrib=a2
grandchild.text=データ２

child.tag=child_tag
child.attrib=B
grandchild.tag=grandchild_tag
grandchild.attrib=b1
grandchild.text=データ３

grandchild.tag=grandchild_tag
grandchild.attrib=b2
grandchild.text=データ４

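Now let's walk through sample program (Part 2).

Walkthrough of Sample Program (Part 2)

The second sample is the kind of XML often seen in data files.

Starting from the root tag "root_tag", the child nodes "child_tag" and the grandchild nodes "grandchild_tag" each appear more than once, so the path of tags is not unique. However, their attributes and values differ.

From importing the library to getting the root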

import xml.etree.ElementTree as ET

xmlfilepath: str = 'sample02.xml'

tree = ET.parse(xmlfilepath)

# Get the root element
root = tree.getroot()

Get the root element with the same steps as in sample program (Part 1).

Getting the child nodes

for child in root:
    print('child.tag=' + child.tag)
    print('child.attrib=' + str(child.attrib['child_attrib']))

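This code uses a for statement to get the child nodes one at a time.
Each child node is stored in "child".
In the sample, "tag" and "attrib" are used to get the tag name and the attribute information.

Getting the grandchild nodes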

for grandchild in child:
    print('grandchild.tag=' + grandchild.tag)
    print('grandchild.attrib=' + str(grandchild.attrib['grandchild_attrib']))
    print('grandchild.text=' + grandchild.text)

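This code uses a for statement to get the grandchild nodes one at a time.
Each grandchild node is stored in "grandchild".

In the sample, "tag" and "attrib" are used to get the tag name and the attribute information.
In addition, "text" can be used to get the value. Because each closing tag in this XML sits on its own line, the text also contains the trailing line break and indentation, which is why blank lines appear in the output.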
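Indexing "attrib" with a missing key raises a KeyError, and "text" can carry surrounding whitespace. Here is a minimal sketch of a more forgiving read, assuming an inline XML string modeled on sample02.xml in which I deliberately left one attribute out; the "(none)" fallback is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inline XML modeled on sample02.xml; the second grandchild
# intentionally has no attribute (an assumption for this example)
xml_text = """<root_tag>
    <child_tag child_attrib="A">
        <grandchild_tag grandchild_attrib="a1">データ１
        </grandchild_tag>
        <grandchild_tag>データ２
        </grandchild_tag>
    </child_tag>
</root_tag>"""

root = ET.fromstring(xml_text)

results = []
# iter() walks the whole tree and yields every matching element
for grandchild in root.iter('grandchild_tag'):
    # get() returns the fallback instead of raising KeyError
    name = grandchild.get('grandchild_attrib', '(none)')
    # text may be None for an empty element, so fall back to ''
    text = (grandchild.text or '').strip()
    results.append((name, text))

print(results)
```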

Applying this kind of nested loop lets you process large amounts of data.
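As one way to apply this, the nested loop can collect the values into a dictionary keyed by attribute. This is only a sketch, assuming an inline XML string modeled on sample02.xml; the dictionary layout is my own choice for illustration.

```python
import xml.etree.ElementTree as ET

# Inline XML modeled on sample02.xml
xml_text = """<root_tag>
    <child_tag child_attrib="A">
        <grandchild_tag grandchild_attrib="a1">データ１
        </grandchild_tag>
        <grandchild_tag grandchild_attrib="a2">データ２
        </grandchild_tag>
    </child_tag>
    <child_tag child_attrib="B">
        <grandchild_tag grandchild_attrib="b1">データ３
        </grandchild_tag>
        <grandchild_tag grandchild_attrib="b2">データ４
        </grandchild_tag>
    </child_tag>
</root_tag>"""

root = ET.fromstring(xml_text)

# Outer key: child_attrib, inner key: grandchild_attrib, value: stripped text
data = {}
for child in root:
    group = {}
    for grandchild in child:
        key = grandchild.get('grandchild_attrib')
        group[key] = (grandchild.text or '').strip()
    data[child.get('child_attrib')] = group

print(data['B']['b1'])
```

Once the values are in a dictionary, later code can look them up by attribute without walking the tree again.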

Finally, let's wrap up.

Summary

In this post, I explained how to read XML files in Python.
I hope being able to read XML helps you build all kinds of processing.

That's all for today.
I hope to see you again.
