Need help properly formatting XML differences in a dictionary using Python 3.4.4
I need help formatting the output of my XML comparison script.
test1.xml
<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <product description="Cardigan Sweater" product_image="cardigan.jpg">
    <catalog_item gender="Men's">
      <item_number>QWZ5671</item_number>
      <cool_number>QWZ5671</cool_number>
      <price>39.5</price>
      <size description="Medium">
        <color_swatch image="red_cardigan.jpg">Red</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
      </size>
    </catalog_item>
    <catalog_item gender="Women's">
      <item_number>RRX986</item_number>
      <price>42.50</price>
      <size description="Small">
        <color_swatch image="red_cardigan.jpg">Red</color_swatch>
        <color_swatch image="navy_cardigan.jpg">Nay</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burundy</color_swatch>
      </size>
    </catalog_item>
  </product>
</catalog>
test2.xml
<?xml version="1.0"?>
<?xml-stylesheet href="catalog.xsl" type="text/xsl"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd">
<catalog>
  <product description="Cardigan Sweater" product_image="cardigan.jpg">
    <catalog_item gender="Men's">
      <item_number>QWZ5671</item_number>
      <cool_number>QWZ5671</cool_number>
      <price>39.5</price>
      <size description="Medium">
        <color_swatch image="red_cardigan.jpg">pink</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch>
      </size>
    </catalog_item>
    <catalog_item gender="Women's">
      <item_number>peac</item_number>
      <price>42.50</price>
      <size description="Small">
        <color_swatch image="red_cardigan.jpg">lost</color_swatch>
        <color_swatch image="navy_cardigan.jpg">pet</color_swatch>
        <color_swatch image="burgundy_cardigan.jpg">hey</color_swatch>
      </size>
    </catalog_item>
  </product>
</catalog>
Current output, with no filenames and the differences jumbled together:
{'QWZ5671': [{'color_swatch': ['Red', 'pink']}], 'RRX986': [{'item_number': ['RRX986', 'peac']}, {'color_swatch': ['hey', 'pet', 'Burundy', 'Nay', 'lost', 'Red']}]}
Expected output, with each difference paired up and labelled with its filename (roughly this shape):
{'QWZ5671': [{'color_swatch': ['test1.xml': 'Red', 'test2.xml': 'pink']}], 'RRX986': [{'item_number': ['test1.xml': 'RRX986', 'test2.xml': 'peac']}, {'color_swatch': ['test1.xml': 'Burundy', 'test2.xml': 'hey'], ['test1.xml': 'Nay', 'test2.xml': 'pet'], ['test1.xml': 'Red', 'test2.xml': 'lost']}]}
What I have tried:
from lxml import etree
from collections import defaultdict
import pprintpp
from pprintpp import ppprint as pp

root_1 = etree.parse('test1.xml').getroot()
root_2 = etree.parse('test2.xml').getroot()

d1, d2 = [], []

for node in root_1.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        if x.attrib:
            item[x.attrib.keys()[0]].append(x.attrib.values()[0])
        if x.text.strip():
            item[x.tag].append(x.text.strip())
    d1.append(dict(item))

for node in root_2.findall('.//catalog_item'):
    item = defaultdict(list)
    for x in node.iter():
        if x.attrib:
            item[x.attrib.keys()[0]].append(x.attrib.values()[0])
        if x.text.strip():
            item[x.tag].append(x.text.strip())
    d2.append(dict(item))

d1 = sorted(d1, key=lambda x: x['item_number'])
d2 = sorted(d2, key=lambda x: x['item_number'])

res_dict = defaultdict(list)
for x, y in zip(d1, d2):
    for key1, key2 in zip(x.keys(), y.keys()):
        if key1 == key2 and sorted(x[key1]) != sorted(y[key2]):
            res_dict[x['item_number'][0]].append({key1: list(set(x[key1]) ^ set(y[key2]))})

if res_dict == {}:
    print('Data is same in both XML files')
else:
    pp(dict(res_dict))
Richard MacCutchan
You need to add the source file names to each item in the lists.
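One way to follow this suggestion might look like the sketch below. It assumes the values for a given tag have already been collected into two plain lists, in the same order, one per file. The helper name `label_differences` and its parameters are hypothetical and are not part of the code in the question.

```python
from itertools import zip_longest


def label_differences(values_1, values_2, name_1="test1.xml", name_2="test2.xml"):
    """Pair values by position and keep only the pairs that differ.

    Hypothetical helper: each differing pair becomes a small dict keyed
    by the file the value came from. If one list is shorter, the missing
    side is reported as None.
    """
    return [
        {name_1: a, name_2: b}
        for a, b in zip_longest(values_1, values_2)
        if a != b
    ]


# Values copied from the "Women's" catalog_item in the two sample files.
swatches_1 = ["Red", "Nay", "Burundy"]
swatches_2 = ["lost", "pet", "hey"]

labelled = label_differences(swatches_1, swatches_2)
print(labelled)
```

Using a dict per pair keeps the result valid Python, unlike the `['test1.xml': 'Red', ...]` shape in the expected output, while carrying the same information.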
Member 14867652
Done. Any suggestions for sorting out the jumbled values in the current output? This is how the values from the XML files come out:
{'color_swatch': ['hey', 'pet', 'Burundy', 'Nay', 'lost', 'Red']}]}
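The jumbled order most likely comes from `set(x[key1]) ^ set(y[key2])`: a set has no defined order, so the symmetric difference mixes the values from both files arbitrarily. A possible fix is to compare the values position by position in document order instead. The sketch below is only an illustration under a few assumptions: it uses the standard-library `xml.etree.ElementTree` rather than `lxml`, it parses inline strings (`XML_1`, `XML_2`) instead of the real files, and the helper names `text_by_tag` and `ordered_diff` are made up for this example. It also ignores attributes, tags that appear only in the second file, and any extra values when the lists differ in length.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Inline copies of the "Women's" catalog_item from the two sample files.
XML_1 = """<catalog_item gender="Women's">
  <item_number>RRX986</item_number>
  <price>42.50</price>
  <size description="Small">
    <color_swatch image="red_cardigan.jpg">Red</color_swatch>
    <color_swatch image="navy_cardigan.jpg">Nay</color_swatch>
    <color_swatch image="burgundy_cardigan.jpg">Burundy</color_swatch>
  </size>
</catalog_item>"""

XML_2 = """<catalog_item gender="Women's">
  <item_number>peac</item_number>
  <price>42.50</price>
  <size description="Small">
    <color_swatch image="red_cardigan.jpg">lost</color_swatch>
    <color_swatch image="navy_cardigan.jpg">pet</color_swatch>
    <color_swatch image="burgundy_cardigan.jpg">hey</color_swatch>
  </size>
</catalog_item>"""


def text_by_tag(item):
    """Collect the stripped text of every element, grouped by tag,
    keeping document order (OrderedDict, so it also works on 3.4)."""
    values = OrderedDict()
    for element in item.iter():
        text = (element.text or "").strip()  # element.text may be None
        if text:
            values.setdefault(element.tag, []).append(text)
    return values


def ordered_diff(item_1, item_2, name_1, name_2):
    """Compare two items tag by tag and position by position,
    labelling each differing pair with its source file name."""
    values_1, values_2 = text_by_tag(item_1), text_by_tag(item_2)
    diff = OrderedDict()
    for tag, texts_1 in values_1.items():
        texts_2 = values_2.get(tag, [])
        pairs = [
            {name_1: a, name_2: b}
            for a, b in zip(texts_1, texts_2)  # zip stops at the shorter list
            if a != b
        ]
        if pairs:
            diff[tag] = pairs
    return diff


result = ordered_diff(
    ET.fromstring(XML_1), ET.fromstring(XML_2), "test1.xml", "test2.xml"
)
print(dict(result))
```

For this item the result holds one labelled pair for `item_number` and three for `color_swatch` (Red/lost, Nay/pet, Burundy/hey), in the same order as they appear in the files. Matching items between the files by position rather than by sorted `item_number` is a design choice of this sketch, since the item number itself can differ, as it does here.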