Python 3 XML parsing: this tutorial covers what XML is, parsing XML in Python with SAX (the make_parser, parse, and parseString methods), a worked Python XML parsing example, and parsing XML with xml.dom.
XML stands for eXtensible Markup Language. It is a subset of the Standard Generalized Markup Language (SGML) and is used to mark up electronic documents so that they have a structure. For an introduction to XML itself, see the XML Tutorial on this site.
XML is designed to transmit and store data.
XML is a set of rules for defining semantic tags. These tags divide a document into parts and identify each part.
It is also a meta-markup language: it defines the syntax used to define other domain-specific, semantic, structured markup languages.
The common XML programming interfaces are DOM and SAX. The two handle XML files in different ways, so each suits different situations.
Python can parse XML in three ways: SAX, DOM, and ElementTree.
SAX (Simple API for XML): the Python standard library includes a SAX parser. SAX uses an event-driven model: as it parses the XML, it triggers events one by one and calls user-defined callback functions to handle the document.
DOM (Document Object Model): the XML data is parsed into a tree in memory, and the XML is manipulated by operating on that tree.
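ElementTree is named above but not demonstrated in this tutorial. As a rough sketch of what the third approach looks like, the following uses xml.etree.ElementTree from the standard library on a shortened, inline copy of the sample data. The SAMPLE string and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, inline stand-in for movies.xml (an assumption for this sketch).
SAMPLE = """<collection shelf="New Arrivals">
  <movie title="Enemy Behind"><year>2003</year><stars>10</stars></movie>
  <movie title="Ishtar"><stars>2</stars></movie>
</collection>"""

# fromstring() parses the text and returns the root element.
root = ET.fromstring(SAMPLE)
shelf = root.get("shelf")

# findall() returns matching direct children; findtext() returns the text
# of the first matching child, or None when the child is missing.
movies = [
    (movie.get("title"), movie.findtext("year"), movie.findtext("stars"))
    for movie in root.findall("movie")
]

print(shelf)
for entry in movies:
    print(entry)
```

Like DOM, this approach loads the whole document into memory, so it suits documents that fit comfortably in RAM.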
This section uses the following sample XML document, movies.xml:
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>
SAX is an event-driven API.
Parsing an XML document with SAX involves two parts: the parser and the event handler.
The parser reads the XML document and sends events to the event handler, such as element-start and element-end events.
The event handler responds to those events and processes the XML data passed to it.
To parse XML with SAX in Python, first import the parse function from xml.sax and the ContentHandler class from xml.sax.handler.
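The split between parser and event handler can be sketched as follows. TagCounter is a hypothetical handler written for this illustration; it only counts the start and end events the parser sends it. The sample bytes are likewise an assumption, and xml.sax.parseString is used so the sketch needs no file on disk.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts start and end events per tag name."""

    def __init__(self):
        super().__init__()
        self.started = {}
        self.ended = {}

    def startElement(self, name, attrs):
        # The parser calls this once for every start tag it reads.
        self.started[name] = self.started.get(name, 0) + 1

    def endElement(self, name):
        # The parser calls this once for every end tag it reads.
        self.ended[name] = self.ended.get(name, 0) + 1


SAMPLE = b"<collection><movie><year>2003</year></movie><movie/></collection>"

counter = TagCounter()
# The parser does the reading; the handler only reacts to events.
xml.sax.parseString(SAMPLE, counter)
print(counter.started)
print(counter.ended)
```

The handler never touches the input itself, which is why the same handler class can be reused with a file, a stream, or an in-memory string.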
characters(content) method
When it is called:
From the start of a line until a tag is encountered, if there are characters, content holds that string.
From one tag until the next tag is encountered, if there are characters, content holds that string.
From a tag until a line terminator is encountered, if there are characters, content holds that string.
The tag can be either a start tag or an end tag.
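One consequence of these rules is that characters() also fires for the whitespace between tags, and the text of a single element may arrive in more than one call. A minimal sketch of a handler that copes with both, assuming a hypothetical TextCollector class and an inline sample:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: buffers character data until the element ends."""

    def __init__(self):
        super().__init__()
        self.chunks = []    # every raw characters() payload, in order
        self._buffer = []   # pieces belonging to the current element
        self.texts = {}     # element name -> joined, stripped text

    def startElement(self, name, attrs):
        self._buffer = []

    def characters(self, content):
        self.chunks.append(content)
        self._buffer.append(content)

    def endElement(self, name):
        # Join the pieces and drop surrounding whitespace before storing.
        text = "".join(self._buffer).strip()
        if text:
            self.texts[name] = text
        self._buffer = []


SAMPLE = b"<movie>\n  <type>War, Thriller</type>\n  <year>2003</year>\n</movie>"

collector = TextCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.chunks)
print(collector.texts)
```

Printing collector.chunks shows the whitespace-only payloads that a handler assigning content directly would have to filter out.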
startDocument() method
Called when the document starts.
endDocument() method
Called when the parser reaches the end of the document.
startElement(name, attrs) method
Called when an XML start tag is encountered. name is the name of the tag, and attrs is a dictionary-like object holding the tag's attribute values.
endElement(name) method
Called when an XML end tag is encountered.
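To make the order of these callbacks concrete, here is a small sketch that records each call as it happens. EventLogger and the one-line sample document are assumptions made for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler: records every callback in the order received."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(b'<movie title="Trigun"><stars>10</stars></movie>', logger)
for event in logger.events:
    print(event)
```

Start and end events nest in the same way the tags do, which is what lets a handler track where it is in the document.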
The following method creates a new parser object and returns it.
xml.sax.make_parser( [parser_list] )
Parameter description:
parser_list - optional; a list of names of parser modules to try before the default parsers
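The following method creates a SAX parser and parses an XML document:
xml.sax.parse( xmlfile, contenthandler[, errorhandler])
Parameter description:
xmlfile - the name of the XML file (an open file object is also accepted)
contenthandler - must be a ContentHandler object
errorhandler - optional; if given, must be a SAX ErrorHandler object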
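As a sketch of how xml.sax.parse relates to make_parser, the following parses the same data both ways. It uses an in-memory io.BytesIO stream in place of a file name so that it runs without movies.xml. TitleCollector and DATA are hypothetical names for this illustration.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records the title attribute of each movie."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "movie":
            self.titles.append(attrs.get("title"))


DATA = b'<collection><movie title="Enemy Behind"/><movie title="Ishtar"/></collection>'

# One-call form: xml.sax.parse builds the parser internally.
one_call = TitleCollector()
xml.sax.parse(io.BytesIO(DATA), one_call)

# Explicit form: create the reader, attach the handler, then parse.
explicit = TitleCollector()
reader = xml.sax.make_parser()
reader.setContentHandler(explicit)
reader.parse(io.BytesIO(DATA))

print(one_call.titles)
print(explicit.titles)
```

The explicit form is the one to reach for when the reader needs extra configuration, such as setFeature, before parsing starts.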
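The parseString method creates an XML parser and parses an XML string:
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Parameter description:
xmlstring - the XML string to parse
contenthandler - must be a ContentHandler object
errorhandler - optional; if given, must be a SAX ErrorHandler object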
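A short sketch of parseString follows, including what happens when no errorhandler is supplied and the input is malformed: the default handler raises xml.sax.SAXParseException. RatingCollector and both input strings are assumptions made for illustration.

```python
import xml.sax


class RatingCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of every <rating> element."""

    def __init__(self):
        super().__init__()
        self._in_rating = False
        self._parts = []
        self.ratings = []

    def startElement(self, name, attrs):
        if name == "rating":
            self._in_rating = True
            self._parts = []

    def characters(self, content):
        if self._in_rating:
            self._parts.append(content)

    def endElement(self, name):
        if name == "rating":
            self.ratings.append("".join(self._parts))
            self._in_rating = False


good = RatingCollector()
xml.sax.parseString(
    b"<collection><movie><rating>PG</rating></movie>"
    b"<movie><rating>R</rating></movie></collection>",
    good,
)
print(good.ratings)

# Malformed input: <movie> is never closed, so the default error handler raises.
try:
    xml.sax.parseString(b"<collection><movie></collection>", RatingCollector())
    error = None
except xml.sax.SAXParseException as exc:
    error = exc
print(type(error).__name__)
```

Passing a custom ErrorHandler as the third argument is the way to log or recover from such errors instead of letting the exception propagate.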
#!/usr/bin/python3

import xml.sax


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # Override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movies.xml")
Running the code above produces the following output:
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
For the complete SAX API documentation, see the Python SAX APIs.
The Document Object Model (DOM) is a standard programming interface recommended by the W3C for processing Extensible Markup Language documents.
When a DOM parser parses an XML document, it reads the entire document at once and stores all of its elements in a tree structure in memory. You can then use the functions DOM provides to read or modify the document's content and structure, and write the modified content back to an XML file.
Python uses xml.dom.minidom to parse XML documents, as in the following example:
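The read, modify, and write-back cycle described above can be sketched with xml.dom.minidom from the standard library. The inline document and the specific changes (a new stars value, a new shelf name, an extra movie) are assumptions chosen for illustration.

```python
from xml.dom.minidom import parseString

# Small inline document standing in for movies.xml (an assumption).
doc = parseString(
    '<collection shelf="New Arrivals">'
    '<movie title="Ishtar"><stars>2</stars></movie>'
    "</collection>"
)
root = doc.documentElement

# Read: the whole tree is in memory, so any node can be reached directly.
stars = root.getElementsByTagName("stars")[0]
original_stars = stars.firstChild.data

# Modify: change text, change an attribute, and append a new element.
stars.firstChild.data = "3"
root.setAttribute("shelf", "Classics")
new_movie = doc.createElement("movie")
new_movie.setAttribute("title", "Trigun")
root.appendChild(new_movie)

# Write back: serialize the modified tree to a string.
result = root.toxml()
print(result)
```

To save the result to disk, the same string can be written to a file opened in text mode.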
#!/usr/bin/python3

import xml.dom.minidom

# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")

# Print the details of each movie
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))

    # Names end in _node to avoid shadowing the built-ins type and format
    type_node = movie.getElementsByTagName('type')[0]
    print("Type: %s" % type_node.childNodes[0].data)
    format_node = movie.getElementsByTagName('format')[0]
    print("Format: %s" % format_node.childNodes[0].data)
    rating_node = movie.getElementsByTagName('rating')[0]
    print("Rating: %s" % rating_node.childNodes[0].data)
    description_node = movie.getElementsByTagName('description')[0]
    print("Description: %s" % description_node.childNodes[0].data)
Running the program above produces the following output:
Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
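The DOM example reads only elements that every movie in movies.xml has. Elements such as year and episodes are missing from some movies, and indexing [0] into an empty result would raise IndexError. One possible way to handle that, assuming a hypothetical child_text helper and a shortened inline sample:

```python
from xml.dom.minidom import parseString

# Shortened, inline stand-in for movies.xml (an assumption for this sketch).
SAMPLE = """<collection shelf="New Arrivals">
<movie title="Enemy Behind"><year>2003</year><stars>10</stars></movie>
<movie title="Trigun"><episodes>4</episodes><stars>10</stars></movie>
</collection>"""


def child_text(element, tag, default=None):
    """Return the text of the first <tag> descendant, or default if absent."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


collection = parseString(SAMPLE).documentElement
summary = {
    movie.getAttribute("title"): (
        child_text(movie, "year"),
        child_text(movie, "episodes"),
    )
    for movie in collection.getElementsByTagName("movie")
}

for title, (year, episodes) in summary.items():
    print(title, year, episodes)
```

Returning a default keeps the printing loop free of try/except blocks; whether None or a placeholder string is the better default depends on how the values are used afterwards.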
For the complete DOM API documentation, see the Python DOM APIs.