XML Parsing Returning List of Elements

I’m working on a project for the company I work at. They have a program that generates an XML file, and they would like to extract the values of specific tags and print them as formatted output. To accomplish this, I’ve turned to Python and am currently writing two programs.

The first program successfully formats the raw data in the XML file into its properly indented tree structure.

The second program is where I’m stuck. Using the minidom module, I have so far been able to print a single line of seven values, each obtained from a specific tag within the XML file.

The challenge is that I need one such line of results for every element I’m pulling data from throughout the document, not just the first one.

The entire XML document is far too large to post on this site, and contains sensitive data, so I’ll have to truncate and modify part of it so you can at least see the hierarchies.

<ws_Worker>
  <ws_Summary>
    <ws_Employee_ID>555555</ws_Employee_ID>
    <ws_Name>John Doe</ws_Name>
  </ws_Summary>
  <ws_Eligibility ws_PriorValue="false">true</ws_Eligibility>
  <ws_Personal>
    <ws_Name_Data>
      <ws_Name_Type>Legal</ws_Name_Type>
      <ws_First_Name>John</ws_First_Name>
      <ws_Last_Name>Doe</ws_Last_Name>
      <ws_Formatted_Name>John Doe</ws_Formatted_Name>
      <ws_Reporting_Name>Doe, John</ws_Reporting_Name>
    </ws_Name_Data>
    <ws_Address_Data>
      <ws_Address_Type>WORK</ws_Address_Type>
      <ws_Address_Is_Public>true</ws_Address_Is_Public>
      <ws_Is_Primary>true</ws_Is_Primary>
      <ws_Address_Line_Data ws_Label="Address Line 1" ws_Type="ADDRESS_LINE_1">123 Sixth St.</ws_Address_Line_Data>
      <ws_Municipality>Baltimore</ws_Municipality>
      <ws_Region>Maryland</ws_Region>
      <ws_Postal_Code>12345</ws_Postal_Code>
      <ws_Country>US</ws_Country>
    </ws_Address_Data>
    <ws_Email_Data>
      <ws_Email_Type>WORK</ws_Email_Type>
      <ws_Email_Is_Public>true</ws_Email_Is_Public>
      <ws_Is_Primary>true</ws_Is_Primary>
      <ws_Email_Address ws_PriorValue="doeball@icloud.com">jdoe@company.com</ws_Email_Address>
    </ws_Email_Data>
    <ws_Tobacco_Use>false</ws_Tobacco_Use>
  </ws_Personal>
  <ws_Status>
    <ws_Employee_Status>Active</ws_Employee_Status>
    <ws_Active>true</ws_Active>
    <ws_Active_Status_Date>2020-01-01</ws_Active_Status_Date>
    <ws_Hire_Date>2020-01-01</ws_Hire_Date>
    <ws_Original_Hire_Date>2015-01-01</ws_Original_Hire_Date>
    <ws_Hire_Reason>Hire_Employee_Rehire_Employee_After_13_Weeks</ws_Hire_Reason>
    <ws_Continuous_Service_Date>2020-01-01</ws_Continuous_Service_Date>
    <ws_First_Day_of_Work>2020-01-01</ws_First_Day_of_Work>
    <ws_Retirement_Eligibility_Date>2016-10-01</ws_Retirement_Eligibility_Date>
    <ws_Retired>false</ws_Retired>
    <ws_Seniority_Date>2015-10-01</ws_Seniority_Date>
    <ws_Terminated>false</ws_Terminated>
    <ws_Not_Eligible_for_Hire>false</ws_Not_Eligible_for_Hire>
    <ws_Regrettable_Termination>false</ws_Regrettable_Termination>
    <ws_Resignation_Date>2018-11-01</ws_Resignation_Date>
    <ws_Not_Returning>false</ws_Not_Returning>
    <ws_Return_Unknown>false</ws_Return_Unknown>
    <ws_Has_International_Assignment>false</ws_Has_International_Assignment>
    <ws_Home_Country>US</ws_Home_Country>
    <ws_Rehire>true</ws_Rehire>
  </ws_Status>
  <ws_Position>
    <ws_Operation>NONE</ws_Operation>
    <ws_Position_ID>12345</ws_Position_ID>
    <ws_Effective_Date>2020-01-10</ws_Effective_Date>
    <ws_Primary_Position>true</ws_Primary_Position>
    <ws_Position_Title>Driver</ws_Position_Title>
    <ws_Business_Title>Driver</ws_Business_Title>
    <ws_Worker_Type>Regular</ws_Worker_Type>
    <ws_Position_Time_Type>Part_time</ws_Position_Time_Type>
    <ws_Job_Exempt>false</ws_Job_Exempt>
    <ws_Scheduled_Weekly_Hours>29</ws_Scheduled_Weekly_Hours>
    <ws_Default_Weekly_Hours>40</ws_Default_Weekly_Hours>
    <ws_Full_Time_Equivalent_Percentage>72.5</ws_Full_Time_Equivalent_Percentage>
    <ws_Exclude_from_Headcount>false</ws_Exclude_from_Headcount>
    <ws_Pay_Rate_Type>Hourly</ws_Pay_Rate_Type>
    <ws_Workers_Compensation_Code>1234</ws_Workers_Compensation_Code>
    <ws_Job_Profile>DRIVER</ws_Job_Profile>
    <ws_Management_Level>Individual Contributor</ws_Management_Level>
    <ws_Job_Family>DRV</ws_Job_Family>
    <ws_Business_Site>LOC_TOWN</ws_Business_Site>
    <ws_Business_Site_Name>Local Town</ws_Business_Site_Name>
    <ws_Business_Site_Address_Line_Data ws_Label="Address Line 1" ws_Type="ADDRESS_LINE_1">1234 Sixth St.</ws_Business_Site_Address_Line_Data>
    <ws_Business_Site_Municipality>Baltimore</ws_Business_Site_Municipality>
    <ws_Business_Site_Region>Maryland</ws_Business_Site_Region>
    <ws_Business_Site_Postal_Code>12345</ws_Business_Site_Postal_Code>
    <ws_Business_Site_Country>US</ws_Business_Site_Country>
    <ws_Supervisor>
      <ws_Operation>NONE</ws_Operation>
      <ws_Supervisor_ID>1234567</ws_Supervisor_ID>
      <ws_Supervisor_Name>Little Mac</ws_Supervisor_Name>
    </ws_Supervisor>
  </ws_Position>
  <ws_Additional_Information>
    <ws_WD_Username>John.Doe</ws_WD_Username>
    <ws_Last_4_SSN_Digits>1234</ws_Last_4_SSN_Digits>
  </ws_Additional_Information>
</ws_Worker>

Keep in mind, there are 36 other <ws_Worker> elements throughout this file.

Here is my program so far:

from xml.dom import minidom

xmldoc = minidom.parse("//tocp-fs1/mydocs/mantonishak/Documents/Python/The_Hard_Way/Out.xml")

outworkers = xmldoc.getElementsByTagName("ws_Worker")[0]
# Knowing your hierarchy is important.  ws_Worker is at the top.  [0] asks for the first item in the list.
outsummaries = outworkers.getElementsByTagName("ws_Summary")
outpersonals = outworkers.getElementsByTagName("ws_Personal")
outpositions = outworkers.getElementsByTagName("ws_Position")
outadditionals = outworkers.getElementsByTagName("ws_Additional_Information")

for outpersonal in outpersonals:
    desc = outpersonal.getElementsByTagName("ws_Formatted_Name")[0].firstChild.data
    # displays the user's Full Name
    for outsummary in outsummaries:
        desc2 = outsummary.getElementsByTagName("ws_Employee_ID")[0].firstChild.data
        # displays the user's Workday ID
    for location in outpositions:
        desc3 = location.getElementsByTagName("ws_Business_Site_Name")[0].firstChild.data
        # displays the user's current work location (Store Name)
    for title in outpositions:
        desc4 = title.getElementsByTagName("ws_Position_Title")[0].firstChild.data
        # displays the user's current title
    for email in outpersonals:
        desc5 = email.getElementsByTagName("ws_Email_Address")[0].firstChild.data
        lst = desc5.split("@")
        atsign = (lst[1])
        # This splits the ws_Email_Address value at the @ sign, removes it, and displays the string
        # to the right of the @ sign (which is the domain)
    for firstletter in outpersonals:
        desc6 = firstletter.getElementsByTagName("ws_First_Name")[0].firstChild.data
        firstletter = desc6[0]
        # This grabs the first letter of the ws_First_Name value so it can be combined later with
        # the ws_Last_Name value to create the username
    for lastname in outpersonals:
        desc7 = lastname.getElementsByTagName("ws_Last_Name")[0].firstChild.data
        username = (firstletter + desc7)
        # grabs the last name and combines with the first letter of the first name
        # this creates the username
    for ssn in outadditionals:
        desc8 = ssn.getElementsByTagName("ws_Last_4_SSN_Digits")[0].firstChild.data
        firstpass = desc6[0:2]
        lastpass = desc7[-2:]
        password = (firstpass + desc8 + lastpass)
        # this takes the first two chars of the ws_First_Name adds them as a string with the
        # ws_Last_4_SSN_Digits and the last two chars of ws_Last_Name.
    print("Full Name: %s, Employee ID: %s, Location: %s, Title: %s, Domain: %s, Username: %s, Password: %s" %
            (desc, desc2, desc3, desc4, atsign, username.lower(), password.lower()))
            # Creates the output in a straight horizontal line.  The .lower() calls on
            # username and password convert all characters in those strings to lowercase.

And my output looks like this:

Full Name: John Doe, Employee ID: 555555, Location: Local Town, Title: Driver, Domain: company.com, Username: jdoe, Password: jo1234oe

So Line 5 is where I think the magic has to happen. The index [0] only pulls the child tags within the first <ws_Worker> element. If I change that index to [1], it pulls the second, [2] pulls the third, and so on.

How do I construct a loop that changes that integer and collectively prints the output of each <ws_Worker> element throughout the file?
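
One possible approach with minidom, as a minimal sketch: getElementsByTagName returns a list-like NodeList, so rather than indexing it with [0] you can iterate over it and look up each tag relative to the current worker. The root element name ws_Workers, the second worker (Jane Roe), and the helper text_of are made up for illustration; the real file's root is not shown in the question, and only two of the seven output fields are included here.

```python
from xml.dom import minidom

# Hypothetical two-worker sample standing in for Out.xml.
# The root name "ws_Workers" is an assumption, not taken from the real file.
SAMPLE = """<ws_Workers>
  <ws_Worker>
    <ws_Summary><ws_Employee_ID>555555</ws_Employee_ID></ws_Summary>
    <ws_Personal>
      <ws_Name_Data><ws_Formatted_Name>John Doe</ws_Formatted_Name></ws_Name_Data>
    </ws_Personal>
  </ws_Worker>
  <ws_Worker>
    <ws_Summary><ws_Employee_ID>666666</ws_Employee_ID></ws_Summary>
    <ws_Personal>
      <ws_Name_Data><ws_Formatted_Name>Jane Roe</ws_Formatted_Name></ws_Name_Data>
    </ws_Personal>
  </ws_Worker>
</ws_Workers>"""


def text_of(node, tag):
    """Return the text of the first <tag> below node, or "" if it is missing or empty."""
    matches = node.getElementsByTagName(tag)
    if len(matches) == 0 or matches[0].firstChild is None:
        return ""
    return matches[0].firstChild.data


def worker_rows(xmldoc):
    """Collect one (full name, employee ID) tuple per <ws_Worker> element."""
    rows = []
    # No [0] here: loop over every worker instead of picking the first one.
    for worker in xmldoc.getElementsByTagName("ws_Worker"):
        rows.append((text_of(worker, "ws_Formatted_Name"),
                     text_of(worker, "ws_Employee_ID")))
    return rows


rows = worker_rows(minidom.parseString(SAMPLE))
for name, emp_id in rows:
    print("Full Name: %s, Employee ID: %s" % (name, emp_id))
```

With the real file, minidom.parse(path) would take the place of minidom.parseString(SAMPLE), and the remaining fields could be looked up on each worker in the same way.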

You know, this data looks like it was generated by a web service (hence the ws prefix), and you might be able to load it automatically as Python classes. What kind of service is generating this, and what language is it written in?

Barring that, I would check out the xml.etree.ElementTree library (https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree) instead, since its elements can be iterated and searched directly, which removes much of the manual node handling that minidom requires.
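
As a rough sketch of what that could look like, the snippet below builds one dict per worker with ElementTree, following the username and password rules from the question. The root element name ws_Workers, the second worker (Jane Roe and her values), and the function name summarize are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-worker sample; the root name "ws_Workers" is an assumption.
SAMPLE = """<ws_Workers>
  <ws_Worker>
    <ws_Summary><ws_Employee_ID>555555</ws_Employee_ID></ws_Summary>
    <ws_Personal>
      <ws_Name_Data>
        <ws_First_Name>John</ws_First_Name>
        <ws_Last_Name>Doe</ws_Last_Name>
        <ws_Formatted_Name>John Doe</ws_Formatted_Name>
      </ws_Name_Data>
      <ws_Email_Data><ws_Email_Address>jdoe@company.com</ws_Email_Address></ws_Email_Data>
    </ws_Personal>
    <ws_Position>
      <ws_Position_Title>Driver</ws_Position_Title>
      <ws_Business_Site_Name>Local Town</ws_Business_Site_Name>
    </ws_Position>
    <ws_Additional_Information>
      <ws_Last_4_SSN_Digits>1234</ws_Last_4_SSN_Digits>
    </ws_Additional_Information>
  </ws_Worker>
  <ws_Worker>
    <ws_Summary><ws_Employee_ID>666666</ws_Employee_ID></ws_Summary>
    <ws_Personal>
      <ws_Name_Data>
        <ws_First_Name>Jane</ws_First_Name>
        <ws_Last_Name>Roe</ws_Last_Name>
        <ws_Formatted_Name>Jane Roe</ws_Formatted_Name>
      </ws_Name_Data>
      <ws_Email_Data><ws_Email_Address>jroe@company.com</ws_Email_Address></ws_Email_Data>
    </ws_Personal>
    <ws_Position>
      <ws_Position_Title>Cashier</ws_Position_Title>
      <ws_Business_Site_Name>Other Town</ws_Business_Site_Name>
    </ws_Position>
    <ws_Additional_Information>
      <ws_Last_4_SSN_Digits>5678</ws_Last_4_SSN_Digits>
    </ws_Additional_Information>
  </ws_Worker>
</ws_Workers>"""


def summarize(worker):
    """Build the seven output fields for a single <ws_Worker> element."""
    # ".//tag" searches all descendants; default="" avoids None for missing tags.
    first = worker.findtext(".//ws_First_Name", default="")
    last = worker.findtext(".//ws_Last_Name", default="")
    email = worker.findtext(".//ws_Email_Address", default="")
    ssn4 = worker.findtext(".//ws_Last_4_SSN_Digits", default="")
    return {
        "full_name": worker.findtext(".//ws_Formatted_Name", default=""),
        "employee_id": worker.findtext("ws_Summary/ws_Employee_ID", default=""),
        "location": worker.findtext(".//ws_Business_Site_Name", default=""),
        "title": worker.findtext(".//ws_Position_Title", default=""),
        # Everything to the right of the @ sign is the domain.
        "domain": email.partition("@")[2],
        # First letter of the first name plus the last name.
        "username": (first[:1] + last).lower(),
        # First two letters of the first name, the SSN digits, last two of the last name.
        "password": (first[:2] + ssn4 + last[-2:]).lower(),
    }


root = ET.fromstring(SAMPLE)
summaries = [summarize(worker) for worker in root.iter("ws_Worker")]

for s in summaries:
    print("Full Name: %(full_name)s, Employee ID: %(employee_id)s, Location: %(location)s, "
          "Title: %(title)s, Domain: %(domain)s, Username: %(username)s, "
          "Password: %(password)s" % s)
```

For the real file, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE). Using root.iter("ws_Worker") finds the workers at any depth, so the sketch does not depend on what the actual root element is called.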