XML, WSDL, & SOAP

A crash course on some old web tech.

Introduction

I'm sorry you're here, but if you are, you've probably been asked to work with some old web technology. I'll keep this as short and simple as possible to get you up to speed on what SOAP is and how it works, which requires that you also know XML, XSD & WSDL. This tutorial is about how to read and understand this technology; it won't go into the many details needed to write your own.

In a nutshell, SOAP is just a wrapper for data. The wrapper contains some information you need to know in order to read or send the information. WSDL is for describing APIs (what data you can send and receive). XSD is for validating whether data is arranged the way you want. All of these standards are written in XML, a markup language.

What This Tutorial Covers

1. What is XML & what are namespaces?

2. What is XSD & why does WSDL use it?

3. How is WSDL used?

4. How is SOAP used?

What You Need For Just The Tutorial

Nothing. I'll just be explaining syntax and usage. We're not going to build anything.


XML

XML (eXtensible Markup Language) is a syntax used to describe data or transport data between web services. Make sure you understand whether an XML document is describing what data should look like or if it's showing you actual data for transport. It will help alleviate a lot of confusion when you first learn how XML is used.

XML that describes data looks like the following:


<?xml version="1.0" encoding="UTF-8"?>
<elementRoot xmlns="https://usually.a.site.with.info" targetNamespace="https://usually.a.site.with.info">
    <elementOne></elementOne>
    <elementTwo></elementTwo>
    <elementOne></elementOne>
</elementRoot>

XML that transports actual data looks like the following:


<?xml version="1.0" encoding="UTF-8"?>
<elementRoot xmlns:namespace="https://usually.a.site.with.info">
    <namespace:elementOne attributeName="one">12345</namespace:elementOne>
    <namespace:elementTwo>"some string data"</namespace:elementTwo>
    <namespace:elementOne attributeName="two">456678</namespace:elementOne>
    <emptyElement/>
</elementRoot>

Prolog: The first line. It states which version of XML syntax is used and which encoding. Nearly everyone uses XML 1.0; XML 1.1 is rare because it never caught on. UTF-8 is the most common encoding used on the web. I explain encodings below if you aren't familiar with them.

Elements: An element is comprised of an opening tag <element> and a closing tag </element>. Elements in XML can be named whatever you want (minus some characters like < and &). They can also be nested, meaning you have elements inside the opening and closing tags of other elements.

Empty Elements: An empty element is one that has no data, either because it's defining what data should look like, or because it simply doesn't have any data but is still expected in the current XML document. An empty element can be written as an opening tag immediately followed by a closing tag, or more commonly as a single tag with a forward slash before the closing greater-than sign, like so: <emptyElement/>

Root Element: XML must have a root element, which is an opening and closing tag that goes around all other elements (excluding the prolog).

Attribute: Elements can have attributes. Besides some predefined standard attributes, they should only be used to separate elements by some type of id. For example, if you have two "data" elements that describe two different people's information, you can give each element an attribute holding that person's name. That way you can identify which person each "data" element is describing.
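To make those terms concrete, here's a minimal sketch that parses a small document with Python's standard xml.etree.ElementTree module. The document reuses the element names from the transport example above, minus the namespace prefixes so the focus stays on elements and attributes. The variable names are my own.

```python
import xml.etree.ElementTree as ET

# Element names borrowed from the transport example; prefixes left out on purpose.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<elementRoot>
    <elementOne attributeName="one">12345</elementOne>
    <elementTwo>some string data</elementTwo>
    <elementOne attributeName="two">456678</elementOne>
    <emptyElement/>
</elementRoot>"""

# Parse from bytes so the parser honors the encoding named in the prolog.
root = ET.fromstring(DOCUMENT.encode("utf-8"))

# Every nested element has a tag, a dict of attributes, and optional text.
rows = [(child.tag, child.attrib, child.text) for child in root]

# The attribute is what tells the two elementOne entries apart.
by_id = {el.get("attributeName"): el.text for el in root.findall("elementOne")}

print(root.tag)
for row in rows:
    print(row)
print(by_id)
```

Note that the empty element comes back with no attributes and no text, and that all values arrive as strings. XML itself doesn't know that 12345 is a number.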


Namespaces

XML documents use namespaces to avoid name collisions. For instance, if I use two XML documents and they both define an <example> element, the way I distinguish between which element I'm referring to is with a namespace.

Defining A Namespace: Namespaces are defined using the xmlns attribute like so:

xmlns:namespacePrefix="namespaceString"

Namespace: A namespace is just a string. It's common practice to use a URL so that it's a completely unique namespace that can be used anywhere (target namespace explains this more).

Namespace Prefix: When actually referring to a namespace, we don't use the namespace name. Instead, we assign it a namespace prefix. Now when we want to refer to an element from that namespace, we add the namespace prefix, followed by a ":" colon, and then the element name like so:

<namespacePrefix:elementName>

Default Namespaces: We can define a default namespace, which means that all child elements of the element that contains the default namespace attribute definition will be assumed to be part of the default namespace. To define a default namespace, just don't include any prefix like so:

xmlns="namespaceString"

Target Namespace: If we want to refer to elements defined in the current document from a different document, we can do so by referring to the target namespace. In the current document, you define the target namespace using the targetNamespace attribute on the root element. In the other document, you define a namespace using the target namespace name.

Below is an example of a document with a target namespace, and below that is a different document that uses an element from the first by defining a namespace with the target namespace:

<root targetNamespace="https://abaganon.com/myTargetNamespace">
	<info></info>
</root>

<root xmlns:namespacePrefix="https://abaganon.com/myTargetNamespace">
	<namespacePrefix:info>"this is using the info element defined in the other XML document"</namespacePrefix:info>
</root>
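If you want to see that the prefix really is just a local nickname, here's a small sketch using Python's standard xml.etree.ElementTree. It parses three tiny documents of my own (same namespace string as above, different prefixes) and shows that the parser identifies elements by namespace plus name, never by prefix. The helper name `tags` is made up for this example.

```python
import xml.etree.ElementTree as ET

NS_URI = "https://abaganon.com/myTargetNamespace"

PREFIXED = f"""<root xmlns:namespacePrefix="{NS_URI}">
    <namespacePrefix:info>prefixed</namespacePrefix:info>
</root>"""

# Same namespace, but a different prefix chosen by another author.
OTHER_PREFIX = f"""<root xmlns:p="{NS_URI}">
    <p:info>other prefix</p:info>
</root>"""

# Default namespace: root and its unprefixed children all belong to NS_URI.
DEFAULT = f"""<root xmlns="{NS_URI}">
    <info>default</info>
</root>"""


def tags(xml_text):
    """Return the tag of the root and of each child, as ElementTree reports them."""
    root = ET.fromstring(xml_text)
    return [root.tag] + [child.tag for child in root]


print(tags(PREFIXED))      # root has no namespace, info has one
print(tags(OTHER_PREFIX))  # identical to the line above
print(tags(DEFAULT))       # both root and info are in the namespace

# When searching, you supply your own prefix-to-namespace mapping.
info = ET.fromstring(OTHER_PREFIX).find("anything:info", {"anything": NS_URI})
print(info.text)
```

ElementTree writes a namespaced tag as `{namespace}name`, which is a handy reminder that the prefix is thrown away once the document is parsed.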

Encodings

This isn't directly related to XML, but it's good to know if you don't.

Computers store everything in bytes, groups of 8 bits, and each bit can be either 0 or 1. So we can say that the byte 01111000 should print the character "x". That's what encodings do. They are a predetermined map between byte values and what characters they should print. Encodings can have two parts. A code chart & a protocol. The code chart is just the map between byte value and character. The protocol exists because we don't initially know how many bytes a character takes up. So protocols usually reserve some bits in each byte that say whether the byte value ends here or should continue into the next byte. That way you can have a character take up 1 byte, 4 bytes, or however many you want.

So, ASCII is an old code chart with no protocol. It just maps a single byte to one of 128 characters (only 7 of the 8 bits are used). Unicode is a huge code chart with room for more than a million characters (only around 10% of the values have been assigned characters as of today). UTF-8 is a protocol for reading bytes and determining their Unicode value.
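Here's a quick sketch of that in Python, just to show the bytes. The helper name `to_bits` is mine. Watch the leading bits of each byte: a byte starting with 0 is a complete character, 110 or 1110 opens a two or three byte character, and 10 marks a continuation byte.

```python
def to_bits(text, encoding="utf-8"):
    """Encode text and show each resulting byte as 8 bits."""
    return [format(byte, "08b") for byte in text.encode(encoding)]


print(to_bits("x"))       # ['01111000']
print(to_bits("\u00e9"))  # e with acute accent: ['11000011', '10101001']
print(to_bits("\u20ac"))  # euro sign: ['11100010', '10000010', '10101100']
```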


XSD

XSD (XML Schema Definition) is XML that describes what data should look like. Specifically, it describes what an XML element is allowed to have (what values, attributes, and nested elements, if any). It's used in WSDL. XSD uses certain elements to describe data.

An example of two data definitions (describing) is shown below, followed by what they look like when actually used (transporting):


<schema xmlns="http://www.w3.org/2001/XMLSchema">
	<element name="myStringNumberData">
		<complexType>
			<complexContent>
				<restriction base="anyType">
					<all>
						<element name="myStringData" type="string"/>
						<element name="myNumberData" type="integer"/>
					</all>
				</restriction>
			</complexContent>
		</complexType>
	</element>

	<element name="myBooleanData">
		<simpleType>
			<restriction base="boolean"/>
		</simpleType>
	</element>
</schema>
						

<myStringNumberData>
	<myStringData>"some string"</myStringData>
	<myNumberData>12345</myNumberData>
</myStringNumberData>

<myBooleanData>true</myBooleanData>
						

schema: This is the root element. It must be included around all the data definitions.

element: This is the first tag and it is used to give the data definition a name with the name attribute. Alternatively, the type can be defined on its own: the simpleType or complexType gets a name attribute, and a standalone empty element refers to that name through its type attribute, like below:


<element name="myStringNumberData" type="myStrNumDataType"/>
<complexType name="myStrNumDataType">
	etc.
</complexType>

simpleType | complexType: This describes whether the data is simple or complex. Simple types refer to XML elements that only contain some primitive value, like booleans, strings, integers, etc. Complex types can have nested elements and attributes.

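simpleContent | complexContent: A complexType can contain one of these elements. Simple content holds a plain value and can only add attributes (no nested elements). Complex content can have both nested elements and attributes. You'll also often see the indicators described below placed directly inside complexType, which XSD treats as shorthand for complex content.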

To simplify:
complexType + complexContent = element with nested elements and maybe attributes.
complexType + simpleContent = element without nested elements, but has attributes.
simpleType = element without nested elements and without attributes.

group: The group element is used with complexContent to group together multiple elements that must follow some "indicator" rule. Indicators are described next.

all | choice | sequence: These are "indicators" that describe how elements should appear.

  • all = they must all appear, in any order.
  • choice = exactly one of the listed elements must appear.
  • sequence = they must appear in order.
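The three indicators boil down to three small checks. This is only a sketch of the idea, not how a real validator is written: the function name `satisfies` is made up, and it assumes the XSD default that each listed element occurs exactly once.

```python
def satisfies(indicator, expected, actual):
    """Check child element names against one indicator.

    expected: names listed inside the indicator in the schema.
    actual:   names of the child elements found in the data, in document order.
    Assumes each listed element is allowed exactly once (the XSD default).
    """
    if indicator == "sequence":
        # All present, in the same order as the schema lists them.
        return actual == expected
    if indicator == "all":
        # All present, order does not matter.
        return sorted(actual) == sorted(expected)
    if indicator == "choice":
        # Exactly one of the listed elements.
        return len(actual) == 1 and actual[0] in expected
    raise ValueError(f"unknown indicator: {indicator}")


names = ["myStringData", "myNumberData"]
print(satisfies("sequence", names, ["myNumberData", "myStringData"]))  # False
print(satisfies("all", names, ["myNumberData", "myStringData"]))       # True
print(satisfies("choice", names, ["myNumberData"]))                    # True
```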

element | attribute: The data this data definition holds. It could be one or more simple elements, complex elements, or a single attribute.

To expand:
simple element = an element with only a primitive value. It will have a "type" attribute like: boolean, string, integer, etc.
complex element = an element that has nested elements. It will have a custom "type" attribute that must be defined elsewhere.
attribute = an attribute that can be attached to another element.
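To show what "validating" means in practice, here's a toy sketch that reads a tiny schema and checks some data against it. Big caveat: Python's standard library has no XSD validator, so this is a hand-rolled illustration that only understands nested elements with two built-in types. The schema, the `CHECKS` table, and the function names are all my own assumptions; for real work you'd use a proper XSD library.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A trimmed-down schema in the spirit of the example above.
SCHEMA = f"""<schema xmlns="{XS}">
    <element name="myStringNumberData">
        <complexType>
            <all>
                <element name="myStringData" type="string"/>
                <element name="myNumberData" type="integer"/>
            </all>
        </complexType>
    </element>
</schema>"""

# Hypothetical checkers for two built-in types; a real validator covers many more.
CHECKS = {
    "string": lambda text: True,
    "integer": lambda text: re.fullmatch(r"[+-]?[0-9]+", text.strip()) is not None,
}


def child_rules(schema_text, element_name):
    """Map each nested element name to its declared type for one top-level element."""
    schema = ET.fromstring(schema_text)
    for top in schema.findall(f"{{{XS}}}element"):
        if top.get("name") == element_name:
            nested = top.iter(f"{{{XS}}}element")  # iter() also yields top itself
            return {el.get("name"): el.get("type") for el in nested if el is not top}
    raise KeyError(element_name)


def problems(schema_text, instance_text):
    """List missing children, unexpected children, and values of the wrong type."""
    instance = ET.fromstring(instance_text)
    rules = child_rules(schema_text, instance.tag)
    found = {child.tag: (child.text or "") for child in instance}
    issues = [f"missing {name}" for name in rules if name not in found]
    issues += [f"unexpected {name}" for name in found if name not in rules]
    issues += [
        f"{name} is not a valid {rules[name]}"
        for name, text in found.items()
        if name in rules and not CHECKS[rules[name]](text)
    ]
    return issues


GOOD = ("<myStringNumberData><myStringData>some string</myStringData>"
        "<myNumberData>12345</myNumberData></myStringNumberData>")
BAD = "<myStringNumberData><myNumberData>abc</myNumberData></myStringNumberData>"

print(problems(SCHEMA, GOOD))  # []
print(problems(SCHEMA, BAD))   # ['missing myStringData', 'myNumberData is not a valid integer']
```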



That wraps up the main parts of XSD. There are other tags you can use to expand on data definitions, but they're too much to go over. If you need a specific one or see one you don't know, just duckduckgo it.


WSDL

WSDL (Web Services Description Language, pronounced "wiz-dull") is XML that describes a web service API. It holds information about what functions you can call, what arguments they take, their data types, what data will be returned, etc.

WSDL uses certain elements to describe the web service API. An example WSDL and the elements a WSDL must have are shown below.

[Example WSDL: markup not recoverable from the source. Surviving documentation text: "ThisService WSDL File"]

definitions: This is the root element. It contains a targetNamespace attribute & any xmlns attributes needed.

types: This is an XSD that defines what the data looks like.

message: These elements either describe the groups of data that can be sent or the groups of data that can be returned. Usually their name attributes contain either "request" (send data) or "response" (return data), but whether a message is for sending or receiving is defined by the portType tag.

  • part: This tag says what data should be sent or returned inside a message tag. Each part tag must have a "name" attribute and either a "type" or an "element" attribute. The type or element should be defined in the types tag.

portType: This tag is used to wrap operations by protocol, although the protocol isn't specified here. That's done in binding.

  • operation: These tags wrap around input, output, and fault tags which describe an entire interaction with an API (what is sent, received, and error handling).
  • input | output | fault: These tags sit inside an operation tag and specify which message element is used to send or receive data, or handle errors.

binding: This tag is a wrapper for specifying which protocols (often it's SOAP) will be used to actually send or receive messages defined in portType. Its type attribute must match a portType name attribute. Inside this tag, you'll find the same kind of elements as in the portType tag, except this time, they're describing what protocols are used to send the messages.

  • binding: Yes, there's another binding tag inside. This standalone tag is the one that actually specifies the protocol using the transport attribute. It must also have a style attribute which specifies either RPC or Document styling.

service: Finally, the service tag is a wrapper for specifying endpoints (the address you send messages to and receive them from), though that is done using the tags below.

  • port: This should have a binding attribute that matches a binding name. Inside this element, there should be an address element that lists the endpoint used for that binding.



When reading a WSDL, it's best to look at it in this order:

  1. portType: What data can be sent and retrieved?
  2. binding: What protocols are used to send/receive messages?
  3. types: How is the data formatted?
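Here's a rough sketch of that reading order in code, using Python's standard xml.etree.ElementTree. The WSDL below is entirely hypothetical (every name and URL in it is made up for illustration), and it skips the types tag by using built-in XSD types directly. The two namespace strings in `NS` are the standard WSDL 1.1 and WSDL SOAP binding namespaces.

```python
import xml.etree.ElementTree as ET

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Hypothetical WSDL 1.1 document; all names and URLs are invented.
WSDL = """<definitions name="PriceService"
    targetNamespace="https://example.com/prices"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:tns="https://example.com/prices">
  <message name="GetPriceRequest">
    <part name="item" type="xsd:string"/>
  </message>
  <message name="GetPriceResponse">
    <part name="price" type="xsd:decimal"/>
  </message>
  <portType name="PricePortType">
    <operation name="GetPrice">
      <input message="tns:GetPriceRequest"/>
      <output message="tns:GetPriceResponse"/>
    </operation>
  </portType>
  <binding name="PriceBinding" type="tns:PricePortType">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
    <operation name="GetPrice">
      <soap:operation soapAction="https://example.com/prices/GetPrice"/>
      <input><soap:body use="literal" namespace="https://example.com/prices"/></input>
      <output><soap:body use="literal" namespace="https://example.com/prices"/></output>
    </operation>
  </binding>
  <service name="PriceService">
    <port name="PricePort" binding="tns:PriceBinding">
      <soap:address location="https://example.com/soap/prices"/>
    </port>
  </service>
</definitions>"""


def local(qname):
    """Drop the prefix from a value such as 'tns:GetPriceRequest'."""
    return qname.split(":")[-1]


def summarize(wsdl_text):
    root = ET.fromstring(wsdl_text)
    # Parts of each message: {message name: {part name: type}}
    messages = {
        m.get("name"): {p.get("name"): p.get("type") for p in m.findall("wsdl:part", NS)}
        for m in root.findall("wsdl:message", NS)
    }
    # 1. portType: which operations exist and which messages they send/return.
    operations = {}
    for op in root.findall("wsdl:portType/wsdl:operation", NS):
        operations[op.get("name")] = {
            "input": messages[local(op.find("wsdl:input", NS).get("message"))],
            "output": messages[local(op.find("wsdl:output", NS).get("message"))],
        }
    # 2. binding: which protocol and style carry those operations.
    soap_binding = root.find("wsdl:binding/soap:binding", NS)
    # The endpoint lives in service > port > address.
    address = root.find("wsdl:service/wsdl:port/soap:address", NS)
    return {
        "operations": operations,
        "transport": soap_binding.get("transport"),
        "style": soap_binding.get("style"),
        "endpoint": address.get("location"),
    }


summary = summarize(WSDL)
print(summary)
```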


SOAP

SOAP (Simple Object Access Protocol) is just XML for wrapping a message. Like all XML, it should start with a prolog. An example SOAP message and the elements it must have are below.


<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding">

<soap:Header></soap:Header>

<soap:Body>
  <m:GetPrice xmlns:m="https://www.w3schools.com/prices">
    <m:Item>Apples</m:Item>
  </m:GetPrice>
</soap:Body>

</soap:Envelope>
						

envelope: This is the root element. It should contain the following attributes for defining namespace and encoding.

  • xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
  • soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding"

header: This element is optional. It's used for adding custom functionality to SOAP messages if you want it.

body: Inside the body element will be an element that matches a message element specified in the WSDL, with data elements specified in the WSDL inside.
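If you ever have to produce one of these by hand, here's a minimal sketch that builds the GetPrice message with Python's standard xml.etree.ElementTree instead of gluing strings together. It assumes the SOAP 1.2 envelope namespace (http://www.w3.org/2003/05/soap-envelope) and borrows the prices namespace from the example; the function name `build_envelope` is mine.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
PRICES_NS = "https://www.w3schools.com/prices"       # taken from the example message


def build_envelope(item):
    """Build a GetPrice request as UTF-8 bytes, prolog included."""
    # Ask the serializer to use readable prefixes instead of ns0, ns1, ...
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("m", PRICES_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")  # optional, left empty here
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    get_price = ET.SubElement(body, f"{{{PRICES_NS}}}GetPrice")
    ET.SubElement(get_price, f"{{{PRICES_NS}}}Item").text = item
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


message = build_envelope("Apples")
print(message.decode("utf-8"))
```

Building the tree this way also means characters like & and < in the data get escaped for you.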

fault: If there was an error, the body tag might contain a fault element describing the error.


<soap:Body>
	<soap:Fault>
	     <faultcode xsi:type="xsd:string">SOAP-ENV:Client</faultcode>

	     <faultstring xsi:type="xsd:string">
	        Failed to locate method (ValidateCreditCard) in class (examplesCreditCard) at /usr/local/ActivePerl-5.6/lib/site_perl/5.6.0/SOAP/Lite.pm line 1555.
	     </faultstring>
	</soap:Fault>
</soap:Body>
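On the receiving side, checking for a fault is just a lookup inside the body. Here's a sketch under a few assumptions: faultcode and faultstring are SOAP 1.1 element names, so this example uses the SOAP 1.1 envelope namespace (SOAP 1.2 faults use Code and Reason children instead). The response text is a shortened, made-up version of the fault above, and `read_fault` is my own helper name.

```python
import xml.etree.ElementTree as ET

SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Hypothetical responses: one carrying a fault, one carrying a normal result.
FAULT_RESPONSE = f"""<soap:Envelope xmlns:soap="{SOAP11_NS}">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:Client</faultcode>
      <faultstring>
        Failed to locate method (ValidateCreditCard)
      </faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""

OK_RESPONSE = f"""<soap:Envelope xmlns:soap="{SOAP11_NS}">
  <soap:Body><Price>1.90</Price></soap:Body>
</soap:Envelope>"""


def read_fault(response_text, envelope_ns=SOAP11_NS):
    """Return (code, message) if the body holds a Fault, else None."""
    root = ET.fromstring(response_text)
    fault = root.find(f"{{{envelope_ns}}}Body/{{{envelope_ns}}}Fault")
    if fault is None:
        return None
    code = fault.findtext("faultcode", default="").strip()
    message = fault.findtext("faultstring", default="").strip()
    return code, message


print(read_fault(FAULT_RESPONSE))
print(read_fault(OK_RESPONSE))  # None
```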
						

Finally, SOAP messages sent over HTTP must have the following two HTTP headers.

  • Content-Type: application/soap+xml; charset=utf-8
  • Content-Length: NUMBER-OF-BYTES-IN-MESSAGE
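Here's a sketch of preparing such a request with Python's standard urllib.request, without actually sending it. The endpoint URL and the tiny envelope are placeholders of mine. The thing to notice is that Content-Length counts bytes after encoding, not characters.

```python
import urllib.request


def build_request(endpoint, envelope):
    """Prepare (but do not send) an HTTP POST carrying a SOAP message as bytes."""
    return urllib.request.Request(
        endpoint,
        data=envelope,
        method="POST",
        headers={
            "Content-Type": "application/soap+xml; charset=utf-8",
            "Content-Length": str(len(envelope)),  # number of bytes, not characters
        },
    )


# Placeholder message with one non-ASCII character (e with acute accent).
text = "<a>\u00e9</a>"
envelope = text.encode("utf-8")

request = build_request("https://example.com/soap/prices", envelope)
print(len(text), len(envelope))  # 8 characters, 9 bytes
print(request.header_items())
# To actually send it you would call urllib.request.urlopen(request).
```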

Final Thoughts

So now you know how to construct, describe, validate, and send messages using all those standards written in XML. Those standards were formulated to work over any kind of protocol, but the web (HTTP) has dramatically taken over. Thus, they've really fallen out of favor and REST + JSON has taken over.

  • XML → JSON (JSON is easier to read/parse than XML and works better with JavaScript, the only front-end web language)
  • SOAP/XML → REST/JSON (no need for extra SOAP info since we assume it's all HTTP)
  • WSDL/XSD → REST/YAML (tools like Swagger are used to describe and generate APIs)