Using the Companies House GovTalk Service to retrieve Company Details

As we move into the second decade of the 21st century, government data that was hitherto closed is now being opened up, with governments adopting standards such as XBRL (eXtensible Business Reporting Language). One of the functions I wanted was to retrieve the details of a company from the Companies House service. It would make it really simple for users to create their company in the service if all they have to do is provide the Company Number and/or the Name and hit a button. Certainly easier than all that tedious typing in. Additionally, we can send data the other way, and so provide a simple way to make returns to Companies House.

Now, Government data is open, but it is not that open. You need to apply to Companies House for an account. Initially, and while you are developing and testing, the service returns only one company: MILLENNIUM STADIUM PLC. Below is an image of my test site showing some of the details of the company retrieved.

[Image: Company Details Page. Web page showing some of the details available from the Company Details Service]

To use the service you must apply by sending an email to xml@companieshouse.gov.uk, providing your name, address and telephone details. There is an FAQ on using the service available from http://xmlgw.companieshouse.gov.uk/faq.shtml.

Creating the request for Company Details.

Requests are sent to Companies House over HTTP as an XML document. The Company Details request can be seen below:

<GovTalkMessage xsi:schemaLocation="http://www.govtalk.gov.uk/CM/envelope http://xmlgw.companieshouse.gov.uk/v1-0/schema/Egov_ch-v2-0.xsd" xmlns="http://www.govtalk.gov.uk/CM/envelope" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:gt="http://www.govtalk.gov.uk/schemas/govtalk/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<EnvelopeVersion>1.0</EnvelopeVersion>
<Header>
<MessageDetails>
<Class>CompanyDetails</Class>
<Qualifier>request</Qualifier>
<TransactionID>5</TransactionID>
</MessageDetails>
<SenderDetails>
<IDAuthentication>
<SenderID>????????????????????????????????</SenderID>
<Authentication>
<Method>CHMD5</Method>
<Value>????????????????????????????????</Value>
</Authentication>
</IDAuthentication>
</SenderDetails>
</Header>
<GovTalkDetails>
<Keys/>
</GovTalkDetails>
<Body>
<CompanyDetailsRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://xmlgw.companieshouse.gov.uk/v1-0/schema/CompanyDetails.xsd">
<CompanyNumber>99999999</CompanyNumber>
<CompanyName>TEST COMPANY PLC</CompanyName>
<GiveMortTotals>1</GiveMortTotals>
</CompanyDetailsRequest>
</Body>
</GovTalkMessage>

At its highest level the message consists of two parts, the Header and the Body. The Header starts with the MessageDetails tag, which describes the type of message: the Class says what kind of request is being made (in this case ‘CompanyDetails’), the Qualifier says that this is a ‘request’, and finally there is a Transaction ID. The combination of your login details and the Transaction ID has to be unique; once a Transaction ID has been used it cannot be used again (or you will get an error). You could use a simple counter, but in my case, partly to avoid having to set up the counter, I am using the Ticks property of DateTime. Ticks is a 64-bit count of 100-nanosecond intervals since the year 0001, so there is no danger of it wrapping round in any realistic timeframe.
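
As an aside, two requests made in very quick succession could in principle see the same Ticks value, and the local clock can jump backwards (for example when daylight saving ends). The following is only a sketch of one way to guard against that; the TransactionIds class and its Next method are hypothetical names of mine and are not part of the code in this post. It assumes the gateway accepts any increasing number of this size, which is what the Ticks approach already relies on.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the original code.
public static class TransactionIds
{
    // Last value handed out by this process (0 until the first call).
    private static long last = 0;

    // Returns the current UTC tick count, bumped by one if needed so that
    // every value is strictly greater than the one returned before it.
    public static long Next()
    {
        while (true)
        {
            long previous = Interlocked.Read(ref last);
            long candidate = Math.Max(DateTime.UtcNow.Ticks, previous + 1);

            // Only publish the candidate if no other thread got in first.
            if (Interlocked.CompareExchange(ref last, candidate, previous) == previous)
                return candidate;
        }
    }
}
```

This only protects a single running process; if several servers share one Companies House account, a shared counter would still be needed.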

The second part of the Header identifies you to the Companies House server. The SenderID is self-explanatory: it is the ID you got when applying to Companies House. Method names the scheme used to produce the final tag, the ‘Value’ tag. The value is created by concatenating the UserID and Password you got from Companies House with the Transaction ID from the section above. This combined string is then hashed using MD5 (strictly speaking MD5 is a hash function rather than encryption).

The hashing is done by the following function:

        /// <summary>
        /// Hashes the supplied string with MD5 and returns it as lowercase hex
        /// </summary>
        /// <param name="clearPassword">The UserID + Password + TransactionID string</param>
        /// <returns>32 lowercase hexadecimal characters</returns>
        private String EncodePassword(String clearPassword)
        {
            // requires System.Security.Cryptography and System.Text
            using (MD5 md5Hasher = MD5.Create())
            {
                // Encoding.Default is what ASCIIEncoding.Default actually resolves to
                byte[] hashedBytes = md5Hasher.ComputeHash(Encoding.Default.GetBytes(clearPassword));

                // BitConverter.ToString gives "AB-CD-..."; strip the dashes and lowercase it
                return BitConverter.ToString(hashedBytes).Replace("-", "").ToLower();
            }
        }

Note: BitConverter.ToString separates the bytes with dashes (‘-’), so I remove them from the hashed string. It seems from testing that the Companies House service does not like ’em.
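
To make the recipe concrete, here is a small stand-alone sketch of the same idea: concatenate the three values, hash them with MD5 and write the result as lowercase hex. The ChMd5 class and its method names are hypothetical, and I am assuming plain ASCII credentials here, which may not match what the function above does on every machine.

```csharp
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the original code.
public static class ChMd5
{
    // Lowercase hex MD5 of the ASCII bytes of the input (assumes ASCII-only input).
    public static string Hash(string input)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.ASCII.GetBytes(input));
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2")); // two lowercase hex digits per byte, no dashes
            return sb.ToString();
        }
    }

    // Concatenates sender ID, password and transaction ID, in that order, and hashes the result.
    public static string AuthenticationValue(string senderId, string password, long transactionId)
    {
        return Hash(senderId + password + transactionId.ToString(CultureInfo.InvariantCulture));
    }
}
```

A quick sanity check is the standard MD5 test vector: ChMd5.Hash("abc") gives 900150983cd24fb0d6963f7d28e17f72. If your function returns that for "abc" then the hex formatting is right, and any remaining authorisation failure is down to the credentials or the Transaction ID.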

The second part of the message is the Body, which carries the request itself. This is where we set the Company Number and Name we are looking for.

To create the request I could have written a C# function to build the message on the fly, but that is not only wasteful, it is also a maintenance headache if Companies House modifies the structure of these messages. So what I did was add a template (framework) message to my project as an ‘Embedded Resource’, in a folder called ‘Schemas’. The following code then reads this file and returns its contents as a stream.

        /// <summary>
        /// Gets the framework XML which makes the request
        /// </summary>
        /// <param name="RequestFile">File name of the embedded template, e.g. CompanyDetailRequest.xml</param>
        /// <returns>A stream over the embedded resource, or null if it is not found</returns>
        private Stream GetRequest(String RequestFile)
        {
            // requires System.IO and System.Reflection
            // Resource name = default namespace + folder + file name
            String resourceName = String.Format("Simplicita.Data.Services.Schemas.{0}", RequestFile);

            // The template is embedded in this assembly
            Assembly asm = Assembly.GetExecutingAssembly();

            return asm.GetManifestResourceStream(resourceName);
        }

The address of the XML file is worked out by taking the default namespace + any sub folder + the file name. So in the above code my XML file can be found in Simplicita.Data.Services.Schemas, Schemas being the folder I put it into. Next we get the executing assembly, since the XML is stored in the assembly, and finally we call GetManifestResourceStream to retrieve it.

The result is then passed back to the calling function, which loads the XML stream into an XmlDocument. The XML defines a couple of namespaces, and we have to tell the XPath queries about them. We do this by defining a Namespace Manager, like this:
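
GetManifestResourceStream returns null rather than throwing when the name does not match, so a typo in the namespace or folder can be hard to spot. A small diagnostic along the following lines lists what is really embedded; the ResourceNames class is a hypothetical helper of mine, not part of the project code.

```csharp
using System;
using System.Reflection;

// Hypothetical diagnostic helper, not part of the original code.
public static class ResourceNames
{
    // Returns the fully qualified names of every resource embedded in the assembly.
    public static string[] List(Assembly asm)
    {
        return asm.GetManifestResourceNames();
    }

    // Writes each embedded resource name to the console, one per line.
    public static void Print(Assembly asm)
    {
        foreach (string name in List(asm))
            Console.WriteLine(name);
    }
}
```

Calling ResourceNames.Print(Assembly.GetExecutingAssembly()) from the project that holds the template should show the exact string to pass to GetManifestResourceStream.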

            nsmgr = new XmlNamespaceManager(xmlDoc.NameTable);
            nsmgr.AddNamespace("default", "http://www.govtalk.gov.uk/CM/envelope");

I will need to define another namespace when retrieving the answer from Companies House, but for the moment I only need the one.
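
To see why the namespace manager matters, here is a minimal sketch that runs the same XPath with and without the prefix against a cut-down envelope. The NamespaceDemo class is hypothetical and exists only for illustration.

```csharp
using System.Xml;

// Hypothetical demonstration class, not part of the original code.
public static class NamespaceDemo
{
    private const string Envelope = "http://www.govtalk.gov.uk/CM/envelope";

    // Returns the text of Header/MessageDetails/Class, or null if the path matches nothing.
    public static string ReadClass(string xml, bool usePrefix)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("default", Envelope);

        // Without a prefix XPath looks for elements in no namespace at all,
        // so it never matches elements that live in the envelope namespace.
        string path = usePrefix
            ? "default:GovTalkMessage/default:Header/default:MessageDetails/default:Class"
            : "GovTalkMessage/Header/MessageDetails/Class";

        XmlNode node = doc.SelectSingleNode(path, nsmgr);
        return node == null ? null : node.InnerText;
    }
}
```

Given a document whose root declares xmlns="http://www.govtalk.gov.uk/CM/envelope", the prefixed path finds the Class node while the unprefixed one returns null, even though the element names look identical.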

Now, I said earlier that the TransactionID is set by taking the Ticks value of the current DateTime. This is done by the following declaration:

            long transactionId = DateTime.Now.Ticks;

We now have everything needed to set up the Authentication tag.

            TransactionID = xmlDoc.SelectSingleNode("default:GovTalkMessage/default:Header/default:MessageDetails/default:TransactionID", nsmgr);

            if (TransactionID != null) TransactionID.InnerText = transactionId.ToString();

            IDAuthentication = xmlDoc.SelectSingleNode("default:GovTalkMessage/default:Header/default:SenderDetails/default:IDAuthentication", nsmgr);

            // if we have found the authentication tag set the security parameters
            if (IDAuthentication != null)
            {
                SenderId = IDAuthentication.SelectSingleNode("default:SenderID", nsmgr);
                AuthenticationValue = IDAuthentication.SelectSingleNode("default:Authentication/default:Value", nsmgr);

                // "UserID", "Password" and "XMLGatewayTestUserID" stand in for your own credentials.
                // Using transactionId directly avoids a NullReferenceException if the TransactionID node was not found.
                digest = String.Format("{0}{1}{2}", "UserID", "Password", transactionId.ToString());

                SenderId.InnerText = "XMLGatewayTestUserID";
                AuthenticationValue.InnerText = EncodePassword(digest);
            }

First we retrieve the TransactionID node from the XML. Because the document uses a namespace, the namespace prefix has to appear in the XPath address and the namespace manager is passed as the second parameter. When I defined the namespace manager in the line repeated below

nsmgr.AddNamespace("default", "http://www.govtalk.gov.uk/CM/envelope");

I defined the word ‘default’ as the namespace prefix. This is why ‘default’ occurs before each level of the XPath address. Assuming we find the IDAuthentication tag, we then go on to set the authentication details.

I’ve used String.Format to set a string containing the three items which make up the identification; UserID, Password and the TransactionID. This is then passed to the EncodePassword function which is shown above.

That completes the identification side of the message creation. Now it is time to set the nodes that identify which company you are looking for. Time for a little sidebar: whatever company you ask for, while you are testing your account Companies House will only send you the details of one company, Company Number 03176906, Millennium Stadium PLC (currently).

We basically set the CompanyNumber tag to the Company Number, or an empty string if we do not know it, and the CompanyName tag to the Company Name or an empty string. Note that in the template shown above these two tags have text within them; if they are not set to an empty string then these values will be passed on!

The Code that sets these two tags can be seen below.

                DetailRequest = Body.SelectSingleNode("default:CompanyDetailsRequest", nsmgr);
                companyNumber = DetailRequest.SelectSingleNode("default:CompanyNumber", nsmgr);
                companyName = DetailRequest.SelectSingleNode("default:CompanyName", nsmgr);

                companyNumber.InnerText = CompanyNumber;
                companyName.InnerText = CompanyName;

Sending the request to Companies House.

That now completes the creation of the request. To send the XML to Companies House we use the WebRequest class (see http://msdn.microsoft.com/en-us/library/system.net.webrequest(VS.71).aspx) to set up the request and then a StreamWriter to actually write the XML. The first thing to do is create the request by calling the Create method of the WebRequest class (note: not by using the new keyword), passing it the URI of the Companies House service (http://xmlgw.companieshouse.gov.uk/v1-0/xmlgw/Gateway). Then tell it what kind of request this is going to be (POST) and finally the content type (text/xml). We then request a Stream from the WebRequest object by making a call to GetRequestStream().

Finally the StreamWriter is used to write it to the web.

            WebRequest req = null;
            StreamWriter writer = null;
            String uri = String.Empty;
            uri = "http://xmlgw.companieshouse.gov.uk/v1-0/xmlgw/Gateway";
            req = WebRequest.Create(uri);
            req.Method = "POST";
            req.ContentType = "text/xml";

            writer = new StreamWriter(req.GetRequestStream());
            writer.WriteLine(xmlDoc.OuterXml);
            writer.Close();
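
One thing the snippet above does not cover is what happens if the write throws: the writer is never closed. A sketch of a slightly more defensive version, using a hypothetical RequestWriter helper of my own, wraps the writer in a using block so the stream is always closed, and states the UTF-8 encoding explicitly.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original code.
public static class RequestWriter
{
    // Writes the document to the target stream as UTF-8 (no byte order mark)
    // and closes the stream, even if the write throws.
    public static void Write(XmlDocument xmlDoc, Stream target)
    {
        using (StreamWriter writer = new StreamWriter(target, new UTF8Encoding(false)))
        {
            writer.Write(xmlDoc.OuterXml);
        }
    }
}
```

In the code above it would be called as RequestWriter.Write(xmlDoc, req.GetRequestStream()); taking a plain Stream also means it can be exercised against a MemoryStream without touching the network.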

Getting a response to the request

Just as we used a WebRequest when sending the request, we use the WebResponse object to get the response, and a stream to read that response. First off get the response from the original WebRequest object. Then get a stream from the WebResponse.

            WebResponse         res                 = null;
            Stream              reader              = null;
            res = req.GetResponse();
            reader = res.GetResponseStream();

To read the stream we copy it into a MemoryStream and, once it has all been read, convert the MemoryStream into an array of bytes. This is done by the following function:

        private static byte[] ReadFully(Stream stream)
        {
            byte[] buffer = new byte[32768];
            using (MemoryStream ms = new MemoryStream())
            {
                while (true)
                {
                    int read = stream.Read(buffer, 0, buffer.Length);
                    if (read <= 0)
                        return ms.ToArray();
                    ms.Write(buffer, 0, read);
                }
            }
        }

Because an XmlDocument cannot be loaded directly from a byte array, it must first be converted into a string. Bit of a pain really, in that we started with a Stream, converted that to a MemoryStream, then to a byte array and finally into a string. The function which does this final conversion is as follows:

        private String ConvertByteArrayToString(byte[] bytes)
        {
            // The gateway declares its responses as UTF-8, so decode as UTF-8;
            // ASCII decoding would turn any non-ASCII character into '?'
            return System.Text.Encoding.UTF8.GetString(bytes);
        }

To read the response into a string now takes just one line:

String GatewayResult = ConvertByteArrayToString(ReadFully(reader));

Once this line is executed GatewayResult will hold the response from Companies House. A successful response will be something like the following XML:

<?xml version="1.0" encoding="UTF-8" ?>
<GovTalkMessage xsi:schemaLocation="http://www.govtalk.gov.uk/CM/envelope http://xmlgw.companieshouse.gov.uk/v1-0/schema/Egov_ch-v2-0.xsd" xmlns="http://www.govtalk.gov.uk/CM/envelope" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:gt="http://www.govtalk.gov.uk/schemas/govtalk/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
  <EnvelopeVersion>1.0</EnvelopeVersion>
  <Header>
    <MessageDetails>
      <Class>CompanyDetails</Class>
      <Qualifier>response</Qualifier>
      <TransactionID>634141362546959689</TransactionID>
      <GatewayTimestamp>2010-07-07T21:51:04-00:00</GatewayTimestamp>
    </MessageDetails>
    <SenderDetails>
      <IDAuthentication>
        <SenderID>YourUserID</SenderID>
        <Authentication>
          <Method>CHMD5</Method>
          <Value></Value>
        </Authentication>
      </IDAuthentication>
    </SenderDetails>
  </Header>
  <GovTalkDetails>
    <Keys/>
  </GovTalkDetails>
  <Body>
    <CompanyDetails xmlns="http://xmlgw.companieshouse.gov.uk/v1-0/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xmlgw.companieshouse.gov.uk/v1-0/schema http://xmlgw.companieshouse.gov.uk/v1-0/schema/CompanyDetails-v2-1.xsd">
	<CompanyName>MILLENNIUM STADIUM PLC</CompanyName>
	<CompanyNumber>03176906</CompanyNumber>
	<RegAddress>
		<AddressLine>MILLENNIUM STADIUM</AddressLine>
		<AddressLine>WESTGATE STREET</AddressLine>
		<AddressLine>CARDIFF</AddressLine>
		<AddressLine>CF10 1NS</AddressLine>
	</RegAddress>
	<CompanyCategory>Public Limited Company</CompanyCategory>
	<CompanyStatus>Active</CompanyStatus>
	<CountryOfOrigin>United Kingdom</CountryOfOrigin>
	<RegDateType>0</RegDateType>
	<IncorporationDate>1996-03-25</IncorporationDate>
	<Accounts>
		<AccountRefDate>30-06</AccountRefDate>
		<NextDueDate>2010-12-31</NextDueDate>
		<Overdue>NO</Overdue>
		<LastMadeUpDate>2009-05-31</LastMadeUpDate>
		<AccountCategory>FULL</AccountCategory>
		<DocumentAvailable>1</DocumentAvailable>
	</Accounts>
	<Returns>
		<NextDueDate>2011-04-22</NextDueDate>
		<Overdue>NO</Overdue>
		<LastMadeUpDate>2010-03-25</LastMadeUpDate>
		<DocumentAvailable>1</DocumentAvailable>
	</Returns>
	<Mortgages>
		<MortgageInd>LT300</MortgageInd>
		<NumMortCharges>9</NumMortCharges>
		<NumMortOutstanding>9</NumMortOutstanding>
		<NumMortPartSatisfied>0</NumMortPartSatisfied>
		<NumMortSatisfied>0</NumMortSatisfied>
	</Mortgages>
	<SICCodes>
        	<SicText>9261 - Operate sports arenas &amp; stadiums</SicText>
	</SICCodes>
	<LastFullMemDate>2010-03-25</LastFullMemDate>
	<HasUKestablishment>0</HasUKestablishment>
	<HasAppointments>1</HasAppointments>
	<InLiquidation>0</InLiquidation>
</CompanyDetails>
  </Body>
</GovTalkMessage>

This shows the default company, Millennium Stadium PLC.

The last thing to do is convert that to an XmlDocument, for manipulation however you want.

            xmlResult = new XmlDocument();
            xmlResult.LoadXml(GatewayResult);

            return xmlResult;
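
As mentioned earlier, reading values out of the response needs a second namespace, because the CompanyDetails element declares its own default namespace (http://xmlgw.companieshouse.gov.uk/v1-0/schema in the response shown above). The following is a sketch only; the CompanyDetailsReader class and the ‘ch’ prefix are names I have made up for illustration.

```csharp
using System.Xml;

// Hypothetical helper, not part of the original code.
public static class CompanyDetailsReader
{
    private const string Envelope = "http://www.govtalk.gov.uk/CM/envelope";
    private const string ChSchema = "http://xmlgw.companieshouse.gov.uk/v1-0/schema";

    // Returns the CompanyName from a CompanyDetails response, or null if it is absent.
    public static string ReadCompanyName(XmlDocument xmlResult)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(xmlResult.NameTable);
        nsmgr.AddNamespace("default", Envelope); // envelope elements
        nsmgr.AddNamespace("ch", ChSchema);      // everything inside CompanyDetails

        XmlNode node = xmlResult.SelectSingleNode(
            "default:GovTalkMessage/default:Body/ch:CompanyDetails/ch:CompanyName", nsmgr);

        return node == null ? null : node.InnerText;
    }
}
```

The prefixes are purely local to the XPath query; they do not have to match anything in the document, only the namespace URIs do.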

Dealing with Errors

With the best will in the world it would be really nice if everything worked, all the time and every time, but life likes to make things interesting. If there is a problem with the request the GovTalk service will return something like the following:

<?xml version="1.0" encoding="UTF-8" ?>
<GovTalkMessage xsi:schemaLocation="http://www.govtalk.gov.uk/CM/envelope http://xmlgw.companieshouse.gov.uk/v1-0/schema/Egov_ch-v2-0.xsd" xmlns="http://www.govtalk.gov.uk/CM/envelope" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:gt="http://www.govtalk.gov.uk/schemas/govtalk/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
  <EnvelopeVersion>1.0</EnvelopeVersion>
  <Header>
    <MessageDetails>
      <Class>CompanyDetails</Class>
      <Qualifier>error</Qualifier>
      <TransactionID>634141372294837235</TransactionID>
      <GatewayTimestamp>2010-07-07T22:07:22-00:00</GatewayTimestamp>
    </MessageDetails>
    <SenderDetails>
      <IDAuthentication>
        <SenderID>UserID</SenderID>
        <Authentication>
          <Method>CHMD5</Method>
          <Value>MD5EncryptedAuthenticationString</Value>
        </Authentication>
      </IDAuthentication>
    </SenderDetails>
  </Header>
  <GovTalkDetails>
    <Keys/>
	<GovTalkErrors>
	  <Error>
	    <RaisedBy>CompanyDetails</RaisedBy>
	    <Number>502</Number>
	    <Type>fatal</Type>
	    <Text>Authorisation Failure</Text>
	    <Location></Location>
	  </Error>
	</GovTalkErrors>
  </GovTalkDetails>
  <Body>
  </Body>
</GovTalkMessage>

In this instance the Qualifier tag will have the value ‘error’ rather than ‘response’, which is returned when the request succeeds. The details of the error can be found in the GovTalkMessage/GovTalkDetails/GovTalkErrors tag. In the above I had altered the password so that it would not authenticate, hence the ‘Authorisation Failure’ that can be seen in the Text tag contained in the Error tag.
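
A sketch of how the calling code might check for this follows. The GovTalkErrors class is a hypothetical helper of mine; it assumes the error elements sit in the envelope namespace, as they do in the sample above.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the original code.
public static class GovTalkErrors
{
    private const string Envelope = "http://www.govtalk.gov.uk/CM/envelope";

    private static XmlNamespaceManager Namespaces(XmlDocument doc)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("default", Envelope);
        return nsmgr;
    }

    // True when the Qualifier tag in the header holds the value "error".
    public static bool IsError(XmlDocument xmlResult)
    {
        XmlNode qualifier = xmlResult.SelectSingleNode(
            "default:GovTalkMessage/default:Header/default:MessageDetails/default:Qualifier",
            Namespaces(xmlResult));
        return qualifier != null && qualifier.InnerText == "error";
    }

    // Returns "Number: Text" for each Error element; an empty list if there are none.
    public static List<string> Read(XmlDocument xmlResult)
    {
        XmlNamespaceManager nsmgr = Namespaces(xmlResult);
        List<string> errors = new List<string>();

        XmlNodeList nodes = xmlResult.SelectNodes(
            "default:GovTalkMessage/default:GovTalkDetails/default:GovTalkErrors/default:Error", nsmgr);

        foreach (XmlNode error in nodes)
        {
            XmlNode number = error.SelectSingleNode("default:Number", nsmgr);
            XmlNode text = error.SelectSingleNode("default:Text", nsmgr);
            errors.Add(string.Format("{0}: {1}",
                number == null ? "" : number.InnerText,
                text == null ? "" : text.InnerText));
        }

        return errors;
    }
}
```

For the error message shown above, IsError returns true and Read returns a single entry, "502: Authorisation Failure".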

Final Comments

That completes the requesting of Company Details from Companies House – they are now in an XmlDocument for manipulation in whatever way is required. The complete function for sending the request is shown below.

        private XmlDocument GovTalkCompanyDetailRequest(String CompanyNumber, String CompanyName)
        {
            long                transactionId       = DateTime.Now.Ticks;
            String              uri                 = String.Empty;
            String              digest              = String.Empty;
            XmlDocument         xmlDoc              = null;
            XmlDocument         xmlResult           = null;
            XmlNode             TransactionID       = null;
            XmlNode             IDAuthentication    = null;
            XmlNode             SenderId            = null;
            XmlNode             companyNumber       = null;
            XmlNode             companyName         = null;
            XmlNode             AuthenticationValue = null;
            XmlNode             Body                = null;
            XmlNode             DetailRequest       = null;
            XmlNamespaceManager nsmgr               = null;
            WebRequest          req                 = null;
            WebResponse         res                 = null;
            StreamWriter        writer              = null;
            Stream              str                 = null;
            Stream              reader              = null;

            uri = "http://xmlgw.companieshouse.gov.uk/v1-0/xmlgw/Gateway";

            xmlDoc = new XmlDocument();
            str = GetRequest("CompanyDetailRequest.xml");
            xmlDoc.Load(str);

            nsmgr = new XmlNamespaceManager(xmlDoc.NameTable);
            nsmgr.AddNamespace("default", "http://www.govtalk.gov.uk/CM/envelope");

            TransactionID = xmlDoc.SelectSingleNode("default:GovTalkMessage/default:Header/default:MessageDetails/default:TransactionID", nsmgr);

            if (TransactionID != null) TransactionID.InnerText = transactionId.ToString();

            IDAuthentication = xmlDoc.SelectSingleNode("default:GovTalkMessage/default:Header/default:SenderDetails/default:IDAuthentication", nsmgr);

            // if we have found the authentication tag set the security parameters
            if (IDAuthentication != null)
            {
                SenderId = IDAuthentication.SelectSingleNode("default:SenderID", nsmgr);
                AuthenticationValue = IDAuthentication.SelectSingleNode("default:Authentication/default:Value", nsmgr);

                // "UserID", "Password" and "XMLGatewayTestUserID" stand in for your own credentials.
                // Using transactionId directly avoids a NullReferenceException if the TransactionID node was not found.
                digest = String.Format("{0}{1}{2}", "UserID", "Password", transactionId.ToString());

                SenderId.InnerText = "XMLGatewayTestUserID";
                AuthenticationValue.InnerText = EncodePassword(digest);
            }

            Body = xmlDoc.SelectSingleNode("default:GovTalkMessage/default:Body", nsmgr);

            if (Body != null)
            {
                DetailRequest = Body.SelectSingleNode("default:CompanyDetailsRequest", nsmgr);
                companyNumber = DetailRequest.SelectSingleNode("default:CompanyNumber", nsmgr);
                companyName = DetailRequest.SelectSingleNode("default:CompanyName", nsmgr);

                companyNumber.InnerText = CompanyNumber;
                companyName.InnerText = CompanyName;

            }

            req = WebRequest.Create(uri);
            req.Method = "POST";
            req.ContentType = "text/xml";

            writer = new StreamWriter(req.GetRequestStream());
            writer.WriteLine(xmlDoc.OuterXml);
            writer.Close();

            res = req.GetResponse();

            reader = res.GetResponseStream();

            String GatewayResult = ConvertByteArrayToString(ReadFully(reader));

            xmlResult = new XmlDocument();
            xmlResult.LoadXml(GatewayResult);

            return xmlResult;

        }

4 thoughts on “Using the Companies House GovTalk Service to retrieve Company Details”

  1. Pingback: 2010 in review « Developing My Cloud Application

  2. Cheers, this saved me a load of time …

    Couple of quick comments, for briefer code…

    private Stream GetRequestXML(String RequestFile)
    {
    return this.GetType().Assembly.GetManifestResourceStream("Namespace.Folder." + RequestFile);
    }

    and

    reader = res.GetResponseStream();

    StreamReader sr = new StreamReader(reader);
    string GatewayResult = sr.ReadToEnd();

    xmlResult = new XmlDocument();

    removing the need for the ReadFully and ConvertByteArrayToString methods.
