Experiences Extracting XCRI
In a previous post we explored the web for XCRI resources. From that investigation, the most useful thing that we found was the XCRI directory, which lists 65 course data feeds. The types of feed break down thus:
- 40 SOAP web services
- 17 HTTP/REST web services
We will ignore the demo feeds. The HTTP/REST feeds are easy enough to access at their base URLs with an ordinary HTTP request (e.g. from your web browser), so in this post we will focus on our experiences with the SOAP endpoints, to get an understanding of the challenges involved in consistently harvesting data from them.
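To illustrate how little is needed for the HTTP/REST case, here is a minimal Python sketch that fetches a feed with a plain GET and reports its root element and how many course elements it contains. It assumes the feed is an XML document containing `course` elements and it deliberately ignores namespaces; the function names are our own, and the URL in the usage note is a placeholder rather than a real feed.

```python
import urllib.request
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def summarise_feed(xml_bytes):
    """Return (root element name, number of course elements) for a feed."""
    root = ET.fromstring(xml_bytes)
    # Assumption: courses appear as elements whose local name is "course".
    courses = sum(1 for el in root.iter() if local_name(el.tag) == "course")
    return local_name(root.tag), courses


def fetch_feed(url, timeout=30):
    """Fetch a feed with an ordinary HTTP GET and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A call such as `summarise_feed(fetch_feed("http://example.org/xcri/feed.xml"))` (hypothetical URL) would then give a quick sanity check on a feed before any further processing.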
Discovering the endpoints
Whether a feed is HTTP/REST or SOAP, the XCRI directory has some significant limitations in terms of accessing its listing. There is a web user interface, but no machine-readable API. In order to extract the URLs from which the feeds can be obtained, it is necessary to click through manually from the directory listing to a feed-specific page, to obtain the endpoint URL. For example, consider the entry for Wolds College.
As you can see from that page, we can then manually extract the following useful information:
- Data Link: http://host.igsl.co.uk:7101/Lincs-webservice-context-root/xxpSoapHttpPort?WSDL
- A related website: http://www.xxp.org/getlincscourses.html
- A description of the service: The SOAP call accepts a single parameter specifying the UKPRN of the provider, in this case 10022722.
This is a painstaking way of obtaining the information, and as the directory grows the problem will become compounded.
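Once an endpoint has been found, the service description quoted above (a single parameter carrying the UKPRN) translates into a SOAP request roughly like the one built below. This is only a sketch under stated assumptions: the method name `getLincsCourses` is the one mentioned later in this post, but the target namespace `urn:example:xcri`, the parameter element name `ukprn` and the empty SOAPAction are placeholders. The real values, and the address to POST to, have to be read from each service's WSDL.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope


def build_soap_request(method, namespace, params):
    """Return a SOAP 1.1 envelope (bytes) that calls `method` with `params`."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{method}")
    for name, value in params.items():
        # Assumption: parameters are unqualified child elements of the call.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def post_soap(url, envelope, soap_action=""):
    """POST the envelope to a SOAP endpoint and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=envelope,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{soap_action}"',
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


# Placeholder namespace and parameter name; the UKPRN is the one quoted above.
envelope = build_soap_request(
    "getLincsCourses", "urn:example:xcri", {"ukprn": 10022722}
)
```

Even in this small example, three of the values have to be looked up by hand for each provider, which gives a flavour of the integration cost.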
How to use the endpoints
One of the most significant issues for rapidly integrating the XCRI feeds is the heterogeneity in implementation of the SOAP APIs. For example, let us consider two different organisations, The Open University and Lincolnshire Teenage Services.
There are a number of striking things which affect the ease with which these services can be integrated into, for example, an aggregator:
- Each service has a different SOAP method to be invoked to get the data. The Open University uses getOUCourses, and Lincolnshire Teenage Services uses getLincsCourses.
- Each service takes different arguments to its SOAP method in order to get the data. The Open University uses ALL, UG or PG, and Lincolnshire Teenage Services uses college identifiers (there is no equivalent to an ALL).
- Each service is documented differently.
Together, these differences make a standardised approach to harvesting data from the services all but impossible, so each service will need to be incorporated in its own custom way. There would be a significant advantage in standardising the SOAP methods to some degree or, better still, in abandoning SOAP altogether and insisting on a REST API.
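In practice, "its own custom way" tends to mean keeping a hand-maintained record per service. The sketch below shows one way this might look in Python, using the two method names and argument styles described above. The `SoapFeed` class and `planned_calls` function are hypothetical names of our own, and the single college identifier listed for Lincolnshire Teenage Services assumes that the Wolds College UKPRN quoted earlier belongs to that service.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SoapFeed:
    """Hand-maintained description of how to harvest one SOAP service."""
    provider: str
    method: str                 # SOAP method to invoke
    arguments: Tuple[str, ...]  # one call is made per argument


FEEDS = [
    # The Open University accepts ALL, UG or PG, so one call is enough.
    SoapFeed("The Open University", "getOUCourses", ("ALL",)),
    # No ALL equivalent here: every college identifier must be listed.
    SoapFeed("Lincolnshire Teenage Services", "getLincsCourses", ("10022722",)),
]


def planned_calls(feeds):
    """Expand each feed into the (provider, method, argument) calls to make."""
    return [(f.provider, f.method, arg) for f in feeds for arg in f.arguments]


calls = planned_calls(FEEDS)
```

Every new SOAP provider adds another entry that somebody has to research and keep up to date, whereas a REST feed would need little more than its URL.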
We are currently working on software which will ease the process of integrating XCRI sources from both SOAP and REST endpoints, and a future blog post will discuss that work.