Querying Wikipedia in ASP.NET using LINQ-to-Wiki

Have you ever visited Wikipedia and simply gotten lost in the sheer vastness of knowledge that is available there? Have you ever wished that something existed to let you easily create complex queries that would give you exactly what you needed, using syntax that you're already familiar with (such as LINQ)? Well then this may be just the post for you!

Introducing LINQ-to-Wiki

LINQ-to-Wiki is a library designed by Petr Onderka to query any site running MediaWiki (which includes Wikipedia) from any available .NET language. It provides extensive functionality for performing complex queries and is not limited to just reading wiki pages; it can also perform edits, content additions and more. You can request a variety of different items that would normally require a significant amount of scrolling and clicking, followed by the eventual “how did I get here” several hours later, long after you lost sight of your original goal thanks to the sheer magnitude of the site and the borderline addiction to knowledge it can evoke.

A few of the many things related to Wikipedia content that can be accessed through queries in LINQ-to-Wiki are :

  • Listing all of the articles within a category
  • Listing all of the links contained within a page
  • Grabbing images and related articles
  • Full query and search support

LINQ-to-Wiki uses the traditional LINQ queries that any .NET developer would be accustomed to, and the library then translates these into requests to the MediaWiki API for whatever big plans you are trying to conquer the world with.
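To give a rough idea of what that translation involves, here is a minimal sketch that builds a MediaWiki API request by hand for images whose names start with "Microsoft". The ApiRequestSketch class and its BuildUrl helper are hypothetical names made up for this illustration and are not part of LINQ-to-Wiki; the parameters (action=query, list=allimages, aiprefix, aiprop) come from the MediaWiki API itself. The request that the library actually generates may well look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ApiRequestSketch
{
    // Hypothetical helper (not part of LINQ-to-Wiki): joins URL-encoded
    // name/value pairs into a MediaWiki API request URL for the given host.
    public static string BuildUrl(string host, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var query = string.Join("&", parameters.Select(
            p => Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return "https://" + host + "/w/api.php?" + query;
    }

    public static void Main()
    {
        // Roughly what "all images with the prefix Microsoft" looks like as a raw API request.
        var parameters = new[]
        {
            new KeyValuePair<string, string>("action", "query"),
            new KeyValuePair<string, string>("list", "allimages"),
            new KeyValuePair<string, string>("aiprefix", "Microsoft"),
            new KeyValuePair<string, string>("aiprop", "url|size"),
            new KeyValuePair<string, string>("format", "json")
        };

        Console.WriteLine(BuildUrl("en.wikipedia.org", parameters));
    }
}
```

Writing requests like this by hand (and then parsing the JSON that comes back) is exactly the busywork that the library is meant to spare you.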

Getting Started

LINQ-to-Wiki can be accessed in the following two methods :

Once you have added the appropriate references to the LINQ-to-Wiki files to your project, then you are ready to get started!

Your First Query

Querying is really where LINQ-to-Wiki shines (as you could imagine with the cosmos of data within Wikipedia)! The actual querying process is very straightforward and really doesn’t differ much from using a traditional DataContext that you would be accustomed to working with in any other flavor of LINQ-to-X (SQL, Entities etc.).

You’ll first need to initialize a Wiki object that will act as your DataContext and the source of all of your queries. You can log in with actual credentials (if you plan on editing and performing more advanced actions), but in this demonstration we will just be focusing on querying. As far as I can tell, the string passed to the constructor simply identifies your application to the wiki, so feel free to make up your own :

var wikipedia = new Wiki("Example");

Once you have created your necessary Wiki object, then you will basically be ready to start querying. However, Wikipedia is a huge, complex data-filled cosmos and before we start adventuring around in our LINQ-powered spaceship, let’s take a look at a map to see where we can go.

Exploring the Cosmos of Wikipedia

Before we delve too deep into some serious querying, let’s review some of the properties and collections that we can use from our Wiki object. Since this post is primarily concerned with querying, we will be looking at the Query property of our Wiki object.

var query = wikipedia.Query.AdventurePlaceholder;

Some of the major members of our Query object that we will be concerned with are :

  • allcategories : This is an enumeration of all of the available Categories
  • allimages : This is an enumeration of all of the available Images
  • alllinks : This is an enumeration of all of the available Links
  • categorymembers : This lists all of the pages in a given category
  • backlinks : This finds all pages that link back to a specific page
  • search : This allows a full-text search to be performed

From each of these we can use the LINQ methods that we all know and love, such as .Where() and .Select(), and then wrap everything up to execute our query using the .ToEnumerable() method (or .ToList() if you want a list). Each of these items also has specific properties that can be accessed within your inner clauses to further narrow your search, so don’t neglect how wonderful IntelliSense can be.

Blasting off into the Cosmos (Finally!)

So let’s start out with a simple query to get ourselves off the launch pad. We will query Wikipedia for all of the images that start with “Microsoft” and return the title of each :

//This will retrieve all of the images that begin with "Microsoft" (using the built-in prefix property) and select the title of each.
var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();

That’s it! Using a simple Controller Action within MVC (for this example) we can output each of our results to a basic list within our View :

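public ActionResult QueryWiki()
{
     //Create the Wiki object that acts as our DataContext
     var wikipedia = new Wiki("Example");
     //Retrieve the titles of all of the images that begin with "Microsoft"
     var query = wikipedia.Query.allimages().Where(i => i.prefix == "Microsoft").Select(s => s.title).ToEnumerable();
     //Pass the results along to the View for display
     return View(query);
}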

along with this simple View :

<ul>
     @foreach (var image in Model){
         <li>@image</li> 
     }
</ul>

will result in a huge (and very ugly) list of all of the images within Wikipedia that begin with “Microsoft”.

[Image: Query results containing all Wikipedia images that begin with “Microsoft”]

Let’s spice it up a bit (because just text is boring)

Let’s make things a little more appealing to the eyes by pulling some additional properties besides the title of the images. We can use the url, height and width properties available from our images to create a similar list that will feature images of each of these items instead of just a plain-jane unordered list.

First, we will create a very simple class that will store the properties that we are concerned about that we can pass across to the View for display :

public class WikiImage
{
     public string Url { get; set; }
     public int Height { get; set; }
     public int Width { get; set; }
     //Simple Constructor
     public WikiImage(string url, int height, int width)
     {
          Url = url;
          Height = height;
          Width = width;
     }
}

Using our new and improved query (which will select the url, height and width properties from our image)

var query = wikipedia.Query.allimages()
            .Where(i => i.prefix == "Microsoft")
            .Select(s => new WikiImage(s.url,s.height,s.width)).ToList();

along with a few minor adjustments to the View (the controller action remains basically the same),

@foreach (var image in Model){
     <img src='@image.Url' height='@image.Height' width='@image.Width' /><br />
}

gives us our result…

(err the result is too big to easily display full-size. I’ll adjust the height and width in the view to provide a better example)
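If you would rather not hard-code the dimensions, one option is to shrink each image so that it fits within a maximum size while keeping its proportions. This is only a sketch: the ImageScaler class and its Fit method are hypothetical names invented for this example, they are not part of LINQ-to-Wiki or MVC, and the tuple return type assumes a newer C# compiler (7.0 or later).

```csharp
using System;

public static class ImageScaler
{
    // Hypothetical helper: shrinks (width, height) so that the longest side is at most
    // maxSide, preserving the aspect ratio. Images that already fit are left untouched.
    public static (int Width, int Height) Fit(int width, int height, int maxSide)
    {
        if (width <= 0 || height <= 0 || maxSide <= 0)
        {
            throw new ArgumentException("Width, height and maxSide must all be positive.");
        }

        int longest = Math.Max(width, height);
        if (longest <= maxSide)
        {
            return (width, height);
        }

        // Scale both sides by the same factor, never dropping below one pixel.
        double scale = (double)maxSide / longest;
        int scaledWidth = Math.Max(1, (int)Math.Round(width * scale));
        int scaledHeight = Math.Max(1, (int)Math.Round(height * scale));
        return (scaledWidth, scaledHeight);
    }
}
```

In the View, you could call something like ImageScaler.Fit(image.Width, image.Height, 150) for each image and use the returned values for the height and width attributes instead of the original ones.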

*ahem* And gives us our result!

[Image: Results from our new query to grab all of the images that start with “Microsoft” on Wikipedia]

Additional Complexity Coming Soon!

This post is just a simple example of some of the things that you can do using LINQ-to-Wiki. Next time, we will cover some of the more advanced features, such as using PageResults to create even more complex queries, pulling some additional data and who knows what else!

For More Information (if you just can’t wait to dig in)

If you are interested in learning a bit more about LINQ-to-Wiki, visit the GitHub page, where you can find a plethora of documentation detailing each of the individual methods and properties that you can query against. I would also highly recommend downloading the LINQ-to-Wiki Samples project, which contains all kinds of samples to get you started.

You can also download this example from github from the link below :

7 thoughts on “Querying Wikipedia in ASP.NET using LINQ-to-Wiki”

  1. I agree as well. I understand the documentation provided on the github page has all the details of the library but some of the information is a little confusing and having read through this website has undoubtedly clarify some things.

    • The details were a bit hard to swallow as there was simply so much documentation available there. I thought a simple write-up like this would help people out a bit and make things a bit easier to understand.

      Thanks again for visiting the site :)

  2. Hello Rion,

    By any chance. Do you know if it would be possible to obtain all the information from a page with a list of subjects, for instance: http://en.wikipedia.org/wiki/List_of_sports

    I have been trying to change the parameters and even going into the wiki.Query class but I have not been able to get any info from the wikipedia sandbox and therefore, not sure how to do this.

    If you have any idea or suggestion, I would really appreciate it.

    Thank you.

  3. This is cool, but please keep publicly accessible members properly camelcased. Having random methods and properties in all lowercase is really jarring compared to… literally everything in the .NET Framework.

    • I believe that all of the code that I used follows a reasonable system of naming conventions (as there wasn’t an extraordinary amount of code that I wrote with the exception of the WikiImage class).

      The lowercase properties that you are noticing are a result of the actual LINQ-to-Wiki library itself. All of the references (such as i.prefix, i.url, and the allimages() methods) are a result of the naming conventions that were used within the library.

      I wasn’t that crazy about actually having to use that syntax myself.
