Information system based on Semantic MediaWiki


Foreword


It is very fortunate that a good article about Semantic MediaWiki appeared a few days ago. Without claiming an equally thorough presentation of the material, I will pick up the baton and describe my practical experience of using MediaWiki, starting from almost zero knowledge. I apologize to ganqqwerty, the author of that first article, for running ahead and covering Semantic Forms.

Start


At the beginning of the year I volunteered for a task outside my core duties - creating an information system for our organization. Now that the solution has more or less taken shape, I will try to share my experience.

Our employees report on their achievements every year. Quantitative indicators are calculated from these reports, and various summary tables are of interest too. In short, the reports contain plenty of genuinely useful information, and it makes sense to make that information easy to get at.

Previously, everything was kept in an Excel workbook with a fixed structure. Each employee filled out their own sheet, and the indicators were calculated by predefined formulas. That, by and large, was where the information ended its journey - if it was needed anywhere else, it had to be extracted all over again.

As always happens, this was not the idea I had proposed at all - I wanted, roughly speaking, to build our own little VKontakte so that colleagues would know more about each other. In the minds of the bosses the idea was transformed and came back at me as this project: great, we will definitely do it, but the annual reports are just around the corner, so can this information be entered into such a system? I put on a "dashing and silly" look, answer in the affirmative and go off to study the subject.

Task


So, the requirement is to create, very quickly, a website where every user can easily post information of a given structure. This information must be easy to process: calculating indicators, building lists and tables. Search is needed too, of course, and not just full-text search but search that takes the structure of the information into account.

MediaWiki


There was almost no time to study implementation options, so I had to trust my intuition. I decided that if MediaWiki is used successfully in large projects, above all Wikipedia, then it should suit us too. Surely they do not do everything by hand there, so the automation tools I need ought to exist.
As befits a serious system, MediaWiki has an extension mechanism, which gives hope that everything necessary has already been written.

Installation and initial setup of MediaWiki went exactly as described in the instructions. This is a sign of product maturity - with Redmine I had to tinker much longer.

Setting up LDAP authentication took somewhat longer because of some obscure error, but in the end everything worked and employees could log in with their existing credentials. Access for anonymous users was denied completely.

Facilitating input - forms


The first task is to spare users from wiki markup. This barrier is too high: at best I would be flooded with questions, at worst nobody would use the system. So I looked for an extension that allows information to be entered through forms. After a couple of modest, long-abandoned extensions I found what I needed: Semantic Forms.

This extension lets you create form descriptions, which are placed on pages in the Form namespace.
For example, the description of the form for entering information about an employee is on the page Form:Сотрудник (Сотрудник means "Employee"; the fields Должность and Отдел are "Position" and "Department") and, as a first approximation, looks like this:

<noinclude>
This text will be shown when viewing the page.
It usually contains a description of the form.
The form definition itself is inside the includeonly tag.
</noinclude><includeonly>
{{{for template|Сотрудник}}}
Должность: {{{field|Должность}}}
Отдел: {{{field|Отдел}}}
{{{end template}}}
</includeonly>


Now a special parser function call has to be inserted on some page:

Enter the employee's full name (surname, first name, patronymic) to create or edit their page:
{{#forminput:form=Сотрудник}}


The result will be an input field for the name of the new / edited page and a button:



As expected, clicking on the button will open a page with the form:


Templates


Now comes the most interesting part. In what form is the data entered through the form stored, and what can be done with it next?

Proceeding to editing the source text of the saved page, you can see the following construction:

{{Сотрудник
|Должность=начальник
|Отдел=особый
}}


This is a call of the template Template:Сотрудник with the parameters Должность and Отдел set to начальник ("boss") and особый ("special") respectively. Templates are defined on pages in the Template namespace and determine what a template call is replaced with. Parameter names written in triple braces are replaced by the values of the corresponding template parameters. If you define the template this way:

Должность:  {{{Должность}}}
Отдел: {{{Отдел}}}
[[Category:Сотрудник]]


then Ivanov Ivan Ivanovich’s page will look like this:



The last line of the template definition states that the page belongs to the category Сотрудник. Each category has its own page in the Category namespace (in our case Category:Сотрудник), which lists all pages of that category. On the same page you can set special category properties, for example the form to be used for editing pages of the category:

[[Has default form::Сотрудник]]


Semantic MediaWiki and semantic annotations (properties)


Categories alone are not enough to structure information. This is where the heavy artillery comes in: the Semantic Forms extension led me to Semantic MediaWiki. This extension allows semantic annotations to be defined explicitly. To make this easier to grasp, programmers can think of wiki pages as objects and of semantic annotations as named properties of those objects. From here on I will simply call them properties. The syntax for defining a property resembles the syntax for assigning a category (category membership can itself be regarded as a property of the object):

[[Отдел::особый]]


In our template, Должность (Position) and Отдел (Department) are natural candidates for properties. Let's record this in the template:

Должность: [[Должность::{{{Должность}}}]]
Отдел: [[Отдел::{{{Отдел}}}]]
[[Category:Сотрудник]]


Visually, almost nothing has changed - where the property is defined, its value is displayed, that is, the value of the template parameter:



By default a property has the type Page, meaning that its value is the name of a wiki page. That is why the property values turned red - this is how links to non-existent pages are shown. If the pages existed, the links would be blue. The type of a property can be changed. A question to check the reader's grasp of the main ideas: where and how is the type of a property changed?

  • Where: like other entities, properties have their own namespace. The attributes (type and so on) of the property Должность are therefore set on the page Property:Должность.
  • How: naturally, with the very same mechanism of properties. Let's give the property Должность the type String and a set of allowed values:


This is a property of type [[Has type::String]].
The allowed values for this property are:
* [[Allows value::начальник]]
* [[Allows value::дурак]]


By the way, this change also affects the form: the input field Должность turns into a drop-down list with the corresponding values.

Note: if the magic does not work, you will have to add the parameter property to the field definition. Its value is the name of the property that the field feeds:

{{{field|Должность|property=Должность}}}
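The same mechanism covers other datatypes. As a minimal sketch, assuming that the departure date property Отъезд (used for business trips later in this article) should hold dates, the page Property:Отъезд could contain the line below. Date is one of the standard Semantic MediaWiki datatypes; choosing it here is my assumption, and how the form field changes as a result depends on the installed Semantic Forms version.

```wikitext
<!-- Contents of the page Property:Отъезд (sketch; the choice of the Date type is an assumption) -->
This is a property of type [[Has type::Date]].
```

With a typed date, queries can compare and sort the values as dates rather than as page names.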


Queries


What remains is to deal with data processing. Categories and properties can be used in queries, and query results can be embedded in the text of pages. Instead of "Hello, world!", let's display a table of employees:

{{#ask: [[Category:Сотрудник]] 
|?Должность
|?Отдел
|format=table}}


First, a few words on the syntax in general: {{#f: ... }} is a call of the function named f. Functions are defined by extensions; I have not tried defining my own. Vertical bars separate the function's parameters. So here we have a call of the ask function with four parameters.

The query consists of two parts. The first part (the first parameter of ask) selects the pages that satisfy a condition, in this case membership in the category Сотрудник. The second part (the remaining parameters) determines how the results are output. Here it will be a table with three columns:
  1. The page name. This column is shown by default, but it can be suppressed if necessary with the parameter mainlabel=-.
  2. Должность (Position). We requested this column explicitly.
  3. Отдел (Department). And this one too.
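As a sketch of the output options mentioned in this list, the query below hides the page-name column with mainlabel=- and sorts the rows by department, which gives something like an anonymous staffing list. The sort and order parameters are standard ask parameters; the choice of sorting key is only for illustration.

```wikitext
<!-- Sketch: hide the page-name column and sort rows by department -->
{{#ask: [[Category:Сотрудник]]
|?Должность
|?Отдел
|mainlabel=-
|sort=Отдел
|order=asc
|format=table}}
```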


If necessary, you can add a filter, for example to select the employees of one department (in that case the Отдел column can be dropped, since it would show the same value in every row):

{{#ask: [[Category:Сотрудник]] [[Отдел::особый]]
|?Должность=a column heading can go here
|format=table}}
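Conditions can be varied further. The sketch below shows two variations, assuming the same category and properties as above: a value disjunction written with ||, and the count format, which returns the number of matching pages instead of a table. The department name общий ("general") is hypothetical and appears only for illustration.

```wikitext
<!-- Sketch: employees of either of two departments; "общий" is a hypothetical department name -->
{{#ask: [[Category:Сотрудник]] [[Отдел::особый||общий]]
|?Должность
|?Отдел
|format=table}}

<!-- Sketch: number of employees in the department "особый" -->
{{#ask: [[Category:Сотрудник]] [[Отдел::особый]]
|format=count}}
```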


The output formats of the ask function are quite capable. In particular, I used format=sum to total the values of a given property over the pages found. For example, if every employee has a Salary property, this is how the total salary of a department can be calculated.
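The salary example could be sketched as follows. Зарплата (Salary) is a hypothetical property that does not appear elsewhere in this article, and it is assumed to be declared with a numeric type on its Property page; the limit value is likewise only an illustration.

```wikitext
<!-- Sketch: Зарплата (Salary) is a hypothetical property, assumed to be declared with [[Has type::Number]] -->
{{#ask: [[Category:Сотрудник]] [[Отдел::особый]]
|?Зарплата
|format=sum
|limit=500}}
```

The limit parameter is included because a query returns only a limited number of results by default, and a truncated result would silently lower the total. The actual defaults and maximum depend on the wiki configuration.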

Calculations


For more complex calculations, the ParserFunctions extension offers a set of functions similar to control constructs (if and switch) and expressions in programming languages.
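A few typical uses inside a template, as a minimal sketch. The functions #if, #switch and #expr are provided by ParserFunctions; the parameter names are those of the Сотрудник template above, while the display labels are made up for illustration.

```wikitext
<!-- Annotate the department only when the parameter was actually filled in -->
{{#if: {{{Отдел|}}} | Отдел: [[Отдел::{{{Отдел}}}]] | Отдел: not specified }}

<!-- Map a position to a display label; the labels are made up for illustration -->
{{#switch: {{{Должность|}}}
| начальник = Head of department
| #default = Staff member
}}

<!-- Arithmetic: evaluates to 14 -->
{{#expr: 2 * (3 + 4)}}
```

Writing {{{Отдел|}}} with an empty default matters here: without it, an omitted parameter expands to the literal text {{{Отдел}}}, which #if would treat as non-empty.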

Loops are not supported directly. Recursion over auxiliary templates can be used instead, but it improves neither readability nor performance. There is a separate LoopFunctions extension for loops, but I have not tried it.

ParserFunctions was enough for my computational tasks, but as a general solution it would be interesting to find an extension that allows some scripting language to be used inside the wiki. Possible candidates, if I understood their descriptions correctly, are:
  1. Scribunto - an extension for embedding scripting languages; so far only Lua is supported;
  2. Script - calculations in R;
  3. Winter (Wiki Interpreter) - its own language, reminiscent of PHP and a bit of LISP, according to the documentation;
  4. StackFunctions - almost PostScript, without the graphics.


When choosing an extension for a scripting language, pay attention to security!

Subobjects


Objects that have no fields whose values are lists of entities are too simple a case. Real life is much harder, and you need to be able to cope with it.

Suppose you want to let employees keep records of their business trips: departure and return dates and the purpose. Simply adding a field for arbitrary text will not do - the structure of the information is lost, and with it the ability to analyze it.

One could create a separate page for every trip and show on the employee's page the result of a query for their trips (those who wish can implement the corresponding forms, templates and queries as an exercise). Quite often, though, this approach complicates data entry unnecessarily. If need be, everything can be fitted on one page.

By tradition, let's start with the user interface. If a template parameter holds a list of subobjects, then the form field for that parameter must be associated with a form section that defines a single subobject.
The Semantic Forms extension will then automatically generate an interface for managing the list of subobjects.
That was hard to formulate and is probably even harder to read, so here is an example.

For the Командировки (Business trips) field of the Сотрудник form, specify the parameter holds template. Below it (it will not work otherwise), define another template section, Командировка (Business trip), with two parameters: multiple, meaning it can be entered several times, and embed in field=Сотрудник[Командировки], meaning this section supplies the value of the Командировки field of the Сотрудник template:

{{{for template|Сотрудник}}}
...
{{{field|Командировки|holds template}}}
{{{end template}}}
{{{for template|Командировка|label=Командировки|multiple
|embed in field=Сотрудник[Командировки]}}}
Отъезд: {{{field|Отъезд}}}
Возвращение: {{{field|Возвращение}}}
Цель: {{{field|Цель}}}
{{{end template}}}


Note: the same template (Командировка) cannot be bound to several fields (Командировки and some other one). You have to create intermediate templates.

The result will be this interface:


After the page is saved, the value of the Командировки field will be a list of calls of the Командировка template:

{{Сотрудник
...
|Командировки={{Командировка
|Отъезд=2013/04/30
|Возвращение=2013/05/10
|Цель=and get some rest while there
}}{{Командировка}}
}}


In the definition of the Сотрудник template, substituting the Командировки parameter triggers recursive expansion of the Командировка template calls. That template is defined as follows:

{{#subobject:
|Отъезд={{{Отъезд}}}
|Возвращение={{{Возвращение}}}
|Цель={{{Цель}}}
}}


Now the employee's page is not the only object: each business trip on that page defines its own subobject. The same query language is used to retrieve subobjects. By adding the following query to the definition of the Сотрудник template:

{{#ask: [[-Has subobject::{{FULLPAGENAME}}]]
|?Отъезд
|?Возвращение
|?Цель}}


we get a table of all the employee's business trips. The only new thing here is the property Has subobject. It is defined automatically on every page, and its value is the set of subobjects defined on that page. The minus sign at the beginning means that the property is inverted, that is, the link is followed backwards, from the subobject to the page. {{FULLPAGENAME}} is a built-in variable whose value is the name of the current page. Thus we select the trips of the current employee.

The documentation describes this point rather vaguely, and part of the information is found only in the discussion, so I had to proceed by trial and error. In the end I arrived at both a solution and an understanding, which I am sharing here.
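Subobjects can also be queried without reference to the page that holds them. The sketch below lists the business trips of all employees: the condition [[Отъезд::+]] matches every object that has any value for the departure property. The second query narrows this to trips from 2013 onwards, which assumes that Отъезд is declared with the Date type; whether > includes the boundary date depends on the wiki's comparator settings.

```wikitext
<!-- Sketch: all business trips of all employees, sorted by departure date -->
{{#ask: [[Отъезд::+]]
|?Отъезд
|?Возвращение
|?Цель
|sort=Отъезд
|format=table}}

<!-- Sketch: only trips that started in 2013 or later (assumes Отъезд has the Date type) -->
{{#ask: [[Отъезд::>2013/01/01]]
|?Отъезд
|?Возвращение
|?Цель
|format=table}}
```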

Access rights


Of course, at the start of the project I had stipulated that MediaWiki is weak at restricting access rights and that any registered user would be able to view all the information. However, appetite comes with eating, and access restrictions had to be bolted on after all.

A survey of the extensions that implement access control showed that IntraACL is the leader. It consists of an extension plus a patch to MediaWiki itself. There is still no guarantee of complete control, because other extensions have direct access to the database and, strictly speaking, each of them would have to be reviewed and patched. Fortunately, this level of security suited everyone.

Unfortunately, the ready-made patch existed only for MediaWiki 1.18.6, while I had already installed 1.20.2 and loaded a fair amount of data. I had to spend several days porting the patch. As luck would have it, the very next day after I got everything working, a ready-made patch for MediaWiki 1.20.3 appeared.

When installing, pay attention to the index of the ACL namespace - it must not conflict with other namespaces. It looks as if everything should just work, because the file HACL_GlobalFunctions.php sets this index to 300 by default:

if (!isset($haclgNamespaceIndex))
        $haclgNamespaceIndex = 300;


But in HACL_Initialize.php the variable has already been initialized to an unsuitable value:

$haclgNamespaceIndex = 102;


IntraACL allows you to define user groups and assign groups and individual users rights for specific pages, namespaces and categories. The definitions of groups and access rules are stored on pages in the ACL namespace. You can work through the graphical interface or directly edit the source of wiki pages.

I ran into an annoying quirk: if an access list is created before the user account exists, the rights do not take effect until the page with the access list is saved again. This was a nuisance until I found the script maintenance/createAndPromote.php and modified it so that ordinary users could be created without waiting for them to log in. Recall that I know the list of users in advance.

If I had known from the start about the Mediawiki4Intranet bundle, which includes IntraACL, I would probably have used it and saved myself a few days.

Server Code Debugging


While porting the IntraACL patch, I got to know the debugging tools. I particularly liked the debugging console (the debug toolbar), which lets you view the log output right on the page. It is enabled like this:

$wgDebugToolbar = true;


Conclusion



I liked Semantic MediaWiki as a basis for building an information system. The tasks that were set have essentially been solved:
  • Users enter and edit information on their own.
  • Indicators are calculated and summary tables are displayed from this information.
  • Access rights can be defined for individual parts of the system.
  • A large number of ready-made extensions and good documentation make it possible to add new functionality quickly.


There are also disadvantages:
  • As far as I remember, wiki markup was invented to make creating pages easier for ordinary users. For tasks like these it is too cumbersome. The same constructions written in the syntax of a programming language would, in my opinion, look simpler. Although it may all be a matter of habit.
    Forms hide the markup from ordinary users, but the administrators who define forms and templates still have to struggle with it.
  • Query results are updated with a delay. To refresh the information, the page often has to be saved again. This can be a source of errors.
  • I am somewhat bothered by the global names of properties, templates and other service entities. Programming languages are stricter about this.


Acknowledgments


I am very grateful to:
