
InterSystems iKnow. Loading data from VKontakte
- Tutorial
This article continues the series of posts (one, two) about the main scenarios for using iKnow, a Natural Language Processing tool from the InterSystems technology stack.
Previous posts on this topic were mainly devoted to working with data after it has been placed in a domain (the place where all the text analysis happens). This article is about how to load information into iKnow correctly and conveniently. As an example, we will load information about VKontakte users: their personal data, posts, and so on.
The article assumes some basic background in InterSystems technologies (in particular, Caché ObjectScript).

Long road to the domain

According to the official documentation, there are two scenarios for loading data into an existing domain:
- An instance of the %iKnow.Source.Loader class is created and bound to a specific domain (the one whose id was passed to the constructor). An instance of a class that implements the lister interface is then created, and its AddListToBatch method is called with arguments describing the data to load. This adds a new list of data to the domain's current batch, and it can be done several times. To load the current batch into the domain, call the loader's ProcessBatch method. This option is better for large loads.
- An instance of a class that implements the lister interface is created and its ProcessList method is called with arguments describing the data to load; the data goes straight into the domain. This option is better for small loads.
Lister customization
The standard library offers many ready-made lister implementations (RSS lister, file lister, global lister). However, developers can also write their own implementation to suit their needs.
Before writing a lister for VKontakte posts, I wrote a wrapper in COS for some VKontakte API methods that operate on publicly accessible data. All the code is available on GitHub in the VKReader package. I decided it would be interesting if the lister could load the latest posts matching a keyword, plus a few other parameters. This turned out not to be difficult at all. The documentation chapter on customization says that to create your own lister you need to inherit from a system class and override several methods.
So, in the same package I created the class VKReader.Lister, which inherits from %iKnow.Source.Lister. If you write your own lister, it must inherit from this class as well.
Each lister must be assigned a unique short name (alias) by which iKnow system methods refer to it. If no alias is specified, the full class name of the lister is used instead.
To specify the alias, simply override the class method GetAlias in your class. For our VKontakte lister, I did it like this:
ClassMethod GetAlias() As %String
{
    Quit "VKAPI"
}
All data sources submitted for loading have an external id, which consists of the lister's short name and a full reference; the full reference, in turn, consists of the source group name and a local reference.
For the lister to work, you need to override the class methods BuildFullRef and SplitFullRef, which respectively assemble the full reference from a group name and a local reference, and split it back into those two parts. The external id in our case looks like this:
VKAPI:searchQuery:::vkPostId
Here VKAPI is the short name of our lister, the search query plays the role of the source group name, and the id of the VKontakte post is the local reference.
Code of the BuildFullRef and SplitFullRef methods:
ClassMethod SplitFullRef(domainId As %Integer, fullRef As %String, Output groupName As %String, Output localRef As %String) As %Status [ Private ]
{
    set delim = ":::"
    // The local reference is the last piece; everything before the last delimiter is the group name
    set localRef = $piece(fullRef, delim, $l(fullRef, delim))
    set groupName = $e(fullRef, 1, * - $l(localRef) - $l(delim))
    Quit $$$OK
}
ClassMethod BuildFullRef(domainId As %Integer, groupName As %String, localRef As %String) As %String [ Private ]
{
    quit groupName _ ":::" _ localRef
}
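To see why SplitFullRef takes the last ":::"-separated piece rather than the first, it can help to model the same string logic outside Caché. The following is a minimal Python sketch and not part of the VKReader package: the function names are made up for illustration, and the alias-plus-colon layout of the external id is simply copied from the example above.

```python
from typing import Tuple

DELIM = ":::"  # same delimiter as in the ObjectScript methods


def build_full_ref(group_name: str, local_ref: str) -> str:
    """Join group name and local reference, mirroring BuildFullRef."""
    return group_name + DELIM + local_ref


def split_full_ref(full_ref: str) -> Tuple[str, str]:
    """Split on the LAST delimiter, mirroring SplitFullRef."""
    group_name, _, local_ref = full_ref.rpartition(DELIM)
    return group_name, local_ref


def external_id(alias: str, full_ref: str) -> str:
    """Hypothetical helper: prefix the full reference with the lister alias."""
    return alias + ":" + full_ref


full_ref = build_full_ref("cats", "1#2#3")
print(external_id("VKAPI", full_ref))  # VKAPI:cats:::1#2#3
print(split_full_ref(full_ref))        # ('cats', '1#2#3')
```

Splitting on the last delimiter means a search query that itself happens to contain ":::" still round-trips, as long as the local reference does not contain it.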
You also need to specify which Processor will be the default for this lister. In iKnow, a Processor is the object that directly handles the processing of the loaded data. There are several kinds of processors, but since in our case the data is only held directly in memory, I decided to use the processor for temporary storage. The processor is also specified through an override:
ClassMethod DefaultProcessor() As %String
{
    Quit "%iKnow.Source.Temp.Processor"
}
All the main loading work happens in another overridden method with the self-explanatory name ExpandList. This method expands the list of sources to be loaded into the domain. The arguments passed to the ProcessList and AddListToBatch methods are the same ones you define in ExpandList.
In our case the arguments are (in order): the query string to search posts by; the number of posts; a boolean indicating whether to check the load list for an existing source with the same local reference; and limits on the post publication time.
Here is the complete method code for our case:
Method ExpandList(listparams As %List) As %Status
{
    // Unpack the arguments; the first two are read with $li, the rest with $lg
    set query = $li(listparams, 1)
    set count = $li(listparams, 2)
    set checkExists = +$lg(listparams, 3, 1)
    set startDate = $lg(listparams, 4)
    set startTime = $lg(listparams, 5)
    set endDate = $lg(listparams, 6)
    set endTime = $lg(listparams, 7)
    // Search for posts through the API wrapper
    #dim response As %ListOfObjects
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).NewsfeedSearch(.response, query, count,,, startDate, startTime, endDate, endTime)
    quit:$$$ISERR(tSC) tSC
    do ..RegisterMetadataKeys($lb("PostDate", "PostTime", "AuthorID", "AuthorCity", "AuthorCountry", "AuthorDOB", "AuthorSex"))
    // Collect author ids: negative ids are groups, positive ids are users
    set userIds = "1"
    set groupIds = "1"
    for i = 1:1:response.Count() {
        if (response.GetAt(i).FromID < 0) {
            set groupIds = groupIds _ "," _ (-(response.GetAt(i).FromID))
        } else {
            set userIds = userIds _ "," _ response.GetAt(i).FromID
        }
    }
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).UsersGet(.responseUsers, userIds, "sex, city, bdate, country")
    quit:$$$ISERR(tSC) tSC
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).GroupsGetById(.responseGroups, groupIds, "city, country")
    quit:$$$ISERR(tSC) tSC
    // Process every post that was found
    for i = 1:1:response.Count() {
        set tPostDate = response.GetAt(i).Date
        set tPostTime = response.GetAt(i).Time
        set tOwnerID = response.GetAt(i).OwnerID
        set tFromID = response.GetAt(i).FromID
        set tID = response.GetAt(i).ID
        #dim tTextStream As %GlobalCharacterStream
        set tTextStream = response.GetAt(i).Text
        if (tFromID < 0) {
            set tAuthorCity = responseGroups.GetAt(-tFromID).City
            set tAuthorCountry = responseGroups.GetAt(-tFromID).Country
            set tAuthorDOB = ""
            set tAuthorSex = ""
        } else {
            set tAuthorCity = responseUsers.GetAt(tFromID).City
            set tAuthorCountry = responseUsers.GetAt(tFromID).Country
            set tAuthorDOB = responseUsers.GetAt(tFromID).DOB
            set tAuthorSex = responseUsers.GetAt(tFromID).Sex
        }
        set tLocalRef = tOwnerID _ "#" _ tFromID _ "#" _ tID
        if (checkExists) {
            continue:..RefExists(query, tLocalRef, checkExists - 1)
        }
        set tRef = $lb(i%ListerClassId, ..AddGroup(query), tLocalRef)
        do tTextStream.Rewind()
        if (tTextStream.Size = 0) {
            continue
        }
        // Store the post text in chunks of up to 32000 characters
        set len = 32000
        while (len = 32000) {
            do ..StoreTemp(tRef, tTextStream.Read(.len))
        }
        do ..SetMetadataValues(tRef, $lb(tPostDate, tPostTime, tFromID, tAuthorCity, tAuthorCountry, tAuthorDOB, tAuthorSex))
    }
    // Not in the listing as published: the method is declared As %Status, so a status is returned here
    quit $$$OK
}
Let's go through the code in more detail.
First, we extract the arguments.
set query = $li(listparams, 1)
set count = $li(listparams, 2)
set checkExists = +$lg(listparams, 3, 1)
set startDate = $lg(listparams, 4)
set startTime = $lg(listparams, 5)
set endDate = $lg(listparams, 6)
set endTime = $lg(listparams, 7)
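The argument layout can be easier to keep in mind as a small record. Below is a rough Python model of it, not code from the package: the class and function names are invented for illustration, the date string in the usage note is only a placeholder because the article does not specify the format, and the model treats the first two positions as required while the remaining ones fall back to defaults, as the $li and $lg calls suggest.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class ListParams:
    query: str             # position 1, required
    count: int             # position 2, required
    check_exists: int = 1  # position 3, defaults to 1 as in $lg(listparams, 3, 1)
    start_date: Any = ""   # positions 4-7 default to an empty value
    start_time: Any = ""
    end_date: Any = ""
    end_time: Any = ""


def parse_list_params(params: Sequence[Any]) -> ListParams:
    """Unpack a positional parameter list into named fields."""
    if len(params) < 2:
        raise ValueError("query and count are required")

    def get(position: int, default: Any = "") -> Any:
        # 1-based lookup with a default for missing or None elements
        index = position - 1
        if index < len(params) and params[index] is not None:
            return params[index]
        return default

    return ListParams(
        query=params[0],
        count=params[1],
        check_exists=int(get(3, 1)),
        start_date=get(4),
        start_time=get(5),
        end_date=get(6),
        end_time=get(7),
    )


print(parse_list_params(["cats", 10]))
```

A caller that wants to skip the existence check and restrict the start date would pass something like `["cats", 10, 0, "<start date>"]`, leaving the trailing positions out.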
We make a request to the VKontakte API through our wrapper method. It returns a list of objects of the class VKReader.Data.Post, which contains the fields specific to a VKontakte post.
#dim response As %ListOfObjects
set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).NewsfeedSearch(.response, query, count,,, startDate, startTime, endDate, endTime)
quit:$$$ISERR(tSC) tSC
Register the metadata keys so that meta-information can be saved easily later. In the metadata we want to store the date and time the post was published, as well as the author's id, city, country, date of birth and sex.
do ..RegisterMetadataKeys($lb("PostDate", "PostTime", "AuthorID", "AuthorCity", "AuthorCountry", "AuthorDOB", "AuthorSex"))
Collect comma-separated lists of the ids of the users and groups that authored the posts we found. As in the VKontakte API, group ids are negative integers and user ids are positive.
set userIds = "1"
set groupIds = "1"
for i = 1:1:response.Count() {
    if (response.GetAt(i).FromID < 0) {
        set groupIds = groupIds _ "," _ (-(response.GetAt(i).FromID))
    } else {
        set userIds = userIds _ "," _ response.GetAt(i).FromID
    }
}
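The same partitioning by sign can be sketched in a few lines of Python. This is only an illustrative model with a made-up function name; it keeps the "1" that the original code places at the start of both lists, without assuming anything about why it is there.

```python
from typing import Iterable, Tuple


def collect_author_ids(from_ids: Iterable[int]) -> Tuple[str, str]:
    """Return (user ids, group ids) as comma-separated strings."""
    user_ids = ["1"]   # the original code seeds both lists with "1"
    group_ids = ["1"]
    for from_id in from_ids:
        if from_id < 0:
            # Group ids arrive negative; the sign is dropped before the lookup
            group_ids.append(str(-from_id))
        else:
            user_ids.append(str(from_id))
    return ",".join(user_ids), ",".join(group_ids)


users, groups = collect_author_ids([42, -7, 15, -300])
print(users)   # 1,42,15
print(groups)  # 1,7,300
```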
Get information about these users and groups using wrapper methods. They return lists of objects of the types VKReader.Data.User and VKReader.Data.Group, containing fields typical for VKontakte users and groups (city, country, and so on).
set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).UsersGet(.responseUsers, userIds, "sex, city, bdate, country")
quit:$$$ISERR(tSC) tSC
set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).GroupsGetById(.responseGroups, groupIds, "city, country")
quit:$$$ISERR(tSC) tSC
In a loop, we process all the posts found. First, we copy all the received meta-information into local variables.
set tPostDate = response.GetAt(i).Date
set tPostTime = response.GetAt(i).Time
set tOwnerID = response.GetAt(i).OwnerID
set tFromID = response.GetAt(i).FromID
set tID = response.GetAt(i).ID
#dim tTextStream As %GlobalCharacterStream
set tTextStream = response.GetAt(i).Text
if (tFromID < 0) {
    set tAuthorCity = responseGroups.GetAt(-tFromID).City
    set tAuthorCountry = responseGroups.GetAt(-tFromID).Country
    set tAuthorDOB = ""
    set tAuthorSex = ""
} else {
    set tAuthorCity = responseUsers.GetAt(tFromID).City
    set tAuthorCountry = responseUsers.GetAt(tFromID).Country
    set tAuthorDOB = responseUsers.GetAt(tFromID).DOB
    set tAuthorSex = responseUsers.GetAt(tFromID).Sex
}
The local reference consists of the wall owner's id, the sender's id and the post id, separated by hash signs.
set tLocalRef = tOwnerID _ "#" _ tFromID _ "#" _ tID
If necessary, check whether a source with the same local reference already exists.
if (checkExists) {
    continue:..RefExists(query, tLocalRef, checkExists - 1)
}
The following code could be different if another processor were selected. I use the temporary-storage processor, so I need to expand the list using the StoreTemp method (for details on each processor, see its documentation page). I also need to set the collected values for the metadata fields.
set tRef = $lb(i%ListerClassId, ..AddGroup(query), tLocalRef)
do tTextStream.Rewind()
if (tTextStream.Size = 0) {
    continue
}
set len = 32000
while (len = 32000) {
    do ..StoreTemp(tRef, tTextStream.Read(.len))
}
do ..SetMetadataValues(tRef, $lb(tPostDate, tPostTime, tFromID, tAuthorCity, tAuthorCountry, tAuthorDOB, tAuthorSex))
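The read loop above hands the post text to StoreTemp in pieces of at most 32000 characters and stops after the first short read. A minimal Python model of that chunking idea, using an in-memory text stream in place of the Caché character stream, might look as follows. The function name is invented for illustration, and unlike the original loop this sketch never yields an empty chunk.

```python
import io
from typing import Iterator

CHUNK = 32000  # same chunk size as in the ObjectScript loop


def read_in_chunks(stream: io.StringIO, chunk_size: int = CHUNK) -> Iterator[str]:
    """Yield the stream content in pieces of at most chunk_size characters."""
    stream.seek(0)  # plays the role of tTextStream.Rewind()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # nothing left to read
        yield chunk
        if len(chunk) < chunk_size:
            break  # a short read means the end of the stream was reached


text = io.StringIO("x" * 70000)
print([len(c) for c in read_in_chunks(text)])  # [32000, 32000, 6000]
```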
That's it. The lister is written!
Now let's test it.
Testing the Lister
I wrote a small web application that uses the lister we implemented and lets you browse entries, search for similar ones, add entries on demand, and delete entries from the domain. Here are some screenshots:
Initially an empty domain.

Click on the plus sign to add new posts.
In the form that appears, fill in the fields and click on the button to add entries.

After a short wait, the entries are added.

For users or groups who have made information about themselves publicly available, our lister saves it in the metadata fields, and this small demo displays it in a not-too-elegant table.
Out of the box, iKnow can show similar entries: click on the button with a target near some post and make sure that it works.

Summary
In this article we looked at how loading data into a domain works, discussed in detail how a typical lister works, and saw how to write your own. We wrote a lister for VKontakte data and made sure it really works, with the caveat that the domain and configuration were created behind the scenes.
If you would like to look behind those scenes, all the code presented, used or mentioned in the article can be found on the project page on GitHub.