Reposting Twitter (or RSS) to a vkontakte.ru status in Haskell
This article is about a small program that reposts tweets to a VKontakte status.
The task is quite simple and completely unoriginal. It all started when I read an article on Habr about how this is done in Python, and a similar article about PHP. There even seem to be online services made specifically for this task. But the whole point here is to solve this simple problem yourself, with your favorite tools. In fact, the PHP solution appeared later, and for the same reason.
So what did I write it in? Haskell, natürlich!
I will describe in detail how I did everything and how to repeat it. Hopefully no special knowledge is needed to follow along.
Introduction
Those two articles, together with an article about reposting from RSS to LiveJournal in Haskell, helped me implement the solution.
At first, I wanted to do it properly through the Twitter API: I tried the corresponding library from Hackage, but it didn't work right away, so I dropped it. I wanted a result quickly and was too lazy to dig in and figure out what I was doing wrong. Since Twitter publishes an RSS feed, and reading RSS in Haskell is already a solved problem, I went that way.
Moreover, this is a more universal solution: you can broadcast any RSS channel to VKontakte. You could even say this is not twitter2vkontakte but rss2vkontakte.
In addition, I used the VKontakte API instead of parsing the page in search of the status, as my predecessors did. I think this is a plus.
The rest is literate Haskell code. That is, not code with comments, but detailed comments with pieces of code, which together form a normal Haskell source file. This post can simply be saved as a whole to a file with the .lhs extension and fed to the interpreter or compiler. Everything should work fine.
All the working code here is marked with the > symbol at the start of each line.
Necessary preparations
It is assumed that you already have a Haskell compiler and the basic set of libraries. If not, that is easy to fix: install the Haskell Platform. It is very simple.
Now, to install the additional libraries, just type in the console:
cabal update
cabal install regex-tdfa curl feed utf8-string
The following is a short list of imports with brief explanations.
A couple of times I used regular expressions:

> import Text.Regex.TDFA ((=~))

Once I joined a list with a separator:

> import Data.List (intercalate)

For all Internet requests I used the curl library:

> import Network.Curl (curlGetString)
> import Network.Curl.Opts

I read and parse RSS feeds with the feed library:

> import Text.Feed.Import (parseFeedString)
> import Text.Feed.Query (getFeedItems, getItemSummary)

And once I encoded a string as UTF-8:

> import Codec.Binary.UTF8.String (encodeString)

What follows is more substantial code, with more substantial and possibly sometimes overly detailed explanations...
Twitter via RSS
The first thing we need is the address of our tweets' RSS feed. You can find it on your Twitter page. Let's make a separate constant for it:

> feedUrl = "https://twitter.com/statuses/user_timeline/22251772.rss"

I peeked at how to fetch and parse RSS feeds in an article about rss2lj. But I did not use that library. Everything there is well done, of course, but I needed just one simple function that downloads the RSS feed, takes the first item and extracts its contents. Here's how I did it:

> getTweet :: IO String
> getTweet = do
>     (_,feed) <- curlGetString feedUrl []
>     return $ getMsg $ head $ getItems feed
>   where
>     getItems = maybe (error "rss parsing failed!") getFeedItems . parseFeedString
>     getMsg   = maybe (error "rss-item parsing failed!") format . getItemSummary
>     format   = unwords . ("twitter:":) . tail . words . encodeString

Let me explain what happens here. The function curlGetString :: URLString -> [CurlOption] -> IO (CurlCode, String) takes a URL and a list of options, and returns a status code (CurlOk if everything went well) together with the server response. Here we pass our Twitter RSS feed as the address and give no options. We ignore the status code; the substantial part of the response is bound to feed. The next line should be read from right to left: we extract the items of the feed (getItems feed), get a list, take its first element (head), extract the message from it (getMsg), and return it as the result.
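To see that the $ chain in getTweet and a composition with dots describe the same right-to-left pipeline, here is a toy sketch that runs with base alone (it has no > marks, so it stays outside the .lhs program). The names toyItems and toyMsg are hypothetical stand-ins, not the feed library: a "feed" is just a newline-separated string and each line is one "item". The block also shows listToMaybe as a total alternative to head, which fails with an error on an empty list.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical stand-ins, not the feed library:
-- a "feed" is a newline-separated string and every line is one "item".
toyItems :: String -> [String]
toyItems = lines

toyMsg :: String -> String
toyMsg item = "twitter: " ++ item

-- Same shape as the last line of getTweet: read it from right to left.
withDollar :: String -> String
withDollar feed = toyMsg $ head $ toyItems feed

-- The same pipeline written as a composition.
withDots :: String -> String
withDots = toyMsg . head . toyItems

-- Total variant: an empty feed gives Nothing instead of an error from head.
firstMsg :: String -> Maybe String
firstMsg = fmap toyMsg . listToMaybe . toyItems

main :: IO ()
main = do
  let feed = "newest tweet\nolder tweet"
  putStrLn (withDollar feed)  -- twitter: newest tweet
  putStrLn (withDots feed)    -- twitter: newest tweet
  print (firstMsg feed)       -- Just "twitter: newest tweet"
  print (firstMsg "")         -- Nothing
```

With an empty feed, withDollar and withDots both fail inside head, while firstMsg returns Nothing; the same idea could let getTweet report an empty feed instead of crashing.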
Now, in more detail about these functions, in the same order. Each of them is written in point-free style, that is, without naming the argument, simply as a composition (the dot .) of other functions.
A composition is also read from right to left, dot by dot, that is, in the order in which the functions are applied.
getItems
First the function parseFeedString (from the feed library) is applied. Its type is String -> Maybe Feed: it takes a string containing whatever mess of RSS tags as input and returns an abstract feed value that you can actually work with. Since the result is a Maybe Feed ("maybe there is a feed"), the parser may choke and return Nothing; in that case we raise an error with the text "rss parsing failed!". If parsing succeeds, we get a value Just feed and apply getFeedItems to it, which extracts the list of items from the feed. This branching (Nothing or Just ...) is implemented by the standard function maybe.
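The maybe (error ...) f . parse pattern used in getItems can be tried without the feed library (the sketch has no > marks, so it stays outside the .lhs program). Here parseNumbers is a hypothetical stand-in for parseFeedString: it returns Nothing unless every word parses as an Int, much as the real parser returns Nothing on broken RSS. The Either variant is only one possible alternative to error, not what the program above does.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for parseFeedString :: String -> Maybe Feed:
-- it succeeds only if every word of the input parses as an Int.
parseNumbers :: String -> Maybe [Int]
parseNumbers = mapM readMaybe . words

-- Same shape as getItems: Nothing becomes an error,
-- Just xs is passed on to the next function (here: keep the even numbers).
getEvens :: String -> [Int]
getEvens = maybe (error "parsing failed!") (filter even) . parseNumbers

-- Variant that hands the failure back to the caller instead of calling error.
getEvensEither :: String -> Either String [Int]
getEvensEither = maybe (Left "parsing failed!") (Right . filter even) . parseNumbers

main :: IO ()
main = do
  print (getEvens "1 2 3 4")        -- [2,4]
  print (getEvensEither "1 two 3")  -- Left "parsing failed!"
```

getEvens "1 two" would stop the program with "parsing failed!", whereas getEvensEither returns the failure as an ordinary value.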
After getItems has run, we get a list of feed items [Item]. We need only the first of them (that is, the most recent one), and we take it with head. Now we want to dig the message text out of it: that is the job of getMsg.
This function has a structure similar to getItems: first getItemSummary is applied, which returns Maybe String. If the content could not be extracted, we raise a corresponding error; otherwise we format the message. Formatting (format) works as follows, again reading from right to left: encode the string as UTF-8, split it into words (on whitespace), throw away the first word, put "twitter:" in its place (optional), and glue all the words back into one string. The first word of a tweet in the RSS feed is always your nickname, which is why we drop it.
That's all for RSS. I may have described everything in unnecessary detail, but I think that for curious readers unfamiliar with Haskell this description was useful.
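Here is format in isolation, as a sketch that runs with base alone (no > marks, so it stays outside the .lhs program). encodeString is left out because utf8-string may not be installed; for the plain ASCII sample it would return its input unchanged. The nickname alice is made up. The formatSafe variant is not part of the program above: it uses drop 1 instead of tail, so an empty summary yields just "twitter:" instead of an error.

```haskell
-- format from getTweet without encodeString, so it runs with base alone;
-- for ASCII input encodeString would return the string unchanged anyway.
format :: String -> String
format = unwords . ("twitter:":) . tail . words

-- Variant (not in the program): drop 1 instead of tail,
-- so an empty summary gives "twitter:" rather than an error.
formatSafe :: String -> String
formatSafe = unwords . ("twitter:":) . drop 1 . words

main :: IO ()
main = do
  -- Made-up RSS summary: the nickname comes first, then the tweet text.
  putStrLn (format "alice: Reading   about Haskell")  -- twitter: Reading about Haskell
  putStrLn (formatSafe "")                            -- twitter:
```

Note that the round trip through words and unwords also collapses runs of spaces, as the sample shows.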
VKontakte API
First of all, let's create a few constants for working with VKontakte:

> email = "Your e-mail"
> uid   = "Your VKontakte user id"
> pass  = "Your password"

These are the credentials of your VKontakte account. All operations are performed with GET requests to the server (with the same function curlGetString) at correspondingly tricky addresses. They are built as follows: a base address (for example userapi.com/data?) plus a list of parameters of the form key=value, separated by ampersands (&).
To build such addresses, we write a couple of auxiliary functions:

> param :: (String, String) -> String
> param (key, value) = key ++ "=" ++ value ++ "&"

This function simply takes a (key, value) pair and makes a string of the required format from it.

> formUrl :: String -> [(String, String)] -> String -> String
> formUrl base opts sid = base ++ concatMap param (opts ++ [("id", uid)]) ++ sid

We build a URL of the required format from the base address base, the list of options opts (as pairs), and the session identifier sid (more on it later). The part in brackets works like this: map takes a function and a list and applies the function to each element of the list, so from a list of pairs (key, value) it makes a list of strings "key=value&". Then concat simply glues all these strings into one (concatMap = concat . map).
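To check the shape of the URLs these helpers produce, here is a standalone copy of param and formUrl (no > marks, so it stays outside the .lhs program). The uid value 12345 and the sid are made-up placeholders. The sketch also confirms that concatMap param and concat . map param build the same string. The trailing & after every pair is what lets formUrl simply append sid at the end.

```haskell
-- Copies of param and formUrl from the post; uid is a made-up placeholder.
uid :: String
uid = "12345"

param :: (String, String) -> String
param (key, value) = key ++ "=" ++ value ++ "&"

formUrl :: String -> [(String, String)] -> String -> String
formUrl base opts sid = base ++ concatMap param (opts ++ [("id", uid)]) ++ sid

-- concatMap param and its spelled-out form concat . map param.
viaConcatMap :: [(String, String)] -> String
viaConcatMap = concatMap param

viaConcatOfMap :: [(String, String)] -> String
viaConcatOfMap = concat . map param

main :: IO ()
main = do
  putStrLn (viaConcatMap [("a", "1"), ("b", "2")])    -- a=1&b=2&
  putStrLn (viaConcatOfMap [("a", "1"), ("b", "2")])  -- a=1&b=2&
  putStrLn (formUrl "http://userapi.com/data?" [("act", "set_activity")] "sid=abc123")
  -- http://userapi.com/data?act=set_activity&id=12345&sid=abc123
  putStrLn (formUrl "http://login.userapi.com/auth?" [("site", "2")] "")
  -- http://login.userapi.com/auth?site=2&id=12345&
```

The last call mirrors the login request, where sid is still empty and the URL simply ends with the & of the id pair.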
For different tasks the set of options differs, but in all cases you need to specify the user identifier (uid), so to avoid writing this option every time, we add it right in the definition of this function.
To work with VKontakte at all, you first have to log in. The server then gives us cookies and a session identifier (sid). I did not use the cookies, but the sid is needed for almost any operation that reads or changes user data. The authentication address has a bunch of options whose purpose I did not understand, but I took them from the documentation, and nothing works without them. We build this address with the function we just wrote, formUrl:

> login :: IO String
> login = do
>     (_,headers) <- curlGetString authUrl [CurlHeader True]
>     return (headers =~ "sid=[a-z0-9]*" :: String)
>   where
>     authUrl = formUrl "http://login.userapi.com/auth?"
>                 [("site","2"), ("fccode","0"),
>                  ("fcsid","0"), ("login","force"),
>                  ("email",email), ("pass",pass)] ""

Here formUrl puts our email and password into the last two options. The sid parameter stays empty: we don't have it yet, and getting it is exactly why we wrote the function login. Here is what happens in it: a curl request is sent to the address authUrl, and we ask for the response headers (that is what the CurlHeader option is for). They contain cookies, a redirect address and a few other things. The redirect address, where the server sends us, hides what we are looking for. Using the secret regex technique, the coveted session id of the form "sid=35dfe55b09b599c9fx622fcx8cd83a37" is ripped out of the headers. I will not dwell on regular expressions in Haskell; that is a separate topic. You can think of this simply as searching for a substring of the desired form.
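If regex-tdfa is not at hand, the same extraction can be sketched with Data.List alone (no > marks, so it stays outside the .lhs program). extractSid is a hypothetical helper, not part of the program: it finds the first "sid=" and keeps the longest run of lowercase letters and digits after it, which is what the leftmost match of sid=[a-z0-9]* amounts to; like the String result of =~, it gives "" when nothing matches. The header text is made up.

```haskell
import Data.Char (isAsciiLower, isDigit)
import Data.List (isPrefixOf, tails)

-- Hypothetical regex-free stand-in for
--   headers =~ "sid=[a-z0-9]*" :: String
-- It takes the first occurrence of "sid=" and the longest run of
-- lowercase ASCII letters and digits after it; "" if there is no "sid=".
extractSid :: String -> String
extractSid headers =
  case [rest | rest <- tails headers, "sid=" `isPrefixOf` rest] of
    []          -> ""
    (match : _) -> "sid=" ++ takeWhile isSidChar (drop (length "sid=") match)
  where
    isSidChar c = isAsciiLower c || isDigit c

main :: IO ()
main = do
  -- Made-up response headers with a redirect that carries the session id.
  let headers = "HTTP/1.1 302 Found\r\nLocation: http://example.com/?sid=35dfe55b09b599c9fx622fcx8cd83a37&x=1\r\n"
  putStrLn (extractSid headers)         -- sid=35dfe55b09b599c9fx622fcx8cd83a37
  print (extractSid "HTTP/1.1 200 OK")  -- ""
```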
Wonderful! We have the sid, and now the whole VKontakte API is at our disposal. For our task only one feature is needed: changing the status.
In principle, any interaction with VKontakte boils down to the following command:
(_,answer) <- curlGetString someUrl []
where someUrl is the appropriate request (see the documentation) and answer is the server response. Here is what the status change request looks like:

> setActivityUrl :: String -> String -> String
> setActivityUrl text = formUrl "http://userapi.com/data?" [("act", "set_activity"), ("text", text)]

Note that the third parameter of formUrl, sid, is not given. This is partial application: the function has three parameters, we supplied only two, and so we got a function of the one remaining parameter. That is, setActivityUrl is a function not only of text (the new status itself) but also of a second parameter, sid, which is, so to speak, added on the right.
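Partial application can be seen directly in a standalone sketch (no > marks, so it stays outside the .lhs program); uid and the sid value are made-up placeholders. setActivityUrl "hello" is already a function of type String -> String that waits for the sid, and it agrees with setActivityUrl', a hypothetical eta-expanded version that names sid explicitly.

```haskell
-- Standalone copies of param and formUrl; uid is a made-up placeholder.
uid :: String
uid = "12345"

param :: (String, String) -> String
param (key, value) = key ++ "=" ++ value ++ "&"

formUrl :: String -> [(String, String)] -> String -> String
formUrl base opts sid = base ++ concatMap param (opts ++ [("id", uid)]) ++ sid

-- As in the post: formUrl gets two of its three arguments,
-- so the result is still a function waiting for sid.
setActivityUrl :: String -> String -> String
setActivityUrl text = formUrl "http://userapi.com/data?" [("act", "set_activity"), ("text", text)]

-- The same function with sid named explicitly (eta-expanded).
setActivityUrl' :: String -> String -> String
setActivityUrl' text sid =
  formUrl "http://userapi.com/data?" [("act", "set_activity"), ("text", text)] sid

main :: IO ()
main = do
  let waitingForSid = setActivityUrl "hello"  -- :: String -> String
  putStrLn (waitingForSid "sid=abc123")
  -- http://userapi.com/data?act=set_activity&text=hello&id=12345&sid=abc123
  print (waitingForSid "sid=abc123" == setActivityUrl' "hello" "sid=abc123")  -- True
```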
One more detail: the tweet text contains spaces, which are not allowed in a URL. So we write a simple function that replaces every space with %20:

> escSpaces :: String -> String
> escSpaces = intercalate "%20" . words

It splits the string into a list of words, inserts the string "%20" between adjacent elements of that list, and then glues everything back into one string (intercalate performs the last two steps). Now we can assemble the status-changing function from the parts already discussed:

> setStatus :: String -> String -> IO ()
> setStatus text sid = do
>     (_,answer) <- curlGetString url []
>     if answer =~ "\"ok\":1" :: Bool
>         then putStrLn text
>         else error "something is bad with vkontakte-api..."
>   where
>     url = setActivityUrl (escSpaces text) sid

This function could be written more simply, in one line (with type String -> String -> IO (CurlCode, String)):
setStatus text sid = curlGetString (setActivityUrl (escSpaces text) sid) []
But the first version is clearer, and it checks the server response: if the answer contains "ok":1, everything is fine and the status has changed, and we tell the user so (that is, print the text). That's it! Now we have all the pieces of the puzzle, and assembling them is very simple.
main
What all these functions were written for:

> main :: IO ()
> main = do
>     tweet <- getTweet
>     sid   <- login
>     setStatus tweet sid

Looks extremely simple, doesn't it? Comments are superfluous here. I think all the other functions are clear enough with my explanations. Just for the statistics: about 40 lines of code.
Conclusion
To run this code, as already mentioned, simply save the entire post to a file with the .lhs extension and type in the console:
runhaskell file_name.lhs
That's all. I don't know whether a follow-up on automating the launch is needed. For myself (as a Mac OS X user), I solved it by creating a "Service" in Automator and assigning it a hotkey so I can call it quickly. This only automates the launch, but for me that is enough.
I hope this was interesting to read. I look forward to your questions, suggestions and objections (:
upd: moved to a thematic blog.