Event Prediction and Data Mining - Forward to the Future



    An interesting service for monitoring open-source information, Recorded Future, has appeared on the Web.

    It lets you accumulate information from more than 150,000 different media sources, keep the archive for up to 5 years, and then analyze it to extract knowledge about the possible consequences of past events and about events still to come.

    The author of the service is Chris Holden, who kindly offered to let us use Recorded Future free of charge, although the full functionality is available only on a commercial basis.

    For example, the service currently keeps more than 8,000 political leaders from countries around the world under continuous monitoring, letting you track where and why any prominent figure is going to travel. Good analysis of these events sometimes makes it possible to uncover connections in international relations and, by studying the travel history of a chosen person, to predict the most likely ways those relations will develop.

    The most interesting cases demonstrating the capabilities of the system are reflected in the following applied examples:

    - tracking emerging cyber threats and hackers' actions in the world
    - analyzing the contents of letters from Osama Bin-Laden’s circle of friends
    - analyzing protest activity
    - analyzing elections in Greece and Egypt

    Recorded Future in Action

    The service is useful well beyond analyzing the geopolitical situation, terrorism, and protest activity. It is also used successfully to monitor corporate news and information about competing companies, their products, and how they are covered in the press.

    The analytics let you track events such as the emergence of a new technology, the signing of contracts, or changes in a company's board of directors or key personnel. Combined with the ability to assess sentiment ("positive", "negative"), this already makes for a very powerful and convenient analytical tool:

    Futures - “What Apple has outlined for 2012/2013”



    The service offers a paid API (http://code.google.com/p/recordedfuture/wiki/RecordedFutureAPI) that lets you flexibly define what to track according to specified criteria, including geography:

    Forecast of protest activity for August 2012 in relation to the Russian Federation



    Example of creating a request (Python 3):

    import json, sys, time, urllib.parse, urllib.request, zlib

    def query(q, usecompression=True):
        """
        Send the query string q to the API.
        The result of the query is a JSON object (returned as a dict).
        """
        try:
            url = 'http://api.recordedfuture.com/ws/rfq/instances?%s'
            if usecompression:
                url = url + '&compress=1'
            data = None
            # Try the call up to three times before giving up.
            for i in range(3):
                try:
                    response = urllib.request.urlopen(url % urllib.parse.urlencode({"q": q}))
                    data = response.read()
                    if usecompression:
                        data = zlib.decompress(data)
                    break
                except Exception:
                    print("Retrying failed API call.", file=sys.stderr)
                    time.sleep(1)
            if data is None:
                raise IOError("API call failed after 3 attempts")
            res = json.loads(data)
            if res['status'] != "SUCCESS":
                print("Error", str(res['errors']), file=sys.stderr)
            return res
        except Exception as e:
            print(str(e))
            return {'status': 'FAILURE', 'errors': str(e)}
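
    The `q` argument of `query` is a string, and since the response is JSON it is natural to assume the request is a serialized JSON object as well. The sketch below only shows how such a string might be assembled. The helper `build_query` and every field name in it (`instance`, `type`, `document`, `published`, `limit`, `output`, `token`) are assumptions made for illustration and should be checked against the API documentation linked above before use.

```python
import json

def build_query(token, event_type, published_min, published_max, limit=100):
    """Serialize a query to a JSON string.

    NOTE: the structure and all field names are hypothetical placeholders,
    not a confirmed description of the Recorded Future API.
    """
    q = {
        "instance": {
            "type": [event_type],
            # Restrict results to documents published in this date range.
            "document": {"published": {"min": published_min,
                                       "max": published_max}},
            "limit": limit,
        },
        "output": {"fields": ["type", "time", "document.title"]},
        "token": token,
    }
    return json.dumps(q)

# Hypothetical query: protest events reported during August 2012.
q = build_query("MYTOKEN", "Protest", "2012-08-01", "2012-08-31")
print(q)
```

    The resulting string could then be passed as the `q` argument of `query`. Keeping the construction in a separate function makes it easy to adjust the fields once the real schema is known.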
    


    The idea behind the service is very simple: dates are extracted from all sources in their various notations (numeric, textual), and the events associated with them are then recorded. The service also determines when exactly each event is expected to happen (“soon”, “in a few months”, “in the distant future”). It continuously sends updates on the areas you are most interested in tracking:
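
    To make the idea concrete, here is a toy sketch of date extraction followed by bucketing into time horizons. It is not the service's actual algorithm: the two recognized notations (`YYYY-MM-DD` and "Month YYYY"), the function names `extract_dates` and `horizon`, and the 30-day and 365-day thresholds are all assumptions chosen for illustration.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Two notations only: numeric "YYYY-MM-DD" and textual "Month YYYY".
NUMERIC = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
TEXTUAL = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)

def extract_dates(text):
    """Return the dates mentioned in text, sorted in ascending order."""
    found = [date(int(y), int(m), int(d)) for y, m, d in NUMERIC.findall(text)]
    # A month-only mention is pinned to the first day of that month.
    found += [date(int(y), MONTHS[m.lower()], 1) for m, y in TEXTUAL.findall(text)]
    return sorted(found)

def horizon(event_date, today):
    """Bucket a date by its distance from today (thresholds are arbitrary)."""
    days = (event_date - today).days
    if days < 0:
        return "past"
    if days <= 30:
        return "soon"
    if days <= 365:
        return "in a few months"
    return "in the distant future"

today = date(2012, 7, 1)
text = ("Protests are planned for 2012-07-15; the parliament votes in "
        "August 2012, and the next election is expected in March 2018.")
for d in extract_dates(text):
    print(d.isoformat(), "->", horizon(d, today))
```

    A real system would also have to handle relative expressions ("next week"), other locales, and invalid dates, and would need to link each date to the event described around it.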



    Using the ready-made Python scripts:

    python company-entquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > entoutputfile.txt
    python company-aggquery.py MYTOKEN tickerfile.txt 2010-06-14 2010-06-20 > aggrawoutputfile.txt
    where:

    MYTOKEN is the hash issued to you for access to the API;
    tickerfile.txt is a special file whose directives point to the media and resources that need to be analyzed.

    The summary report is output of the following form:

    Ticker,Entity,Time,Count,Momentum,Positive,Negative
    MSFT,33312449,2011-11-01 19:30:00,780,0.43689,0.062,0.00461
    GOOG,33321272,2011-11-01 19:30:00,1707,0.72436,0.07052,0.0254
    AMZN,33328212,2011-11-01 19:30:00,344,0.20139,0.05491,0.01374
    CHK,33511577,2011-11-01 19:30:00,6,0.00817,0,0
    MSFT,33312449,2011-11-02 19:30:00,1235,0.4538,0.04981,0.0137
    GOOG,33321272,2011-11-02 19:30:00,2602,0.80317,0.06482,0.02282
    AMZN,33328212,2011-11-02 19:30:00,619,0.22222,0.06884,0.00787
    CHK,33511577,2011-11-02 19:30:00,45,0.02334,0,0.02581


    Processing this information is left to the programmer, except for the "positive" and "negative" scores, which the service evaluates itself. With such a resource you can build a fairly powerful and effective competitive analysis tool and use it for BI purposes.
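
    As one possible way to process such a report, the sketch below parses it with the standard `csv` module and aggregates the rows per ticker. The function name `summarize` is hypothetical, and treating Positive minus Negative as a "net sentiment" weighted by Count is an assumption about how the columns can be combined, not something the service prescribes. Only four rows of the report above are used to keep the example short.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the report above.
REPORT = """\
Ticker,Entity,Time,Count,Momentum,Positive,Negative
MSFT,33312449,2011-11-01 19:30:00,780,0.43689,0.062,0.00461
CHK,33511577,2011-11-01 19:30:00,6,0.00817,0,0
MSFT,33312449,2011-11-02 19:30:00,1235,0.4538,0.04981,0.0137
CHK,33511577,2011-11-02 19:30:00,45,0.02334,0,0.02581
"""

def summarize(report_text):
    """Return {ticker: (total_count, count_weighted_net_sentiment)}."""
    counts = defaultdict(int)
    weighted = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report_text)):
        n = int(row["Count"])
        net = float(row["Positive"]) - float(row["Negative"])
        counts[row["Ticker"]] += n
        weighted[row["Ticker"]] += n * net
    # Guard against division by zero for tickers with no mentions.
    return {t: (counts[t], weighted[t] / counts[t] if counts[t] else 0.0)
            for t in counts}

for ticker, (total, net) in sorted(summarize(REPORT).items()):
    print(f"{ticker}: mentions={total}, net sentiment={net:+.4f}")
```

    Weighting by Count keeps a day with a handful of mentions from dominating the average; other aggregations, for example per day or by Momentum, fit the same loop.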
