dustalov October 1, 2012 at 08:37

NLPub - catalog of linguistic solutions

I want to introduce NLPub, a small knowledge base dedicated to computational linguistics in Russia.

These days, devices and applications that understand and speak human language no longer surprise anyone. Such applications rely on natural language processing, a field at the intersection of linguistics and artificial intelligence.

Why do the vast majority of devices, applications and services not work with Russian?

I often have to repeat this, but the reason is simple and tragic. Solving natural language processing problems requires specialized programs (analyzers), which in turn badly need language resources (dictionaries, corpora, thesauri) in order to do their job.
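To make this dependency concrete, here is a minimal sketch in Python of a dictionary-based analyzer. It is not taken from NLPub or from any real tool: the names `TOY_LEXICON`, `tokenize` and `analyze` are hypothetical, and the three-entry lexicon is invented purely for illustration. The point is only that the lookup has nothing to return for words the dictionary does not cover.

```python
import re

# Toy lexicon (an assumption for illustration): word form -> (lemma, part of speech).
TOY_LEXICON = {
    "кошки": ("кошка", "NOUN"),
    "кошка": ("кошка", "NOUN"),
    "спят": ("спать", "VERB"),
}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def analyze(text, lexicon):
    """Look up each token; tokens missing from the lexicon get None."""
    return [(token, lexicon.get(token)) for token in tokenize(text)]


# Words covered by the lexicon are analyzed; unknown words are not.
print(analyze("Кошки спят", TOY_LEXICON))
print(analyze("Собаки лают", TOY_LEXICON))
```

With an empty lexicon every token comes back unanalyzed, which is roughly the position an analyzer is in when no dictionaries or corpora exist for its language.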

Almost none of this exists in Russia, which paralyzes the work of commercial enterprises and academic groups, forcing them to reinvent the wheel or simply abandon linguistic technologies.

The most useful thing that can be done right away is to help interested people get up to speed faster and start using the few technologies that are currently available.

To do this, we need to compile a catalog of available software with descriptions of its functionality, write tutorials, and provide links to data, manuals and other resources. That is why I created NLPub, and I invite everyone to join in its development.

What information is collected through NLPub?

Particular attention is paid to the following topics:

text processing tools available for both commercial and non-commercial use - tokenizers, morphological analyzers, parsers, sentiment analysis tools;
resources - dictionaries, thesauri and text corpora needed for solving fundamental and applied problems;
events - thematic conferences and seminars for researchers and developers;
education - educational institutions and professional retraining courses in the field of natural language processing and data analysis.

How can the project be helped?

I see three available ways:

contribute to the knowledge base, providing readers with high-quality, accurate and up-to-date material on the state of computational linguistics in Russia;
correct inaccuracies introduced while the knowledge base was being compiled and developed;
talk about NLPub in relevant communities, raising public interest in natural language processing (at the very least, write a blog post about it, as I have done).

Who does this belong to?

NLPub is a non-profit project and is not affiliated with any commercial company. This in no way shuts commercial companies out. On the contrary, information about their products is very welcome alongside open and free solutions. The list of tools already includes many commercial products.

I will be happy to answer any questions and respond to comments, whether posted in the comments here or sent through more private channels.
