Next Gen Ecommerce
When opening an online store, the owner usually reasons like this:
- “I have access to a supplier of underwear, hardware... (insert your own product here), so why don't I open an online store? It's cool, I hear you can make good money on the Internet, it's promising and profitable.”
As a result, thousands of sites sell the same products, cluttering the Internet more and more.
So yet another store joins the 1001 already selling underwear. Instead of money, the entrepreneur usually gets a headache: SEO, shmeo, and disproportionately expensive contextual advertising.
The store is dead before it even opens.
I propose going a different way.
Purpose (aka Theory):
Find unoccupied niches to trade in.
The ideal situation: demand exists, supply doesn't, and contextual advertising is cheap.
So we go looking for “gold”.
This is where Web Data Mining comes in: extracting data from the Internet and then analyzing it.
Starting point:
In my experiment to test the theory, I start from WHAT Internet users search for in search engines.
At the moment there are several sources for such data:
- Keyword databases collected from various sources (old ones can be found for free).
- Autocomplete suggestions from Yandex and Google.
- Yandex's “Live” service, which shows user queries in real time.
Since pulling data out of the search engines themselves is rather hard, for a start we'll make do with a small database of 30 million phrases floating around the Internet.
Preparing the initial data:
- For the subsequent analysis, convert all phrases to lowercase.
- Strip unnecessary characters (we only care about [a..z], [а..я], [0..9]).
- Remove profanity, porn, and other “stop words” such as “free”, “download”, “torrent”.
After that, the database shrinks by about 30%.
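To make this concrete, here is a minimal Python sketch of the cleanup step. The stop-word list and file names are illustrative placeholders, not the ones actually used:

```python
import re

# Hypothetical stop-word list; the real one was much longer.
STOP_WORDS = {"free", "download", "torrent"}

# Keep only Latin letters, Cyrillic letters, digits and spaces,
# per the [a..z] [а..я] [0..9] rule above.
JUNK = re.compile(r"[^a-zа-яё0-9 ]+")

def clean_phrase(phrase: str) -> str:
    """Lowercase, strip junk characters; return "" if the phrase should be dropped."""
    phrase = JUNK.sub(" ", phrase.lower())
    phrase = " ".join(phrase.split())  # collapse runs of spaces
    if any(word in STOP_WORDS for word in phrase.split()):
        return ""
    return phrase

with open("phrases_raw.txt", encoding="utf-8") as src, \
     open("phrases_clean.txt", "w", encoding="utf-8") as dst:
    for line in src:
        cleaned = clean_phrase(line)
        if cleaned:
            dst.write(cleaned + "\n")
```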
Required data:
We are interested in the parameters that characterize supply and demand.
Sources:
- Yandex.Direct API (budget forecast: CreateNewForecast, GetForecast)
(free, without restrictions)
- Google AdWords API (TrafficEstimatorService forecast)
(API access costs money)
- Yandex.Wordstat (http://wordstat.yandex.ru/)
(free, unstable, quickly bans your IP under a large volume of requests)
- * Yandex.Demand (http://direct.yandex.ru/spros)
(a new service; bans come more slowly and it runs more stably)
- * Yandex.Direct ad search (http://direct.yandex.ru/search)
(gives the number of ads per keyword; no bans observed)
The services I actually used to test the theory are marked with asterisks.
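For reference, here is a hedged sketch of what the budget-forecast calls might look like through the JSON flavor of the Yandex.Direct v4 API. The endpoint URL, request fields, and response shape are my reconstruction from the public documentation of that period, not code from the experiment:

```python
import json
import time
import urllib.request

# Assumed endpoint of the v4 JSON flavor of the Yandex.Direct API;
# parameter and response shapes below are reconstructed, not verified.
API_URL = "https://api.direct.yandex.ru/json-api/v4/"
TOKEN = "..."  # OAuth token; obtaining it is out of scope here

def call(method, param):
    payload = json.dumps({"method": method, "param": param,
                          "token": TOKEN, "locale": "en"}).encode("utf-8")
    with urllib.request.urlopen(API_URL, payload) as resp:
        return json.loads(resp.read())["data"]

# CreateNewForecast queues a forecast and returns its ID;
# GetForecast is then polled until the report is ready.
forecast_id = call("CreateNewForecast", {"Phrases": ["sapper shovel"], "GeoID": [0]})
time.sleep(10)  # the forecast is built asynchronously
report = call("GetForecast", forecast_id)
```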
Data Collection:
Stage 1.
Collecting through the API is slow and resource-intensive, so we start with the Yandex.Direct ad search, recording the number of ads shown for each phrase.
Here we hit the first pitfall: the number of ads depends on the time of day.
So we have to go through the database twice.
The first pass collects around the clock.
The second pass re-checks the resulting sample (phrases with at most one ad) between 9 am and 6 pm. A sketch of both passes follows.
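A rough Python sketch of those two passes, assuming the ad-search page accepts a plain GET with a text parameter and that ads can be counted by a marker in the HTML; both assumptions would need checking against the live page:

```python
import re
import time
import urllib.parse
import urllib.request

# The query-string parameter and the ad-block marker are assumptions;
# the real page markup changes and has to be inspected by hand.
SEARCH_URL = "http://direct.yandex.ru/search?text={}"
AD_MARKER = re.compile(r'class="ad')

def ad_count(phrase):
    """Fetch the Direct ad-search page and count ad blocks (heuristic)."""
    url = SEARCH_URL.format(urllib.parse.quote(phrase))
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    return len(AD_MARKER.findall(html))

counts = {}
with open("phrases_clean.txt", encoding="utf-8") as src:
    for phrase in src:
        phrase = phrase.strip()
        counts[phrase] = ad_count(phrase)
        time.sleep(0.5)  # throttle; this pass runs around the clock anyway

# Second pass input: low-competition candidates, re-checked 9 am to 6 pm.
candidates = [p for p, n in counts.items() if n <= 1]
```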
Stage 2.
With the list of phrases that have 0 or 1 ads, we now fetch the number of search queries for each phrase. By the start of stage 2, the list is down to about 10% of the original volume.
We parallelize the collection through lists of proxy servers; for this I wrote a proxy discovery and ranking system that scores each proxy by connection speed and ban status, sketched below.
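The proxy system itself isn't shown in the post, so here is a minimal sketch of the idea under my own assumptions: each proxy is probed once, dead or banned ones are dropped, the rest are sorted by latency, and the fetches run in a thread pool over the rotating list. The target URL and query format are placeholders:

```python
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import cycle

# Placeholder target: the post used Yandex.Demand (direct.yandex.ru/spros),
# but its exact query format isn't given, so this URL pattern is assumed.
TARGET = "http://direct.yandex.ru/spros?text={}"

@dataclass
class Proxy:
    address: str           # "host:port"
    latency: float = 0.0   # filled in by rank()
    banned: bool = False

def opener_for(p: Proxy):
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": p.address}))

def rank(proxies, probe_url="http://ya.ru"):
    """Probe each proxy once; drop dead/banned ones, sort the rest fastest-first."""
    for p in proxies:
        start = time.monotonic()
        try:
            opener_for(p).open(probe_url, timeout=5).read(256)
            p.latency = time.monotonic() - start
        except OSError:
            p.banned = True
    return sorted((p for p in proxies if not p.banned), key=lambda p: p.latency)

def fetch(phrase, proxy):
    url = TARGET.format(urllib.parse.quote(phrase))
    return opener_for(proxy).open(url, timeout=10).read()

proxies = rank([Proxy("10.0.0.1:3128"), Proxy("10.0.0.2:3128")])  # hypothetical list
phrases = [line.strip() for line in open("phrases_stage2.txt", encoding="utf-8")]
rotation = cycle(proxies)  # assumes at least one proxy survived ranking
with ThreadPoolExecutor(max_workers=8) as pool:
    pages = list(pool.map(fetch, phrases, (next(rotation) for _ in phrases)))
```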
Result:
The theory was confirmed: vacant niches exist, and in completely different areas! The experiment is still ongoing.
(Proof: sapper shovels.)
But:
- The output contained a lot of garbage that I had to sift through manually, picking the monetizable queries out of the list.
- The stop-word list grew substantially; I could never have imagined what kinds of nastiness people search for.
- To automate the process further, extra filters are needed (I don't yet know which ones), at the very least a classifier.
- Bolt on analysis of Direct and AdWords bids.
- Build my own database from Yandex “Live”.
- Finally get PROFIT :)