Comparative analysis of markets used. Cars of Germany and France in the B and C segment

Published on December 17, 2018

Comparative analysis of markets used. Cars of Germany and France in the B and C segment

Hi, Habr!

In this post, I would like to share the experience of using several python tools for a comparative analysis of the used car market in Europe using the example of Germany and France.

image

From the point of view of implementation, everything will be enough vanilla. The zoo of used technologies is somewhat redundant, because it was an end in itself — I wanted to test it.

Prehistory The idea to conduct such an analysis visited me once when I was browsing through the site of French ads. I clearly had the feeling that, in the Aboriginal view, the Citroën C3 was an extremely miserable car, which was getting cheaper by 20 per cent per year, but the Opel Corsa was almost immortal, so its price does not change with age (or even Rolex). It was a funny observation, given that in the case of the Germans Kors, an adult was considered to be a sign of a fiasco in professional implementation. And even jokes went about the fact that after 10 years, any car turns into an Opel. In the absence of Western Germans, Trabant, the role of Zaporozhets in German folklore was performed by this brand. C3, as far as I know, is not particularly represented in Russia, but it is useful to know about it that this is the spiritual heir of the classic 2CV from the title picture,

So, get down to business. In order to make statistics, you need a source with sufficient sampling. In Germany, this is autoscout24.de in France, the best that I came across according to the number of ads - www.lacentrale.fr (the French version of the scout exists, but less popular).

We will make the selection according to the B-class as the most widespread one and in which the initial question arose (with a slight call into the C-class). To collect data, select those models for which there is data for the past 10 years, and for which there are at least 1000 ads on both sites. Otherwise, the sample will be very noisy.

If you want to get datamining of a healthy person, autoscout offers a RESTful API if you register and receive a key.

I, of course, didn’t do this, and I started mining the smoker — head-on through a headless browser.

Mining the smoker looked like this - based on the desired filters (about which below) we form the query string, pass it as an address to the headless browser PhantomJS (since I installed it, its development was for some reason suspended).

From the unobvious at this stage - I added the expectation of loading the element with a characteristic class, which contained data on the number of options to the corresponding query.

with io.open("dump.html", "w", encoding="utf-8") as f:
    f.write(html)
try:
    element = WebDriverWait(browser, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, "cl-filters-summary-counter"))
    )  # wait until element with summry statistics is present or drop after 20 sec

The html generated in the browser is fed to the BeautifulSoup parser and we are looking for an element in it containing the number of the results found.

value = bsObj.findAll("span", {"class": "cl-filters-summary-counter"})[0].text
value = value.replace(u'\xa0', u' ')  # removes delimeter if results exceed 1000

As you might guess, the generated request included a model, a small price range and age, that is, there were a lot of requests, with the number of requests exceeding 500 servers breaking the connection. If you want to resolve the issue beautifully, it is better to contact such services through a proxy that is changed along the way (and if you have an API key, then you don’t need it). On Habré there is a smart article on how to work through a proxy. I solved the issue of low technology - the price should be close to the linear law with age, so I figured out the scope of this line and limited the requests only to the zone + .- 30-50 percent of the linear price, and entered 10 second pauses between the requests. That was enough for me not to get banned. Data collection for one model took about half an hour of silent blinking in the console.

A few words about filters. To get an informative sample, I applied the following filters:

  • All cars are taken in a 4-5 door set (Kors, Clio and several other models have cheaper 3 door versions).
  • All cars come from 4-5 seats - this is a separate characteristic from the previous one, because Clio, C3 and 308 have a version of Société (Entreprise) and quite a lot of it on the market - there are all kinds of electricians and fitters in the market; for junk and they cost less by a couple of kiloevoors, all other things being equal.
  • Engine power is limited to 129 hp Because further in this segment begin charged versions and prices there are not at all consistent with their modest status. If you repeat the ahtung - the auto scooter in the address uses power not in horsepower, but in kilowatts.

Engines and gearboxes are not specialized, but in 99% this is a mechanic, and in the case of France, there is a lot of diesel there, simply because it was promoted there for 10 years to dieselgate. And now they want to transplant everyone from him, from the chagrin of the fickleness of government morals to the French, for the second week they indulge in two favorite national pastimes - the strike and the burning of the cars.

The data is collected, spread out along the sheets in Excel files (which datasens without Excel, are you! Thank you openpyxl - for the convenience of shifting). For visualization, all data for all models are combined into one CSV file.

For visualization, I wanted to file a simple web interface. In principle, the current version does not require a backend. Little data, processing - rudimentary, you can throw out all the long JSON along with the graphics and process on the client side. But I wanted to test the server with a view later to use for tasks with not such trivial calculations. And I do not know how to JS, so the client side would have to suffer, and the server can be washed down on the python, the benefit is the tools.

To implement the server, I was tormented between bokeh, which I used to tinker with earlier and a bunch of Plotly + Dash. In a number of past tasks, I was very pleased with bokeh, especially the fact that it can be embedded in Jupyter Notebook (with Jupyter Labs is not so simple) and the fact that it is quite easy to organize inside notebooksinteractive components without starting the bokeh server (). Bokeh is the gateway to the world of d3.js for those who do not know how in JS.

For this task, I decided to use a bunch of Plotly + Dash (the last is the gate to the world of React for those who do not know how to JS). Selection rather with a view to try. As you can see from the comparison , the difference is not fundamental.

We proceed to the interface implementation.

We pull up our CSV, shove a couple of data frames.

In order to properly stylize the page and use responsive design, we allow local CSS.

app = dash.Dash(__name__, static_folder='assets')  # resource folder
app.scripts.config.serve_locally = True
app.css.config.serve_locally = True

Next, create a simple layout using one table of two columns of 6 out of 12 standard for adaptive layouts.

app.layout = html.Div([
    # include custom local css to allow two-column responsive
    html.Link(href='/assets/twocolumns_dash.css', rel='stylesheet'),
    html.Div([  # row div
        html.Div([  # column div
            html.H3('Average'),
            dcc.Graph(id='market-app', ),
            html.H4('Select model'),
            dcc.Dropdown(id='model_pick', options=model_options, value=None, multi=True)
        ], className="six columns"),
        html.Div([  # column div
            html.H3('Distribution'),
            dcc.Graph(id='market-app2', ),
            html.H4('Select year'),
            dcc.Slider(
                id='year-slider',
                min=years.min(),
                max=years.max(),
                value=years.min(),
                step=None,
                marks={str(year): str(year) for year in years}
            )
        ], className="six columns"),
    ], className="row")
])

The controls are implemented quite unpretentiously, of the features - there can be many inputs, and only one way out (for example, the left graph takes data from the drop-down menu and from the slider of the year, but only one element can update, this is a Dash feature, you will need crutches to crawl).

@app.callback(Output('market-app2', 'figure'),
              [Input('model_pick', 'value'), Input('year-slider', 'value')])
def update_figure_dist(selected_models, year_picked):
    traces = []
    for model in selected_models:
        traces.append(go.Bar(
            x=df_filtered.loc[model, year_picked, :].index.values.tolist(),
            y=df_filtered.loc[model, year_picked, :]['results'].values.tolist(),
            name=model
        ))
    return {
        'data': traces,
        'layout': go.Layout(
            xaxis={'title': 'price'},
            yaxis={'title': 'offers'},
            hovermode='closest',
            legend=dict(orientation="h", xanchor="center", y=1.2, x=0.5)
        )
    }

The ready interface is http://eu-carmarket.herokuapp.com/

For German data, the brackets are (DE) for France (FR).

On the left we see prices for the entire period averaged over each year. Hereinafter, all prices in euros, according to the rules for placing on sites - prices with VAT. On the right is the distribution of offers by price in the selected year. The distribution was noisy for many models, so when building it is smoothed over the closest neighbors with a core of 5 elements (looking at this, I decided not to take models with less than 1000 ads)

So what do we see in the data?

Answering a question motivated research - no, Citroen does not depreciate at a terrible speed compared to timeless Korsoi.

image

The sharp discrepancy in price in 2017 is not the first strong depreciation of Citroens in the first year. This is actually their rise in price with the transition to a new generation. Now, instead of putting the old Citroen 2CV ommazh, they are more similar to the analogue of the Mini, with trendy crossover options - stylish, fashionable youth.

If you take one model, the difference in the German and French markets is shocking because there is none, neither in values ​​nor in the rate of depreciation (although an unbroken car older than 2 years in France is found only if it is stored with a locked garage).

image

image

image

If you take the market popular with Russian IT specialists for the Holland relocation, then you need to throw 20 percent to the German price. While there are no customs borders between countries in Europe, it’s impossible to transport a purchased car if it doesn’t work as personal belongings that move with you. This requires that you live in a previous country for more than six months. Otherwise, all of the emerging benefits will be removed from you when you try to put the car on registration.

If you compare the model with a large pile, you can see several interesting observations. After 10 years, everything converges to about one point, though the depreciation of Peugeot, Mazda or Seat is stronger than Volkswagen Polo, Opel Meriva or Skoda Fabia. So yes, after 10 years any car becomes an Opel, but not a Korsoi, Korsoi are only favorites like C3.

image

The depreciation rate does not significantly depend on the model. And from the country. Small deviations from the universal linear monotony (for example, Renault Megane
in 2016, Ford Fiesta in 2017) are just a change in the generation of the model.

image

image

Since the depreciation in this case is not a physical concept, it does not mean that the machines will be in one state. The French will have crumpled sides, abrasions, chipped bumpers, mirrors strapped on adhesive tape, and a legacy of thousands of kilometers away with a checkered game. But the French are convinced that the degree of wear is the same as that of the Germans and is paid according to their convictions. But to persuade the Germans to buy a used. car from the caring hands of French drivers and mechanics - whether they will be able to lie.

Concerning prices in comparison with Russia. There is some problem with the fact that many of the models described in Russia are not sold. Of the little in common for both markets can be found on the Volkswagen website the new Polo (although in Russia it is a sedan, and in Europe it is a hatchback). In Russia, it costs from 8,300 new, in Germany it is new from 13,500, and for 8,300 it will be 2012 (in the not-for-mentioned Holland table - 1,6200 new). A good comparison can be presented everywhere Kia Rio: Russia - 9000, Germany - 11950 (explicit dumping against German autopatriotism), France - 13700, Holland - 19950 (Ja-ja, one half louma for Kiryusha at the price of Volkswagen Tiguan / Hyundai Tussan / Nissan X -trail: hug, cry and remember how to pedal on a bicycle).