Carvago

Grabbing and categorizing 4.5 million car classifieds from all over Europe every day thanks to Revolt.BI.

The Czech company Carvago runs a very successful online marketplace for used cars from all over Europe. It currently offers around 1 million cars to customers in the Czech Republic, Slovakia, and Germany, with other European countries to follow in the near future.

Challenge

To achieve these goals, Carvago needs to find used cars on offer all across Europe every day and add them to its own listings.

To ensure its customers can always find the best deal for the car they want, Carvago needs to constantly update its database with newly offered cars from portals around Europe. Sellers often do not fill in all the details of the car they are offering, or place the same advert on more than one portal at the same time.

So besides retrieving new listings, the system needs to detect and remove duplicates, correct wrong information, and classify all offers against an exhaustive catalogue of car models, including key parameters such as engine and gearbox type, drivetrain, etc.
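As an illustration of the duplicate detection described above, here is a minimal sketch in Python. It is not the production logic: the Listing fields, the portal names, and the choice of key (make, model, year, mileage bucket, price) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Listing:
    # All fields are hypothetical; real adverts carry many more parameters.
    portal: str
    make: str
    model: str
    year: int
    mileage_km: int
    price_eur: int


def dedup_key(listing: Listing) -> Tuple:
    """Build a portal-independent key for a listing.

    Text is normalised and mileage is floored to 1,000 km buckets, so
    small differences inside one bucket are ignored (an assumed heuristic).
    """
    return (
        listing.make.strip().lower(),
        listing.model.strip().lower(),
        listing.year,
        listing.mileage_km // 1000,
        listing.price_eur,
    )


def deduplicate(listings: List[Listing]) -> List[Listing]:
    """Keep the first listing seen for each key, preserving input order."""
    seen: Set[Tuple] = set()
    unique: List[Listing] = []
    for listing in listings:
        key = dedup_key(listing)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique
```

A real system would likely compare more signals (photos, seller, location) and tolerate price changes; the exact-key approach here only shows the basic idea of making adverts comparable across portals.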

Analysis

The main requirements for the solution were as follows:

  • Prepare a catalogue of models covering 95+% of the European passenger car market, including key parameters such as engine, transmission type, fuel, drivetrain, number of seats/doors, etc.
  • Automatically update the catalogue from external web sources (mobile.de, cars-data.com, and the like)
  • Create a database with the current range of cars available on the European market
  • Deduplicate, accurately assign, and classify the collected advertisements based on the catalogue
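The last requirement, assigning free-text adverts to catalogue entries, can be sketched with fuzzy string matching from Python's standard library. This is only an illustration under stated assumptions: the CATALOGUE excerpt, the match_to_catalogue helper, and the 0.6 cutoff are hypothetical and are not taken from the actual solution.

```python
import difflib
from typing import Optional

# Hypothetical catalogue excerpt: canonical "make model" strings.
CATALOGUE = [
    "skoda octavia",
    "skoda fabia",
    "volkswagen golf",
    "volkswagen passat",
]


def match_to_catalogue(ad_title: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest catalogue entry for an ad title, or None.

    The title is lower-cased and whitespace is collapsed before
    comparison; `cutoff` is the minimum similarity ratio (0 to 1).
    """
    normalized = " ".join(ad_title.lower().split())
    matches = difflib.get_close_matches(normalized, CATALOGUE, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, a misspelled title such as "Skoda  Octvia" would still resolve to "skoda octavia", while a title with nothing in common with the catalogue returns None and could be routed to manual checking.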


We analyzed in detail the various existing sources of information on cars and advertisements, and identified:

  • 3000+ different car models
  • 250+ car brands
  • 85 main car parameters
  • 14 major servers with different data structures
  • 4.5 million ads added or updated every day
  • mostly unclassified ads, consisting only of free text and/or photos of the car

Interesting fact: 10 car models cover 37% of the market.

Solution

Revolt.BI's solution for Carvago comprises several components:

  • Creation of a data warehouse for the catalogue and advertisements
  • Data retrieval
  • Data analysis
  • Business analytics

We chose Keboola as our Data Warehouse and DevOps platform, based on Keboola's excellent computing power, integration of all necessary services, diagnostics of all processes and many other advantages. For data storage we settled on Snowflake.

For automated photo analysis and image recognition we use deep learning: a convolutional neural network (CNN) that identifies objects and many other types of features in an image and draws conclusions from them at low cost. Our image analytics solution can also correct erroneous information, for example recognizing from the car's photo that it is a station wagon even though the description lists it as a van or MPV. We can even automatically detect the type of air conditioning from the interior photo!

Interesting fact: training the neural network well for a single car model takes 2000 photos.


We handle business analytics using Tableau - no other visualization tool could handle so easily the many different views and aspects needed for Carvago's daily operations, not only for the company itself, but also for its business partners.

Result

By working with Revolt.BI, Carvago gained unique and always up-to-date data on European used cars, including relevant parameters and price of each car in question, along with analytical tools to assist in their business use.

Revolt.BI's business analytics enable Carvago's sales department and its customers to make data-driven decisions, such as targeting dealers according to the strength of their segments or making detailed comparisons of the cars on offer across ad servers.

Catalogue

  • 3000+ models of 250+ brands
  • Complete records of key parameters
  • Automatic checking and completion of unknown parameters such as body type, number of doors, engine capacity, transmission type, etc.
  • Manual checking and modification of catalogue items as needed

Data retrieval

  • 4.5 million listings per day
  • 130 ad servers
  • deduplication
  • automatic data consistency ensured
  • possibility of manual control and correction
  • matching to catalogue entries
  • daily updates; selected data, e.g. auctions, can be updated in real time

Analytical tools

  • diagnostics of the data extraction process
  • tool for quick detection of errors, along with suspicious and/or poor quality listings
  • complete overview of the European market situation across regions, models, vehicle ages, price levels, and other parameters
  • identification of attractive car offers (comprehensive assessment of model, age, equipment) that can be sold profitably, e.g. in other regions
  • a tool for correct pricing based on model, age, condition, and equipment