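"Thanks to these daily data quality tests, we can now control the key sources of error and maintain quality. Before we run the series of data assemblies that builds the database we ship as a product, we know that no errors will occur and that the generated data will not need rework." Philippe Bobo, Software & IS Director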
A data specialist
ETAI's advanced expertise and media mix apply to three industries: automotive, manufacturing, and retail. The company offers diverse information solutions to these industries: databases, software, conventions, trade shows, and magazines. With 400 employees, ETAI generates 60 million euros ($80 million) in revenue.
ETAI's business consists of producing and selling data. For example, for the automotive industry, ETAI collects raw data from suppliers (vehicle and parts manufacturers), consolidates and reconciles this data, and sells applications based on the newly created technical database. These products are provided to automotive dealers, repair shops, etc., either on DVD-ROM or by Internet portal subscription.
Complex data architecture
To build automotive industry applications, ETAI collects information related to vehicles (over 50,000), spare parts, parts equivalence, suitability of parts for vehicles, repair durations and techniques, costs, etc.
"This is a vast domain and there is no standard for data," explained Philippe Bobo, Director of Software and Information Systems at ETAI. "We not only get disparate data repository formats, but also many paper documents which then require hand-keying. We decided to keep data entry applications in the most efficient formats to increase data entry productivity and reduce the cost of updates."
As a result, each time a new version of a database product is released (over a hundred times a year), all these repositories need to be processed, which entails data consolidation, reconciliation, and cleansing. "This takes a lot of time and resources, and there are always data quality challenges," recalls Philippe Bobo. "We used to run numerous programs in Access, Java, Python, PL/SQL, etc. Also, many steps in the process are manual and require domain expertise."
Conversely, in ETAI's other businesses (directories, magazines, trade shows, etc.) it made sense to consolidate non-technical repositories so as to leverage data models that are similar between verticals.
"We looked for a data integration solution that could not only automate the assembly processes for the automotive technical databases, but also merge all data management systems for ETAI's other businesses," explained Philippe Bobo. "It was critical for us to concentrate our efforts around a single tool that would meet all our needs. Down the road, even our internal systems will be using the same data integration environment."
Advanced market research
"Our tool research was very methodical. With the help of consultants from Apsia we started by analyzing the company's integration processes, both manual and automated," recalled Philippe Bobo. "Subsequently, several products that conformed to our primary criteria (especially the applicability to all of our businesses, and the ability to process the high data volumes of the automotive databases) were selected for a pilot project."
Various scenarios were presented to vendors whose solutions were compatible with ETAI's budget constraints. "Compared to proprietary solutions, Talend Open Studio for Data Integration offers greater flexibility," confirmed Philippe Bobo. "The tool handled our prototype very well. This is partly because it's based on industry standard languages and component libraries are readily available; this accelerates team ramp-up time."
Talend Open Studio for Data Integration was also the fastest: it processed 2 million records 30% to 50% more quickly than the other contenders and the existing custom program.
Beyond the complexity of consolidating and reconciling data, the heterogeneity of data sources represented an important challenge. ETAI's data sources include MySQL, DB2/400, Access, SQL Server, Oracle, Excel, XML, etc. Manufacturers' data usually comes in complex flat files. The technical database is hosted on Oracle, but is deployed on MySQL, and the company's back-end systems run on AS/400. A broad connectivity palette was very important.
Selecting the industry's first open source data integration solution
After each vendor had built a prototype, ETAI compared the results and weighted each criterion based on the risk it presented for the success of the projects. Critical factors were the performance and feature set of each tool, the robustness of the vendor, its geographical location, and its ability to assist the ETAI teams long-term. Talend's solution, which best met ETAI's criteria, was selected.
"The price of the solution was also a factor," explained Philippe Bobo. "For budgetary reasons, the traditional licensing model would have slowed down the deployment of the solution on our seven sites. Talend's Open Source model does not cancel all costs but it reduces them significantly, especially in the deployment phase. It also makes them easier to predict."
The first project that ETAI is building with Talend Open Studio for Data Integration simulates the daily consolidation of all data entry repositories. This simulation makes it possible to detect and fix data discrepancies immediately. "Thanks to these daily tests we can now maintain control of the quality of our data. When we launch the assembly chain for the shippable database, we know it will run without error and we won't need to edit the consolidated data."
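To make the idea of a simulated daily consolidation concrete, here is a minimal sketch in plain Python. It is not ETAI's actual Talend job: the repository names, the record fields (`part_id`, `vehicle_id`, `price`), and the two checks are assumptions chosen purely for illustration. The sketch reads records from several data entry repositories, writes nothing, and reports discrepancies that would otherwise surface only when the shippable database is assembled.

```python
from collections import defaultdict


def find_discrepancies(repositories, known_vehicle_ids):
    """Dry-run a consolidation and return a list of (kind, part_id, detail).

    All field names and checks are hypothetical, for illustration only.
    """
    seen = defaultdict(list)  # part_id -> records found across repositories
    issues = []

    for repo_name, records in repositories.items():
        for record in records:
            seen[record["part_id"]].append(record)
            # Check 1: a part must reference a vehicle we know about.
            if record["vehicle_id"] not in known_vehicle_ids:
                issues.append(("unknown_vehicle", record["part_id"], repo_name))

    # Check 2: the same part must not carry different prices in different sources.
    for part_id, records in seen.items():
        prices = sorted({r["price"] for r in records})
        if len(prices) > 1:
            issues.append(("conflicting_price", part_id, prices))

    return issues


# Hypothetical sample data standing in for two data entry repositories.
repositories = {
    "access_entry": [
        {"part_id": "P1", "vehicle_id": "V1", "price": 10.0},
        {"part_id": "P2", "vehicle_id": "V9", "price": 5.0},
    ],
    "excel_entry": [
        {"part_id": "P1", "vehicle_id": "V1", "price": 12.0},
    ],
}
issues = find_discrepancies(repositories, known_vehicle_ids={"V1", "V2"})
for issue in issues:
    print(issue)
```

Running a check of this kind every day, rather than only at release time, is what lets discrepancies be fixed at the source while they are still small.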
To get the most out of Talend Open Studio for Data Integration, ETAI elected to use Talend's service offerings. IT teams attended advanced training sessions, and ETAI also subscribed to a Gold Technical Support contract with guaranteed response times. An expert Talend consultant visits ETAI on a regular basis to provide assistance to the development teams.
"The support Talend provides is excellent, confirming the importance we initially attributed to this factor," indicated Philippe Bobo. "The advanced expertise and the professionalism of the consultants we have worked with are an important asset."