Many questionnaires include a question such as “Please write the main business activity of the organisation where you work”. The answer is commonly collected in an open text field, which leaves the survey holder with the task of coding the response into an industry classification. Alternatively, in web surveys respondents can self-identify their industry from a database. The industry database presented here provides a look-up list that allows survey respondents to identify their industry from 321 entries, coded at the 3- or 4-digit level of NACE Rev. 2, in 47 languages. The database can be searched using an autosuggest box or a search tree.

This work builds on previous work on the industry question in the WageIndicator web survey, for which a database and a search tree were developed, all coded according to the European Community's Statistical Classification of Economic Activities, commonly referred to as NACE Rev. 2. WageIndicator did not aim for a question with a short aggregated list, because such lists introduce aggregation bias: respondents with the same detailed industry may place themselves in different aggregated categories. Therefore, a database of industry names and a two-level search tree were developed, offering 340 industry categories to the survey respondents. The database has English and national industry names for 99 countries, in 47 languages. The national labels are shown to the survey respondent. The survey questions and answers for the measurement of industry, including their translations, can be found in the Excel files below.
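To illustrate how an autosuggest box over such a list could work, here is a minimal Python sketch. It is not the WageIndicator implementation: the `Industry` record, the `autosuggest` function, the matching rule and the five sample entries are all assumptions made for illustration, and the labels are shortened paraphrases rather than entries taken from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Industry:
    code: str   # assumption: NACE code stored as a digit string without dots
    label: str  # label shown to the respondent (national language in practice)


# Illustrative entries only; they are NOT copied from the SERISS database.
INDUSTRIES = [
    Industry("0111", "Growing of cereals"),
    Industry("5610", "Restaurants and mobile food service"),
    Industry("6201", "Computer programming"),
    Industry("8520", "Primary education"),
    Industry("8610", "Hospital activities"),
]


def autosuggest(query, entries=INDUSTRIES, limit=10):
    """Return up to `limit` entries whose label contains every word typed so far.

    Matching is case-insensitive. Labels that start with the first typed
    word are listed first; the rest follow in alphabetical order.
    """
    words = query.casefold().split()
    if not words:
        return []
    hits = [e for e in entries
            if all(w in e.label.casefold() for w in words)]
    hits.sort(key=lambda e: (not e.label.casefold().startswith(words[0]),
                             e.label))
    return hits[:limit]


if __name__ == "__main__":
    for entry in autosuggest("comp"):
        print(entry.code, entry.label)
```

A search tree would offer the same entries by browsing rather than typing: the respondent first picks a top-level branch and then one of the industries listed under it.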

The WageIndicator web survey shows that respondents skip the industry question relatively often, presumably because they find it cognitively too demanding. Therefore, in SERISS Deliverable D8.11 an occupation>industry prediction has been developed, which offers survey respondents a limited set of the industries most likely for their occupation. The limited list shown to the respondent always includes an option ‘other’, with the full look-up table shown in the next step. The prediction is based on regression models estimated on a pooled database with more than 2 million observations for occupation and industry.
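The two-step flow described above (a short predicted list that always ends with ‘other’, followed by the full look-up table when ‘other’ is chosen) can be sketched in Python as follows. The function names, the `max_options` parameter and the contents of `PREDICTIONS` are hypothetical: the occupation and industry pairs are plausible examples written for this sketch, not values taken from the D8.11 prediction file.

```python
OTHER = "other"

# Hypothetical predictions: 4-digit ISCO-08 occupation code mapped to
# 2-digit NACE divisions, most likely first. NOT taken from SERISS D8.11.
PREDICTIONS = {
    "2221": ["86", "87", "88"],  # assumed example: nursing professionals
    "2512": ["62", "63"],        # assumed example: software developers
}


def industry_choices(isco_code, predictions=PREDICTIONS, max_options=5):
    """Return the short list of industries to show for an occupation.

    The list always ends with the 'other' option, also when no
    prediction is available for the occupation.
    """
    shortlist = predictions.get(isco_code, [])[:max_options]
    return shortlist + [OTHER]


def needs_full_lookup(answer):
    """True when the respondent chose 'other' and the full table must follow."""
    return answer == OTHER


if __name__ == "__main__":
    print(industry_choices("2221"))
    print(needs_full_lookup("other"))
```

Appending ‘other’ inside the function rather than in the data keeps the escape option present even for occupations that have no prediction at all.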

Note that the codes correspond fully to the 3- or 4-digit codes of NACE Rev. 2. Current Internet technologies allow any such database to be served through an API (Application Programming Interface), and the database can also be used offline, for example on tablets.
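Because NACE is hierarchical (2-digit divisions, 3-digit groups, 4-digit classes), a 3- or 4-digit code can be reduced to its 2-digit division, for instance to compare it with the 2-digit predictions of D8.11. The Python sketch below assumes that codes are held as strings, with or without a dot, and that leading zeros are preserved; the storage format actually used in the Excel file may differ, and the function names are hypothetical.

```python
def normalise(code: str) -> str:
    """Strip dots and whitespace and check that 3 or 4 digits remain."""
    digits = code.replace(".", "").strip()
    if not digits.isdigit() or len(digits) not in (3, 4):
        raise ValueError(f"expected a 3- or 4-digit NACE code, got {code!r}")
    return digits


def nace_level(code: str) -> str:
    """Return 'group' for a 3-digit code and 'class' for a 4-digit code."""
    return "group" if len(normalise(code)) == 3 else "class"


def nace_division(code: str) -> str:
    """Return the 2-digit division that a 3- or 4-digit code belongs to."""
    return normalise(code)[:2]


if __name__ == "__main__":
    for raw in ("62.01", "011"):
        print(raw, nace_level(raw), nace_division(raw))
```

Keeping codes as strings matters here: if a code such as 011 were stored as a number, the leading zero would be lost and the division could no longer be read from the first two characters.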


Relevant materials:

The PDF file ‘Explanatory note about the Industry Database - SERISS D8.10’ explains the industry database and details the principles underlying the database.

The Excel file ‘Database of NACE-coded industries for 99 countries - SERISS D8.10’ contains the source list of industries with their codes and translations, the structure of the search tree for online browsing of the database, a mapping table of the codes into the ISIC classification, and a label set for all codes included in the database. The database is used in the live search. Survey holders can use the database to prepare an API or search tree for the industry question in their surveys.

The PDF file ‘Explanatory note about the occupation>industry prediction - SERISS D8.11’ explains how the occupation>industry predictions were derived.

The Excel file ‘Database of the occupation>industry predictions - SERISS D8.11’ holds the predictions of the most likely industries (NACE Rev. 2 classification at the 2-digit level) for all 4-digit ISCO-08 occupations.