|
|
In the following example we will find dubious spellings
of City value in the address list. In our address table we store City
values in two languages: English and Italian. Time came to check if translation
is always the same. Cause if it is not, we will have to make corrections.
The source dataset looks like this:
| id | City_EN | City_IT | Street |
| 1 | Milan | Milano | Via XX Settembre |
| 2 | Turin | Torino | Corso Vittorio Emanuele |
| 3 | Rome | Roma | Via Falci |
| 4 | Milan | Milan | Via Napoleone |
| 5 | Rome | Rome | Via Piovesi |
| 6 | Milan | Milano | Piazza Partigiani |
| ... | ... | ... | ... |
First we will group all possible combinations of ENG/ITA city names pairs:
SELECT City_EN, City_IT
FROM Address_Table
GROUP BY City_EN, City_IT
|
And from the first glance can say that there is lack of uniformity. While
we wish to have single possible translation of English to Italian:
| City_EN | City_IT |
| Milan | Milan |
| Milan | Milano |
| Rome | Roma |
| Rome | Rome |
| Turin | Torino |
At the next step we will identify English values that was translated to
Italian in many ways:
SELECT City_EN, count(*) AS cn
FROM
(
  SELECT City_EN, City_IT
  FROM Address_Table
  GROUP BY City_EN, City_IT
) AS a
GROUP BY City_EN HAVING count(*) > 1
|
Recieving something like this:
Finally we will find all Address records that need to be revised and possibly
be fixed:
SELECT c.*
FROM Address_Table AS c INNER JOIN
(
  SELECT City_EN, count(*) AS cn
  FROM
  (
    SELECT City_EN, City_IT
    FROM Address_Table
    GROUP BY City_EN, City_IT
  ) AS a
  GROUP BY City_EN HAVING count(*) > 1
) AS b ON c.City_EN = b.City_EN
ORDER BY c.City_EN, c.City_IT;
|
And here is our final dataset:
| id | City_EN | City_IT | Street |
| 4 | Milan | Milan | Via Napoleone |
| 1 | Milan | Milano | Via XX Settembre |
| 6 | Milan | Milano | Piazza Partigiani |
| 3 | Rome | Roma | Via Falci |
| 5 | Rome | Rome | Via Piovesi |
sqlexamples.info
|