Open Refine (previously Google Refine) is a data cleaning software that uses your web browser as an interface. This means it will look like it runs on the internet but all your data remains on your machine and you do not need internet connection to work with it.
The main aim of Refine is to help you exploring and cleaning your data before you use it further. It is built for large datasets – so don’t worry as long as your spreadsheets can hold the information, refine can as well.
Creating a new Project
To work with your data in Refine you need to start a new project.
Walkthrough: Creating a Refine project
You now have successfully created your first refine project. Remember: although it runs in a web-browser, the Refine server is still on your machine – all the data is there (so no worries if you handle sensitive information)
Sorting and Facetting
Once we created our project, let’s go and explore the data and the Refine interface a bit. Using Refine might be intimidating at first, since it seems so different from spreadsheets, once you get used to it you will notice how easily you can do things with it.
One of the commonly used functions in spreadsheets is sorting and filtering data – to figure out minima, maxima or things about certain categories. Refine can do the same thing.
Walkthrough: Sorting rows
The other frequently used function in Spreadsheets is filtering – in Refine this is called facetting. Facetting in Refine is really powerful – you will see in most of the rest of the Recipe we’ll use facets.
Walkthrough: Facetting rows based on a column
Reconciling Columns
Sometimes humans make mistakes when they enter data – they mistype city names or put in characters they can not see but the computer can. (For example, you can add a simple space at the end of a name and the computer will think they are different). For this let’s create a text facet for the cities:
Walkthrough: Reconciling Columns
Making city names look nice
Did you notice how most of the Cities are all uppercase? It’s rare to read them like this. And maybe you want to have nicer looking names: No problem. Refine supports this.
Walkthrough: Changing Case in Refine
- Let’s change the case in our city column from all uppercase to titlecase
- To do this, open the column options, go to edit cells -> common transforms -> to titlecase
- Tada – magically your names have been converted.
Congratulations! You successfully cleaned up a dataset using Refine!