Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records in a data set, table, or database.
It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting them. In PetroVisor, cleansing rules can be applied when new data is loaded into the system, as in the P# Cleansing module described here, or to check whether values edited manually in the Value editor are correct. Data cleansing rules are written in P#, PetroVisor's domain-specific scripting language.
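P# syntax is not shown in this section, so the following Python sketch only illustrates the general detect-then-correct-or-remove idea behind a cleansing rule. The `CleansingRule` class, the `cleanse` function, and the thresholds are hypothetical and are not part of PetroVisor or P#.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical rule: a predicate that flags a bad value and a fix that returns
# a replacement value, or None to delete the sample entirely.
@dataclass
class CleansingRule:
    name: str
    is_bad: Callable[[float], bool]
    fix: Callable[[float], Optional[float]]


def cleanse(values: list[float], rules: list[CleansingRule]) -> list[float]:
    """Apply each rule in order; drop a sample as soon as a fix returns None."""
    cleaned = []
    for v in values:
        keep = True
        for rule in rules:
            if rule.is_bad(v):
                fixed = rule.fix(v)
                if fixed is None:  # delete the sample
                    keep = False
                    break
                v = fixed  # replace or modify the sample
        if keep:
            cleaned.append(v)
    return cleaned


# Illustrative rules for an oil-rate signal; the limits are made up.
rules = [
    CleansingRule("drop missing", lambda v: math.isnan(v), lambda v: None),
    CleansingRule("clip negative", lambda v: v < 0, lambda v: 0.0),
    CleansingRule("drop outlier", lambda v: v > 10_000, lambda v: None),
]

print(cleanse([120.0, float("nan"), -5.0, 250000.0, 300.0], rules))
# → [120.0, 0.0, 300.0]
```

The three rules cover the three possible outcomes named above: a missing value is deleted, a negative value is modified, and an implausible value is removed.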
PetroVisor uses a semantic layer as a generic database interface. The semantic layer represents a data model specific to the petroleum engineering domain, which enables proper management of the acquired data. This interface, called the dictionary, assigns a unique name and a physical measurement type to every signal that is acquired from a data source and stored. As a result, PetroVisor knows the physical type of every signal it acquires and processes at all times.
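The following Python sketch, assuming a simple in-memory lookup, shows the idea of a dictionary that gives each signal a unique name and a measurement type. The `Signal` class, the `DICTIONARY` mapping, and the signal and type names are illustrative assumptions, not PetroVisor's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str              # unique signal name in the dictionary
    measurement_type: str  # physical type, e.g. "Pressure" or "Volume rate"


# Hypothetical dictionary entries, keyed by the unique signal name.
DICTIONARY = {
    s.name: s
    for s in [
        Signal("Tubing head pressure", "Pressure"),
        Signal("Oil rate", "Volume rate"),
    ]
}


def measurement_type(signal_name: str) -> str:
    """Look up the physical type of a signal; raises KeyError if unknown."""
    return DICTIONARY[signal_name].measurement_type


print(measurement_type("Tubing head pressure"))  # → Pressure
```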
The implications of using the semantic data model are:
- Only physically meaningful calculations are allowed (e.g. tubing head pressure cannot be subtracted from oil rate)
- Data can be retrieved, and calculations performed, in any valid unit for a given measurement type (e.g. tubing head pressure can be retrieved in psia or in Pa without any manual conversion)
- Data cleansing is set up for a signal in the dictionary rather than for a specific mapped tag. Cleansing rules therefore remain valid even if the underlying tag mapping changes, which makes it possible to build a generally valid set of cleansing rules that can be extended as new sensors are installed.
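The three implications above can be sketched in Python with a hypothetical unit-aware `Quantity` type: subtraction is rejected across measurement types, values convert between units of the same type, and cleansing rules are looked up by signal name through a tag mapping. `UNITS`, `Quantity`, `RULES`, `TAG_TO_SIGNAL`, `passes_cleansing`, and the tag names are assumptions for illustration only; PetroVisor's own unit system and rule binding are not shown here.

```python
from dataclasses import dataclass

# Hypothetical unit table: (measurement type, factor to the SI base unit).
# 1 psi is approximately 6894.757 Pa; 1 bbl is exactly 0.158987294928 m3.
UNITS = {
    "Pa": ("Pressure", 1.0),
    "psia": ("Pressure", 6894.757293168),
    "m3/d": ("Volume rate", 1.0 / 86400.0),
    "bbl/d": ("Volume rate", 0.158987294928 / 86400.0),
}


@dataclass(frozen=True)
class Quantity:
    si_value: float        # value stored in SI base units
    measurement_type: str

    @classmethod
    def of(cls, value: float, unit: str) -> "Quantity":
        mtype, factor = UNITS[unit]
        return cls(value * factor, mtype)

    def to(self, unit: str) -> float:
        mtype, factor = UNITS[unit]
        if mtype != self.measurement_type:
            raise TypeError(f"cannot express {self.measurement_type} in {unit}")
        return self.si_value / factor

    def __sub__(self, other: "Quantity") -> "Quantity":
        # Only physically meaningful: both operands must share a type.
        if other.measurement_type != self.measurement_type:
            raise TypeError(
                f"cannot subtract {other.measurement_type} from {self.measurement_type}"
            )
        return Quantity(self.si_value - other.si_value, self.measurement_type)


# Cleansing rules are keyed by dictionary signal name, not by raw tag.
RULES = {"Tubing head pressure": lambda q: q.to("psia") >= 0.0}
TAG_TO_SIGNAL = {"PT-101.PV": "Tubing head pressure"}  # hypothetical tag


def passes_cleansing(tag: str, q: Quantity) -> bool:
    """Resolve the tag to its signal (KeyError if unmapped) and apply its rule."""
    rule = RULES.get(TAG_TO_SIGNAL[tag])
    return True if rule is None else rule(q)


thp = Quantity.of(500.0, "psia")
print(round(thp.to("Pa")))  # → 3447379

oil_rate = Quantity.of(1200.0, "bbl/d")
try:
    oil_rate - thp
except TypeError as err:
    print(err)  # → cannot subtract Pressure from Volume rate

# A new sensor is mapped to the same signal; the existing rule applies unchanged.
TAG_TO_SIGNAL["PT-205.PV"] = "Tubing head pressure"
print(passes_cleansing("PT-205.PV", thp))  # → True
```

Storing every value in SI base units keeps each conversion to a single multiplication or division, and keying `RULES` by signal name means remapping a tag only touches `TAG_TO_SIGNAL`.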