A framework for exploration and cleaning of environmental data : Tehran air quality data experience
Journal article (original research paper in a scientific journal)
 
ID 2793183
Author(s) Shamsipour, Mansour; Farzadfar, Farshad; Gohari, Kimiya; Parsaeian, Mahboubeh; Amini, Hassan; Rabiei, Katayoun; Hassanvand, Mohammad Sadegh; Navidi, Iman; Fotouhi, Akbar; Naddafi, Kazem; Sarrafzadegan, Nizal; Mansouri, Anita; Mesdaghinia, Alireza; Larijani, Bagher; Yunesian, Masud
Author(s) at UniBasel Amini, Heresh
Year 2014
Title A framework for exploration and cleaning of environmental data : Tehran air quality data experience
Journal Archives of Iranian medicine
Volume 17
Number 12
Pages / Article-Number 821-829
Keywords Air pollution, air quality data management, EBD-NASBOD, Iran, outlier detection, Tehran
Abstract

Managing and cleaning large environmental monitoring datasets is a particular challenge. In this article, we present a novel framework for exploring and cleaning large datasets. As a case study, we applied the method to air quality data from Tehran, Iran, from 1996 to 2013.

The framework consists of data acquisition [here, data on particulate matter with aerodynamic diameter ≤10 µm (PM10)], development of databases, initial descriptive analyses, removal of inconsistent data using a plausibility range (PR), and detection of missing-data patterns. Additionally, we developed a novel tool, the spatiotemporal screening tool (SST), which considers both the spatial and temporal nature of the data in the outlier detection process. We also evaluated the effect of dust storms in the outlier detection phase.

The raw mean concentration of PM10 before implementation of the algorithms was 88.96 µg/m3 for 1996-2013 in Tehran. After implementing the algorithms, 5.7% of data points in total were recognized as unacceptable outliers, of which 69% were detected by the SST and 1% by the dust storm algorithm. In addition, 29% of the unacceptable outlier values were not in the PR. The mean concentration of PM10 after implementation of the algorithms was 88.41 µg/m3. However, the standard deviation decreased significantly, from 90.86 µg/m3 to 61.64 µg/m3, after implementation of the algorithms. There was no distinguishable significant pattern in the missing data by hour, day, month, or year.

We developed a novel framework for cleaning large environmental monitoring data, which can identify hidden patterns. We also presented a complete picture of PM10 in Tehran from 1996 to 2013. Finally, we propose implementing our framework on large spatiotemporal databases, especially in developing countries.
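
Two of the steps named in the abstract, removal of values outside a plausibility range and tallying of missing values by hour, might look roughly like the Python sketch below. The bounds, function names, and sample records are illustrative assumptions only. The abstract does not state the PR bounds or describe the SST algorithm, so the SST is not reproduced here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical plausibility range for hourly PM10 (µg/m3).
# The abstract does not report the bounds the authors used.
PLAUSIBLE_MIN = 0.0
PLAUSIBLE_MAX = 1000.0


def split_by_plausibility(records, low=PLAUSIBLE_MIN, high=PLAUSIBLE_MAX):
    """Split (timestamp, value) records into in-range and out-of-range lists.

    Missing values (None) are skipped here and counted separately.
    """
    kept, rejected = [], []
    for timestamp, value in records:
        if value is None:
            continue
        if low <= value <= high:
            kept.append((timestamp, value))
        else:
            rejected.append((timestamp, value))
    return kept, rejected


def missing_by_hour(records):
    """Count missing values per hour of day to look for a missing-data pattern."""
    return Counter(ts.hour for ts, value in records if value is None)


# Made-up toy records, not data from the study.
records = [
    (datetime(2013, 1, 1, 0), 85.0),
    (datetime(2013, 1, 1, 1), None),    # missing reading
    (datetime(2013, 1, 1, 2), -5.0),    # below the assumed range
    (datetime(2013, 1, 1, 3), 92.0),
    (datetime(2013, 1, 1, 4), 4200.0),  # above the assumed range
]

kept, rejected = split_by_plausibility(records)
missing = missing_by_hour(records)
```

The same tally could be repeated by day, month, or year by swapping `ts.hour` for the corresponding attribute of the timestamp.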

Publisher Acad. of Medical Sciences of I. R. Iran
ISSN/ISBN 1029-2977
edoc-URL http://edoc.unibas.ch/dok/A6328998
Full Text on edoc Available
Digital Object Identifier DOI 0141712/AIM.008
PubMed ID http://www.ncbi.nlm.nih.gov/pubmed/25481321
ISI-Number WOS:000347754100006
Document type (ISI) Article
 
   
