ORIGINAL RESEARCH
Multivariate Analysis for Characterization of Air Pollution Sources: Part 1 Prior Data Screening and Underlying Assumptions
1. Faculty of Public and Environmental Health, Department of Environmental Health & Environmental Studies, University of Khartoum, Khartoum, 205, Sudan
2. College of Health Sciences, Department of Public Health, Saudi Electronic University, Riyadh, 11673, Kingdom of Saudi Arabia
3. International Joint Research Center for Persistent Toxic Substances (IJRC-PTS), State Key Laboratory of Urban Water Resource and Environment, School of Municipal and Environmental Engineering, Harbin Institute of Technology, Harbin 150090, China
Submission date: 2023-10-02
Final revision date: 2023-12-05
Acceptance date: 2024-01-11
Online publication date: 2024-04-18
Publication date: 2024-05-23
Corresponding author: Mohammed O.A. Mohammed, Faculty of Public and Environmental Health, Department of Environmental Health & Environmental Studies, University of Khartoum, Khartoum, 205, Sudan
Pol. J. Environ. Stud. 2024;33(4):4257-4271
ABSTRACT
There is a real need for comparability and consistency among findings obtained from different multivariate methods, which rest on different assumptions and differ in their sensitivity to data errors. This study investigates essential aspects of data screening prior to analysis, particularly the detection of outliers, communalities, multicollinearity, and the Kaiser-Meyer-Olkin (KMO) and Bartlett's tests, and examines how changing model parameters, such as the number of convergence runs, the number of bootstrap runs, the FPEAK value, and the minimum coefficient of determination (R²), influences model results. Positive matrix factorization (PMF) and Unmix were applied to monitoring data collected at a receptor site. Communality estimates and multicollinearity diagnostics indicated possible data errors in Ca, Cu, Na, and Mn, which affected the stability of the source profiles. PMF identified biomass burning, coal combustion, traffic, industrial emissions, Mn-enriched sources, and secondary aerosols, while Unmix identified similar sources with comparable profiles, except that the vehicle exhaust and industrial emission profiles differed slightly. Compared with PMF, Unmix was strongly influenced by outliers and multicollinearity and, to a lesser extent, by changes in sample size. We recommend interpreting bootstrap results rather than base runs for both PMF and Unmix, and screening the data before further modeling. We also suggest checking multicollinearity with more than one statistical measure, particularly variance inflation factor (VIF) values together with tolerance values.
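The screening diagnostics named in the abstract (VIF and tolerance, the KMO measure, Bartlett's test of sphericity, and communalities) can all be computed from the correlation matrix of the measured species. The following is a minimal NumPy/SciPy sketch under stated assumptions, not the authors' code: the function names are hypothetical, communalities are taken from an unrotated principal-component solution (the study's software may use a different extraction), and the synthetic data merely stand in for receptor-site concentrations.

```python
import numpy as np
from scipy import stats


def vif_and_tolerance(X):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix;
    tolerance_j = 1 / VIF_j (equivalently 1 - R_j^2 from regressing column j on the rest)."""
    R = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(R))
    return vif, 1.0 / vif


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure and per-variable MSA values."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    P = -R_inv / np.outer(d, d)                  # partial correlations (off-diagonal entries)
    off = ~np.eye(R.shape[0], dtype=bool)        # mask that excludes the diagonal
    r2 = np.where(off, R**2, 0.0)
    p2 = np.where(off, P**2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return overall, msa


def bartlett_sphericity(X):
    """Bartlett's test that the population correlation matrix is the identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)


def communalities(X, n_factors):
    """Sum of squared loadings over the retained principal components."""
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)           # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:n_factors]   # indices of the largest eigenvalues
    loadings = eigvec[:, top] * np.sqrt(eigval[top])
    return (loadings**2).sum(axis=1)


# Synthetic example: column 2 is almost a linear combination of columns 0 and 1.
rng = np.random.default_rng(0)
n = 200
a, b = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=n), rng.normal(size=n)])

vif, tol = vif_and_tolerance(X)
overall_kmo, msa = kmo(X)
chi2, df, p_value = bartlett_sphericity(X)
h2 = communalities(X, n_factors=2)
```

A large VIF (equivalently, a small tolerance) flags a variable that is nearly a linear combination of the others, as the constructed third column is here. Reporting both quantities side by side mirrors the abstract's suggestion to check multicollinearity with more than one measure.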