Crowdsourced data leaking user's privacy while using anonymization technique
DOI:
https://doi.org/10.22581/muet1982.2954
Keywords:
Classification, Privacy leakage, Anonymization, Machine learning
Abstract
Because of the considerable value embedded in big educational data, numerous research institutes have collected large volumes of student behavioral data. To make full use of that value, the collected data may be shared with third parties, such as data experts around the world. Sharing can pose privacy risks to data owners, even though data collectors usually anonymize the data before crowdsourcing. To demonstrate that anonymization alone is insufficient to protect user privacy, we conducted an experimental study using offline and online behavioral traces collected through campus cards and smartphones. The study shows that a student’s identity can be determined with high probability from anonymized behavioral and payment traces. The results indicate that only ten features are sufficient to uniquely identify an individual: Transmission Control Protocol (TCP) synchronization attempts, content length, downlink traffic, last acknowledgement packet delay, uplink traffic, cell ID, base station ID, offline payment time (day, hour), online payment time (day, hour, minute), and point of sale ID (POS_ID). Five standard supervised learning classifiers were used to predict user identity: Extra Tree, Bagging, Decision Tree, k-Nearest Neighbor (KNN), and Random Forest. The achieved accuracies were 99.99%, 99.95%, 99.02%, 98.84%, and 99.56%, respectively.
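The re-identification idea can be illustrated with a minimal sketch. It is not the authors' pipeline or data: the records, the pseudonyms `u1` to `u3`, the three-feature subset (hour, POS_ID, cell ID), the distance function, and the helper names `distance`, `knn_predict`, and `accuracy` are all assumptions made for illustration. The sketch applies a plain nearest-neighbor vote, written in pure Python, to show how a handful of behavioral features in a pseudonymized trace may be enough to link a new record back to a known user.

```python
from collections import Counter

# Hypothetical toy traces: (pseudonym, (hour, pos_id, cell_id)).
# All values are invented for illustration; they are not from the study.
TRAIN = [
    ("u1", (8, 101, 7)), ("u1", (12, 101, 7)), ("u1", (18, 103, 7)),
    ("u2", (9, 102, 4)), ("u2", (12, 102, 4)), ("u2", (19, 104, 5)),
    ("u3", (7, 101, 9)), ("u3", (13, 105, 9)), ("u3", (20, 105, 9)),
]
TEST = [
    ("u1", (11, 101, 7)),
    ("u2", (10, 102, 4)),
    ("u3", (14, 105, 9)),
]


def distance(a, b):
    """Mixed distance: circular hour gap scaled to [0, 1], plus one unit
    for each categorical feature (pos_id, cell_id) that differs."""
    gap = abs(a[0] - b[0])
    hour_gap = min(gap, 24 - gap) / 12.0
    return hour_gap + (a[1] != b[1]) + (a[2] != b[2])


def knn_predict(train, query, k=1):
    """Return the majority pseudonym among the k nearest training records."""
    ranked = sorted(train, key=lambda rec: distance(rec[1], query))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=1):
    """Fraction of test records linked to the correct pseudonym."""
    hits = sum(knn_predict(train, feats, k) == label for label, feats in test)
    return hits / len(test)


if __name__ == "__main__":
    print("re-identification accuracy:", accuracy(TRAIN, TEST))
```

On this toy data every held-out record is linked to the correct pseudonym, which says nothing about real-world accuracy; it only shows the mechanism by which habitual times and locations act as a fingerprint even after names are removed.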
Copyright (c) 2025 Mehran University Research Journal of Engineering and Technology

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.