A Protection Layer over MapReduce Framework for Big Data Privacy


  • Hidayath Ali Baig, Department of Information Technology, University of Technology and Applied Sciences, Oman




Keywords: Data Analytics; Hadoop; HDFS; MapReduce Protection Layer (MRPL); Privacy-preserving


Big data analytics has become a common way for organizations to extract valuable insights from their data. The MapReduce framework, which is generally used for this purpose, has been adopted by most organizations for its exceptional characteristics. However, because significant processing resources are readily available, dispersed privacy-sensitive details can be collected quickly, which heightens widespread privacy concerns. This article reviews existing research on privacy issues in the MapReduce framework and proposes an additional layer of privacy protection, the MapReduce Protection Layer (MRPL), over the adopted framework. The data is split into pieces and processed in the cloud, and two further steps are taken. Hadoop splits the file into smaller pieces, and the task tracker then allocates these pieces to several mappers. The data is first split into key-value pairs, from which the intermediate data sets are generated. The efficiency of the proposed approach can then be evaluated. Overall, the proposed method provides improved scalability. Execution time is compared with respect to file size and the number of partitions. Because a privacy protection technique is applied, the loss of data content can be appropriately handled. MRPL is shown to outperform current methods in terms of CPU optimization, memory usage, and reduced information loss. The results indicate that the proposed strategy offers significant advantages for big data by enhancing privacy and protection, and that MRPL can considerably mitigate privacy issues in big data.
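The pipeline summarized in the abstract (split the file, hand the pieces to mappers, emit key-value pairs, build intermediate data) can be illustrated with a small single-process sketch. The abstract does not describe the internals of MRPL, so the protection step here is an assumption: a salted hash that pseudonymizes a sensitive field before it enters the intermediate data. All function names (`split_into_chunks`, `protect`, `mapper`, `shuffle`, `reducer`, `run_job`), the salt, and the sample records are hypothetical and are not taken from the article or from Hadoop.

```python
import hashlib
from collections import defaultdict


def split_into_chunks(lines, chunk_size):
    """Split the input records into fixed-size chunks, one per mapper."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def protect(value, salt="demo-salt"):
    """Assumed protection step: replace a sensitive value with a salted hash prefix.

    This stands in for whatever MRPL actually applies; it is not the article's method.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def mapper(chunk):
    """Emit (key, value) pairs; the user ID is pseudonymized before it leaves the mapper."""
    for line in chunk:
        user_id, category = line.split(",")
        yield category, protect(user_id)


def shuffle(pairs):
    """Group intermediate values by key, as the shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Count distinct pseudonymized users per category."""
    return key, len(set(values))


def run_job(lines, chunk_size=2):
    """Run split, map, shuffle, and reduce in sequence on in-memory records."""
    chunks = split_into_chunks(lines, chunk_size)
    intermediate = [pair for chunk in chunks for pair in mapper(chunk)]
    grouped = shuffle(intermediate)
    return dict(reducer(key, values) for key, values in grouped.items())


records = ["alice,books", "bob,books", "alice,music", "alice,books", "carol,music"]
result = run_job(records)
print(result)
```

Because the same input always hashes to the same pseudonym, the reducer can still count distinct users per category without seeing raw identifiers. A salted hash alone is not a strong privacy guarantee; it is used here only to show where a protection layer could sit between the map output and the intermediate data.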







How to Cite

Baig, H. A. (2022). A Protection Layer over MapReduce Framework for Big Data Privacy. International Journal of Computer and Information Technology (2279-0764), 11(2). https://doi.org/10.24203/ijcit.v11i2.263