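Published: 2022-08-18 18:37
Starting today, we are publishing an English edition of frequently asked Hadoop interview questions, split into four parts: Freshers 1, Freshers 2, Experienced 1, and Experienced 2.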
Apache Hadoop is an open-source software framework for storing and processing data in big data applications. It analyzes very large data sets in parallel, which makes processing faster. Hadoop is developed under the Apache Software Foundation (ASF); its first release dates to 2006, and version 1.0 followed at the end of 2011. Hadoop is economical because data is stored on inexpensive commodity servers that run as clusters.
Before the digital era, data was gathered slowly and in small volumes, so it could be examined and stored in a single storage format. Data received for similar purposes also tended to arrive in the same format. With the growth of the Internet and digital platforms such as social media, data now comes in multiple formats (structured, semi-structured, and unstructured), and its velocity has grown massively as well. This kind of data was given a new name: big data. Handling it required multiple processors and storage units working together, and Hadoop was introduced as a solution to that need.
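To make the idea of analyzing data in parallel more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`count_words`, `parallel_word_count`) and the sample input are made up for illustration, and threads on one machine stand in for the separate nodes of a cluster. It only shows the general pattern of splitting the input, processing each piece independently, and merging the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count the words in one chunk of lines (the per-worker step)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=3):
    """Split the input, count each chunk concurrently, then merge."""
    # Round-robin split: each hypothetical "node" gets every workers-th line.
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Combine the per-chunk results into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


if __name__ == "__main__":
    sample = [
        "big data needs big clusters",
        "clusters store data",
        "data data everywhere",
    ]
    print(parallel_word_count(sample))
```

In a real cluster the chunks would be blocks of files spread across many machines, and the framework would handle scheduling and failures; the sketch leaves all of that out.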
Gartner defines big data as follows:
“Big data is high-volume, velocity, and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
Put simply, big data refers to larger, more complex data sets, particularly those from new data sources. These data sets are so large that conventional data processing software cannot manage them. Yet the same massive volumes of data can be used to address business problems that could not be tackled before.
Characteristics of Big Data are: