What is Data Engineering?
Data engineering focuses on the practical work of collecting data and preparing it for analysis. Data gathered from many sources arrives raw, and data engineering turns that unusable raw data into useful information. In short, it is the process of transforming, cleansing, profiling, and aggregating large data sets.
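As a rough illustration of the cleansing, profiling, and aggregating steps named above, here is a minimal Python sketch. The records, field names, and helper functions are all hypothetical and exist only for this example; a real pipeline would typically use a dedicated engine rather than plain Python.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from several sources.
RAW_RECORDS = [
    {"city": " Seattle ", "sales": "120.50"},
    {"city": "seattle", "sales": "79.50"},
    {"city": "Austin", "sales": "n/a"},   # amount cannot be parsed
    {"city": "", "sales": "15"},          # city is missing
    {"city": "AUSTIN", "sales": "40"},
]


def cleanse(records):
    """Normalize city names and drop rows whose fields cannot be used."""
    clean = []
    for row in records:
        city = row["city"].strip().title()
        try:
            sales = float(row["sales"])
        except ValueError:
            continue  # skip rows with a non-numeric amount
        if city:
            clean.append({"city": city, "sales": sales})
    return clean


def profile(raw, clean):
    """Report how many rows survived cleansing."""
    return {
        "raw_rows": len(raw),
        "clean_rows": len(clean),
        "rejected_rows": len(raw) - len(clean),
    }


def aggregate(clean):
    """Sum sales per city."""
    totals = defaultdict(float)
    for row in clean:
        totals[row["city"]] += row["sales"]
    return dict(totals)


clean_rows = cleanse(RAW_RECORDS)
print(profile(RAW_RECORDS, clean_rows))
print(aggregate(clean_rows))
```

Two of the five rows are rejected here, and the remaining rows collapse into one total per city, which is the kind of "raw to useful" step the answer describes.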
Define reserved capacity in Azure
Microsoft offers a reserved capacity option for Azure Storage to help optimize costs. A reservation gives customers a fixed amount of storage capacity on the Azure cloud for the duration of the reservation period.
What do you mean by blob storage in Azure?
Azure Blob Storage is a service that lets users store massive amounts of unstructured object data, such as binary data or text. It can be used to expose data publicly or to store application data privately. Blob storage is commonly used for:
- Providing images or documents to a browser directly
- Audio and video streaming
- Data storage for backup and restore, and for disaster recovery
- Data storage for analysis using an on-premises or Azure-hosted service
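To make the "serve directly to a browser" use case concrete, the sketch below builds the HTTPS address at which a blob is reachable. The account, container, and blob names are hypothetical, and the helper `blob_url` is not part of any SDK. A browser can fetch such a URL only if the container permits anonymous read access or the URL carries a valid access token.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name):
    """Build the HTTPS URL of a blob in Azure Blob Storage."""
    # Blob names may contain "/" to simulate folders, so leave it unescaped
    # and percent-encode everything else that is unsafe in a URL path.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


# Hypothetical account, container, and blob names.
print(blob_url("examplestore", "images", "logos/company logo.png"))
```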
Define serverless database computing in Azure.
Program code typically runs either on the client or on a server. Serverless computing relies on stateless code, which means users do not have to provision or manage any infrastructure for the code to run.
Users pay for the compute resources the code consumes only during the brief period in which it executes. This makes it cost-effective, because users pay only for the resources they actually use.
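The pay-per-use idea can be sketched as simple arithmetic. The rate below is a made-up placeholder, not a real Azure price, and the billing model is deliberately simplified to "cores times seconds of activity"; actual serverless billing rules have more detail.

```python
# Hypothetical rate for illustration only; check current pricing for real figures.
PRICE_PER_VCORE_SECOND = 0.0001


def compute_cost(usage_intervals, price=PRICE_PER_VCORE_SECOND):
    """Sum the cost of (vcores, seconds) bursts; idle time is simply not billed."""
    vcore_seconds = sum(vcores * seconds for vcores, seconds in usage_intervals)
    return round(vcore_seconds * price, 4)


# Two bursts in a day: 2 vCores for 10 minutes, then 4 vCores for 5 minutes.
bursts = [(2, 600), (4, 300)]
print(compute_cost(bursts))
```

Under these assumptions the rest of the day costs nothing for compute, which is the contrast with an always-on server that the answer is drawing.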
Explain the top-level concepts of Azure Data Factory.
- Pipeline
A pipeline acts as a carrier for the processes that take place. Each individual process is known as an activity.
- Activities
Activities are the processing steps in a pipeline. A pipeline has one or more activities, and an activity can be almost any operation, such as querying a dataset or moving data from one source to another.
- Datasets
Simply put, a dataset is a named structure that holds or points to the data used by activities.
- Linked Services
Linked services store the connection information needed to connect to an external source.
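How the four concepts reference one another can be sketched with plain Python dictionaries. This is a simplified model with hypothetical names and placeholder connection strings, not the exact Data Factory JSON schema: an activity names its datasets, and each dataset names the linked service that knows how to connect.

```python
# Linked services: connection information for external stores (placeholders only).
linked_services = {
    "BlobStoreLS": {"type": "AzureBlobStorage", "connectionString": "<placeholder>"},
    "SqlDbLS": {"type": "AzureSqlDatabase", "connectionString": "<placeholder>"},
}

# Datasets: named structures that point at data through a linked service.
datasets = {
    "RawSalesCsv": {"linkedServiceName": "BlobStoreLS"},
    "SalesTable": {"linkedServiceName": "SqlDbLS"},
}

# Pipeline: a container for one or more activities.
pipeline = {
    "name": "CopySalesPipeline",
    "activities": [
        {
            "name": "CopyCsvToSql",
            "type": "Copy",
            "inputs": ["RawSalesCsv"],
            "outputs": ["SalesTable"],
        },
    ],
}


def resolve(pipeline, datasets, linked_services):
    """For each activity, list the linked service type behind every dataset."""
    plan = []
    for activity in pipeline["activities"]:
        for role in ("inputs", "outputs"):
            for dataset_name in activity[role]:
                service_name = datasets[dataset_name]["linkedServiceName"]
                service_type = linked_services[service_name]["type"]
                plan.append((activity["name"], role, dataset_name, service_type))
    return plan


for step in resolve(pipeline, datasets, linked_services):
    print(step)
```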
What is the difference between Azure Synapse Analytics and Azure Data Lake Storage?
| Azure Synapse Analytics | Azure Data Lake |
| --- | --- |
| Optimized for processing structured data with a well-defined schema. | Optimized for storing and processing both structured and unstructured data. |
| Built on SQL (Structured Query Language) Server technology. | Built to work with Hadoop. |
| Has built-in data pipelines and data streaming capabilities. | Handles data streaming through Azure Stream Analytics. |
| Compliant with regulatory standards. | Covered by Azure platform compliance, but governance of the stored data is left largely to the user. |
| Used for business analytics. | Used for data analytics and exploration by data scientists and engineers. |
What are the different security options available in the Azure SQL database?
Security plays a vital role in databases. Some of the security options available in the Azure SQL database are:
- Azure SQL Firewall Rules: Azure provides two levels of firewall protection. Server-level firewall rules are stored in the master database and determine access to the Azure SQL server as a whole. Users can also create database-level firewall rules that govern access to individual databases.
- Azure SQL TDE (Transparent Data Encryption): TDE is the technology used to encrypt stored data. TDE is also available for Azure Synapse Analytics and Azure SQL Managed Instance. With TDE, databases, backups, and transaction log files are encrypted and decrypted in real time.
- Always Encrypted: This feature is designed to protect sensitive data stored in Azure SQL Database, such as credit card numbers. It encrypts data inside the client application using an Always Encrypted-enabled driver. Encryption keys are never shared with SQL Database, which means database admins do not have access to the sensitive data.
- Database Auditing: Azure provides comprehensive auditing capabilities for SQL Database. An audit policy can be declared at the server level or at the individual database level, so users can choose according to their requirements.
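The two-level firewall idea from the first bullet can be sketched as follows. The rule sets, IP ranges, and database name are hypothetical, and the evaluation order (database-level rules first, then server-level rules) is modelled here as an assumption about how the two levels combine; this is an illustration, not the service's actual implementation.

```python
from ipaddress import ip_address

# Hypothetical rules: each rule is an inclusive (start, end) IP range.
SERVER_RULES = [("203.0.113.0", "203.0.113.255")]
DATABASE_RULES = {"salesdb": [("198.51.100.10", "198.51.100.20")]}


def in_any_range(ip, rules):
    """Return True if ip falls inside at least one (start, end) range."""
    addr = ip_address(ip)
    return any(ip_address(start) <= addr <= ip_address(end) for start, end in rules)


def is_allowed(ip, database):
    """Check database-level rules first, then fall back to server-level rules."""
    if in_any_range(ip, DATABASE_RULES.get(database, [])):
        return True
    return in_any_range(ip, SERVER_RULES)


print(is_allowed("198.51.100.15", "salesdb"))   # matches a database-level rule
print(is_allowed("198.51.100.15", "otherdb"))   # no rule for this database or server
print(is_allowed("203.0.113.7", "otherdb"))     # matches a server-level rule
```

A database-level rule opens only the database that holds it, while a server-level rule opens every database on the server, which is why the second call is refused and the third is accepted.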
What is the limit on the number of integration runtimes?
There is no hard limit on the number of integration runtime instances you can have in a data factory. There is, however, a limit on the number of VM cores that the integration runtime can use per subscription for SSIS package execution.
What is blob storage in Azure?
Azure Blob Storage is a service for storing large amounts of unstructured object data, such as text or binary data. You can use Blob Storage to expose data publicly to the world or to store application data privately. Common uses of Blob Storage include:
- Serving images or documents directly to a browser
- Storing files for distributed access
- Streaming video and audio
- Storing data for backup and restore, disaster recovery, and archiving
- Storing data for analysis by an on-premises or Azure-hosted service
Which Data Factory version do I use to create data flows?
Use Data Factory V2 to create data flows.