Chapter 5

computer-science • intermediate 11th

Data Analytics

Comprehensive notes, MCQs, and Short Questions for Chapter 5 Data Analytics. Covers statistical concepts, data collection, regression, clustering, and visualization.

Introduction to Models

Definition: A model is a simplified representation of a real-world problem. It has three parts: Input, Process, and Output.

Why Models are Important:
- Help make decisions
- Save time and resources
- Predict future outcomes

Examples: Weather forecasting, predicting sales, studying disease spread.

Measures of Central Tendency

Mean (Average): Sum of all values divided by the number of values.

Mode: The value that appears most frequently in a data set.

Median: The middle value when the data is arranged in order. When there is an even number of values, it is the average of the two middle values.
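The three measures above can be checked with a short sketch using Python's built-in statistics module. The list of scores is made-up sample data, not taken from the chapter.

```python
import statistics

# Hypothetical test scores (illustrative data only)
scores = [70, 85, 85, 90, 60, 75]

mean_score = statistics.mean(scores)      # sum of values / number of values
median_score = statistics.median(scores)  # middle of the sorted data (average of 75 and 85 here)
mode_score = statistics.mode(scores)      # value that appears most often

print(mean_score, median_score, mode_score)  # 77.5 80.0 85
```

Because there are six scores (an even count), the median is the average of the two middle values in the sorted list.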

Measures of Dispersion

Variance: The average of the squared differences between each value and the mean. A high variance means the values are spread far from the mean; a low variance means they are close to it.

Standard Deviation: The square root of the variance. It shows roughly how far values typically lie from the mean. It is easier to interpret than variance because it is in the same units as the data.
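A minimal sketch of both measures, worked step by step. It assumes the population form of variance (dividing by the number of values); the sample form divides by one less than that. The data list is made up for illustration.

```python
import math

# Hypothetical data set (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                     # 5.0
squared_diffs = [(x - mean) ** 2 for x in data]  # squared distance of each value from the mean
variance = sum(squared_diffs) / len(data)        # population variance: divide by N
std_dev = math.sqrt(variance)                    # standard deviation: square root of variance

print(variance, std_dev)  # 4.0 2.0
```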

Introduction to Probability

Definition: Probability is a measure of how likely an event is to happen, expressed as a number from 0 (impossible) to 1 (certain).

Formula: Probability = Favorable Outcomes / Total Outcomes (when all outcomes are equally likely)

Example: Probability of heads when tossing a fair coin = 1/2 = 0.5 (50%)

Uses: Weather forecasting, games, medical testing.
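The formula can be tried out with a small sketch. The helper name probability is hypothetical, and the calculation assumes every outcome is equally likely (a fair coin, a fair six-sided die).

```python
from fractions import Fraction

def probability(favorable, total):
    """Favorable outcomes divided by total outcomes (assumes equally likely outcomes)."""
    return Fraction(favorable, total)

p_heads = probability(1, 2)      # one favorable side out of two
p_even_roll = probability(3, 6)  # 2, 4, 6 out of six faces

print(p_heads, float(p_heads))          # 1/2 0.5
print(p_even_roll, float(p_even_roll))  # 1/2 0.5
```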

Data Collection Methods

Surveys: Questionnaires given to a group to collect standardized data quickly. Can be online, phone, or paper.

Observations: Watching people or events in their natural setting without asking questions.

Experiments: Changing one variable, while keeping other conditions the same, to see its effect on another variable. Used to test cause-and-effect relationships.

Data Preparation

Data Cleaning: Fixing or removing errors like incorrect entries, missing values, or duplicates.

Data Transformation: Changing data into a format better suited to analysis (e.g., creating new columns, rearranging data).

Handling Missing Data:
- Imputation: Filling missing values with an estimate, such as the average of the known values.
- Flagging: Marking records that have missing values so they can be handled separately.
- Removal: Deleting incomplete records.
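The three approaches to missing data can be sketched on a small list, where None stands for a missing entry. The ages are made-up values, and using the mean for imputation is just one possible choice.

```python
# Hypothetical ages with two missing entries (None = missing)
ages = [15, 16, None, 17, None, 16]

known = [a for a in ages if a is not None]
mean_age = sum(known) / len(known)  # 16.0

# Imputation: replace each missing value with the mean of the known values
imputed = [a if a is not None else mean_age for a in ages]

# Flagging: keep a parallel list that marks which entries were missing
is_missing = [a is None for a in ages]

# Removal: keep only the complete entries
removed = known

print(imputed)     # [15, 16, 16.0, 17, 16.0, 16]
print(is_missing)  # [False, False, True, False, True, False]
print(removed)     # [15, 16, 17, 16]
```

Removal is the simplest option but shrinks the data set; imputation keeps every record at the cost of introducing estimated values.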

Statistical Models

Linear Regression: Predicts a numeric value (the dependent variable, Y) from another variable (the independent variable, X). Formula: Y = a + bX, where a is the intercept and b is the slope.
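One common way to find a and b in Y = a + bX is the least-squares method, sketched below. The chapter does not say which fitting method it intends, so treat this as an assumption; the data points and the predict helper are made up for illustration.

```python
# Hypothetical data: hours studied (X) and marks obtained (Y)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope: how much Y changes for each unit change in X
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
# Intercept: the value of Y when X is 0
a = mean_y - b * mean_x

def predict(new_x):
    """Apply Y = a + bX to a new X value."""
    return a + b * new_x

print(a, b)        # 1.0 2.0
print(predict(6))  # 13.0
```

The sample points lie exactly on a straight line, so the fit is perfect here; real data would scatter around the line.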

Logistic Regression: Predicts a Yes/No outcome. It gives a probability between 0 and 1, which is then converted into Yes or No, typically by comparing it with a cutoff such as 0.5.
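Logistic regression turns a linear score into a probability using the logistic (sigmoid) function. The sketch below shows only the prediction step; the coefficients a and b, the function names, and the 0.5 cutoff are all hypothetical values chosen for illustration, not fitted from data.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1 / (1 + math.exp(-z))

def pass_probability(hours, a=-4.0, b=1.0):
    """Probability of passing given hours studied (a and b are assumed, not fitted)."""
    return sigmoid(a + b * hours)

def will_pass(hours, cutoff=0.5):
    """Convert the probability into a Yes/No answer using an assumed cutoff."""
    return pass_probability(hours) >= cutoff

print(pass_probability(4))         # 0.5
print(will_pass(2), will_pass(6))  # False True
```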

Clustering (K-Means): Groups similar items into K clusters without predefined labels. Useful for finding patterns in unlabeled data.
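A simplified sketch of the K-Means idea on one-dimensional data: assign each point to its nearest centre, move each centre to the average of its points, and repeat. The function name, the data, and the fixed starting centres are assumptions made for illustration; real tools usually pick starting centres randomly and work with more than one dimension.

```python
def k_means_1d(points, centroids, iterations=10):
    """Very small K-Means for numbers on a line (illustrative sketch only)."""
    clusters = []
    for _ in range(iterations):
        # Step 1: assign every point to the nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data with two obvious groups, K = 2
points = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(points, centroids=[1, 12])

print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```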

Data Visualization

Bar Chart: Compares different categories using bars.

Line Graph: Shows changes over time.

Histogram: Shows how data is distributed across ranges.
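The idea behind a histogram can be sketched without any charting tool: group the values into equal-width ranges and count how many fall in each. The marks and the range width of 10 are made-up choices for illustration.

```python
from collections import Counter

# Hypothetical marks (illustrative data only)
marks = [45, 52, 58, 61, 67, 67, 73, 78, 84, 91]
bin_width = 10  # assumed width of each range

# Map each mark to the start of its range (e.g. 67 -> 60) and count
counts = Counter((m // bin_width) * bin_width for m in marks)

# Print a simple text histogram, one row per range
for start in sorted(counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * counts[start]}")
```

Each row of # symbols plays the role of one bar; the 60-69 range has the tallest bar because three marks fall in it.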

Scatter Plot: Shows the relationship between two variables.

Box Plot: Shows data spread, median, and outliers.
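The numbers a box plot is drawn from can be computed with a short sketch. It assumes the common convention that a value is an outlier if it lies more than 1.5 times the interquartile range beyond the quartiles; the chapter does not state which rule it uses, and the data is made up. Quartile values also depend on the method used, so other tools may give slightly different results.

```python
import statistics

# Hypothetical data with one unusually large value
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Quartiles split the sorted data into four parts (Python's default method)
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the width of the "box"

# Assumed convention: outliers lie more than 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3)  # 2.5 5.0 7.5
print(outliers)        # [30]
```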

Tools: MS Excel, Google Sheets.
