In this project I set out to better understand the data from the 2018 Brazilian elections.
Dataset. The dataset includes information such as the candidates' age, education level, gender, marital status, and expenses.
The first part of the job was to decide which data I could work with and then remove any outliers that would distort the results.
Age was the first column I analyzed. At that point I found a data-entry error: one of the politicians was listed as 825 years old.
To confirm that no other outliers remain, create a bar chart in which any other possible outliers can be seen.
This specific outlier shifted the entire chart to the left. In this case, the decision was to delete the entire column to reduce its impact on analyses such as the average.
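To illustrate how much a single bad value can move the average, here is a minimal Python sketch. The ages are made up for illustration (only the 825 comes from the case described above), and the plausibility bounds of 18 and 110 are my own assumption, not something defined in the dataset.

```python
from statistics import mean, median

# Hypothetical ages; 825 stands in for the data-entry error described above.
ages = [34, 45, 52, 61, 38, 825, 47, 56]

# Assumed plausibility bounds for a candidate's age (not taken from the dataset).
MIN_AGE, MAX_AGE = 18, 110

# Split the values into suspicious entries and entries that look plausible.
suspect = [a for a in ages if not MIN_AGE <= a <= MAX_AGE]
clean = [a for a in ages if MIN_AGE <= a <= MAX_AGE]

print("suspect values:", suspect)
print("mean with outlier:", round(mean(ages), 1))
print("mean without outlier:", round(mean(clean), 1))
# The median barely moves, which is why it is a useful cross-check.
print("median with outlier:", median(ages))
print("median without outlier:", median(clean))
```

Comparing the mean and the median before and after filtering is a quick way to see whether an extreme value is driving the result.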
Aggregate sum/min/max functions to group age/expenses/region
Aggregating the information is a very powerful way to answer many business questions relevant to your case.
In this specific business case, I ranked the top 5 regions that spent the most money on their campaigns.
The second table calculates how old most of the politicians in the 2018 elections were.
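The aggregation idea can be sketched in plain Python as follows. This is only an illustration under assumptions: the rows, region codes, and amounts are invented, and the helper names `aggregate_by_region` and `top_regions` are hypothetical rather than part of the original analysis.

```python
# Hypothetical rows: (region, age, expenses). Values are made up for illustration.
rows = [
    ("SP", 45, 120000.0),
    ("SP", 52, 80000.0),
    ("RJ", 38, 95000.0),
    ("MG", 61, 40000.0),
    ("MG", 47, 60000.0),
    ("BA", 56, 30000.0),
]

def aggregate_by_region(rows):
    """Group rows by region and compute sum of expenses plus min/max age."""
    stats = {}
    for region, age, expenses in rows:
        s = stats.setdefault(
            region, {"total_expenses": 0.0, "min_age": age, "max_age": age}
        )
        s["total_expenses"] += expenses
        s["min_age"] = min(s["min_age"], age)
        s["max_age"] = max(s["max_age"], age)
    return stats

def top_regions(stats, n=5):
    """Return up to n region names, ordered by total expenses (highest first)."""
    return sorted(stats, key=lambda r: stats[r]["total_expenses"], reverse=True)[:n]

stats = aggregate_by_region(rows)
top = top_regions(stats)
for region in top:
    print(region, stats[region])
```

With real data, the same grouping step would run over every candidate row, and `n=5` would give the five regions with the highest campaign spending.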
According to the data study, 2/3 of all Brazilian politicians are men.
Even so, education levels are very similar: women have 4% less higher education than men, with that share shifting to high school education.
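A comparison like this comes down to counting records per gender and per education level. The sketch below shows one way to do that; the records are hypothetical and do not reproduce the figures above, and `education_share` is a name I introduce only for illustration.

```python
from collections import Counter

# Hypothetical (gender, education) records; not taken from the dataset.
records = [
    ("F", "higher"), ("F", "high school"), ("F", "higher"),
    ("M", "higher"), ("M", "higher"), ("M", "high school"),
    ("M", "higher"), ("M", "higher"), ("M", "high school"),
]

def education_share(records):
    """Percentage of each education level within each gender."""
    totals = Counter(gender for gender, _ in records)
    pairs = Counter(records)
    return {(g, e): 100 * n / totals[g] for (g, e), n in pairs.items()}

# Share of each gender among all records, in percent.
gender_counts = Counter(gender for gender, _ in records)
gender_share = {g: 100 * n / len(records) for g, n in gender_counts.items()}

shares = education_share(records)
for key in sorted(shares):
    print(key, round(shares[key], 1))
print(gender_share)
```

Computing the shares within each gender, rather than over the whole dataset, keeps the comparison fair when one group is much larger than the other.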
Programming Language: