Data Classification FP (62634)
(Frescoplay & Ievolve)
Question: Mini-Project for Wings - Data Classification_FP Case Study
Machine Learning Case Study - Binary Classification
You are a Data Scientist working in a Public Policy team. Your team needs a prediction model that determines, from a person's demographic data, whether he/she will earn $50,000 or more. This prediction will help the team make policy decisions about providing financial assistance to the low-income group. You are given sample data of the population along with their annual income, which you can use to train your machine learning model.
You can build your model on your own hardware (PC or laptop) and just upload the predictions in the format shown below.
You are free to use the Python programming language to explore the data and build the model.
Instructions for the case study are provided below.
Build a machine learning model capable of predicting whether an individual's income is greater than 50k.
The prediction must be done based on various data attributes provided below.
Use 'TrainData' file provided below for building the model.
Use 'TestData' file provided below for testing your predictions.
Data Attributes description.
- age: continuous.
- workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
- fnlwgt: continuous.
- education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
- education-num: continuous.
- marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
- occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
- relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
- race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
- sex: Female, Male.
- capital-gain: continuous.
- capital-loss: continuous.
- hours-per-week: continuous.
- native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
- income> $50K: binary (Target that needs to be predicted)
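The attribute list above mixes continuous and categorical columns, so a model usually needs the categorical ones encoded first. The following is a minimal sketch of one possible approach, not the method behind the solution shown later: it assumes pandas and scikit-learn are installed, that the CSV headers match the attribute names above, and that the target column holds 0/1 values. The helper names (fill_missing, build_model, train_and_predict) and the choice of a random forest are illustrative assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Column names copied from the attribute list; the real file headers may differ.
CONTINUOUS = ["age", "fnlwgt", "education-num", "capital-gain",
              "capital-loss", "hours-per-week"]
CATEGORICAL = ["workclass", "education", "marital-status", "occupation",
               "relationship", "race", "sex", "native-country"]
FEATURES = CONTINUOUS + CATEGORICAL


def fill_missing(frame):
    """Return a copy with missing categorical values replaced by 'Unknown'."""
    filled = frame.copy()
    filled[CATEGORICAL] = filled[CATEGORICAL].fillna("Unknown")
    return filled


def build_model():
    """One-hot encode categorical columns, pass continuous ones through."""
    encode = ColumnTransformer(
        transformers=[
            # Categories unseen during training are encoded as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        remainder="passthrough",
        sparse_threshold=0,  # always produce a dense matrix
    )
    classifier = RandomForestClassifier(n_estimators=100, random_state=0)
    return Pipeline([("encode", encode), ("clf", classifier)])


def train_and_predict(train, test, target):
    """Fit on the training frame and return 0/1 predictions for the test frame.

    `target` is the name of the label column; its exact spelling in
    TrainData is an assumption the caller has to supply.
    """
    model = build_model()
    model.fit(fill_missing(train[FEATURES]), train[target].astype(int))
    return [int(p) for p in model.predict(fill_missing(test[FEATURES]))]
```

With the real files this could be called as train_and_predict(pd.read_csv(train_path), pd.read_csv(test_path), target_name), where the two paths and the target name are placeholders. Missing values in continuous columns are not handled in this sketch.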
You may use the Python tooling of your choice. You are expected to submit your results in the specified format.
Datasets:
TrainData - The train data has 43957 records.
TestData - The test data has 898 records.
1. You can use the train data to build and train your model, then run your predictions on the test data.
2. Once the predictions are ready, paste them into the IDE in the format below.
id, outcome
0,1
1,0
2,1
3,1
4,0
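The format above can be produced from a list of 0/1 predictions with the standard csv module. This is a small sketch; the function name format_submission is an illustrative assumption, and ids are assumed to be consecutive row numbers starting at 0, as in the sample.

```python
import csv
import io


def format_submission(predictions):
    """Return 'id,outcome' text for a sequence of 0/1 predictions."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "outcome"])
    for row_id, outcome in enumerate(predictions):
        if outcome not in (0, 1):
            raise ValueError(f"row {row_id}: outcome must be 0 or 1, got {outcome!r}")
        writer.writerow([row_id, int(outcome)])
    return buffer.getvalue()


# Reproduces the five sample rows shown above.
print(format_submission([1, 0, 1, 1, 0]))
```

The returned text can be pasted into the editor directly or written to a file first.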
Solution:
Note: Copy and paste the following output into the Frescoplay hands-on editor, making sure that it occupies 900 rows including the "id, outcome" header row.
id,outcome
0,0
1,0
2,0
3,0
4,0
5,1
6,0
7,0
8,0
9,0
10,0
11,0
12,0
13,0
14,1
15,0
16,0
17,0
18,1
19,1
20,0
21,0
22,0
23,1
24,0
25,0
26,0
27,0
28,0
29,1
30,0
31,1
32,0
33,0
34,0
35,0
36,1
37,0
38,1
39,0
40,0
41,0
42,0
43,0
44,0
45,0
46,0
47,0
48,0
49,1
50,0
51,1
52,1
53,0
54,0
55,0
56,0
57,0
58,0
59,0
60,0
61,0
62,0
63,0
64,0
65,0
66,0
67,0
68,0
69,0
70,0
71,0
72,0
73,1
74,0
75,0
76,0
77,0
78,0
79,0
80,1
81,0
82,0
83,1
84,0
85,0
86,0
87,1
88,0
89,0
90,1
91,0
92,0
93,0
94,0
95,0
96,0
97,0
98,1
99,1
100,1
101,0
102,0
103,0
104,0
105,0
106,0
107,0
108,0
109,0
110,0
111,0
112,1
113,0
114,0
115,1
116,0
117,0
118,1
119,0
120,0
121,0
122,0
123,0
124,0
125,0
126,0
127,0
128,0
129,0
130,0
131,0
132,0
133,0
134,0
135,0
136,0
137,0
138,0
139,1
140,1
141,0
142,0
143,0
144,0
145,1
146,0
147,0
148,0
149,0
150,0
151,1
152,1
153,0
154,0
155,0
156,1
157,0
158,1
159,0
160,0
161,0
162,0
163,0
164,0
165,1
166,0
167,0
168,0
169,0
170,0
171,1
172,0
173,0
174,0
175,0
176,0
177,0
178,0
179,0
180,0
181,0
182,0
183,0
184,0
185,0
186,0
187,0
188,0
189,0
190,0
191,0
192,0
193,0
194,0
195,0
196,0
197,0
198,0
199,0
200,1
201,0
202,0
203,0
204,0
205,0
206,1
207,0
208,0
209,0
210,0
211,0
212,0
213,0
214,0
215,0
216,0
217,0
218,0
219,0
220,0
221,0
222,0
223,0
224,1
225,1
226,0
227,0
228,0
229,0
230,0
231,0
232,0
233,0
234,0
235,0
236,0
237,0
238,0
239,1
240,0
241,0
242,0
243,0
244,0
245,0
246,1
247,0
248,1
249,0
250,0
251,0
252,0
253,0
254,0
255,0
256,1
257,0
258,0
259,0
260,0
261,0
262,0
263,0
264,0
265,1
266,0
267,0
268,0
269,0
270,0
271,0
272,0
273,0
274,0
275,1
276,0
277,0
278,0
279,0
280,0
281,0
282,0
283,0
284,0
285,0
286,0
287,0
288,0
289,0
290,0
291,0
292,0
293,1
294,0
295,0
296,0
297,1
298,0
299,0
300,0
301,0
302,0
303,0
304,0
305,0
306,0
307,0
308,0
309,0
310,0
311,1
312,0
313,0
314,0
315,0
316,0
317,1
318,0
319,0
320,0
321,1
322,1
323,0
324,0
325,0
326,0
327,0
328,0
329,0
330,0
331,0
332,0
333,0
334,0
335,0
336,0
337,0
338,0
339,0
340,0
341,0
342,0
343,0
344,1
345,1
346,0
347,0
348,0
349,0
350,1
351,0
352,0
353,0
354,0
355,0
356,0
357,0
358,0
359,0
360,0
361,0
362,0
363,0
364,0
365,0
366,0
367,0
368,0
369,0
370,0
371,0
372,1
373,0
374,0
375,0
376,0
377,0
378,0
379,1
380,1
381,0
382,0
383,1
384,0
385,0
386,0
387,0
388,0
389,0
390,1
391,0
392,0
393,0
394,0
395,0
396,0
397,0
398,0
399,0
400,0
401,0
402,0
403,0
404,0
405,0
406,0
407,0
408,0
409,0
410,0
411,1
412,0
413,1
414,0
415,0
416,0
417,0
418,0
419,1
420,0
421,0
422,0
423,0
424,0
425,0
426,0
427,0
428,0
429,0
430,1
431,0
432,0
433,0
434,0
435,0
436,0
437,1
438,0
439,0
440,1
441,0
442,0
443,0
444,0
445,0
446,0
447,0
448,0
449,0
450,0
451,0
452,1
453,0
454,0
455,1
456,0
457,1
458,1
459,0
460,0
461,0
462,0
463,0
464,0
465,0
466,0
467,0
468,0
469,0
470,0
471,0
472,0
473,0
474,0
475,0
476,1
477,0
478,1
479,0
480,0
481,0
482,0
483,0
484,0
485,1
486,0
487,0
488,0
489,0
490,0
491,0
492,0
493,1
494,0
495,0
496,0
497,0
498,0
499,0
500,0
501,1
502,0
503,0
504,0
505,0
506,0
507,0
508,0
509,0
510,0
511,0
512,0
513,0
514,0
515,0
516,0
517,0
518,0
519,0
520,0
521,0
522,0
523,0
524,0
525,0
526,0
527,0
528,0
529,0
530,0
531,0
532,1
533,0
534,0
535,0
536,0
537,0
538,1
539,0
540,0
541,1
542,0
543,0
544,0
545,1
546,0
547,0
548,0
549,0
550,0
551,0
552,0
553,0
554,0
555,1
556,0
557,0
558,0
559,0
560,0
561,0
562,0
563,0
564,0
565,0
566,0
567,0
568,0
569,0
570,0
571,1
572,0
573,0
574,1
575,0
576,0
577,1
578,0
579,1
580,0
581,1
582,0
583,0
584,0
585,0
586,0
587,0
588,0
589,0
590,0
591,0
592,0
593,0
594,0
595,0
596,0
597,1
598,0
599,0
600,0
601,0
602,1
603,0
604,0
605,0
606,0
607,0
608,0
609,0
610,1
611,0
612,0
613,0
614,1
615,0
616,0
617,0
618,0
619,0
620,0
621,0
622,1
623,0
624,0
625,0
626,0
627,0
628,0
629,1
630,1
631,1
632,0
633,0
634,0
635,0
636,1
637,1
638,0
639,1
640,1
641,0
642,0
643,0
644,0
645,0
646,1
647,0
648,0
649,0
650,0
651,1
652,1
653,0
654,0
655,0
656,1
657,1
658,0
659,0
660,0
661,0
662,0
663,1
664,1
665,0
666,0
667,0
668,0
669,0
670,0
671,0
672,0
673,0
674,0
675,0
676,0
677,0
678,0
679,0
680,0
681,0
682,1
683,0
684,0
685,1
686,0
687,0
688,0
689,0
690,0
691,0
692,0
693,0
694,0
695,0
696,1
697,0
698,0
699,0
700,1
701,0
702,0
703,0
704,0
705,1
706,0
707,0
708,0
709,0
710,0
711,0
712,1
713,0
714,0
715,1
716,0
717,0
718,1
719,0
720,0
721,0
722,0
723,0
724,0
725,0
726,0
727,0
728,0
729,0
730,0
731,0
732,0
733,0
734,0
735,0
736,1
737,0
738,0
739,0
740,0
741,0
742,0
743,0
744,1
745,0
746,0
747,1
748,1
749,1
750,0
751,0
752,0
753,0
754,1
755,0
756,1
757,0
758,1
759,0
760,0
761,0
762,0
763,0
764,0
765,1
766,0
767,0
768,0
769,0
770,0
771,0
772,0
773,0
774,0
775,1
776,0
777,0
778,0
779,0
780,0
781,1
782,0
783,0
784,0
785,0
786,0
787,0
788,0
789,0
790,0
791,0
792,0
793,0
794,0
795,1
796,0
797,0
798,0
799,0
800,0
801,1
802,1
803,0
804,0
805,0
806,0
807,0
808,1
809,0
810,0
811,0
812,0
813,0
814,0
815,0
816,0
817,0
818,0
819,0
820,0
821,1
822,1
823,0
824,0
825,0
826,1
827,0
828,0
829,0
830,0
831,0
832,0
833,0
834,0
835,0
836,0
837,1
838,0
839,0
840,0
841,0
842,1
843,0
844,1
845,0
846,0
847,0
848,0
849,1
850,0
851,0
852,0
853,0
854,1
855,0
856,0
857,0
858,1
859,0
860,0
861,0
862,0
863,0
864,0
865,0
866,0
867,0
868,0
869,0
870,0
871,0
872,0
873,0
874,0
875,0
876,0
877,0
878,0
879,0
880,0
881,0
882,0
883,0
884,0
885,1
886,0
887,0
888,0
889,0
890,0
891,1
892,0
893,1
894,1
895,0
896,0
897,0
898,0
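Before pasting, the output can be checked against the 900-row requirement from the note above. The following is a minimal sketch of such a check; the function name check_submission and the specific rules (header first, consecutive ids from 0, outcomes of 0 or 1) are assumptions drawn from the format shown here, not from the platform's own validator.

```python
def check_submission(text, expected_rows=900):
    """Return a list of problems found in the submission text (empty if none)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    problems = []
    if len(lines) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(lines)}")
    # Accept both "id,outcome" and "id, outcome" as the header.
    if not lines or lines[0].replace(" ", "") != "id,outcome":
        problems.append("missing 'id,outcome' header")
    for position, line in enumerate(lines[1:]):
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or parts[0] != str(position) or parts[1] not in ("0", "1"):
            problems.append(f"bad row {position}: {line!r}")
    return problems
```

Calling check_submission on the pasted text returns an empty list when the header, the row count, the id sequence, and the outcome values all match these rules.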