Introduction to Artificial Intelligence
Assignment (Semester 02, 2021)
Due: 23:59, Friday 15 October 2021
AI for HealthCare
Description
A hotel plans to enter new markets. In its existing market, the sales team has classified customers into four segments (A, B, C, D) and has collected customer opinions with two opinion orientations (positive, negative). With this information, the hotel can perform segmented outreach and communication for each customer segment.
You are required to develop an AI application for this hotel that segments customers into four segments and analyses the opinion orientation of customers in each segment. In this case, you will apply AI techniques to predict the segment of each customer and to analyse their opinions. You are given a list of customers (identified by their IDs) with their segment levels in the labels_record.csv file. The hotel also provides two sets of features that can be used to predict the segment levels of customers, and a third set of features that can be used to analyse their opinion orientation.
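Whether the segment labels are balanced (Step 1.a) comes down to how many customers fall into each segment. The assignment expects this to be done in RapidMiner; the Python below is only an illustrative sketch, assuming the segment labels (A to D) have already been read from labels_record.csv into a list. The sample list is made up, and `class_balance` is a hypothetical helper name.

```python
from collections import Counter

def class_balance(labels):
    """Return per-segment counts, proportions, and the largest/smallest count ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    proportions = {label: n / total for label, n in counts.items()}
    # A ratio near 1.0 means the segments are of similar size.
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), proportions, ratio

# Hypothetical sample; real labels would be read from labels_record.csv.
segments = ["A", "B", "C", "D", "A", "D", "D", "B"]
counts, props, ratio = class_balance(segments)
print(counts)  # {'A': 2, 'B': 2, 'C': 1, 'D': 3}
print(ratio)   # 3.0
```

There is no universal cut-off for "balanced"; reporting the counts, the proportions and a ratio like this one lets the report justify its answer with numbers.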
The first set of features is in the Customer-PI_record.csv file. These features describe the customers' personal information, including ID, Gender, Ever_Married, Age, Graduated, Profession, Work_Experience and Family_Size. The ID refers to the customer ID in labels_record.csv.
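Because the features and the labels live in separate files linked by the customer ID, data integration (Step 2.a) amounts to joining them on ID. The following is a minimal sketch in Python rather than RapidMiner, assuming the CSV files have a header row. The label column name `Segmentation`, the helper names and the sample rows are all hypothetical; only the personal-information column names come from the description above.

```python
import csv
import io

def load_by_id(f, id_column="ID"):
    """Read a CSV file object into {customer ID: row dict}."""
    return {row[id_column]: row for row in csv.DictReader(f)}

def inner_join(left, right):
    """Merge two ID-keyed tables, keeping only IDs present in both (left order kept)."""
    return {cid: {**left[cid], **right[cid]} for cid in left if cid in right}

# Hypothetical in-memory samples; with real files use open(path, newline="").
personal_csv = io.StringIO(
    "ID,Gender,Ever_Married,Age,Graduated,Profession,Work_Experience,Family_Size\n"
    "1001,Male,No,22,No,Healthcare,1,4\n"
    "1002,Female,Yes,38,Yes,Engineer,3,3\n"
    "1003,Female,Yes,67,Yes,Lawyer,,1\n"
)
labels_csv = io.StringIO("ID,Segmentation\n1001,D\n1003,B\n1004,A\n")

personal = load_by_id(personal_csv)
labels = load_by_id(labels_csv)
joined = inner_join(personal, labels)
# Empty strings mark missing values that the cleaning step must handle.
missing = sum(1 for row in joined.values() for value in row.values() if value == "")
print(sorted(joined), missing)  # ['1001', '1003'] 1
```

An inner join drops customers that appear in only one file (1002 and 1004 here); whether that is acceptable, and how to fill or drop missing values such as the empty Work_Experience, are the data selection and cleaning decisions the report should explain.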
The second set of features is in the Payment_record.csv file. It consists of the IDs (of the customers in labels_record.csv) and the amounts of previous payments over the last 12 months. Pay_AMT1 to Pay_AMT12 record this payment history as follows:
PAY_AMT1: Amount of previous payment in January, 2015 (AUS dollar)
PAY_AMT2: Amount of previous payment in February, 2015 (AUS dollar)
PAY_AMT3: Amount of previous payment in March, 2015 (AUS dollar)
PAY_AMT4: Amount of previous payment in April, 2015 (AUS dollar)
PAY_AMT5: Amount of previous payment in May, 2015 (AUS dollar)
PAY_AMT6: Amount of previous payment in June, 2015 (AUS dollar)
PAY_AMT7: Amount of previous payment in July, 2015 (AUS dollar)
PAY_AMT8: Amount of previous payment in August, 2015 (AUS dollar)
PAY_AMT9: Amount of previous payment in September, 2015 (AUS dollar)
PAY_AMT10: Amount of previous payment in October, 2015 (AUS dollar)
PAY_AMT11: Amount of previous payment in November, 2015 (AUS dollar)
PAY_AMT12: Amount of previous payment in December, 2015 (AUS dollar)
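One way to approach the visualisation in Step 1.b is to average each monthly column within each segment, giving a 12-point series per segment that can be drawn as one line per segment (January to December 2015). The sketch below only computes those series, in plain Python, assuming the payment rows have already been joined with a hypothetical `Segmentation` label column; the sample rows are made up.

```python
from statistics import mean

MONTHS = [f"PAY_AMT{i}" for i in range(1, 13)]  # column names as listed above

def monthly_means(rows, segment_key="Segmentation"):
    """Average each PAY_AMT column within each segment -> {segment: [12 means]}."""
    by_segment = {}
    for row in rows:
        by_segment.setdefault(row[segment_key], []).append(row)
    return {
        seg: [mean(float(r[m]) for r in members) for m in MONTHS]
        for seg, members in by_segment.items()
    }

# Hypothetical rows (values are strings, as csv.DictReader would return them).
rows = [
    {"Segmentation": "A", **{m: "100" for m in MONTHS}},
    {"Segmentation": "A", **{m: "300" for m in MONTHS}},
    {"Segmentation": "B", **{m: str(10 * (i + 1)) for i, m in enumerate(MONTHS)}},
]
profile = monthly_means(rows)
print(profile["A"][:3])  # [200.0, 200.0, 200.0]
print(profile["B"][:3])  # [10.0, 20.0, 30.0]
```

Payment amounts are often skewed, so the median (`statistics.median`) or a box plot per month may give a fairer picture than the mean; which summary to plot is a judgement call for the report.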
The third set of features is in the descriptive_opinion.csv file. It consists of a list of customer opinions, each with its polarity (positive or negative).
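Predicting opinion polarity (Step 3.b) is a text classification problem. As one possible baseline, here is a minimal multinomial Naive Bayes classifier over bag-of-words counts with add-one (Laplace) smoothing, written in plain Python instead of RapidMiner. The training sentences are invented, and the class and function names are hypothetical; other methods (for example logistic regression or an SVM over TF-IDF features) could be compared against it.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesPolarity:
    """Multinomial Naive Bayes over bag-of-words with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training opinions.
train_texts = [
    "great room and friendly staff",
    "clean room, great breakfast",
    "dirty bathroom and rude staff",
    "noisy room, terrible service",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesPolarity().fit(train_texts, train_labels)
print(model.predict("friendly staff and great service"))  # positive
print(model.predict("rude staff and dirty room"))         # negative
```

Working in log space avoids underflow when multiplying many small word probabilities; the results reported for 3.b should come from held-out data, not from the training sentences.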
The hotel also has a list of unsegmented customers (unsegmented_customer.csv) whose results (segment level and opinion polarity) have not yet been revealed. Their features have already been recorded and stored in three files (Customer-PI_predict.csv, Payment_predict.csv and unsegmented_customer_opinion.csv). The hotel wants to check the usefulness of your AI model later, when the results become available. You are asked to select the model for which you have evidence that it performs best, and to apply that model to predict the segment levels and identify the opinion polarity for these customers.
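Because the unsegmented customers' features arrive in separate files, any normalisation chosen in Step 2.b has to reuse the parameters learned from the training records rather than being recomputed on the prediction files. A minimal min-max sketch in plain Python illustrates this; the helper names and sample values are hypothetical, and min-max scaling is only one of several possible transformations.

```python
def fit_min_max(rows, columns):
    """Learn (min, max) per column from the training rows only."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}

def apply_min_max(rows, params):
    """Scale columns with the training ranges; a constant training column maps to 0.0.

    Values outside the training range land outside [0, 1], which is expected.
    """
    scaled = []
    for r in rows:
        new = dict(r)
        for c, (lo, hi) in params.items():
            new[c] = 0.0 if hi == lo else (r[c] - lo) / (hi - lo)
        scaled.append(new)
    return scaled

# Hypothetical training and prediction rows.
train = [{"Age": 20, "PAY_AMT1": 0.0}, {"Age": 60, "PAY_AMT1": 500.0}]
predict = [{"Age": 40, "PAY_AMT1": 1000.0}]

params = fit_min_max(train, ["Age", "PAY_AMT1"])
scaled_predict = apply_min_max(predict, params)
print(scaled_predict)  # [{'Age': 0.5, 'PAY_AMT1': 2.0}]
```

Fitting the scaler on the prediction files instead would give those customers a different scale from the one the model was trained on.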
Part A: Modelling (80%)
You will follow the steps below for the task.
1. Data Exploration
a. Is the data balanced? (10%)
b. Use visualisation to show what the payment history in the last 12 months (Pay_AMT1 to Pay_AMT12) looks like. (5%)
2. Data Preparation
a. Data Selection/Data Integration/Data Cleaning. (5%; you do not need to apply all of these, choose those deemed relevant)
b. Data Transformation/Normalisation. (5%; you do not need to apply all of these, choose those deemed relevant)
3. Modelling and Evaluation
a. Which type of features (personal information or payment information) gives better performance? You should try several AI/ML methods (at least 3) on each type of features to make a case for your claim. (20%)
b. Which AI/ML methods can be used to predict opinion polarity based on the collected opinion feature? Show the results. (5%)
c. Can we apply a CNN to the payment information feature to predict the segment level of customers? If yes, what are the results? If no, explain why. (5%)
d. Does combining the two types of features result in better performance? Show empirical evidence. (5%)
e. What is the best approach to predict the segment of a customer? Provide an explanation using the results in 3.a, 3.c and 3.d. (10%)
4. Applying the model (10%)
Apply the model to predict the segment level for the unsegmented customers. Also apply the model to predict the opinion polarity of the comments collected from these customers. Save the predicted results to a CSV file with three columns: the first column is the customer ID, the second column is the predicted segment level, and the third column is the predicted opinion polarity of their comments. Save the CSV file as
NOTE: See the unsegmented_customer.csv as an example.
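The required output of Step 4 is a three-column CSV file. The following is a small sketch with Python's `csv` module; the header names `ID`, `Segmentation` and `Polarity` are placeholders only (the header and layout should follow unsegmented_customer.csv), and the sample IDs and predictions are invented. It writes to an in-memory buffer so it runs as-is; for a real file, pass `open(path, "w", newline="")`.

```python
import csv
import io

def write_predictions(out, ids, segments, polarities,
                      header=("ID", "Segmentation", "Polarity")):
    """Write one row per customer: ID, predicted segment, predicted polarity."""
    if not (len(ids) == len(segments) == len(polarities)):
        raise ValueError("ids, segments and polarities must have the same length")
    writer = csv.writer(out)
    writer.writerow(header)  # placeholder header; match unsegmented_customer.csv
    writer.writerows(zip(ids, segments, polarities))

# Hypothetical predictions for two customers.
buffer = io.StringIO()
write_predictions(buffer, ["2001", "2002"], ["A", "C"], ["positive", "negative"])
print(buffer.getvalue())
```

The length check guards against the segment and polarity predictions drifting out of alignment with the customer IDs, which is easy to do when the two models are run separately.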
Part B: Report and Analysis (20%)
1. Write a report about the steps in Part A and show the results at each step.
2. Report the performance of the different models chosen in Step 3.
3. Provide correct justification for 3.a, 3.b, 3.c and 3.d.
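When reporting model performance, accuracy alone can hide poor results on small segments if the data is imbalanced, so a per-class measure such as macro-averaged F1 is a useful complement. The following is a plain-Python sketch of both metrics; the function name and the example labels are hypothetical, and RapidMiner's own performance operators can report comparable figures.

```python
from collections import Counter

def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged F1, and per-class F1 for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    pred_counts = Counter(y_pred)
    true_counts = Counter(y_true)
    f1s = {}
    for label in labels:
        precision = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp[label] / true_counts[label] if true_counts[label] else 0.0
        f1s[label] = (2 * precision * recall / (precision + recall)
                      if precision + recall else 0.0)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(f1s.values()) / len(labels), f1s

# Hypothetical true and predicted segments.
y_true = ["A", "A", "B", "B", "C", "D"]
y_pred = ["A", "B", "B", "B", "C", "A"]
acc, macro, per_class = evaluate(y_true, y_pred)
print(round(acc, 3), round(macro, 3))  # 0.667 0.575
```

Here segment D is never predicted, so its F1 is 0 and pulls the macro average well below the accuracy; a gap like this is worth discussing in the report.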
How & What to submit.
Assignments will be submitted via MyLO (an Assignment submission will be created).
1. The report (docx or pdf) as required in Part B.
2. Process files (one or more) exported from RapidMiner (*.rmp).
3. A CSV file containing the predicted results for the unsegmented customers.
Notes
- Compressed files (zip, rar, tar, etc.) are not accepted.
- If there are multiple submissions of the same file, the latest one will be marked.
Plagiarism and Academic misconduct
Plagiarism
Plagiarism is a form of cheating. It is taking and using someone else’s thoughts, writings or inventions and representing them as your own; for example, using an author’s words without putting them in quotation marks and citing the source, using an author’s ideas without proper acknowledgement and citation, or copying another student’s work.
If you have any doubts about how to refer to the work of others in your assignments, please consult your lecturer or tutor for relevant referencing guidelines. You may also find the Academic Honesty site on MyLO of assistance.
The intentional copying of someone else’s work as one’s own is a serious offence punishable by penalties that may range from a fine or deduction/cancellation of marks and, in the most serious of cases, to exclusion from a unit, a course or the University.
The University and any persons authorised by the University may submit your assessable works to a plagiarism checking service, to obtain a report on possible instances of plagiarism. Assessable works may also be included in a reference database. It is a condition of this arrangement that the original author’s permission is required before a work within the database can be viewed.
For further information on this statement and general referencing guidelines, see the Plagiarism and Academic Integrity page on the University web site or the Academic Honesty site on MyLO.
Academic misconduct includes cheating, plagiarism, allowing another student to copy work for an assignment or an examination, and any other conduct by which a student:
a. seeks to gain, for themselves or for any other person, any academic advantage or advancement to which they or that other person are not entitled; or
b. improperly disadvantages any other student.
Students engaging in any form of academic misconduct may be dealt with under the Ordinance of Student Discipline, and this can include the imposition of penalties that range from a deduction/cancellation of marks to exclusion from a unit or the University. Details of penalties that can be imposed are available in Ordinance 9: Student Discipline – Part 3 Academic Misconduct.
KIT509_Sem2_2020 Rubric
Each criterion is assessed at five levels, from Level 5 (highest) to Level 1 (lowest).
Data Exploration (15 points)
- Level 5: Correctly identify whether the data is balanced or not. Correctly visualise the payment history in the last 12 months and explain how to do it.
- Level 4: Correctly identify whether the data is balanced or not, but unable to visualise the payment history in the last 12 months and explain how to do it.
- Level 3: Correctly visualise the payment history in the last 12 months and explain how to do it, but unable to identify whether the data is balanced or not.
- Level 2: Attempt both tasks (identifying whether the data is balanced and visualising the payment history in the last 12 months) but not successful.
- Level 1: Attempt one task (either data balance identification or visualisation of the payment history in the last 12 months) but not successful.
Data Preparation (10 points)
- Level 5: Correctly apply Data Selection and/or Data Integration and/or Data Cleaning where deemed relevant. Correctly apply data normalisation.
- Level 4: Apply Data Selection and/or Data Integration and/or Data Cleaning but use one irrelevant technique, AND correctly apply data normalisation.
- Level 3: Correctly apply Data Selection and/or Data Integration and/or Data Cleaning where deemed relevant, OR correctly apply data normalisation.
- Level 2: Apply Data Selection and/or Data Integration and/or Data Cleaning but use one irrelevant technique, OR correctly apply data normalisation.
- Level 1: Apply Data Selection and/or Data Integration and/or Data Cleaning but use more than one irrelevant technique.
Identify best type of features (20 points)
- Level 5: Correctly identify the best type of features for the task. Provide all empirical evidence (a comparison of the performance of the two types of features using 3 or more machine learning models).
- Level 4: Correctly identify the best type of features for the task. Provide most of the empirical evidence (a comparison of the performance of the two types of features using only 2 machine learning models).
- Level 3: Correctly identify the best type of features for the task. Provide some empirical evidence (a comparison of the performance of the two types of features using only 1 machine learning model).
- Level 2: Incorrectly identify the best type of features for the task but provide some supporting evidence.
- Level 1: Incorrectly identify the best type of features for the task and provide incorrect empirical evidence.
Opinion Mining model (5 points)
- Level 5: Correctly identify the type of features for the task. Provide all empirical evidence and explanation.
- Level 4: Correctly identify the type of features for the task. Provide most of the empirical evidence and explanation.
- Level 3: Correctly identify the type of features for the task. Provide some of the empirical evidence and explanation.
- Level 2: Incorrectly identify the best type of features for the task but provide some supporting evidence.
- Level 1: Incorrectly identify the type of features for the task and provide incorrect empirical evidence.
CNN (5 points)
- Level 5: Correctly identify whether a CNN can be applied to the payment information feature or not. Provide results if yes; provide a sound explanation if no.
- Level 4: Correctly identify whether a CNN can be applied to the payment information feature or not, but provide a wrong explanation or wrong results.
- Level 3: Correctly identify whether a CNN can be applied to the payment information feature or not, but provide no (or little) explanation.
- Level 2: Incorrectly identify whether a CNN can be applied to the payment information feature, but provide justifications and evidence.
- Level 1: Incorrectly identify whether a CNN can be applied to the payment information feature, but provide justifications.
Combine features (5 points)
- Level 5: Correctly combine features and show evidence of whether the combined feature improves performance or not.
- Level 4: Correctly combine features but cannot show evidence of whether the combined feature improves performance or not.
- Level 3: Correctly combine features but DO NOT show evidence of whether the combined feature improves performance or not.
- Level 2: Incorrectly combine features, but the combination works.
- Level 1: Incorrectly combine features but provide an explanation.
Best approach (10 points)
- Level 5: Correctly identify the best approach based on the results from 3.a, 3.c and 3.d in the Assignment Description.
- Level 4: Correctly identify the best approach based only on the results from 3.a and 3.c in the Assignment Description.
- Level 3: Correctly identify the best approach based only on the results from 3.a in the Assignment Description.
- Level 2: Correctly identify the best approach without evidence, but an explanation is provided.
- Level 1: Incorrectly identify the best approach, but justifications are provided.
Predict unsegmented customers and their opinion polarity (10 points)
- Level 5: Correctly apply the selected models (best models in Step 3) to the features of the unsegmented customers, for both the segment level and the opinion polarity. The format of the CSV file is correct.
- Level 4: Correctly apply the selected models (best models in Step 3) to the features of the unsegmented customers, for both the segment level and the opinion polarity. The format of the CSV file is partially correct.
- Level 3: Correctly apply the selected models (best models in Step 3) to the features of the unsegmented customers, for both the segment level and the opinion polarity. The format of the CSV file is not provided.
- Level 2: Incorrectly apply one of the selected models (best models in Step 3), either to the features of the unsegmented customers or to predict the opinion polarity (only one is applied correctly).
- Level 1: Incorrectly apply all the selected models (best models in Step 3) to the features of the unsegmented customers; cannot generate a CSV file.
Report (20 points)
- Level 5: A report that explains all the steps in Part A, provides all results and comparisons (in tables), and gives sound explanations for the questions in 3.a, 3.b, 3.c and 3.d.
- Level 4: A report that explains three of the four steps in Part A, provides almost all results and comparisons (in tables), and gives sound explanations for three of the four tasks (3.a, 3.b, 3.c, 3.d).
- Level 3: A report that explains two of the four steps in Part A and provides some results and comparisons (in tables).
- Level 2: A report that explains one of the four steps in Part A and provides some results.
- Level 1: An incomplete report that explains one step in Part A without empirical results.