Progress on a dissertation research project

The final dissertation project for my MSc Artificial Intelligence program started in September 2022. This is the third month, and I thought I would start sharing it publicly.

The broad topic of my research is the healthcare sector, specifically Maternal & Infant Health. I have created a detailed project proposal and submitted it to the university.

Here you will find the technical aspects of the work I am doing to complete my research, which involves artificial intelligence and machine learning on a huge dataset. It is an interesting challenge, and almost every day it teaches me something new.

From today onward, I will post progress updates here and will also try to cover the key aspects of the last three months' work. Let’s get started.

A few tweets will give you a glimpse of the work over the last few weeks.

Getting ready with the dataset on women maternal health to load it to autonomous data warehouse #ai #ml pic.twitter.com/UhqBCNdxAy
— Kashif Manzoor (@kashifmanzoor) November 22, 2022

November 23, 2022: This blog started from the difficulties of “preprocessing” datasets. I decided to revisit the CSV files and reduce them to only the required columns. Earlier, I had done all this work after uploading the data to the autonomous database.
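Trimming a CSV down to the required columns before upload can be sketched in the shell. This is only an illustration, not the exact steps used for this project: the function name `keep_columns` is hypothetical, and it assumes plain comma-separated values with no quoted commas inside fields.

```shell
# Hypothetical helper: keep only the named columns of a CSV.
# $1 = input file, $2 = comma-separated list of column names to keep.
# Assumption: fields never contain quoted commas.
keep_columns() {
  awk -F, -v OFS=, -v wanted="$2" '
    NR == 1 {
      # Map each header name to its position, then look up the wanted names.
      n = split(wanted, names, ",")
      for (i = 1; i <= NF; i++) pos[$i] = i
      for (k = 1; k <= n; k++) {
        if (!(names[k] in pos)) {
          print "missing column: " names[k] > "/dev/stderr"
          exit 1
        }
        idx[k] = pos[names[k]]
      }
    }
    {
      # Print the selected fields (header row included) in the requested order.
      line = $(idx[1])
      for (k = 2; k <= n; k++) line = line OFS $(idx[k])
      print line
    }
  ' "$1"
}
# Example: keep_columns 2021natdata.csv dmar,rf_cesar > 2021natdata_short.csv
```

Selecting by name rather than by position keeps the command readable when the source file has a few hundred columns.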

Challenge: the CSV file has approximately 3.6 million records, and MS Excel supports only 1,048,576 rows per worksheet. I found some ideas for splitting it, and here is the method I used to split the CSV file into multiple files and then work on them.

I moved the file into the folder where my dataset files are kept and ran the command below in a terminal window.

split -l 1000000 2021natdata.csv

This created four files of up to one million lines each, named xaa, xab, xac, and xad.
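One thing to watch with a plain line-based split: if the CSV has a header row, only the first part (xaa) contains it. A minimal sketch of a variant that repeats the header in every part is shown here. The function name `split_with_header` and the `part_` prefix are hypothetical, and it assumes the first line of the file is the header.

```shell
# Hypothetical helper: split a CSV into parts of at most $2 data rows,
# repeating the header row at the top of every part.
# $1 = input file, $2 = data rows per part, $3 = output prefix.
split_with_header() {
  input=$1
  rows=$2
  prefix=$3
  header=$(head -n 1 "$input")
  # Split only the data rows (everything after line 1), reading from stdin.
  tail -n +2 "$input" | split -l "$rows" - "$prefix"
  # Prepend the header to each piece and give it a .csv extension.
  for part in "$prefix"??; do
    [ -e "$part" ] || continue
    { printf '%s\n' "$header"; cat "$part"; } > "$part.csv"
    rm "$part"
  done
}
# Example: split_with_header 2021natdata.csv 1000000 part_
```

With one million data rows plus the header, each part still fits within the Excel row limit.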

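These files have no extension, so the next step is to rename them with a .csv extension. I used the loop below, which I found through a Google search.

# Match only the split output (xaa, xab, ...) so the original CSV is not renamed too
for i in x??;
do mv "$i" "$i.csv";
done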

This gives me all four files with a .csv extension, and I have to repeat the process for all my source dataset files.
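Repeating the split for several source files with the default x prefix would overwrite the earlier parts, since every run produces xaa, xab, and so on. A rough sketch that uses each source file's own name as the prefix follows. The function name `split_all` is hypothetical, and it assumes the source files end in .csv.

```shell
# Hypothetical helper: split every CSV passed as an argument into parts of
# at most $1 lines, naming the parts after the source file.
split_all() {
  lines=$1
  shift
  for src in "$@"; do
    base=${src%.csv}                      # 2021natdata.csv -> 2021natdata
    split -l "$lines" "$src" "${base}_"   # 2021natdata_aa, 2021natdata_ab, ...
    for part in "${base}_"??; do
      [ -e "$part" ] || continue
      mv "$part" "$part.csv"
    done
  done
}
# Example: split_all 1000000 2021natdata.csv
```

The original files are left in place, so the parts can be regenerated if the line count needs to change.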

How to convert large CSV files into multiple files.

Nov 24, 2022: Today’s task was to rename the columns to something meaningful, so that each column is easy to understand just by reading its name.

For example, column ‘dmar’ becomes ‘MaritalStatus’ and ‘rf_cesar’ becomes ‘PreviousCesarean’.

It was pretty challenging with 16 different CSV files, each averaging about 1 million records.

Loaded all the CSV data into Oracle Autonomous Data Warehouse in less than 30 minutes.
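A rename like this can be applied to the header row of a CSV before loading it. The sketch below is one possible approach, not necessarily the one used here: the function name `rename_columns` and the old=new argument format are hypothetical, and it assumes a comma-separated header with no quoted commas.

```shell
# Hypothetical helper: rename columns in the header row of a CSV.
# $1 = input file, remaining arguments = old=new pairs.
rename_columns() {
  input=$1
  shift
  awk -F, -v OFS=, -v pairs="$*" '
    BEGIN {
      # Build a lookup table from the old=new pairs.
      n = split(pairs, items, " ")
      for (k = 1; k <= n; k++) {
        split(items[k], kv, "=")
        map[kv[1]] = kv[2]
      }
    }
    NR == 1 {
      # Rewrite only the header fields that have a mapping.
      for (i = 1; i <= NF; i++) if ($i in map) $i = map[$i]
      print
      next
    }
    { print }   # data rows pass through unchanged
  ' "$input"
}
# Example:
# rename_columns xaa.csv dmar=MaritalStatus rf_cesar=PreviousCesarean > xaa_renamed.csv
```

Keeping the mapping as arguments means the same list of pairs can be reused for every split file.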


Spent a month on one maternal and infant data set (14m records ) to do machine learning. Looks like, i have to hunt another data set. A lot of time on data cleaning and preparation.
— Kashif Manzoor (@kashifmanzoor) November 16, 2022

Any easier method to rename column names in a single command in oracle autonomous data warehouse?

I need to make meaningful columns name for my dataset for #ai project.
100 columns to update…
— Kashif Manzoor (@kashifmanzoor) November 13, 2022

Progressing with the weekend work to prepare my dissertation for MSc #ArtificialIntelligence pic.twitter.com/LnEPaYRXdC
— Kashif Manzoor (@kashifmanzoor) November 12, 2022

Going through the hard part for my #AI project to prepare the dataset.
It is taking a lot time and im still trying to short list the columns from 270 to 127, which are required. pic.twitter.com/x5Q4jz0pj7
— Kashif Manzoor (@kashifmanzoor) November 10, 2022

I spent my weekend to debug this error while getting ready my dataset for data science project

Finally, reached to a expert friend and it got solved in few minutes

10 hours spent on debugging help me to learn
🧵 pic.twitter.com/2C7eWlo67L
— Kashif Manzoor (@kashifmanzoor) October 30, 2022

15-March-2023

First of all, my apologies for not updating this post, as the initial idea was to document the journey.

However, with the extensive work this dissertation required, I could not keep up the pace and was not able to update this post.

I submitted my dissertation during the first week of March, and from now on I will try to write separate blog posts to help others pursue their career dreams.
