Chaeyoon Kim

a London-based Data Scientist, skilled in Visual Analytics, Machine Learning, and Cloud computing

A Junior Data Scientist’s Day

Good afternoon from London :wave:

Who am I?

I am a Data Scientist with 2 years of professional experience in Machine Learning, Data analysis and visualization, possessing a wide range of technical skills in programming, data science, cloud computing, and project management.

Keywords: Visual Analytics, Statistical Data Analytics, Data Management

Technical Skills:

- Programming: Python, R, SQL, MATLAB
- Data Science: Pandas, NumPy, NLTK, Scikit-learn, Matplotlib, Seaborn, Plotly, Tableau, QGIS, Google Analytics
- Cloud Computing: Databricks, GCP, Azure + Data Lake
- Others: Git, Trello, LaTeX, MS Office tools, Jira
- Interests and Learning: Software development (Agile), Open-source contribution

Professional demeanor: Genuine, Connects with people, Intelligent and warm, Kind, So personable, Friendly, Curious, Determined, and so on

(Word cloud of positive feedback I’ve received from colleagues)

Moving towards real-world problems: navigating skill emphases

After completing my postgraduate studies in 2021, I pursued further academic achievements and early career experience, such as writing a paper, assisting with a textbook publication, and serving as a lead instructor teaching programming. Since transitioning into public healthcare, I have been involved in several projects spanning various NHS organizations and Trust sites. This work earned me positive evaluations in my latest appraisal.

The table below provides a convenient comparison, illustrating how my master’s assessments and work projects distribute emphasis across skills and tasks related to data processing, programming, review, and reporting/visualization.

	Data processing	Programming	Review	Report/Viz
MSc Data Science	-	70%	20%	10%
Project A	50%	30%	10%	10%
Project B	95%	5%	-	-
Project C	5%	65%	30%	-
Average(ABC)	50%	~30%	<15%	<5%
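
A minimal sketch of how the Average(ABC) row could be reproduced with pandas. The percentages are copied from the table, a dash is treated as 0%, and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# Percentages copied from the table above; a dash ("-") is treated as 0.
projects = pd.DataFrame(
    {
        "Data processing": [50, 95, 5],
        "Programming": [30, 5, 65],
        "Review": [10, 0, 30],
        "Report/Viz": [10, 0, 0],
    },
    index=["Project A", "Project B", "Project C"],
)

# Column-wise mean across the three work projects, rounded to one decimal.
average_abc = projects.mean().round(1)
print(average_abc)
```

The exact means are 50.0, 33.3, 13.3 and 3.3, which the table rounds to 50%, ~30%, <15% and <5%.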

Whilst programming can include data processing, I primarily used robust datasets during my academic year. In the other cases, I found that loading semi-structured data such as JSON (obtained from APIs), or unstructured data like images or WhatsApp text, required stronger programming foundations before any data processing could begin. That is why, for the master's programme, I chose to assign all of that credit to programming alone rather than to data processing.
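
To illustrate why semi-structured data needs some programming before it can be analysed, here is a small sketch that flattens a nested JSON payload into a table with `pandas.json_normalize`. The payload, field names, and values are hypothetical and do not come from any of the projects mentioned here.

```python
import json

import pandas as pd

# Hypothetical API response: nested records with an optional field.
raw = """
[
  {"id": 1, "user": {"name": "A", "city": "London"}, "tags": ["x", "y"]},
  {"id": 2, "user": {"name": "B"}, "tags": []}
]
"""
records = json.loads(raw)

# Flatten nested objects into dotted column names such as "user.name".
# Keys missing from a record (here "user.city" for id 2) become NaN.
flat = pd.json_normalize(records)
print(flat)
```

Even in this tiny case, the nested `user` object and the missing `city` value have to be handled explicitly, which is the kind of groundwork that clean academic datasets rarely require.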

At work, I sometimes made a strategic decision not to handle the review or reporting part of a specific project, as shown in the table. This choice was agreed within the team. When other priorities required my immediate attention, I was able to pass all my model ideas and findings (with my thanks) to the other contributors working on the same project. That is certainly different from how I used to carry all the responsibility myself for university coursework; the team needs me for teamwork!

These differences highlight how the technical skills an individual needs to meet task commitments vary with the level of collaboration.

Why 50% in data processing?

This happens to me especially when working in wider group projects, where multiple data collection points and interactions with diverse feedback loops are more frequent. For example, Project A aims to produce a distributional guide of human resources (medical professionals) for NHS England regions based on multiple factors such as demography, morbidity, deprivation, and demand projections. The following datasets are the required reference sources for Project A:

Demography: ONS Clinical commissioning group population estimates, GP practice weighted populations
Morbidity: Trends in the burden of morbidity, Mortality from accidents: SMR<75 index.
Deprivation: CDRC Index of Multiple Deprivation (IMD), Quality and Outcomes Framework (QOF) data
Demand projections: Hospital Episode Statistics (HES), Specialised Commissioning data

Each dataset demands delicate work to map national data onto the NHS’s 42 Integrated Care Boards (ICBs) or their sub-locations. Additionally, certain types of information, such as counts of In-Patient and Out-Patient episodes, limit the lowest level at which data can be collected. Given these complexities, it becomes evident why a considerable amount of data processing is necessary.
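
The regional mapping step described above can be sketched as a lookup join followed by an aggregation. This is only an illustration under stated assumptions: the sub-location codes, ICB codes, and population figures are made up, and the column names are hypothetical rather than taken from the actual datasets.

```python
import pandas as pd

# Hypothetical sub-location level figures; codes and values are made up.
population = pd.DataFrame(
    {
        "sub_location_code": ["S01", "S02", "S03", "S04"],
        "population": [1000, 2500, 1800, 700],
    }
)

# Hypothetical lookup table from sub-location to ICB.
lookup = pd.DataFrame(
    {
        "sub_location_code": ["S01", "S02", "S03"],
        "icb_code": ["ICB_A", "ICB_A", "ICB_B"],
    }
)

# Left join keeps every source row; indicator=True adds a "_merge" column
# that flags rows with no match in the lookup.
merged = population.merge(
    lookup, on="sub_location_code", how="left", indicator=True
)
unmatched = merged[merged["_merge"] == "left_only"]

# Aggregate the matched rows up to ICB level.
icb_totals = (
    merged.dropna(subset=["icb_code"])
    .groupby("icb_code", as_index=False)["population"]
    .sum()
)
print(icb_totals)
print(unmatched[["sub_location_code", "population"]])
```

Checking the unmatched rows before aggregating is a simple way to catch codes that are missing from the lookup, which would otherwise silently drop out of the regional totals.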

… to be continued on YouTube
