
New report outlines SA’s biggest challenges to AI adoption

Take yourself back to February 2020. Life was relatively normal, kids were at school, we physically went into work, and everyone was more certain of the paths they were on. A year later, people of all ages are now a lot more tech-savvy, having been forced to work from home, do online schooling, or hold online gatherings just to keep in touch with loved ones. We have had to embrace the change and step out of our comfort zones, learning how to use technology to navigate everyday life. While it’s true that South Africa is still behind in digitization, it’s catching up fast thanks to COVID-19, catalyzed by boardrooms across the country focusing on digitization like never before.

One such focus is the efficiency driven by Artificial Intelligence and Machine Learning (AI/ML). SafriCloud surveyed SA’s leading IT decision makers to assess the sentiment and adoption outlook for these technologies amongst business and IT professionals. The results have been published in an eye-opening report entitled, ‘AI: SA – The state of AI in South African businesses 2021’.

‘Keen to start but facing a few challenges’ was the pervasive theme across the survey respondents, but with the global Machine Learning market projected to grow from $7.3 billion in 2020 to $30.6 billion by 2024*, why do we still see resistance to adoption?

Nearly 60% of respondents said that their business supports them in their desire to implement AI/ML and yet only 25% believed that it is understood well at an executive level. While ‘fear of the unknown’ ranked in the top three adoption challenges both locally and internationally (Gartner, 2020), only 9.34% of respondents cited ‘lack of support from C-suite’ as a challenge.

There is a clear degree of pessimism about the level of skills and knowledge to be found in the South African market. This pessimism is more pronounced at senior management level, where more than 60% rated ‘low internal skill levels’ as the top challenge facing AI/ML adoption. With nearly 60% of respondents rating the need to implement AI/ML in the next two years as ‘important’ to ‘very important’, and only 35% of businesses saying they currently have internal resources focused on AI/ML, the skills gap will continue to grow.

Artificial Intelligence and Machine Learning represent a new frontier in business. Like previous generations that faced new frontiers – such as personal computing and the industrial revolution – we can’t predict what these changes might lead to. All we can really say is that business will be different, jobs will be different and how we think will be different. Those open to being different will be the ones that succeed.

Get free and instant access to the full report, to discover whether your business is leading the way or falling behind: https://www.safricloud.com/ai-sa-the-state-of-ai-in-south-african-businesses/

Report highlights include:

  • The areas of AI/ML that are focused on the most.
  • The state of the AI job market and how to hire.
  • Practical steps to train and pilot AI/ML projects.

How is Coding Used in Data Science & Analytics

What is Data Science?

In recent years the phrase “data science” has become a buzzword in the tech industry. The demand for data scientists has surged since the late 1990s, presenting new job opportunities and research areas for computer scientists. Before we delve into the computer science aspect of data science, it’s useful to know exactly what data science is and to explore the skills required to become a successful data scientist.

Data science is a field of study that involves processing large sets of data with statistical methods to extract trends, patterns, or other relevant information. In short, data science encapsulates anything related to obtaining insights, trends, or other valuable information from data. The foundations of this work lie in statistics, programming, domain knowledge, and visualization, and a successful data scientist has in-depth knowledge across these four pillars:

  1. Math and Statistics: From modeling to experimental design, encountering something math-related is inevitable, as data almost always requires quantitative analysis.
  2. Programming and Database: Knowing how to navigate data hierarchies, work with big data, and query the datasets you need, alongside knowing how to code algorithms and develop models, is invaluable to a data scientist (more on this below).
  3. Domain Knowledge and Soft Skills: A successful and effective data scientist is knowledgeable about the company or firm at which they are working and proactive at strategizing and/or creating innovative solutions to data issues.
  4. Communication and Visualization: To make their work accessible to all audiences, data scientists must be able to weave a coherent and impactful story through visuals and facts that conveys the importance of their work. This is usually done with programming languages or data visualization software, such as Tableau or Excel.

Does Data Science Require Coding?

Short answer: yes. As described in points 2 and 4, coding plays a significant role in data science, appearing in almost every step of the process. But how exactly is coding used at each step of solving a data science problem? Below, you’ll find the stages of a typical data science experiment and a detailed account of how coding is integrated into each one. It’s important to remember that this process is not always linear; data scientists tend to ping-pong back and forth between steps depending on the nature of the problem at hand.

Preplanning and Experimental Design

Before coding anything, data scientists need to understand the problem being solved and the desired objective. This step also requires them to figure out which tools, software, and data will be used throughout the process. Although coding is not involved in this phase, it can’t be skipped, as it allows data scientists to keep their focus on the objective and avoid being distracted by unrelated data or results.

Obtaining Data

The world has a massive amount of data that is growing constantly. In fact, Forbes reports that humans create 2.5 quintillion bytes of data daily. From such vast amounts of data arise vast amounts of data quality issues, ranging from duplicate or missing datasets and values to inconsistent, mis-entered, or outdated data. Obtaining relevant and comprehensive datasets is tedious and difficult. Oftentimes, data scientists pull the data they need from multiple datasets. This step requires coding with querying languages such as SQL, or the query interfaces of NoSQL databases.
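As a minimal sketch of what this can look like in Python, pandas can run SQL queries against a database and merge the results into one dataset. The database file, table names, and columns below are hypothetical:

import sqlite3
import pandas as pd

# Hypothetical example: pull two related tables from a local SQLite database
# and combine them into a single DataFrame for later cleaning and analysis.
conn = sqlite3.connect("company.db")  # assumed database file

customers = pd.read_sql_query("SELECT customer_id, name, region FROM customers", conn)
orders = pd.read_sql_query(
    "SELECT customer_id, order_date, amount FROM orders WHERE order_date >= '2021-01-01'",
    conn,
)
conn.close()

# Join the two datasets on their shared key so each order carries customer details.
data = orders.merge(customers, on="customer_id", how="left")
print(data.head())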

Cleaning Data

After all the necessary data is compiled in one location, it needs to be cleaned. For example, data that is inconsistently labeled “doctor” or “Dr.” can cause problems when it is analyzed. Labeling errors, minor spelling mistakes, and other minutiae can cause major problems down the road. Data scientists can use languages like Python and R to clean data. They can also use applications such as OpenRefine or Trifacta Wrangler, which are specifically made to clean data and transform it into different formats.
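The sketch below shows the “doctor” vs “Dr.” problem handled in pandas; the toy DataFrame is invented purely for illustration:

import pandas as pd

# Toy example: normalise inconsistent labels, drop duplicate rows,
# and impute missing values.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "title": ["Dr.", "doctor", "Dr.", None],
    "age": [34, 41, 34, None],
})

df["title"] = df["title"].str.lower().replace({"dr.": "doctor"})  # unify spelling
df = df.drop_duplicates()                                         # remove exact duplicates
df["age"] = df["age"].fillna(df["age"].median())                  # fill missing ages

print(df)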

Analyzing Data

Once a dataset is clean and uniformly formatted, it is ready to be analyzed. Data analytics is a broad term whose meaning differs from application to application. When it comes to data analysis, Python is ubiquitous in the data science community. R and MATLAB are popular as well, as they were designed with statistical and numerical analysis in mind. Though these languages can have a steeper learning curve than Python, they are useful for an aspiring data scientist because they are so widely used. Beyond these languages, a plethora of tools is available online to help expedite and streamline data analysis.
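As a short, self-contained illustration, a first analysis pass in Python might compute summary statistics, a group-by aggregation, and a correlation; the toy sales dataset below is invented:

import pandas as pd

# A small analysis pass on a toy sales dataset.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "units": [120, 95, 130, 80, 150],
    "ad_spend": [20, 15, 22, 12, 25],
})

print(sales.describe())                          # overall summary statistics
print(sales.groupby("region")["units"].mean())   # average units sold per region
print(sales["units"].corr(sales["ad_spend"]))    # correlation between ad spend and sales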

Visualizing Data

Visualizing the results of data analysis helps data scientists convey the importance of their work as well as their findings. This can be done using graphs, charts, and other easy-to-read visuals, which allow broader audiences to understand a data scientist’s work. Python is a commonly used language for this step; packages such as seaborn and prettyplotlib can help data scientists make visuals. Other software, such as Tableau and Excel, is also readily available and widely used to create graphics.
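For instance, a minimal seaborn sketch (the regional sales figures below are made up) turns a small table into a bar chart that a non-technical audience can read:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Plot units sold per region as a simple bar chart.
sales = pd.DataFrame({
    "region": ["North", "South", "East", "West"],
    "units": [150, 95, 120, 80],
})

sns.barplot(data=sales, x="region", y="units")
plt.title("Units sold per region")
plt.tight_layout()
plt.savefig("units_per_region.png")  # or plt.show() in an interactive session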

Programming Languages used in Data Science

Python is a household name in data science. It can be used to obtain, clean, analyze, and visualize data, and it is often considered the programming language that serves as the foundation of data science. In fact, 40% of data scientists who responded to an O’Reilly survey said they used Python as their main coding language. Contributors have created libraries dedicated solely to data science operations, with extensions into artificial intelligence and machine learning, making Python an ideal choice.

Common packages, such as numpy and pandas, can perform complex calculations on matrices of data, making it easier for data scientists to focus on solutions instead of mathematical formulas and algorithms. Even though these packages (along with others, such as sklearn) take care of the mathematical formulas and calculations, it is still important to have a solid understanding of the underlying concepts in order to implement the correct procedure in code. Beyond these foundational packages, Python also has many specialized packages that can help with specific tasks.
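To illustrate, the sketch below fits a simple linear regression with sklearn and then reproduces the same least-squares result with numpy’s matrix operations; the data points are invented:

import numpy as np
from sklearn.linear_model import LinearRegression

# The library handles the matrix algebra of least squares, but knowing the
# underlying formula (beta = (X^T X)^-1 X^T y) tells you what it is doing.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.0, 6.2, 7.9])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # fitted slope and intercept

# The same fit written out with numpy matrix operations:
X_design = np.hstack([np.ones_like(X), X])            # add a column of ones for the intercept
beta = np.linalg.inv(X_design.T @ X_design) @ X_design.T @ y
print(beta)                                           # intercept and slope match the sklearn result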

R and MATLAB are also popular tools in data science. They are often used for data analysis and allow for hypothesis testing to validate statistical models. Though their setups and syntax differ from Python’s, the basic logic carries over between the three, so fluency in one makes the others easier to pick up; Python nevertheless remains the keystone language in data science.

Other popular programming languages, such as Java, can be useful for the aspiring data scientist to learn as well. Java is used in a vast number of workplaces, and plenty of tools in the big data realm are written in Java. For example, TensorFlow is a software library that is available for Java. The list of coding languages that are relevant or being used directly in the field of data science goes on and on, just as the benefits of learning a new computing language are endless.

Case Study: Python, MATLAB, and R

  • At ForecastWatch, Python was used to write a parser to harvest forecasts from other websites.
  • Financial industries leveraged time-series data in MATLAB to backtest statistical models that are used to engineer fund portfolios.
  • In 2014, Facebook transitioned to using mostly Python for data analysis since it was already used widely throughout the firm.
  • R is widely used in healthcare industries, in areas ranging from drug discovery and pre-clinical trial testing to drug safety data analysis.
  • Sports analysts use R to analyze time-series data on individual players to predict their future performance.

Database and Querying

Beyond data analysis, it is imperative to be knowledgeable in querying languages. When obtaining data, data scientists often have to navigate multiple databases with different data hierarchies. Languages such as SQL and its many dialects, as well as firm-specific cloud data platforms, are key to expediting the data wrangling process. Beyond retrieval, querying languages can also compute basic formulas and aggregations directly in the query.
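As a hedged sketch of such a query run from Python with the built-in sqlite3 module (the database, table, and column names are hypothetical):

import sqlite3

# Aggregate query: headcount and average salary per department,
# keeping only departments with more than ten employees.
conn = sqlite3.connect("warehouse.db")
query = """
    SELECT department,
           COUNT(*)    AS employee_count,
           AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
    HAVING COUNT(*) > 10
    ORDER BY avg_salary DESC;
"""
for row in conn.execute(query):
    print(row)
conn.close()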

Case Study: Querying in Data Science

  • The U.S. Congress Database is an open-source database that can be queried with PostgreSQL (via psql) to answer questions about the demographics of our legislative branch.
  • When companies acquire smaller firms or startups, they often run into the issue of navigating multiple databases. To ease the process, SQL is a popular language used to navigate data.

Data Science is Growing

In almost every step of the data science process, programming is used to achieve different goals. As the field matures and problems become more complex, data scientists will rely ever more heavily on coding to solve them successfully. For these reasons, it is essential that aspiring data scientists learn to use coding so they are prepared for any role. Because of the rapid pace of innovation, the field is constantly expanding, and data scientist positions are constantly opening at companies of all sizes and in all fields. In short, data science and its future are nothing short of exciting!

This article originally appeared on junilearning.com


The Future of HR from 2020: Machine Learning & Deep Learning

The future of HR lies in Deep Learning, which is machine learning on steroids. It uses a technique that gives machines an improved ability to find, and amplify, even the smallest patterns. This technique is called a deep neural network: deep because it has many layers of simple computational nodes that work together to process data and deliver a final result in the form of a prediction.

Neural networks were vaguely inspired by the inner workings of the human brain: the nodes are like neurons and the network is like the brain itself. But Geoffrey Hinton published his breakthrough work at a time when neural networks had gone out of style. No one really knew how to train them, so they were not giving good results. The technique took almost 30 years to recover, and then it suddenly emerged from the abyss.

One last thing we should know in this introduction: machine learning (and deep learning) comes in three forms: supervised, unsupervised, and reinforcement learning.

In supervised learning, the most common form, the data is labeled to tell the machine exactly what patterns to look for. Think of it as a tracking dog that will chase a target once it knows the scent it is looking for. That is what you are doing when you press play on a Netflix program: you are telling the algorithm to find similar programs.
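A minimal supervised-learning sketch in Python using scikit-learn’s bundled iris dataset, where every example comes with a label telling the model which species it belongs to:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Supervised learning in miniature: labeled examples in, a predictive model out.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))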

In unsupervised learning, the data has no labels. The machine simply searches for whatever patterns it can find. This is like letting a person sort through tons of different objects and group the similar ones together. Unsupervised techniques are less popular because they have less obvious applications, but, interestingly, they have gained traction in cybersecurity.
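By contrast, a minimal unsupervised sketch: k-means clustering on synthetic data with no labels at all, where the algorithm simply groups points that look alike:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Unsupervised learning in miniature: no labels are given, only raw points.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)   # the three group centres the algorithm discovered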

Finally, we have reinforcement learning, the latest frontier of machine learning. A reinforcement algorithm learns by trial and error to achieve a clear objective. It tries many different things and is rewarded or penalized depending on whether its behavior helps or hinders it in reaching its goal. This is like rewarding a child with praise and affection when they behave well. Reinforcement learning is the basis of Google’s AlphaGo, the program that beat the best human players at the complex game of Go.
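Real reinforcement-learning systems such as AlphaGo are far more sophisticated, but the trial-and-error idea can be sketched with a tiny epsilon-greedy bandit; the hidden reward probabilities below are invented:

import random

# The agent repeatedly chooses between two actions, is rewarded at random
# according to hidden probabilities, and learns which action pays off more.
true_reward = {"A": 0.3, "B": 0.7}   # hidden reward probabilities (assumed)
estimates = {"A": 0.0, "B": 0.0}
counts = {"A": 0, "B": 0}
epsilon = 0.1                        # fraction of the time spent exploring

for step in range(1000):
    if random.random() < epsilon:
        action = random.choice(["A", "B"])           # explore
    else:
        action = max(estimates, key=estimates.get)   # exploit the best estimate
    reward = 1.0 if random.random() < true_reward[action] else 0.0
    counts[action] += 1
    # Incremental average update of the action-value estimate.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)   # the estimate for "B" should approach 0.7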

Applied to Human Resources, the current use of Machine Learning is limited despite its wide growth potential, and it presents a dilemma that must be resolved in the future: whether machines can discover talent in human beings beyond hard, verifiable credentials such as level of education.

Software intelligence is transforming human resources. At the moment its main focus is on recruitment, which in most cases is a very expensive and inefficient process whose goal is to find the best candidates among thousands of applicants, although there are many other possible applications.

A first example is the development of technology that helps people write gender-neutral job descriptions in order to attract the best possible candidates, whether male or female. This would broaden the pool of job seekers and lead to a more balanced employee population.
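A deliberately simple sketch of the idea in Python: flag terms often cited as gendered in job advertisements and suggest alternatives. The word list and replacements are illustrative, not an authoritative lexicon:

# Flag potentially gendered wording in a job description and suggest neutral terms.
GENDERED_TERMS = {
    "rockstar": "high performer",
    "ninja": "expert",
    "aggressive": "proactive",
    "dominant": "leading",
}

def suggest_neutral_wording(job_description):
    """Return (flagged word, suggested replacement) pairs found in the text."""
    text = job_description.lower()
    return [(word, replacement) for word, replacement in GENDERED_TERMS.items() if word in text]

print(suggest_neutral_wording("We need an aggressive sales ninja."))
# [('ninja', 'expert'), ('aggressive', 'proactive')]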

A second example is the training recommendations that employees could receive. Employees often have many training options but cannot find what is most relevant to them; these algorithms therefore surface the internal and external courses that best suit the employee’s development objectives, based on many variables, including the skills the employee intends to develop and the courses taken by other employees with similar professional objectives.
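One common way to implement such recommendations is content-based filtering; the sketch below ranks hypothetical courses by cosine similarity between skill-tag vectors, with all names and tags invented:

import numpy as np

# Employees and courses are described over the same skill tags:
# ["python", "statistics", "leadership", "negotiation"]
courses = {
    "Intro to Machine Learning": np.array([1, 1, 0, 0]),
    "Managing Teams":            np.array([0, 0, 1, 1]),
    "Data Visualisation":        np.array([1, 0, 0, 0]),
}
employee_goals = np.array([1, 1, 0, 0])   # wants to develop python + statistics

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank courses by how closely their skills match the employee's goals.
ranked = sorted(courses.items(), key=lambda kv: cosine(employee_goals, kv[1]), reverse=True)
for name, vec in ranked:
    print(f"{name}: {cosine(employee_goals, vec):.2f}")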

A third example is Sentiment Analysis, a form of NLP (Natural Language Processing) that analyzes the social conversations generated on the Internet to identify opinions and extract the emotions (positive, negative, or neutral) they implicitly carry. With sentiment analysis it is possible to determine:

-Who holds the opinion.

-What the opinion is about.

-Whether the opinion is positive, negative, or neutral.

This tool can be applied to words and expressions, as well as to phrases, paragraphs, and documents found on social networks, blogs, forums, or review pages. Sentiment analysis uncovers the connotation hidden behind subjective information.

There are different systems of sentiment analysis:

-Sentiment analysis by polarity: Opinions are classified as very positive, positive, neutral, negative, or very negative. This type of analysis maps naturally onto reviews with 1-to-5 scoring mechanisms, where 1 is very negative and 5 is very positive (a minimal polarity scorer is sketched after this list).

-Sentiment analysis by type of emotion: The analysis detects specific emotions and feelings: happiness, sadness, anger, frustration, etc. This usually relies on a list of words and the feelings with which they are typically associated.

-Sentiment analysis by intention: This system interprets comments according to the intention behind them: Is it a complaint? A question? A request?
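As promised above, here is a minimal lexicon-based polarity scorer in the spirit of the first system; real sentiment analysis relies on much richer lexicons or trained models, and the word lists here are purely illustrative:

# Score text as positive, negative, or neutral from tiny illustrative word lists.
POSITIVE = {"great", "good", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "unhappy"}

def polarity(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love the new training portal, it is excellent"))   # positive
print(polarity("The onboarding process was terrible"))                # negative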

A fourth example is Employee Attrition, through which we can predict which employees will remain in the company and which will not, based on several parameters, as shown in the example below.

[Figure: example employee attrition prediction output. Source: IBM (IBM Watson sample dataset)]
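A hedged Python sketch of the attrition idea follows. The IBM Watson sample dataset contains fields along the lines of age, monthly income, overtime, and attrition; the tiny table below is synthetic stand-in data, not the real dataset:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for an attrition dataset: 1 in the "attrition" column
# means the employee left the company.
df = pd.DataFrame({
    "age":            [25, 40, 29, 51, 33, 45, 23, 38],
    "monthly_income": [2800, 7200, 3100, 9500, 4100, 6800, 2500, 5200],
    "overtime":       [1, 0, 1, 0, 1, 0, 1, 0],
    "attrition":      [1, 0, 1, 0, 0, 0, 1, 0],
})

X = df[["age", "monthly_income", "overtime"]]
y = df["attrition"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(model.predict_proba(X_test))   # probability of each test employee leaving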

These four cases are clear examples of how Machine Learning elevates the role of human resources from tactical processes to strategic ones. Smart software is enabling the mechanics of workforce management, such as drafting job descriptions, recommending courses, or predicting which employees are most likely to leave the company, making it possible to react in time and apply corrective policies.

From the business point of view, machine learning technology is an opportunity to drive greater efficiency and better decision making. This will help everyone make better decisions and, equally important, will give Human Resources a strategic and valuable voice at the executive level.

Prof Raul Villamarin Rodriguez