Online recruitment marketplace The Search Party is using machine learning algorithms to scan 15 million candidate resumes to ensure it provides the right candidates to employers.

Speaking at the Chief Data Officer Forum in Sydney, head of data science Dylan Hogg discussed the use of a custom clustering algorithm and a deep neural network to spot variations in resumes belonging to the same person and variations in job titles.

“As you can imagine a resume has a lot of text data, so a lot of what we do is natural language processing. It’s taking any natural language such as English and trying to infer the structure out of it and get insights from it,” Hogg said.

The Search Party is an online recruitment marketplace that was founded in Sydney in 2014. Its aim is to continuously improve its algorithms to serve up the most relevant candidate resumes to employers.

Hogg and his team developed a custom clustering algorithm to find different versions of a candidate’s resume. A candidate might have updated their resume at different points in time with changes to contact details and skills, or may have created different resumes tailored to different roles.

“It’s similar to solving multiple customer records to get a single view of the customer,” Hogg said, pointing out that it is not as simple as cross-referencing, because variations in names and skills make it harder to determine whether two resumes belong to the same person.

The variables the algorithm looks at are full name, email address, names of companies a candidate has worked for, phone number(s) and list of skill sets.

The text is processed in a way that turns categorical data into numerical vectors, as clustering works best with numerical data. First, the data is tokenised into short, overlapping text snippets. For example, the name Dylan is broken up into the segments ‘dy’, ‘yl’, ‘la’ and ‘an’.
This makes it robust to spelling variations, Hogg said. Then TF-IDF (term frequency-inverse document frequency) is applied, which weighs how frequently a term appears in a document against how common that term is across the whole set of documents. It can be used to represent text as a vector of numbers.

The next step is using a fast canopy clustering method, which groups potential duplicate candidates that require further investigation.

The Search Party also built a deep neural network (a neural network with many hidden layers) to find variations in job titles.

Using a list of job titles from the Internet as a source of truth to train the neural net, Hogg was able to map job descriptions to job titles and have the model learn how closely different job titles relate to each other.

“Then once it’s trained, it gives you a probability distribution over what job titles it believes it is. It deals well with acronyms and synonyms,” Hogg said.

Hogg first needed to turn textual data into numerical vectors, and did this using Word2Vec, an open source tool from Google. It maps words into an n-dimensional space, which enables vector arithmetic on words, and it learns by predicting words from the context of other words.

“We are finding we are getting a lot of use out of that method now. You can see which words are close to a word and stuff like that. That’s just the preprocessing of the text to then feed it into the neural network.

“We give it the training data and job titles and over time it learns to rank the correct job titles above other job titles.”
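
As a rough illustration of the duplicate-detection steps described earlier (snippet tokenisation, TF-IDF weighting and canopy clustering), the following Python sketch uses only the standard library. It is not The Search Party's implementation: the function names, the choice of cosine similarity, the smoothed IDF formula, the `loose` and `tight` thresholds and the sample names are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> list[str]:
    """Split text into overlapping two-character snippets, e.g. 'dy', 'yl'."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (snippet -> weight) per document."""
    tokenised = [char_bigrams(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each snippet occur?
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        term_freq = Counter(toks)
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight positive.
            tok: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def canopy_clusters(vectors, loose=0.3, tight=0.7):
    """Group record indices into (possibly overlapping) canopies.

    Records at least `loose`-similar to the current centre join its canopy;
    records at least `tight`-similar are also dropped from further passes.
    """
    remaining = list(range(len(vectors)))
    canopies = []
    while remaining:
        centre = remaining[0]
        sims = {i: cosine(vectors[centre], vectors[i]) for i in remaining}
        canopies.append(
            [i for i in remaining if i == centre or sims[i] >= loose]
        )
        # Always drop the centre so the loop is guaranteed to terminate.
        remaining = [
            i for i in remaining if i != centre and sims[i] < tight
        ]
    return canopies


if __name__ == "__main__":
    # Hypothetical sample records: two spellings of one name plus another.
    names = ["Dylan Hogg", "Dillan Hogg", "Jane Citizen"]
    for canopy in canopy_clusters(tfidf_vectors(names)):
        print([names[i] for i in canopy])
```

Canopies are allowed to overlap. The method is a cheap first pass that narrows down which pairs of records deserve a closer, more expensive comparison, which matches the article's description of grouping potential duplicates for further investigation.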