Zoli is a Complytron founder and leads the data science team, directing product innovation and, of course, making sure the product stays on track. He’s a level-headed guy with a super positive attitude who also brings his technical know-how and an eye for the latest trends to the table.
I studied Linguistics and had to take a course in natural language processing, or NLP for short. It was love at first sight: I learnt how to deal with textual data.
A common NLP problem is classifying texts into categories (e.g. genres, or spam versus ham) and identifying keywords in a collection of texts. This was in 2002, when neural networks were regarded as computationally inefficient, which means the computers were too slow and weak to run them. At that time, Perl was the programming language of choice in the field and most of us didn’t expect the boom of Python.
I worked for various companies, from large financial firms to small boutique consultancies, and I picked up most of my knowledge on the job. I tried my hand at almost every NLP task over the years, from building sentiment analysers which analysed the tone of financial news to spam and genre classifiers, and even Named Entity Recognisers (a handy tool which can extract the names of people, firms, etc. from texts).
Later, I got a job where I learnt more about enterprise search (the fancy term for making a company’s collection of documents, like emails and PDFs, searchable) as well as data analytics. Slowly, my horizons started to broaden.
Before I settled on NLP, I tried out various fields, from classics and philosophy to linguistics, and I realised that I just loved working with data. Analysing data and deriving conclusions from various experiments, the so-called “scientific method,” is one of the coolest things on earth. And it is not only for scientists! You can solve many business problems with exactly the same mindset, using the same toolbox: some math, programming and, of course, solid data.
In the early 2000s the term “data science” was not in use. I learned Computational Linguistics, which is also called Natural Language Processing, but these were the academic terms for what we did. In “real life” it was called text mining, because data mining and business intelligence were very trendy terms in those days. These days it is called Machine Learning, Artificial Intelligence or Data Science, depending on who you ask; but at heart they have all been one and the same since the birth of the modern scientific method.
I met Oliver (Complytron’s CEO) when I was one of the organisers of the Budapest Open Knowledge Meetup. We had an event on data journalism and Oliver delivered a great talk on the topic. Luckily, the next time we met, Oliver was building a team to analyse the source code and other technical information that can be extracted from websites. We joined forces and built a proof-of-concept tool (thanks to a Google Digital News Initiative grant!), and the basic technology behind Complytron’s tools is based on that early work.
Due to the pandemic, we live our working life online. We start the day with a short stand-up and the rest of the day is devoted to the business. A typical day consists of about 50% coding and 50% other business-related activities like planning and touching base with the team.
I love reacting to our customers’ needs. Designing new features sometimes drives me crazy, but I just love it.
We only have very challenging tasks right now! We are working hard on adding new features to our digital fingerprinting tool. Also, we are constantly learning about the latest web technologies, since most of our data comes from web scraping.
Knowledge Graphs are very hot these days. They are the reincarnation of an old concept: semantic databases which contain entities (or nodes) and their relationships (edges, or links between nodes).
In our case, the entities can be people (sanctioned people, politically exposed persons, etc.) or companies (sanctioned companies, or companies owned by sanctioned people or companies). The relationship between these entities can be ownership, like “X owns company Z”.
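To make the idea concrete, here is a minimal sketch of such a graph as a plain list of labelled edges. All names and relationships below are invented for illustration; a real sanctions graph would of course be far larger and backed by a proper graph database.

```python
# A toy Knowledge Graph: each edge is a (source, relation, target) triple.
# Every name here is fictional and used only for illustration.
edges = [
    ("Alice Example", "owns", "Acme Ltd"),
    ("Acme Ltd", "owns", "Shell Co"),
    ("Bob Example", "works_for", "Shell Co"),
]

def neighbours(entity, edges):
    """Return every (relation, other_entity) pair touching the given entity."""
    out = []
    for src, rel, dst in edges:
        if src == entity:
            out.append((rel, dst))
        elif dst == entity:
            # Flag edges pointing *at* the entity as inverse relations.
            out.append((rel + "_inverse", src))
    return out

# Who is connected to Acme Ltd, and how?
print(neighbours("Acme Ltd", edges))
# → [('owns_inverse', 'Alice Example'), ('owns', 'Shell Co')]
```

Even this tiny example shows why the graph view is useful: one lookup reveals both who owns a company and what that company owns in turn.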
Right now I’m very keen on Graph Neural Networks, which provide a brand-new way to analyse big Knowledge Graphs. Basically, they provide a way to encode (or represent) the nodes so that it becomes possible to compare them (e.g. how similar two entities are to each other) and to predict whether two nodes are connected somehow (link prediction).
We are just at the beginning of our journey. We are rolling out our digital fingerprinting technology right now, which we hope will be a very useful tool for KYC/AML officers. I do hope our AI-aided approach will spice up the field.
We are working on a Knowledge Graph of sanctioned entities right now. To achieve our goal, we scrape lots of sites and extract names, addresses and other data from them. What we do is called Named Entity Recognition and Relation Mining: we extract names (like names of persons, firms and addresses) and try to figure out how they are related to each other (e.g. X owns Y company, or Z works for W).
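As a deliberately simplified illustration of those two steps, the sketch below uses plain regular expressions: one pattern finds capitalised name spans (a crude stand-in for a real, trained NER model) and another extracts “X owns Y” pairs. Production systems rely on statistical models, not rules like these; the text and names are made up.

```python
import re

# Toy input text with fictional names.
text = ("John Doe owns Example Holdings Ltd. "
        "Jane Roe works for Example Holdings Ltd.")

# "NER": treat runs of capitalised words as entity mentions.
# A real recogniser would also type them (PERSON, ORG, ...).
entities = re.findall(r"(?:[A-Z][a-z]+ )+[A-Z][a-z]+", text)

# "Relation Mining": look for an ownership verb between two entity spans.
relations = re.findall(
    r"((?:[A-Z][a-z]+ )+[A-Z][a-z]+) owns ((?:[A-Z][a-z]+ )+[A-Z][a-z]+)",
    text,
)

print(entities)
# → ['John Doe', 'Example Holdings Ltd', 'Jane Roe', 'Example Holdings Ltd']
print(relations)
# → [('John Doe', 'Example Holdings Ltd')]
```

The extracted (entity, relation, entity) triples are exactly what gets loaded into the Knowledge Graph as nodes and edges.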
I hope we can put our tool into production next year. It will be one of the first Knowledge Graphs built for the KYC/AML sector. Using our Knowledge Graph will not only help the user find sanctioned or PEP entities but will also reveal their business relationships and associates.
In this way, the user has more information with which to assess the risk associated with someone who is not themselves sanctioned but is associated with a sanctioned or very risky entity.
This content is for general informational purposes only and does not substitute personalised professional advice. Although we aim to be both up-to-date and accurate, errors can occur. In addition, certain pieces of content, like interviews, podcasts and webinars, may contain opinions that do not necessarily reflect the position of our company. If you have noticed an error, omission or bug, please contact us at email@example.com