Data science is increasingly applied to problems that, until recently, were very difficult to solve.
Without proper data exploration and machine learning algorithms, it would be hard to imagine tasks such as recognising images, speech, handwriting or gestures. Beyond these flagship applications, the work of data scientists is also increasingly used to solve specific business problems, including in cybersecurity. However, as broadly understood data analysis has become more popular in recent years as a means of counteracting cybercriminals, many myths have grown up around the discipline. In this article we discuss the few that are, in our opinion, the most important.
Predictive systems are able to detect every attack.
Many companies use the achievements of data science to build predictive systems capable of detecting when another network attack occurs. Systems of this type significantly reduce the risk of an attack and form an important link in the security chain. Yet they do not guarantee 100% security.
This is because they learn to detect attacks from past data, so when a new type of attack appears, it may go undetected. Hackers frequently adjust and change their strategies to circumvent security measures. This is the weakness of predictive systems: they cannot adjust quickly to change.
The effectiveness of a predictive system is always the same.
Unlike in other fields that use machine learning, in cybersecurity the quality of predictive models drops relatively quickly. This is caused by continuously changing threats and by the aforementioned efforts of hackers to adapt their attack strategies.
In traditional applications, such as recognising handwriting or faces in images, the problem does not change over time, so the quality of the system stays at the same level for a long time. In cybersecurity, limiting the drop in effectiveness requires working with current data on an ongoing basis and updating predictive models frequently.
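One way to notice that a model's quality is dropping is to track its detection rate over the most recent confirmed attacks and flag the model for retraining when the rate falls below a chosen level. The sketch below is only an illustration: the class name `DetectionRateMonitor`, the window size and the 0.8 threshold are assumptions made for this example, not values taken from any particular product.

```python
from collections import deque


class DetectionRateMonitor:
    """Tracks the detection rate over the most recent confirmed attacks."""

    def __init__(self, window_size=100, min_rate=0.9):
        # Each entry is True if the model detected the attack, False if it missed it.
        self.outcomes = deque(maxlen=window_size)
        self.min_rate = min_rate  # hypothetical acceptable detection rate

    def record(self, detected):
        """Store the outcome for one confirmed attack."""
        self.outcomes.append(bool(detected))

    def detection_rate(self):
        """Return the share of detected attacks in the window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """True when the recent detection rate has fallen below the threshold."""
        rate = self.detection_rate()
        return rate is not None and rate < self.min_rate


monitor = DetectionRateMonitor(window_size=5, min_rate=0.8)
for detected in [True, True, True, True, True]:
    monitor.record(detected)
print(monitor.needs_retraining())  # False: all recent attacks were detected

for detected in [False, False]:  # two new attacks slip through
    monitor.record(detected)
print(monitor.needs_retraining())  # True: the rate in the window is now 3/5
```

In practice the outcomes would come from analyst-confirmed incidents, which is one reason such monitoring depends on current, labelled data.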
A network security system can rely solely on a predictive model.
Machine learning algorithms need large volumes of data to generalise a problem adequately. For protection against network attacks, this means data covering both legitimate network traffic and threats. Sometimes a given threat occurs very rarely or is difficult to record, so a trained predictive model will not learn to distinguish it from normal network traffic.
In such situations, it is better to use a different method, based for instance on hashes, masks, etc. Methods from outside machine learning can also serve as an additional, preliminary layer of security. Such a combination often gives very good results, and abandoning traditional methods in favour of predictive models is not advisable.
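The layered approach described above can be sketched as a hash lookup that runs before the model is consulted. Everything here is hypothetical: the blocklist contents, the `toy_model_score` function (a made-up stand-in for a trained model) and the 0.5 threshold exist only to show the control flow.

```python
import hashlib

# Hypothetical blocklist of SHA-256 digests of known malicious payloads.
KNOWN_BAD_HASHES = {
    hashlib.sha256(b"known-malicious-payload").hexdigest(),
}


def toy_model_score(payload):
    """Stand-in for a trained model: returns a made-up score in [0, 1].

    Here the score is simply the share of non-ASCII bytes in the payload.
    """
    if not payload:
        return 0.0
    non_ascii = sum(1 for byte in payload if byte > 127)
    return non_ascii / len(payload)


def classify(payload, threshold=0.5):
    """Check the payload against the hash blocklist, then against the model."""
    # Layer 1: exact match against known threats; needs no training data.
    digest = hashlib.sha256(payload).hexdigest()
    if digest in KNOWN_BAD_HASHES:
        return "blocked: known hash"
    # Layer 2: the predictive model handles payloads the blocklist does not know.
    if toy_model_score(payload) >= threshold:
        return "blocked: model"
    return "allowed"


print(classify(b"known-malicious-payload"))  # blocked: known hash
print(classify(bytes([200] * 10)))           # blocked: model
print(classify(b"hello"))                    # allowed
```

The hash layer catches a rare but known threat even if the model has never seen enough examples of it to learn the pattern.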
Predictive systems will replace the work of network analysts.
Although we increasingly hear about algorithms replacing human work, there is no need to worry that predictive systems will completely replace network analysts. Cybersecurity is a dynamic field: new attack methods appear almost every day, and they need to be analysed, processed correctly and then expressed as decision rules or predictive models. Rather than replacing humans, automated security systems will supplement the work of analysts, allowing them to focus on new or the most difficult cases. Given the shortage of qualified cybersecurity specialists, a strategy of human-machine collaboration will surely improve overall security effectiveness.
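One possible form of this collaboration is to let the system handle clear-cut cases automatically and pass only the uncertain ones to an analyst. The sketch below assumes the model outputs a score between 0 and 1; the function name `route_alert` and both thresholds are invented for illustration.

```python
def route_alert(score, block_above=0.9, ignore_below=0.2):
    """Decide who handles an alert based on a hypothetical model score in [0, 1]."""
    if score >= block_above:
        return "auto-block"      # the model is confident this is an attack
    if score < ignore_below:
        return "auto-allow"      # the model is confident this is normal traffic
    return "analyst-review"      # uncertain cases go to a human


# Made-up alert identifiers and scores.
alerts = {"a1": 0.97, "a2": 0.55, "a3": 0.05}
review_queue = [
    alert_id for alert_id, score in alerts.items()
    if route_alert(score) == "analyst-review"
]
print(review_queue)  # ['a2']
```

Where the thresholds sit is a trade-off between analyst workload and the risk of automated mistakes, and it would need tuning for each deployment.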
Algorithms are more important than data.
This myth is not specific to cybersecurity; it applies to every discipline where machine learning is used. Articles on applications of machine learning frequently stress the importance of algorithms without saying a word about data. Yet even with the best learning algorithms, a model will not be effective without a sufficient quantity of good-quality data.
Focusing primarily on the algorithm rather than the data can be compared to buying a car with no access to petrol stations. Therefore, when implementing a predictive model, keep in mind that the data set used for training is at least as important as the algorithm.