ابدأ بالتواصل مع الأشخاص وتبادل معارفك المهنية

أنشئ حسابًا أو سجّل الدخول للانضمام إلى مجتمعك المهني.

متابعة

The complexity of some data sets leads people to add extra dimensions as Veracity and Variability. In what ways do these complicate things further?

user-image
تم إضافة السؤال من قبل Mohamed Fetoh , Openstack Operations Engineer , Orange Business Services
تاريخ النشر: 2015/03/23

Well, adding some veracity to an already-complex dataset will not only increase the amount of data to be analyzed, but it could introduce values which skew distributions or make predictions which are based on artificial values. Making a dataset easier to work with does not guarantee that it is going to output useful information. It might be necessary to tweek your data mining scripts or algorithms to accommodate changes to the accuracy of the content

 

Adding variability can complicate matters to an even greater extreme. For instance, imagine if you had a dataset of plaintext-English, but wanted to incorporate Arabic content to analyze similarities or differences between the two sets. Well, not only would you have to find someone of translating this content, but you might have to make sure the translated Arabic is encoded via the same standards (such as ASCII or UTF-8). Furthermore, you would have to account for basic differences in syntax, such as the structure of sentences, which would rely heavily on the bias of the human-translator. Such problems can make data-sets useless if not done through a proven method. 

المزيد من الأسئلة المماثلة

هل تحتاج لمساعدة في كتابة سيرة ذاتية تحتوي على الكلمات الدلالية التي يبحث عنها أصحاب العمل؟