Information whisperer Mary Ellen Bates dove deep into the world of Text and Data Mining (TDM) with the help of an expert in the field. In this Q&A session, she sat down with Prathik Roy, Product Director for Data Solutions and Strategy at Springer Nature. Roy's extensive experience in developing delivery mechanisms for enterprise customers and fostering transformative discoveries through TDM gives us unique insights into this fascinating domain.
Bates’s interview with Roy explores the potential of TDM as he shares valuable knowledge about intellectual property and licensing considerations. He also discusses the synergies between corporate and academic TDM projects and why TDM is so significant for researchers.
TDM involves machines reading text, like scientific publications or documents, extracting information, and using it for machine learning and artificial intelligence purposes. TDM is essential for researchers as it opens various use cases, such as drug discovery, repurposing drugs, and enriching information for named entity recognition. Additionally, it enables companies from different industries to leverage valuable insights from scientific literature to improve their operations and make transformative discoveries.
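As a rough illustration of what "machines reading text and extracting information" can look like, here is a minimal Python sketch of dictionary-based named entity recognition. The term list, the labels, and the function name are hypothetical and chosen only for illustration. Production TDM pipelines typically rely on trained models rather than a fixed lookup table.

```python
import re

# Hypothetical lookup table mapping surface terms to entity labels.
ENTITY_TERMS = {
    "aspirin": "DRUG",
    "metformin": "DRUG",
    "diabetes": "DISEASE",
}

def extract_entities(text):
    """Return (term, label, start_offset) tuples for known terms found in text."""
    found = []
    for term, label in ENTITY_TERMS.items():
        # Word boundaries avoid matching a term inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.group(0).lower(), label, match.start()))
    # Sort by position so results follow reading order.
    return sorted(found, key=lambda item: item[2])

sentence = "Metformin is widely prescribed for diabetes."
print(extract_entities(sentence))
```

Each tuple pairs a matched term with its label and character offset, which is the kind of structured output that downstream analysis can build on.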
Over the past five years, TDM has shifted from being human-assisted AI to AI-assisted humans, with automation playing a more significant role. This transition has resulted in higher F1 scores, indicating improved precision and recall in machine learning models. There has also been a shift from relying on open-source tools like spaCy to creating new content from existing content sets. Looking ahead, we can expect small and midsize companies to fill the gap in TDM analysis for larger corporations, optimizing operations and driving innovation.
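For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The short Python sketch below shows the calculation. The function name and the counts are hypothetical and not taken from the interview.

```python
def f1_from_counts(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from raw extraction counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was predicted or expected.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 80 correct extractions, 20 spurious, 20 missed.
precision, recall, f1 = f1_from_counts(80, 20, 20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is why it is a common summary metric for extraction tasks.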
Implementing TDM requires substantial resources, particularly in terms of machine learning expertise and computational power. However, pretrained language models like Google's BERT have helped address some of these challenges. While transformer-based models yield better outcomes, challenges like hallucination still exist, and traditional machine learning models are still heavily relied upon.
Intellectual property generated through TDM analysis belongs to the customer, while the underlying data set belongs to the license provider. Researchers must consult their legal teams and thoroughly understand the legal aspects and processes involved. Compliance with licensing agreements and proper use of data sets are crucial. If a license is discontinued, researchers may need to remove or cease using parts of the data to comply with terms and conditions.
In the corporate environment, TDM projects are built around specific purposes, such as drug discovery, while academic researchers aim for versatile models applicable to various use cases. However, collaborations and funding arrangements have blurred the line between the two, benefiting both parties with industry insights and valuable research outcomes.
Researchers should identify their requirements and the specific content they need. It's essential to contact the publisher to inquire about access options, such as open access APIs or data feeds. However, not all publishers offer these options, so researchers should read and understand the terms, conditions, permissions, and privacy policies associated with the data. They should be aware of copyright and licensing restrictions, especially for subscribed or paywalled content. Seeking assistance from librarians or information specialists for guidance on complying with copyright restrictions is advisable, as collaboration and teamwork are vital for maximizing output and societal benefits in TDM projects.
TDM's transformative power is undeniable, empowering researchers and industries alike to unearth the hidden gems within vast repositories of knowledge. The evolution of TDM practices from human-assisted AI to AI-assisted humans showcases the potential for continuous growth and innovation. While challenges may arise, the collective efforts of researchers, industry professionals, and data scientists will undoubtedly pave the way for even greater advancements.
Whether you're an academic researcher seeking versatile models or an industry professional targeting specific outcomes, TDM is a powerful tool that knows no boundaries. By working together and harnessing the vast potential of text and data, we can collectively contribute to the advancement of knowledge and the betterment of society. Download the comprehensive TDM report “Unlocking the power of text and data mining: Four ways TDM facilitates transformative discovery” to explore more about this transformative technology, including real-life examples of how TDM has helped companies access information they could not find anywhere else.
Dr. Prathik Roy is a seasoned professional with a profound passion for data-driven solutions and transformative technologies. As the Product Director for Data Solutions and Strategy at Springer Nature, he has been at the forefront of shaping cutting-edge delivery mechanisms, including APIs and data feeds, to cater to diverse enterprise needs and facilitate groundbreaking discoveries.
With a strong background in TDM, Prathik has been instrumental in driving innovation in various industries, particularly in pharmaceuticals and biotechnology. Leveraging TDM's potential, he has spearheaded projects focused on drug discovery, drug repurposing, and enhancing information for named entity recognition. With his expertise, dedication, and commitment to progress, Prathik Roy continues to be a driving force in the field of TDM, inspiring researchers and industry professionals alike to embark on transformative journeys of knowledge discovery.