
Project

A development framework for data analytics in genomics

The project aims to deploy a scalable, high-performance data analytics infrastructure for applications in human genetics research and clinical genetic diagnosis. Next-generation sequencing (NGS) data generation has reached an explosive phase, with data throughput currently doubling every six months. High-performance data analytics has become essential to tackle this massive computing challenge, as NGS will soon rival the most data- and compute-intensive areas of science. Processing the massive volume of data generated by NGS requires new technologies and methodologies, and choosing the right tools and methods is the first challenge on the way to our goal. Collecting, storing and retrieving large amounts of data from multiple experiments requires knowledge of how to deploy cluster or cloud computing infrastructure; parallelizing the computation, in turn, leads us to Hadoop/MapReduce strategies.

From the interface and application point of view, we need a rich client platform because of the architecture and flexibility it offers to continually growing applications. Rich clients are often applications that can be extended via plugins and modules. In this way, a rich client can solve more than one problem: it can potentially address related problems, as well as problems completely foreign to its original purpose. The main characteristics of a rich client are:

- Flexible and modular application architecture
- Platform independence
- Adaptability to the end user
- Ability to work online as well as offline
- Simplified distribution to the end user
- Simplified updating of the client

The final goal of this thesis is to develop an application based on NGS and related medical information that is distributed across several sites. To achieve this, we must define rules and roles based on institute type that respect each institute's privacy.
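To illustrate the MapReduce strategy for parallelizing NGS computations, the following minimal sketch counts aligned reads per reference sequence (e.g. per chromosome) using explicit map, shuffle and reduce phases. It runs sequentially in plain Python rather than on Hadoop; the task, the function names and the simplified SAM-style input are assumptions chosen for illustration, not part of the project.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit (reference_name, 1) for every alignment placed on a reference."""
    if record.startswith("@"):  # SAM header lines start with '@'
        return
    fields = record.rstrip("\n").split("\t")
    # Column 3 (index 2) is RNAME; '*' means no reference sequence is assigned.
    if len(fields) > 2 and fields[2] != "*":
        yield fields[2], 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key (Hadoop does this between phases)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum the partial counts for one reference name."""
    return key, sum(values)


def count_reads_per_reference(records: Iterable[str]) -> dict[str, int]:
    # On a cluster, each input split would be handled by a separate mapper;
    # in this sketch all phases run one after another in a single process.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, vals) for key, vals in sorted(grouped.items()))


if __name__ == "__main__":
    sample = [
        "@HD\tVN:1.6",
        "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t16\tchr1\t250\t60\t4M\t*\t0\t0\tTTGA\tIIII",
        "read3\t0\tchr2\t40\t60\t4M\t*\t0\t0\tGGCC\tIIII",
        "read4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(count_reads_per_reference(sample))  # {'chr1': 2, 'chr2': 1}
```

Because the map step looks at one record at a time and the reduce step only sees values grouped under one key, both can be distributed across nodes; this independence is what makes the pattern suitable for very large NGS datasets.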
We believe that, with this system, researchers can share their experience and knowledge to obtain more reliable results. Since each institute has its own facilities and structures, the main challenge will be to find a common language: a protocol that connects these structures at the lowest cost and with the fewest changes.
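The rules and roles based on institute type are not specified further in this description. As a minimal sketch, assuming hypothetical institute types, roles and data categories (none of these names come from the project), a deny-by-default access check for data owned by another institute might look like this:

```python
from dataclasses import dataclass
from enum import Enum


class InstituteType(Enum):  # hypothetical institute categories
    HOSPITAL = "hospital"
    RESEARCH_CENTER = "research_center"


class Role(Enum):  # hypothetical user roles
    CLINICIAN = "clinician"
    RESEARCHER = "researcher"


class DataKind(Enum):  # hypothetical data categories, from least to most sensitive
    AGGREGATE_STATISTICS = "aggregate_statistics"  # e.g. variant frequencies
    VARIANT_CALLS = "variant_calls"
    PATIENT_RECORD = "patient_record"


# Hypothetical policy: data kinds a (institute type, role) pair may read from
# ANOTHER institute. Any pair or kind not listed is denied by default.
CROSS_INSTITUTE_POLICY: dict[tuple[InstituteType, Role], frozenset[DataKind]] = {
    (InstituteType.HOSPITAL, Role.CLINICIAN): frozenset(
        {DataKind.AGGREGATE_STATISTICS, DataKind.VARIANT_CALLS}
    ),
    (InstituteType.HOSPITAL, Role.RESEARCHER): frozenset({DataKind.AGGREGATE_STATISTICS}),
    (InstituteType.RESEARCH_CENTER, Role.RESEARCHER): frozenset({DataKind.AGGREGATE_STATISTICS}),
}


@dataclass(frozen=True)
class User:
    name: str
    institute: str
    institute_type: InstituteType
    role: Role


def can_read(user: User, kind: DataKind, owner_institute: str) -> bool:
    """Return True if `user` may read data of `kind` owned by `owner_institute`."""
    if user.institute == owner_institute:
        # Assumption: users may read their own institute's data;
        # local rules could restrict this further.
        return True
    allowed = CROSS_INSTITUTE_POLICY.get((user.institute_type, user.role), frozenset())
    return kind in allowed


if __name__ == "__main__":
    alice = User("alice", "Hospital A", InstituteType.HOSPITAL, Role.CLINICIAN)
    print(can_read(alice, DataKind.VARIANT_CALLS, "Hospital B"))   # True
    print(can_read(alice, DataKind.PATIENT_RECORD, "Hospital B"))  # False
    print(can_read(alice, DataKind.PATIENT_RECORD, "Hospital A"))  # True
```

Keeping the policy as plain data rather than code means each participating institute could review or adjust its own entries without changing the access logic, which fits the goal of connecting heterogeneous sites with few changes.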

Date: 3 Oct 2011 → 3 Feb 2017
Keywords: Next-generation sequencing
Disciplines: Applied mathematics in specific fields, Computer architecture and networks, Distributed computing, Information sciences, Information systems, Programming languages, Scientific computing, Theoretical computer science, Visual computing, Other information and computing sciences, Laboratory medicine, Medical systems biology, Molecular and cell biology, Control systems, robotics and automation, Design theories and methods, Mechatronics and robotics, Computer theory
Project type: PhD project