In which phase would the team expect to invest most of the project time? Why? Where would the team expect to spend the least time?
Data preparation tends to be the most labor-intensive step in the analytics lifecycle, often consuming at least 50% of a data science project's time. It is also generally the most iterative phase and the one that teams most often underestimate. Data Preparation requires an analytic sandbox in which the team can work with data and perform analytics for the duration of the project. To get data into the sandbox, the team executes extract, load, and transform (ELT) or extract, transform, and load (ETL); the combination of the two is sometimes abbreviated ETLT. Transforming the data during this process puts it into a form the team can work with and analyze. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition it.
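A minimal sketch of the extract-transform-load flow into a sandbox might look like the following. The customer records, column names, and the imputation rule are all hypothetical, and SQLite stands in for whatever database backs the team's analytic sandbox:

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: customer records with inconsistent region
# casing and a missing age that must be conditioned before analysis.
raw = io.StringIO(
    "customer_id,region,age\n"
    "101,WEST,34\n"
    "102,east,\n"
    "103,West,51\n"
)

# Extract: read the raw feed into dictionaries.
rows = list(csv.DictReader(raw))

# Transform: normalize region casing and impute the missing age
# with the mean of the observed ages (an assumed conditioning rule).
ages = [int(r["age"]) for r in rows if r["age"]]
default_age = round(sum(ages) / len(ages))
for r in rows:
    r["region"] = r["region"].title()
    r["age"] = int(r["age"]) if r["age"] else default_age

# Load: write the conditioned data into the analytic sandbox.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INT, region TEXT, age INT)")
conn.executemany(
    "INSERT INTO customers VALUES (:customer_id, :region, :age)", rows
)
```

Running the transform before the load (ETL) versus after it (ELT) is the distinction the ETLT abbreviation collapses; either way, the conditioned table is what the team analyzes for the rest of the project.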
The team is expected to spend the least time in the Operationalize phase. Here the team delivers final reports, briefings, code, and technical documents, and communicates the benefits of the project more broadly. It may also set up a pilot project to deploy the models in a production environment in a controlled way before broadening the work to a full enterprise or ecosystem of users. Whereas the team scored the model in the analytics sandbox during Phase 4, Phase 6 represents the first time that most analytics teams deploy the new analytical methods or models in a production environment. Rather than deploying the models immediately on a wide scale, the team can manage risk more effectively, and learn as it goes, by undertaking a small-scope pilot deployment before a wide-scale rollout. This approach lets the team observe the model's performance and related constraints in a production environment on a small scale and make adjustments before a full deployment. During the pilot project, the team may need to consider executing the algorithm in the database rather than with in-memory tools such as R, because in-database run times are significantly faster and more efficient, especially on larger datasets.
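The in-database scoring idea can be sketched as follows. The transaction table, the model, and its coefficients are all assumptions for illustration; the point is that the scoring arithmetic is pushed down into the database as SQL, so the full dataset never has to be pulled into an in-memory tool:

```python
import sqlite3

# Hypothetical pilot table of transactions in the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id INT, amount REAL, n_items INT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, 120.0, 3), (2, 15.5, 1), (3, 980.0, 7)],
)

# Assumed coefficients of a simple linear score fitted in Phase 4.
INTERCEPT, B_AMOUNT, B_ITEMS = -2.0, 0.004, 0.3

# In-database scoring: the model's linear expression runs inside the
# SQL engine, and only the scored rows cross into application memory.
scores = conn.execute(
    "SELECT txn_id, ? + ? * amount + ? * n_items AS score "
    "FROM transactions ORDER BY txn_id",
    (INTERCEPT, B_AMOUNT, B_ITEMS),
).fetchall()
```

On a pilot-sized table the difference is negligible, but on production-scale data this pattern avoids the memory ceiling and data-movement cost of loading every row into R or Python before scoring.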
The team would expect to invest most of the project time in the Data Preparation phase because it involves iterative processes like ELT/ETL and thorough data conditioning, which are essential for effective analysis.
The team would expect to spend the least time in the Operationalize phase because it primarily involves delivering final reports, briefings, and code, as well as running pilot projects for deployment.