[FIXED] Get date information from a folder in Linux and store it in a column of my DF

Issue

I'm not sure how to ask this question, but here goes.

I have this DF:

df

JOB_STREAM_NAME         JOB_NAME                        JOB_Command
0   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_INVE_D     /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_EMPF_D     /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS110001_D          /data/app_next_best_action/call_nba_as11.sh
3   P26_AAIN_TOD        PP_AAIN_SPARK_CDLC_ING_DFLT_D   /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing

and I want to get the date (from the Linux OS) of the 4th element of the directory tree in JOB_Command.

Folder aanx-dataeng-slas-sysyphus:

[m292121@mz-vl-vb-415 ~]$ ll /data/application/AANX/
total 1348
ldrwxrwsr-x 12 root bgdt 4096 Sep 26 11:30 aanx-dataeng-slas-sysyphus

Here there is no fourth element, so it takes the last one, which is a file named call_nba_as11.sh:

[m292121@al-vl-vb-408 ~]$ ll /data/app_next_best_action/call_nba_as11.sh
-rwxrwsr-x 1 root bgdt 371 Sep 20 19:20 /data/app_next_best_action/call_nba_as11.sh

Folder aain-srv-motor-extracao-next:

[m292121@mz-vl-vb-415 ~]$ ll /data/application/AAIN/
total 136
ldrwxrwsr-x 12 root bgdt 4096 Jul 15 10:30 aain-srv-motor-extracao-next

Basically, this is what I'm trying to achieve:

df

JOB_STREAM_NAME         JOB_NAME                        Last_Update         JOB_Command
0   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_INVE_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_EMPF_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS110001_D          2022-09-20 19:20:00 /data/app_next_best_action/call_nba_as11.sh
3   P26_AAIN_TOD        PP_AAIN_SPARK_CDLC_ING_DFLT_D   2022-07-15 10:30:00 /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing

I thought of splitting JOB_Command into a new column and using that for the lookup, but I still need to figure out how to get the information.

Any ideas?

Solution

With the dataframe you provided:

import pandas as pd

df = pd.DataFrame(
    {
        "JOB_STREAM_NAME": [
            "P26_NEXT_MAU_TOD",
            "P26_NEXT_MAU_TOD",
            "P26_NEXT_NBA_TOD",
            "P26_AAIN_TOD",
        ],
        "JOB_NAME": [
            "PP_NEXT_RTBA_MAU_IND_INVE_D",
            "PP_NEXT_RTBA_MAU_IND_EMPF_D",
            "PP_NEXT_NBA_AS110001_D",
            "PP_AAIN_SPARK_CDLC_ING_DFLT_D",
        ],
        "JOB_Command": [
            "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh",
            "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh",
            "/data/app_next_best_action/call_nba_as11.sh",
            "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing",
        ],
    }
)

Here is one way to do it with the pathlib and datetime modules from Python's standard library:

import datetime
import numpy as np
from pathlib import Path


def get_fourth_elem(file_path):
    """Helper function.

    Args:
        file_path: file path as a string.

    Returns:
        absolute path to the fourth element (or last one if shorter) as a Pathlib object.
    """
    file_path_length = len(file_path.strip("/").split("/"))
    file_path = Path(file_path)
    if file_path_length > 4:
        for _ in range(file_path_length - 4):
            file_path = Path(file_path.parent)
        return file_path
    else:
        return file_path


df["Last_Update"] = df["JOB_Command"].apply(
    lambda x: datetime.datetime.fromtimestamp(
        get_fourth_elem(x).stat().st_mtime
    ).strftime("%Y-%m-%d %H:%M:%S")
    if get_fourth_elem(x).exists()  # check the truncated path; x may carry arguments
    else np.nan
)
df = df.reindex(columns=["JOB_STREAM_NAME", "JOB_NAME", "Last_Update", "JOB_Command"])
print(df)
# Output
JOB_STREAM_NAME         JOB_NAME                        Last_Update         JOB_Command
0   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_INVE_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   P26_NEXT_MAU_TOD    PP_NEXT_RTBA_MAU_IND_EMPF_D     2022-09-26 11:30:00 /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS110001_D          2022-09-20 19:20:00 /data/app_next_best_action/call_nba_as11.sh
3   P26_AAIN_TOD        PP_AAIN_SPARK_CDLC_ING_DFLT_D   2022-07-15 10:30:00 /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing
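
As a variation on the get_fourth_elem helper, the truncation could also be written with the parts attribute of a pathlib path, using shlex.split to drop trailing command arguments (such as cdlc_ing in the last row). This is only a minimal sketch: the name first_n_parts and the depth parameter are hypothetical and not part of the original answer, and it assumes the first token of the command is an absolute POSIX path.

```python
import shlex
from pathlib import PurePosixPath


def first_n_parts(command, depth=4):
    """Return the first `depth` components of the script path in `command`.

    Hypothetical helper: `command` is a shell-style command string whose
    first token is an absolute POSIX path. Shorter paths are returned whole.
    """
    # shlex.split separates the script path from arguments such as "cdlc_ing".
    script = PurePosixPath(shlex.split(command)[0])
    # parts[0] is the root "/", so keep depth + 1 entries.
    return PurePosixPath(*script.parts[: depth + 1])


print(first_n_parts("/data/app_next_best_action/call_nba_as11.sh"))
```

Because this works on the path string only, it does not touch the file system; the result would still have to be passed to Path(...).stat() to read the modification time.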

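The timestamp lookup can also be tried out in isolation. The sketch below uses a hypothetical helper name, format_mtime, and returns None for paths that do not exist (an assumption; the code in the answer uses np.nan instead). In the format string, %M stands for minutes and %H for hours.

```python
import datetime
import os
import tempfile
from pathlib import Path


def format_mtime(path, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the last-modified time of `path` as a string, or None if missing."""
    path = Path(path)
    if not path.exists():
        return None
    # st_mtime is seconds since the epoch; fromtimestamp converts to local time.
    return datetime.datetime.fromtimestamp(path.stat().st_mtime).strftime(fmt)


# Demonstration on a temporary directory with a known modification time.
with tempfile.TemporaryDirectory() as tmp:
    known = datetime.datetime(2022, 9, 26, 11, 30, 0).timestamp()
    os.utime(tmp, (known, known))
    print(format_mtime(tmp))
    print(format_mtime(Path(tmp) / "missing"))
```

On the real system, the same call would be made on the truncated directory path rather than on a temporary directory.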

Answered by –
Laurent


Answer checked by –
Senaida (FixError Volunteer)
